Data sources |
A systematic search was conducted for articles published between Jan 1, 1965, and Oct 30, 2018, in the databases Embase, PubMed, Global Index Medicus, Popline, and Web of Science.
Following full text review, we extracted data from each study using the following variables: study characteristics (study and sample collection dates, study locations i.e., city, subnational [an area, region, state, or province in a country], or national level), participant characteristics (age range, sex, year, and population group), and prevalence of the HBV marker, type of laboratory tests, and number of participants the HBV marker prevalence was based on.
Data from eligible articles were entered into a Microsoft EXCEL® and/or Distiller databank by two reviewers independently. Information was extracted on author name, year, age, gender, marker, laboratory test used, number of individuals tested, prevalence of each marker when reported, the population group (general population, HCWs, or blood donors), whether the data reported were for a city, sub-national (an area, region, state or province in a country) or national level, and GDP per capita. In addition to HBsAg, HBeAg was recorded where available for individuals for whom HBsAg was also reported. To record information on methodological quality and study bias resulting from non-representativeness, an additional variable was used: samples likely to be representative of the country/area specified were coded as 0, and others (e.g. convenience samples in certain communities or tribes in the country) were coded as 1, supplemented by additional information. The risk of bias/non-representativeness information was applied only if the population was neither HCW nor blood donor (see description below).3 The variables extracted from the studies and the assumptions made are described in detail below:
- Author, Date
- Year start/end of study conduct: The start and end years of the study were extracted. If this information was not available from the studies, we applied the common assumption that the study was conducted two years before the year of publication (e.g. author, 2000; year of study conduct: 1998).
- Sex: Sex-specific values were extracted. If only an overall (all) estimate was provided, the share of females in the study was specified in the column additional information.
- Age start/end: The most specific age-group provided by the data was extracted. If the age-group on which the parameter value was based was not available, assumptions were made based on the context of the study. The following rules were applied where information on age-groups in the study population was missing:
- If the study was conducted in the general population without further specification and only one prevalence estimate was provided, the age-group was considered to be 0-85 years. If the lower bound of the youngest age-group or the upper bound of the oldest age-group was missing, the lower bound of the youngest age-group was set to 1 year and the upper bound of the oldest age-group to 85 years.
- If the study was conducted among adult populations but no age-range is provided, the age-group is considered to be 17-65 years.
- If the study was conducted among pupils but no age-range is provided, the age-group is considered to be 5-15 years.
- If the study was conducted among pregnant women but no age-range is provided, the age-group is considered to be 15-49 years (reproductive age).
- If the study was conducted among blood donors but no age-range is provided, the age-group is considered to be 17-65 years.
- If the study was conducted among army recruits or soldiers but no age-range is provided, the age-group is considered to be 18-45 years.
- If the study was conducted among the working population but no age-range is provided, the age-group is considered to be 16-65 years.
- HBsAg Prevalence: The most specific prevalence estimate provided by the data was extracted (defined by age-/sex-/year-prevalence). Separate lines for each marker were used in the data extraction file (e.g. one line for HBeAg and one line for HBsAg, even if the study group/publication was the same).
- HBeAg Prevalence (optional marker): The most specific prevalence estimate (defined by age-/sex-/year-prevalence) of HBeAg among HBsAg-positive individuals was extracted and, if applicable, was recalculated to reflect prevalence among HBsAg carriers.
- anti-HBc Prevalence (optional marker): The most specific prevalence estimate provided by the data was extracted (defined by age-/sex-/year-prevalence).
- Laboratory method: Testing for immune response markers of HBV infection began in the 1970s with the counter-immuno-electrophoresis technique (CIEP). Since then, different detection methods have been developed (RIA, EIA, …). The most widely applied method in prevalence studies is the ELISA (enzyme-linked immunosorbent assay). Five categories were established to record the method/test used for prevalence detection in the studies: ELI new (ELISA -2, -3, EIA, …); EIA old (CMIA, CIEP, RPHA); NAT (qPCR/real-time PCR, nested PCR, multiplex PCR); other (e.g. RIA); and unknown/not specified.
- Country: Country names were recorded according to www.who.int and, for additional analysis purposes, were grouped according to the six WHO regions: the African Region, the Region of the Americas, the Eastern Mediterranean Region, the European Region, the South-East Asia Region and the Western Pacific Region.
- Sample size (individuals from whom blood was drawn; individuals included in the analyses/basis for the parameter estimate): As a quality indicator of the study, we distinguished the effective sample size, i.e. the number of individuals included in the analysis and on which the parameter estimate is based, from the number of individuals from whom blood was drawn (separate column) and the initially calculated/planned sample size (separate column).
- Population: Although the focus was on the general population, two additional groups were included and specified: HCW and blood donors (with the subgroups unspecified, paid, and unpaid/voluntary). If the “population” column was specified as HCW or blood donor rather than general population, the risk of bias column (following) was left empty.
- Level: Information is provided on whether the study was conducted at a national, sub-national or city level, or whether the level was not further specified (four categories).
- Study Location: This free-text variable specifies the city/area within the country where the included study was conducted. The variables/columns Level and Study Location were additionally included following the WHO Meeting on Impact of Hepatitis B Vaccination at WHO, Geneva, in March 2014.
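The imputation rules for missing study years and age ranges listed above can be sketched in a few lines of Python. This is only an illustration of the stated assumptions: the function names and the population labels used as dictionary keys are hypothetical and do not come from the extraction file.

```python
from typing import Optional, Tuple

# Default age ranges (years) taken from the rules listed above.
# The keys are hypothetical labels chosen for this sketch.
DEFAULT_AGE_RANGES = {
    "general": (0, 85),
    "adults": (17, 65),
    "pupils": (5, 15),
    "pregnant_women": (15, 49),
    "blood_donors": (17, 65),
    "army": (18, 45),
    "working": (16, 65),
}


def impute_study_year(publication_year: int, study_year: Optional[int] = None) -> int:
    """Return the reported study year, or publication year minus two if missing."""
    return study_year if study_year is not None else publication_year - 2


def impute_age_range(
    population: str,
    age_start: Optional[int] = None,
    age_end: Optional[int] = None,
) -> Tuple[int, int]:
    """Fill missing age bounds following the rules above."""
    if age_start is None and age_end is None:
        # No age information at all: use the population-specific default.
        return DEFAULT_AGE_RANGES[population]
    # Only one bound missing: youngest group starts at 1, oldest ends at 85.
    lower = 1 if age_start is None else age_start
    upper = 85 if age_end is None else age_end
    return (lower, upper)
```

For example, `impute_study_year(2000)` returns 1998, matching the example given for the year of study conduct.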
Additional data from other sources than the eligible studies:
- Year of vaccine introduction in the entire country: data is derived from official reports by WHO Member States and unless otherwise stated, data is reported annually through the WHO/UNICEF joint reporting process. http://www.who.int/entity/immunization/monitoring_surveillance/data/year_vaccine_introduction.xls?ua=1
- Period when the study was conducted: pre-vaccination or post-vaccination. This is determined according to the year of introduction in the whole country.
- Coverage estimates series: data is obtained from WUENIC: http://apps.who.int/immunization_monitoring/globalsummary/timeseries/tswucoveragebcg.html
- GDP per capita was taken from UN data, which compiles information from the World Bank (source: http://data.un.org/Data.aspx?q=GDP&d=SNAAMA&f=grID%3a101%3bcurrID%3aUSD%3bpcFlag%3a1).
- Longitude and latitude data (source: www.google.com).
- Population structure and size data for each country was from the UN population division:
http://www.un.org/en/development/desa/population/
|
Method of computation |
The data were modelled using a Bayesian logistic regression of the proportion of individuals who tested positive for HBsAg in each study, weighting each study by its size and using a conditional autoregressive (CAR) model to account for spatial and economic correlations between similar countries. The model uses data from well-sampled countries to estimate prevalence in more data-poor countries, together with effects such as sex, age and vaccination status; these estimates are also informed by each country's geographic and GDP proximity to other countries (CAR model). The underlying assumption is that countries that are close together economically and/or geographically will have more similar prevalence because of similar social structures and health care capabilities.
The response variable in the model was the prevalence of hepatitis B surface antigen (HBsAg). The explanatory variables were age (three categories: under 5, juvenile (5-15) and adult (16+), assigned using the average age of participants in the study), sex (proportion female in the study), study bias (e.g. a high fraction of study participants from indigenous populations), 3 dose vaccine coverage, birth dose of the vaccine and country of study. The coverage of routine 3 dose vaccination and birth dose vaccination in each study was calculated by cross-referencing the year of the study and the age of its participants with the corresponding WHO-UNICEF vaccine coverage estimates for that country. The WHO-UNICEF estimates are annual data for the country as a whole and do not contain information on vaccine efficacy, which was not used in the analysis because no data on it were obtained. Vaccine efficacy is instead estimated implicitly in the analysis, as vaccination is seen to have a variable effect across time and space across the studies. More explicitly, the model uses the ages and timing of the study to calculate the years in which the participants were born: for an age group of 10-15 in a study undertaken in 2015, the birth years would be 2000-2005. We then average the vaccination coverage from the WHO-UNICEF estimates across those 5 years, assuming that each age was evenly represented in that age group in the study.
The same process was used for the 3 dose and birth dose vaccination.
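The cohort-averaging step can be illustrated with a short Python sketch. The function name and the `coverage_by_year` dictionary are hypothetical. Two assumptions are made here that the text does not state: both endpoint birth years are included in the average (so the 10-15 example spans 2000 to 2005 inclusive), and years with no coverage estimate, for example before vaccine introduction, count as zero coverage.

```python
def cohort_coverage(study_year, age_start, age_end, coverage_by_year):
    """Average annual coverage over the birth years of a study age group.

    coverage_by_year maps a calendar year to a coverage proportion (0-1).
    Assumptions of this sketch: birth-year endpoints are inclusive, every
    age is equally represented, and missing years count as zero coverage.
    """
    first_birth_year = study_year - age_end
    last_birth_year = study_year - age_start
    birth_years = range(first_birth_year, last_birth_year + 1)
    values = [coverage_by_year.get(year, 0.0) for year in birth_years]
    return sum(values) / len(values)


# Illustrative (made-up) coverage series for one country.
example_coverage = {2002: 0.6, 2003: 0.6, 2004: 0.9, 2005: 0.9}
example_value = cohort_coverage(2015, 10, 15, example_coverage)
```

The same function would be called once with the 3 dose series and once with the birth dose series.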
The general logistic model equation is:

$$Y_i \sim \text{Binomial}(\pi_i, N_i), \qquad \log\frac{\pi_i}{1-\pi_i} = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + u_i$$

where $\beta_j$ are the fixed effects of the explanatory variables $x_{ij}$. The spatial random effects are described by

$$u_i \sim N\left(\bar{u}_i, \sigma_u^2 / n_i\right),$$

where

$$\bar{u}_i = \sum_{j \in \text{neigh}(i)} w_i u_j / n_i,$$

$n_i$ is the number of neighbours for country $i$, and the weights $w_i$ are 1.
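As a reading aid, the two parts of the model equations above can be evaluated numerically with a minimal Python sketch. It is not the WinBUGS model code: the function names and data layout are hypothetical, and the weights are fixed at 1 as in the equations.

```python
import math


def inverse_logit(eta):
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def prevalence(beta0, betas, x, u_i):
    """pi_i implied by logit(pi_i) = beta0 + sum_j beta_j * x_ij + u_i."""
    eta = beta0 + sum(b * v for b, v in zip(betas, x)) + u_i
    return inverse_logit(eta)


def car_conditional(u, neighbours, i, sigma2_u):
    """Conditional mean and variance of u_i given its neighbours (weights = 1).

    u is a sequence of random effects indexed by country; neighbours maps a
    country index to the list of its neighbouring country indices.
    """
    n_i = len(neighbours[i])
    mean_u = sum(u[j] for j in neighbours[i]) / n_i
    return mean_u, sigma2_u / n_i
```

The conditional variance shrinks as the number of neighbours grows, so a country with many neighbours is pulled more strongly towards their average.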
The model was simulated in the Bayesian statistical package WinBUGS, with data manipulation and model initialisation run from R (3.3.1) using R2WinBUGS. The model considers the parameters of age, sex, study bias (e.g. a high fraction of study participants from indigenous populations), vaccine coverage, birth dose of the vaccine and country of study.
The model uses the CAR-normal function in WinBUGS to model the spatial and economic autocorrelation related to neighbouring countries. For each country that had prevalence data, a weighted central position was calculated using the size and location of each study. For those countries with no data, we used the population centroid. In a novel approach, we considered 3 dimensions in the country adjacency matrix: we used the usual geographic dimensions, latitude and longitude, and combined these with the natural log of the country’s GDP per capita. This was to measure not only the geographic but also the developmental proximity of countries. The adjacency matrix for the geo-economic distance gives a score between each country and every other country. Countries that are close both geographically and economically have a low score, and those further apart either geographically or economically have a high score/distance. In other words, countries that are more alike have a low score and countries that are less alike have a high score.
We then explored how to proportion the geographic and economic distances when producing the adjacency matrix, because geographic distance may be more or less important than economic similarity. By creating a number of different adjacency matrices (not definitive), we could select the matrix that best explains reality. We normalised the geographic and GDP distances and then calculated the distance between these two normalised figures. This creates a smoothed Gaussian surface that depends on both spatial proximity and GDP per capita proximity. We compared ratios of 1:0, 1:1, 2:1 and 1:2 (Geographic:GDP).
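One possible reading of the combined geo-economic distance is sketched below in Python. Several choices here are assumptions made for illustration and are not confirmed by the text: geographic distance is taken as the Euclidean distance between (latitude, longitude) pairs, each distance is normalised by its largest pairwise value, and the two normalised distances are combined as a weighted Euclidean norm using the Geographic:GDP ratio. The function name and input layout are hypothetical.

```python
import math
from itertools import combinations


def geo_economic_distances(countries, w_geo, w_gdp):
    """Pairwise combined distance for a given Geographic:GDP ratio.

    countries maps a name to (latitude, longitude, gdp_per_capita).
    Returns a dict keyed by alphabetically ordered country pairs.
    """
    pairs = list(combinations(sorted(countries), 2))
    geo, eco = {}, {}
    for a, b in pairs:
        lat_a, lon_a, gdp_a = countries[a]
        lat_b, lon_b, gdp_b = countries[b]
        # Assumption: planar distance on latitude/longitude.
        geo[(a, b)] = math.hypot(lat_a - lat_b, lon_a - lon_b)
        # Economic distance on the natural log of GDP per capita.
        eco[(a, b)] = abs(math.log(gdp_a) - math.log(gdp_b))
    # Normalise each distance by its maximum (guarding against zero).
    max_geo = max(geo.values()) or 1.0
    max_eco = max(eco.values()) or 1.0
    return {
        pair: math.hypot(w_geo * geo[pair] / max_geo, w_gdp * eco[pair] / max_eco)
        for pair in pairs
    }


# Made-up coordinates and GDP values for three hypothetical countries.
example_countries = {"A": (0, 0, 1000), "B": (0, 10, 1000), "C": (0, 20, 2000)}
geo_only = geo_economic_distances(example_countries, 1, 0)
geo_1_gdp_2 = geo_economic_distances(example_countries, 1, 2)
```

With the 1:0 ratio the GDP term drops out entirely, which gives a purely geographic baseline against which the other ratios can be compared.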
For each adjacency matrix, we also had to select a neighbourhood distance, i.e. the distance over which one country can be affected by another. We therefore varied the radius within which neighbours were selected for the neighbourhood network, using the maximum minimum distance, twice the maximum minimum distance and three times the maximum minimum distance, thus varying the number of neighbours each country would have.
Finally, to decide the magnitude of the effect one country has on another in the neighbourhood network, we varied the weights of pairs of countries in the adjacency matrix. We used either a neutral weighting of 1, so that each neighbour has an equal effect on every other (not dependent on the distance in the network), or weights decaying with distance as 1/distance or 1/distance², so that the closer a country is, the greater its effect on another country. Across these 36 combinations, the minimum DIC (Deviance Information Criterion) was found for a ratio of 1:2 (Geographic:GDP), a neighbourhood distance of twice the maximum minimum distance and an even weighting of 1/distance for each adjacent country.
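The neighbourhood selection and weighting options, and the resulting grid of 36 candidate structures, can be sketched as follows. This is an illustrative Python sketch only: the function names and the nested-dictionary distance layout are hypothetical, and "maximum minimum distance" is interpreted as the largest nearest-neighbour distance over all countries.

```python
from itertools import product


def max_min_distance(dist):
    """Largest nearest-neighbour distance over all countries.

    dist maps each country to a dict of distances to every other country.
    """
    return max(min(row.values()) for row in dist.values())


def neighbour_weights(dist, radius_multiplier, power):
    """Select neighbours within the radius and weight them by 1/distance**power.

    power = 0 gives the neutral weighting of 1; power = 1 and 2 give the
    decaying weights. Distances are assumed to be strictly positive.
    """
    radius = radius_multiplier * max_min_distance(dist)
    return {
        i: {j: 1.0 / d ** power for j, d in row.items() if d <= radius}
        for i, row in dist.items()
    }


# The candidate settings described in the text.
RATIOS = [(1, 0), (1, 1), (2, 1), (1, 2)]  # Geographic:GDP
RADIUS_MULTIPLIERS = [1, 2, 3]             # times the maximum minimum distance
WEIGHT_POWERS = [0, 1, 2]                  # 1, 1/distance, 1/distance^2
GRID = list(product(RATIOS, RADIUS_MULTIPLIERS, WEIGHT_POWERS))
```

Using the maximum minimum distance as the smallest radius guarantees that every country has at least one neighbour. `len(GRID)` is 36, one entry per model that would be fitted and compared by DIC.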
This model structure produces estimates for all fixed effects and for individual country-level risk, which shows which countries are at significantly greater or lower risk than the average.
All parameters were given uninformative priors. Simulations were run with 3 MCMC chains with 50,000 burn-in iterations, and each parameter was estimated from 1000 samples taken from a thinned 250,000 iterations to produce the posterior distribution. Convergence was attained, with r̂ values all very close to 1.000. Because of the Bayesian framework and the WinBUGS software, it was possible to obtain estimates for countries where we had no data on prevalence, using their GDP and geographic proximity to inform the estimate. Countries with the largest number of studies provided the estimates with the tightest credible intervals, and those with few or no data were less well defined, often producing a log-normally distributed posterior, giving estimates with long tails.
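The convergence check can be illustrated with the classic Gelman-Rubin potential scale reduction factor, sketched below in Python. This is an assumption about the form of the statistic: the exact variant reported by WinBUGS or R2WinBUGS may differ, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Classic potential scale reduction factor for one parameter.

    chains is a list of equal-length lists of posterior draws, one per chain.
    Values close to 1 suggest the chains have mixed.
    """
    m = len(chains)
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand_mean = mean(chain_means)
    # Between-chain variance of the chain means, scaled by chain length.
    between = n / (m - 1) * sum((cm - grand_mean) ** 2 for cm in chain_means)
    # Average within-chain sample variance.
    within = mean(variance(c) for c in chains)
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Chains that explore the same region give a value near 1, while chains stuck in different regions give values well above 1.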
Posterior distributions of parameters were inspected for convergence and to check for covariance between parameters. Where necessary, parameters were centred and scaled to N(0, 1) to aid parameter convergence and remove covariance. This was done for the sex parameter, which was entered as the proportion of the sample that was female; before re-centring and scaling, it was seen to co-vary with the intercept and bias parameters. However, the covariance of routine vaccination and birth dose persisted even after re-centring. This is in part unsurprising, as there are few instances where the birth dose is administered without the routine vaccination. We tried to reduce this interaction of the terms by transforming the birth dose data. We modelled birth dose using only data where the birth dose coverage was greater than 60, 70, 80 and 90% respectively; we also modelled birth dose squared, thus increasing the effect of high birth dose coverage relative to low coverage. Model selection depended on which transformation both reduced the covariance between the parameters and returned the lowest DIC score.
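The transformations mentioned above can be sketched in Python as follows. The function names are hypothetical. Two assumptions are made: standardisation uses the population standard deviation, and "using only data where the birth dose was greater than" a threshold is interpreted as setting coverage at or below the threshold to zero, which is only one possible reading of the text.

```python
from statistics import mean, pstdev


def standardise(values):
    """Centre and scale a covariate to mean 0 and standard deviation 1."""
    mu = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sd for v in values]


def birth_dose_above(coverage, threshold):
    """Keep birth-dose coverage only where it exceeds the threshold.

    Assumption of this sketch: values at or below the threshold become 0.
    """
    return [c if c > threshold else 0.0 for c in coverage]


def birth_dose_squared(coverage):
    """Square coverage so that high values weigh more than low ones."""
    return [c ** 2 for c in coverage]
```

Each transformed covariate would replace the raw birth dose term in a separate model fit, with the fits then compared on parameter covariance and DIC.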
Model validation was conducted by fitting the model to a randomly selected 90% of the data and testing it against the remaining 10%, and by comparing model estimates of prevalence against observed data (Figure 3). Figure 4 shows the average prevalence in each country from all the studies plotted against the model's estimate. Figure 5 shows the marginal and joint posterior distributions for the fitted parameters. Table 1 gives the estimated parameter values with associated credible intervals.
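A random 90%/10% hold-out split of the kind used for validation can be sketched in Python as below. The function name, the fixed seed and the rounding of the test-set size are illustrative assumptions, not details taken from the analysis.

```python
import random


def holdout_split(records, test_fraction=0.1, seed=0):
    """Randomly split records into a training set and a held-out test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Example with 100 placeholder study identifiers.
train_set, test_set = holdout_split(range(100))
```

The model would be fitted to `train_set` only, and its predicted prevalence compared with the observed prevalence in `test_set`.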
During the validation exercise (in which countries were consulted on their estimates), it was pointed out that China had undertaken three very large-scale population-based serological surveys in order to establish baseline prevalence and progress towards HBV elimination. There were a large number of other surveys from China that are less representative than these three nationwide surveys. We conducted a sensitivity analysis by restricting the data from China to the three nationally representative surveys. With this change in input data, the effect of vaccination was more distinct, but the estimated age effects (change in prevalence in children under 5, or juveniles (children 5-15 years)) were no longer significantly different from zero (see Table 2 and Figure 6). The deviance was significantly reduced, suggesting a much better fitting model (Table 2), albeit on a somewhat reduced dataset.
|