Vaccine 30 (2012) 3986–3991
Short communication
An efficient statistical algorithm for a temporal scan statistic applied to vaccine
safety analyses
D.L. McClure a,∗, S. Xu a, E. Weintraub b, J.M. Glanz a,c
a Institute for Health Research, Kaiser Permanente Colorado, Denver, CO, United States
b Immunization Safety Office, Division of Healthcare Quality and Promotion, Centers for Disease Control and Prevention, Atlanta, GA, United States
c Department of Epidemiology, Colorado School of Public Health, Denver, CO, United States
Article info
Article history:
Received 9 December 2011
Received in revised form 3 April 2012
Accepted 10 April 2012
Available online 22 April 2012
Keywords:
Temporal scan
Surveillance
Vaccine safety
Abstract
In the US, the Vaccine Safety Datalink (VSD) project, sponsored by the Centers for Disease Control and
Prevention, conducts near–real-time, population-based, active surveillance for vaccine safety. One of the
steps in analyzing signals, if there are enough cases, is to apply temporal scan statistics. The purpose is to
determine whether the cases cluster in time within an overall a priori defined post-vaccination observation
interval. We present a relatively efficient and accurate algorithm for the purely temporal scan statistic
as applied to vaccine safety investigations. It needs only SAS/BASE® software, and the algorithm is simple
enough to be programmed in other software languages. Our present work is focused on incorporating
the temporal scan statistic algorithm within our previous approach for finding an optimal risk window
for studies of vaccine safety.
© 2012 Elsevier Ltd. All rights reserved.
1. Background
In the US, the Vaccine Safety Datalink (VSD) project [1], sponsored by the Centers for Disease Control and Prevention, conducts
near–real-time, population-based, active surveillance for vaccine
safety [2]. One of the steps in analyzing signals, if there are enough
cases, is to apply temporal scan statistics [3]. The purpose is to
determine if the cases clustered in time within an overall a priori defined post-vaccination observation interval. Here we present
an efficient and accurate algorithm for finding the most-likely
subinterval cluster of cases, based on a purely temporal scan
statistic.
Disclaimer: The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention nor that of America’s Health Insurance Plans.
∗ Corresponding author. Tel.: +1 303 614 1255; fax: +1 303 614 1225. E-mail address: david.l.mcclure@kp.org (D.L. McClure).
0264-410X/$ – see front matter © 2012 Elsevier Ltd. All rights reserved.
http://dx.doi.org/10.1016/j.vaccine.2012.04.040
2. Methods
The temporal scan statistic is a function of the observed and
expected number of events in an unknown scanning window
of width w [3–6]. A fixed total number of cases, N, is distributed within an overall time interval of length W. Under the null
hypothesis of a uniform distribution of events, i.e. no clustering, the
expected number of cases, E, within any particular scan window of
width $w_i$ is $E = N w_i / W$, with $W = \sum_i w_i$. The log-likelihood ratio
of the observed number of cases, O, conditioned on N is:

$$\mathrm{LLR} = O \ln\left(\frac{O}{E}\right) + (N - O) \ln\left(\frac{N - O}{N - E}\right) \qquad (1)$$

In practice, the LLR statistic is maximized based on all possible
scan windows applied to the observed data within a pre-specified
range. For example, for the commonly used 42 day risk interval of
adverse events following immunization, the maximum LLR for all
windows is found by calculating the LLR according to equation 1
for all possible 902 scan windows from 1, 2, ..., to 42 days.
Since there is no closed-form solution for the p-value function
for equation 1 [2–6], we used the method of Abrams et al. [7] to
fit a Gumbel-type distribution to 1000 LLR values simulated under
the null hypothesis of a uniform distribution, i.e. no temporal clustering. The inputs for the simulation are N, the total number of
cases, and W, the length of the overall interval. From the 1000 LLRs, we
applied the mean, $\mu_{LLR}$, and standard deviation, $\sigma_{LLR}$, to the approximate p-value function:

$$p = 1 - \exp\left(-\exp\left(-\frac{\mathrm{LLR} - a}{b}\right)\right) \qquad (2)$$

where $a = \mu_{LLR} - \gamma b$, $\gamma \approx 0.57721$ (Euler’s constant), and $b = \sigma_{LLR}\sqrt{6}/\pi$.
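As an illustration of equation 1 and the window search described above, the following Python sketch scans every contiguous whole-day window and returns the one with the largest LLR. It is not the authors' SAS code. The function names (`llr`, `scan_windows`, `max_llr_window`) and the example onset days are hypothetical. Two assumptions are made: only windows with more cases than expected are scored, and the full 42 day window is excluded, which leaves 42·43/2 − 1 = 902 windows and is one reading consistent with the count quoted in the text.

```python
import math


def llr(observed, expected, total):
    """Equation 1 for one window; returns 0 when there is no excess of cases."""
    if observed <= expected:
        # Assumption: only windows with more cases than expected are of interest.
        return 0.0
    value = observed * math.log(observed / expected)
    if observed < total:
        # When every case falls inside the window the second term is 0 * ln(0) = 0.
        value += (total - observed) * math.log((total - observed) / (total - expected))
    return value


def scan_windows(interval_days):
    """Yield (start, end) for each whole-day window shorter than the full interval."""
    for start in range(1, interval_days + 1):
        for end in range(start, interval_days + 1):
            if end - start + 1 < interval_days:
                yield start, end


def max_llr_window(case_days, interval_days):
    """Return (max LLR, start day, end day) over all scan windows."""
    total = len(case_days)
    # cumulative[d] = number of cases with onset on days 1..d
    cumulative = [0] * (interval_days + 1)
    for day in case_days:
        cumulative[day] += 1
    for day in range(1, interval_days + 1):
        cumulative[day] += cumulative[day - 1]

    best = (0.0, None, None)
    for start, end in scan_windows(interval_days):
        observed = cumulative[end] - cumulative[start - 1]
        expected = total * (end - start + 1) / interval_days  # E = N * w_i / W
        value = llr(observed, expected, total)
        if value > best[0]:
            best = (value, start, end)
    return best


# Hypothetical onset days (1..42) for eight cases, made up for illustration.
example_days = [3, 5, 5, 6, 6, 7, 20, 35]
print(len(list(scan_windows(42))))
print(max_llr_window(example_days, 42))
```

Using cumulative counts keeps each window evaluation to a subtraction and a few logarithms, so the full scan of a 42 day interval stays cheap enough to repeat inside a simulation loop.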
To benchmark the p-value approximation, we compared the approximate p-values
to p-values generated by simulation [4,7]. For various sets of N and
W, we simulated p-values from 1 million iterations of the LLR scan
statistic under the null hypothesis of a uniform distribution. The
simulated p-value is based on the rank (from highest to lowest)
of the generated LLRs. We calculated the absolute percent bias of
the natural log of the p-values fit from the Gumbel, normal, lognormal, gamma and Weibull distributions relative to
the natural log of the simulated p-values at each LLR. We
then reported the mean absolute bias over the span of simulation
parameters for each fitted distribution.

Fig. 1. Selected values from 1 million iterations of simulated p-values versus p-values from fitted Gumbel, lognormal, normal, gamma and Weibull distributions of 1000
iterations, for 100 total cases and a 42 day overall window. [Plot not reproduced. Vertical axis: P Value, log scale from 1.00E+00 to 1.00E-11. Horizontal axis: Log Likelihood Ratio, 1 to 18. Series: simulated, gumbel, lognormal, normal, gamma, weibull.]
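The simulation, the Gumbel approximation of equation 2, and the bias measure can be sketched in Python as follows. This is an illustrative reading, not the authors' implementation. All function names are hypothetical; the rank-based p-value is taken here as the share of simulated LLRs at or above the observed value, and the bias is taken as 100·|ln p_fit − ln p_sim| / |ln p_sim|. Both are assumptions about details the text does not spell out. The demonstration uses far fewer iterations than the 1000 and 1 million reported above so that it runs quickly.

```python
import math
import random
import statistics

EULER_GAMMA = 0.57721  # Euler's constant, as quoted in the text


def max_llr(case_days, interval_days):
    """Maximum of equation 1 over all whole-day windows with an excess of cases."""
    n = len(case_days)
    cumulative = [0] * (interval_days + 1)
    for day in case_days:
        cumulative[day] += 1
    for day in range(1, interval_days + 1):
        cumulative[day] += cumulative[day - 1]
    best = 0.0
    for start in range(1, interval_days + 1):
        for end in range(start, interval_days + 1):
            observed = cumulative[end] - cumulative[start - 1]
            expected = n * (end - start + 1) / interval_days
            if observed <= expected:  # assumption: score excess windows only
                continue
            value = observed * math.log(observed / expected)
            if observed < n:
                value += (n - observed) * math.log((n - observed) / (n - expected))
            best = max(best, value)
    return best


def simulate_null(n_cases, interval_days, iterations, rng):
    """Max LLR values for cases placed uniformly at random over the interval."""
    return [
        max_llr([rng.randint(1, interval_days) for _ in range(n_cases)], interval_days)
        for _ in range(iterations)
    ]


def gumbel_params(null_llrs):
    """Method-of-moments Gumbel parameters (a, b) from simulated LLR values."""
    b = statistics.stdev(null_llrs) * math.sqrt(6) / math.pi
    a = statistics.mean(null_llrs) - EULER_GAMMA * b
    return a, b


def gumbel_p_value(llr_value, a, b):
    """Equation 2."""
    return 1.0 - math.exp(-math.exp(-(llr_value - a) / b))


def rank_p_value(llr_value, null_llrs):
    """Assumed rank-based p-value: share of simulated values >= the observed one."""
    return sum(1 for v in null_llrs if v >= llr_value) / len(null_llrs)


def abs_percent_bias_log_p(p_fitted, p_simulated):
    """Assumed bias measure on the natural-log scale, in percent."""
    return 100.0 * abs(math.log(p_fitted) - math.log(p_simulated)) / abs(math.log(p_simulated))


rng = random.Random(2012)  # fixed seed so the demonstration is repeatable
null_small = simulate_null(n_cases=20, interval_days=42, iterations=200, rng=rng)
a, b = gumbel_params(null_small)
print("Gumbel p-value at LLR = 6:", gumbel_p_value(6.0, a, b))
print("Rank p-value at LLR = 6:", rank_p_value(6.0, null_small))
```

The practical point of the approximation is visible in `rank_p_value`: with few simulated values it cannot return anything between 0 and 1/iterations, whereas the fitted Gumbel tail gives a usable estimate for large LLR values.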
All calculations and simulations were accomplished in SAS/BASE
9.1® software for PC Windows® . The corresponding SAS code is
presented in Appendix A.
3. Results
Over the span of the simulation parameters, for LLR values from 1 to 18,
for 10, 100, and 1000 total cases and 21, 42, and 84 day overall windows, the mean absolute bias of the Gumbel distributed p-values
was within 5% overall of the simulated values. For the other distributions, the mean absolute bias was 10%, 120%, 22%, and 85% for
the lognormal, normal, gamma, and Weibull distributions, respectively.
Fig. 1 illustrates a typical trend of p-values vs. LLR for the simulation and distributions. All distributions followed the simulated
p-values within 10% for LLR’s less than 6. The Gumbel distribution
presented a closer fit for LLR’s greater than 6.
For the temporal scan algorithm using the Gumbel approximation, the execution time in SAS® was ten seconds or less (Windows
XP® desktop PC, 3 GHz, 3.5 GB RAM).
4. Conclusions
We presented a relatively efficient and accurate algorithm
for the purely temporal scan statistic as applied to vaccine
safety investigations. It only needs SAS/BASE® software, and
the algorithm is simple enough to be programmed in other
software languages. Our present work is focused on incorporating the temporal scan statistic algorithm within our previous
approach for finding an optimal risk window for studies of vaccine
safety [8].
Acknowledgements
We thank Dr. Martin Kulldorff of the Department of Population
Medicine, Harvard Medical School and Harvard Pilgrim Health Care
Institute, for several helpful comments. This study was supported
by the Centers for Disease Control and Prevention via contract 2002002-00732 (the Vaccine Safety Datalink Project) with America’s
Health Insurance Plans.
Appendix A.
SAS code for temporal scan statistics of events following vaccination.
References
[1] Baggs J, Gee J, Lewis E, Fowler G, Benson P, Lieu T, et al. The vaccine safety
datalink: a model for monitoring immunization safety. Pediatrics 2011;127(May
(Suppl. 1)):S45–53.
[2] Yih WK, Kulldorff M, Fireman BH, Shui IM, Lewis EM, Klein NP, et al. Active
surveillance for adverse events: the experience of the vaccine safety datalink
project. Pediatrics 2011;127(May (Suppl. 1)):S54–64.
[3] Kulldorff M, Nagarwalla N. Spatial disease clusters: detection and inference.
Statistics in Medicine 1995;14:799–810.
[4] SaTScan v9.1.1: software for the spatial and space-time scan statistics. Available
at: www.satscan.org. [Accessed April 22, 2011].
[5] Naus J. The distribution of the size of the maximum cluster of points on a line.
Journal of the American Statistical Association 1965;60:532–8.
[6] Naus J, Wallenstein S. Temporal surveillance using scan statistics. Statistics in
Medicine 2006;25(2):311–24.
[7] Abrams AM, Kleinman K, Kulldorff M. Gumbel based p-value approximations for
spatial scan statistics. International Journal of Health Geographics 2010;9:61.
[8] Xu S, Zhang L, Nelson JC, Zeng C, Mullooly J, McClure D, et al. Identifying optimal
risk windows for self-controlled case series studies of vaccine safety. Statistics
in Medicine 2011;30(7):742–52.
Using Public Health Data to Meet Community Needs Scoring Guide
Due Date: End of Unit 8.
Percentage of Course Grade: 20%.
Note: Your instructor may also use the Writing Feedback Tool to provide feedback on your writing. In the tool, click on the
linked resources for helpful writing information.
CRITERION (15%): Establish priorities for quality performance improvement.
NONPERFORMANCE: Does not develop priorities for quality performance improvement.
BASIC: Develops priorities for quality performance improvement, but does not establish a foundation or reasoning for selecting the priorities.
PROFICIENT: Establishes priorities for quality performance improvement.
DISTINGUISHED: Establishes priorities for quality performance improvement and identifies assumptions underlying the proposed priorities.

CRITERION (15%): Identify criteria for priority assessment using current best practices, such as using data available from past performance to assess gaps; involving stakeholders for input; et cetera.
NONPERFORMANCE: Does not identify criteria for priority assessment.
BASIC: Identifies criteria for priority assessment, but does not use current best practices, such as using data available from past performance to assess gaps; involving stakeholders for input; et cetera.
PROFICIENT: Identifies criteria for priority assessment using current best practices, such as using data available from past performance to assess gaps; involving stakeholders for input; et cetera.
DISTINGUISHED: Identifies criteria for priority assessment using current best practices, such as using data available from past performance to assess gaps; involving stakeholders for input; et cetera, and evaluates the appropriateness of the criteria for priority assessment.

CRITERION (15%): Analyze current best practices for leadership, team development, and conflict resolution related to quality improvement teams.
NONPERFORMANCE: Does not identify current best practices for leadership, team development, and conflict resolution related to quality improvement teams.
BASIC: Identifies current best practices for leadership, team development, and conflict resolution related to quality improvement teams, but does not analyze the current best practices.
PROFICIENT: Analyzes current best practices for leadership, team development, and conflict resolution related to quality improvement teams.
DISTINGUISHED: Analyzes current best practices for leadership, team development, and conflict resolution related to quality improvement teams, and identifies assumptions on which the analysis is based.

CRITERION (15%): Identify behaviors that support quality performance improvement.
NONPERFORMANCE: Does not identify behaviors that support quality performance improvement.
BASIC: Identifies behaviors, but not all behaviors identified support quality performance improvement.
PROFICIENT: Identifies behaviors that support quality performance improvement.
DISTINGUISHED: Identifies behaviors that support quality performance improvement, and suggests criteria that could be used to evaluate the appropriateness of the identified behaviors.

CRITERION (15%): Develop a plan to transform data into useful information that aids with quality improvement and process improvement.
NONPERFORMANCE: Does not develop a plan to transform data into useful information that aids with quality improvement and process improvement.
BASIC: Develops a plan, but does not explain how it will transform data into useful information that aids with quality improvement and process improvement.
PROFICIENT: Develops a plan to transform data into useful information that aids with quality improvement and process improvement.
DISTINGUISHED: Develops a plan to transform data into useful information that aids with quality improvement and process improvement, and uses professionally validated criteria to evaluate the appropriateness of the plan.
CRITERION (15%): Develop strategies that enable others to achieve goals and objectives in support of quality performance improvement.
NONPERFORMANCE: Does not develop strategies that enable others to achieve goals and objectives in support of quality performance improvement.
BASIC: Develops strategies, but fails to document how they will enable others to achieve goals and objectives in support of quality performance improvement.
PROFICIENT: Develops strategies that enable others to achieve goals and objectives in support of quality performance improvement.
DISTINGUISHED: Develops strategies that enable others to achieve goals and objectives in support of quality performance improvement, and proposes criteria for evaluating the strategies.

CRITERION (5%): Write content clearly and logically, with correct use of grammar, punctuation, and mechanics.
NONPERFORMANCE: Does not write content clearly or logically, or with correct use of grammar, punctuation, and mechanics.
BASIC: Writes content with errors in clarity, logic, grammar, punctuation, or mechanics.
PROFICIENT: Writes content clearly and logically, with correct use of grammar, punctuation, and mechanics.
DISTINGUISHED: Writes content clearly and logically, with correct use of grammar, punctuation, and mechanics, and uses relevant evidence to support a central idea.

CRITERION (5%): Correctly format paper, citations, and references using APA style.
NONPERFORMANCE: Does not format paper, citations, and references using APA style.
BASIC: Formats paper, citations, and references using APA style but with errors.
PROFICIENT: Correctly formats paper, citations, and references using APA style. Citations contain a few errors.
DISTINGUISHED: Correctly formats paper, citations, and references using APA style. Citations are free from all errors.
Knight et al. BMC Health Services Research 2013, 13:200
http://www.biomedcentral.com/1472-6963/13/200
RESEARCH ARTICLE
Open Access
Evaluating maternity care using national
administrative health datasets: How are statistics
affected by the quality of data on method of
delivery?
Hannah E Knight1,2*, Ipek Gurol-Urganci1,2, Tahir A Mahmood1, Allan Templeton1, David Richmond1,
Jan H van der Meulen1,2 and David A Cromwell1,2
Abstract
Background: Information on maternity services is increasingly derived from national administrative health data. We
evaluated how statistics on maternity care in England were affected by the completeness and consistency of data
on “method of delivery” in a national dataset.
Methods: Singleton deliveries occurring between April 2009 and March 2010 in English NHS trusts were extracted
from the Hospital Episode Statistics (HES) database. In HES, method of delivery can be entered twice: 1) as a
procedure code in core fields, and 2) in supplementary maternity fields. We examined overall consistency of these
data sources at a national level and among individual trusts. The impact of different analysis rules for handling
inconsistent data was then examined using three maternity statistics: emergency caesarean section (CS) rate;
third/fourth degree tear rate amongst instrumental deliveries, and elective CS rate for breech presentation.
Results: We identified 629,049 singleton deliveries. Method of delivery was not entered as a procedure or in the
supplementary fields in 0.8% and 12.5% of records, respectively. In 545,594 records containing both data items,
method of delivery was coded consistently in 96.3% (kappa = 0.93; p < 0.001). Eleven of 136 NHS trusts had
comparatively poor consistency ( 0.98, where
stated) [18-21].
The seven method of delivery categories used in this
study represent only one possible classification. The grouping was dictated by the OPCS procedure and maternity tail
codes. A weakness of this classification is the definition of
caesarean section as either elective or emergency. The
2004 NICE guideline recommended that the urgency of a
caesarean section be indicated using the Lucas/National
Confidential Enquiry into Patient Outcome and Death
(NCEPOD) classification and noted that replacing the
terms ‘emergency’ and ‘elective’ with its four grades of
urgency would aid communication between health professionals [22]. Currently, the HES database is unable to
capture this classification system.
Data quality is a concern for healthcare providers, managers and policy makers [23]. In England, the Care Quality
Commission now mandates an annual audit of data quality within NHS trusts, [24] and a recent systematic review
of coding accuracy in all types of routinely collected
hospital discharge data found that coding accuracy rates
have been improving [25]. Since 2002, the coding of primary diagnosis within HES has improved in accuracy from
73.8% to 96.0% when compared against case notes [24].
The results of this study add to this work by addressing
concerns about the quality of HES maternity data [26].
The high level of consistency in the recording of method
of delivery overall supports its use for the construction of
national maternity statistics. Coding disagreements were
most common for the categories of emergency and elective
caesarean section. Nonetheless, overall consistency was excellent between both emergency (kappa = 0.92; p < 0.001)
and elective (kappa = 0.90; p < 0.001) caesarean section
procedure and maternity tail codes. This supports a previous conclusion that coding errors were unlikely to account
for the large variation in the rates of emergency caesarean
section observed between NHS trusts [27].
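For readers unfamiliar with the kappa statistic quoted above, the following Python sketch computes Cohen's kappa for two codings of the same records. It is a generic illustration of the standard formula, not the study's analysis code. The function name `cohen_kappa` and the ten example records are hypothetical and were made up for illustration.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa: agreement between two codings, corrected for chance."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both codings must cover the same non-empty set of records")
    n = len(codes_a)
    # Observed proportion of records coded identically in both sources.
    observed = sum(1 for a, b in zip(codes_a, codes_b) if a == b) / n
    # Agreement expected by chance from the marginal frequencies of each category.
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sources use one identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical method-of-delivery codes for ten records from two data fields.
procedure_field = ["emergency CS"] * 4 + ["elective CS"] * 3 + ["spontaneous"] * 3
maternity_tail = ["emergency CS"] * 3 + ["elective CS"] * 4 + ["spontaneous"] * 3
print(round(cohen_kappa(procedure_field, maternity_tail), 3))
```

In this made-up example nine of ten records agree, but part of that agreement would be expected by chance given how often each category appears, which is why kappa is lower than the raw agreement of 90%.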
At an NHS trust level, levels of consistency were high
for the majority of organisations, which provides evidence
to support the use of HES-based quality indicators for the
purpose of comparing the performance of NHS trusts.
However, our results illustrate the importance of addressing data quality within NHS trusts with divergent coding
practices. The risk of organisations being mistakenly identified as “outliers” on performance indicators due to data
errors is well-known. Our results suggest this risk is also
increased by the sensitivity of maternity statistics to the
analysis rules used to handle inconsistent data.
The study’s results also suggest that any publishers of
maternity statistics should describe details of how data
quality was assessed and how incomplete and inconsistent data
were handled in the analysis. In England, the Health and
Social Care Information Centre (HSCIC) publishes maternity statistics at Strategic Health Authority, NHS trust and
individual unit level annually [3]. This public body is
England's central source of health and social care information and the value of its publications on maternity services
would be enhanced if they again provided information on
the level of agreement between data in the procedure
fields and in the maternity tail.
Providing methodological information may be more
problematic for commercial companies that supply
hospitals with comparative measures of organisational performance given the need to balance transparency with the
protection of intellectual property. Nonetheless, companies
that provide maternity benchmarking services could be
required to meet minimum standards of transparency as
part of the conditions of access to administrative health
data. Whilst national and local trends over time can be
reported as long as the definitions used by these organisations remain the same, the definitions used are still important for interpretation.
Implications
Approaches to validate the use of administrative health
data for maternity statistics commonly fall into two categories. They either check the consistency of the administrative health data against medical records [17-20,28] or
against another source of maternity data such as national
birth registers [29-31]. Such external validation studies
can be time consuming, costly and technically challenging,
as well as raising ethical and information governance issues related to access and data linkage. We used a particular feature of HES to examine its internal consistency and
this is an example of how relationships within administrative health data can be used to identify organisations with
divergent coding practices [32]. Whilst external validation
should remain the “gold standard”, this approach to data
quality assessment is simple to perform and has the potential to be developed more widely as a complementary
technique.
Abbreviations
HES: Hospital Episode Statistics; AHRQ: Agency for Healthcare Research and
Quality; IC: Information Centre; NHS: National Health Service; OPCS: Office of
Population Census and Surveys; ICD-10: International Classification of
Diseases, 10th Edition.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
DC, JvdM, TM, DR and AT conceived the idea. HK, DC, IG-U and JvdM
designed the methodology. HK, IG-U and DC conducted the statistical
analysis. HK wrote the manuscript. DC, JvdM, IG-U, TM, DR and AT
commented on subsequent drafts and approved the final version. All
authors read and approved the final manuscript.
Acknowledgements
We thank the Department of Health for providing the patient-level Hospital
Episode Statistics data used in this study. Permission to use this data was
granted by the NHS Information Centre. National maternity statistics and
provider-level data are available via the HESonline website: http://www.hscic.gov.uk/hes.
Received: 12 December 2012 Accepted: 27 May 2013
Published: 30 May 2013
References
1. Roberts CL, Cameron CA, Bell JC, Algert CS, Morris JM: Measuring maternal
morbidity in routinely collected health data: development and validation
of a maternal morbidity outcome indicator. Med Care 2008, 46:786–794.
2. Agency for Healthcare Research and Quality. 2012. http://www.qualityindicators.ahrq.gov/.
3. Hospital Episode Statistics. 2013. http://www.hscic.gov.uk/hes.
4. Raleigh VS, Cooper J, Bremner SA, Scobie S: Patient safety indicators for
England from hospital administrative data: case–control analysis and
comparison with US data. BMJ 2008, 17(337):a1702.
5. Lain SJ, Hadfield RM, Raynes-Greenow CH, Ford JB, Mealing NM, Algert CS,
Roberts CL: Quality of data in perinatal population health databases: a
systematic review. Med Care 2012, 50(4):e7–e20.
6. Dr Foster Intelligence. 2012. http://drfosterintelligence.co.uk/.
7. CHKS. 2012. http://insight.chks.co.uk/index.php?id = 829.
8. BirthChoiceUK. 2012. http://www.birthchoiceuk.com.
9. RCOG: Patterns of Maternity Care in English NHS Hospitals. London: RCOG;
2013. http://www.rcog.org.uk/files/rcog-corp/Patterns%20of%20Maternity%20Care%20in%20English%20NHS%20Hospitals%202011-12_0.pdf.
10. Health and Social Care Information Centre. NHS Maternity Statistics 2005–6;
2013. http://www.hscic.gov.uk/pubs/maternity0506.
11. Health and Social Care Information Centre. NHS Maternity Statistics 2010–11
Explanatory notes; 2013:pp22–pp26. Available at: http://www.hscic.gov.uk/pubs/maternity1011.
12. Petrie A, Sabin C: Medical Statistics at a Glance, Volume 39. 3rd edition.
Oxford: Wiley-Blackwell Publishing; 2009:118.
13. Spiegelhalter DJ: Funnel plots for institutional comparison. Qual Saf Health
Care 2002, 11:390–391.
14. Spiegelhalter DJ: Funnel plots for comparing institutional performance.
Stat Med 2005, 24:1185–1202.
15. Bland JM, Altman DG: Statistical methods for assessing agreement between
two methods of clinical measurement. Lancet 1986, 327:307–310.
16. Office for National Statistics: Births in England and Wales by parents' country
of birth, 2010: Table 7. 2012. http://www.ons.gov.uk/ons/rel/vsob1/parents–country-of-birth–england-and-wales/2010/births-in-england-and-wales-byparents–country-of-birth–2010.html.
17. Kuklina EV, Whiteman MK, Hillis SD, Jamieson DJ, Meikle SF, Posner SF,
Marchbanks PA: An enhanced method for identifying obstetric deliveries:
implications for estimating maternal morbidity. Matern Child Health J
2008, 12(4):469–477.
18. Yasmeen S, Romano S, Schembri ME, Keyzer JM, Gilbert WM: Accuracy of
obstetric diagnoses and procedures in hospital discharge data. Am J
Obstet Gynecol 2006, 195:992–1001.
19. Roberts CL, Bell JC, Ford JB, Morris JM: Monitoring the quality of maternity
care: how well are labour and delivery events reported in population
health data? Paediatr Perinat Epidemiol 2009, 23:144–152.
20. Lydon-Rochelle MT, Holt VL, Cárdenas V, Nelson JC, Easterling TR, Gardella C,
Callaghan WM: The reporting of pre-existing maternal medical conditions
and complications of pregnancy on birth certificates and in hospital
discharge data. Am J Obstet Gynecol 2005, 193(1):125–134.
21. Korst LM, Gregory KD, Gornbein JA: Elective primary caesarean delivery:
accuracy of administrative data. Paediatr Perinat Epidemiol 2004, 18:112–119.
22. National Collaborating Centre for Women’s and Children’s Health: Caesarean
section. London: National Institute for Clinical Excellence; 2004.
23. Audit Commission: Data remember: improving the quality of patient-based
information in the NHS 2002. London: Audit Commission; 2012. http://archive.audit-commission.gov.uk/auditcommission/sitecollectiondocuments/AuditCommissionReports/NationalStudies/dataremember.pdf.
24. Care Quality Commission: 2012. http://www.cqc.org.uk/organisations-weregulate/registered-services/quality-and-risk-profiles-qrps.
25. Burns EM, Rigby E, Mamidanna R, Bottle A, Aylin P, Ziprin P, Faiz D:
Systematic review of discharge coding accuracy. J Public Health 2012,
34:138–148.
26. Brennan L, Watson M, Klaber R, Charles T: The importance of knowing
context of hospital episode statistics when reconfiguring the NHS.
BMJ 2012, 344:e2432.
27. Bragg F, Cromwell DA, Edozien LC, Durol-Urganci I, Mahmood TA,
Templeton A, van der Meulen JH: Variation in rates of caesarean section
among English NHS trusts after accounting for maternal and clinical risk:
cross sectional study. BMJ 2010, 341:c5065.
28. Lain S, Roberts C, Hadfield R, et al: How accurate is the reporting of
obstetric haemorrhage in hospital discharge data? A validation study.
Austral N Z J Obstet Gynecol 2008, 48:481–485.
29. Joseph KS, Fahey J: Validation of perinatal data in the Discharge Abstract
Database of the Canadian Institute for Health Information. Chronic Dis
Canada 2009, 29:96–100.
30. Dattani N, Datta-Nemdharry P, Macfarlane A: Linking maternity data for
England, 2005–06: methods and data quality. Health Stat Q 2011, 49:53–79.
31. Dattani N, Datta-Nemdharry P, Macfarlane A: Linking maternity data for
England 2007: methods and data quality. Health Stat Q 2012(53):4–21.
32. Johal A, Mitchell D, Lees T, Cromwell D, Van Der Meulen J: Use of Hospital
Episode Statistics to investigate abdominal aortic aneurysm surgery. Br J
Surg 2012, 99(1):66–72.
doi:10.1186/1472-6963-13-200
Cite this article as: Knight et al.: Evaluating maternity care using
national administrative health datasets: How are statistics affected by
the quality of data on method of delivery?. BMC Health Services Research
2013 13:200.
BioMed Central publishes under the Creative Commons Attribution License (CCAL). Under
the CCAL, authors retain copyright to the article but users are allowed to download, reprint,
distribute and /or copy articles in BioMed Central journals, as long as the original work is
properly cited.
Discussion Participation Scoring Guide
Due Date: Weekly.
Percentage of Course Grade: 30%.
Discussion Participation Grading Rubric
Criterion: Applies relevant course concepts, theories, or materials correctly.
Non-performance: Does not explain relevant course concepts, theories, or materials.
Basic: Explains relevant course concepts, theories, or materials.
Proficient: Applies relevant course concepts, theories, or materials correctly.
Distinguished: Analyzes course concepts, theories, or materials correctly, using examples or supporting evidence.

Criterion: Collaborates with fellow learners, relating the discussion to relevant course concepts.
Non-performance: Does not collaborate with fellow learners.
Basic: Collaborates with fellow learners without relating discussion to the relevant course concepts.
Proficient: Collaborates with fellow learners, relating the discussion to relevant course concepts.
Distinguished: Collaborates with fellow learners, relating the discussion to relevant course concepts and extending the dialogue.

Criterion: Applies relevant professional, personal, or other real-world experiences.
Non-performance: Does not contribute professional, personal, or other real-world experiences.
Basic: Contributes professional, personal, or other real-world experiences, but lacks relevance.
Proficient: Applies relevant professional, personal, or other real-world experiences.
Distinguished: Applies relevant professional, personal, or other real-world experiences to extend the dialogue.

Criterion: Supports position with applicable knowledge.
Non-performance: Does not establish relevant position.
Basic: Establishes relevant position.
Proficient: Supports position with applicable knowledge.
Distinguished: Validates position with applicable knowledge.
Participation Guidelines
Actively participate in discussions. To do this you should create a substantive post for each of the discussion
topics. Each post should demonstrate your achievement of the participation criteria. In addition, you should also
respond to the posts of at least two of your fellow learners for each discussion question, unless the discussion
instructions state otherwise. These responses to other learners should also be substantive posts that contribute to the
conversation by asking questions, respectfully debating positions, and presenting supporting information relevant
to the topic. Also, respond to any follow-up questions the instructor directs to you in the discussion area.
To allow other learners time to respond, you are encouraged to post your initial responses in the discussion area by
midweek. Comments on other learners' posts are due by Sunday at 11:59 p.m. (Central time zone).
Hey Tutor,
You have a discussion to submit on Wednesday
and an assignment to submit on Friday. The
discussion should be a single-spaced Word
document on one page, and the assignment should
be 8 pages, double-spaced.
Please be careful how you title the 2 jobs.
Discussion Topic: Clinical Informatics in Health
Care
Clinical informatics in health care settings has helped health professionals better
understand the most efficient and effective approaches to delivering care.
Research and discuss with your peers some of the best practices in improving
health care today, including data-driven decision making and team development.
Assignment Topic: Using Public Health Data to
Meet Community Needs
Introduction
For this assignment, assume the role of a health care leader at a managed care
company. Your organization contracts with a wide array of providers. The team
approach is essential for determining future strategic planning and ensuring
quality for your providers. Recently, your organization has increased emphasis
on preventative medicine and patient education as a way of reducing
unnecessary claims and improving overall health of your managed care
participants. The organization is also working to improve on its quality assurance
guidance to its providers.
You have been assigned to head a team that will gather important public health
data and best practices for preventable diseases and thus reduce disease risk.
Your team has to first determine the approach to be taken in gathering,
analyzing, and articulating this information. To be successful in this program, the
team must be able to apply statistical methods that can help transform the data
into workable information. Ultimately, the managed care organization hopes to
equip its providers with clinical practice guidelines based on the most accurate
and current research.
The project requires that you lead a team of professionals and make
recommendations including budget recommendations to the long-term approach.
As a leader, you must understand the behaviors that foster quality improvement
and be able to apply best practices to team leadership.
Instructions
To be successful in this assignment, you need to complete the following:
• Research criteria for priority assessment using current best practices, such as
using data available from past performance to assess gaps; involving
stakeholders for input; et cetera.
• Present current best practices for statistical analysis of health care data.
• Describe how statistical tools, models, and approaches are used to transform
data into workable information.
• Develop a plan to transform data into useful information that aids with
quality improvement and process improvement.
For this assignment, develop an APA-formatted executive summary. In your
summary:
• Discuss current best practices for leadership, team development, and conflict
resolution related to quality improvement teams.
• Provide recommendations on the following:
• The criteria needed for priority assessment.
• The use of data available from past performance to assess gaps.
• How to best gather stakeholder input.
• Any other needed strategic approaches you deem as appropriate.
• Describe how your team would apply statistical approaches to ensure
providers are given the most recent, accurate, and effective information about
prevention and quality improvement.
• Make recommendations for the managed care company on how to budget and
strategize for this project.
• Develop and explain 3–5 strategic approaches that enable others to achieve
goals and objectives in support of quality performance improvement.
Pay attention to the critical elements that form part of the grading criteria for this
assignment:
• Establish priorities for quality performance improvement.
• Identify criteria for priority assessment using current best practices, such as
using data available from past performance to assess gaps; involving
stakeholders for input; et cetera.
• Analyze current best practices for leadership, team development, and conflict
resolution related to quality improvement teams.
• Identify behaviors that support quality performance improvement.
• Develop a plan to transform data into useful information that aids with
quality improvement and process improvement.
• Develop strategies that enable others to achieve goals and objectives in
support of quality performance improvement.
Writing Requirements
Your paper should meet the following requirements:
• Length: 7–10 double-spaced pages (excluding the cover page and references
list). Include page numbers, headings, and running headers.
• References: A minimum of five current peer-reviewed references.
• Formatting: Use current APA style and formatting, paying particular attention
to citations and references.
• Font and font size: Times New Roman, 12 point.
Review the Using Public Health Data to Meet Community Needs Scoring Guide
to ensure you understand the grading criteria for this assignment.
In addition to the resources I sent you, please read about these topics in other
sources to help you do the Discussion and Assignment. Look for resources on
“Telemedicine and Clinical Informatics”. This resource is from our book, but I
cannot copy it, so you can search for it elsewhere.
European Journal of Public Health
Conflicts of interest: None declared.
Key points
The choice of a graphical format may influence the understanding of data in the hospital–medical management.
The misinterpretation of data display format could have an
impact on health decision making.
Medical and statistical educators need to consider how to
make future health professionals able to comprehend statistical methodology that exceeds what is currently presented
in introductory courses.
European Journal of Public Health, Vol. 23, No. 1, 86–92
© The Author 2012. Published by Oxford University Press on behalf of the European Public Health Association. All rights reserved.
doi:10.1093/eurpub/cks046 Advance Access published on 10 May 2012
Studies using English administrative data (Hospital
Episode Statistics) to assess health-care outcomes—
systematic review and recommendations for reporting
Sidhartha Sinha1,2, George Peach2, Jan D. Poloniecki1, Matt M. Thompson2, Peter J. Holt1,2
1 Department of Outcomes Research, St George’s University of London, London
2 St George’s Vascular Institute, St George’s Hospital, London
Correspondence: Sidhartha Sinha, St George’s Vascular Institute, St George’s Hospital, Blackshaw Road, Tooting,
SW17 0QT, London, Tel: +447813857972, Fax: +442087253495, e-mail: sid261@hotmail.com; sidhartha.sinha@nhs.net
Background: Studies using English administrative data from the Hospital Episode Statistics (HES) are increasingly
used for the assessment of health-care quality. This study aims to catalogue the published body of studies using
HES data to assess health-care outcomes, to assess their methodological qualities and to determine if reporting
recommendations can be formulated. Methods: Systematic searches of the EMBASE, Medline and Cochrane
databases were performed using defined search terms. Included studies were those that described the use of
HES data extracts to assess health-care outcomes. Results: A total of 148 studies were included. The majority of
published studies were on surgical specialties (60.8%), and the most common analytic theme was of inequalities
and variations in treatment or outcome (27%). The volume of published studies has increased with time (r = 0.82,
P < 0.0001), as has the length of study period (r = 0.76, P < 0.001) and the number of outcomes assessed per study
(r = 0.72, P = 0.0023). Age (80%) and gender (57.4%) were the most commonly used factors in risk adjustment, and
regression modelling was used most commonly (65.2%) to adjust for confounders. Generic methodologic data
were better reported than those specific to HES data extraction. For the majority of parameters, there were no
improvements with time. Conclusions: Studies published using HES data to report health-care outcomes have
increased in volume, scope and complexity with time. However, persisting deficiencies related to both generic
and context-specific reporting have been identified. Recommendations have been made to improve these aspects
as it is likely that the role of these studies in assessing health care, benchmarking practice and planning service
delivery will continue to increase.
Systematic review of HES studies
Introduction
In the context of health services research, administrative data refers
to data collected by health-care providers and insurers with the
intention of facilitating reimbursement of health-care costs.1 Such
data sets tend to contain routine demographic data along with
clinical information based on statistical coding classifications.
Although not originally intended for health-care quality
assessment, the advantages of such data sets are well documented
in the literature, notably that they usually encompass large populations, are easily available, relatively inexpensive to acquire and
amenable to computerized data extraction.1,2 However, such
advantages are countered by concerns relating to the accuracy and
completeness of clinical coding within health-care providers from
year to year, variation in data quality between providers and the bias
inherent to retrospective data sources.1,3
Hospital Episode Statistics (HES) data are the administrative data
set for English hospitals in the National Health Service (NHS) and
have been collected since 1989 (Supplementary Box S1).4
Epidemiological studies using HES data have driven significant
service changes in health-care delivery in England and are thus
considered an important source of evidence.5 Nonetheless,
on-going anecdotal concerns about the accuracy of HES data have
persisted over the last decade.6,7 This is despite a growing body of
evidence that suggests that the data are fit for purpose in certain
aspects of health-care quality assessment.8–11
However, with respect to quality of published studies using HES
data, it is noteworthy that previous systematic reviews have focused
specifically on the issue of coding accuracy.12,13 There has not been
any attempt to systematically catalogue the published body of
evidence or to assess methodological aspects of what are essentially
large retrospective observational studies.
The primary objective of this study was to catalogue the current
body of studies using HES data to assess health-care outcomes
published in the English medical literature and to assess specific
aspects of methodological quality with the aim of producing
guidelines for future researchers wishing to use HES data.
Secondary objectives were to assess whether there was any
difference in quality between studies focussing on medical or
surgical specialties and to determine whether the quality of studies
had improved over time.
Methods
The literature review conformed to Preferred Reporting Items for
Systematic Reviews and Meta-analyses statement standards.14
Literature search
An electronic search of the medical literature from 1989 to July
2011 was performed using the following databases: EMBASE
Classic, EMBASE, Ovid-MEDLINE, Ovid-MEDLINE in-process
and other non-indexed citations, the Cochrane database of
systematic reviews and the Cochrane database of controlled trials.
The HES website was searched for additional publications, and
included references were hand searched for further relevant
articles. The search strategy is listed in Supplementary Appendix
S1. Studies were included if they described the use of HES data
extracts for the purposes of describing or assessing health outcomes
and trends. Studies were excluded if they were non-English
language articles, if they described the use of English administrative
data before 1989 (i.e. before the introduction of HES), if they
described the use of local Patient Administration Data (PAS—
data abstracted directly from patient medical records by each
hospital’s coding department) data rather than centrally collated
HES data, if they duplicated the use of HES extracts from
previous published works or if they described administrative data
sets other than HES (e.g. the Scottish Morbidity Record for
Scotland and the Patient Episode Database for Wales). The last
exclusion criterion acknowledged the fact that although Scottish
and Welsh administrative data do share certain broad similarities
with HES, they cannot be considered identical. In particular, it has
been noted that certain policy changes relevant to health-care administrative data may have been implemented metachronously for
each of the countries.15 Review articles, letters and commentaries
were also excluded.
Data extraction
Two authors (S.S. and G.P.) extracted data from the included studies
independently. Disagreements about aspects of study quality were
discussed, and a consensus decision agreed in each case after
discussion with the senior author (P.H.). Aspects of generic methodological quality were assessed using domains derived from tools
used for assessing the quality of observational studies and based on
STROBE (‘Strengthening the Reporting of Observational Studies in
Epidemiology’) guidelines and other published works.16–20 Quality
markers specific to HES data sets were derived by discussion among
the authors and from reference to published studies.3,21,22 In keeping
with current opinion that giving an arbitrary numerical score to
aspects of study quality is problematic, quality markers were
judged to be adequate, inadequate, unclear or not applicable.23,24
The aspects of demographic, generic, statistical and HES-specific
methodological data extracted are listed in Supplementary Box S2.
Statistical analysis
Statistical analysis was performed using StatsDirect software (version
2.7.8). The Mann–Whitney two-sided U-test was used to compare
means, Fisher’s exact test to compare proportions and Pearson’s
correlation to assess association (P < 0.05 was considered
significant).
Results
A total of 1911 articles were retrieved through the electronic
searches. Of these, 66 articles were duplicates. The titles and
abstracts of the remaining 1845 articles were examined and 164
retained for further analysis. At this stage, a further 24 articles (7
from the HES website and 17 from reference lists) were added to the
pool. A total of 40 full-text articles were subsequently excluded
leaving 148 articles for inclusion in the review (Supplementary
Figure S1). The list of included studies is given in Supplementary
Appendix S2.
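The article counts in the paragraph above can be checked with simple arithmetic. The following is a minimal sketch in Python; the function name `selection_flow` and its field names are hypothetical labels chosen for illustration, and only the numbers come from the text.

```python
def selection_flow(retrieved, duplicates, retained, added, excluded_full_text):
    """Recompute the article-selection counts reported in the Results.

    All parameter and key names are illustrative; only the values
    passed in below are taken from the paper.
    """
    screened = retrieved - duplicates        # titles and abstracts examined
    pool = retained + added                  # retained articles plus hand-searched additions
    included = pool - excluded_full_text     # articles left after full-text exclusion
    return {"screened": screened, "pool": pool, "included": included}


# Values from the text: 1911 retrieved, 66 duplicates, 164 retained,
# 24 added (7 from the HES website, 17 from reference lists), 40 excluded.
flow = selection_flow(1911, 66, 164, 7 + 17, 40)
print(flow)
```

The result agrees with the 1845 screened articles and 148 included studies stated above.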
Forty-two of 148 (28.4%) studies were on medical conditions, 90/
148 (60.8%) related to surgical specialties and 16/148 (10.8%)
covered both. Cardiology and vascular surgery were the most
prolific medical and surgical specialties (Supplementary Figure
S2). In investigative themes, there was considerable overlap with
many studies having more than one focus; however, the most
common primary theme was analysis of inequalities and variations
in treatment or outcome (40/148 studies, 27%) (table 1).
Table 1 Classification of main themes pursued by HES publications
Primary theme of article                          Number (%)
Inequalities/variations in treatment or outcome   40/148 (27)
Trends in hospital treatment and outcome          33/148 (22.3)
Volume–outcome relationship                       20/148 (13.5)
Epidemiology of specific disease                  17/148 (11.5)
Risk prediction                                   13/148 (8.8)
Comparison with national databases/audits         12/148 (8.1)
Service planning                                  4/148 (2.7)
HES data coding quality                           4/148 (2.7)
Waiting time analysis                             3/148 (2)
Outlier detection                                 2/148 (1.4)
Table 2 Aspects of reporting quality extracted from selected studies
Aspect of data                                                                    Adequate/yes (%)   Inadequate/no (%)  Unclear/not stated (%)  Not applicable/states not required
Title/abstract/introduction
Clearly and correctly identifies HES (rather than PAS) as source of data?         135/148 (91.2)     2/148 (1.4)        11/148 (7.4)            –
Methods (generic)
A priori justification for choosing study period or power calculation performed?  35/148 (23.6)      113/148 (76.4)     –                       –
Clear selection of participants?                                                  115/148 (77.8)     33/148 (22.2)      –                       –
Clear definition of exposure?                                                     122/148 (82.4)     26/148 (17.6)      –                       –
Clear definition of outcomes?                                                     132/148 (89.2)     16/148 (10.8)      –                       –
Conflict of interest or source of funding declared?                               105/148 (70.9)     43/148 (29.1)      –                       –
Methods (HES specific)
Any mention of missing or invalid data?                                           47/148 (31.6)      101/148 (68.2)     –                       –
Any mention of duplicate records?                                                 21/148 (14.2)      127/148 (85.8)     –                       –
Any linkage of episodes to form continuous spells?                                40/148 (27)        91/148 (61.5)      17/148 (11.5)           –
Any attempt at case validation?                                                   29/148 (19.6)      115/148 (77.7)     4/148 (2.7)             –
Any assessment of year-to-year data quality?                                      14/107 (13.1)      93/107 (86.9)      –                       41/148 (27.7)
Any linkage to ONS-mortality data?                                                23/61 (37.7)a      36/61 (59)         2/61 (3.3)              87/148 (58.9)
Clear distinction between episodes, spells, admissions, procedures and patients?  95/148 (64.2)      53/148 (35.8)      –                       –
a: Fifteen of 47 (31.9%) for studies published from 2006 onwards.
Table 3 Aspects of statistical adjustment for confounding variables
in HES studies
Statistical aspects                                                   Number (%)
Adjustment for confounding variables
  Yes                                                                 115/148 (77.7)
  No                                                                  28/148 (18.9)
  Unclear                                                             1/148 (0.7)
  Not applicable                                                      2/148 (1.4)
Source of data for confounding variables
  HES only                                                            89/115 (77.4)
  HES + other (routine and/or clinical)                               23/115 (20)
  Clinical only                                                       2/115 (1.7)
  Unclear                                                             1/115 (0.9)
Methodology to adjust for confounding variables
  Regression                                                          75/115 (65.2)
  Stratification                                                      36/115 (31.3)
  Other/unclear                                                       4/115 (3.5)
Aspects of regression modelling
  Use of interaction terms to identify non-constant risk relations    17/75 (22.7)
  Automated selection of covariates for inclusion in model            18/75 (24)
  Multilevel hierarchical modelling used                              14/75 (18.7)
There was a positive trend between volume of published studies
and completed calendar year (Pearson’s r = 0.82, P < 0.0001)
(Supplementary Figure S3). The study period for which HES data
were extracted was not clear in 5/148 studies (3.4%). Among the
remaining 143 studies, the mean study period was 4.67 years
[standard deviation (SD): 3.38 years, median: 5 years, Interquartile
range (IQR): 1–7]. The number of records extracted from the HES
was not stated or unclear in 29/148 (19.6%) studies. Of the remaining 119 studies, the mean number of HES records extracted per
study was 857 437.9 (SD: 3 478 456.6, median: 40 480, IQR:
7516–227 206). There was a significant positive correlation
between mean length of study period and year (Pearson’s r = 0.76,
P < 0.001) but a negative correlation between the mean number of
HES records extracted per study and year (Pearson’s r = −0.36,
P = 0.193).
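The correlations reported above are Pearson coefficients between a per-year summary and the calendar year. The per-year values are not given in the text, so the sketch below uses made-up toy numbers purely to show how such a coefficient is computed. The function name `pearson_r` and the variables `years` and `counts` are hypothetical, and this is not the StatsDirect routine the authors used.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical toy data, NOT taken from the paper.
years = [1, 2, 3, 4, 5]
counts = [2, 4, 5, 4, 5]
print(round(pearson_r(years, counts), 3))
```

A positive value indicates that the summary tends to rise with calendar year and a negative value that it tends to fall.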
In keeping with the varied themes of investigation, there was great
variability in the types and numbers of outcomes and indicators
assessed by studies. Several studies derived multiple outcomes
from HES data with the mean number per study 1.7 (SD: 1.19,
median: 1, IQR: 1–2, maximum 8). There was a positive correlation
between mean number of outcomes reported by studies and year
(Pearson’s r = 0.72, P = 0.0023). Sixty-one of 148 (41.2%) studies
reported measures of mortality, and 6 different definitions of this
end point were derived [in-hospital (38.4%), 30-day in-hospital
(23.3%), 30-day all-cause (9.6%), 90-day all-cause (2.7%), 1-year
all-cause (15.1%) and 18-month all-cause mortality (1.4%)]. In
9.6% of cases, the period for mortality was not clearly defined.
Only 23/61 (37.7%) studies [15/47 (31.9%) for studies published
from 2006 onwards] deriving mortality data from the HES
attempted to link data to the Office for National Statistics (ONS)
mortality database to improve consistency of case fatality rates
(table 2). Four definitions of emergency readmission were encountered [7 days (5.5%), 28 days (72.2%), 30 days (5.5%) and 1 year
(16.8%)], and this outcome was most frequently treated as binary
(77.7%) rather than count data. Survival analysis for determination
of emergency readmissions or re-interventions was used in only 6/18
[33.3%] of studies. Alternative metrics such as discharge destination
were used in only 1/148 (0.7%) studies, whereas 12/148 (8.1%)
studies attempted to derive measures of care quality by extracting
data related to complications of care (operative re-interventions,
general medical complications or patient safety indicators).
Generic aspects of methodological quality were better reported
than HES-specific aspects (table 2). In 13/148 (8.8%) of studies,
the distinction between HES and PAS data was either unclear or
incorrect, and it is noteworthy that 3/40 studies were excluded at
the article selection stage for similarly confusing HES and PAS data.
Of the 35/148 (23.6%) studies justifying the study period, 34/35
(97%) provided a valid clinical context, whereas 1/35 (3%)
performed a post hoc power calculation. Issues relating to missing
or invalid data were described in 47/148 (31.2%) studies, but only
30/47 (63.8%) described how these data were actually handled.
Fifteen of 30 (50%) studies described imputation, use of ‘grossing’
factors,21 censoring or sensitivity analyses to deal with the issue of
missing data, whereas 15/30 (50%) simply excluded such data. An
attempt at case validation was made in 29/148 (19.6%) studies with
the source of data for validation being locally held clinical data (e.g.
medical notes, operating theatre records or PAS data) (14/29,
48.3%), national clinical databases (14/29, 48.3%) or unclear (1/
29, 3.4%).
The majority of the studies (115/148, 77.7%) addressed the issue
of confounding variables, and the majority of these (77.4%) used the
HES database as the sole source for identification of covariates
(table 3). Other data sets commonly used in conjunction with
HES data were either routinely available (census data) or derived
from clinical databases. The HES-derived covariates most commonly
used for control of confounding (or risk adjustment) were age
(92/115 studies, 80%) and gender (66/115 studies, 57.4%)
(Supplementary Figure S4). Twenty-nine of 115 (25.2%) studies
used a HES-derived measure of patient co-morbidity, and the
score described by Charlson (or modifications thereof) was most
commonly used (19/29, 65.5%). There was a significant positive
correlation between the mean number of covariates used in
models and year of publication (Pearson’s r = 0.785, P = 0.0005).
Regression modelling was used more commonly than stratification,
but it is noteworthy that a significant proportion of such studies
used automated statistical algorithms to select covariates for
inclusion in regression models (24%), whereas the methodology
used to select covariates was unclear in further 4/75 (5.3%)
studies. The majority (77.3%) of the studies did not look for
evidence of non-constant risk relationships between covariates,
and, in particular, only 9/41 (21.9%) studies that used mode of
admission or HES-derived co-morbidity scores as covariates
tested for interactions. Multilevel hierarchical modelling to take
account of ward- or hospital-level variables was used in only 14/75
(18.7%) of studies, all of which were published after 2002.
The studies were dichotomised into two cohorts relating to publication year 1994–2000 (‘early’ cohort, n = 21) and 2001–11 (‘recent’
cohort, n = 127) for comparison of methodological qualities, and the
only significant difference found was in the proportions of studies
clearly and correctly identifying HES as the source of data (early
cohort 66.6%, recent cohort 95.3%, P = 0.0004). Regression
modelling was more commonly used by studies in the latter group
(early cohort 40%, recent cohort 54%, P = 0.035). No differences
were found between cohorts when studies were divided by
specialty into medical (n = 42) and surgical (n = 90) except that
papers on surgical specialties tended to use a greater number of
covariates in statistical models (surgical cohort mean number of
covariates 3.6, medical cohort mean number of covariates 2.29,
P = 0.003).
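The comparison of proportions between the early and recent cohorts can be illustrated with Fisher's exact test, the method named in the statistical analysis section. The sketch below is a plain-Python version under two stated assumptions. First, the counts 14/21 and 121/127 are back-calculated from the reported 66.6% and 95.3% and are not given explicitly in the text. Second, the two-sided P value is taken as the sum of the probabilities of all tables no more likely than the observed one, which may differ from the convention used by StatsDirect. The function name `fisher_exact_two_sided` is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    total = comb(n, row1)

    def prob(k):
        # Probability that the top-left cell equals k given fixed margins.
        return comb(col1, k) * comb(n - col1, row1 - k) / total

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    p_value = 0.0
    for k in range(lowest, highest + 1):
        p_k = prob(k)
        if p_k <= p_observed * (1 + 1e-9):   # small tolerance for float ties
            p_value += p_k
    return min(1.0, p_value)


# Assumed counts: early cohort 14 of 21 adequate, recent cohort 121 of 127.
p = fisher_exact_two_sided(14, 7, 121, 6)
print(f"two-sided P = {p:.4f}")
```

Under these assumptions the result is of the same order as the P = 0.0004 reported in the text for correct identification of HES as the data source.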
Discussion
Our results suggest that published studies using HES data extracts
have increased in volume with time and are becoming more
ambitious in terms of scope (e.g. years of data and numbers of
outcomes extracted) and more sophisticated in their analytical
methodology (e.g. more selective use of HES data extracts and use
of multilevel hierarchical modelling). However, a number of
deficiencies in reporting standards have been identified which
hamper interpretation and inter-study comparability.
Despite concerns with the nature of routine data available from
administrative databases, it is recognized that there are several
scenarios in the assessment of health-care outcomes for which prospective and randomised studies would be impractical or inadequate.25 Consequently, and as demonstrated by this systematic
review, the volume of studies published using routine data has
increased significantly with time. With specific reference to
English administrative data, concerns have primarily related to
issues with data coding accuracy and completeness.4,6,26 However,
it is noteworthy that sequential systematic reviews of English administrative data coding accuracy covering the period from 1989 to
2010 have found sufficiently high levels of accuracy for procedural
and diagnostic coding with evidence of temporal improvement to
support their use in research.12,13 Furthermore, HES data allow
population outcomes to be defined outside of specialist units
participating in randomised trials or voluntarily submitting to
clinical databases, thus facilitating analysis of ‘real-world’ results.
Nonetheless, criticisms of HES data persist and it is recognized
that inter-provider variation in coding quality remains a significant
confounder in attempts to draw conclusions about variations in
health-care outcomes.7
Given these concerns, it is crucial that published studies, which
draw conclusions from extracts of HES data, report their methodology in a transparent and reproducible manner. Although
consensus statements have been developed to encourage improvements in reporting quality for randomised trials, systematic
reviews, meta-analyses and observational studies, it is recognized
that ‘quality’ can be a nebulous concept.16 Traditionally, ‘quality’ is
equated with ‘non-susceptibility to bias’, and this poses particular
problems for retrospective studies using administrative data.
Nonetheless, several generic and HES-specific aspects of methodology have been suggested in the literature, and these can be used
as a guide for improving the quality of published studies.2,3,16,18,22
HES data should be considered distinct from data extracted directly
from individual hospitals’ data systems given the extensive cleaning
process that the former go through. Thus, researchers should be
explicit about identifying the source of data as from the HES
database in the abstract of their work to enable accurate
assessment and categorisation. A further qualifying statement in
the methods section that data were not taken directly from
individual hospital databases would add further clarification to
this issue given that five studies (three excluded at the search
stage) incorrectly classed PAS data as HES data. The premise that
retrospective studies using administrative data generally involve such
large numbers that formal power calculations are unnecessary seems
to be supported by the observation that only one study carried out
such a calculation. However, only 34 studies reported a clear reason
(i.e. clinical context) for choosing the study period, and it can be
argued that this is an indicator of a robust and genuine research
question formulated in advance of data extraction.
The identification of the patient cohort and exposure should
clearly state inclusion/exclusion criteria (e.g. age limits) and the
actual International Classification of Diseases (ICD) and Office of
Population Censuses and Surveys Classification of Interventions and
Procedures (OPCS) codes used. It is acknowledged that print space
may necessitate inclusion of the latter as Supplementary Data. Any
logic used in further cleaning the data to improve case validity (e.g.
exclusion of cases with certain diagnoses) should also be stated along
with the stage of data extraction at which this was used to aid interpretation and allow reproducibility. Given the variety of topics
investigated, it is unsurprising that there was significant variability in the reporting of outcomes. However, although outcomes
reporting must remain context sensitive, it should be appreciated
that ascertainment of some measures (such as mortality) can be
improved by linkage of HES data to other databases (such as
ONS-mortality data).22 Indeed, it is surprising that in-hospital
death was the most commonly used mortality outcome measure
given that it can be confounded by institutional, political and
financial factors favouring more rapid discharge.27 With specific
reference to the HES–ONS mortality linkage, this feature was first
provided by the NHS Information Centre to organisations with appropriate access rights in 2006 (personal communication with the
NHS Information Centre) although studies establishing the feasibility of probabilistic matching techniques to facilitate record linkage
using regional-level data preceded this by 5 years.28,29 This time
frame fits the present review’s finding that all references to ONS
linkage were after 2001 and that the majority was after 2006.
There is increasing interest in the use of HES data to identify alternative and more patient-centred outcomes, such as discharge destination and complications of care, and although the current uptake is
relatively low, it is anticipated that this will increase in the future.30
When extracting derived outcome parameters (e.g. complications of
care), it is again crucial that the precise methodology used is described
(e.g. which ICD/OPCS codes were used and which episodes of an
admission were screened). The fact that systematic reviews of HES coding accuracy have found accuracy rates of >80% for diagnostic
and procedural codes may partly explain why case validation was
performed in only 19.6% of studies.11–13 Alternatively, it is possible
that a barrier towards case validation and coding quality assessment is
that access to each patient’s case note number requires additional
clearance from HES data controllers.31
European Journal of Public Health
Methodological aspects specific to HES data were poorly reported
in comparison with generic aspects. The basic unit of each HES
record is the ‘Finished Consultant Episode’ (FCE), which covers
the time a patient spends under the care of one consultant,
whereas a ‘spell’ refers to the entire time a patient stays in hospital
under a single provider (Supplementary Figure S5).31 There is
currently no universally accepted methodology for assigning spells
based on FCEs. Spells can be assigned in different ways based on
different algorithms, and HES data are supplied to researchers
without pre-assignment of spells, which underscores the
importance of explicitly defining how spells were assigned when
publishing analyses of HES data extracts. There is further potential
for error when determining outcomes such as admission rates and
length of stay if episodes are not linked together to recreate each
provider spell and if spells are not tracked between different
providers to form ‘continuous in-patient’ spells.31 Linkage is
possible because FCEs are sequentially numbered, and each patient
is assigned a unique pseudo-anonymized identification number (the
‘HES_ID’) based on case note number, date of birth, sex, residential
postcode and provider code.31,32 Even before the official introduction of the HES_ID in 2000, attempts at HES data record linkage
were performed by researchers based on similar demographic data,
and it is, thus, surprising that so few studies (27%) reported carrying
out this process.33
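To illustrate why the spell-assignment rule needs to be reported, the following is a minimal Python sketch of one possible rule; it is not a standard or endorsed algorithm. The field names (`hes_id`, `provider`, `start`, `end`) and the linkage rule are assumptions made for illustration: an episode is linked to the previous one when it starts no later than the day the previous episode ended, and a change of provider within a linked run is treated as a transfer that opens a new provider spell but continues the same continuous in-patient spell.

```python
from datetime import date
from itertools import groupby


def assign_spells(episodes):
    """Number provider spells and continuous in-patient spells per patient.

    Each episode is a dict with hypothetical keys: hes_id, provider,
    start and end (datetime.date). The linkage rule is an assumption.
    """
    ordered = sorted(episodes, key=lambda e: (e["hes_id"], e["start"], e["end"]))
    labelled = []
    for _, patient_episodes in groupby(ordered, key=lambda e: e["hes_id"]):
        provider_spell = continuous_spell = 0
        previous = None
        for episode in patient_episodes:
            # Assumed rule: linked if it starts on or before the previous end date.
            linked = previous is not None and episode["start"] <= previous["end"]
            if not linked:
                # Gap in care: both kinds of spell start afresh.
                provider_spell += 1
                continuous_spell += 1
            elif episode["provider"] != previous["provider"]:
                # Assumed transfer: new provider spell, same continuous spell.
                provider_spell += 1
            labelled.append({**episode,
                             "provider_spell": provider_spell,
                             "continuous_spell": continuous_spell})
            previous = episode
    return labelled


# Made-up example: one patient, four episodes.
example = [
    {"hes_id": "A", "provider": "P1", "start": date(2010, 1, 1), "end": date(2010, 1, 3)},
    {"hes_id": "A", "provider": "P1", "start": date(2010, 1, 3), "end": date(2010, 1, 5)},
    {"hes_id": "A", "provider": "P2", "start": date(2010, 1, 5), "end": date(2010, 1, 9)},
    {"hes_id": "A", "provider": "P1", "start": date(2010, 3, 1), "end": date(2010, 3, 2)},
]
for row in assign_spells(example):
    print(row["provider"], row["provider_spell"], row["continuous_spell"])
```

Under this assumed rule the four episodes form three provider spells and two continuous in-patient spells, so an admission count would be 4, 3 or 2 depending on the unit chosen. A different linkage rule would give different counts, which is why the rule used should be stated.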
Despite the cleaning process to which it has been subjected, a
proportion of HES data may be missing or be invalid, and researchers
should specify how these proportions were handled in their methodology.31 Attempts to impute data or to use ‘grossing’ factors are not
necessarily preferable to censored analysis, but it is essential that the
approach used is clearly stated. Duplicate records can be present in
HES data and can distort outcome measures (such as same-day readmissions); they must therefore be identified and excluded. For studies analysing
more than 1 year of HES data (and particularly when the epoch
includes periods where coding classifications change or are
modified), there should be a statement regarding year-to-year data
quality variation and whether this was considered in analyses.21,22
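The cleaning steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not a prescribed procedure: the field names, the choice of fields that define a duplicate (`KEY_FIELDS`) and the choice to exclude incomplete records rather than impute them are all hypothetical. The point of the sketch is that each exclusion is counted, so the numbers can be reported alongside the method.

```python
# Hypothetical fields assumed to identify a duplicate record.
KEY_FIELDS = ("hes_id", "provider", "admission_date", "episode_order")
# Hypothetical fields assumed to be required for analysis.
REQUIRED_FIELDS = ("hes_id", "admission_date", "primary_diagnosis")


def clean_records(records):
    """Exclude incomplete and duplicate records and count each exclusion."""
    report = {"total": 0, "missing_or_invalid": 0, "duplicate": 0, "kept": 0}
    seen = set()
    kept = []
    for record in records:
        report["total"] += 1
        # Assumption: a required field that is absent, None or empty is invalid.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            report["missing_or_invalid"] += 1
            continue
        key = tuple(record.get(field) for field in KEY_FIELDS)
        if key in seen:
            # Same key already kept: treat as a duplicate and exclude.
            report["duplicate"] += 1
            continue
        seen.add(key)
        kept.append(record)
    report["kept"] = len(kept)
    return kept, report


# Made-up example records.
records = [
    {"hes_id": "A", "provider": "P1", "admission_date": "2010-01-01",
     "episode_order": 1, "primary_diagnosis": "I21"},
    {"hes_id": "A", "provider": "P1", "admission_date": "2010-01-01",
     "episode_order": 1, "primary_diagnosis": "I21"},
    {"hes_id": "B", "provider": "P2", "admission_date": "2010-02-01",
     "episode_order": 1, "primary_diagnosis": ""},
]
kept, report = clean_records(records)
print(report)
```

The returned counts give the proportions of missing, invalid and duplicate records that the text asks authors to state.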
This review has also shown that regression modelling has been
increasingly used as a method to adjust for confounding in studies
using HES data, with advanced statistical techniques such as
multilevel hierarchical modelling introduced in the last decade.
However, a significant proportion of studies (24%) report the use
of automated statistical algorithms (such as forward entry or
backward elimination) based on statistical significance and the
assumption of linear relationships to select covariates for inclusion
in regression models. Such methods should be avoided as they can
result in inappropriate exclusions or inclusions, and decisions about
inclusion of covariates should be based on appropriate clinical
knowledge and judgement in conjunction with statistical results.18
With particular reference to risk-adjustment methodology, it has
been demonstrated that non-additive risk relationships
(i.e. covariate interactions) in logistic regression models can
exacerbate bias, and such effects have been specifically shown for
the Charlson co-morbidity index and emergency mode of
admission, which are frequently used for risk adjustment.3 In the
examples mentioned, the interactions were credibly explained by
inter-hospital differences in coding and admission practices but
required substantial additional statistical analysis, which may
explain why only 22.7% of included studies performed this analysis.
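A simple way to see what a non-additive risk relationship means is to compare stratum-specific odds ratios. For two binary covariates, the interaction coefficient in a saturated logistic model equals the log of the ratio of the two stratum-specific odds ratios, so a ratio far from 1 points to an interaction. The Python sketch below uses made-up counts and hypothetical labels (high versus low Charlson score, elective versus emergency admission); it is a crude screen without confidence intervals or adjustment, and it is not the analysis reported in the cited work.

```python
import math


def odds_ratio(deaths_exposed, survivors_exposed, deaths_unexposed, survivors_unexposed):
    """Odds ratio for death, exposed versus unexposed, from a 2x2 table."""
    denominator = survivors_exposed * deaths_unexposed
    if denominator == 0:
        raise ValueError("odds ratio undefined: zero cell in the 2x2 table")
    return (deaths_exposed * survivors_unexposed) / denominator


def interaction_screen(stratum_a, stratum_b):
    """Compare the odds ratio of one covariate across two strata of another.

    Each stratum is a tuple of counts:
    (deaths_exposed, survivors_exposed, deaths_unexposed, survivors_unexposed).
    """
    or_a = odds_ratio(*stratum_a)
    or_b = odds_ratio(*stratum_b)
    ratio = or_b / or_a
    return {
        "or_a": or_a,
        "or_b": or_b,
        "ratio_of_ors": ratio,
        # Equals the interaction coefficient of a saturated logistic model.
        "log_ratio": math.log(ratio),
    }


# Made-up counts: exposure is a high Charlson score (hypothetical labels).
elective = (20, 80, 10, 160)   # odds ratio 4.0
emergency = (40, 60, 25, 75)   # odds ratio 2.0
print(interaction_screen(elective, emergency))
```

In this made-up example the odds ratio for a high co-morbidity score halves in emergency admissions (ratio of odds ratios 0.5), so a model containing only the two main effects would misstate risk in at least one stratum.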
A significant proportion (29.1%) of studies did not declare
conflicts of interest or state sources of funding, but a large body
of evidence supports the existence of an association between
source of funding and pro-funder conclusions in biomedical
research.34 Published studies using administrative data have driven
significant service change and reconfiguration in both the UK and
USA, and given that such changes can have significant financial
implications for health-care providers, it is suggested that a clear
declaration of source of funding and conflict of interest be made
in all cases.5,35
This study has some limitations. Government (e.g. Department of
Health) publications would not have been found in the search unless
they had appeared in peer-reviewed medical journals, which
introduces publication bias. Additionally, the limitation of
assessing study methodology indirectly through the window of
reporting quality is well recognized.36 Hence, although failure to
report certain criteria is not conclusive proof that they were not
met, transparent reporting must be considered central to a
published study’s credibility. Strengths of the study stem from its
adherence to published guidelines for performing systematic reviews
(such as the use of two independent data extractors).
On the basis of our findings, a list of reporting recommendations
is suggested to aid researchers wishing to publish studies using HES
data extracts (table 4). Although it is acknowledged that the HES
administrative data set shares certain similarities with data sets from
some European countries (e.g. the ability to link in-patient episodes
with the national mortality database as in Denmark’s National
Patient Register and Sweden’s Hospital Discharge Register), it is
recognized that these features are certainly not universally
available (e.g. France’s Données du Programme de Médicalisation
du Système d’Information (PMSI) and Italy’s Schede di
Dimissione Ospedaliera).37 Nonetheless, the majority of recommendations can be applied in a generic manner, which maintains their
applicability to administrative data sets from other European
countries. The list is intended to serve as a guide only, and it
should be noted that in-depth guidelines relating to other aspects
of methodology such as case validation have been reported by
others.17 Given the deficiencies in reporting parameters in studies
using English administrative data identified by this review, it is
suggested that further research focus on health-care administrative
data sets from other European countries. This might provide
valuable comparative data that would aid in refining the generic
aspects of the guidelines presented in this study.
Table 4 Suggested reporting parameters for studies using HES data
1. Clearly identify the study as using retrospective administrative data in the title or abstract. [a]
2. Use the term ‘Hospital Episode Statistics’ in the abstract and avoid use of alternative eponyms such as ‘Health Episode Statistics’. [b]
3. Clearly distinguish between HES (or equivalent centralized data) and PAS (or equivalent locally collected) data in the methods section of the paper. [a,b]
4. Provide a rationale for selecting the study period and state that this was decided before data extraction. [a]
5. Explicitly describe the parameters used in selecting participants and exposures, for example: [a]
(i) Listing all ICD and OPCS (or other) codes used.
(ii) Any inclusion or exclusion criteria and the order in which they were applied.
6. Clearly describe any steps taken in cleaning the data set, for example: [a]
(i) Identification of missing or invalid data.
(ii) Identification of duplicate records.
(iii) How missing, invalid or duplicate records were handled in analyses.
7. For studies involving more than 1 year of data, provide a statement that year-to-year data quality variations were assessed. [a]
8. Explicitly report use of the HES_ID (or equivalent identifier) to create continuous in-patient spells from composite episodes and separate provider spells. [a,b]
9. Clearly distinguish between whether episodes, spells, patients or procedures were counted and how these were defined. [a]
10. For outcomes such as mortality, link HES records to other databases such as the Office for National Statistics (or equivalent national mortality register) to improve consistency. [a,b]
11. For outcomes such as readmissions and re-interventions, take account of denominator changes with time (e.g. survival analysis) or provide a statement of the implications if this was not done. [a]
12. For studies comparing health-care providers, provide some evidence of external case validation or a statement of the implications if this was not done. [a]
13. In statistical modelling, clearly state the rationale for choice of candidate risk factors from available case-mix variables. [a]
14. In statistical modelling, screen for evidence of non-additive risk relationships between case-mix variables. [a]
15. Clearly state the presence or absence of any conflicts of interest and any sources of funding. [a]
[a] Generic aspects with applicability to data sets from other countries.
[b] Specific aspects with applicability to HES data.
Supplementary data
Supplementary data are available at EURPUB online.
Funding
P.H. is a Clinician Scientist financially supported by the National
Institute for Health Research (NIHR-CS-011-008). The views
expressed in this publication are those of the authors and not
necessarily those of the NHS, the National Institute for Health
Research or the Department of Health. This piece of work
received no additional funding from external sources.
Conflicts of interest: None declared.
Key points
Health-care administrative data are increasingly used to inform the quality agenda.
Health outcome analyses using English administrative data have increased in volume, scope and sophistication with time.
Deficiencies in certain reporting parameters have persisted and recommendations have been formulated to address these shortcomings.
References
10 Garout M, Tilney H, Tekkis P, Aylin P. Comparison of administrative data with the
Association of Coloproctology of Great Britain and Ireland (ACPGBI) colorectal
cancer database. Int J Colorectal Dis 2008;23:155–63.
11 Holt P, Poloniecki J, Thompson M. Multicentre study of the quality of a large
administrative data set and implications for comparing death rates. Br J Surg
2012;99:58–65.
12 Campbell SE, Campbell MK, Grimshaw JM, Walker AE. A systematic review of
discharge coding accuracy. J Public Health 2001;23:205–11.
13 Burns EM, Rigby E, Mamidanna R, et al. Systematic review of discharge coding
accuracy. J Public Health 2012;34:138–48.
14 Liberati A, Altman DG, Tetzlaff J, et al. The PRISMA statement for reporting
systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ 2009;339:b2700.
15 Farrar S, Yi D, Sutton M, et al. Has payment by results affected the way that
English hospitals provide care? Difference-in-differences analysis. BMJ
2009;339:554–6.
16 Sanderson S, Tatt ID, Higgins JP. Tools for assessing quality and susceptibility to
bias in observational studies in epidemiology: a systematic review and annotated
bibliography. Int J Epidemiol 2007;36:666–76.
17 Benchimol EI, Manuel DG, To T, et al. Development and use of reporting guidelines
for assessing the quality of validation studies of health administrative data. J Clin
Epidemiol 2011;64:821–9.
18 McNamee R. Regression modelling and other methods to control confounding.
Occup Environ Med 2005;62:500–6.
19 Bourgeois FT, Murthy S, Mandl KD. Outcome reporting among drug trials
registered in ClinicalTrials.gov. Ann Intern Med 2010;153:158–66.
20 Cuttini M, Saracci R. Can we facilitate the ethical approval of international observational studies? Int J Epidemiol 2009;38:1108–9.
21 Hansell A, Bottle A, Shurlock L, Aylin P. Accessing and using hospital activity data. J
Public Health Med 2001;23:51–6.
22 Lakhani A, Coles J, Eayres D, et al. Creative use of existing clinical and health
outcomes data to assess NHS performance in England: part 1: performance
indicators closely linked to clinical care. BMJ 2005;330:1426–31.
23 Jüni P, Witschi A, Bloch R, Egger M. The hazards of scoring the quality of clinical
trials for meta-analysis. JAMA 1999;282:1054–60.
24 Higgins JPT, Altman DG. Chapter 8: assessing risk of bias in included studies. In:
Higgins JPT, Green S, editors. Cochrane Handbook for Systematic Reviews of
Interventions: The Cochrane Collaboration. England: John Wiley and Sons Ltd, 2008:
187–241.
25 Black N. Why we need observational studies to evaluate the effectiveness of health
care. BMJ 1996;312:1215–8.
26 Hughes G. Hospital Episode Statistics: are they anything to write home about?
Emerg Med J 2009;26:392.
27 Daley J, Henderson WG, Khuri SF. Risk-adjusted surgical outcomes. Ann Rev Med
2001;52:275–87.
28 Murray GD, Lawrence AE, Boyd J. Linkage of Hospital Episode Statistics
(HES) Data to Office for National Statistics (ONS) Mortality Records. United
Kingdom: Bristol Royal Infirmary Inquiry, 2000, Available at: www.bristol-inq…