PlainHealth

Health data guide

Understanding Mortality Data

What the numbers mean, where they come from, and how to read them correctly.

Key Takeaway

Mortality data comes from death certificates compiled by the CDC's NCHS. Always use age-adjusted rates, not raw counts, when comparing states or time periods. A state with more elderly residents will have more deaths regardless of actual health conditions. Age-adjustment removes this bias and makes comparisons meaningful.

How Mortality Data Is Created

Every death in the United States requires a death certificate. The attending physician or medical examiner records the cause of death using a standardized chain-of-events format: the immediate cause, the conditions leading to it, and any significant contributing conditions. This information is coded using ICD-10 by trained medical record specialists called nosologists.

State vital statistics offices compile these certificates and transmit data to the CDC's National Center for Health Statistics. NCHS produces the national mortality datasets that PlainHealth uses. This means the data represents actual reported causes of death across the entire US population; it is not a sample or an estimate. Every registered death is included.

The result is one of the most comprehensive mortality surveillance systems in the world, covering 330+ million Americans with a time series extending back decades.

Rates vs. Counts: Why It Matters

Raw death counts are misleading for comparison. California had roughly 266,000 total deaths in 2017; Wyoming had about 5,400. California recorded nearly 50 times as many deaths as Wyoming, but mainly because it has roughly 65 times as many people. To compare meaningfully, you need death rates, expressed as deaths per 100,000 population.

But even crude rates can mislead because age is the strongest predictor of mortality. A state with a median age of 45 will have a higher crude death rate than one with a median age of 32, even if health conditions are identical at every age group.

Measure | Definition | Best Used For | Limitation
Raw count | Total deaths from a cause | Estimating resource needs | Not comparable across populations
Crude rate | Deaths ÷ population × 100,000 | Understanding actual burden | Biased by age distribution
Age-adjusted rate | Standardized to 2000 US population | Comparing states and time periods | Less intuitive to interpret
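
As a rough illustration of the crude-rate formula in the table, the sketch below computes deaths per 100,000 from the approximate 2017 death totals quoted earlier in this section. The population figures are rounded assumptions added for the example (they are not taken from the NCHS dataset), and crude_rate is a hypothetical helper name.

```python
def crude_rate(deaths, population, per=100_000):
    """Crude death rate: deaths divided by population, scaled to `per` people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths / population * per


# Death totals are the approximate 2017 figures quoted in the text.
# Populations are rounded assumptions for illustration only.
states = {
    "California": {"deaths": 266_000, "population": 39_400_000},
    "Wyoming": {"deaths": 5_400, "population": 580_000},
}

rates = {
    name: crude_rate(s["deaths"], s["population"]) for name, s in states.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.0f} deaths per 100,000")
```

Under these assumed populations, the state with far fewer deaths ends up with the higher crude rate, which is why counts alone cannot be compared. A crude rate still reflects each state's age structure, so it is only the first step toward a fair comparison.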

PlainHealth shows age-adjusted rates for all state comparisons. Browse state pages to see how mortality patterns differ across the country using comparable numbers.

Age-Adjusted Rates: A Worked Example

Consider heart disease in California vs. Wyoming. In 2015:

State | Raw Deaths | Crude Rate (per 100K) | Age-Adjusted Rate (per 100K) | Interpretation
California | ~62,000 | 160 | 152 | Slightly better than national avg
Wyoming | ~1,000 | 175 | 162 | Slightly above national avg
US Average | - | 168 | 157 | Baseline for comparison

The age-adjusted rates show a smaller gap between the two states than the crude rates do (10 vs. 15 per 100,000), and a far smaller one than the raw counts suggest. A state's age structure can push its crude rate misleadingly high or low; age adjustment corrects this distortion. Explore individual cause pages to see state comparisons for each cause.
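
The idea behind age adjustment can be sketched with direct standardization: compute a rate for each age group, then average those rates using one shared set of population weights. The example below is a minimal sketch, not the NCHS procedure itself. The three age groups, the age-specific rates, and all weights are hypothetical values chosen for illustration; the real calculation uses the 2000 US Standard Population with finer age groups.

```python
# Hypothetical age-specific death rates (per 100,000), identical in both states.
AGE_SPECIFIC_RATES = {"0-44": 20.0, "45-64": 150.0, "65+": 1200.0}

# Hypothetical population shares by age group; each set sums to 1.0.
STANDARD_WEIGHTS = {"0-44": 0.60, "45-64": 0.25, "65+": 0.15}
YOUNGER_STATE = {"0-44": 0.70, "45-64": 0.20, "65+": 0.10}
OLDER_STATE = {"0-44": 0.50, "45-64": 0.28, "65+": 0.22}


def weighted_rate(rates, weights):
    """Weighted average of age-specific rates using the given population shares."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(rates[group] * weights[group] for group in rates)


# Crude rate: each state's rates weighted by its own age structure.
crude_younger = weighted_rate(AGE_SPECIFIC_RATES, YOUNGER_STATE)
crude_older = weighted_rate(AGE_SPECIFIC_RATES, OLDER_STATE)

# Age-adjusted rate: both states weighted by the same standard population.
adjusted_younger = weighted_rate(AGE_SPECIFIC_RATES, STANDARD_WEIGHTS)
adjusted_older = weighted_rate(AGE_SPECIFIC_RATES, STANDARD_WEIGHTS)

print(f"crude:    younger={crude_younger:.1f}, older={crude_older:.1f}")
print(f"adjusted: younger={adjusted_younger:.1f}, older={adjusted_older:.1f}")
```

Because the two hypothetical states share the same risk in every age group, their adjusted rates are identical, while their crude rates differ only because of age structure.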

What "Leading Cause" Means

The "leading causes of death" ranking is based on the underlying cause, the disease or injury that initiated the chain of events leading to death. This is a useful but imperfect classification system with several important nuances:

  • Underlying vs. contributing: Diabetes may contribute to a death classified under heart disease. As a result, some conditions appear less prevalent as "leading causes" than their true health burden warrants. Diabetes is likely undercounted by 2–3x as an underlying cause.
  • Coding changes: ICD classification updates can shift deaths between categories without any change in actual health conditions. This is why consistent datasets like NCHS 1999–2017 are valuable: they use a single classification framework.
  • Certification quality: Cause of death determination varies in accuracy. Deaths in hospitals with autopsy results are more precisely classified than deaths at home with limited medical history, particularly in rural areas.
  • Rare conditions: The NCHS leading causes dataset covers 10 major categories. Many causes (infectious diseases, rare cancers) are grouped into "all other causes" not tracked here.
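
The difference between underlying and contributing causes can be sketched with a small tally. The records below are hypothetical and the field names (underlying, contributing) are assumptions made for this example; they do not mirror the layout of any NCHS file.

```python
from collections import Counter

# Hypothetical death records: one underlying cause plus any contributing causes.
records = [
    {"underlying": "heart disease", "contributing": ["diabetes", "hypertension"]},
    {"underlying": "heart disease", "contributing": ["diabetes"]},
    {"underlying": "diabetes", "contributing": []},
    {"underlying": "stroke", "contributing": ["hypertension"]},
    {"underlying": "cancer", "contributing": []},
]

# Leading-cause style tally: each death counted once, under its underlying cause.
underlying_counts = Counter(r["underlying"] for r in records)

# Any-mention tally: a death counts toward every condition listed on the certificate.
any_mention_counts = Counter()
for r in records:
    any_mention_counts.update({r["underlying"], *r["contributing"]})

print("diabetes as underlying cause:", underlying_counts["diabetes"])
print("diabetes mentioned anywhere:", any_mention_counts["diabetes"])
```

In this made-up sample, diabetes is the underlying cause of one death but appears on three certificates, which is the pattern behind its lower position in leading-cause rankings.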

Explore PlainHealth's causes of death pages to see how each cause ranks nationally and varies by state.

ICD-10 Classification: How Causes Are Coded

Each death is coded using the International Classification of Diseases, 10th Revision (ICD-10), a global standard maintained by the World Health Organization. ICD-10 assigns a code to every cause of death; for example, "I21" is the code for acute myocardial infarction (heart attack).

Trained nosologists at state health departments apply these codes based on the death certificate narrative. For complex deaths with multiple conditions, the underlying cause follows a hierarchy of rules defined by the WHO and CDC. The result is a standardized, consistent dataset that allows comparisons across years and across the entire country.
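
To show how a code such as "I21" ends up in a broad category, here is a minimal sketch that maps an ICD-10 code to a cause group by its three-character prefix. The ranges are a partial list assumed from commonly published NCHS groupings, and classify is a hypothetical helper; verify the ranges against official NCHS documentation before relying on them. Real underlying-cause selection also applies WHO rules that this sketch does not attempt.

```python
import re

# Partial mapping from ICD-10 three-character ranges to cause groups.
# Assumed for illustration; not a complete or authoritative list.
CAUSE_RANGES = {
    "Heart disease": [("I00", "I09"), ("I11", "I11"), ("I13", "I13"), ("I20", "I51")],
    "Cancer": [("C00", "C97")],
    "Stroke": [("I60", "I69")],
    "Diabetes": [("E10", "E14")],
}


def classify(icd10_code):
    """Return the cause group for an ICD-10 code, or 'Other' if unmapped."""
    match = re.match(r"^([A-Z]\d{2})", icd10_code.strip().upper())
    if match is None:
        raise ValueError(f"not an ICD-10 code: {icd10_code!r}")
    category = match.group(1)  # e.g. "I21" from "I21.9"
    for cause, ranges in CAUSE_RANGES.items():
        # Same-length letter+digit strings compare correctly as plain text.
        if any(low <= category <= high for low, high in ranges):
            return cause
    return "Other"


for code in ["I21", "I64", "E11.9", "I10"]:
    print(code, "->", classify(code))
```

Codes outside the listed ranges fall through to "Other", much as causes outside the 10 tracked categories are grouped together in the leading causes dataset.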

One limitation: when the ICD transitions to a new revision (ICD-11 is now in use internationally), some causes shift codes. This can create apparent trend breaks that reflect classification changes, not actual health changes.

How to Use PlainHealth Data

PlainHealth presents 19 years of mortality data (1999–2017) across 10 leading causes and all 50 states plus DC. Use it to:

  • Compare age-adjusted death rates between states for specific causes, looking for states significantly above or below the national average.
  • Track trends over time: is heart disease declining? Is suicide increasing? Consistent multi-year trends are more meaningful than single-year spikes.
  • Understand the leading causes of death in your state vs. the national average; some states have unusual cause profiles driven by regional factors.
  • See how different causes affect different regions of the country; the stroke belt and opioid epidemic patterns are visible in the data.
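
One way to separate a multi-year trend from a single-year spike is to fit a straight line through the yearly rates and compare its slope with the year-to-year change. The sketch below uses an ordinary least-squares slope; the rates are hypothetical values invented for the example, and linear_trend is a hypothetical helper, not a PlainHealth feature.

```python
def linear_trend(years, rates):
    """Least-squares slope of rate against year (rate units per year)."""
    if len(years) != len(rates) or len(years) < 2:
        raise ValueError("need at least two (year, rate) pairs")
    mean_year = sum(years) / len(years)
    mean_rate = sum(rates) / len(rates)
    covariance = sum((y - mean_year) * (r - mean_rate) for y, r in zip(years, rates))
    variance = sum((y - mean_year) ** 2 for y in years)
    return covariance / variance


# Hypothetical age-adjusted rates per 100,000; note the one-year spike in 2014.
years = [2011, 2012, 2013, 2014, 2015, 2016, 2017]
rates = [180.0, 177.0, 175.0, 183.0, 171.0, 169.0, 166.0]

slope = linear_trend(years, rates)
one_year_change = rates[3] - rates[2]  # change from 2013 to 2014

print(f"trend: {slope:+.2f} per 100,000 per year")
print(f"2013 to 2014 change: {one_year_change:+.1f} per 100,000")
```

In this invented series the 2014 value jumps upward, yet the fitted slope over all seven years is still negative, so the longer view points to a decline.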

Limitations to Keep in Mind

Mortality data is powerful but not omniscient. Key limitations include:

  • Underreporting of contributing causes: Conditions like diabetes and hypertension are dramatically undercounted as underlying causes because they often contribute to rather than directly cause death.
  • No within-state granularity: State averages can mask significant urban/rural disparities. Appalachian Kentucky and Louisville are one data point in this dataset.
  • Suicide underreporting: Stigma and uncertainty about intent lead to underreporting of suicide, especially in certain communities and jurisdictions. True suicide rates are likely higher than official statistics.
  • Data lags: The NCHS dataset ends at 2017. It does not capture COVID-19, the fentanyl crisis peak, or recent healthcare changes. For more recent data, consult CDC WONDER directly.

Frequently Asked Questions

Where does mortality data come from?

Mortality data comes from death certificates filed with state vital statistics offices. Every death in the US requires a death certificate listing the cause of death, coded using the International Classification of Diseases (ICD-10). The CDC's National Center for Health Statistics compiles these into national and state-level datasets that cover every registered death in the country.

What is an age-adjusted death rate?

An age-adjusted death rate removes the effect of different age distributions between populations. Without adjustment, a state with many elderly residents would appear unhealthier than a younger state, even if risk at every age were identical. Age-adjustment uses the 2000 US Standard Population to apply a uniform age structure, enabling fair comparisons across states and time periods.

What is the difference between crude and age-adjusted rates?

A crude rate is total deaths divided by total population. It reflects what actually happened but cannot be fairly compared across populations with different age structures. An age-adjusted rate standardizes for age, enabling genuine health comparisons. PlainHealth shows age-adjusted rates for all state comparisons because crude rates can mislead when states have very different demographic profiles.

Why does the data only go to 2017?

PlainHealth uses the CDC NCHS Leading Causes of Death dataset, which provides a clean, standardized time series from 1999 to 2017. More recent data is available through CDC WONDER, but it uses different formatting and reflects later classification changes. The 1999–2017 range provides a consistent 19-year window for trend analysis without breaking changes in ICD coding or data format.

How is cause of death determined?

The attending physician or medical examiner records the cause of death on the death certificate as a chain of events: immediate cause, conditions leading to it, and contributing factors. The underlying cause, the disease or injury that initiated the chain, is what gets coded and counted in national statistics using ICD-10 classification.

Can one death be counted under multiple causes?

In the leading causes dataset, each death is counted once under the underlying cause of death. However, death certificates can list multiple contributing causes. A person with diabetes who dies of a heart attack is typically counted under heart disease. This means some conditions like diabetes are undercounted as primary causes even though they frequently contribute to deaths.

Sources

  • CDC National Center for Health Statistics, Leading Causes of Death, 1999–2017
  • ICD-10, International Classification of Diseases, 10th Revision (WHO)
  • CDC, Age Adjustment Using the 2000 US Standard Population (Technical Notes)

This content is for informational purposes only and does not constitute medical advice. For health concerns, consult a qualified healthcare provider.

Understanding the Data

The information presented throughout this guide is drawn from public records published by federal and state government agencies. Our database aggregates and standardizes these records to make them more accessible and easier to interpret for general audiences. When we reference specific statistics or trends, they are drawn directly from these authoritative sources unless explicitly noted otherwise.

It is important to understand the limitations of any large-scale dataset. Records may contain errors from the original data collection process, some fields may be incomplete for older entries, and classification systems may have changed over time. Our analysis accounts for these factors by clearly labeling data vintage, flagging records with missing critical fields, and noting when temporal comparisons span methodology changes in the source data.

For readers who want to conduct their own research, we recommend going directly to the source whenever possible. Federal and state government agencies provide detailed documentation on collection methodology and known data quality issues. Our goal is not to replace primary sources but to make them more approachable and to highlight patterns that may not be immediately obvious when browsing raw records.

How We Analyze Data Records

Our analytical approach involves several steps designed to surface meaningful insights from large datasets. First, we clean and standardize the raw data, handling variations in naming conventions, date formats, and categorical labels. Then we compute summary statistics, distributions, and comparative benchmarks across relevant dimensions such as geography, time period, and category type.

Key metrics we examine include death counts and rates, geographic distributions, and temporal trends. These indicators provide a multi-dimensional view of each state and cause in our database, allowing users to understand not just individual records but how they compare to peers, regional averages, and national benchmarks. We believe this contextual approach is far more valuable than presenting raw numbers in isolation.