Health Equity
Responding to claims that efforts to identify discrimination are a ‘witch hunt’
Overview
In this lab, you will work through a series of exercises to visualize data regarding racial inequities in maternal mortality rates in the United States. Specifically, you will practice producing both univariate and multivariate column plots and line plots from aggregate data in Tableau.
The Counter-Claim
In “A Woke Panic on Maternal Mortality,” Stanley Goldfarb and Benita Cotton-Orr argue that activists and academics have exploited statistics on maternal mortality rates to push a narrative of systemic racism in healthcare. They argue that the data are insufficient for proving discrimination (due to both error-prone reporting and a small sample size) and that “radical activists” and “woke ideologues” are pushing race-based healthcare policies based on ideology and not on science. Their arguments reflect a narrative that efforts to examine disparities are “witch hunts.”
Refuting the “witch hunt” narrative requires showing that disparities are not coincidental one-off events but are persistent over time and across place. Overall, this lab will demonstrate what we can see when we historicize and geographically situate statistics that we can’t see when only looking at a snapshot of data.
The Dataset
Through the National Vital Statistics System, the US National Center for Health Statistics aggregates data about population health (including births, deaths, and pregnancies) from states in order to guide public health policy and decision-making. The Centers for Disease Control and Prevention (CDC) makes this data available to the public through a series of public use files, along with an online data query interface called WONDER (Wide-ranging OnLine Data for Epidemiologic Research). Navigating to the WONDER interface, an analyst can access and download US mortality and population data, spanning the years from 1999 to 2019, aggregated at national, state, and county levels. Because the data are gathered from US death certificates, which report an individual’s underlying cause of death according to the Tenth Revision of the International Classification of Diseases, the death counts can be disaggregated into hundreds of different causes of death. The death counts can also be disaggregated by a series of demographic variables, including gender, age, race, and Hispanic ethnicity.
By adjusting the query parameters on the WONDER request form to indicate how the CDC’s data should be aggregated and filtered, data analysts can produce a series of different rectangular datasets. For example, an analyst can opt to group the results by Year and Race and filter the Cause of Death to ICD-10 codes O00-O99 (Pregnancy, childbirth and the puerperium) to produce a dataset that tracks the death counts and crude death rate due to pregnancy-related causes for different racial demographics in the US from 1999 to 2021. Alternatively, an analyst could opt to group the results by State and Cause of Death and filter the data to 2021 to determine the leading causes of death in each state in that year.
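To make "crude death rate" concrete: as I understand it, WONDER reports the number of deaths divided by the population, scaled per 100,000 people. Here is a minimal Python sketch of that arithmetic. The function name `crude_rate` and the counts below are made up for illustration and are not values from the CDC files.

```python
def crude_rate(deaths, population, per=100_000):
    """Return deaths per `per` people (a scale of 100,000 is assumed here)."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing to keep the arithmetic exact for whole counts.
    return deaths * per / population


# Hypothetical counts, for illustration only.
rate = crude_rate(deaths=50, population=1_000_000)
print(rate)  # 5.0
```

Note that a crude rate is not adjusted for differences in age structure between groups, which is one reason to also look at the data by age group.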
Instructions
Part 1: What counts in this dataset?
Navigate the data documentation here in order to determine when and how a “death” is counted in this dataset.
Part 2: Explore the WONDER Interface
For timing purposes, I’m going to link the datasets that you need for today’s Tableau exercises here. However, I will take some time during class to walk you through the different components of the interface.
Part 3: Cleaning Data Files
Download the following files to your computer:
Open Tableau, and create a new workbook called cdc_maternal_deaths. Click “Connect to Data” and select the race_age_pregnancy_2020.txt file.
You’re going to notice when you import the file that the table needs some cleaning.
First, there is a column at the start of each table labeled “Notes.” If you scroll down, you will eventually see a series of notes that we need to consider when analyzing the data:
This is all important information, and we should keep it in mind as we go about our analysis. But this is metadata, not data for analysis. For the purposes of our data analysis, we are going to hide that column.
Question
Click on the downward triangle on the Notes column and select Hide.
Now that we have hidden that column, there are a series of empty rows at the bottom of our dataset. We are going to filter those out.
Question
Click on the “Add” link next to Filters in the top right corner of Tableau. Click Add, and select Race, a variable that shouldn’t have any missing values in this dataset. Click OK, and then check all boxes except “Null” to filter out the null values.
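If you ever need to do the same cleanup outside of Tableau, the two steps so far (hiding the Notes column and filtering out rows with no Race) can be sketched in Python roughly as follows. This is only an illustration under assumptions: the sample text is made up to resemble a tab-delimited export, and `load_rows` is a hypothetical helper, not part of any CDC tooling.

```python
import csv
import io

# Made-up sample shaped like the export: tab-delimited, a leading Notes
# column, and trailing note rows that carry no Race value.
SAMPLE = (
    "Notes\tRace\tDeaths\tCrude Rate\n"
    "\tGroup A\t120\t1.5\n"
    "\tGroup B\tSuppressed\tSuppressed\n"
    "---\t\t\t\n"
    "Dataset: example note\t\t\t\n"
)


def load_rows(text):
    """Parse tab-delimited text, drop Notes, and skip rows without a Race."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        row.pop("Notes", None)  # comparable to hiding the column
        if not (row.get("Race") or "").strip():
            continue  # comparable to unchecking "Null" in the filter
        rows.append(row)
    return rows


rows = load_rows(SAMPLE)
print([r["Race"] for r in rows])  # ['Group A', 'Group B']
```

With a real file, you would pass the file's contents to `load_rows` instead of `SAMPLE`.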
You will also notice that both the Deaths column and the Crude Rate column were assumed to contain text data, even though those columns are supposed to report numeric quantities. This is because, in certain subsets, the CDC “Suppressed” the data or determined it to be “Unreliable”. Any time fewer than 10 deaths are reported in a subset of data, the CDC suppresses the death count to protect the privacy of the individuals in that subset. This means that the more specific you make your data request, the more likely it is that data will be suppressed (because fewer people will be represented in each subset). Any time fewer than 20 deaths are reported, the crude rate is considered to be “Unreliable.” While this is helpful information for us, in order to plot the data, we need both of those columns to be numeric.
Question
Click on the ‘Abc’ symbol in both the Deaths column and the Crude Rate column and convert the columns to numeric. Make sure the Crude Rate column is displayed as a decimal number.
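To see what this conversion amounts to, here is a small Python sketch that turns a text cell into a number and treats the “Suppressed” and “Unreliable” flags as missing values. The helper name `to_number` is hypothetical, and treating any other non-numeric text as missing is my assumption about what we want, not a description of what Tableau does internally.

```python
NON_NUMERIC_FLAGS = {"Suppressed", "Unreliable"}


def to_number(cell):
    """Convert a text cell to a float; return None for flags or blanks."""
    value = (cell or "").strip()
    if not value or value in NON_NUMERIC_FLAGS:
        return None
    try:
        # Strip thousands separators such as "1,234" before converting.
        return float(value.replace(",", ""))
    except ValueError:
        return None  # assumption: unknown text is treated as missing


print(to_number("1,234"))       # 1234.0
print(to_number("17.3"))        # 17.3
print(to_number("Suppressed"))  # None
```

Keeping missing values as `None` rather than 0 matters: a suppressed count is not a count of zero.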
We need to repeat this for the remainder of the files we will use in the lab. However, one of the files has some special formatting issues.
Question
In the top menu, click Data > New Data Source. Add either year_pregnancy.txt or race_year_pregnancy.txt.
Repeat the formatting steps for the file you choose, and then repeat this process for the two other text files.
When you get to race_pregnancy_2020.txt, we need to take one additional step. Tableau does not recognize the delimiters in this file. Add the file to see what I mean. See how the table looks quite off?
To fix this, click on the downward arrow next to the file name in your data view, and then select “Text File Properties”. Set the Field Separator to “Tab”.
After that you should be able to edit the file the same way you did the rest of them.
Finally, we are ready to do some analysis!
Part 4: Data Analysis
Column Plots
Column plots are particularly useful for comparing the numeric values associated with different categories. They typically involve a categorical variable on the x-axis and a numeric variable on the y-axis. We can make sense of how categories differ by comparing the heights of each bar. In column plots, height serves as a visual aesthetic - some feature of a plot that communicates something about the values in our data.
Question
Create a new worksheet in Tableau called “U.S. Maternal Mortality, 2020”. Name the tab “race_2020”. Click on race_pregnancy_2020 in the “Data” tab. Drag Race to the Rows field and Crude Rate to the Columns field. Sort the bars from longest to shortest.
- How can we summarize what we see in this plot quantitatively?
- How might we interpret what we see in this plot?
- What might it be evidence of?
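One common way to summarize a plot like this quantitatively is a rate ratio: each group's crude rate divided by the rate of a chosen reference group. The Python sketch below shows the arithmetic. The group names, the rates, and the helper `rate_ratios` are all hypothetical and are not values from the CDC files; you should plug in the numbers you see in your own plot.

```python
# Hypothetical crude rates per 100,000, NOT values from the CDC files.
rates = {"Group A": 3.0, "Group B": 1.2, "Group C": 0.6}


def rate_ratios(rates, reference):
    """Divide every group's rate by the reference group's rate."""
    ref = rates[reference]
    if ref == 0:
        raise ValueError("reference rate must be non-zero")
    return {group: round(rate / ref, 2) for group, rate in rates.items()}


# Sort from highest to lowest rate, like sorting the bars in the plot.
ranked = sorted(rates.items(), key=lambda item: item[1], reverse=True)
ratios = rate_ratios(rates, reference="Group B")

print(ranked)
print(ratios)
```

A ratio of 2.5 would read as "the rate for this group is 2.5 times the rate for the reference group."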
Question
Duplicate the last plot that you created, and rename the tab “race_2020_tree”. Click on the “Show Me” button and select the treemap. Adjust the title of the legend to “Crude Rate”.
Note how, in this new plot, we are no longer using height to visualize differences across categories. Instead, we are using both area and color to visualize the differences. Area and color are also visual aesthetics.
Notably, we are visualizing both area and color in a continuous way. Larger boxes are associated with larger values, and smaller boxes are associated with smaller values. Color is displayed on a gradient with darker colors associated with larger values and lighter colors associated with smaller values. The area size is specific to a numeric value, and the shade of color is specific to a numeric value. Indeed, it makes sense to use a color gradient when visualizing numeric data.
This differs from visualizing color in a discrete way.
Question
Duplicate the last plot that you created, and rename the tab “race_2020_tree_cat”. Drag Race to the Color field.
Note how in this plot, colors are not displayed on a gradient. Instead, there are discrete colors associated with each race. This is because we are differentiating color along a categorical variable, where there are discrete divisions between categories. In general, we want to use discrete color palettes when associating color with a categorical variable and continuous color palettes when associating color with a numeric variable.
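The difference between the two kinds of palettes can be sketched in a few lines of Python. This is only an illustration of the idea, not how Tableau implements it: the function names and the specific colors are arbitrary choices of mine. A continuous palette computes a shade from a number, while a discrete palette looks up a color for a category.

```python
def gradient_shade(value, vmin, vmax, light=(222, 235, 247), dark=(8, 48, 107)):
    """Continuous palette: blend from light to dark as the value grows."""
    if vmax == vmin:
        t = 0.0
    else:
        t = (value - vmin) / (vmax - vmin)
    t = min(1.0, max(0.0, t))  # clamp values outside the range
    return tuple(round(lo + t * (hi - lo)) for lo, hi in zip(light, dark))


def discrete_colors(categories, palette=("#4e79a7", "#f28e2b", "#e15759", "#76b7b2")):
    """Discrete palette: one distinct color per category, cycling if needed."""
    return {cat: palette[i % len(palette)] for i, cat in enumerate(categories)}


print(gradient_shade(0, 0, 10))   # lightest shade
print(gradient_shade(10, 0, 10))  # darkest shade
print(discrete_colors(["Group A", "Group B"]))
```

In the continuous case, nearby values get nearby shades. In the discrete case, the colors carry no order at all, which is what we want for categories.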
Which of the plots that we just created do you find the most compelling and why?
Stacked Column Plots
In the last series of plots, we were only visualizing one categorical variable. But sometimes we want to compare numeric values across some intersection of categorical variables. In this case, we will create a stacked column plot, using both height and color as visual aesthetics.
Create a new worksheet in Tableau called “U.S. Maternal Mortality, 2020”. Name the tab “race_2020_age”. Click on race_age_pregnancy_2020 in the “Data” tab. Drag Ten-Year Age Groups to the Columns field and Crude Rate to the Rows field. Drag Race to the Color field. Note how a discrete color palette is created.
Click on the “Show Me” button, and select the side-by-side bars. I personally prefer this type of plot over a stacked bar plot because I find it easier to compare across categories when the bars are side-by-side. Move around the order of the variables to make the most compelling plot. Add a Filter to only show the relevant age groups.
- How can we summarize what we see in this plot quantitatively?
- How might we interpret what we see in this plot?
- What might it be evidence of?
Line Plots
Line plots are particularly useful for showing change over time. They typically involve a date on the x-axis and a numeric variable on the y-axis. The height of each point tells us something about the time period represented.
Question
Create a new worksheet in Tableau called “U.S. Maternal Mortality - 1999-2020”. Name the tab “by_year”. Click on year_pregnancy in the “Data” tab. Drag Year to the Columns field and Crude Rate to the Rows field.
- How can we summarize what we see in this plot quantitatively?
- How might we interpret what we see in this plot?
- What might it be evidence of?
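For a line plot, one simple quantitative summary is the percent change between the first and last years shown. Here is a hedged Python sketch. The yearly rates below are invented for illustration and `percent_change` is a hypothetical helper, so substitute the values from your own plot.

```python
# Hypothetical yearly crude rates per 100,000, NOT values from the CDC files.
series = {1999: 0.5, 2010: 0.75, 2020: 1.0}


def percent_change(series):
    """Percent change from the earliest year to the latest year."""
    years = sorted(series)
    first, last = series[years[0]], series[years[-1]]
    if first == 0:
        raise ValueError("cannot compute percent change from a zero rate")
    return (last - first) / first * 100


print(percent_change(series))  # 100.0
```

A result of 100.0 would mean the rate doubled over the period. Keep in mind that this summary only uses the two endpoints, so it says nothing about what happened in between.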
Sometimes we want to compare the changes across different groups over time. To do so, we can use color to differentiate lines across different groups.
Question
Create a new worksheet in Tableau called “U.S. Maternal Mortality - 1999-2020”. Name the tab “by_year_race”. Click on race_year_pregnancy in the “Data” tab. Drag Year to the Columns field and Crude Rate to the Rows field. Convert Crude Rate to an attribute. Drag Race to the Color field.
- What kind of color palette was created here?
- How can we summarize what we see in this plot quantitatively?
- How might we interpret what we see in this plot?
- What might it be evidence of?
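To check whether a gap between two groups persists over time, one option is to compute the ratio of their rates in every year they share. The Python sketch below groups hypothetical (year, group, rate) records into one series per group, which mirrors one line per color in the plot, and then computes that ratio. All names and numbers here are illustrative assumptions, not values from the CDC files.

```python
from collections import defaultdict

# Hypothetical (year, group, crude rate) records, for illustration only.
records = [
    (1999, "Group A", 1.0), (1999, "Group B", 0.5),
    (2020, "Group A", 3.0), (2020, "Group B", 1.0),
]


def series_by_group(records):
    """Build one {year: rate} series per group, skipping missing rates."""
    out = defaultdict(dict)
    for year, group, rate in records:
        if rate is not None:
            out[group][year] = rate
    return dict(out)


def ratio_by_year(series, numerator, denominator):
    """Ratio of two groups' rates for every year both groups have data."""
    shared = sorted(set(series[numerator]) & set(series[denominator]))
    return {y: series[numerator][y] / series[denominator][y] for y in shared}


series = series_by_group(records)
print(ratio_by_year(series, "Group A", "Group B"))  # {1999: 2.0, 2020: 3.0}
```

If the ratio stays above 1 in every year, the disparity is not a one-off event in a single snapshot of the data.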
Part 5: Reflection
Note that this data is often cited as evidence of racial disparities in healthcare. The CDC website includes a webpage titled “Working Together to Reduce Black Maternal Mortality,” and cites this data as evidence of racial disparities. Here are just a few examples of other articles that also do so:
- Racial Disparities in Maternal and Infant Health: Current Status and Efforts to Address Them
- American Black women face disproportionately high rates of maternal mortality
- Racial Disparities in Maternal Health
- Why Racial Gaps In Maternal Mortality Persist
- Visualizing the stark maternal health inequities in the United States
How might we summarize what we just learned into a factual claim? How does this data respond to the “witch hunt” narrative? What other data or analysis strategies could be engaged to respond to these counter-claims?