These data, from the National Center for Health Statistics (NCHS), are provisional death counts for the United States, obtained from the CDC data portal, data.cdc.gov. They report COVID-19 deaths and total deaths broken down by educational attainment, race or Hispanic origin, sex, and age group. The counts cover deaths from January 1, 2020 through January 30, 2021, and the data set was last updated on February 3, 2021.
# Imports the raw data set. The original data set is a CSV file.
raw_data <- read_csv("data/AH_Provisional_COVID-19_Deaths_by_Educational_Attainment__Race__Sex__and_Age.csv")
Rows: 224 Columns: 9
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): Data as of, Start Date, End Date, Education Level, Race or Hispanic...
dbl (2): COVID-19 Deaths, Total Deaths
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
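The message above suggests specifying the column types. A minimal sketch of what that could look like follows. It runs on a two-row stand-in for the CSV rather than the real file: the counts are copied from the `glimpse()` output in this report, while the `MM/DD/YYYY` date format and the names `toy_csv` and `typed_data` are assumptions made for illustration only.

```r
library(readr)

# Toy stand-in for the CSV file (assumption: dates are written MM/DD/YYYY).
# The header uses the nine column names reported for the raw data set.
toy_csv <- I('Data as of,Start Date,End Date,Education Level,Race or Hispanic Origin,Sex,Age Group,COVID-19 Deaths,Total Deaths
02/03/2021,01/01/2020,01/30/2021,Associate degree or some college,Hispanic,Female,0-17 years,0,2
02/03/2021,01/01/2020,01/30/2021,Associate degree or some college,Hispanic,Female,18-49 years,423,3117
')

# Declaring the types up front silences the column-specification message
# and parses the three date columns as Date instead of character.
typed_data <- read_csv(
  toy_csv,
  col_types = cols(
    `Data as of` = col_date(format = "%m/%d/%Y"),
    `Start Date` = col_date(format = "%m/%d/%Y"),
    `End Date` = col_date(format = "%m/%d/%Y"),
    `COVID-19 Deaths` = col_double(),
    `Total Deaths` = col_double(),
    .default = col_character()
  )
)

class(typed_data$`Start Date`)
```

If the real file stores dates in a different format, the `format` strings would need to change accordingly. Passing `show_col_types = FALSE` instead would only hide the message and leave the dates as character.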
# Shows the classes of the variables.
glimpse(raw_data)
# Creates a new data set with the variables we would like to keep. In an effort to be
# more user friendly, the variable names have been converted to all lowercase with no
# spaces. Also, some variables have been converted to factor classes.
new_data <- raw_data %>%
  # Changes the variable names and makes some factors.
  mutate(
    education_level = as.factor(`Education Level`),
    race_origin = as.factor(`Race or Hispanic Origin`),
    sex = as.factor(`Sex`),
    age_group = as.factor(`Age Group`),
    covid_deaths = `COVID-19 Deaths`,
    total_deaths = `Total Deaths`
  ) %>%
  # Pushes only the properly formatted variables to the new data set.
  select(education_level, race_origin, sex, age_group, covid_deaths, total_deaths)

# Shows a summary of the variables included in the dataset.
glimpse(new_data)
Rows: 224
Columns: 6
$ education_level <fct> Associate degree or some college, Associate degree or …
$ race_origin <fct> Hispanic, Hispanic, Hispanic, Hispanic, Hispanic, Hisp…
$ sex <fct> Female, Female, Female, Female, Male, Male, Male, Male…
$ age_group <fct> 0-17 years, 18-49 years, 50-64 years, 65 years and ove…
$ covid_deaths <dbl> 0, 423, 857, 1793, 0, 737, 1592, 2655, 0, 82, 176, 362…
$ total_deaths <dbl> 2, 3117, 4153, 10225, 1, 5676, 6183, 11544, 0, 591, 79…
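The same rename-and-convert step can be written more compactly. The sketch below is an alternative, not the code used in this report: it renames inside `select()` and converts the four grouping columns with `across()`. It runs on a four-row toy tibble (`toy_raw`, a hypothetical name) whose values are taken from the first rows of the `glimpse()` output above.

```r
library(dplyr)
library(tibble)

# Toy stand-in for raw_data, holding only the six columns that are kept.
toy_raw <- tibble(
  `Education Level` = rep("Associate degree or some college", 4),
  `Race or Hispanic Origin` = rep("Hispanic", 4),
  Sex = rep("Female", 4),
  `Age Group` = c("0-17 years", "18-49 years", "50-64 years", "65 years and over"),
  `COVID-19 Deaths` = c(0, 423, 857, 1793),
  `Total Deaths` = c(2, 3117, 4153, 10225)
)

toy_new <- toy_raw %>%
  # select() can rename while it keeps columns: new_name = old_name.
  select(
    education_level = `Education Level`,
    race_origin = `Race or Hispanic Origin`,
    sex = Sex,
    age_group = `Age Group`,
    covid_deaths = `COVID-19 Deaths`,
    total_deaths = `Total Deaths`
  ) %>%
  # Converts the four grouping columns to factors in one call.
  mutate(across(education_level:age_group, as.factor))

glimpse(toy_new)
```

Both versions should yield the same six columns. The `across()` form mainly saves repetition when several columns need the same conversion.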
summary(new_data)
education_level
Associate degree or some college:56
Bachelor’s degree or more :56
High school graduate/GED or less:56
Unknown :56
race_origin sex
Hispanic :32 Female:112
Non-Hispanic American Indian or Alaska Native :32 Male :112
Non-Hispanic Asian :32
Non-Hispanic Black :32
Non-Hispanic Native Hawaiian or Other Pacific Islander:32
Non-Hispanic White :32
Other/Unknown :32
age_group covid_deaths total_deaths
0-17 years :56 Min. : 0.00 Min. : 0.0
18-49 years :56 1st Qu.: 3.75 1st Qu.: 112.0
50-64 years :56 Median : 81.00 Median : 817.5
65 years and over:56 Mean : 1880.20 Mean : 15665.8
3rd Qu.: 627.00 3rd Qu.: 4997.5
Max. :76871.00 Max. :670295.0
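The factor counts in this summary are consistent with a fully crossed table: 4 education levels × 7 race/origin groups × 2 sexes × 4 age groups = 224 rows. The sketch below rebuilds that grid from the level labels printed above and checks the arithmetic. The name `design` is hypothetical, and the summary alone does not prove that every combination appears exactly once in the real data; a `count()` on `new_data` would be needed to confirm that.

```r
# Level labels copied from the summary() output above.
design <- expand.grid(
  education_level = c(
    "Associate degree or some college",
    "Bachelor’s degree or more",
    "High school graduate/GED or less",
    "Unknown"
  ),
  race_origin = c(
    "Hispanic",
    "Non-Hispanic American Indian or Alaska Native",
    "Non-Hispanic Asian",
    "Non-Hispanic Black",
    "Non-Hispanic Native Hawaiian or Other Pacific Islander",
    "Non-Hispanic White",
    "Other/Unknown"
  ),
  sex = c("Female", "Male"),
  age_group = c("0-17 years", "18-49 years", "50-64 years", "65 years and over"),
  stringsAsFactors = TRUE
)

# One row per combination of the four factors.
nrow(design)

# Rows per level in a fully crossed design: 224 / 4, 224 / 7, and 224 / 2.
table(design$education_level)
table(design$race_origin)
table(design$sex)
```

The per-level counts from this grid (56, 32, and 112) match the counts reported by `summary(new_data)`.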