Plot a population pyramid (age-sex) from a dataframe.
age_pyramid(
  data,
  age_group = "age_group",
  split_by = "sex",
  stack_by = NULL,
  count = NULL,
  proportional = FALSE,
  na.rm = TRUE,
  show_midpoint = TRUE,
  vertical_lines = FALSE,
  horizontal_lines = TRUE,
  pyramid = TRUE,
  pal = NULL
)
Argument | Description |
---|---|
data | Your dataframe (e.g. linelist) |
age_group | the name of a column in the data frame that defines the age group categories. Defaults to "age_group" |
split_by | the name of a column in the data frame that defines the bivariate column. Defaults to "sex". See the note below. |
stack_by | the name of the column in the data frame to use for shading the bars. Defaults to NULL |
count | for pre-computed data, the name of the column in the data frame that holds the values of the bars. If this represents proportions, the values should be within [0, 1]. |
proportional | If |
na.rm | If |
show_midpoint | When |
vertical_lines | If you would like to add dashed vertical lines to help visual interpretation of numbers. Default is to not show (FALSE) |
horizontal_lines | If |
pyramid | if |
pal | a color palette function or vector of colors to be passed to |
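The count argument expects one row per bar with a pre-computed value. As a minimal sketch of how such a table might be built from row-level data, the base R snippet below tabulates a small linelist and derives proportions that stay within [0, 1]. The data frame `linelist` and the column names `n` and `prop` are hypothetical and chosen only for illustration.

```r
# Hypothetical linelist; the values and column names are illustrative only.
linelist <- data.frame(
  age_group = factor(
    c("0-4", "0-4", "5-9", "5-9", "5-9", "10+"),
    levels = c("0-4", "5-9", "10+")
  ),
  sex = factor(c("Female", "Male", "Female", "Female", "Male", "Male"))
)

# One row per age_group x sex combination; `n` holds the number of rows.
# Combinations that never occur are kept with a count of zero.
counts <- as.data.frame(
  table(linelist[c("age_group", "sex")]),
  responseName = "n"
)

# Proportions of the whole data set, so every value lies within [0, 1].
counts$prop <- counts$n / sum(counts$n)

stopifnot(all(counts$prop >= 0 & counts$prop <= 1))
counts
```

A table shaped like this could then be supplied with count = n, or with count = prop together with proportional = TRUE, in the same way the census examples pass their count column.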
Note: If the split_by variable is bivariate (e.g. an indicator for a specific symptom), the result is drawn as a pyramid. Otherwise, it is presented as a faceted bar plot with empty bars in the background indicating the range of the un-faceted data set. Values of split_by appear as labels at the top of each facet.
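To anticipate which layout a given split_by column will produce, it can help to count its distinct values beforehand. The sketch below assumes that "bivariate" means exactly two distinct non-missing values. The helper `n_split_values()` is hypothetical and not part of the package, which may instead rely on factor levels (including empty ones).

```r
# Hypothetical helper (not part of the package): number of distinct,
# non-missing values in a candidate split_by column.
n_split_values <- function(x) length(unique(x[!is.na(x)]))

sex <- c("Female", "Male", "Male", NA)
gender <- c("Female", "Male", "NB", "Male")

n_split_values(sex)    # 2: expected to be drawn as a pyramid
n_split_values(gender) # 3: expected to be drawn as a faceted bar plot
```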
library(ggplot2)
old <- theme_set(theme_classic(base_size = 18))

# with pre-computed data ----------------------------------------------------
# 2018/2008 US census data by age and gender
data(us_2018)
data(us_2008)
age_pyramid(us_2018, age_group = age, split_by = gender, count = count)
age_pyramid(us_2008, age_group = age, split_by = gender, count = count)

# 2018 US census data by age, gender, and insurance status
data(us_ins_2018)
age_pyramid(us_ins_2018,
  age_group = age,
  split_by = gender,
  stack_by = insured,
  count = count
)
us_ins_2018$prop <- us_ins_2018$percent / 100
age_pyramid(us_ins_2018,
  age_group = age,
  split_by = gender,
  stack_by = insured,
  count = prop,
  proportional = TRUE
)

# from linelist data --------------------------------------------------------
set.seed(2018 - 01 - 15)
ages <- cut(sample(80, 150, replace = TRUE),
  breaks = c(0, 5, 10, 30, 90), right = FALSE
)
sex <- sample(c("Female", "Male"), 150, replace = TRUE)
gender <- sex
gender[sample(5)] <- "NB"
ill <- sample(c("case", "non-case"), 150, replace = TRUE)
dat <- data.frame(
  AGE = ages,
  sex = factor(sex, c("Male", "Female")),
  gender = factor(gender, c("Male", "NB", "Female")),
  ill = ill,
  stringsAsFactors = FALSE
)

# Create the age pyramid, stratifying by sex
print(ap <- age_pyramid(dat, age_group = AGE))

# Create the age pyramid, stratifying by gender, which can include non-binary
print(apg <- age_pyramid(dat, age_group = AGE, split_by = gender))

# Remove NA categories with na.rm = TRUE
dat2 <- dat
dat2[1, 1] <- NA
dat2[2, 2] <- NA
dat2[3, 3] <- NA
print(ap <- age_pyramid(dat2, age_group = AGE))
#> Warning: 2 missing rows were removed (1 values from `AGE` and 1 values from `sex`).

# Stratify by case definition and customize with ggplot2
ap <- age_pyramid(dat, age_group = AGE, split_by = ill) +
  theme_bw(base_size = 16) +
  labs(title = "Age groups by case definition")
print(ap)

# Stratify by multiple factors
ap <- age_pyramid(dat,
  age_group = AGE,
  split_by = sex,
  stack_by = ill,
  vertical_lines = TRUE
) +
  labs(title = "Age groups by case definition and sex")
print(ap)

# Display proportions
ap <- age_pyramid(dat,
  age_group = AGE,
  split_by = sex,
  stack_by = ill,
  proportional = TRUE,
  vertical_lines = TRUE
) +
  labs(title = "Age groups by case definition and sex")
print(ap)

# empty group levels will still be displayed
dat3 <- dat2
dat3[dat$AGE == "[0,5)", "sex"] <- NA
age_pyramid(dat3, age_group = AGE)
#> Warning: 11 missing rows were removed (1 values from `AGE` and 10 values from `sex`).

theme_set(old)