These functions produce MSF OCA dictionaries based on DHIS2 (for outbreaks) and Kobo (for surveys) data sets. Each dictionary defines the data element names, codes, short names, and types, along with the key/value pairs for translating the codes into a human-readable format.
Usage
msf_dict(
  disease,
  name = "MSF-outbreak-dict.xlsx",
  tibble = TRUE,
  compact = TRUE,
  long = TRUE
)
msf_dict_survey(
  disease,
  name = "MSF-survey-dict.xlsx",
  tibble = TRUE,
  compact = TRUE,
  long = TRUE,
  template = TRUE
)
Arguments
- disease
Specify which disease you would like to use. msf_dict() supports "AJS", "Cholera", "Measles", and "Meningitis". msf_dict_survey() supports "Mortality", "Nutrition", "Vaccination_long", and "Vaccination_short" (only used in surveys if template = TRUE).
- name
The name of the dictionary stored in the package. msf_dict_survey() also supports Kobo dictionaries not stored within this package; to use one, specify name as the path to the .xlsx file and set template = FALSE.
- tibble
Return data dictionary as a tidyverse tibble (default is TRUE)
- compact
If TRUE (default), a nested data frame is returned where each row represents a single variable, with a nested data frame column called "options" that can be expanded with tidyr::unnest(). This only works if long = TRUE.
- long
If TRUE (default), the returned data dictionary is in long format, with each option getting one row. If FALSE, two data frames are returned, one with variables and the other with content options.
- template
Only used for msf_dict_survey(). If TRUE (default), the returned data dictionary is the generic MSF OCA ERB pre-approved dictionary. If your dictionary differs substantially from this template, set template = FALSE and define the path to your own Kobo dictionary in name.
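The relationship between the compact and long layouts described above can be illustrated with a toy dictionary. This is a minimal sketch using tidyr on made-up rows, not the package's own code: the option codes and labels are assumptions for illustration, and only the column names are borrowed from the Cholera dictionary.

```r
library(tibble)
library(tidyr)

# Toy long-format dictionary: one row per variable/option pair.
# The codes and labels below are invented for illustration.
dict_long <- tibble(
  data_element_shortname = c("sex", "sex", "sex", "pregnant", "pregnant"),
  option_code            = c("M", "F", "U", "Y", "N"),
  option_name            = c("Male", "Female", "Unknown", "Yes", "No"),
  option_order_in_set    = c(1, 2, 3, 1, 2)
)

# Compact layout: one row per variable, with the option columns folded
# into a list-column of data frames called "options".
dict_compact <- nest(
  dict_long,
  options = c(option_code, option_name, option_order_in_set)
)

# Expanding the list-column gives back one row per option.
dict_expanded <- unnest(dict_compact, cols = options)
```

Here dict_compact has one row per variable, while dict_expanded has one row per option again, which is the layout that the matching step in the Examples section works with.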
See also
matchmaker::match_df()
gen_data()
msf_dict_survey()
Examples
if (require("dplyr") && require("matchmaker")) {
  withAutoprint({
    # You will often want to use MSF dictionaries to translate codes to human-
    # readable variables. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)
    # We want the expanded dictionary, so we will select `compact = FALSE`
    dict <- msf_dict(disease = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)
    # Now we can use matchmaker to translate the codes in the data:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)
  })
}
#> > dat <- gen_data(dictionary = "Cholera", varnames = "data_element_shortname",
#> + numcases = 20, org = "MSF")
#> > print(dat)
#> # A tibble: 20 × 45
#> case_number date_of_consultation_admiss…¹ patient_origin age_years age_months
#> <chr> <date> <chr> <int> <int>
#> 1 A1 2018-04-28 Village B 59 NA
#> 2 A2 2018-04-26 Village A 58 NA
#> 3 A3 2018-01-24 Village B 16 NA
#> 4 A4 2018-01-07 Village A 46 NA
#> 5 A5 2018-01-29 Village A 9 NA
#> 6 A6 2018-01-20 Village D 10 NA
#> 7 A7 2018-04-18 Village C 9 NA
#> 8 A8 2018-01-18 Village C 28 NA
#> 9 A9 2018-01-10 Village C 51 NA
#> 10 A10 2018-01-01 Village B 16 NA
#> 11 A11 2018-04-09 Village C 21 NA
#> 12 A12 2018-04-26 Village A 9 NA
#> 13 A13 2018-02-28 Village C 40 NA
#> 14 A14 2018-01-03 Village B 34 NA
#> 15 A15 2018-04-30 Village A 62 NA
#> 16 A16 2018-03-08 Village C 43 NA
#> 17 A17 2018-02-28 Village B 38 NA
#> 18 A18 2018-03-13 Village A 70 NA
#> 19 A19 2018-04-22 Village D 17 NA
#> 20 A20 2018-03-14 Village B 37 NA
#> # ℹ abbreviated name: ¹date_of_consultation_admission
#> # ℹ 40 more variables: age_days <int>, sex <fct>, pregnant <fct>,
#> # trimester <fct>, foetus_alive_at_admission <fct>, exit_status <fct>,
#> # date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> # previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> # readmission <fct>, msf_involvement <fct>,
#> # cholera_treatment_facility_type <fct>, residential_status_brief <fct>, …
#> > dict <- msf_dict(disease = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
#> > print(dict)
#> # A tibble: 182 × 11
#> data_element_uid data_element_name data_element_shortname
#> <chr> <chr> <chr>
#> 1 AafTlSwliVQ egen_001_patient_case_number case_number
#> 2 OTGOtWBz39J egen_004_date_of_consultation_admiss… date_of_consultation_…
#> 3 wnmMr2V3T3u egen_006_patient_origin patient_origin
#> 4 sbgqjeVwtb8 egen_008_age_years age_years
#> 5 eXYhovYyl61 egen_009_age_months age_months
#> 6 UrYJSk2Wp46 egen_010_age_days age_days
#> 7 D1Ky5K7pFN6 egen_011_sex sex
#> 8 D1Ky5K7pFN6 egen_011_sex sex
#> 9 D1Ky5K7pFN6 egen_011_sex sex
#> 10 dTm5R53YYXC egen_012_pregnancy_status pregnant
#> # ℹ 172 more rows
#> # ℹ 8 more variables: data_element_description <chr>,
#> # data_element_valuetype <chr>, data_element_formname <chr>,
#> # used_optionset_uid <chr>, option_code <chr>, option_name <chr>,
#> # option_uid <chr>, option_order_in_set <dbl>
#> > dat_clean <- matchmaker::match_df(dat, dict, from = "option_code", to = "option_name",
#> + by = "data_element_shortname", order = "option_order_in_set")
#> > print(dat_clean)
#> # A tibble: 20 × 45
#> case_number date_of_consultation_admiss…¹ patient_origin age_years age_months
#> <chr> <date> <chr> <int> <int>
#> 1 A1 2018-04-28 Village B 59 NA
#> 2 A2 2018-04-26 Village A 58 NA
#> 3 A3 2018-01-24 Village B 16 NA
#> 4 A4 2018-01-07 Village A 46 NA
#> 5 A5 2018-01-29 Village A 9 NA
#> 6 A6 2018-01-20 Village D 10 NA
#> 7 A7 2018-04-18 Village C 9 NA
#> 8 A8 2018-01-18 Village C 28 NA
#> 9 A9 2018-01-10 Village C 51 NA
#> 10 A10 2018-01-01 Village B 16 NA
#> 11 A11 2018-04-09 Village C 21 NA
#> 12 A12 2018-04-26 Village A 9 NA
#> 13 A13 2018-02-28 Village C 40 NA
#> 14 A14 2018-01-03 Village B 34 NA
#> 15 A15 2018-04-30 Village A 62 NA
#> 16 A16 2018-03-08 Village C 43 NA
#> 17 A17 2018-02-28 Village B 38 NA
#> 18 A18 2018-03-13 Village A 70 NA
#> 19 A19 2018-04-22 Village D 17 NA
#> 20 A20 2018-03-14 Village B 37 NA
#> # ℹ abbreviated name: ¹date_of_consultation_admission
#> # ℹ 40 more variables: age_days <int>, sex <fct>, pregnant <fct>,
#> # trimester <fct>, foetus_alive_at_admission <fct>, exit_status <fct>,
#> # date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> # previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> # readmission <fct>, msf_involvement <fct>,
#> # cholera_treatment_facility_type <fct>, residential_status_brief <fct>, …
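To make the translation step more concrete, here is a minimal base R sketch of the kind of code-to-label lookup that a long-format dictionary enables. It is not how matchmaker::match_df() is implemented. The helper translate_codes() and all data values are hypothetical, and unlike a production tool this sketch turns any code missing from the dictionary into NA.

```r
# Toy data with coded values (made up for illustration)
dat <- data.frame(
  case_number = c("A1", "A2", "A3"),
  sex         = c("F", "M", "U"),
  stringsAsFactors = FALSE
)

# Toy long-format dictionary using the same column names as the output above
dict <- data.frame(
  data_element_shortname = c("sex", "sex", "sex"),
  option_code            = c("M", "F", "U"),
  option_name            = c("Male", "Female", "Unknown"),
  option_order_in_set    = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each variable that has options in the dictionary,
# replace codes with labels and return a factor ordered by option_order_in_set.
translate_codes <- function(dat, dict) {
  for (v in intersect(names(dat), unique(dict$data_element_shortname))) {
    opts <- dict[dict$data_element_shortname == v & !is.na(dict$option_code), ]
    if (nrow(opts) == 0) next  # free-text or numeric variable: leave untouched
    opts <- opts[order(opts$option_order_in_set), ]
    labels <- opts$option_name[match(dat[[v]], opts$option_code)]
    dat[[v]] <- factor(labels, levels = opts$option_name)
  }
  dat
}

dat_clean <- translate_codes(dat, dict)
```

In this sketch the option order from the dictionary becomes the factor level order, which mirrors the role of the order argument in the match_df() call above.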