MSF data dictionaries and dummy datasets

These functions produce MSF OCA data dictionaries based on DHIS2 (for outbreaks) and Kobo (for surveys) data sets. The dictionaries define the data element names, codes, short names, and types, along with the key/value pairs for translating codes into human-readable format.

Usage

msf_dict(
  disease,
  name = "MSF-outbreak-dict.xlsx",
  tibble = TRUE,
  compact = TRUE,
  long = TRUE
)

msf_dict_survey(
  disease,
  name = "MSF-survey-dict.xlsx",
  tibble = TRUE,
  compact = TRUE,
  long = TRUE,
  template = TRUE
)

Arguments

disease

Specify which disease you would like to use.

msf_dict() supports "AJS", "Cholera", "Measles", "Meningitis"
msf_dict_survey() supports "Mortality", "Nutrition", "Vaccination_long" and "Vaccination_short" (only used in surveys if template = TRUE)

name

The name of the dictionary file stored in the package.

msf_dict_survey() also supports Kobo dictionaries that are not stored within this package. To use one, specify name as the path to an .xlsx file and set template = FALSE.

tibble

Return data dictionary as a tidyverse tibble (default is TRUE)

compact

If TRUE (default), a nested data frame is returned in which each row represents a single variable, with a nested data frame column called "options" that can be expanded with tidyr::unnest(). This only works if long = TRUE.

long

If TRUE (default), the returned data dictionary is in long format with each option getting one row. If FALSE, then two data frames are returned, one with variables and the other with content options.

template

(For survey dictionaries only.) If TRUE (default), read in a generic dictionary based on the MSF OCA ERB pre-approved template. If your dictionary differs substantially from the template, set template = FALSE and supply the path to your own Kobo dictionary in name.
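
The two modes of msf_dict_survey() described above can be sketched as follows. This is a minimal illustration, not a tested recipe: it assumes the package providing msf_dict_survey() is attached, the file path is a hypothetical placeholder, and omitting disease when template = FALSE is an assumption based on the note that disease is only used when template = TRUE.

```r
# Only run if msf_dict_survey() is available in the current session
if (exists("msf_dict_survey")) {
  # Mode 1: generic pre-approved template dictionary for a mortality survey
  dict_template <- msf_dict_survey(disease = "Mortality", template = TRUE)
  print(dict_template)

  # Mode 2: your own Kobo dictionary.
  # NOTE: hypothetical path; replace it with the location of a real .xlsx file.
  my_kobo_path <- "path/to/my-kobo-dict.xlsx"
  if (file.exists(my_kobo_path)) {
    # Assumption: `disease` can be left out because template = FALSE
    dict_custom <- msf_dict_survey(name = my_kobo_path, template = FALSE)
    print(dict_custom)
  }
}
```

The file.exists() guard keeps the sketch from failing when the placeholder path has not been replaced.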

Examples


if (require("dplyr") && require("matchmaker")) {
  withAutoprint({
    # You will often want to use MSF dictionaries to translate codes to human-
    # readable variables. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)

    # We want the expanded dictionary, so we will select `compact = FALSE`
    dict <- msf_dict(disease = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)

    # Now we can use matchmaker to recode the data using the dictionary:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)
  })
}
#> > dat <- gen_data(dictionary = "Cholera", varnames = "data_element_shortname", 
#> +     numcases = 20, org = "MSF")
#> > print(dat)
#> # A tibble: 20 × 45
#>    case_number date_of_consultation_admiss…¹ patient_origin age_years age_months
#>    <chr>       <date>                        <chr>              <int>      <int>
#>  1 A1          2018-04-28                    Village B             59         NA
#>  2 A2          2018-04-26                    Village A             58         NA
#>  3 A3          2018-01-24                    Village B             16         NA
#>  4 A4          2018-01-07                    Village A             46         NA
#>  5 A5          2018-01-29                    Village A              9         NA
#>  6 A6          2018-01-20                    Village D             10         NA
#>  7 A7          2018-04-18                    Village C              9         NA
#>  8 A8          2018-01-18                    Village C             28         NA
#>  9 A9          2018-01-10                    Village C             51         NA
#> 10 A10         2018-01-01                    Village B             16         NA
#> 11 A11         2018-04-09                    Village C             21         NA
#> 12 A12         2018-04-26                    Village A              9         NA
#> 13 A13         2018-02-28                    Village C             40         NA
#> 14 A14         2018-01-03                    Village B             34         NA
#> 15 A15         2018-04-30                    Village A             62         NA
#> 16 A16         2018-03-08                    Village C             43         NA
#> 17 A17         2018-02-28                    Village B             38         NA
#> 18 A18         2018-03-13                    Village A             70         NA
#> 19 A19         2018-04-22                    Village D             17         NA
#> 20 A20         2018-03-14                    Village B             37         NA
#> # ℹ abbreviated name: ¹date_of_consultation_admission
#> # ℹ 40 more variables: age_days <int>, sex <fct>, pregnant <fct>,
#> #   trimester <fct>, foetus_alive_at_admission <fct>, exit_status <fct>,
#> #   date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> #   readmission <fct>, msf_involvement <fct>,
#> #   cholera_treatment_facility_type <fct>, residential_status_brief <fct>, …
#> > dict <- msf_dict(disease = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
#> > print(dict)
#> # A tibble: 182 × 11
#>    data_element_uid data_element_name                     data_element_shortname
#>    <chr>            <chr>                                 <chr>                 
#>  1 AafTlSwliVQ      egen_001_patient_case_number          case_number           
#>  2 OTGOtWBz39J      egen_004_date_of_consultation_admiss… date_of_consultation_…
#>  3 wnmMr2V3T3u      egen_006_patient_origin               patient_origin        
#>  4 sbgqjeVwtb8      egen_008_age_years                    age_years             
#>  5 eXYhovYyl61      egen_009_age_months                   age_months            
#>  6 UrYJSk2Wp46      egen_010_age_days                     age_days              
#>  7 D1Ky5K7pFN6      egen_011_sex                          sex                   
#>  8 D1Ky5K7pFN6      egen_011_sex                          sex                   
#>  9 D1Ky5K7pFN6      egen_011_sex                          sex                   
#> 10 dTm5R53YYXC      egen_012_pregnancy_status             pregnant              
#> # ℹ 172 more rows
#> # ℹ 8 more variables: data_element_description <chr>,
#> #   data_element_valuetype <chr>, data_element_formname <chr>,
#> #   used_optionset_uid <chr>, option_code <chr>, option_name <chr>,
#> #   option_uid <chr>, option_order_in_set <dbl>
#> > dat_clean <- matchmaker::match_df(dat, dict, from = "option_code", to = "option_name", 
#> +     by = "data_element_shortname", order = "option_order_in_set")
#> > print(dat_clean)
#> # A tibble: 20 × 45
#>    case_number date_of_consultation_admiss…¹ patient_origin age_years age_months
#>    <chr>       <date>                        <chr>              <int>      <int>
#>  1 A1          2018-04-28                    Village B             59         NA
#>  2 A2          2018-04-26                    Village A             58         NA
#>  3 A3          2018-01-24                    Village B             16         NA
#>  4 A4          2018-01-07                    Village A             46         NA
#>  5 A5          2018-01-29                    Village A              9         NA
#>  6 A6          2018-01-20                    Village D             10         NA
#>  7 A7          2018-04-18                    Village C              9         NA
#>  8 A8          2018-01-18                    Village C             28         NA
#>  9 A9          2018-01-10                    Village C             51         NA
#> 10 A10         2018-01-01                    Village B             16         NA
#> 11 A11         2018-04-09                    Village C             21         NA
#> 12 A12         2018-04-26                    Village A              9         NA
#> 13 A13         2018-02-28                    Village C             40         NA
#> 14 A14         2018-01-03                    Village B             34         NA
#> 15 A15         2018-04-30                    Village A             62         NA
#> 16 A16         2018-03-08                    Village C             43         NA
#> 17 A17         2018-02-28                    Village B             38         NA
#> 18 A18         2018-03-13                    Village A             70         NA
#> 19 A19         2018-04-22                    Village D             17         NA
#> 20 A20         2018-03-14                    Village B             37         NA
#> # ℹ abbreviated name: ¹date_of_consultation_admission
#> # ℹ 40 more variables: age_days <int>, sex <fct>, pregnant <fct>,
#> #   trimester <fct>, foetus_alive_at_admission <fct>, exit_status <fct>,
#> #   date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> #   readmission <fct>, msf_involvement <fct>,
#> #   cholera_treatment_facility_type <fct>, residential_status_brief <fct>, …
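
The compact and long arguments can be explored in the same way. The following is a minimal sketch, assuming the package providing msf_dict() is attached and tidyr is installed. The object names are illustrative, and no output is shown because the structure of the long = FALSE result is not documented on this page, so it is inspected with str() rather than assumed.

```r
# Only run if tidyr is installed and msf_dict() is available in the session
if (require("tidyr") && exists("msf_dict")) {
  # Default behaviour: one row per variable, options nested in `options`
  dict_compact <- msf_dict(disease = "Cholera", long = TRUE, compact = TRUE)

  # Expand the nested `options` column so that each option gets its own row.
  # keep_empty = TRUE keeps variables whose `options` entry is empty or NULL.
  dict_expanded <- tidyr::unnest(dict_compact, cols = "options", keep_empty = TRUE)
  print(dict_expanded)

  # long = FALSE returns two data frames (variables and options).
  # Inspect the top-level structure instead of assuming element names.
  dict_split <- msf_dict(disease = "Cholera", long = FALSE)
  str(dict_split, max.level = 1)
}
```

Whether dict_expanded matches the compact = FALSE result row for row depends on how variables without options are stored, so treat the comparison as something to check rather than a guarantee.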
