These functions produce MSF OCA dictionaries based on DHIS2 (for outbreaks) and Kobo (for surveys) data sets. Each dictionary defines the data element names, codes, short names, types, and key/value pairs for translating the codes into human-readable format.

Usage

msf_dict(
  disease,
  name = "MSF-outbreak-dict.xlsx",
  tibble = TRUE,
  compact = TRUE,
  long = TRUE
)

msf_dict_survey(
  disease,
  name = "MSF-survey-dict.xlsx",
  tibble = TRUE,
  compact = TRUE,
  long = TRUE,
  template = TRUE
)

Arguments

disease

Specify which disease you would like to use.

  • msf_dict() supports "AJS", "Cholera", "Measles", "Meningitis"

  • msf_dict_survey() supports "Mortality", "Nutrition", "Vaccination_long" and "Vaccination_short" (only used in surveys if template = TRUE)

name

The name of the dictionary stored in the package.

  • msf_dict_survey() also supports Kobo dictionaries not stored within this package. To use one, set name to the path of the .xlsx file and set template = FALSE

tibble

Return data dictionary as a tidyverse tibble (default is TRUE)

compact

If TRUE (default), a nested data frame is returned in which each row represents a single variable, with a nested data frame column called "options" that can be expanded with tidyr::unnest(). This only works if long = TRUE.

long

If TRUE (default), the returned data dictionary is in long format with each option getting one row. If FALSE, then two data frames are returned, one with variables and the other with content options.

template

(for survey dictionaries) If TRUE (default), read in a generic dictionary based on the MSF OCA ERB pre-approved template. If your dictionary differs substantially from the template, you can supply your own by setting template = FALSE and defining a path in name.
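
The compact and long arguments interact, so a short sketch of the three possible shapes may help. It assumes that msf_dict() is available from the epidict package and that tidyr is installed; the object names are illustrative only, and the exact structure returned when long = FALSE is inferred from the argument description above rather than confirmed.

```r
library(epidict) # assumption: msf_dict() is exported by the epidict package

# compact = TRUE (the default): one row per variable, with the option
# codes and labels held in a nested "options" column.
dict_compact <- msf_dict(disease = "Cholera", compact = TRUE)

# Expand the nested column so that each option gets its own row.
dict_expanded <- tidyr::unnest(dict_compact, cols = "options")

# long = FALSE: variables and options are returned as separate data frames.
dict_parts <- msf_dict(disease = "Cholera", long = FALSE)
str(dict_parts, max.level = 1)
```

The expanded form is the one to pass to a recoding function such as matchmaker::match_df(), which expects one row per code/label pair.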

Examples


if (require("dplyr") && require("matchmaker")) {
  withAutoprint({
    # You will often want to use MSF dictionaries to translate codes to human-
    # readable variables. Here, we generate a data set of 20 cases:
    dat <- gen_data(
      dictionary = "Cholera",
      varnames = "data_element_shortname",
      numcases = 20,
      org = "MSF"
    )
    print(dat)

    # We want the expanded dictionary, so we will select `compact = FALSE`
    dict <- msf_dict(disease = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
    print(dict)

    # Now we can use matchmaker to recode the data with human-readable labels:
    dat_clean <- matchmaker::match_df(dat, dict,
      from = "option_code",
      to = "option_name",
      by = "data_element_shortname",
      order = "option_order_in_set"
    )
    print(dat_clean)
  })
}
#> > dat <- gen_data(dictionary = "Cholera", varnames = "data_element_shortname", 
#> +     numcases = 20, org = "MSF")
#> > print(dat)
#> # A tibble: 20 × 45
#>    case_number date_of_c…¹ patie…² age_y…³ age_m…⁴ age_d…⁵ sex   pregn…⁶ trime…⁷
#>    <chr>       <date>      <chr>     <int>   <int>   <int> <fct> <fct>   <fct>  
#>  1 A1          2018-04-02  Villag…      82      NA      NA F     W       NA     
#>  2 A2          2018-01-17  Villag…      10      NA      NA U     NA      NA     
#>  3 A3          2018-01-16  Villag…      23      NA      NA M     NA      NA     
#>  4 A4          2018-03-02  Villag…      63      NA      NA M     NA      NA     
#>  5 A5          2018-04-27  Villag…      40      NA      NA M     NA      NA     
#>  6 A6          2018-01-31  Villag…      18      NA      NA M     NA      NA     
#>  7 A7          2018-02-25  Villag…      37      NA      NA F     Y       1      
#>  8 A8          2018-02-05  Villag…      47      NA      NA F     NA      NA     
#>  9 A9          2018-03-20  Villag…      15      NA      NA M     NA      NA     
#> 10 A10         2018-01-06  Villag…      35      NA      NA F     NA      NA     
#> 11 A11         2018-04-08  Villag…      62      NA      NA F     W       NA     
#> 12 A12         2018-01-08  Villag…      56      NA      NA U     NA      NA     
#> 13 A13         2018-01-08  Villag…      26      NA      NA F     NA      NA     
#> 14 A14         2018-03-13  Villag…      16      NA      NA M     NA      NA     
#> 15 A15         2018-01-26  Villag…      61      NA      NA M     NA      NA     
#> 16 A16         2018-02-14  Villag…      44      NA      NA M     NA      NA     
#> 17 A17         2018-01-02  Villag…      59      NA      NA U     NA      NA     
#> 18 A18         2018-02-08  Villag…      74      NA      NA U     NA      NA     
#> 19 A19         2018-02-18  Villag…       7      NA      NA F     NA      NA     
#> 20 A20         2018-03-02  Villag…      42      NA      NA F     N       NA     
#> # … with 36 more variables: foetus_alive_at_admission <fct>, exit_status <fct>,
#> #   date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> #   readmission <fct>, msf_involvement <fct>,
#> #   cholera_treatment_facility_type <fct>, residential_status_brief <fct>,
#> #   date_of_last_vaccination <date>, prescribed_zinc_supplement <fct>,
#> #   prescribed_antibiotics <fct>, ors_consumed_litres <int>, …
#> > dict <- msf_dict(disease = "Cholera", long = TRUE, compact = FALSE, tibble = TRUE)
#> > print(dict)
#> # A tibble: 182 × 11
#>    data_elemen…¹ data_…² data_…³ data_…⁴ data_…⁵ data_…⁶ used_…⁷ optio…⁸ optio…⁹
#>    <chr>         <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
#>  1 AafTlSwliVQ   egen_0… case_n… Anonym… TEXT    Case n… NA      NA      NA     
#>  2 OTGOtWBz39J   egen_0… date_o… Date p… DATE    Date o… NA      NA      NA     
#>  3 wnmMr2V3T3u   egen_0… patien… Locati… ORGANI… Patien… NA      NA      NA     
#>  4 sbgqjeVwtb8   egen_0… age_ye… Age of… INTEGE… Age in… NA      NA      NA     
#>  5 eXYhovYyl61   egen_0… age_mo… Age of… INTEGE… Age in… NA      NA      NA     
#>  6 UrYJSk2Wp46   egen_0… age_da… Age of… INTEGE… Age in… NA      NA      NA     
#>  7 D1Ky5K7pFN6   egen_0… sex     Sex of… TEXT    Sex     orgc5Y… M       Male   
#>  8 D1Ky5K7pFN6   egen_0… sex     Sex of… TEXT    Sex     orgc5Y… F       Female 
#>  9 D1Ky5K7pFN6   egen_0… sex     Sex of… TEXT    Sex     orgc5Y… U       Unknow…
#> 10 dTm5R53YYXC   egen_0… pregna… Pregna… TEXT    Pregna… IEjzG2… N       Not cu…
#> # … with 172 more rows, 2 more variables: option_uid <chr>,
#> #   option_order_in_set <dbl>, and abbreviated variable names
#> #   ¹​data_element_uid, ²​data_element_name, ³​data_element_shortname,
#> #   ⁴​data_element_description, ⁵​data_element_valuetype, ⁶​data_element_formname,
#> #   ⁷​used_optionset_uid, ⁸​option_code, ⁹​option_name
#> > dat_clean <- matchmaker::match_df(dat, dict, from = "option_code", to = "option_name", 
#> +     by = "data_element_shortname", order = "option_order_in_set")
#> > print(dat_clean)
#> # A tibble: 20 × 45
#>    case_number date_of_c…¹ patie…² age_y…³ age_m…⁴ age_d…⁵ sex   pregn…⁶ trime…⁷
#>    <chr>       <date>      <chr>     <int>   <int>   <int> <fct> <fct>   <fct>  
#>  1 A1          2018-04-02  Villag…      82      NA      NA Fema… Was pr… NA     
#>  2 A2          2018-01-17  Villag…      10      NA      NA Unkn… Not ap… NA     
#>  3 A3          2018-01-16  Villag…      23      NA      NA Male  Not ap… NA     
#>  4 A4          2018-03-02  Villag…      63      NA      NA Male  Not ap… NA     
#>  5 A5          2018-04-27  Villag…      40      NA      NA Male  Not ap… NA     
#>  6 A6          2018-01-31  Villag…      18      NA      NA Male  Not ap… NA     
#>  7 A7          2018-02-25  Villag…      37      NA      NA Fema… Yes, c… 1st tr…
#>  8 A8          2018-02-05  Villag…      47      NA      NA Fema… Not ap… NA     
#>  9 A9          2018-03-20  Villag…      15      NA      NA Male  Not ap… NA     
#> 10 A10         2018-01-06  Villag…      35      NA      NA Fema… Not ap… NA     
#> 11 A11         2018-04-08  Villag…      62      NA      NA Fema… Was pr… NA     
#> 12 A12         2018-01-08  Villag…      56      NA      NA Unkn… Not ap… NA     
#> 13 A13         2018-01-08  Villag…      26      NA      NA Fema… Not ap… NA     
#> 14 A14         2018-03-13  Villag…      16      NA      NA Male  Not ap… NA     
#> 15 A15         2018-01-26  Villag…      61      NA      NA Male  Not ap… NA     
#> 16 A16         2018-02-14  Villag…      44      NA      NA Male  Not ap… NA     
#> 17 A17         2018-01-02  Villag…      59      NA      NA Unkn… Not ap… NA     
#> 18 A18         2018-02-08  Villag…      74      NA      NA Unkn… Not ap… NA     
#> 19 A19         2018-02-18  Villag…       7      NA      NA Fema… Not ap… NA     
#> 20 A20         2018-03-02  Villag…      42      NA      NA Fema… Not cu… NA     
#> # … with 36 more variables: foetus_alive_at_admission <fct>, exit_status <fct>,
#> #   date_of_exit <date>, time_to_death <fct>, pregnancy_outcome_at_exit <fct>,
#> #   previously_vaccinated <fct>, previous_vaccine_doses_received <fct>,
#> #   readmission <fct>, msf_involvement <fct>,
#> #   cholera_treatment_facility_type <fct>, residential_status_brief <fct>,
#> #   date_of_last_vaccination <date>, prescribed_zinc_supplement <fct>,
#> #   prescribed_antibiotics <fct>, ors_consumed_litres <int>, …