R/add_weights_cluster.R
add_weights_cluster.Rd
Adds survey weights to data from a cluster survey design, where the sample population was drawn from a larger source population.
add_weights_cluster(
  x,
  cl,
  eligible,
  interviewed,
  cluster_x = NULL,
  cluster_cl = NULL,
  household_x = NULL,
  household_cl = NULL,
  ignore_cluster = TRUE,
  ignore_household = TRUE,
  surv_weight = "surv_weight",
  surv_weight_ID = "surv_weight_ID"
)
x
  A data frame of survey data.
cl
  A data frame containing a list of clusters and the number of households in each.
eligible
  The column in x which specifies the number of people eligible for being interviewed in that household (e.g. the total number of children).
interviewed
  The column in x which specifies the number of people actually interviewed in that household.
cluster_x
  The column in x that indicates which cluster each row belongs to. Ignored if ignore_cluster is TRUE.
cluster_cl
  The column in cl that lists all possible clusters. Ignored if ignore_cluster is TRUE.
household_x
  The column in x that indicates a unique household identifier. Ignored if ignore_household is TRUE.
household_cl
  The column in cl that lists the number of households per cluster. Ignored if ignore_household is TRUE.
ignore_cluster
  If TRUE (default), set the weight for clusters to be 1. This assumes that your sample was taken in a way which is a close approximation of a simple random sample. Ignores inputs from cluster_cl as well as cluster_x.
ignore_household
  If TRUE (default), set the weight for households to be 1. This assumes that your sample of households was taken in a way which is a close approximation of a simple random sample. Ignores inputs from household_cl and household_x.
surv_weight
  The name of the new column to store the weights. Defaults to "surv_weight".
surv_weight_ID
  The name of the new ID column to be created. Defaults to "surv_weight_ID".
The weight is the product of the inverse probabilities of a cluster being selected, a household being selected within a cluster, and an individual being selected within a household, as follows:
((clusters available) / (clusters surveyed)) *
((households in each cluster) / (households surveyed in each cluster)) *
((individuals eligible in each household) / (individuals interviewed))
In the case where both ignore_cluster and ignore_household are TRUE, the cluster and household terms are set to 1, so this will simply be:
1 * 1 * ((individuals eligible in each household) / (individuals interviewed))
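The three-level product can be reproduced by hand, which may help when checking the weights the function returns. The following is a minimal base R sketch, not the package's implementation. It assumes that cl has exactly one row per cluster in the source population, and it reuses the column names and values from the example data on this page. The object names cluster_wt, household_wt, individual_wt, manual_weight and individual_only are hypothetical.

```r
# Survey data: one row per interviewed individual (values from this page's example).
x <- data.frame(
  cluster      = c(rep("Village A", 5), rep("Village B", 3)),
  household_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  eligible_n   = c(6, 6, 6, 6, 6, 3, 3, 3),
  surveyed_n   = c(4, 4, 4, 4, 4, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Cluster listing: assumed to hold one row per cluster in the source population.
cl <- data.frame(
  cluster  = c("Village A", "Village B", "Village C", "Village D"),
  n_houses = c(23, 42, 56, 38),
  stringsAsFactors = FALSE
)

# Cluster level: clusters available / clusters surveyed.
cluster_wt <- nrow(cl) / length(unique(x$cluster))

# Household level: households in each cluster / households surveyed in that cluster.
hh_surveyed  <- tapply(x$household_id, x$cluster, function(h) length(unique(h)))
hh_total     <- cl$n_houses[match(x$cluster, cl$cluster)]
household_wt <- hh_total / as.numeric(hh_surveyed[x$cluster])

# Individual level: individuals eligible / individuals interviewed.
individual_wt <- x$eligible_n / x$surveyed_n

# Full three-level weight, and the reduced weight that applies when both
# ignore_cluster and ignore_household are TRUE (those two terms become 1).
manual_weight   <- cluster_wt * household_wt * individual_wt
individual_only <- 1 * 1 * individual_wt

print(manual_weight)
print(individual_only)
```

With these inputs, manual_weight is 34.5 for the Village A rows and 84 for the Village B rows, and individual_only is 1.5 and 1 respectively.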
# define a fake dataset of survey data
# including household and individual information
x <- data.frame(stringsAsFactors = FALSE,
  cluster = c("Village A", "Village A", "Village A", "Village A",
              "Village A", "Village B", "Village B", "Village B"),
  household_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  eligible_n = c(6, 6, 6, 6, 6, 3, 3, 3),
  surveyed_n = c(4, 4, 4, 4, 4, 3, 3, 3),
  individual_id = c(1, 2, 3, 4, 4, 1, 2, 3),
  age_grp = c("0-10", "20-30", "30-40", "50-60", "50-60", "20-30",
              "50-60", "30-40"),
  sex = c("Male", "Female", "Male", "Female", "Female", "Male",
          "Female", "Female"),
  outcome = c("Y", "Y", "N", "N", "N", "N", "N", "Y")
)
# define a fake dataset of cluster listings
# including cluster names and number of households
cl <- tibble::tribble(
  ~cluster,    ~n_houses,
  "Village A", 23,
  "Village B", 42,
  "Village C", 56,
  "Village D", 38
)
# add weights to a cluster sample
# include weights for cluster, household and individual levels
add_weights_cluster(x, cl = cl,
                    eligible = eligible_n,
                    interviewed = surveyed_n,
                    cluster_cl = cluster, household_cl = n_houses,
                    cluster_x = cluster, household_x = household_id,
                    ignore_cluster = FALSE, ignore_household = FALSE)
#>     cluster household_id eligible_n surveyed_n individual_id age_grp    sex
#> 1 Village A            1          6          4             1    0-10   Male
#> 2 Village A            1          6          4             2   20-30 Female
#> 3 Village A            1          6          4             3   30-40   Male
#> 4 Village A            1          6          4             4   50-60 Female
#> 5 Village A            2          6          4             4   50-60 Female
#> 6 Village B            2          3          3             1   20-30   Male
#> 7 Village B            2          3          3             2   50-60 Female
#> 8 Village B            2          3          3             3   30-40 Female
#>   outcome surv_weight surv_weight_ID
#> 1       Y        34.5    Village A_1
#> 2       Y        34.5    Village A_1
#> 3       N        34.5    Village A_1
#> 4       N        34.5    Village A_1
#> 5       N        34.5    Village A_2
#> 6       N        84.0    Village B_2
#> 7       N        84.0    Village B_2
#> 8       Y        84.0    Village B_2
# add weights to a cluster sample
# ignore weights for cluster and household level (set equal to 1)
# only include weights at individual level
add_weights_cluster(x, cl = cl,
                    eligible = eligible_n,
                    interviewed = surveyed_n,
                    cluster_cl = cluster, household_cl = n_houses,
                    cluster_x = cluster, household_x = household_id,
                    ignore_cluster = TRUE, ignore_household = TRUE)
#>     cluster household_id eligible_n surveyed_n individual_id age_grp    sex
#> 1 Village A            1          6          4             1    0-10   Male
#> 2 Village A            1          6          4             2   20-30 Female
#> 3 Village A            1          6          4             3   30-40   Male
#> 4 Village A            1          6          4             4   50-60 Female
#> 5 Village A            2          6          4             4   50-60 Female
#> 6 Village B            2          3          3             1   20-30   Male
#> 7 Village B            2          3          3             2   50-60 Female
#> 8 Village B            2          3          3             3   30-40 Female
#>   outcome surv_weight surv_weight_ID
#> 1       Y         1.5    Village A_1
#> 2       Y         1.5    Village A_1
#> 3       N         1.5    Village A_1
#> 4       N         1.5    Village A_1
#> 5       N         1.5    Village A_2
#> 6       N         1.0    Village B_2
#> 7       N         1.0    Village B_2
#> 8       Y         1.0    Village B_2
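Once the weight column exists, it can feed a weighted point estimate. The sketch below is one possible follow-up using only base R. It copies the outcome and surv_weight values from the first example's output (cluster, household and individual levels included), and the object names res, unweighted_prop and weighted_prop are hypothetical.

```r
# Outcome and weight columns copied from the first example's output
# (ignore_cluster = FALSE, ignore_household = FALSE).
res <- data.frame(
  outcome     = c("Y", "Y", "N", "N", "N", "N", "N", "Y"),
  surv_weight = c(34.5, 34.5, 34.5, 34.5, 34.5, 84, 84, 84),
  stringsAsFactors = FALSE
)

# Indicator for a positive outcome, as 0/1.
is_yes <- as.numeric(res$outcome == "Y")

# Unweighted proportion: every interviewed individual counts equally.
unweighted_prop <- mean(is_yes)

# Weighted proportion: each row counts in proportion to its survey weight.
weighted_prop <- weighted.mean(is_yes, w = res$surv_weight)

print(c(unweighted = unweighted_prop, weighted = weighted_prop))
```

This gives a point estimate only; it does not account for the clustering when estimating uncertainty.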