Title: Handling Incomplete Responses in Survey Data Analysis
Version: 2.2.0
Description: Offers robust tools to identify and manage incomplete responses in survey datasets, thereby enhancing the quality and reliability of research findings.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.2.3
URL: https://github.com/hendr1km/dropout, https://hendr1km.github.io/dropout/
BugReports: https://github.com/hendr1km/dropout/issues
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
Config/testthat/edition: 3
Depends: R (≥ 3.5)
LazyData: true
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2024-10-15 20:19:07 UTC; hendrikm
Author: Hendrik Mann [aut, cre]
Maintainer: Hendrik Mann <hendrik.mann@uni-wuppertal.de>
Repository: CRAN
Date/Publication: 2024-10-16 00:30:02 UTC
dropout: Handling Incomplete Responses in Survey Data Analysis
Description
Offers robust tools to identify and manage incomplete responses in survey datasets, thereby enhancing the quality and reliability of research findings.
Author(s)
Maintainer: Hendrik Mann <hendrik.mann@uni-wuppertal.de>
See Also
Useful links:
Report bugs at https://github.com/hendr1km/dropout/issues
Prepare data for C API: list of row-vectors with NA indicators
Description
This function prepares the data to be used by a C API by transforming each row of the dataframe into a vector of 0s and 1s, where 0 indicates a non-NA value and 1 indicates an NA value.
Usage
c_prepare(data)
Arguments
data: A dataframe to be prepared for the C API.
Value
A list of row-vectors, where each vector contains 0s (non-NA) and 1s (NA).
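The transformation described above can be sketched in plain R. This is only an illustration under stated assumptions: na_indicator_rows is a hypothetical name, and the package's own c_prepare may build the list differently.

```r
# Hypothetical helper, not the package's internal c_prepare().
na_indicator_rows <- function(data) {
  # is.na() on a dataframe returns a logical matrix of the same shape.
  na_matrix <- is.na(data)
  # One integer vector per row: 1 marks an NA, 0 marks an observed value.
  lapply(seq_len(nrow(na_matrix)), function(i) {
    as.integer(na_matrix[i, ])
  })
}

# Toy data (made up for illustration): the second respondent answered nothing.
toy <- data.frame(q1 = c(1, NA), q2 = c("a", NA), q3 = c(NA, NA))
na_indicator_rows(toy)
```

Each list element has one entry per column, so the row for the first respondent is 0, 0, 1 and the row for the second is 1, 1, 1.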
Check that the input is a valid dataframe
Description
This function checks if the input is a valid dataframe with at least two rows and two columns. It throws an error if the input does not meet these conditions.
Usage
check_input(data)
Arguments
data: The input to be checked.
Value
NULL. The function stops with an error message if the input is not a valid dataframe.
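A minimal sketch of such a check, assuming nothing beyond the conditions stated above. The function name validate_survey_input and the error messages are hypothetical and are not taken from the package.

```r
# Hypothetical validator mirroring the documented conditions.
validate_survey_input <- function(data) {
  if (!is.data.frame(data)) {
    stop("`data` must be a dataframe.", call. = FALSE)
  }
  if (nrow(data) < 2 || ncol(data) < 2) {
    stop("`data` must have at least two rows and two columns.", call. = FALSE)
  }
  # Nothing useful to return: the function is called for its side effect.
  invisible(NULL)
}

validate_survey_input(data.frame(a = 1:2, b = 3:4))  # passes silently
```

Returning NULL invisibly keeps the call quiet at the console while still letting callers chain it before the real work.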
Detect Instances of Dropout in Data
Description
The drop_detect function detects participants who drop out of the survey by recognizing NA sequences up to the last question of the survey. Additionally, the function provides the column name and index where the dropout occurs.
Usage
drop_detect(data)
Arguments
data: A dataframe in which to detect instances of dropout.
Value
A dataframe containing the following columns:
- drop: A logical value indicating whether dropout has occurred (TRUE for dropout, FALSE otherwise).
- drop_index: The index of the column where dropout occurred (NA if no dropout).
- column: The name of the column where the dropout occurred (<NA> if no dropout).
Examples
## Not run:
# Example usage with the 'flying' dataframe
detect_result <- drop_detect(flying)
print(detect_result)
## End(Not run)
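The idea of an NA sequence that runs up to the last question can be sketched in base R. This is an illustration, not the package's C implementation: the helper names are hypothetical, and it assumes that any trailing run of NAs (even a single one) counts as dropout, which the package may treat differently.

```r
# Hypothetical helper: 1-based position where a trailing NA run starts,
# or NA when the last question was answered.
first_trailing_na <- function(row_is_na) {
  n <- length(row_is_na)
  if (n == 0 || !row_is_na[n]) return(NA_integer_)
  observed <- which(!row_is_na)
  # If nothing was answered, the run starts at the first column.
  if (length(observed) == 0) 1L else max(observed) + 1L
}

detect_dropout_sketch <- function(data) {
  na_matrix <- is.na(data)
  drop_index <- vapply(
    seq_len(nrow(na_matrix)),
    function(i) first_trailing_na(na_matrix[i, ]),
    integer(1)
  )
  data.frame(
    drop = !is.na(drop_index),
    drop_index = drop_index,
    column = names(data)[drop_index],  # indexing with NA yields NA
    stringsAsFactors = FALSE
  )
}

# Toy data (made up): respondent 2 stops after q1, respondent 3 skips q1
# and then stops after q2.
toy <- data.frame(q1 = c(1, 2, NA), q2 = c(1, NA, 3), q3 = c(1, NA, NA))
detect_dropout_sketch(toy)
```

In the toy data the isolated NA in q1 for the third respondent is not a dropout, because an answer follows it; only the NA in q3 reaches the end of the survey.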
Summarize Missing Data Metrics for Each Column
Description
The drop_summary function generates a summary of missing data (NA values) for each column in a dataframe.
It computes metrics such as the number of dropout participants, the number of section NAs, the mode length of those missing-value sections, and the proportion of complete cases for each column.
Usage
drop_summary(data)
Arguments
data: A dataframe for which to analyze missing data.
Details
The function calls a C API to compute some metrics, which are then processed and returned as a summary dataframe.
Value
A dataframe containing the following columns:
- column: The name of each column in the input dataframe.
- drop: The number of dropped rows (missing values) for that column.
- sec_na: The number of sections of consecutive NAs for that column.
- sec_length: The mode (most frequent length) of sections of consecutive NAs for that column.
- single_na: The number of single NA values (isolated missing values) for that column.
- na: The total number of missing (NA) values for that column.
- complete: The proportion of complete rows for that column, where a value of 1 means no missing data, and values closer to 0 mean more missing data.
Examples
## Not run:
# Example usage with the 'flying' dataframe
summary_result <- drop_summary(flying)
print(summary_result)
## End(Not run)
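To make the distinction between dropout, section NAs and single NAs concrete, the sketch below classifies the NA runs of one respondent with base R's rle(). It is a rough illustration only: classify_na_runs is a hypothetical name, the package computes its metrics through a C API, and the rules assumed here (a trailing run is dropout, an inner run of length 1 is a single NA, a longer inner run is a section) may not match the package exactly.

```r
# Hypothetical classifier for the NA runs in one row of NA indicators.
classify_na_runs <- function(row_is_na) {
  runs <- rle(row_is_na)
  n_runs <- length(runs$lengths)
  is_last <- seq_len(n_runs) == n_runs
  out <- data.frame(
    # Column position where each run starts.
    start = cumsum(runs$lengths) - runs$lengths + 1L,
    length = runs$lengths,
    type = ifelse(
      is_last, "dropout",
      ifelse(runs$lengths == 1L, "single_na", "section_na")
    ),
    stringsAsFactors = FALSE
  )
  # Keep only the runs that consist of NAs.
  out <- out[runs$values, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Made-up indicator row: answered, NA, answered, NA, NA, answered, NA, NA
classify_na_runs(c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE))
```

Counting the start positions of each type across all respondents would give per-column tallies in the spirit of the drop, sec_na and single_na columns.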
Flying Etiquette Survey Data
Description
This is a modified version of the Flying Etiquette Survey data behind the story: 41 percent of flyers say it's rude to recline your seat on an airplane.
Usage
flying
Format
A dataframe with 1040 observations and 28 columns, which are:
- respondent_id
respondent ID
- travel_frequency
how often do you travel by plane?
- seat_recline
do you ever recline your seat when you fly?
- height
how tall are you?
- children_under_18
do you have any children under 18?
- two_armrests
in a row of three seats, who should get to use the two arm rests?
- middle_armrest
in a row of two seats, who should get to use the middle arm rest?
- window_shade
who should have control over the window shade?
- moving_to_unsold_seat
is it rude to move to an unsold seat on a plane?
- talking_to_seatmate
generally speaking, is it rude to say more than a few words to the stranger sitting next to you on a plane?
- getting_up_on_6_hour_flight
on a 6 hour flight from NYC to LA, how many times is it acceptable to get up if you're not in an aisle seat?
- obligation_to_reclined_seat
is it rude to recline your seat on a plane?
- eliminate_reclining_seats
given the opportunity, would you eliminate the possibility of reclining seats on planes entirely?
- switch_for_friends
is it rude to ask someone to switch seats with you in order to be closer to friends?
- switch_for_family
is it rude to ask someone to switch seats with you in order to be closer to family?
- wake_passenger_bathroom
is it rude to wake a passenger up if you are trying to go to the bathroom?
- wake_passenger_walk
is it rude to wake a passenger up if you are trying to walk around?
- baby_on_plane
in general, is it rude to bring a baby on a plane?
- unruly_children
in general, is it rude to knowingly bring unruly children on a plane?
- electronics_violation
have you ever used personal electronics during take off or landing in violation of a flight attendant's direction?
- smoking_violation
have you ever smoked a cigarette in an airplane bathroom when it was against the rules?
- gender
gender
- age
age
- household_income
household income
- education
education
- location_census_region
location (census region)
- survey_type
type of the survey
Source
https://github.com/fivethirtyeight/data/tree/15f210532b2a642e85738ddefa7a2945d47e2585/flying-etiquette-survey
https://fivethirtyeight.com/features/airplane-etiquette-recline-seat/
Examples
data(flying)
Compute column name for dropped observations
Description
This function returns the column names corresponding to the dropped values in the dataset.
Usage
metric_column(c_output, drop_index, data)
Arguments
c_output: A list containing output data to detect dropped columns.
drop_index: A vector indicating the index of the dropped column for each row.
data: The original dataset.
Value
A vector of column names corresponding to dropped values.
Compute completeness of each column
Description
This function calculates the completeness of each column in the dataset by subtracting the NA count from the total number of rows.
Usage
metric_complete(na, data)
Arguments
na: A vector of NA counts for each column.
data: The original dataset.
Value
A vector of completeness values (proportions) for each column.
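As a worked example of the arithmetic, the sketch below assumes that the difference between the row count and the NA count is then divided by the row count, which is what the documented proportions suggest. The name completeness_sketch is hypothetical, and any rounding the package applies is not reproduced.

```r
# Hypothetical re-implementation: share of non-missing rows per column.
completeness_sketch <- function(na, data) {
  n <- nrow(data)
  (n - na) / n
}

# Toy data (made up): q1 has 1 NA and q2 has 2 NAs out of 4 rows.
toy <- data.frame(q1 = c(1, NA, 3, 4), q2 = c(NA, NA, 3, 4))
na_counts <- colSums(is.na(toy))
completeness_sketch(na_counts, toy)
```

With four rows, q1 comes out at 0.75 and q2 at 0.5.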
Compute drop metric for each column
Description
This function computes the number of drops for each column based on the c_output data.
Usage
metric_drop(c_output)
Arguments
c_output: A list containing output data to compute the drop metric.
Value
A vector of drop values.
Compute drop indicator for each row
Description
This function computes a logical vector indicating whether each row in the dataset has been dropped.
Usage
metric_drop_id(c_output)
Arguments
c_output: A list containing output data to compute the drop indicator.
Value
A logical vector indicating dropped rows (TRUE = dropped).
Compute drop index for each row
Description
This function returns the index of the dropped column for each row, or NA if no column was dropped.
Usage
metric_drop_index(c_output)
Arguments
c_output: A list containing output data to compute the drop index.
Value
A vector of drop indices (or NA if no drop occurred).
Compute total NA count for each column
Description
This function calculates the total NA count for each column by summing the drop, section NA, and single NA metrics.
Usage
metric_na(drop, sec_na, single_na)
Arguments
drop: A vector of drop counts for each column.
sec_na: A vector of section NA counts for each column.
single_na: A vector of single NA counts for each column.
Value
A vector of total NA counts.
Compute section length metric for each column
Description
This function calculates the mode (most common value) of the left-out section length for each column.
Usage
metric_sec_length(c_output)
Arguments
c_output: A list containing output data to compute the section length metric.
Value
A vector of section length values.
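Base R has no built-in function for the statistical mode, so a small helper is usually needed. The sketch below shows one way to compute it for a vector of section lengths. mode_length is a hypothetical name, and resolving ties in favour of the shortest length is a choice made here, not documented package behaviour.

```r
# Hypothetical helper: most frequent value in a vector of section lengths.
mode_length <- function(lengths) {
  if (length(lengths) == 0) return(NA_integer_)
  # table() sorts by value, and which.max() returns the first maximum,
  # so ties resolve to the smallest length.
  counts <- table(lengths)
  as.integer(names(counts)[which.max(counts)])
}

mode_length(c(3, 2, 3, 5, 2, 3))  # 3 occurs most often
```

Returning NA for an empty input covers columns that have no skipped sections at all.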
Compute section NA metric for each column
Description
This function calculates the total number of section NAs for each column in the dataset.
Usage
metric_sec_na(c_output)
Arguments
c_output: A list containing output data to compute the section NA metric.
Value
A vector of section NA values.
Compute single NA metric for each column
Description
This function calculates the total number of single NAs for each column.
Usage
metric_single_na(c_output)
Arguments
c_output: A list containing output data to compute the single NA metric.
Value
A vector of single NA values.
Detect dropped observations and columns
Description
This function identifies rows and columns with dropped values in the dataset based on the c_output.
Usage
metrics_detect(data, c_output)
Arguments
data: The original dataset.
c_output: A list containing output data to detect dropped values.
Value
A dataframe indicating the dropped rows and columns.
Generate a summary of metrics for the dataset
Description
This function calculates a variety of metrics on the provided data and a corresponding output (c_output).
These metrics include drop rate, section NA, section length, single NA, and the number of complete rows.
Usage
metrics_summary(data, c_output)
Arguments
data: A dataframe containing the dataset.
c_output: A list or dataframe containing the computed output that helps generate the metrics.
Value
A dataframe summarizing the computed metrics for each column.