Title: Handling Incomplete Responses in Survey Data Analysis
Version: 2.2.0
Description: Offers robust tools to identify and manage incomplete responses in survey datasets, thereby enhancing the quality and reliability of research findings.
License: MIT + file LICENSE
Encoding: UTF-8
RoxygenNote: 7.2.3
URL: https://github.com/hendr1km/dropout, https://hendr1km.github.io/dropout/
BugReports: https://github.com/hendr1km/dropout/issues
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
Config/testthat/edition: 3
Depends: R (≥ 3.5)
LazyData: true
VignetteBuilder: knitr
NeedsCompilation: yes
Packaged: 2024-10-15 20:19:07 UTC; hendrikm
Author: Hendrik Mann [aut, cre]
Maintainer: Hendrik Mann <hendrik.mann@uni-wuppertal.de>
Repository: CRAN
Date/Publication: 2024-10-16 00:30:02 UTC

dropout: Handling Incomplete Responses in Survey Data Analysis

Description


Offers robust tools to identify and manage incomplete responses in survey datasets, thereby enhancing the quality and reliability of research findings.

Author(s)

Maintainer: Hendrik Mann <hendrik.mann@uni-wuppertal.de>

See Also

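Useful links:

https://github.com/hendr1km/dropout
https://hendr1km.github.io/dropout/
Report bugs at https://github.com/hendr1km/dropout/issues
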
Prepare data for C API: list of row-vectors with NA indicators

Description

This function prepares the data to be used by a C API by transforming each row of the dataframe into a vector of 0s and 1s, where 0 indicates a non-NA value and 1 indicates an NA value.

Usage

c_prepare(data)

Arguments

data

A dataframe to be prepared for the C API.

Value

A list of row-vectors, where each vector contains 0s (non-NA) and 1s (NA).
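The transformation described above can be sketched in base R as follows. This is a minimal illustration, not the package's source: it assumes that c_prepare only needs is.na() applied row by row, and the name c_prepare_sketch is hypothetical.

```r
# Hypothetical sketch of the row-wise NA encoding described for c_prepare();
# not the package's actual implementation.
c_prepare_sketch <- function(data) {
  # is.na() on a data frame returns a logical matrix of the same shape
  na <- is.na(data)
  # One integer vector per row: 1L where the value is NA, 0L otherwise
  lapply(seq_len(nrow(na)), function(i) as.integer(na[i, ]))
}

survey <- data.frame(q1 = c(1, NA), q2 = c(NA, 2), q3 = c(3, NA))
c_prepare_sketch(survey)
# first row: 0 1 0, second row: 1 0 1
```

Integer vectors are used here because they map directly onto C int arrays; whether the package itself passes integer or double vectors is not stated in this manual.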


Check that the input is a valid dataframe

Description

This function checks if the input is a valid dataframe with at least two rows and two columns. It throws an error if the input does not meet these conditions.

Usage

check_input(data)

Arguments

data

The input to be checked.

Value

NULL. The function stops with an error message if the input is not a valid dataframe.
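A minimal sketch of the two checks described above, assuming that "valid dataframe" means is.data.frame() is TRUE and the dimension limits are checked with nrow() and ncol(). The function name and the error messages are hypothetical.

```r
# Hypothetical sketch of the checks described for check_input().
check_input_sketch <- function(data) {
  if (!is.data.frame(data)) {
    stop("`data` must be a dataframe.", call. = FALSE)
  }
  if (nrow(data) < 2L || ncol(data) < 2L) {
    stop("`data` must have at least two rows and two columns.", call. = FALSE)
  }
  invisible(NULL)  # nothing to return when the input is valid
}

check_input_sketch(data.frame(a = 1:2, b = 3:4))  # passes silently
try(check_input_sketch(data.frame(a = 1:5)))      # error: only one column
```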


Detect Instances of Dropout in Data

Description

The drop_detect function detects participants who drop out of the survey by identifying sequences of NA values that continue through the last question of the survey. It also reports the name and index of the column at which each dropout begins.

Usage

drop_detect(data)

Arguments

data

A dataframe in which to detect instances of dropout.

Value

A dataframe containing the following columns:

Examples

## Not run: 
# Example usage with the 'flying' dataframe
detect_result <- drop_detect(flying)
print(detect_result)

## End(Not run)
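The dropout rule described above can be illustrated with a base R sketch. It assumes that a row is a dropout exactly when its last column is NA and that the dropout begins at the first column of that trailing run of NAs. The function detect_dropout_sketch and its output column names (drop, drop_index, drop_column) are hypothetical and may differ from what drop_detect actually returns.

```r
# Hypothetical sketch of the dropout rule described for drop_detect():
# a row counts as a dropout when its last column is NA, and the dropout
# begins at the first column of that trailing run of NAs.
detect_dropout_sketch <- function(data) {
  na <- is.na(data)
  n_col <- ncol(na)
  drop_index <- vapply(seq_len(nrow(na)), function(i) {
    row <- na[i, ]
    if (!row[n_col]) return(NA_integer_)   # last question answered: no dropout
    answered <- which(!row)
    if (length(answered) == 0L) return(1L) # no question answered at all
    max(answered) + 1L                     # first column after the last answer
  }, integer(1))
  data.frame(
    drop = !is.na(drop_index),
    drop_index = drop_index,
    drop_column = names(data)[drop_index], # an NA index gives an NA name
    stringsAsFactors = FALSE
  )
}

survey <- data.frame(q1 = c(1, 1, 1), q2 = c(2, NA, NA), q3 = c(3, NA, 5))
detect_dropout_sketch(survey)
```

Only the second respondent is flagged: their answers stop after q1, so the dropout begins at q2. The third respondent skipped q2 but answered q3, so under this rule they did not drop out.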


Summarize Missing Data Metrics for Each Column

Description

The drop_summary function summarizes missing data (NA values) for each column of a dataframe. For each column it reports the number of participants who drop out, the number of section NAs, the mode length of those missing-value sections, and the proportion of complete cases.

Usage

drop_summary(data)

Arguments

data

A dataframe for which to analyze missing data.

Details

The function calls compiled C code through the package's C API to compute the underlying counts; these are then processed in R and returned as a summary dataframe.

Value

A dataframe containing the following columns:

Examples

## Not run: 
# Example usage with the 'flying' dataframe
summary_result <- drop_summary(flying)
print(summary_result)

## End(Not run)
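drop_summary separates missing values into dropouts, section NAs and single NAs, but this page does not define the categories precisely. The sketch below classifies the NA runs in one respondent's row under assumed rules: a run that reaches the last column is a dropout, an isolated NA elsewhere is a single NA, and two or more consecutive NAs elsewhere form a section NA. It uses base R's rle(); the function name and the rules are assumptions, not the package's C implementation.

```r
# Hypothetical sketch: classify the runs of NAs in one respondent's row.
# Assumed rules (not taken from the package source):
#   - a run that reaches the last column is a dropout,
#   - an isolated NA elsewhere is a single NA,
#   - two or more consecutive NAs elsewhere form a section NA.
classify_na_runs_sketch <- function(row_na) {
  runs <- rle(row_na)                 # run lengths of TRUE/FALSE values
  ends <- cumsum(runs$lengths)        # last column of each run
  starts <- ends - runs$lengths + 1L  # first column of each run
  type <- ifelse(ends == length(row_na), "dropout",
                 ifelse(runs$lengths == 1L, "single_na", "section_na"))
  keep <- runs$values                 # keep only the NA runs
  data.frame(start = starts[keep], length = runs$lengths[keep],
             type = type[keep], stringsAsFactors = FALSE)
}

row_na <- is.na(c(5, NA, 3, NA, NA, 2, NA, NA))
classify_na_runs_sketch(row_na)
# start 2: single_na (length 1); start 4: section_na (length 2);
# start 7: dropout (length 2)
```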


Flying Etiquette Survey Data

Description

This is a modified version of the Flying Etiquette Survey data behind the FiveThirtyEight story "41 percent of flyers say it's rude to recline your seat on an airplane."

Usage

flying

Format

A dataframe with 1040 observations and 28 columns:

respondent_id

respondent ID

travel_frequency

how often do you travel by plane?

seat_recline

do you ever recline your seat when you fly?

height

how tall are you?

children_under_18

do you have any children under 18?

two_armrests

in a row of three seats, who should get to use the two arm rests?

middle_armrest

in a row of two seats, who should get to use the middle arm rest?

window_shade

who should have control over the window shade?

moving_to_unsold_seat

is it rude to move to an unsold seat on a plane?

talking_to_seatmate

generally speaking, is it rude to say more than a few words to the stranger sitting next to you on a plane?

getting_up_on_6_hour_flight

on a 6-hour flight from NYC to LA, how many times is it acceptable to get up if you're not in an aisle seat?

obligation_to_reclined_seat

is it rude to recline your seat on a plane?

eliminate_reclining_seats

given the opportunity, would you eliminate the possibility of reclining seats on planes entirely?

switch_for_friends

is it rude to ask someone to switch seats with you in order to be closer to friends?

switch_for_family

is it rude to ask someone to switch seats with you in order to be closer to family?

wake_passenger_bathroom

is it rude to wake a passenger up if you are trying to go to the bathroom?

wake_passenger_walk

is it rude to wake a passenger up if you are trying to walk around?

baby_on_plane

in general, is it rude to bring a baby on a plane?

unruly_children

in general, is it rude to knowingly bring unruly children on a plane?

electronics_violation

have you ever used personal electronics during takeoff or landing in violation of a flight attendant's direction?

smoking_violation

have you ever smoked a cigarette in an airplane bathroom when it was against the rules?

gender

gender

age

age

household_income

household income

education

education

location_census_region

location (census region)

survey_type

type of survey

Source

https://github.com/fivethirtyeight/data/tree/15f210532b2a642e85738ddefa7a2945d47e2585/flying-etiquette-survey

https://fivethirtyeight.com/features/airplane-etiquette-recline-seat/

Examples

data(flying)

Compute column name for dropped observations

Description

This function returns, for each row, the name of the column at which the dropout begins.

Usage

metric_column(c_output, drop_index, data)

Arguments

c_output

A list containing output data to detect dropped columns.

drop_index

A vector indicating the index of the dropped column for each row.

data

The original dataset.

Value

A vector of column names corresponding to dropped values.
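A minimal sketch of the index-to-name lookup, assuming drop_index holds one column index per row and NA for rows without a dropout. The layout of c_output is not documented here, so the sketch omits it; the function name is hypothetical.

```r
# Hypothetical sketch: map each row's dropout column index to a column name.
metric_column_sketch <- function(drop_index, data) {
  # Indexing with NA yields NA, so rows without a dropout stay NA
  names(data)[drop_index]
}

survey <- data.frame(q1 = 1:3, q2 = 1:3, q3 = 1:3)
metric_column_sketch(c(NA, 2L, 3L), survey)
# NA "q2" "q3"
```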


Compute completeness of each column

Description

This function calculates the completeness of each column in the dataset: the NA count is subtracted from the total number of rows, and the result is expressed as a proportion of all rows.

Usage

metric_complete(na, data)

Arguments

na

A vector of NA counts for each column.

data

The original dataset.

Value

A vector of completeness values (proportions) for each column.
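As a worked example of the completeness proportion, here is a short sketch. It assumes the proportion is obtained by dividing the non-missing count by nrow(data); the function name is hypothetical.

```r
# Hypothetical sketch of the completeness proportion per column.
metric_complete_sketch <- function(na, data) {
  # Non-missing values per column divided by the number of rows
  (nrow(data) - na) / nrow(data)
}

survey <- data.frame(q1 = c(1, 2, NA, 4), q2 = c(NA, NA, 3, 4))
na_counts <- colSums(is.na(survey))  # q1 = 1, q2 = 2
metric_complete_sketch(na_counts, survey)
# q1 = 0.75, q2 = 0.50
```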


Compute drop metric for each column

Description

This function computes, for each column, the number of participants whose dropout begins at that column, based on the c_output data.

Usage

metric_drop(c_output)

Arguments

c_output

A list containing output data to compute the drop metric.

Value

A vector of dropout counts, one per column.
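One way to obtain per-column dropout counts is to tabulate the per-row dropout indices. This sketch assumes that representation. It takes drop_index and a hypothetical n_col argument instead of c_output, whose structure is not documented here, and it is not the package's implementation.

```r
# Hypothetical sketch: count dropouts per column from per-row dropout indices.
metric_drop_sketch <- function(drop_index, n_col) {
  # Drop rows without a dropout (NA), then count occurrences of each index
  tabulate(drop_index[!is.na(drop_index)], nbins = n_col)
}

metric_drop_sketch(c(NA, 2L, 3L, 2L, NA), n_col = 4L)
# 0 2 1 0
```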


Compute drop indicator for each row

Description

This function computes a logical vector indicating whether each row in the dataset has been dropped.

Usage

metric_drop_id(c_output)

Arguments

c_output

A list containing output data to compute the drop indicator.

Value

A logical vector indicating dropped rows (TRUE = dropped).


Compute drop index for each row

Description

This function returns, for each row, the index of the column at which the dropout begins, or NA if the row has no dropout.

Usage

metric_drop_index(c_output)

Arguments

c_output

A list containing output data to compute the drop index.

Value

A vector of drop indices (or NA if no drop occurred).


Compute total NA count for each column

Description

This function calculates the total NA count for each column by summing the drop, section NA, and single NA metrics.

Usage

metric_na(drop, sec_na, single_na)

Arguments

drop

A vector of drop counts for each column.

sec_na

A vector of section NA counts for each column.

single_na

A vector of single NA counts for each column.

Value

A vector of total NA counts.
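The total described above is an element-wise sum of three per-column vectors. A minimal sketch with made-up counts (the function name is hypothetical):

```r
# Hypothetical sketch of the total NA count per column.
metric_na_sketch <- function(drop, sec_na, single_na) {
  drop + sec_na + single_na  # element-wise: one total per column
}

metric_na_sketch(drop = c(0L, 2L, 1L), sec_na = c(1L, 0L, 0L), single_na = c(3L, 1L, 0L))
# 4 3 1
```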


Compute section length metric for each column

Description

This function calculates, for each column, the mode (most common value) of the lengths of the skipped sections, that is, the runs of consecutive NAs.

Usage

metric_sec_length(c_output)

Arguments

c_output

A list containing output data to compute the section length metric.

Value

A vector of modal section lengths, one per column.
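A small sketch of taking the mode of a set of section lengths in base R. Two choices here are assumptions, not documented behaviour of metric_sec_length: ties resolve to the smallest length (table() sorts its values and which.max() returns the first maximum), and an empty input gives NA. The function name is hypothetical.

```r
# Hypothetical sketch: most common section length; ties go to the smallest.
mode_length_sketch <- function(lengths) {
  if (length(lengths) == 0L) return(NA_integer_)  # no sections in this column
  counts <- table(lengths)                        # sorted by length value
  as.integer(names(counts)[which.max(counts)])    # first maximum wins ties
}

mode_length_sketch(c(2L, 3L, 3L, 5L))  # 3
mode_length_sketch(c(4L, 2L))          # 2 (tie resolved to the smaller length)
```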


Compute section NA metric for each column

Description

This function calculates the total number of section NAs for each column in the dataset.

Usage

metric_sec_na(c_output)

Arguments

c_output

A list containing output data to compute the section NA metric.

Value

A vector of section NA counts, one per column.


Compute single NA metric for each column

Description

This function calculates the total number of single NAs for each column.

Usage

metric_single_na(c_output)

Arguments

c_output

A list containing output data to compute the single NA metric.

Value

A vector of single NA counts, one per column.


Detect dropped observations and columns

Description

This function identifies, based on c_output, which rows contain a dropout and the column at which each dropout begins.

Usage

metrics_detect(data, c_output)

Arguments

data

The original dataset.

c_output

A list containing output data to detect dropped values.

Value

A dataframe indicating the dropped rows and columns.


Generate a summary of metrics for the dataset

Description

This function calculates a set of per-column metrics from the provided data and the corresponding output of the C routine (c_output). These metrics are the dropout count, section NAs, section length, single NAs, and completeness.

Usage

metrics_summary(data, c_output)

Arguments

data

A dataframe containing the dataset.

c_output

A list or dataframe containing the computed output that helps generate the metrics.

Value

A dataframe summarizing the computed metrics for each column.