Help for package dropout

Title:

Handling Incomplete Responses in Survey Data Analysis

Version:

2.2.0

Description:

Offers robust tools to identify and manage incomplete responses in survey datasets, thereby enhancing the quality and reliability of research findings.

License:

MIT + file LICENSE

Encoding:

UTF-8

RoxygenNote:

7.2.3

URL:

https://github.com/hendr1km/dropout, https://hendr1km.github.io/dropout/

BugReports:

https://github.com/hendr1km/dropout/issues

Suggests:

knitr, rmarkdown, testthat (≥ 3.0.0)

Config/testthat/edition:

Depends:

R (≥ 3.5)

LazyData:

true

VignetteBuilder:

knitr

NeedsCompilation:

yes

Packaged:

2024-10-15 20:19:07 UTC; hendrikm

Author:

Hendrik Mann [aut, cre]

Maintainer:

Hendrik Mann <hendrik.mann@uni-wuppertal.de>

Repository:

CRAN

Date/Publication:

2024-10-16 00:30:02 UTC

dropout: Handling Incomplete Responses in Survey Data Analysis

Description

Offers robust tools to identify and manage incomplete responses in survey datasets, thereby enhancing the quality and reliability of research findings.

Author(s)

Maintainer: Hendrik Mann <hendrik.mann@uni-wuppertal.de>

Prepare data for C API: list of row-vectors with NA indicators

Description

This function prepares the data to be used by a C API by transforming each row of the dataframe into a vector of 0s and 1s, where 0 indicates a non-NA value and 1 indicates an NA value.

Usage

c_prepare(data)

Arguments

data

A dataframe to be prepared for the C API.

Value

A list of row-vectors, where each vector contains 0s (non-NA) and 1s (NA).
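
The transformation described above can be sketched in plain R. This is an illustration only, not the package's implementation; the helper name na_indicator_rows and the small survey dataframe are hypothetical.

```r
# Hypothetical helper (not part of dropout): turn each row of a dataframe
# into an integer vector with 1 for NA and 0 for non-NA.
na_indicator_rows <- function(data) {
  # is.na() on a dataframe returns a logical matrix of the same shape;
  # adding 0L converts TRUE/FALSE to 1/0 while keeping that shape.
  na_matrix <- is.na(data) + 0L
  # Split the matrix into one unnamed vector per row.
  lapply(seq_len(nrow(na_matrix)), function(i) unname(na_matrix[i, ]))
}

# Made-up example data with three questions and two respondents.
survey <- data.frame(q1 = c(1, NA), q2 = c(NA, NA), q3 = c(3, 4))
rows <- na_indicator_rows(survey)
print(rows)
```

A list of plain integer vectors is presumably a convenient shape to hand to compiled code, which is why the sketch strips the column names.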

Check that the input is a valid dataframe

Description

This function checks if the input is a valid dataframe with at least two rows and two columns. It throws an error if the input does not meet these conditions.

Usage

check_input(data)

Arguments

data

The input to be checked.

Value

NULL. The function stops with an error message if the input is not a valid dataframe.
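
A check of this kind might look like the following sketch. The function name validate_dataframe and the error messages are assumptions made for illustration; the package's actual messages may differ.

```r
# Hypothetical validator (not part of dropout) mirroring the documented rule:
# the input must be a dataframe with at least two rows and two columns.
validate_dataframe <- function(data) {
  if (!is.data.frame(data)) {
    stop("`data` must be a dataframe.", call. = FALSE)
  }
  if (nrow(data) < 2 || ncol(data) < 2) {
    stop("`data` must have at least two rows and two columns.", call. = FALSE)
  }
  # Nothing useful to return when the input is valid.
  invisible(NULL)
}

# A valid input passes silently.
validate_dataframe(data.frame(a = 1:2, b = 3:4))

# An invalid input raises an error, captured here so the script keeps running.
result <- tryCatch(validate_dataframe(1:3), error = function(e) conditionMessage(e))
print(result)
```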

Detect Instances of Dropout in Data

Description

The drop_detect function detects participants who dropped out of the survey by identifying sequences of NA values that run through to the last question of the survey. It also reports the index and name of the column where the dropout occurs.

Usage

drop_detect(data)

Arguments

data

A dataframe in which to detect instances of dropout.

Value

A dataframe containing the following columns:

drop: A logical value indicating whether dropout has occurred (TRUE for dropout, FALSE otherwise).
drop_index: The index of the column where dropout occurred (NA if no dropout).
column: The name of the column where the dropout occurred (<NA> if no dropout).

Examples

## Not run: 
# Example usage with the 'flying' dataframe
detect_result <- drop_detect(flying)
print(detect_result)

## End(Not run)
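
To make the idea of a trailing NA sequence concrete, here is a minimal pure-R sketch. It is not the package's algorithm (drop_detect relies on a C API) and its rules are assumptions: any run of NAs that reaches the last column counts as dropout, even a run of length one, and a row that is entirely NA is treated as dropping out at the first column. The function name detect_trailing_na and the example data are hypothetical.

```r
# Hypothetical sketch (not the dropout implementation): flag rows whose NAs
# run through to the last column and report where that run starts.
detect_trailing_na <- function(data) {
  na_matrix <- is.na(data)
  n_col <- ncol(na_matrix)

  drop_index <- apply(na_matrix, 1, function(row_na) {
    # No dropout if the last question was answered.
    if (!row_na[n_col]) return(NA_integer_)
    answered <- which(!row_na)
    # The trailing run starts right after the last answered question.
    if (length(answered) == 0) 1L else max(answered) + 1L
  })
  drop_index <- unname(drop_index)

  data.frame(
    drop = !is.na(drop_index),
    drop_index = drop_index,
    column = names(data)[drop_index],
    stringsAsFactors = FALSE
  )
}

# Made-up responses: row 1 is complete, row 2 stops after q1, row 3 is empty.
responses <- data.frame(
  q1 = c(1, 1, NA),
  q2 = c(2, NA, NA),
  q3 = c(3, NA, NA)
)
detected <- detect_trailing_na(responses)
print(detected)
```

The output has the same three columns as the documented return value, so a result like this could be used to subset the original data, for example with responses[!detected$drop, ].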

Summarize Missing Data Metrics for Each Column

Description

The drop_summary function generates a summary of missing data (NA values) for each column in a dataframe. It computes metrics such as the number of dropout participants, the number of NA sections, the mode length of those sections, the number of single NAs, and the proportion of complete cases for each column.

Usage

drop_summary(data)

Arguments

data

A dataframe for which to analyze missing data.

Details

The function calls a C API to compute some metrics, which are then processed and returned as a summary dataframe.

Value

A dataframe containing the following columns:

column: The name of each column in the input dataframe.
drop: The number of dropped rows (missing values) for that column.
sec_na: The number of sections of consecutive NAs for that column.
sec_length: The mode (most frequent length) of sections of consecutive NAs for that column.
single_na: The number of single NA values (isolated missing values) for that column.
na: The total number of missing (NA) values for that column.
complete: The proportion of complete rows for that column, where a value of 1 means no missing data, and values closer to 0 mean more missing data.

Examples

## Not run: 
# Example usage with the 'flying' dataframe
summary_result <- drop_summary(flying)
print(summary_result)

## End(Not run)
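
The distinction between dropout, section NAs, and single NAs can be illustrated on one respondent's answers with base R's rle(). This is a sketch under assumptions, not the package's C implementation: an isolated NA is labelled a single NA, a run of two or more NAs a section, and any NA run that reaches the last question a dropout. The function name classify_na_runs is hypothetical, and the input is assumed to be non-empty.

```r
# Hypothetical sketch (not part of dropout): label each run of answers or NAs
# in one row of 0/1 NA indicators (1 = NA, 0 = answered).
classify_na_runs <- function(row_na) {
  runs <- rle(as.logical(row_na))
  n_runs <- length(runs$lengths)

  run_type <- rep("answered", n_runs)
  run_type[runs$values & runs$lengths == 1] <- "single_na"
  run_type[runs$values & runs$lengths > 1] <- "section_na"
  # An NA run that reaches the last question is treated as dropout.
  if (runs$values[n_runs]) run_type[n_runs] <- "dropout"

  data.frame(
    start = cumsum(runs$lengths) - runs$lengths + 1L,  # first column of the run
    length = runs$lengths,
    type = run_type,
    stringsAsFactors = FALSE
  )
}

# One respondent: a skipped question, a skipped block, then dropout.
run_table <- classify_na_runs(c(0, 1, 0, 1, 1, 0, 1, 1, 1))
print(run_table)
```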

Flying Etiquette Survey Data

Description

This is a modified version of the Flying Etiquette Survey data behind the story: 41 percent of flyers say it's rude to recline your seat on an airplane.

Usage

flying

Format

A dataframe with 1040 observations and 28 columns, which are:

respondent_id: respondent ID
travel_frequency: how often do you travel by plane?
seat_recline: do you ever recline your seat when you fly?
height: how tall are you?
children_under_18: do you have any children under 18?
two_armrests: in a row of three seats, who should get to use the two arm rests?
middle_armrest: in a row of two seats, who should get to use the middle arm rest?
window_shade: who should have control over the window shade?
moving_to_unsold_seat: is it rude to move to an unsold seat on a plane?
talking_to_seatmate: generally speaking, is it rude to say more than a few words to the stranger sitting next to you on a plane?
getting_up_on_6_hour_flight: on a 6 hour flight from nyc to la, how many times is it acceptable to get up if you're not in an aisle seat?
obligation_to_reclined_seat: is it rude to recline your seat on a plane?
eliminate_reclining_seats: given the opportunity, would you eliminate the possibility of reclining seats on planes entirely?
switch_for_friends: is it rude to ask someone to switch seats with you in order to be closer to friends?
switch_for_family: is it rude to ask someone to switch seats with you in order to be closer to family?
wake_passenger_bathroom: is it rude to wake a passenger up if you are trying to go to the bathroom?
wake_passenger_walk: is it rude to wake a passenger up if you are trying to walk around?
baby_on_plane: in general, is it rude to bring a baby on a plane?
unruly_children: in general, is it rude to knowingly bring unruly children on a plane?
electronics_violation: have you ever used personal electronics during take off or landing in violation of a flight attendant's direction?
smoking_violation: have you ever smoked a cigarette in an airplane bathroom when it was against the rules?
gender: gender
age: age
household_income: household income
education: education
location_census_region: location (census region)
survey_type: type of the survey

Source

https://github.com/fivethirtyeight/data/tree/15f210532b2a642e85738ddefa7a2945d47e2585/flying-etiquette-survey

https://fivethirtyeight.com/features/airplane-etiquette-recline-seat/

Examples

data(flying)

Compute column name for dropped observations

Description

This function returns the column names corresponding to the dropped values in the dataset.

Usage

metric_column(c_output, drop_index, data)

Arguments

c_output

A list containing output data to detect dropped columns.

drop_index

A vector indicating the index of the dropped column for each row.

data

The original dataset.

Value

A vector of column names corresponding to dropped values.

Compute completeness of each column

Description

This function calculates the completeness of each column in the dataset by subtracting the NA count from the total number of rows.

Usage

metric_complete(na, data)

Arguments

na

A vector of NA counts for each column.

data

The original dataset.

Value

A vector of completeness values (proportions) for each column.
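
Since the return value is documented as a proportion, one plausible reading is that the difference between the row count and the NA count is divided by the row count. The sketch below assumes exactly that; the function name completeness_proportion is hypothetical and the package may round or compute the value differently.

```r
# Hypothetical sketch (not the dropout implementation): share of non-missing
# values per column, given per-column NA counts and the original data.
completeness_proportion <- function(na, data) {
  n_rows <- nrow(data)
  (n_rows - na) / n_rows
}

# Made-up data with four rows; the NA counts per column are 0, 1 and 4.
example_data <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(1, NA, 3, 4),
  c = c(NA, NA, NA, NA)
)
na_counts <- colSums(is.na(example_data))
completeness <- completeness_proportion(na_counts, example_data)
print(completeness)
```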

Compute drop metric for each column

Description

This function computes the number of drops for each column based on the c_output data.

Usage

metric_drop(c_output)

Arguments

c_output

A list containing output data to compute the drop metric.

Value

A vector of drop values.

Compute drop indicator for each row

Description

This function computes a logical vector indicating whether each row in the dataset has been dropped.

Usage

metric_drop_id(c_output)

Arguments

c_output

A list containing output data to compute the drop indicator.

Value

A logical vector indicating dropped rows (TRUE = dropped).

Compute drop index for each row

Description

This function returns the index of the dropped column for each row, or NA if no column was dropped.

Usage

metric_drop_index(c_output)

Arguments

c_output

A list containing output data to compute the drop index.

Value

A vector of drop indices (or NA if no drop occurred).

Compute total NA count for each column

Description

This function calculates the total NA count for each column by summing the drop, section NA, and single NA metrics.

Usage

metric_na(drop, sec_na, single_na)

Arguments

drop

A vector of drop counts for each column.

sec_na

A vector of section NA counts for each column.

single_na

A vector of single NA counts for each column.

Value

A vector of total NA counts.
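
The summation described above amounts to an element-wise addition of three equally long vectors. A minimal sketch follows, with a hypothetical function name and made-up counts; the length check is an addition for illustration.

```r
# Hypothetical sketch (not the dropout implementation): total NA count per
# column as the element-wise sum of the three component counts.
total_na_count <- function(drop, sec_na, single_na) {
  stopifnot(
    length(drop) == length(sec_na),
    length(drop) == length(single_na)
  )
  drop + sec_na + single_na
}

# Made-up counts for three columns.
totals <- total_na_count(
  drop = c(2, 0, 5),
  sec_na = c(1, 0, 0),
  single_na = c(0, 3, 1)
)
print(totals)
```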

Compute section length metric for each column

Description

This function calculates the mode (most common value) of the left-out section length for each column.

Usage

metric_sec_length(c_output)

Arguments

c_output

A list containing output data to compute the section length metric.

Value

A vector of section length values.
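
Base R has no built-in function for the statistical mode, so a helper is needed to find the most common section length. The sketch below is one common way to compute it and is not taken from the package; the name mode_value is hypothetical, and ties are resolved in favour of the smallest value as an arbitrary choice.

```r
# Hypothetical sketch (not the dropout implementation): most frequent value
# in a vector of whole-number section lengths.
mode_value <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_integer_)
  # table() counts each distinct value, ordered from smallest to largest;
  # which.max() picks the first maximum, so ties go to the smallest value.
  counts <- table(x)
  as.integer(names(counts)[which.max(counts)])
}

# Made-up section lengths observed in one column.
section_lengths <- c(2, 3, 3, 5, 3, 2)
most_common <- mode_value(section_lengths)
print(most_common)
```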

Compute section NA metric for each column

Description

This function calculates the total number of section NAs for each column in the dataset.

Usage

metric_sec_na(c_output)

Arguments

c_output

A list containing output data to compute the section NA metric.

Value

A vector of section NA values.

Compute single NA metric for each column

Description

This function calculates the total number of single NAs for each column.

Usage

metric_single_na(c_output)

Arguments

c_output

A list containing output data to compute the single NA metric.

Value

A vector of single NA values.

Detect dropped observations and columns

Description

This function identifies rows and columns with dropped values in the dataset based on the c_output.

Usage

metrics_detect(data, c_output)

Arguments

data

The original dataset.

c_output

A list containing output data to detect dropped values.

Value

A dataframe indicating the dropped rows and columns.

Generate a summary of metrics for the dataset

Description

This function calculates a variety of metrics on the provided data and a corresponding output (c_output). These metrics include drop rate, section NA, section length, single NA, and the number of complete rows.

Usage

metrics_summary(data, c_output)

Arguments

data

A dataframe containing the dataset.

c_output

A list or dataframe containing the computed output that helps generate the metrics.

Value

A dataframe summarizing the computed metrics for each column.
