Type: | Package |
Title: | Report on Diversity and Inclusion in a Corporate Setting |
Version: | 0.3.1 |
Maintainer: | Philippe J.S. De Brouwer <philippe@de-brouwer.com> |
License: | AGPL (≥ 3) |
URL: | http://www.de-brouwer.com/div/ |
BugReports: | https://github.com/DrPhilippeDB/div/issues/ |
Description: | Facilitate the analysis of teams in a corporate setting: assess the diversity per grade and job, present the results, search for bias (in hiring and/or promoting processes). It also provides methods to simulate the effect of bias, random team-data, etc. White paper: 'Philippe J.S. De Brouwer' (2021) http://www.de-brouwer.com/assets/div/div-white-paper.pdf. Book (chapter 36): 'Philippe J.S. De Brouwer' (2020, ISBN:978-1-119-63272-6) and 'Philippe J.S. De Brouwer' (2020) <doi:10.1002/9781119632757>. |
Encoding: | UTF-8 |
Collate: | 'headers.R' 'diversity.R' 'div_conf_colour.R' 'div_fake_team.R' 'div_ci_median.R' 'div_paygap.R' 'div_parse_paygap.R' 'div_round_paygap.R' 'div_gauge_plot.R' 'div_plot_paygap_distribution.R' 'div_add_median_label.R' 'print.paygap.R' 'summary.paygap.R' |
Depends: | R (≥ 3.4.0), tidyverse |
Imports: | rlang, dplyr, tibble, tidyr, stringr, magrittr, ggplot2, gridExtra, plotly, pryr, rpart, kableExtra |
Suggests: | flexdashboard, knitr, rmarkdown, grid, lattice |
RoxygenNote: | 7.1.1 |
NeedsCompilation: | no |
Repository: | CRAN |
Packaged: | 2021-05-04 19:02:09 UTC; philippe |
Author: | Philippe J.S. De Brouwer [aut, cre] |
Date/Publication: | 2021-05-06 08:00:02 UTC |
Adds a column with new labels (H)igh and (L)ow for a given colName (within a given grade and jobID)
Description
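This function adds a column that labels each observation as belonging to the lower or the upper half of the values of colName, split at the median within each grade and jobID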
Usage
div_add_median_label(
d,
colName = "age",
value1 = "T",
value2 = "F",
newColName = "isYoung"
)
Arguments
d |
tibble, a tibble with team data columns as defined in the documentation (at least the column colName (as set by next parameter), 'grade', and 'jobID') |
colName |
character, the name of the column that contains the values to be split at the median (defaults to 'age') |
value1 |
character, the label to be used for the first half of observations (the smallest ones) |
value2 |
character, the label to be used for the second half of observations (the biggest ones) |
newColName |
character, the name of the new column that will hold the labels value1 and value2 |
Value
dataframe (the input d with the additional column newColName, which holds the labels value1 and value2)
Examples
df <- div_add_median_label(div_fake_team())
colnames(df)
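The idea of labelling observations relative to the median of their group can be pictured with base R alone. The following is a minimal sketch, not the package's implementation: the helper name label_by_median and the toy data are hypothetical, and it assumes that values equal to the group median fall into the first (smaller) half.

```r
# Hypothetical helper: label each value as lower/upper half of its group.
# Assumption: ties with the group median receive value1.
label_by_median <- function(d, colName = "age", value1 = "T", value2 = "F",
                            newColName = "isYoung") {
  # One group per combination of grade and jobID
  grp <- interaction(d$grade, d$jobID, drop = TRUE)
  # Median of colName within each group, aligned with the rows of d
  med <- ave(d[[colName]], grp, FUN = median)
  d[[newColName]] <- ifelse(d[[colName]] <= med, value1, value2)
  d
}

# Toy data (invented for illustration only)
toy <- data.frame(
  grade = c(1, 1, 1, 1, 2, 2),
  jobID = c("sales", "sales", "sales", "sales", "analytics", "analytics"),
  age   = c(25, 31, 44, 52, 29, 38)
)
labelled <- label_by_median(toy)
labelled$isYoung
```

Splitting within each grade and jobID, rather than over the whole team, keeps the two halves comparable: a person is "young" relative to colleagues in the same role and grade.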
Function to calculate the confidence interval for the median
Description
Function to calculate the confidence interval for the median
Usage
div_ci_median(x, conf = 0.95)
Arguments
x |
numeric, data from which the median is calculated |
conf |
numeric, the confidence level, expressed as 1 - P(x < x0) |
Value
ci (confidence interval object)
Examples
x <- 1:100
div_ci_median(x)
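One common distribution-free way to obtain a confidence interval for a median uses order statistics and the Binomial(n, 0.5) distribution. The sketch below illustrates that technique in base R. Whether div_ci_median() uses this method, and how its returned ci object is structured, is not stated here, so the helper name median_ci_sketch and its return format are assumptions.

```r
# Hypothetical helper: conservative order-statistic CI for the median.
median_ci_sketch <- function(x, conf = 0.95) {
  x <- sort(x[!is.na(x)])
  n <- length(x)
  alpha <- 1 - conf
  # The number of observations below the true median is Binomial(n, 0.5);
  # its lower alpha/2 quantile gives the rank of the lower bound.
  lower_rank <- qbinom(alpha / 2, n, 0.5)
  upper_rank <- n - lower_rank + 1
  # Keep the ranks inside 1..n for very small samples
  lower_rank <- max(lower_rank, 1)
  upper_rank <- min(upper_rank, n)
  c(lower = x[lower_rank], median = median(x), upper = x[upper_rank])
}

x <- 1:100
ci <- median_ci_sketch(x)
ci
```

The interval is built from ranks only, so it needs no assumption about the shape of the salary (or other) distribution.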
Return a colour code given a number of stars for the confidence level of bias
Description
This function returns a colour (R named colour) based on the confidence level
Usage
div_conf_colour(x)
Arguments
x |
the string associated with the paygap confidence: NA, '', '.', '*', '**', '***' |
Value
string (named colour)
Examples
div_conf_colour("*")
Generate random team data
Description
This function generates a data frame with data for a team (with salaries, gender, FTE, etc.). This is a good starting point to test the package and to experiment with the level of bias that becomes visible in the paygap, for example.
Usage
div_fake_team(
seed = 100,
N = 200,
genders = c("F", "M", "O"),
gender_prob = c(0.4, 0.58, 0.02),
gender_salaryBias = c(1, 1.1, 1),
jobIDs = c("sales", "analytics"),
jobID_prob = c(0.6, 0.4),
citizenships = c("Polish", "German", "Italian", "Indian", "Other"),
citizenship_prob = c(0.6, 0.2, 0.1, 0.05, 0.05)
)
Arguments
seed |
numeric, the seed to be used in set.seed() |
N |
numeric, the size of the team to be used (default = 200) |
genders |
character, a vector of the genders to be used |
gender_prob |
numeric, relative probabilities of the different genders to occur (must have the same length as 'genders') |
gender_salaryBias |
numeric, vector with the relative salaries of the different genders (must have the same length as 'genders') |
jobIDs |
character, a vector with the labels of the job categories in the team (they will appear in each grade) |
jobID_prob |
numeric, a vector with the relative sizes of the different jobs in the team (must have the same length as 'jobIDs') |
citizenships |
character, a vector of the citizenships to be generated |
citizenship_prob |
numeric, relative probabilities of the different citizenships to occur (must have the same length as 'citizenships') |
Value
dataframe (employees of the random team)
Examples
library(div)
d <- div_fake_team()
head(d)
diversity(table(d$gender))
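To see how the parameters genders, gender_prob and gender_salaryBias can interact, here is a minimal base R sketch. It is not the code of div_fake_team(): the log-normal base salary and its parameters are invented for illustration, and the sketch produces only two of the columns.

```r
set.seed(100)
N <- 200
genders <- c("F", "M", "O")
gender_prob <- c(0.4, 0.58, 0.02)
gender_salaryBias <- c(1, 1.1, 1)

# Draw a gender for each employee with the given relative probabilities
gender <- sample(genders, N, replace = TRUE, prob = gender_prob)

# Assumption: unbiased base salaries follow a log-normal distribution
base_salary <- rlnorm(N, meanlog = log(4000), sdlog = 0.2)

# Scale each base salary by the bias factor that belongs to that gender
salary <- base_salary * gender_salaryBias[match(gender, genders)]

team <- data.frame(gender = gender, salary = salary)
table(team$gender)
```

Because the bias enters as a multiplicative factor, a value of 1.1 for "M" means salaries for that group are 10% higher on average than the unbiased base.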
Uses ggplot2 to produce a gauge plot in RAG colour
Description
This function produces one or more gauge plots coloured in red (R), amber (A) or green (G) for a value between 0 and 1.
Usage
div_gauge_plot(df, breaks = c(0, 0.8, 0.95, 1), ncol = NULL, nbrSize = 6)
Arguments
df |
tibble, a tibble with columns "value" and "label" (value = the values between 0 and 1; label = text to show, e.g. paste("group", colnames(t))) |
breaks |
numeric vector with the lower limit, the border between green and amber, the border between amber and red, and the upper limit |
ncol |
numeric, the number of columns to produce |
nbrSize |
numeric, the font size for the label |
Value
ggplot object
Examples
d <- div_fake_team()
tbl_gender_div <- table(d$gender, d$grade) %>%
  apply(2, diversity, prior = c(50.2, 49.8)) %>%
  tibble(value = ., label = paste("Grade", names(.)))
div_gauge_plot(tbl_gender_div, ncol = 2, nbrSize = 4)
Prepare the paygap matrix to be published in LaTeX
Description
This function formats the paygap matrix (created by div_paygap()) and prepares it for printing via the function knitr::kable()
Usage
div_parse_paygap(
pg,
label = NULL,
min_nbr_show = NULL,
max_length_jobID = 12,
max_length_colnames = 9
)
Arguments
pg |
paygap object as created by div::div_paygap(). This is an S3 object with a specific structure |
label |
character, the label to be used in the caption of the kable object |
min_nbr_show |
numeric, if provided then only groups that have more than min_nbr_show employees in both categories (selectedValue and others) will be shown |
max_length_jobID |
numeric, if provided the maximal length of the column jobID (in characters) |
max_length_colnames |
numeric, if provided the maximal length of the column names (in characters) |
Value
knitr::kable object (for LaTeX)
Examples
d <- div_fake_team()
pg <- div_paygap(d)
div_parse_paygap(pg)
Function to calculate the paygap as a ratio.
Description
This function calculates the paygap as a ratio for each grade and jobID, comparing the group selected by x_ctrl with all others
Usage
div_paygap(d, x = "gender", y = "salary", x_ctrl = "F", ctrl_var = "age")
Arguments
d |
tibble, a tibble with columns as defined |
x |
the name of the column that contains the factor object to be used as explaining dimension for the paygap (defaults to 'gender') |
y |
the name of the column that contains the numeric value to be used to calculate the paygap (could be salary or bonus, for example) |
x_ctrl |
the value in the column defined by x that should be isolated (this versus the others), defaults to 'F' |
ctrl_var |
a control variable to be added (shows median per group for that variable) |
Value
dataframe (with columns grade, jobID, salary_x_ctrl, salary_others, n_x_ctrl, n_others, paygap, confidence), where "confidence" is one of the following: NA = not available (numbers are too low), "" = no bias detectable, "." = there might be some bias, but we're not sure, "*" = bias detected with some degree of confidence, "**" = quite sure there is bias, "***" = trust us, this is biased.
Examples
df <- div_paygap(div_fake_team())
df
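The core of a paygap expressed as a ratio can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's algorithm: it assumes the ratio is the median of y for the x_ctrl group divided by the median for all others (the package may use the inverse or another statistic), it ignores ctrl_var, and it does not compute the confidence column. The helper name paygap_sketch and the toy data are hypothetical.

```r
# Hypothetical helper: paygap as a ratio of medians per grade and jobID.
# Assumption: paygap = median(y of x_ctrl group) / median(y of others).
paygap_sketch <- function(d, x = "gender", y = "salary", x_ctrl = "F") {
  groups <- split(d, list(d$grade, d$jobID), drop = TRUE)
  rows <- lapply(groups, function(g) {
    is_ctrl <- g[[x]] == x_ctrl
    med_ctrl <- median(g[[y]][is_ctrl])
    med_others <- median(g[[y]][!is_ctrl])
    data.frame(
      grade = g$grade[1], jobID = g$jobID[1],
      salary_x_ctrl = med_ctrl, salary_others = med_others,
      n_x_ctrl = sum(is_ctrl), n_others = sum(!is_ctrl),
      paygap = med_ctrl / med_others
    )
  })
  do.call(rbind, rows)
}

# Toy data (invented for illustration only)
toy <- data.frame(
  grade  = c(1, 1, 1, 1),
  jobID  = "sales",
  gender = c("F", "F", "M", "M"),
  salary = c(3000, 3400, 3600, 4400)
)
pg_toy <- paygap_sketch(toy)
pg_toy$paygap
```

Under this convention a value of 1 means no gap, which matches the use of 1 as the unbiased mean elsewhere in the package.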
Produce a histogram and normal distribution
Description
Plots a histogram together with two normal distributions: one with the same mean and standard deviation as the data, and one with its mean centred on 1
Usage
div_plot_paygap_distribution(x, label = "Gender", mu_unbiased = 1)
Arguments
x |
numeric vector, column of paygap observations |
label |
character, prefix for the title |
mu_unbiased |
numeric, the mean of the unbiased distribution (for paygaps this should be 1) |
Value
ggplot2 object
Examples
d <- div_fake_team()
pg <- div_paygap(d)
div_plot_paygap_distribution(pg$data$paygap)
Rounds all numbers in the paygap data-frame
Description
This function rounds all numbers to zero decimals, except the paygap (which is rounded to 2 decimals).
Usage
div_round_paygap(x)
Arguments
x |
paygap object (output of div::div_paygap()) |
Value
the paygap data-frame (tibble only, not the whole paygap object)
Examples
d <- div_fake_team()
pg <- div_paygap(d)
div_round_paygap(pg)
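The rounding rule described above can be illustrated on a plain data frame. The sketch below is base R only and does not operate on a real paygap object; the helper name round_sketch and the toy values are hypothetical.

```r
# Hypothetical helper: round numeric columns to 0 decimals,
# except the column named "paygap", which keeps 2 decimals.
round_sketch <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  for (nm in names(df)[is_num]) {
    digits <- if (nm == "paygap") 2 else 0
    df[[nm]] <- round(df[[nm]], digits)
  }
  df
}

# Toy data (invented for illustration only)
toy <- data.frame(
  jobID = "sales",
  salary_x_ctrl = 3199.6,
  salary_others = 4000.4,
  paygap = 0.79985
)
rounded <- round_sketch(toy)
rounded
```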
Calculate the diversity index
Description
This function calculates the entropy of a system with discrete states
Usage
diversity(x, prior = NULL)
Arguments
x |
numeric vector, observed probabilities of the classes |
prior |
numeric vector, the prior probabilities of the classes |
Value
the entropy or diversity measure
Examples
x <- c(0.4, 0.6)
diversity(x)
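For intuition, the entropy of a discrete distribution can be computed in a few lines of base R. The sketch below is not the source of diversity(): it ignores the prior argument, and the division by log(n) is an assumption made so that the result lies between 0 and 1, as the gauge plot expects. The helper name entropy_sketch is hypothetical.

```r
# Hypothetical helper: Shannon entropy, normalised to the range [0, 1].
# Assumption: normalisation by log(number of classes); no prior is used.
entropy_sketch <- function(x) {
  p <- x / sum(x)          # accepts counts as well as probabilities
  p <- p[p > 0]            # treat 0 * log(0) as 0
  H <- -sum(p * log(p))    # Shannon entropy in nats
  H / log(length(x))       # 1 = perfectly balanced, 0 = one class only
}

entropy_sketch(c(0.5, 0.5))
entropy_sketch(c(0.4, 0.6))
entropy_sketch(c(1, 0))
```

A perfectly balanced team scores 1 under this normalisation, and the score falls towards 0 as one class dominates.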
Print the paygap object in the terminal
Description
Print the paygap object in the terminal
Usage
## S3 method for class 'paygap'
print(x, ...)
Arguments
x |
paygap object, as created by the function div_paygap() |
... |
arguments passed on to the generic print function: print(x$data) |
Value
text output
Examples
library(div)
div_fake_team() %>%
  div_paygap %>%
  print
Summary of the paygap object
Description
Produces a summary of the paygap object
Usage
## S3 method for class 'paygap'
summary(object, ...)
Arguments
object |
paygap S3 object, as created by the function div_paygap() |
... |
passed on to summary() |
Value
a summary of the paygap object
Examples
library(div)
d <- div_fake_team()
pg <- div_paygap(d)
summary(pg)