Introduction
The DisImpact R package contains functions that help in determining disproportionate impact (DI) based on the following methodologies:
- percentage point gap (PPG) method,
- proportionality index method (method #1 in reference), and
- 80% index method (method #2 in reference).
Install Package
# From CRAN (Official)
install.packages('DisImpact')
# From github (Development)
devtools::install_github('vinhdizzo/DisImpact')
Load Packages
library(DisImpact)
library(dplyr) # Ease in manipulations with data frames
Load toy student equity data
To illustrate the functionality of the package, let’s load a toy data set:
# Load fake data set
data(student_equity)
# Print first few observations
head(student_equity)
## Ethnicity Gender Cohort Transfer Cohort_Math Math Cohort_English
## 1 Native American Female 2017 0 2017 1 2017
## 2 Native American Female 2017 0 2018 1 NA
## 3 Native American Female 2017 0 2018 1 2017
## 4 Native American Male 2017 1 2017 1 2018
## 5 Native American Male 2017 0 2017 1 2019
## 6 Native American Male 2017 1 2019 1 2018
## English Ed_Goal College_Status Student_ID EthnicityFlag_Asian
## 1 0 Deg/Transfer First-time College 100001 0
## 2 NA Deg/Transfer First-time College 100002 0
## 3 0 Deg/Transfer First-time College 100003 0
## 4 1 Other First-time College 100004 0
## 5 0 Deg/Transfer Other 100005 0
## 6 1 Other First-time College 100006 0
## EthnicityFlag_Black EthnicityFlag_Hispanic EthnicityFlag_NativeAmerican
## 1 0 0 1
## 2 0 0 1
## 3 0 0 1
## 4 0 0 1
## 5 0 0 1
## 6 0 0 1
## EthnicityFlag_PacificIslander EthnicityFlag_White EthnicityFlag_Carribean
## 1 0 0 0
## 2 0 0 0
## 3 0 0 0
## 4 0 0 0
## 5 0 0 0
## 6 0 0 0
## EthnicityFlag_EastAsian EthnicityFlag_SouthEastAsian
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
## EthnicityFlag_SouthWestAsianNorthAfrican EthnicityFlag_AANAPI
## 1 0 1
## 2 0 1
## 3 0 1
## 4 0 1
## 5 0 1
## 6 0 1
## EthnicityFlag_Unknown EthnicityFlag_TwoorMoreRaces
## 1 0 0
## 2 0 0
## 3 0 0
## 4 0 0
## 5 0 0
## 6 0 0
# For description of data set
## ?student_equity
For a description of the student_equity data set, type ?student_equity in the R console.
The toy data set can be summarized as follows:
# Summarize toy data
dim(student_equity)
## [1] 20000 24
dSumm <- student_equity %>%
  group_by(Cohort, Ethnicity) %>%
  summarize(n=n(), Transfer_Rate=mean(Transfer))
## `summarise()` has grouped output by 'Cohort'. You can override using the
## `.groups` argument.
dSumm ## This is a summarized version of the data set
## # A tibble: 12 x 4
## # Groups: Cohort [2]
## Cohort Ethnicity n Transfer_Rate
## <int> <chr> <int> <dbl>
## 1 2017 Asian 3000 0.687
## 2 2017 Black 1000 0.31
## 3 2017 Hispanic 2000 0.205
## 4 2017 Multi-Ethnicity 500 0.524
## 5 2017 Native American 100 0.43
## 6 2017 White 3400 0.604
## 7 2018 Asian 3000 0.743
## 8 2018 Black 1000 0.297
## 9 2018 Hispanic 2000 0.218
## 10 2018 Multi-Ethnicity 500 0.484
## 11 2018 Native American 100 0.35
## 12 2018 White 3400 0.631
Percentage point gap (PPG) method
di_ppg is the main work function for this method. It accepts either vectors or, the tidy way, column names from a data frame passed to data:
# Vector
di_ppg(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
## group n success pct reference reference_group moe
## 1 Asian 6000 4292 0.7153333 0.5264 overall 0.03000000
## 2 Black 2000 607 0.3035000 0.5264 overall 0.03000000
## 3 Hispanic 4000 847 0.2117500 0.5264 overall 0.03000000
## 4 Multi-Ethnicity 1000 504 0.5040000 0.5264 overall 0.03099032
## 5 Native American 200 78 0.3900000 0.5264 overall 0.06929646
## 6 White 6800 4200 0.6176471 0.5264 overall 0.03000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333 0 0
## 2 0.2735000 0.3335000 1 386
## 3 0.1817500 0.2417500 1 1139
## 4 0.4730097 0.5349903 0 0
## 5 0.3207035 0.4592965 1 14
## 6 0.5876471 0.6476471 0 0
## success_needed_full_parity
## 1 0
## 2 446
## 3 1259
## 4 23
## 5 28
## 6 0
# Tidy and column reference
di_ppg(success=Transfer, group=Ethnicity, data=student_equity) %>%
as.data.frame
## group n success pct reference reference_group moe
## 1 Asian 6000 4292 0.7153333 0.5264 overall 0.03000000
## 2 Black 2000 607 0.3035000 0.5264 overall 0.03000000
## 3 Hispanic 4000 847 0.2117500 0.5264 overall 0.03000000
## 4 Multi-Ethnicity 1000 504 0.5040000 0.5264 overall 0.03099032
## 5 Native American 200 78 0.3900000 0.5264 overall 0.06929646
## 6 White 6800 4200 0.6176471 0.5264 overall 0.03000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333 0 0
## 2 0.2735000 0.3335000 1 386
## 3 0.1817500 0.2417500 1 1139
## 4 0.4730097 0.5349903 0 0
## 5 0.3207035 0.4592965 1 14
## 6 0.5876471 0.6476471 0 0
## success_needed_full_parity
## 1 0
## 2 446
## 3 1259
## 4 23
## 5 28
## 6 0
For a description of the di_ppg function, including both function arguments and returned results, type ?di_ppg in the R console.
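The columns in the output above are related in a simple way. The following base R sketch rebuilds pct, reference, moe, pct_hi, and di_indicator from the group counts printed above. It is an illustration inferred from that output, not the package's internal code: the flagging rule (upper bound below the reference) is an assumption that happens to reproduce the printed di_indicator column, and the MOE line assumes the defaults of 0.50 as the proportion and 0.03 as the minimum.

```r
# Group counts copied from the di_ppg() output above
group   <- c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
             "Native American", "White")
n       <- c(6000, 2000, 4000, 1000, 200, 6800)
success <- c(4292, 607, 847, 504, 78, 4200)

pct       <- success / n
reference <- sum(success) / sum(n)   # overall success rate

# Margin of error using 0.50 as the proportion, floored at 0.03
moe    <- pmax(1.96 * sqrt(0.5 * 0.5 / n), 0.03)
pct_hi <- pct + moe

# Assumed rule: flag a group when even the upper bound of its
# interval falls below the reference rate
di_indicator <- as.integer(pct_hi < reference)

data.frame(group, pct, reference, moe, pct_hi, di_indicator)
```

Under this rule, Multi-Ethnicity is not flagged even though its rate (0.504) is below the overall rate (0.5264), because its upper bound (about 0.535) still reaches the reference.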
Sometimes, one might want to break out the DI calculation by cohort:
# Cohort
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.5140 overall
## 2 2017 Black 1000 310 0.3100000 0.5140 overall
## 3 2017 Hispanic 2000 410 0.2050000 0.5140 overall
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.5140 overall
## 5 2017 Native American 100 43 0.4300000 0.5140 overall
## 6 2017 White 3400 2053 0.6038235 0.5140 overall
## 7 2018 Asian 3000 2230 0.7433333 0.5388 overall
## 8 2018 Black 1000 297 0.2970000 0.5388 overall
## 9 2018 Hispanic 2000 437 0.2185000 0.5388 overall
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.5388 overall
## 11 2018 Native American 100 35 0.3500000 0.5388 overall
## 12 2018 White 3400 2147 0.6314706 0.5388 overall
## moe pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.03000000 0.6573333 0.7173333 0 0
## 2 0.03099032 0.2790097 0.3409903 1 174
## 3 0.03000000 0.1750000 0.2350000 1 558
## 4 0.04382693 0.4801731 0.5678269 0 0
## 5 0.09800000 0.3320000 0.5280000 0 0
## 6 0.03000000 0.5738235 0.6338235 0 0
## 7 0.03000000 0.7133333 0.7733333 0 0
## 8 0.03099032 0.2660097 0.3279903 1 211
## 9 0.03000000 0.1885000 0.2485000 1 581
## 10 0.04382693 0.4401731 0.5278269 1 6
## 11 0.09800000 0.2520000 0.4480000 1 10
## 12 0.03000000 0.6014706 0.6614706 0 0
## success_needed_full_parity
## 1 0
## 2 205
## 3 619
## 4 0
## 5 9
## 6 0
## 7 0
## 8 242
## 9 641
## 10 28
## 11 19
## 12 0
di_ppg also works on summarized data: pass the success counts to success and the group sizes to weight. For example, the following uses the summarized data set dSumm, with the group size n as the weight:
di_ppg(success=Transfer_Rate*n, group=Ethnicity, cohort=Cohort, weight=n, data=dSumm) %>%
as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.5140 overall
## 2 2017 Black 1000 310 0.3100000 0.5140 overall
## 3 2017 Hispanic 2000 410 0.2050000 0.5140 overall
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.5140 overall
## 5 2017 Native American 100 43 0.4300000 0.5140 overall
## 6 2017 White 3400 2053 0.6038235 0.5140 overall
## 7 2018 Asian 3000 2230 0.7433333 0.5388 overall
## 8 2018 Black 1000 297 0.2970000 0.5388 overall
## 9 2018 Hispanic 2000 437 0.2185000 0.5388 overall
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.5388 overall
## 11 2018 Native American 100 35 0.3500000 0.5388 overall
## 12 2018 White 3400 2147 0.6314706 0.5388 overall
## moe pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.03000000 0.6573333 0.7173333 0 0
## 2 0.03099032 0.2790097 0.3409903 1 174
## 3 0.03000000 0.1750000 0.2350000 1 558
## 4 0.04382693 0.4801731 0.5678269 0 0
## 5 0.09800000 0.3320000 0.5280000 0 0
## 6 0.03000000 0.5738235 0.6338235 0 0
## 7 0.03000000 0.7133333 0.7733333 0 0
## 8 0.03099032 0.2660097 0.3279903 1 211
## 9 0.03000000 0.1885000 0.2485000 1 581
## 10 0.04382693 0.4401731 0.5278269 1 6
## 11 0.09800000 0.2520000 0.4480000 1 10
## 12 0.03000000 0.6014706 0.6614706 0 0
## success_needed_full_parity
## 1 0
## 2 205
## 3 619
## 4 0
## 5 9
## 6 0
## 7 0
## 8 242
## 9 641
## 10 28
## 11 19
## 12 0
By default, di_ppg uses the overall success rate as the reference rate for comparison (default: reference='overall'). The reference argument also accepts 'hpg' (highest performing group success rate as the reference rate), 'all but current' (success rate of all groups combined excluding the comparison group), or a group value from group.
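To make the reference options concrete, here is a small base R sketch that computes each kind of reference rate by hand from the 2017 cohort counts shown earlier. It only illustrates the arithmetic behind the descriptions; the variable names (ref_overall, ref_hpg, ref_abc, ref_white) are made up for this example and are not part of the package.

```r
# 2017 cohort counts copied from the cohort output shown earlier
group   <- c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
             "Native American", "White")
n       <- c(3000, 1000, 2000, 500, 100, 3400)
success <- c(2062, 310, 410, 262, 43, 2053)
pct     <- success / n

ref_overall <- sum(success) / sum(n)                    # 'overall'
ref_hpg     <- max(pct)                                 # 'hpg'
ref_abc     <- (sum(success) - success) / (sum(n) - n)  # 'all but current'
ref_white   <- pct[group == "White"]                    # a group value

round(c(overall = ref_overall, hpg = ref_hpg, White = ref_white), 4)
round(setNames(ref_abc, group), 4)
```

Note that 'all but current' yields a different reference rate for each group, since each group is removed from the pool it is compared against.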
# Reference: Highest performing group
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='hpg', data=student_equity) %>%
as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.6873333 Asian
## 2 2017 Black 1000 310 0.3100000 0.6873333 Asian
## 3 2017 Hispanic 2000 410 0.2050000 0.6873333 Asian
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.6873333 Asian
## 5 2017 Native American 100 43 0.4300000 0.6873333 Asian
## 6 2017 White 3400 2053 0.6038235 0.6873333 Asian
## 7 2018 Asian 3000 2230 0.7433333 0.7433333 Asian
## 8 2018 Black 1000 297 0.2970000 0.7433333 Asian
## 9 2018 Hispanic 2000 437 0.2185000 0.7433333 Asian
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.7433333 Asian
## 11 2018 Native American 100 35 0.3500000 0.7433333 Asian
## 12 2018 White 3400 2147 0.6314706 0.7433333 Asian
## moe pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.03000000 0.6573333 0.7173333 0 0
## 2 0.03099032 0.2790097 0.3409903 1 347
## 3 0.03000000 0.1750000 0.2350000 1 905
## 4 0.04382693 0.4801731 0.5678269 1 60
## 5 0.09800000 0.3320000 0.5280000 1 16
## 6 0.03000000 0.5738235 0.6338235 1 182
## 7 0.03000000 0.7133333 0.7733333 0 0
## 8 0.03099032 0.2660097 0.3279903 1 416
## 9 0.03000000 0.1885000 0.2485000 1 990
## 10 0.04382693 0.4401731 0.5278269 1 108
## 11 0.09800000 0.2520000 0.4480000 1 30
## 12 0.03000000 0.6014706 0.6614706 1 279
## success_needed_full_parity
## 1 0
## 2 378
## 3 965
## 4 82
## 5 26
## 6 284
## 7 0
## 8 447
## 9 1050
## 10 130
## 11 40
## 12 381
# Reference: All but current (PPG minus 1)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='all but current', data=student_equity) %>%
as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.4397143 all but current
## 2 2017 Black 1000 310 0.3100000 0.5366667 all but current
## 3 2017 Hispanic 2000 410 0.2050000 0.5912500 all but current
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.5134737 all but current
## 5 2017 Native American 100 43 0.4300000 0.5148485 all but current
## 6 2017 White 3400 2053 0.6038235 0.4677273 all but current
## 7 2018 Asian 3000 2230 0.7433333 0.4511429 all but current
## 8 2018 Black 1000 297 0.2970000 0.5656667 all but current
## 9 2018 Hispanic 2000 437 0.2185000 0.6188750 all but current
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.5416842 all but current
## 11 2018 Native American 100 35 0.3500000 0.5407071 all but current
## 12 2018 White 3400 2147 0.6314706 0.4910606 all but current
## moe pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.03000000 0.6573333 0.7173333 0 0
## 2 0.03099032 0.2790097 0.3409903 1 196
## 3 0.03000000 0.1750000 0.2350000 1 713
## 4 0.04382693 0.4801731 0.5678269 0 0
## 5 0.09800000 0.3320000 0.5280000 0 0
## 6 0.03000000 0.5738235 0.6338235 0 0
## 7 0.03000000 0.7133333 0.7733333 0 0
## 8 0.03099032 0.2660097 0.3279903 1 238
## 9 0.03000000 0.1885000 0.2485000 1 741
## 10 0.04382693 0.4401731 0.5278269 1 7
## 11 0.09800000 0.2520000 0.4480000 1 10
## 12 0.03000000 0.6014706 0.6614706 0 0
## success_needed_full_parity
## 1 0
## 2 227
## 3 773
## 4 0
## 5 9
## 6 0
## 7 0
## 8 269
## 9 801
## 10 29
## 11 20
## 12 0
# Reference: custom group
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='White', data=student_equity) %>%
as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.6038235 White
## 2 2017 Black 1000 310 0.3100000 0.6038235 White
## 3 2017 Hispanic 2000 410 0.2050000 0.6038235 White
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.6038235 White
## 5 2017 Native American 100 43 0.4300000 0.6038235 White
## 6 2017 White 3400 2053 0.6038235 0.6038235 White
## 7 2018 Asian 3000 2230 0.7433333 0.6314706 White
## 8 2018 Black 1000 297 0.2970000 0.6314706 White
## 9 2018 Hispanic 2000 437 0.2185000 0.6314706 White
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.6314706 White
## 11 2018 Native American 100 35 0.3500000 0.6314706 White
## 12 2018 White 3400 2147 0.6314706 0.6314706 White
## moe pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.03000000 0.6573333 0.7173333 0 0
## 2 0.03099032 0.2790097 0.3409903 1 263
## 3 0.03000000 0.1750000 0.2350000 1 738
## 4 0.04382693 0.4801731 0.5678269 1 18
## 5 0.09800000 0.3320000 0.5280000 1 8
## 6 0.03000000 0.5738235 0.6338235 0 0
## 7 0.03000000 0.7133333 0.7733333 0 0
## 8 0.03099032 0.2660097 0.3279903 1 304
## 9 0.03000000 0.1885000 0.2485000 1 766
## 10 0.04382693 0.4401731 0.5278269 1 52
## 11 0.09800000 0.2520000 0.4480000 1 19
## 12 0.03000000 0.6014706 0.6614706 0 0
## success_needed_full_parity
## 1 0
## 2 294
## 3 798
## 4 40
## 5 18
## 6 0
## 7 0
## 8 335
## 9 826
## 10 74
## 11 29
## 12 0
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='Asian', data=student_equity) %>%
as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.6873333 Asian
## 2 2017 Black 1000 310 0.3100000 0.6873333 Asian
## 3 2017 Hispanic 2000 410 0.2050000 0.6873333 Asian
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.6873333 Asian
## 5 2017 Native American 100 43 0.4300000 0.6873333 Asian
## 6 2017 White 3400 2053 0.6038235 0.6873333 Asian
## 7 2018 Asian 3000 2230 0.7433333 0.7433333 Asian
## 8 2018 Black 1000 297 0.2970000 0.7433333 Asian
## 9 2018 Hispanic 2000 437 0.2185000 0.7433333 Asian
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.7433333 Asian
## 11 2018 Native American 100 35 0.3500000 0.7433333 Asian
## 12 2018 White 3400 2147 0.6314706 0.7433333 Asian
## moe pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.03000000 0.6573333 0.7173333 0 0
## 2 0.03099032 0.2790097 0.3409903 1 347
## 3 0.03000000 0.1750000 0.2350000 1 905
## 4 0.04382693 0.4801731 0.5678269 1 60
## 5 0.09800000 0.3320000 0.5280000 1 16
## 6 0.03000000 0.5738235 0.6338235 1 182
## 7 0.03000000 0.7133333 0.7733333 0 0
## 8 0.03099032 0.2660097 0.3279903 1 416
## 9 0.03000000 0.1885000 0.2485000 1 990
## 10 0.04382693 0.4401731 0.5278269 1 108
## 11 0.09800000 0.2520000 0.4480000 1 30
## 12 0.03000000 0.6014706 0.6614706 1 279
## success_needed_full_parity
## 1 0
## 2 378
## 3 965
## 4 82
## 5 26
## 6 284
## 7 0
## 8 447
## 9 1050
## 10 130
## 11 40
## 12 381
The user could also pass in custom reference points for comparison (e.g., a state-wide rate). di_ppg accepts either a single reference point or a vector of reference points, one for each cohort. For the latter, the reference points are matched to the values of the cohort variable in alphabetical order.
# With custom reference (single)
di_ppg(success=Transfer, group=Ethnicity, reference=0.54, data=student_equity) %>%
as.data.frame
## group n success pct reference reference_group moe
## 1 Asian 6000 4292 0.7153333 0.54 numeric 0.03000000
## 2 Black 2000 607 0.3035000 0.54 numeric 0.03000000
## 3 Hispanic 4000 847 0.2117500 0.54 numeric 0.03000000
## 4 Multi-Ethnicity 1000 504 0.5040000 0.54 numeric 0.03099032
## 5 Native American 200 78 0.3900000 0.54 numeric 0.06929646
## 6 White 6800 4200 0.6176471 0.54 numeric 0.03000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333 0 0
## 2 0.2735000 0.3335000 1 414
## 3 0.1817500 0.2417500 1 1193
## 4 0.4730097 0.5349903 1 6
## 5 0.3207035 0.4592965 1 17
## 6 0.5876471 0.6476471 0 0
## success_needed_full_parity
## 1 0
## 2 474
## 3 1314
## 4 37
## 5 31
## 6 0
# With custom reference (multiple)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference=c(0.5, 0.55), data=student_equity) %>%
as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.50 numeric
## 2 2017 Black 1000 310 0.3100000 0.50 numeric
## 3 2017 Hispanic 2000 410 0.2050000 0.50 numeric
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.50 numeric
## 5 2017 Native American 100 43 0.4300000 0.50 numeric
## 6 2017 White 3400 2053 0.6038235 0.50 numeric
## 7 2018 Asian 3000 2230 0.7433333 0.55 numeric
## 8 2018 Black 1000 297 0.2970000 0.55 numeric
## 9 2018 Hispanic 2000 437 0.2185000 0.55 numeric
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.55 numeric
## 11 2018 Native American 100 35 0.3500000 0.55 numeric
## 12 2018 White 3400 2147 0.6314706 0.55 numeric
## moe pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.03000000 0.6573333 0.7173333 0 0
## 2 0.03099032 0.2790097 0.3409903 1 160
## 3 0.03000000 0.1750000 0.2350000 1 530
## 4 0.04382693 0.4801731 0.5678269 0 0
## 5 0.09800000 0.3320000 0.5280000 0 0
## 6 0.03000000 0.5738235 0.6338235 0 0
## 7 0.03000000 0.7133333 0.7733333 0 0
## 8 0.03099032 0.2660097 0.3279903 1 223
## 9 0.03000000 0.1885000 0.2485000 1 604
## 10 0.04382693 0.4401731 0.5278269 1 12
## 11 0.09800000 0.2520000 0.4480000 1 11
## 12 0.03000000 0.6014706 0.6614706 0 0
## success_needed_full_parity
## 1 0
## 2 190
## 3 591
## 4 0
## 5 8
## 6 0
## 7 0
## 8 254
## 9 663
## 10 34
## 11 21
## 12 0
Disproportionate impact using the PPG relies on calculating the margin of error (MOE) around the success rate. The MOE calculated in di_ppg has 2 underlying assumptions (defaults):
- the minimum MOE returned is 0.03, and
- 0.50 is used as the proportion \(\hat{p}\) in the margin of error formula, \(1.96 \times \sqrt{\hat{p} (1-\hat{p}) / n}\).
To override the first default, the user could specify min_moe in di_ppg. To override the second, the user could specify use_prop_in_moe=TRUE in di_ppg, which uses each group's observed success rate as the proportion.
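A minimal sketch of the formula and the two defaults described above, written in base R. The helper moe_sketch is hypothetical and exists only for this illustration; it is not the package's ppg_moe function. The group sizes and success counts are taken from the earlier output.

```r
# Hypothetical helper for illustration only
moe_sketch <- function(n, proportion = 0.5, min_moe = 0.03) {
  # 1.96 * sqrt(p (1 - p) / n), never smaller than min_moe
  pmax(1.96 * sqrt(proportion * (1 - proportion) / n), min_moe)
}

n   <- c(Black = 2000, `Multi-Ethnicity` = 1000, `Native American` = 200)
pct <- c(607, 504, 78) / n   # observed success rates

moe_sketch(n)                                    # defaults
moe_sketch(n, min_moe = 0.02)                    # lower floor
moe_sketch(n, proportion = pct, min_moe = 0.02)  # observed proportions
```

Because \(\hat{p}(1-\hat{p})\) is largest at 0.50, using the observed proportion can only shrink the MOE, and the floor only matters for larger groups whose computed MOE falls below it.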
# min_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02) %>%
as.data.frame
## group n success pct reference reference_group moe
## 1 Asian 6000 4292 0.7153333 0.5264 overall 0.02000000
## 2 Black 2000 607 0.3035000 0.5264 overall 0.02191347
## 3 Hispanic 4000 847 0.2117500 0.5264 overall 0.02000000
## 4 Multi-Ethnicity 1000 504 0.5040000 0.5264 overall 0.03099032
## 5 Native American 200 78 0.3900000 0.5264 overall 0.06929646
## 6 White 6800 4200 0.6176471 0.5264 overall 0.02000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.6953333 0.7353333 0 0
## 2 0.2815865 0.3254135 1 402
## 3 0.1917500 0.2317500 1 1179
## 4 0.4730097 0.5349903 0 0
## 5 0.3207035 0.4592965 1 14
## 6 0.5976471 0.6376471 0 0
## success_needed_full_parity
## 1 0
## 2 446
## 3 1259
## 4 23
## 5 28
## 6 0
# use_prop_in_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02, use_prop_in_moe=TRUE) %>%
as.data.frame
## group n success pct reference reference_group moe
## 1 Asian 6000 4292 0.7153333 0.5264 overall 0.02000000
## 2 Black 2000 607 0.3035000 0.5264 overall 0.02015028
## 3 Hispanic 4000 847 0.2117500 0.5264 overall 0.02000000
## 4 Multi-Ethnicity 1000 504 0.5040000 0.5264 overall 0.03098933
## 5 Native American 200 78 0.3900000 0.5264 overall 0.06759869
## 6 White 6800 4200 0.6176471 0.5264 overall 0.02000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.6953333 0.7353333 0 0
## 2 0.2833497 0.3236503 1 406
## 3 0.1917500 0.2317500 1 1179
## 4 0.4730107 0.5349893 0 0
## 5 0.3224013 0.4575987 1 14
## 6 0.5976471 0.6376471 0 0
## success_needed_full_parity
## 1 0
## 2 446
## 3 1259
## 4 23
## 5 28
## 6 0
In cases where the proportion is used in calculating MOE, an observed proportion of 0 or 1 would lead to a zero MOE. To account for these scenarios, the user could leverage the prop_sub_0 and prop_sub_1 parameters in di_ppg and ppg_moe as substitutes. These parameters default to 0.5, which maximizes the MOE (making it more difficult to declare disproportionate impact).
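The effect of the substitution can be seen with a few lines of base R. This sketch assumes the substitute is simply swapped in for the observed proportion inside the MOE formula, and that the 0.03 minimum still applies; the variable names mirror the parameter names but the code is an illustration, not the package's implementation.

```r
n          <- 200   # size of a group in which nobody succeeded
proportion <- 0     # observed success rate
prop_sub_0 <- 0.1   # substitute used only inside the MOE formula

# Without a substitute, the MOE collapses to zero
moe_raw <- 1.96 * sqrt(proportion * (1 - proportion) / n)

# With the substitute, the MOE is positive again
p_used  <- ifelse(proportion == 0, prop_sub_0, proportion)
moe_sub <- max(1.96 * sqrt(p_used * (1 - p_used) / n), 0.03)

c(moe_raw = moe_raw, moe_sub = moe_sub)
```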
# Set Native American to have zero transfers and see what the results are
di_ppg(success=Transfer, group=Ethnicity, data=student_equity %>% mutate(Transfer=ifelse(Ethnicity=='Native American', 0, Transfer)), use_prop_in_moe=TRUE, prop_sub_0=0.1, prop_sub_1=0.9) %>%
as.data.frame
## Warning in ppg_moe(n = n, proportion = pct, min_moe = min_moe, prop_sub_0 =
## prop_sub_0, : The vector `proportion` contains 0. This will lead to a zero MOE.
## `prop_sub_0=0.1` will be used in calculating the MOE for these cases.
## group n success pct reference reference_group moe
## 1 Asian 6000 4292 0.7153333 0.5225 overall 0.03000000
## 2 Black 2000 607 0.3035000 0.5225 overall 0.03000000
## 3 Hispanic 4000 847 0.2117500 0.5225 overall 0.03000000
## 4 Multi-Ethnicity 1000 504 0.5040000 0.5225 overall 0.03098933
## 5 Native American 200 0 0.0000000 0.5225 overall 0.04157788
## 6 White 6800 4200 0.6176471 0.5225 overall 0.03000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.68533333 0.74533333 0 0
## 2 0.27350000 0.33350000 1 378
## 3 0.18175000 0.24175000 1 1123
## 4 0.47301067 0.53498933 0 0
## 5 -0.04157788 0.04157788 1 97
## 6 0.58764706 0.64764706 0 0
## success_needed_full_parity
## 1 0
## 2 438
## 3 1243
## 4 19
## 5 105
## 6 0
Proportionality index method
di_prop_index is the main work function for this method, and it can take on vectors or column names the tidy way:
# Without cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
## group n success pct_success pct_group di_prop_index di_indicator
## 1 Asian 6000 4292 0.407674772 0.30 1.3589159 0
## 2 Black 2000 607 0.057655775 0.10 0.5765578 1
## 3 Hispanic 4000 847 0.080452128 0.20 0.4022606 1
## 4 Multi-Ethnicity 1000 504 0.047872340 0.05 0.9574468 0
## 5 Native American 200 78 0.007408815 0.01 0.7408815 1
## 6 White 6800 4200 0.398936170 0.34 1.1733417 0
## success_needed_not_di success_needed_full_parity
## 1 0 0
## 2 256 496
## 3 998 1574
## 4 0 24
## 5 7 28
## 6 0 0
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
as.data.frame
## group n success pct_success pct_group di_prop_index di_indicator
## 1 Asian 6000 4292 0.407674772 0.30 1.3589159 0
## 2 Black 2000 607 0.057655775 0.10 0.5765578 1
## 3 Hispanic 4000 847 0.080452128 0.20 0.4022606 1
## 4 Multi-Ethnicity 1000 504 0.047872340 0.05 0.9574468 0
## 5 Native American 200 78 0.007408815 0.01 0.7408815 1
## 6 White 6800 4200 0.398936170 0.34 1.1733417 0
## success_needed_not_di success_needed_full_parity
## 1 0 0
## 2 256 496
## 3 998 1574
## 4 0 24
## 5 7 28
## 6 0 0
# With cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
## cohort group n success pct_success pct_group di_prop_index
## 1 2017 Asian 3000 2062 0.401167315 0.30 1.3372244
## 2 2017 Black 1000 310 0.060311284 0.10 0.6031128
## 3 2017 Hispanic 2000 410 0.079766537 0.20 0.3988327
## 4 2017 Multi-Ethnicity 500 262 0.050972763 0.05 1.0194553
## 5 2017 Native American 100 43 0.008365759 0.01 0.8365759
## 6 2017 White 3400 2053 0.399416342 0.34 1.1747539
## 7 2018 Asian 3000 2230 0.413882702 0.30 1.3796090
## 8 2018 Black 1000 297 0.055122494 0.10 0.5512249
## 9 2018 Hispanic 2000 437 0.081106162 0.20 0.4055308
## 10 2018 Multi-Ethnicity 500 242 0.044914625 0.05 0.8982925
## 11 2018 Native American 100 35 0.006495917 0.01 0.6495917
## 12 2018 White 3400 2147 0.398478099 0.34 1.1719944
## di_indicator success_needed_not_di success_needed_full_parity
## 1 0 0 0
## 2 1 111 227
## 3 1 491 773
## 4 0 0 0
## 5 0 0 9
## 6 0 0 0
## 7 0 0 0
## 8 1 146 269
## 9 1 507 801
## 10 0 0 29
## 11 1 9 20
## 12 0 0 0
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
as.data.frame
## cohort group n success pct_success pct_group di_prop_index
## 1 2017 Asian 3000 2062 0.401167315 0.30 1.3372244
## 2 2017 Black 1000 310 0.060311284 0.10 0.6031128
## 3 2017 Hispanic 2000 410 0.079766537 0.20 0.3988327
## 4 2017 Multi-Ethnicity 500 262 0.050972763 0.05 1.0194553
## 5 2017 Native American 100 43 0.008365759 0.01 0.8365759
## 6 2017 White 3400 2053 0.399416342 0.34 1.1747539
## 7 2018 Asian 3000 2230 0.413882702 0.30 1.3796090
## 8 2018 Black 1000 297 0.055122494 0.10 0.5512249
## 9 2018 Hispanic 2000 437 0.081106162 0.20 0.4055308
## 10 2018 Multi-Ethnicity 500 242 0.044914625 0.05 0.8982925
## 11 2018 Native American 100 35 0.006495917 0.01 0.6495917
## 12 2018 White 3400 2147 0.398478099 0.34 1.1719944
## di_indicator success_needed_not_di success_needed_full_parity
## 1 0 0 0
## 2 1 111 227
## 3 1 491 773
## 4 0 0 0
## 5 0 0 9
## 6 0 0 0
## 7 0 0 0
## 8 1 146 269
## 9 1 507 801
## 10 0 0 29
## 11 1 9 20
## 12 0 0 0
For a description of the di_prop_index function, including both function arguments and returned results, type ?di_prop_index in the R console.
Note that the referenced document describing this method does not recommend a threshold on the proportionality index for declaring disproportionate impact. The di_prop_index function uses di_prop_index_cutoff=0.8 as the default threshold, which the user could change.
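Judging from the pct_success, pct_group, and di_prop_index columns printed above, the index is the ratio of a group's share of all successes to its share of the population. The base R sketch below reproduces those columns from the group counts and applies the cutoff; it is an illustration under that reading, and the variable cutoff is a stand-in for di_prop_index_cutoff.

```r
# Group counts copied from the di_prop_index() output above
group   <- c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
             "Native American", "White")
n       <- c(6000, 2000, 4000, 1000, 200, 6800)
success <- c(4292, 607, 847, 504, 78, 4200)

pct_success   <- success / sum(success)  # share of all successes
pct_group     <- n / sum(n)              # share of the population
di_prop_index <- pct_success / pct_group

cutoff       <- 0.8                      # stand-in for di_prop_index_cutoff
di_indicator <- as.integer(di_prop_index < cutoff)

data.frame(group, pct_success, pct_group, di_prop_index, di_indicator)
```

An index of 1 means a group's share of successes matches its share of the population; lowering the cutoff flags fewer groups.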
# Changing threshold for DI
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, di_prop_index_cutoff=0.5) %>% as.data.frame
## cohort group n success pct_success pct_group di_prop_index
## 1 2017 Asian 3000 2062 0.401167315 0.30 1.3372244
## 2 2017 Black 1000 310 0.060311284 0.10 0.6031128
## 3 2017 Hispanic 2000 410 0.079766537 0.20 0.3988327
## 4 2017 Multi-Ethnicity 500 262 0.050972763 0.05 1.0194553
## 5 2017 Native American 100 43 0.008365759 0.01 0.8365759
## 6 2017 White 3400 2053 0.399416342 0.34 1.1747539
## 7 2018 Asian 3000 2230 0.413882702 0.30 1.3796090
## 8 2018 Black 1000 297 0.055122494 0.10 0.5512249
## 9 2018 Hispanic 2000 437 0.081106162 0.20 0.4055308
## 10 2018 Multi-Ethnicity 500 242 0.044914625 0.05 0.8982925
## 11 2018 Native American 100 35 0.006495917 0.01 0.6495917
## 12 2018 White 3400 2147 0.398478099 0.34 1.1719944
## di_indicator success_needed_not_di success_needed_full_parity
## 1 0 0 0
## 2 0 0 227
## 3 1 116 773
## 4 0 0 0
## 5 0 0 9
## 6 0 0 0
## 7 0 0 0
## 8 0 0 269
## 9 1 114 801
## 10 0 0 29
## 11 0 0 20
## 12 0 0 0
80% index method
di_80_index is the main work function for this method, and it can take on vectors or column names the tidy way:
# Without cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
## group n success pct reference reference_group di_80_index
## 1 Asian 6000 4292 0.7153333 0.7153333 Asian 1.0000000
## 2 Black 2000 607 0.3035000 0.7153333 Asian 0.4242777
## 3 Hispanic 4000 847 0.2117500 0.7153333 Asian 0.2960158
## 4 Multi-Ethnicity 1000 504 0.5040000 0.7153333 Asian 0.7045666
## 5 Native American 200 78 0.3900000 0.7153333 Asian 0.5452004
## 6 White 6800 4200 0.6176471 0.7153333 Asian 0.8634395
## di_indicator success_needed_not_di success_needed_full_parity
## 1 0 0 0
## 2 1 538 824
## 3 1 1443 2015
## 4 1 69 212
## 5 1 37 66
## 6 0 0 665
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
as.data.frame
## group n success pct reference reference_group di_80_index
## 1 Asian 6000 4292 0.7153333 0.7153333 Asian 1.0000000
## 2 Black 2000 607 0.3035000 0.7153333 Asian 0.4242777
## 3 Hispanic 4000 847 0.2117500 0.7153333 Asian 0.2960158
## 4 Multi-Ethnicity 1000 504 0.5040000 0.7153333 Asian 0.7045666
## 5 Native American 200 78 0.3900000 0.7153333 Asian 0.5452004
## 6 White 6800 4200 0.6176471 0.7153333 Asian 0.8634395
## di_indicator success_needed_not_di success_needed_full_parity
## 1 0 0 0
## 2 1 538 824
## 3 1 1443 2015
## 4 1 69 212
## 5 1 37 66
## 6 0 0 665
# With cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.6873333 Asian
## 2 2017 Black 1000 310 0.3100000 0.6873333 Asian
## 3 2017 Hispanic 2000 410 0.2050000 0.6873333 Asian
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.6873333 Asian
## 5 2017 Native American 100 43 0.4300000 0.6873333 Asian
## 6 2017 White 3400 2053 0.6038235 0.6873333 Asian
## 7 2018 Asian 3000 2230 0.7433333 0.7433333 Asian
## 8 2018 Black 1000 297 0.2970000 0.7433333 Asian
## 9 2018 Hispanic 2000 437 0.2185000 0.7433333 Asian
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.7433333 Asian
## 11 2018 Native American 100 35 0.3500000 0.7433333 Asian
## 12 2018 White 3400 2147 0.6314706 0.7433333 Asian
## di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1 1.0000000 0 0 0
## 2 0.4510184 1 240 378
## 3 0.2982541 1 690 965
## 4 0.7623666 1 13 82
## 5 0.6256062 1 12 26
## 6 0.8785017 0 0 284
## 7 1.0000000 0 0 0
## 8 0.3995516 1 298 447
## 9 0.2939462 1 753 1050
## 10 0.6511211 1 56 130
## 11 0.4708520 1 25 40
## 12 0.8495120 0 0 381
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.6873333 Asian
## 2 2017 Black 1000 310 0.3100000 0.6873333 Asian
## 3 2017 Hispanic 2000 410 0.2050000 0.6873333 Asian
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.6873333 Asian
## 5 2017 Native American 100 43 0.4300000 0.6873333 Asian
## 6 2017 White 3400 2053 0.6038235 0.6873333 Asian
## 7 2018 Asian 3000 2230 0.7433333 0.7433333 Asian
## 8 2018 Black 1000 297 0.2970000 0.7433333 Asian
## 9 2018 Hispanic 2000 437 0.2185000 0.7433333 Asian
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.7433333 Asian
## 11 2018 Native American 100 35 0.3500000 0.7433333 Asian
## 12 2018 White 3400 2147 0.6314706 0.7433333 Asian
## di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1 1.0000000 0 0 0
## 2 0.4510184 1 240 378
## 3 0.2982541 1 690 965
## 4 0.7623666 1 13 82
## 5 0.6256062 1 12 26
## 6 0.8785017 0 0 284
## 7 1.0000000 0 0 0
## 8 0.3995516 1 298 447
## 9 0.2939462 1 753 1050
## 10 0.6511211 1 56 130
## 11 0.4708520 1 25 40
## 12 0.8495120 0 0 381
For a description of the di_80_index function, including both function arguments and returned results, type ?di_80_index in the R console.
By default, di_80_index uses the group with the highest success rate as the reference in calculating the index. One could specify the comparison group using the reference_group argument (a value from group).
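The index in the output above appears to be each group's success rate divided by the reference group's rate. The following base R sketch reproduces the di_80_index and di_indicator columns from the group counts under that reading. It is an illustration rather than the package's code, and the 0.80 threshold used for flagging is taken from the method's name.

```r
# Group counts copied from the di_80_index() output above
group   <- c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
             "Native American", "White")
n       <- c(6000, 2000, 4000, 1000, 200, 6800)
success <- c(4292, 607, 847, 504, 78, 4200)
pct     <- success / n

# Default: compare against the highest-performing group
index_hpg <- pct / max(pct)

# Alternative: compare against a chosen group, here 'White'
index_white <- pct / pct[group == "White"]

di_indicator <- as.integer(index_hpg < 0.80)

data.frame(group, pct, index_hpg, index_white, di_indicator)
```

With a reference group other than the highest performer, the index can exceed 1 for groups that outperform the reference.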
# Changing reference group
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, reference_group='White') %>% as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.6038235 White
## 2 2017 Black 1000 310 0.3100000 0.6038235 White
## 3 2017 Hispanic 2000 410 0.2050000 0.6038235 White
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.6038235 White
## 5 2017 Native American 100 43 0.4300000 0.6038235 White
## 6 2017 White 3400 2053 0.6038235 0.6038235 White
## 7 2018 Asian 3000 2230 0.7433333 0.6314706 White
## 8 2018 Black 1000 297 0.2970000 0.6314706 White
## 9 2018 Hispanic 2000 437 0.2185000 0.6314706 White
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.6314706 White
## 11 2018 Native American 100 35 0.3500000 0.6314706 White
## 12 2018 White 3400 2147 0.6314706 0.6314706 White
## di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1 1.1383017 0 0 0
## 2 0.5133950 1 174 294
## 3 0.3395032 1 557 798
## 4 0.8678032 0 0 40
## 5 0.7121286 1 6 18
## 6 1.0000000 0 0 0
## 7 1.1771464 0 0 0
## 8 0.4703307 1 209 335
## 9 0.3460177 1 574 826
## 10 0.7664648 1 11 74
## 11 0.5542618 1 16 29
## 12 1.0000000 0 0 0
By default, di_80_index uses 80% (di_80_index_cutoff=0.80) as the threshold for declaring disproportionate impact. One could override this with another threshold via the di_80_index_cutoff argument.
# Changing threshold for DI
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, di_80_index_cutoff=0.50) %>% as.data.frame
## cohort group n success pct reference reference_group
## 1 2017 Asian 3000 2062 0.6873333 0.6873333 Asian
## 2 2017 Black 1000 310 0.3100000 0.6873333 Asian
## 3 2017 Hispanic 2000 410 0.2050000 0.6873333 Asian
## 4 2017 Multi-Ethnicity 500 262 0.5240000 0.6873333 Asian
## 5 2017 Native American 100 43 0.4300000 0.6873333 Asian
## 6 2017 White 3400 2053 0.6038235 0.6873333 Asian
## 7 2018 Asian 3000 2230 0.7433333 0.7433333 Asian
## 8 2018 Black 1000 297 0.2970000 0.7433333 Asian
## 9 2018 Hispanic 2000 437 0.2185000 0.7433333 Asian
## 10 2018 Multi-Ethnicity 500 242 0.4840000 0.7433333 Asian
## 11 2018 Native American 100 35 0.3500000 0.7433333 Asian
## 12 2018 White 3400 2147 0.6314706 0.7433333 Asian
## di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1 1.0000000 0 0 0
## 2 0.4510184 1 34 378
## 3 0.2982541 1 278 965
## 4 0.7623666 0 0 82
## 5 0.6256062 0 0 26
## 6 0.8785017 0 0 284
## 7 1.0000000 0 0 0
## 8 0.3995516 1 75 447
## 9 0.2939462 1 307 1050
## 10 0.6511211 0 0 130
## 11 0.4708520 1 3 40
## 12 0.8495120 0 0 381
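To see which flags change with the cutoff, the 2017 rows can be recomputed under both thresholds. This is a rough sketch under the assumption that a group is flagged when its index is below the cutoff; it uses only the counts shown in the output, and the object names are illustrative.

```r
# 2017 cohort counts copied from the output above
group   <- c('Asian', 'Black', 'Hispanic', 'Multi-Ethnicity', 'Native American', 'White')
n       <- c(3000, 1000, 2000, 500, 100, 3400)
success <- c(2062, 310, 410, 262, 43, 2053)

pct   <- success / n
index <- pct / max(pct)  # default reference: the group with the highest rate (Asian)

# Flags under the default cutoff and under the lowered cutoff
di_080 <- as.integer(index < 0.80)
di_050 <- as.integer(index < 0.50)

data.frame(group, index=round(index, 7), di_080, di_050)
```

Multi-Ethnicity (index 0.762) and Native American (index 0.626) are flagged at the 0.80 cutoff but not at 0.50, which matches the di_indicator column above.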
When dealing with a non-success variable like drop-out or probation
All methods and functions implemented in the DisImpact package treat
outcomes as positive: 1 is desired over 0 (a higher rate is better; a
lower rate indicates disparity). The choice of the name success in the
functions’ arguments is intentional to remind the user of this.
Suppose we have a variable that indicates something negative (e.g., a
flag for students on academic probation). We could calculate DI on the
converse of it by using the ! (logical negation) operator:
## di_ppg(success=!Probation, group=Ethnicity, data=student_equity) %>%
## as.data.frame ## If there were a Probation variable
di_ppg(success=!Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame ## Illustrating the point with `!`
## group n success pct reference reference_group moe
## 1 Asian 6000 1708 0.2846667 0.4736 overall 0.03000000
## 2 Black 2000 1393 0.6965000 0.4736 overall 0.03000000
## 3 Hispanic 4000 3153 0.7882500 0.4736 overall 0.03000000
## 4 Multi-Ethnicity 1000 496 0.4960000 0.4736 overall 0.03099032
## 5 Native American 200 122 0.6100000 0.4736 overall 0.06929646
## 6 White 6800 2600 0.3823529 0.4736 overall 0.03000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.2546667 0.3146667 1 954
## 2 0.6665000 0.7265000 0 0
## 3 0.7582500 0.8182500 0 0
## 4 0.4650097 0.5269903 0 0
## 5 0.5407035 0.6792965 0 0
## 6 0.3523529 0.4123529 1 417
## success_needed_full_parity
## 1 1134
## 2 0
## 3 0
## 4 0
## 5 0
## 6 621
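The effect of `!` on a 0/1 variable can be seen on a small vector. This sketch uses a hypothetical outcome vector (not from student_equity) to show that negation swaps successes and non-successes, so the negated rate is one minus the original rate.

```r
# Hypothetical 0/1 outcome for five students
transfer <- c(1, 0, 0, 1, 1)

negated <- !transfer  # logical vector: TRUE where the outcome was 0
negated

# The rate of the negated outcome is the complement of the original rate
rate_negated    <- mean(negated)
rate_complement <- 1 - mean(transfer)
all.equal(rate_negated, rate_complement)
```

The same relationship holds in the output above: the Asian group shows 1708 of 6000 with the negated outcome, which is 6000 minus the 4292 transfers reported for that group elsewhere in this vignette.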
Transformations on the fly
We can compute the success, group, and cohort variables on the fly:
# Transform success
a <- sample(0:1, size=nrow(student_equity), replace=TRUE, prob=c(0.95, 0.05))
mean(a)
## [1] 0.05065
di_ppg(success=pmax(Transfer, a), group=Ethnicity, data=student_equity) %>%
as.data.frame
## group n success pct reference reference_group moe
## 1 Asian 6000 4379 0.7298333 0.5504 overall 0.03000000
## 2 Black 2000 683 0.3415000 0.5504 overall 0.03000000
## 3 Hispanic 4000 1002 0.2505000 0.5504 overall 0.03000000
## 4 Multi-Ethnicity 1000 533 0.5330000 0.5504 overall 0.03099032
## 5 Native American 200 86 0.4300000 0.5504 overall 0.06929646
## 6 White 6800 4325 0.6360294 0.5504 overall 0.03000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.6998333 0.7598333 0 0
## 2 0.3115000 0.3715000 1 358
## 3 0.2205000 0.2805000 1 1080
## 4 0.5020097 0.5639903 0 0
## 5 0.3607035 0.4992965 1 11
## 6 0.6060294 0.6660294 0 0
## success_needed_full_parity
## 1 0
## 2 418
## 3 1200
## 4 18
## 5 25
## 6 0
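As a small illustration of what `pmax` does in the call above, the sketch below uses two hypothetical 0/1 vectors. `pmax` takes the element-wise maximum, so the combined outcome is 1 whenever either input is 1.

```r
# Hypothetical 0/1 vectors for four students
transfer <- c(0, 1, 0, 1)
a        <- c(1, 0, 0, 1)

# Element-wise maximum: success if either indicator is 1
combined <- pmax(transfer, a)
combined
```

This is why the success counts in the output above are at least as large as the counts obtained from Transfer alone.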
# Collapse Black and Hispanic
di_ppg(success=Transfer, group=ifelse(Ethnicity %in% c('Black', 'Hispanic'), 'Black/Hispanic', Ethnicity), data=student_equity) %>% as.data.frame
## group n success pct reference reference_group moe
## 1 Asian 6000 4292 0.7153333 0.5264 overall 0.03000000
## 2 Black/Hispanic 6000 1454 0.2423333 0.5264 overall 0.03000000
## 3 Multi-Ethnicity 1000 504 0.5040000 0.5264 overall 0.03099032
## 4 Native American 200 78 0.3900000 0.5264 overall 0.06929646
## 5 White 6800 4200 0.6176471 0.5264 overall 0.03000000
## pct_lo pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333 0 0
## 2 0.2123333 0.2723333 1 1525
## 3 0.4730097 0.5349903 0 0
## 4 0.3207035 0.4592965 1 14
## 5 0.5876471 0.6476471 0 0
## success_needed_full_parity
## 1 0
## 2 1705
## 3 23
## 4 28
## 5 0
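The recoding step can be tried in isolation. The sketch below applies the same `ifelse` expression to a short hypothetical character vector; it assumes the grouping column is stored as character, as the printed group labels suggest.

```r
# Hypothetical character vector of group labels
ethnicity <- c('Asian', 'Black', 'Hispanic', 'White')

# Relabel two groups as one; leave the others unchanged
collapsed <- ifelse(ethnicity %in% c('Black', 'Hispanic'), 'Black/Hispanic', ethnicity)
collapsed

# Tally the new labels
table(collapsed)
```

If a grouping column were a factor rather than character, `ifelse` would return the underlying integer codes for the unchanged values, so wrapping the column in `as.character()` first would be a safer choice.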
Calculate DI for many variables and groups
Users often need to calculate disproportionate impact across many
outcome variables and many disaggregation/group variables. The function
di_iterate allows the user to specify a data set and the variables to
iterate across:
# Multiple group variables
di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups='overall') %>% as.data.frame
## success_variable cohort_variable cohort disaggregation group n
## 1 Transfer Cohort 2017 - None - All 10000
## 2 Transfer Cohort 2017 Ethnicity Asian 3000
## 3 Transfer Cohort 2017 Ethnicity Black 1000
## 4 Transfer Cohort 2017 Ethnicity Hispanic 2000
## 5 Transfer Cohort 2017 Ethnicity Multi-Ethnicity 500
## 6 Transfer Cohort 2017 Ethnicity Native American 100
## 7 Transfer Cohort 2017 Ethnicity White 3400
## 8 Transfer Cohort 2017 Gender Female 4930
## 9 Transfer Cohort 2017 Gender Male 4886
## 10 Transfer Cohort 2017 Gender Other 184
## 11 Transfer Cohort 2018 - None - All 10000
## 12 Transfer Cohort 2018 Ethnicity Asian 3000
## 13 Transfer Cohort 2018 Ethnicity Black 1000
## 14 Transfer Cohort 2018 Ethnicity Hispanic 2000
## 15 Transfer Cohort 2018 Ethnicity Multi-Ethnicity 500
## 16 Transfer Cohort 2018 Ethnicity Native American 100
## 17 Transfer Cohort 2018 Ethnicity White 3400
## 18 Transfer Cohort 2018 Gender Female 4928
## 19 Transfer Cohort 2018 Gender Male 4880
## 20 Transfer Cohort 2018 Gender Other 192
## success pct ppg_reference ppg_reference_group moe pct_lo
## 1 5140 0.5140000 0.5140 overall 0.03000000 0.4840000
## 2 2062 0.6873333 0.5140 overall 0.03000000 0.6573333
## 3 310 0.3100000 0.5140 overall 0.03099032 0.2790097
## 4 410 0.2050000 0.5140 overall 0.03000000 0.1750000
## 5 262 0.5240000 0.5140 overall 0.04382693 0.4801731
## 6 43 0.4300000 0.5140 overall 0.09800000 0.3320000
## 7 2053 0.6038235 0.5140 overall 0.03000000 0.5738235
## 8 2513 0.5097363 0.5140 overall 0.03000000 0.4797363
## 9 2548 0.5214900 0.5140 overall 0.03000000 0.4914900
## 10 79 0.4293478 0.5140 overall 0.07224656 0.3571013
## 11 5388 0.5388000 0.5388 overall 0.03000000 0.5088000
## 12 2230 0.7433333 0.5388 overall 0.03000000 0.7133333
## 13 297 0.2970000 0.5388 overall 0.03099032 0.2660097
## 14 437 0.2185000 0.5388 overall 0.03000000 0.1885000
## 15 242 0.4840000 0.5388 overall 0.04382693 0.4401731
## 16 35 0.3500000 0.5388 overall 0.09800000 0.2520000
## 17 2147 0.6314706 0.5388 overall 0.03000000 0.6014706
## 18 2638 0.5353084 0.5388 overall 0.03000000 0.5053084
## 19 2642 0.5413934 0.5388 overall 0.03000000 0.5113934
## 20 108 0.5625000 0.5388 overall 0.07072541 0.4917746
## pct_hi di_indicator_ppg success_needed_not_di_ppg
## 1 0.5440000 0 0
## 2 0.7173333 0 0
## 3 0.3409903 1 174
## 4 0.2350000 1 558
## 5 0.5678269 0 0
## 6 0.5280000 0 0
## 7 0.6338235 0 0
## 8 0.5397363 0 0
## 9 0.5514900 0 0
## 10 0.5015944 1 3
## 11 0.5688000 0 0
## 12 0.7733333 0 0
## 13 0.3279903 1 211
## 14 0.2485000 1 581
## 15 0.5278269 1 6
## 16 0.4480000 1 10
## 17 0.6614706 0 0
## 18 0.5653084 0 0
## 19 0.5713934 0 0
## 20 0.6332254 0 0
## success_needed_full_parity_ppg di_prop_index di_indicator_prop_index
## 1 0 1.0000000 0
## 2 0 1.3372244 0
## 3 205 0.6031128 1
## 4 619 0.3988327 1
## 5 0 1.0194553 0
## 6 9 0.8365759 0
## 7 0 1.1747539 0
## 8 22 0.9917049 0
## 9 0 1.0145719 0
## 10 16 0.8353071 0
## 11 0 1.0000000 0
## 12 0 1.3796090 0
## 13 242 0.5512249 1
## 14 641 0.4055308 1
## 15 28 0.8982925 0
## 16 19 0.6495917 1
## 17 0 1.1719944 0
## 18 18 0.9935198 0
## 19 0 1.0048134 0
## 20 0 1.0439866 0
## success_needed_not_di_prop_index success_needed_full_parity_prop_index
## 1 0 0
## 2 0 0
## 3 111 227
## 4 491 773
## 5 0 0
## 6 0 9
## 7 0 0
## 8 0 42
## 9 0 0
## 10 0 16
## 11 0 0
## 12 0 0
## 13 146 269
## 14 507 801
## 15 0 29
## 16 9 20
## 17 0 0
## 18 0 34
## 19 0 0
## 20 0 0
## di_80_index_reference_group di_80_index di_indicator_80_index
## 1 - All 1.0000000 0
## 2 Asian 1.0000000 0
## 3 Asian 0.4510184 1
## 4 Asian 0.2982541 1
## 5 Asian 0.7623666 1
## 6 Asian 0.6256062 1
## 7 Asian 0.8785017 0
## 8 Male 0.9774614 0
## 9 Male 1.0000000 0
## 10 Male 0.8233098 0
## 11 - All 1.0000000 0
## 12 Asian 1.0000000 0
## 13 Asian 0.3995516 1
## 14 Asian 0.2939462 1
## 15 Asian 0.6511211 1
## 16 Asian 0.4708520 1
## 17 Asian 0.8495120 0
## 18 Other 0.9516595 0
## 19 Other 0.9624772 0
## 20 Other 1.0000000 0
## success_needed_not_di_80_index success_needed_full_parity_80_index
## 1 0 0
## 2 0 0
## 3 240 378
## 4 690 965
## 5 13 82
## 6 12 26
## 7 0 284
## 8 0 58
## 9 0 0
## 10 0 17
## 11 0 0
## 12 0 0
## 13 298 447
## 14 753 1050
## 15 56 130
## 16 25 40
## 17 0 381
## 18 0 134
## 19 0 103
## 20 0 0
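Because the result carries one indicator column per method, it can be filtered to the groups flagged by any of them. The sketch below is one possible way to do this in base R; it uses a hand-built data frame with four 2017 rows and the three indicator columns copied from the output above, and the column n_methods_flagged is a name introduced here for illustration.

```r
# Four 2017 rows copied from the output above (other columns omitted)
res <- data.frame(
  disaggregation          = c('Ethnicity', 'Ethnicity', 'Gender', 'Gender'),
  group                   = c('Black', 'Multi-Ethnicity', 'Female', 'Other'),
  di_indicator_ppg        = c(1, 0, 0, 1),
  di_indicator_prop_index = c(1, 0, 0, 0),
  di_indicator_80_index   = c(1, 1, 0, 0)
)

# Count how many of the three methods flag each group
indicator_cols <- c('di_indicator_ppg', 'di_indicator_prop_index', 'di_indicator_80_index')
res$n_methods_flagged <- rowSums(res[, indicator_cols])

# Keep groups flagged by at least one method
flagged <- res[res$n_methods_flagged > 0, ]
flagged
```

The three methods do not always agree: in these rows, Multi-Ethnicity is flagged only by the 80% index and Gender = Other only by the PPG method.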
# Multiple group variables and different reference groups
bind_rows(
  di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups='overall')
  , di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups=c('White', 'Male'), include_non_disagg_results=FALSE) # include_non_disagg_results = FALSE: Already have this scenario in Overall run
)
## # A tibble: 38 x 25
## success_variable cohort_variable cohort disaggregation group n success
## <chr> <chr> <int> <chr> <chr> <dbl> <int>
## 1 Transfer Cohort 2017 - None - All 10000 5140
## 2 Transfer Cohort 2017 Ethnicity Asian 3000 2062
## 3 Transfer Cohort 2017 Ethnicity Black 1000 310
## 4 Transfer Cohort 2017 Ethnicity Hispanic 2000 410
## 5 Transfer Cohort 2017 Ethnicity Multi-E~ 500 262
## 6 Transfer Cohort 2017 Ethnicity Native ~ 100 43
## 7 Transfer Cohort 2017 Ethnicity White 3400 2053
## 8 Transfer Cohort 2017 Gender Female 4930 2513
## 9 Transfer Cohort 2017 Gender Male 4886 2548
## 10 Transfer Cohort 2017 Gender Other 184 79
## # ... with 28 more rows, and 18 more variables: pct <dbl>, ppg_reference <dbl>,
## # ppg_reference_group <chr>, moe <dbl>, pct_lo <dbl>, pct_hi <dbl>,
## # di_indicator_ppg <dbl>, success_needed_not_di_ppg <dbl>,
## # success_needed_full_parity_ppg <dbl>, di_prop_index <dbl>,
## # di_indicator_prop_index <dbl>, success_needed_not_di_prop_index <dbl>,
## # success_needed_full_parity_prop_index <dbl>,
## # di_80_index_reference_group <chr>, di_80_index <dbl>, ...
There is a separate vignette that explains how one might leverage
di_iterate for rapid dashboard development and deployment with
disaggregation and disproportionate impact features.
Appendix: R and R Package Versions
This vignette was generated using an R session with the following packages. Readers replicating the code may see some discrepancies caused by version mismatches.
sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=C
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] forcats_0.5.0 scales_1.1.1 ggplot2_3.3.2 stringr_1.4.0
## [5] knitr_1.39 dplyr_1.0.8 DisImpact_0.0.21
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.8.3 highr_0.9 pillar_1.7.0 bslib_0.3.1
## [5] compiler_4.0.2 jquerylib_0.1.4 sets_1.0-21 prettydoc_0.4.1
## [9] tools_4.0.2 digest_0.6.25 gtable_0.3.0 jsonlite_1.7.0
## [13] evaluate_0.15 lifecycle_1.0.1 tibble_3.1.6 fstcore_0.9.12
## [17] pkgconfig_2.0.3 rlang_1.0.1 DBI_1.1.0 cli_3.2.0
## [21] parallel_4.0.2 yaml_2.3.5 xfun_0.30 fastmap_1.1.0
## [25] withr_2.5.0 duckdb_0.5.0 generics_0.1.2 vctrs_0.3.8
## [29] sass_0.4.1 grid_4.0.2 tidyselect_1.1.2 data.table_1.14.3
## [33] glue_1.6.1 R6_2.3.0 fansi_1.0.2 rmarkdown_2.14
## [37] farver_2.0.3 tidyr_1.2.0 purrr_0.3.4 blob_1.2.1
## [41] magrittr_2.0.2 htmltools_0.5.2 ellipsis_0.3.2 fst_0.9.8
## [45] assertthat_0.2.1 colorspace_1.4-1 collapse_1.8.8 labeling_0.3
## [49] utf8_1.2.2 stringi_1.4.6 munsell_0.5.0 crayon_1.5.0