Once you have built your full specification blueprint and feel comfortable with how the pipeline is executed, you can implement a full multiverse-style analysis. Simply use run_multiverse(<your expanded grid object>):
library(tidyverse)
library(multitool)
# create some data
the_data <-
  data.frame(
    id = 1:500,
    iv1 = rnorm(500),
    iv2 = rnorm(500),
    iv3 = rnorm(500),
    mod = rnorm(500),
    dv1 = rnorm(500),
    dv2 = rnorm(500),
    include1 = rbinom(500, size = 1, prob = .1),
    include2 = sample(1:3, size = 500, replace = TRUE),
    include3 = rnorm(500)
  )
# create a pipeline blueprint
full_pipeline <-
  the_data |>
  add_filters(include1 == 0, include2 != 3, include3 > -2.5) |>
  add_variables(var_group = "ivs", iv1, iv2, iv3) |>
  add_variables(var_group = "dvs", dv1, dv2) |>
  add_model("linear model", lm({dvs} ~ {ivs} * mod))
# expand the pipeline
expanded_pipeline <- expand_decisions(full_pipeline)
# Run the multiverse
multiverse_results <- run_multiverse(expanded_pipeline)
multiverse_results
#> # A tibble: 48 × 4
#> decision specifications model_fitted pipeline_code
#> <chr> <list> <list> <list>
#> 1 1 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 2 2 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 3 3 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 4 4 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 5 5 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 6 6 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 7 7 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 8 8 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 9 9 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> 10 10 <tibble [1 × 3]> <tibble [1 × 5]> <tibble [1 × 2]>
#> # ℹ 38 more rows
The result will be another tibble with various list columns. It will always contain a list column named specifications, which stores all the information you generated in your blueprint. Next, there will be a list column for your fitted models, labelled model_fitted.
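For a quick peek at what specifications holds, you can unnest it directly with tidyr::unnest(), described more fully below. A minimal sketch (the exact nested columns depend on your blueprint):

# a minimal sketch: the decisions behind each universe
# (the exact nested columns depend on your blueprint)
multiverse_results |>
  unnest(specifications)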
There are two main ways to unpack and examine multitool results. The first is by using tidyr::unnest(). Inside the model_fitted column, multitool gives us four columns: model_parameters, model_performance, model_warnings, and model_messages.
multiverse_results |> unnest(model_fitted)
#> # A tibble: 48 × 8
#> decision specifications model_function model_parameters model_performance
#> <chr> <list> <chr> <list> <list>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
The model_parameters column gives you the result of calling parameters::parameters() on each model in your grid, which is a data.frame of model coefficients and their associated standard errors, confidence intervals, test statistics, and p-values.
multiverse_results |>
unnest(model_fitted) |>
unnest(model_parameters)
#> # A tibble: 192 × 16
#> decision specifications model_function parameter coefficient se ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Intercept) 0.140 0.0613 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 -0.00984 0.0607 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0864 0.0612 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod 0.0847 0.0655 0.95
#> 5 2 <tibble [1 × 3]> lm (Intercept) -0.0763 0.0605 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 -0.0698 0.0599 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0474 0.0604 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.0651 0.0646 0.95
#> 9 3 <tibble [1 × 3]> lm (Intercept) 0.143 0.0611 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0368 0.0590 0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> # p <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
The model_performance column gives fit statistics, such as R2, AIC, and BIC, computed by running performance::performance() on each model in your grid.
multiverse_results |>
unnest(model_fitted) |>
unnest(model_performance)
#> # A tibble: 48 × 14
#> decision specifications model_function model_parameters aic aicc bic
#> <chr> <list> <chr> <list> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 838. 839. 857.
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 831. 831. 849.
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 840. 840. 858.
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 834. 835. 853.
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 838. 839. 857.
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 831. 831. 849.
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 840. 840. 858.
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>
The model_messages and model_warnings columns contain information provided by the modeling function. If something went wrong, or you need to know something about a particular model, these columns will have captured any messages and warnings printed by the modeling function.
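For example, to scan for universes that threw warnings, you can unnest these columns directly. This is a minimal sketch, assuming model_warnings unnests like the other list columns:

# a minimal sketch: surface any captured warnings per decision,
# assuming model_warnings unnests like the other list columns
multiverse_results |>
  unnest(model_fitted) |>
  unnest(model_warnings)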
I wrote wrappers around the tidyr::unnest() workflow. The main function is reveal(). Pass a multiverse results object to reveal() and tell it which columns to grab by indicating the column name in the .what argument:
multiverse_results |>
reveal(.what = model_fitted)
#> # A tibble: 48 × 8
#> decision specifications model_function model_parameters model_performance
#> <chr> <list> <chr> <list> <list>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> <prfrmnc_>
#> # ℹ 38 more rows
#> # ℹ 3 more variables: model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
If you want to get straight to a specific result, you can specify a sub-list with the .which argument:
multiverse_results |>
reveal(.what = model_fitted, .which = model_parameters)
#> # A tibble: 192 × 16
#> decision specifications model_function parameter coefficient se ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Intercept) 0.140 0.0613 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 -0.00984 0.0607 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0864 0.0612 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod 0.0847 0.0655 0.95
#> 5 2 <tibble [1 × 3]> lm (Intercept) -0.0763 0.0605 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 -0.0698 0.0599 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0474 0.0604 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.0651 0.0646 0.95
#> 9 3 <tibble [1 × 3]> lm (Intercept) 0.143 0.0611 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0368 0.0590 0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> # p <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
reveal_model_*
multitool will run and save anything you put in your pipeline, but most often you will want to look at model parameters and/or performance. To that end, there is a set of convenience functions for getting at the most common multiverse results: reveal_model_parameters, reveal_model_performance, reveal_model_messages, and reveal_model_warnings.
reveal_model_parameters unpacks the model parameters in your multiverse:
multiverse_results |>
reveal_model_parameters()
#> # A tibble: 192 × 16
#> decision specifications model_function parameter coefficient se ci
#> <chr> <list> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm (Intercept) 0.140 0.0613 0.95
#> 2 1 <tibble [1 × 3]> lm iv1 -0.00984 0.0607 0.95
#> 3 1 <tibble [1 × 3]> lm mod 0.0864 0.0612 0.95
#> 4 1 <tibble [1 × 3]> lm iv1:mod 0.0847 0.0655 0.95
#> 5 2 <tibble [1 × 3]> lm (Intercept) -0.0763 0.0605 0.95
#> 6 2 <tibble [1 × 3]> lm iv1 -0.0698 0.0599 0.95
#> 7 2 <tibble [1 × 3]> lm mod -0.0474 0.0604 0.95
#> 8 2 <tibble [1 × 3]> lm iv1:mod -0.0651 0.0646 0.95
#> 9 3 <tibble [1 × 3]> lm (Intercept) 0.143 0.0611 0.95
#> 10 3 <tibble [1 × 3]> lm iv2 0.0368 0.0590 0.95
#> # ℹ 182 more rows
#> # ℹ 9 more variables: ci_low <dbl>, ci_high <dbl>, t <dbl>, df_error <int>,
#> # p <dbl>, model_performance <list>, model_warnings <list>,
#> # model_messages <list>, pipeline_code <list>
reveal_model_performance unpacks the model performance:
multiverse_results |>
reveal_model_performance()
#> # A tibble: 48 × 14
#> decision specifications model_function model_parameters aic aicc bic
#> <chr> <list> <chr> <list> <dbl> <dbl> <dbl>
#> 1 1 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 838. 839. 857.
#> 2 2 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 831. 831. 849.
#> 3 3 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 840. 840. 858.
#> 4 4 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> 5 5 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 834. 835. 853.
#> 6 6 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> 7 7 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 838. 839. 857.
#> 8 8 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 831. 831. 849.
#> 9 9 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 840. 840. 858.
#> 10 10 <tibble [1 × 3]> lm <prmtrs_m [4 × 9]> 832. 832. 851.
#> # ℹ 38 more rows
#> # ℹ 7 more variables: r2 <dbl>, r2_adjusted <dbl>, rmse <dbl>, sigma <dbl>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>
You can also choose to expand your decision grid with .unpack_specs to see which decisions produced which result. You have two options for unpacking your decisions: wide or long. If you set .unpack_specs = 'wide', you get one column per decision variable. This is exactly how your decisions appeared in your grid.
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide")
#> # A tibble: 192 × 22
#> decision ivs dvs include1 include2 include3 model model_meta
#> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 iv1 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 2 1 iv1 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 3 1 iv1 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 4 1 iv1 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 5 2 iv1 dv2 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 6 2 iv1 dv2 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 7 2 iv1 dv2 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 8 2 iv1 dv2 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 9 3 iv2 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> 10 3 iv2 dv1 include1 == 0 include2 != 3 include3 >… lm(d… linear mo…
#> # ℹ 182 more rows
#> # ℹ 14 more variables: model_function <chr>, parameter <chr>,
#> # coefficient <dbl>, se <dbl>, ci <dbl>, ci_low <dbl>, ci_high <dbl>,
#> # t <dbl>, df_error <int>, p <dbl>, model_performance <list>,
#> # model_warnings <list>, model_messages <list>, pipeline_code <list>
If you set .unpack_specs = 'long', your decisions get stacked into two columns: decision_set and alternatives. This format is nice for plotting a particular result from a multiverse analysis across different decision alternatives; a plotting sketch follows the output below.
multiverse_results |>
reveal_model_performance(.unpack_specs = "long")
#> # A tibble: 288 × 15
#> decision decision_set alternatives model_function model_parameters aic
#> <chr> <chr> <chr> <chr> <list> <dbl>
#> 1 1 ivs iv1 lm <prmtrs_m [4 × 9]> 838.
#> 2 1 dvs dv1 lm <prmtrs_m [4 × 9]> 838.
#> 3 1 include1 include1 == 0 lm <prmtrs_m [4 × 9]> 838.
#> 4 1 include2 include2 != 3 lm <prmtrs_m [4 × 9]> 838.
#> 5 1 include3 include3 > -2.5 lm <prmtrs_m [4 × 9]> 838.
#> 6 1 model linear model lm <prmtrs_m [4 × 9]> 838.
#> 7 2 ivs iv1 lm <prmtrs_m [4 × 9]> 831.
#> 8 2 dvs dv2 lm <prmtrs_m [4 × 9]> 831.
#> 9 2 include1 include1 == 0 lm <prmtrs_m [4 × 9]> 831.
#> 10 2 include2 include2 != 3 lm <prmtrs_m [4 × 9]> 831.
#> # ℹ 278 more rows
#> # ℹ 9 more variables: aicc <dbl>, bic <dbl>, r2 <dbl>, r2_adjusted <dbl>,
#> # rmse <dbl>, sigma <dbl>, model_warnings <list>, model_messages <list>,
#> # pipeline_code <list>
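To illustrate the plotting use case, here is a hedged ggplot2 sketch (ggplot2 is loaded with the tidyverse); the plot layout is my own choice, not part of multitool:

# a hedged sketch: r2 for every universe, split by decision set,
# using the long-format columns shown above
multiverse_results |>
  reveal_model_performance(.unpack_specs = "long") |>
  ggplot(aes(x = alternatives, y = r2)) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  facet_wrap(vars(decision_set), scales = "free_x")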
Unpacking specifications alongside specific results allows us to examine the effects of our pipeline decisions. A powerful way to organize these results is to summarize a specific results column, say the r2 values of our models over the entire multiverse. condense() takes a result column and summarizes it with the .how argument, which takes a list in the form of list(<a name you pick> = <summary function>). .how will create a column named <column being condensed>_<summary function name provided>. In this case, we get r2_mean and r2_median.
# model performance r2 summaries
multiverse_results |>
reveal_model_performance() |>
condense(r2, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#> r2_mean r2_median r2_list
#> <dbl> <dbl> <list>
#> 1 0.00776 0.00585 <dbl [48]>
# model parameters for our predictor of interest
multiverse_results |>
reveal_model_parameters() |>
filter(str_detect(parameter, "iv")) |>
condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 1 × 3
#> coefficient_mean coefficient_median coefficient_list
#> <dbl> <dbl> <list>
#> 1 -0.00606 -0.0114 <dbl [96]>
In the last example, we filtered our multiverse results to our predictors iv* and looked at their mean and median effect (over all combinations of decisions) on our outcomes. However, we had three versions of our predictor and two outcomes, so combining dplyr::group_by() with condense() might be more informative:
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide") |>
filter(str_detect(parameter, "iv")) |>
group_by(ivs, dvs) |>
condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 6 × 5
#> # Groups: ivs [3]
#> ivs dvs coefficient_mean coefficient_median coefficient_list
#> <chr> <chr> <dbl> <dbl> <list>
#> 1 iv1 dv1 0.0377 0.0300 <dbl [16]>
#> 2 iv1 dv2 -0.0265 -0.0317 <dbl [16]>
#> 3 iv2 dv1 0.00177 -0.00132 <dbl [16]>
#> 4 iv2 dv2 -0.00699 -0.00879 <dbl [16]>
#> 5 iv3 dv1 -0.00322 0.0156 <dbl [16]>
#> 6 iv3 dv2 -0.0391 -0.0427 <dbl [16]>
If we are interested in all the terms of the model, we can leverage group_by() further:
multiverse_results |>
reveal_model_parameters(.unpack_specs = "wide") |>
group_by(parameter, dvs) |>
condense(coefficient, list(mean = mean, median = median))
#> # A tibble: 16 × 5
#> # Groups: parameter [8]
#> parameter dvs coefficient_mean coefficient_median coefficient_list
#> <chr> <chr> <dbl> <dbl> <list>
#> 1 (Intercept) dv1 0.102 0.0987 <dbl [24]>
#> 2 (Intercept) dv2 -0.0393 -0.0363 <dbl [24]>
#> 3 iv1 dv1 0.0120 0.0130 <dbl [8]>
#> 4 iv1 dv2 -0.0516 -0.0506 <dbl [8]>
#> 5 iv1:mod dv1 0.0633 0.0699 <dbl [8]>
#> 6 iv1:mod dv2 -0.00149 0.000479 <dbl [8]>
#> 7 iv2 dv1 0.0130 0.0151 <dbl [8]>
#> 8 iv2 dv2 -0.00547 -0.00879 <dbl [8]>
#> 9 iv2:mod dv1 -0.00946 -0.00811 <dbl [8]>
#> 10 iv2:mod dv2 -0.00852 -0.00955 <dbl [8]>
#> 11 iv3 dv1 -0.0667 -0.0677 <dbl [8]>
#> 12 iv3 dv2 -0.0395 -0.0427 <dbl [8]>
#> 13 iv3:mod dv1 0.0602 0.0609 <dbl [8]>
#> 14 iv3:mod dv2 -0.0386 -0.0395 <dbl [8]>
#> 15 mod dv1 0.0663 0.0653 <dbl [24]>
#> 16 mod dv2 -0.0455 -0.0474 <dbl [24]>
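Finally, if you ever need to audit exactly what code produced a given universe, each row also carries a pipeline_code list column. A hedged sketch, assuming pipeline_code unnests like the other list columns:

# a hedged sketch: inspect the code behind a single universe,
# assuming pipeline_code unnests like the other list columns
multiverse_results |>
  filter(decision == "1") |>
  unnest(pipeline_code)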