matrixset Matrices

There are two ways to apply functions to the matrices of a matrixset object. The first is through the apply_* family, which is covered here. The second is through mutate_matrix(), covered in the next section.
There are three functions in the apply_* family:
apply_matrix(): the functions must take a matrix as input. In base R, this is similar to simply calling fun(matrix_object).
apply_row(): the functions must take a vector as input. The vector will be a matrix row. In base R, this is akin to apply(matrix_object, 1, fun, simplify = FALSE).
apply_column(): the functions must take a vector as input. The vector will be a matrix column. In base R, this is similar to apply(matrix_object, 2, fun, simplify = FALSE).
Each of these functions loops over the matrixset object's matrices to apply the functions. In the case of apply_row() and apply_column(), an additional loop over the margin (row or column, as applicable) is executed, so that the functions are applied to each matrix and each margin.
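The base R analogies above can be sketched as follows. This is only an illustration of the looping logic, using plain lists of matrices; the object mats and its values are made up for the sketch, and the code does not call matrixset itself.

```r
# Two small matrices standing in for the matrices of a matrixset object
# (hypothetical data, not the animals_ms object used in this article).
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3,
            dimnames = list(c("a", "b", "c"), c("body", "brain")))
mats <- list(msr = m, log_msr = log(m))

# Analogue of apply_matrix(): one function call per matrix.
by_matrix <- lapply(mats, function(x) mean(x))

# Analogue of apply_row(): loop over the matrices, then over the rows.
# simplify = FALSE (R >= 4.1) keeps one list element per row.
by_row <- lapply(mats, function(x) apply(x, 1, mean, simplify = FALSE))

# Analogue of apply_column(): loop over the matrices, then over the columns.
by_col <- lapply(mats, function(x) apply(x, 2, mean, simplify = FALSE))

str(by_row$msr)
```

The nesting of the results (matrix first, then row or column) mirrors the loop order described above.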
To see the functions in action, we will use the following object:
animals_ms
#> matrixset of 2 28 × 2 matrices
#>
#> matrix_set: msr
#> A 28 × 2 <dbl> matrix
#> body brain
#> Mountain beaver 1.35 8.10
#> ... ... ...
#> Pig 192.00 180.00
#>
#> matrix_set: log_msr
#> A 28 × 2 <dbl> matrix
#> body brain
#> Mountain beaver 0.30 2.09
#> ... ... ...
#> Pig 5.26 5.19
#>
#>
#> row_info:
#> # A tibble: 28 × 3
#> .rowname is_extinct class
#> <chr> <lgl> <chr>
#> 1 Mountain beaver FALSE Rodent
#> 2 Cow FALSE Ruminant
#> 3 Grey wolf FALSE Canine
#> 4 Goat FALSE Ruminant
#> 5 Guinea pig FALSE Rodent
#> 6 Dipliodocus TRUE Dinosaurs
#> 7 Asian elephant FALSE Elephantidae
#> 8 Donkey FALSE Equidae
#> 9 Horse FALSE Equidae
#> 10 Potar monkey FALSE Primate
#> # ℹ 18 more rows
#>
#>
#> column_info:
#> # A tibble: 2 × 2
#> .colname unit
#> <chr> <chr>
#> 1 body kg
#> 2 brain g
We will use the following custom printing functions for compactness purposes.
show_matrix <- function(x) {
if (nrow(x) > 4) {
newx <- head(x, 4)
storage.mode(newx) <- "character"
newx <- rbind(newx, rep("...", ncol(x)))
} else newx <- x
newx
}
show_vector <- function(x) {
newx <- if (length(x) > 4) {
c(as.character(x[1:4]), "...")
} else x
newx
}
show_lst <- function(x) {
lapply(x, function(u) {
if (is.matrix(u)) show_matrix(u) else if (is.vector(u)) show_vector(u) else u
})
}
Let's now see apply_matrix() in action.
library(magrittr)
library(purrr)
out <- animals_ms %>%
apply_matrix(exp,
~ mean(.m, trim=.1),
foo=asinh,
pow = ~ 2^.m,
reg = ~ {
is_alive <- !is_extinct
lm(.m ~ is_alive + class)
})
#> Warning: Formatting NULL matrices was deprecated in matrixset 0.4.0.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
# out[[1]] %>% map(~ if (is.matrix(.x)) {head(.x, 5)} else .x)
show_lst(out[[1]])
#> $exp
#> body brain
#> Mountain beaver "3.85742553069697" "3294.46807528384"
#> Cow "8.84981281719581e+201" "5.08821958272978e+183"
#> Grey wolf "5996785676464821" "7.91025688556692e+51"
#> Goat "1029402857448.45" "8.78750163583702e+49"
#> "..." "..."
#>
#> $`mean(.m, trim = 0.1)`
#> [1] 335.1291
#>
#> $foo
#> body brain
#> Mountain beaver "1.10857244179685" "2.78880004092018"
#> Cow "6.83518574234833" "6.74052075680554"
#> Grey wolf "4.28598038575143" "5.47648105816811"
#> Goat "4.01346111184316" "5.43809821197888"
#> "..." "..."
#>
#> $pow
#> body brain
#> Mountain beaver "2.54912125463852" "274.374006409291"
#> Cow "9.52682052708738e+139" "2.16614819853189e+127"
#> Grey wolf "86381301347.2935" "9.39906129562518e+35"
#> Goat "212075099.808884" "4.15383748682786e+34"
#> "..." "..."
#>
#> $reg
#>
#> Call:
#> lm(formula = .m ~ is_alive + class)
#>
#> Coefficients:
#> body brain
#> (Intercept) 36033.33 91.50
#> is_aliveTRUE -35997.00 28.00
#> classDinosaurs NA NA
#> classElephantidae 4564.17 5038.00
#> classEquidae 317.72 417.50
#> classFeline 15.32 -28.20
#> classMacropodidae -1.33 -63.50
#> classPrimate 31.26 372.50
#> classRodent -35.44 -114.67
#> classRuminant 232.96 228.75
#> classSus 155.67 60.50
#> classTalpidae -36.21 -116.50
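This example showcases several features of the apply_* functions. Functions can be passed by name (exp) or as a formula (~ mean(.m, trim=.1)). Results can be given a name (foo, pow, reg); otherwise the name is derived from the call, as in mean(.m, trim = 0.1). A formula can hold a multi-statement expression wrapped in braces, and the annotation traits (here is_extinct and class) are directly accessible inside it.
You have probably noticed the use of .m. This is a pronoun that is accessible inside apply_matrix() and refers to the current matrix in the internal loop. Similar pronouns exist for apply_row() and apply_column(): respectively .i and .j.
The returned object is a list of lists. The first layer is for each matrix and the second layer is for each function call.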
Let's now showcase the row/column version with an apply_column() example:
out <- animals_ms %>%
apply_column(exp,
~ mean(.j, trim=.1),
foo=asinh,
pow = ~ 2^.j,
reg = ~ {
is_alive <- !is_extinct
lm(.j ~ is_alive + class)
})
out[[1]] %>% map(show_lst)
#> $body
#> $body$exp
#> [1] "3.85742553069697" "8.84981281719581e+201" "5996785676464821"
#> [4] "1029402857448.45" "..."
#>
#> $body$`mean(.j, trim = 0.1)`
#> [1] 879.0059
#>
#> $body$foo
#> [1] "1.10857244179685" "6.83518574234833" "4.28598038575143" "4.01346111184316"
#> [5] "..."
#>
#> $body$pow
#> [1] "2.54912125463852" "9.52682052708738e+139" "86381301347.2935"
#> [4] "212075099.808884" "..."
#>
#> $body$reg
#>
#> Call:
#> lm(formula = .j ~ is_alive + class)
#>
#> Coefficients:
#> (Intercept) is_aliveTRUE classDinosaurs classElephantidae
#> 36033.33 -35997.00 NA 4564.17
#> classEquidae classFeline classMacropodidae classPrimate
#> 317.72 15.32 -1.33 31.26
#> classRodent classRuminant classSus classTalpidae
#> -35.44 232.96 155.67 -36.21
#>
#>
#>
#> $brain
#> $brain$exp
#> [1] "3294.46807528384" "5.08821958272978e+183" "7.91025688556692e+51"
#> [4] "8.78750163583702e+49" "..."
#>
#> $brain$`mean(.j, trim = 0.1)`
#> [1] 240.425
#>
#> $brain$foo
#> [1] "2.78880004092018" "6.74052075680554" "5.47648105816811" "5.43809821197888"
#> [5] "..."
#>
#> $brain$pow
#> [1] "274.374006409291" "2.16614819853189e+127" "9.39906129562518e+35"
#> [4] "4.15383748682786e+34" "..."
#>
#> $brain$reg
#>
#> Call:
#> lm(formula = .j ~ is_alive + class)
#>
#> Coefficients:
#> (Intercept) is_aliveTRUE classDinosaurs classElephantidae
#> 91.5 28.0 NA 5038.0
#> classEquidae classFeline classMacropodidae classPrimate
#> 417.5 -28.2 -63.5 372.5
#> classRodent classRuminant classSus classTalpidae
#> -114.7 228.7 60.5 -116.5
The idea is similar, but the returned object has a third list layer: the first layer is for the matrices, the second layer for the columns (it would be rows for apply_row()) and the third layer for the functions.
Note as well the use of the .j pronoun instead of .m.
The apply_* functions understand data grouping and will execute on the proper matrix/vector subsets.
animals_ms %>%
row_group_by(class) %>%
apply_matrix(exp,
~ mean(.m, trim=.1),
foo=asinh,
pow = ~ 2^.m,
reg = ~ {
is_alive <- !is_extinct
lm(.m ~ is_alive)
})
#> $msr
#> # A tibble: 11 × 2
#> class .vals
#> <chr> <list>
#> 1 Canine <named list [5]>
#> 2 Dinosaurs <named list [5]>
#> 3 Elephantidae <named list [5]>
#> 4 Equidae <named list [5]>
#> 5 Feline <named list [5]>
#> 6 Macropodidae <named list [5]>
#> 7 Primate <named list [5]>
#> 8 Rodent <named list [5]>
#> 9 Ruminant <named list [5]>
#> 10 Sus <named list [5]>
#> 11 Talpidae <named list [5]>
#>
#> $log_msr
#> # A tibble: 11 × 2
#> class .vals
#> <chr> <list>
#> 1 Canine <named list [5]>
#> 2 Dinosaurs <named list [5]>
#> 3 Elephantidae <named list [5]>
#> 4 Equidae <named list [5]>
#> 5 Feline <named list [5]>
#> 6 Macropodidae <named list [5]>
#> 7 Primate <named list [5]>
#> 8 Rodent <named list [5]>
#> 9 Ruminant <named list [5]>
#> 10 Sus <named list [5]>
#> 11 Talpidae <named list [5]>
As one can see, the output format differs when the data is grouped. We still end up with a list with one element per matrix, but each of these elements is now a tibble.
Each tibble has a column called .vals, where the function results are stored. This column is a list, with one element per group. The group labels are given by the other columns of the tibble. For a given group, things are like the ungrouped version: further sub-lists for rows/columns, if applicable, and function values.
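To make the grouped output shape concrete, here is a minimal base R sketch of the same idea: split the rows by a grouping variable, apply the function to each row subset, and store the results in a list column next to the group labels. The matrix m, the vector cls and the data frame grouped are hypothetical and only mimic the layout; matrixset's actual implementation is not shown here.

```r
# Hypothetical matrix and row annotation (not taken from animals_ms).
m <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 4,
            dimnames = list(c("r1", "r2", "r3", "r4"), c("body", "brain")))
cls <- c("Rodent", "Primate", "Rodent", "Primate")

# Split the row indices by group, then apply the function to each sub-matrix.
groups <- split(seq_len(nrow(m)), cls)
vals <- lapply(groups, function(idx) {
  list(mean = mean(m[idx, , drop = FALSE]))
})

# One row per group: the labels in one column, the results in a list column.
grouped <- data.frame(class = names(groups))
grouped$.vals <- unname(vals)

grouped$.vals[[1]]$mean  # result for the first group
```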
Similar to the apply() function, which has a simplify argument, the output structure can be simplified, provided two conditions are met. One of them is that each function must return a vector, that is, an object for which is.vector returns TRUE.
If the conditions are met, each apply_* function has two simplified versions available: _dfl and _dfw.
Below is the _dfl flavor in action. There are two things to notice:
With apply_column_dfl (and _dfw), a .colname column stores the column ID (.rowname for apply_row_*).
The lm result is wrapped in a list so that the outcome is a vector.
animals_ms %>%
apply_matrix_dfl(~ mean(.m, trim=.1),
MAD=mad,
reg = ~ {
is_alive <- !is_extinct
list(lm(.m ~ is_alive + class))
})
#> $msr
#> # A tibble: 1 × 3
#> `mean(.m, trim = 0.1)` MAD reg
#> <dbl> <dbl> <list>
#> 1 335. 155. <mlm>
#>
#> $log_msr
#> # A tibble: 1 × 3
#> `mean(.m, trim = 0.1)` MAD reg
#> <dbl> <dbl> <list>
#> 1 4.18 2.35 <mlm>
animals_ms %>%
apply_column_dfl(~ mean(.j, trim=.1),
MAD=mad,
reg = ~ {
is_alive <- !is_extinct
list(lm(.j ~ is_alive + class))
})
#> $msr
#> # A tibble: 2 × 4
#> .colname `mean(.j, trim = 0.1)` MAD reg
#> <chr> <dbl> <dbl> <list>
#> 1 body 879. 79.5 <lm>
#> 2 brain 240. 193. <lm>
#>
#> $log_msr
#> # A tibble: 2 × 4
#> .colname `mean(.j, trim = 0.1)` MAD reg
#> <chr> <dbl> <dbl> <list>
#> 1 body 3.78 3.38 <lm>
#> 2 brain 4.49 1.71 <lm>
Had we used apply_column_dfw in this context, there would be no difference in the output format. The two versions differ only when the returned vectors have length > 1.
animals_ms %>%
apply_row_dfl(rg = ~ range(.i),
qt = ~ quantile(.i, probs = c(.25, .75)))
#> $msr
#> # A tibble: 56 × 5
#> .rowname rg.name rg qt.name qt
#> <chr> <chr> <dbl> <chr> <dbl>
#> 1 Mountain beaver ..1 1.35 25% 3.04
#> 2 Mountain beaver ..2 8.1 75% 6.41
#> 3 Cow ..1 423 25% 434.
#> 4 Cow ..2 465 75% 454.
#> 5 Grey wolf ..1 36.3 25% 57.1
#> 6 Grey wolf ..2 120. 75% 98.7
#> 7 Goat ..1 27.7 25% 49.5
#> 8 Goat ..2 115 75% 93.2
#> 9 Guinea pig ..1 1.04 25% 2.16
#> 10 Guinea pig ..2 5.5 75% 4.38
#> # ℹ 46 more rows
#>
#> $log_msr
#> # A tibble: 56 × 5
#> .rowname rg.name rg qt.name qt
#> <chr> <chr> <dbl> <chr> <dbl>
#> 1 Mountain beaver ..1 0.300 25% 0.748
#> 2 Mountain beaver ..2 2.09 75% 1.64
#> 3 Cow ..1 6.05 25% 6.07
#> 4 Cow ..2 6.14 75% 6.12
#> 5 Grey wolf ..1 3.59 25% 3.89
#> 6 Grey wolf ..2 4.78 75% 4.49
#> 7 Goat ..1 3.32 25% 3.68
#> 8 Goat ..2 4.74 75% 4.39
#> 9 Guinea pig ..1 0.0392 25% 0.456
#> 10 Guinea pig ..2 1.70 75% 1.29
#> # ℹ 46 more rows
animals_ms %>%
apply_row_dfw(rg = ~ range(.i),
qt = ~ quantile(.i, probs = c(.25, .75)))
#> $msr
#> # A tibble: 28 × 5
#> .rowname `rg ..1` `rg ..2` `qt 25%` `qt 75%`
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Mountain beaver 1.35 8.1 3.04 6.41
#> 2 Cow 423 465 434. 454.
#> 3 Grey wolf 36.3 120. 57.1 98.7
#> 4 Goat 27.7 115 49.5 93.2
#> 5 Guinea pig 1.04 5.5 2.16 4.38
#> 6 Dipliodocus 50 11700 2962. 8788.
#> 7 Asian elephant 2547 4603 3061 4089
#> 8 Donkey 187. 419 245. 361.
#> 9 Horse 521 655 554. 622.
#> 10 Potar monkey 10 115 36.2 88.8
#> # ℹ 18 more rows
#>
#> $log_msr
#> # A tibble: 28 × 5
#> .rowname `rg ..1` `rg ..2` `qt 25%` `qt 75%`
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Mountain beaver 0.300 2.09 0.748 1.64
#> 2 Cow 6.05 6.14 6.07 6.12
#> 3 Grey wolf 3.59 4.78 3.89 4.49
#> 4 Goat 3.32 4.74 3.68 4.39
#> 5 Guinea pig 0.0392 1.70 0.456 1.29
#> 6 Dipliodocus 3.91 9.37 5.28 8.00
#> 7 Asian elephant 7.84 8.43 7.99 8.29
#> 8 Donkey 5.23 6.04 5.43 5.84
#> 9 Horse 6.26 6.48 6.31 6.43
#> 10 Potar monkey 2.30 4.74 2.91 4.13
#> # ℹ 18 more rows
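We can observe three things. First, the _dfl version stacks the elements of each function result in rows (long format), with a companion .name column (rg.name, qt.name) identifying each element; hence 56 rows for 28 animals. Second, the _dfw version spreads the elements across columns (wide format), one row per animal, with column names combining the function name and the element name. Third, element names are taken from the function output when available (25% and 75% for quantile) and are otherwise generated (..1 and ..2 for range).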
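The two layouts can be imitated in base R to see how they relate. This is a sketch only: the list rows and the data frames long and wide are invented for illustration, and the column names simply copy the pattern visible in the output above.

```r
# Hypothetical rows of a two-column matrix.
rows <- list(a = c(1, 5), b = c(2, 8))
rg <- lapply(rows, range)   # each result is an unnamed vector of length 2

# Long layout: one row per element of each result, plus an element label.
long <- data.frame(
  .rowname = rep(names(rg), lengths(rg)),
  rg.name  = unlist(lapply(rg, function(v) paste0("..", seq_along(v))),
                    use.names = FALSE),
  rg       = unlist(rg, use.names = FALSE)
)

# Wide layout: one row per matrix row, one column per element.
wide <- data.frame(.rowname = names(rg), do.call(rbind, rg))
names(wide)[-1] <- paste("rg", paste0("..", 1:2))

long
wide
```

Both data frames hold the same values; the long one has as many rows as there are elements in total, the wide one as many rows as there are matrix rows.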
It may happen that you need to get information about the current group. For this reason, the following context functions are made available:
current_n_row() and current_n_column(): they give the number of rows and columns, respectively, of the current matrix. They are the context equivalent of nrow() and ncol().
current_row_info() and current_column_info(): they give access to the current row/column annotation data frame. They are the context equivalent of row_info() and column_info().
row_pos() and column_pos(): they give the current row/column indices. The indices are the ones before matrix subsetting.
row_rel_pos() and column_rel_pos(): they give the row/column indices relative to the current matrix. They are equivalent to seq_len(current_n_row())/seq_len(current_n_column()).
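The distinction between absolute and relative positions can be illustrated in base R. The sketch below assumes the behavior described above (absolute indices refer to the full matrix, relative indices restart at 1 within each group); the vector cls is hypothetical and no matrixset function is called.

```r
# Hypothetical grouping variable for a matrix with five rows.
cls <- c("Rodent", "Primate", "Rodent", "Primate", "Rodent")

# Analogue of row_pos(): row indices in the full matrix, per group.
pos <- split(seq_along(cls), cls)

# Analogue of current_n_row(): number of rows in each group's sub-matrix.
n_row <- lengths(pos)

# Analogue of row_rel_pos(): indices relative to each sub-matrix.
rel_pos <- lapply(n_row, seq_len)

pos$Rodent      # positions before subsetting
rel_pos$Rodent  # positions within the group
```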
For instance, a simple way of knowing the number of animals per group could be
animals_ms %>%
row_group_by(class) %>%
apply_matrix_dfl(n = ~ current_n_row()) %>%
.$msr
#> # A tibble: 11 × 2
#> class n
#> <chr> <int>
#> 1 Canine 1
#> 2 Dinosaurs 3
#> 3 Elephantidae 2
#> 4 Equidae 2
#> 5 Feline 2
#> 6 Macropodidae 1
#> 7 Primate 5
#> 8 Rodent 6
#> 9 Ruminant 4
#> 10 Sus 1
#> 11 Talpidae 1
The context functions can also be of use when one or more traits are shared (in name) between rows and columns.
Here’s a pseudo-code example:
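One way such a call could look is sketched below as pseudo-code. Everything except the context functions and the .m pronoun is an assumption made for illustration: ms is a hypothetical matrixset object whose row and column annotations both contain a trait named batch, and some_function() is a placeholder.

```r
# Pseudo-code, not meant to be run as is.
# `ms`, `batch` and `some_function()` are hypothetical.
ms %>%
  apply_matrix(~ {
    # fetch each version of the shared trait from its own annotation
    row_batch <- current_row_info()$batch
    col_batch <- current_column_info()$batch
    some_function(.m, row_batch, col_batch)
  })
```

Pulling each trait from its own annotation data frame removes any ambiguity about which batch is meant.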
It may happen that a variable in the calling environment shares its name with a trait of a matrixset object.
You can make explicit which version of the variable you are using with the pronouns .data (the trait annotation version) and .env (the calling environment version).
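The following sketch shows how the two pronouns resolve a name clash. It uses rlang::eval_tidy() directly rather than a matrixset call, on the assumption that the apply_* functions follow the same tidy evaluation conventions; the data frame traits and the variable values are invented for the example.

```r
library(rlang)

# A variable in the calling environment...
is_extinct <- "value from the calling environment"

# ...and a hypothetical annotation with a trait of the same name.
traits <- data.frame(is_extinct = c(TRUE, FALSE, TRUE))

# .data forces the lookup in the annotation data.
from_data <- eval_tidy(quo(.data$is_extinct), data = traits)

# .env forces the lookup in the calling environment.
from_env <- eval_tidy(quo(.env$is_extinct), data = traits)

# Without a pronoun, the data version masks the environment version.
unqualified <- eval_tidy(quo(is_extinct), data = traits)
```

Using the pronouns makes the intent visible to the reader and protects the code if a trait with a clashing name is added later.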
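Expressions can also be captured beforehand with expr() and injected into a formula with !!:
library(rlang)  # provides expr()
reg_expr <- expr({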
is_alive <- !is_extinct
list(lm(.j ~ is_alive + class))
})
animals_ms %>%
apply_column_dfl(~ mean(.j, trim=.1),
MAD=mad,
reg = ~ !!reg_expr)
#> $msr
#> # A tibble: 2 × 4
#> .colname `mean(.j, trim = 0.1)` MAD reg
#> <chr> <dbl> <dbl> <list>
#> 1 body 879. 79.5 <lm>
#> 2 brain 240. 193. <lm>
#>
#> $log_msr
#> # A tibble: 2 × 4
#> .colname `mean(.j, trim = 0.1)` MAD reg
#> <chr> <dbl> <dbl> <list>
#> 1 body 3.78 3.38 <lm>
#> 2 brain 4.49 1.71 <lm>