We will use the same example as in the introduction vignette, the Animals object.
library(tidyverse)
animals <- as.matrix(MASS::Animals)
log_animals <- log(animals)
animal_info <- MASS::Animals %>%
  rownames_to_column("Animal") %>%
  mutate(is_extinct = case_when(Animal %in% c("Dipliodocus", "Triceratops", "Brachiosaurus") ~ TRUE,
                                TRUE ~ FALSE),
         class = case_when(Animal %in% c("Mountain beaver", "Guinea pig", "Golden hamster", "Mouse", "Rabbit", "Rat") ~ "Rodent",
                           Animal %in% c("Potar monkey", "Gorilla", "Human", "Rhesus monkey", "Chimpanzee") ~ "Primate",
                           Animal %in% c("Cow", "Goat", "Giraffe", "Sheep") ~ "Ruminant",
                           Animal %in% c("Asian elephant", "African elephant") ~ "Elephantidae",
                           Animal %in% c("Grey wolf") ~ "Canine",
                           Animal %in% c("Cat", "Jaguar") ~ "Feline",
                           Animal %in% c("Donkey", "Horse") ~ "Equidae",
                           Animal == "Pig" ~ "Sus",
                           Animal == "Mole" ~ "Talpidae",
                           Animal == "Kangaroo" ~ "Macropodidae",
                           TRUE ~ "Dinosaurs")) %>%
  select(-body, -brain)
Annotations are stored internally as tibble::tibble() objects and can be viewed as simple databases. As such, a key is needed to uniquely identify the rows or the columns. This key is the row names for row annotations and the column names for column annotations.
This key is called the tag and, unless specified otherwise at matrixset creation, is stored as .rowname/.colname. This special tag can be used almost like any other annotation trait - see Applying Functions.
When using an external data.frame to create new annotations, the data frame must contain this key in a single column. The column doesn’t have to be called .rowname/.colname.
Moreover, the key values must correspond to rownames/colnames. Values that do not match will simply be left out.
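Before attaching an external data frame, it can help to check which key values actually match the row names. The following is a minimal base R sketch of that check; `candidate_keys` and the "Unicorn" entry are made up for illustration and are not part of the Animals data.

```r
# Row names of the matrix are what the key values must correspond to.
animals <- as.matrix(MASS::Animals)

# Hypothetical key column: three real row names plus one that does not exist.
candidate_keys <- c(rownames(animals)[1:3], "Unicorn")

# Keys that correspond to a row name: their annotations can be attached.
matched <- intersect(candidate_keys, rownames(animals))

# Keys without a corresponding row name: these would be left out.
unmatched <- setdiff(candidate_keys, rownames(animals))

matched
unmatched
```

An empty `unmatched` vector suggests that every row of the external data frame can be linked to the matrices.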
To register the annotations at creation, use a command like this:
ms <- matrixset(msr = animals, log_msr = log_animals, row_info = animal_info,
row_key = "Animal")
ms
#> matrixset of 2 28 × 2 matrices
#>
#> matrix_set: msr
#> A 28 × 2 <dbl> matrix
#> body brain
#> Mountain beaver 1.35 8.10
#> ... ... ...
#> Pig 192.00 180.00
#>
#> matrix_set: log_msr
#> A 28 × 2 <dbl> matrix
#> body brain
#> Mountain beaver 0.30 2.09
#> ... ... ...
#> Pig 5.26 5.19
#>
#>
#> row_info:
#> # A tibble: 28 × 3
#> .rowname is_extinct class
#> <chr> <lgl> <chr>
#> 1 Mountain beaver FALSE Rodent
#> 2 Cow FALSE Ruminant
#> 3 Grey wolf FALSE Canine
#> 4 Goat FALSE Ruminant
#> 5 Guinea pig FALSE Rodent
#> 6 Dipliodocus TRUE Dinosaurs
#> 7 Asian elephant FALSE Elephantidae
#> 8 Donkey FALSE Equidae
#> 9 Horse FALSE Equidae
#> 10 Potar monkey FALSE Primate
#> # ℹ 18 more rows
#>
#>
#> column_info:
#> # A tibble: 2 × 1
#> .colname
#> <chr>
#> 1 body
#> 2 brain
Notice how we used the row_key argument to specify how to link the two objects together.
The internal tibble can be replaced by a new one. This can be a convenient way to add annotations to an existing matrixset object where none were registered.
To do so, you can simply do
For the operation to work, a column called .rowname (or more generally, what is returned by row_tag()) must be part of the data frame.
The column equivalents are column_info and column_tag.
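As an illustration of the replacement described above, here is a minimal sketch. It assumes the replacement is written in assignment form, `row_info(ms) <- value`; that syntax is an assumption on our part, so check the package reference before relying on it. The `new_info` object is hypothetical.

```r
library(matrixset)

animals <- as.matrix(MASS::Animals)

# A matrixset created without any row annotation.
ms <- matrixset(msr = animals, log_msr = log(animals))

# Hypothetical replacement tibble. It must contain the tag column,
# here .rowname, which is what row_tag(ms) is expected to return.
extinct <- c("Dipliodocus", "Triceratops", "Brachiosaurus")
new_info <- tibble::tibble(.rowname = rownames(animals),
                           is_extinct = rownames(animals) %in% extinct)

# Assumed assignment form of the replacement.
row_info(ms) <- new_info

row_info(ms)
```

If the tag was given another name at creation, the key column of the new tibble would need that name instead of .rowname.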
Annotation tibble replacement works even if annotations were registered. Be aware of two things:
Joining annotations with join_row_info()/join_column_info() is equivalent to performing a mutating join (default: dplyr::left_join(), though all mutating joins - except cross-joins - are available via the type argument) between the matrixset (.ms) object’s annotation tibble and a data.frame (.y).
The by argument determines how to join the two data.frames, so it is not necessary for .y to have a .rowname/.colname column. However, when the by argument is not provided, a natural join is performed, that is, the join uses all the columns that the two tables have in common.
One behavior that differs from a true mutating join is that when a row from .ms matches more than one row in .y, no row duplication is performed. Instead, a condition error is issued. This preserves the matrixset property that all row names (and column names) must be unique.
matrixset(msr = animals, log_msr = log_animals) %>%
join_row_info(animal_info, by = c(".rowname" = "Animal"))
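Because a key that is matched by several rows of .y results in an error, it can be useful to look for duplicated keys before joining. The sketch below uses only dplyr; the data frame `y` and its `diet` values are made up for illustration.

```r
library(dplyr)

# Hypothetical annotation table in which "Cow" appears twice.
y <- tibble::tibble(Animal = c("Cow", "Cow", "Goat"),
                    diet   = c("grass", "hay", "shrubs"))

# Keys that occur more than once would match a matrixset row several times.
dup_keys <- y %>%
  count(Animal, name = "n") %>%
  filter(n > 1)

dup_keys

# One possible fix: keep the first row found for each key.
y_unique <- y %>%
  distinct(Animal, .keep_all = TRUE)
```

Whether keeping the first row is appropriate depends on the data; summarising the duplicated rows is another option.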
When using join_row_info()/join_column_info(), .y can also be a matrixset object, in which case the appropriate annotation tibble will be used.
The only difference is when using the default by = NULL argument. In that case the row/column tag of each object is used.
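A minimal sketch of joining with a matrixset as .y follows. It relies only on the behavior described above, but the object names (`ms_bare`, `ms_annotated`, `info`) are hypothetical and the result has not been checked against every version of the package.

```r
library(matrixset)

animals <- as.matrix(MASS::Animals)

# Hypothetical annotation source: a matrixset that carries row annotations.
extinct <- c("Dipliodocus", "Triceratops", "Brachiosaurus")
info <- data.frame(Animal = rownames(animals),
                   is_extinct = rownames(animals) %in% extinct)
ms_annotated <- matrixset(msr = animals, row_info = info, row_key = "Animal")

# A second matrixset with no row annotation besides the tag.
ms_bare <- matrixset(log_msr = log(animals))

# With the default by = NULL, the row tag of each object is used as the key.
ms_joined <- join_row_info(ms_bare, ms_annotated)

row_info(ms_joined)
```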
If you are familiar with dplyr::mutate(), then you know almost everything you need to know about using annotate_row() and annotate_column().
ms <- matrixset(msr = animals, log_msr = log_animals) %>%
  join_row_info(animal_info, by = c(".rowname" = "Animal")) %>%
  annotate_column(unit = case_when(.colname == "body" ~ "kg",
                                   TRUE ~ "g")) %>%
  annotate_column(us_unit = case_when(unit == "kg" ~ "lb",
                                      TRUE ~ "oz"))
column_info(ms)
#> # A tibble: 2 × 3
#> .colname unit us_unit
#> <chr> <chr> <chr>
#> 1 body kg lb
#> 2 brain g oz
You can decide that you don’t need two unit systems and keep only one.
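Since annotate_column() follows dplyr::mutate(), one plausible way to drop an annotation is to set it to NULL, as one would in mutate(). The sketch below assumes that this mutate() convention carries over; it is not a confirmed description of the function.

```r
library(matrixset)
library(dplyr)

animals <- as.matrix(MASS::Animals)

# Rebuild a small matrixset with the two unit annotations.
ms <- matrixset(msr = animals) %>%
  annotate_column(unit = case_when(.colname == "body" ~ "kg",
                                   TRUE ~ "g")) %>%
  annotate_column(us_unit = case_when(unit == "kg" ~ "lb",
                                      TRUE ~ "oz"))

# Assumed mutate()-like behavior: assigning NULL removes the annotation.
ms <- ms %>%
  annotate_column(us_unit = NULL)

column_info(ms)
```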
Applying functions to a matrixset’s matrices is covered in the Applying Functions vignette.
The idea here is the same, but with the added benefit that the function result is stored directly as annotation for the matrixset object.
ms %>%
annotate_row_from_apply(msr, ratio_brain_body = ~ .i[2]/(10*.i[1])) %>%
row_info()
#> # A tibble: 28 × 4
#> .rowname is_extinct class ratio_brain_body
#> <chr> <lgl> <chr> <dbl>
#> 1 Mountain beaver FALSE Rodent 0.6
#> 2 Cow FALSE Ruminant 0.0910
#> 3 Grey wolf FALSE Canine 0.329
#> 4 Goat FALSE Ruminant 0.416
#> 5 Guinea pig FALSE Rodent 0.529
#> 6 Dipliodocus TRUE Dinosaurs 0.000427
#> 7 Asian elephant FALSE Elephantidae 0.181
#> 8 Donkey FALSE Equidae 0.224
#> 9 Horse FALSE Equidae 0.126
#> 10 Potar monkey FALSE Primate 1.15
#> # ℹ 18 more rows
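In the formula above, .i stands for the current row of msr, so .i[1] is the body value and .i[2] is the brain value. As a sanity check, the same ratio can be computed directly on the matrix with base R; this sketch only reproduces the numbers and stores nothing in the matrixset.

```r
animals <- as.matrix(MASS::Animals)

# Same computation as ~ .i[2]/(10*.i[1]), vectorized over all rows.
ratio_brain_body <- animals[, "brain"] / (10 * animals[, "body"])

# First few values, to compare with the ratio_brain_body annotation.
head(ratio_brain_body, 3)
```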
When groups are registered, results are spread using tidyr::pivot_wider().
ms %>%
row_group_by(class) %>%
annotate_column_from_apply(msr, mean) %>%
column_info()
#> # A tibble: 2 × 13
#> .colname unit Canine Dinosaurs Elephantidae Equidae Feline Macropodidae
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 body kg 36.3 36033. 4600. 354. 51.6 35
#> 2 brain g 120. 91.5 5158. 537 91.3 56
#> # ℹ 5 more variables: Primate <dbl>, Rodent <dbl>, Ruminant <dbl>, Sus <dbl>,
#> # Talpidae <dbl>
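To get a feel for the wide layout obtained with groups, the sketch below builds a similar table with dplyr and tidyr alone. It only mimics the shape of the result (one row per matrix column, one column per group) and is not meant to describe how matrixset works internally. The two-level `status` grouping is a simplification introduced for this example.

```r
library(dplyr)
library(tidyr)

extinct <- c("Dipliodocus", "Triceratops", "Brachiosaurus")

wide <- MASS::Animals %>%
  tibble::rownames_to_column("Animal") %>%
  # Simplified grouping variable used only in this sketch.
  mutate(status = if_else(Animal %in% extinct, "extinct", "extant")) %>%
  # One row per (animal, matrix column) pair.
  pivot_longer(c(body, brain), names_to = ".colname", values_to = "value") %>%
  # Mean per matrix column within each group.
  group_by(.colname, status) %>%
  summarise(mean = mean(value), .groups = "drop") %>%
  # Spread the groups into columns, one row per matrix column.
  pivot_wider(names_from = status, values_from = mean)

wide
```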