Working with large matrices

The filematrix package can be used for matrices of any size. For small and moderately sized matrices, the most convenient approach is to save and load them in one step with fm.create.from.matrix and fm.load, respectively.
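Here is a minimal sketch of that round trip. The 5 by 3 matrix and the variable name base are only illustrative. fm.open is used at the end solely to reopen the files so that closeAndDeleteFiles can remove them.

```r
library(filematrix)

# A small in-memory matrix of doubles (5 rows, 3 columns)
mat = matrix(as.double(1:15), nrow = 5, ncol = 3)

# Save it to disk; tempfile() supplies a throwaway file name base
base = tempfile()
fmsmall = fm.create.from.matrix(base, mat)
close(fmsmall)

# Read the whole file matrix back into memory as an ordinary R matrix
loaded = fm.load(base)
print(loaded)

# Reopen the files only to delete them
closeAndDeleteFiles(fm.open(base))
```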

However, the main purpose of the filematrix package is to let users work with matrices many times larger than the available computer memory. Such matrices can only be accessed in parts.

Let us set up a sufficiently large matrix for the code examples below:

library(filematrix)
# 5000 x 10000 matrix of doubles; tempfile() supplies a unique file name base
fm = fm.create(
        filenamebase = tempfile(),
        nrow = 5000,
        ncol = 10000,
        type = "double")
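The size of this matrix is easy to estimate. With type "double" each value takes 8 bytes, so the plain R arithmetic below (no filematrix calls) gives the size of the matrix data and of one 512-column block, the block size used in the examples that follow. Actual memory use while a block is processed can be somewhat higher, because R may create temporary copies.

```r
# Storage arithmetic for a 5000 x 10000 matrix of doubles
n_rows = 5000
n_cols = 10000
bytes_per_value = 8                      # size of one "double" value

file_bytes = n_rows * n_cols * bytes_per_value
file_bytes / 1e6                         # 400 MB of matrix data

# One block of 512 columns, as held in RAM while it is processed
block_bytes = n_rows * 512 * bytes_per_value
block_bytes / 1e6                        # 20.48 MB per block
```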

Accessing all matrix elements

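The fastest way to read or write all elements of a filematrix is to process its columns sequentially, several columns at a time. This is much faster than accessing a filematrix by rows, because the elements are stored column by column, as in ordinary R matrices, so a block of consecutive columns occupies one contiguous region of the file.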
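The position arithmetic behind this advice can be sketched in base R. This assumes column-major storage, where element (i, j) of a matrix with nrow rows sits at position (j - 1) * nrow + i. The helper element_offset is hypothetical and only computes positions. It does not touch any file.

```r
# Position of element (i, j) in column-major storage (1-based)
element_offset = function(i, j, nrow) (j - 1) * nrow + i

nr = 5000

# Columns 1..512, all rows: one contiguous run of positions
block = outer(1:nr, 1:512, element_offset, nrow = nr)
range(block)                      # 1 and 2560000
all(diff(as.vector(block)) == 1)  # TRUE: no gaps

# Row 1, all 10000 columns: positions jump by nrow at every step
row1 = element_offset(1, 1:10000, nr)
unique(diff(row1))                # 5000
```

Reading one row therefore touches 10000 separate places in the file, while reading a 512-column block is a single sequential read.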

The three examples below illustrate this approach for the following tasks:

- filling a filematrix with values,
- calculating column sums, and
- calculating row sums.

Filling in matrix values

Let us fill in the matrix with random values 512 columns at a time.

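step1 = 512                      # number of columns per block
runto = ncol(fm)                 # last column to process
nsteps = ceiling(runto/step1)    # number of blocks
for( part in seq_len(nsteps) ) {
    fr = (part-1)*step1 + 1      # first column of this block
    to = min(part*step1, runto)  # last column, clipped at ncol(fm)
    message( "Filling in columns ", fr, " to ", to)
    # write nrow(fm) * (to-fr+1) random values into the whole block at once
    fm[,fr:to] = runif(nrow(fm) * (to-fr+1))
}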
## Filling in columns 1 to 512
## Filling in columns 513 to 1024
## Filling in columns 1025 to 1536
## Filling in columns 1537 to 2048
## Filling in columns 2049 to 2560
## Filling in columns 2561 to 3072
## Filling in columns 3073 to 3584
## Filling in columns 3585 to 4096
## Filling in columns 4097 to 4608
## Filling in columns 4609 to 5120
## Filling in columns 5121 to 5632
## Filling in columns 5633 to 6144
## Filling in columns 6145 to 6656
## Filling in columns 6657 to 7168
## Filling in columns 7169 to 7680
## Filling in columns 7681 to 8192
## Filling in columns 8193 to 8704
## Filling in columns 8705 to 9216
## Filling in columns 9217 to 9728
## Filling in columns 9729 to 10000
rm(part, step1, runto, nsteps, fr, to)
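The block boundaries computed inside the loop can also be generated up front. The helper column_blocks below is hypothetical and not part of filematrix. It is a base R sketch that assumes n is at least 1.

```r
# Hypothetical helper: first and last column of each block
column_blocks = function(n, step) {
  fr = seq(1, n, by = step)          # first column of each block
  to = pmin(fr + step - 1, n)        # last column, clipped at n
  data.frame(fr = fr, to = to)
}

blocks = column_blocks(10000, 512)
nrow(blocks)      # 20 blocks
tail(blocks, 2)   # last two blocks: 9217..9728 and 9729..10000
```

Each row of blocks then gives one fr:to range for the loop, for example fm[, blocks$fr[k]:blocks$to[k]].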

Calculating column sums

Let us calculate column sums of the filematrix, 512 columns at a time.

fmcolsums = double(ncol(fm))      # one sum per column

step1 = 512
runto = ncol(fm)
nsteps = ceiling(runto/step1)
for( part in seq_len(nsteps) ) {
    fr = (part-1)*step1 + 1
    to = min(part*step1, runto)
    message("Calculating column sums, processing columns ", fr, " to ", to)
    # each block yields the final sums for its own columns
    fmcolsums[fr:to] = colSums(fm[,fr:to])
}
## Calculating column sums, processing columns 1 to 512
## Calculating column sums, processing columns 513 to 1024
## Calculating column sums, processing columns 1025 to 1536
## Calculating column sums, processing columns 1537 to 2048
## Calculating column sums, processing columns 2049 to 2560
## Calculating column sums, processing columns 2561 to 3072
## Calculating column sums, processing columns 3073 to 3584
## Calculating column sums, processing columns 3585 to 4096
## Calculating column sums, processing columns 4097 to 4608
## Calculating column sums, processing columns 4609 to 5120
## Calculating column sums, processing columns 5121 to 5632
## Calculating column sums, processing columns 5633 to 6144
## Calculating column sums, processing columns 6145 to 6656
## Calculating column sums, processing columns 6657 to 7168
## Calculating column sums, processing columns 7169 to 7680
## Calculating column sums, processing columns 7681 to 8192
## Calculating column sums, processing columns 8193 to 8704
## Calculating column sums, processing columns 8705 to 9216
## Calculating column sums, processing columns 9217 to 9728
## Calculating column sums, processing columns 9729 to 10000
rm(part, step1, runto, nsteps, fr, to)

message("Sums of first and last columns are ", 
        fmcolsums[1], " and ", tail(fmcolsums,1))
## Sums of first and last columns are 2521.9568372427 and 2494.61230152845
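Each column sum depends only on its own column, so computing colSums block by block gives the same result as computing it on the whole matrix at once. Here is a quick base R check on a small in-memory matrix. The sizes are arbitrary. drop = FALSE keeps a one-column final block from collapsing into a vector, which colSums would reject.

```r
set.seed(1)
m = matrix(runif(20 * 1000), nrow = 20)   # 20 x 1000 in-memory matrix

step1 = 128
sums = double(ncol(m))
for (fr in seq(1, ncol(m), by = step1)) {
  to = min(fr + step1 - 1, ncol(m))       # last column of this block
  sums[fr:to] = colSums(m[, fr:to, drop = FALSE])
}

all.equal(sums, colSums(m))               # TRUE
```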

Calculating row sums

Let us calculate row sums of the filematrix, again processing 512 columns at a time. Every row spans all columns, so the row sums are accumulated across the blocks.

fmrowsums = double(nrow(fm))      # running total for each row

step1 = 512
runto = ncol(fm)
nsteps = ceiling(runto/step1)
for( part in seq_len(nsteps) ) {
    fr = (part-1)*step1 + 1
    to = min(part*step1, runto)
    message("Calculating row sums, processing columns ", fr, " to ", to)
    # add this block's contribution to every row's running total
    fmrowsums = fmrowsums + rowSums(fm[,fr:to])
}
## Calculating row sums, processing columns 1 to 512
## Calculating row sums, processing columns 513 to 1024
## Calculating row sums, processing columns 1025 to 1536
## Calculating row sums, processing columns 1537 to 2048
## Calculating row sums, processing columns 2049 to 2560
## Calculating row sums, processing columns 2561 to 3072
## Calculating row sums, processing columns 3073 to 3584
## Calculating row sums, processing columns 3585 to 4096
## Calculating row sums, processing columns 4097 to 4608
## Calculating row sums, processing columns 4609 to 5120
## Calculating row sums, processing columns 5121 to 5632
## Calculating row sums, processing columns 5633 to 6144
## Calculating row sums, processing columns 6145 to 6656
## Calculating row sums, processing columns 6657 to 7168
## Calculating row sums, processing columns 7169 to 7680
## Calculating row sums, processing columns 7681 to 8192
## Calculating row sums, processing columns 8193 to 8704
## Calculating row sums, processing columns 8705 to 9216
## Calculating row sums, processing columns 9217 to 9728
## Calculating row sums, processing columns 9729 to 10000
rm(part, step1, runto, nsteps, fr, to)

message("Sums of first and last rows are ", 
        fmrowsums[1], " and ", tail(fmrowsums,1))
## Sums of first and last rows are 5000.74764741189 and 4971.86387187196
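The running total gives the full row sums up to floating-point rounding, because the additions happen in a different order than in a single rowSums call. Here is a base R check on a small in-memory matrix with arbitrary sizes.

```r
set.seed(2)
m = matrix(runif(50 * 1000), nrow = 50)   # 50 x 1000 in-memory matrix

step1 = 128
running = double(nrow(m))                 # one running total per row
for (fr in seq(1, ncol(m), by = step1)) {
  to = min(fr + step1 - 1, ncol(m))
  running = running + rowSums(m[, fr:to, drop = FALSE])
}

all.equal(running, rowSums(m))            # TRUE, within rounding tolerance
```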

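Removing the filematrix after use

closeAndDeleteFiles closes the filematrix and deletes its files from disk. To close it but keep the files, use close(fm) instead.

closeAndDeleteFiles(fm)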
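To make sure the files are removed even if an error interrupts the work, the cleanup can be registered with on.exit inside a function. The wrapper with_temp_filematrix is a hypothetical helper, not part of filematrix. It only combines fm.create and closeAndDeleteFiles as used above.

```r
library(filematrix)

# Hypothetical wrapper: create a temporary filematrix, pass it to FUN,
# and delete its files when the function exits, normally or by error
with_temp_filematrix = function(nrow, ncol, FUN) {
  fm = fm.create(filenamebase = tempfile(), nrow = nrow, ncol = ncol,
                 type = "double")
  on.exit(closeAndDeleteFiles(fm))   # runs on normal return or error
  FUN(fm)
}

total = with_temp_filematrix(10, 4, function(fm) {
  fm[, 1:4] = rep(1, 10 * 4)         # fill all 40 elements with 1
  sum(fm[, 1:4])
})
total                                # 40
```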

Version information

sessionInfo()
## R version 3.4.3 (2017-11-30)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] filematrix_1.3 knitr_1.18    
## 
## loaded via a namespace (and not attached):
##  [1] compiler_3.4.3  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
##  [5] htmltools_0.3.6 tools_3.4.3     yaml_2.1.16     Rcpp_0.12.14   
##  [9] stringi_1.1.6   rmarkdown_1.8   stringr_1.2.0   digest_0.6.14  
## [13] evaluate_0.10.1