D4TAlink.light is an R package integrating D4TAlink’s R methods. D4TAlink.light enables seamless compliance with FAIR data and ALCOA principles.
D4TAlink.light’s key features:
D4TAlink is a software suite for the management of data analytics projects developed by SQU4RE.
See also:
Install from CRAN:
install.packages("D4TAlink.light")
Install latest version from Bitbucket:
if (!require("devtools", quietly = TRUE))
  install.packages("devtools")
devtools::install_bitbucket("SQ4/d4talink.light", subdir = "D4TAlink.light")
Note that you may need to install:
- the Bioconductor package Biobase (instructions), and
- Rtools (cran.r-project.org/bin/windows).
if (!require("D4TAlink.light", quietly = TRUE)) install.packages("D4TAlink.light")
library(D4TAlink.light)
setTaskAuthor("Doe Johns")
setTaskSponsor("myClient")
setTaskRoot("~/myDataRepository", dirCreate = TRUE)
# Note: 'package' refers here to a work package
task1 <- initTask(project = "DiseaseABC",
                  package = "myStudy",
                  taskname = "2022-09-01_myFirstAnalysis")
task2 <- initTask(project = "DiseaseABC",
                  package = "myStudy",
                  taskname = "2022-09-05_mySecondAnalysis")
mytask <- loadTask(project = "DiseaseABC",
                   package = "myStudy",
                   taskname = "2022-09-05_mySecondAnalysis")
d <- list(letters = data.frame(a = LETTERS, b = letters, c = 1:length(letters)),
other = data.frame(a = 1:3, b = 11:13))
saveBinary(d, mytask, "myTables")
excelfilename <- saveReportXls(d, mytask, "tables")
pdffilename <- pdfReport(mytask, "myPlot", dim = c(150, 150)) # 150mm x 150mm
plot(pi)
dev.off()
csvfile <- reportFn(mytask, "someData", "csv")
p <- data.frame(a = LETTERS, b = letters, c = 1:length(letters))
write.table(p, csvfile)
print(csvfile)
# May require having run 'tinytex::install_tinytex()'
docfile <- renderTaskRmd(mytask)
if (require("Biobase", quietly = TRUE)) Biobase::openPDF(docfile)
Once the R package is loaded, the user must set D4TAlink’s global parameters, namely the name of the data analyst and the name of the study sponsor.
The location of the data file repository must then be defined, because D4TAlink manages data and information in flat files within a structured directory tree.
As described further below, other parameters can be defined.
Note that D4TAlink’s parameters can also be set via the .Renviron file located in the user's home directory.
D4TAlink_author="Dow Johns"
D4TAlink_sponsor="CompanyA"
D4TAlink_root="/SOME/WHERE/D4TAlink_example001"
D4TAlink_rmdtempl="/SOME/WHERE/my.Rmd"
D4TAlink_rscripttempl="/SOME/WHERE/my.R"
D4TAlink_pathgen="pathsDefault"
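To check which of these variables the current R session actually sees, they can be queried with base R's Sys.getenv. The sketch below only inspects the environment and makes no assumption about how D4TAlink.light reads the values internally. Note that R reads .Renviron at startup, so edits to the file take effect after restarting the session.

```r
# Names of the D4TAlink variables shown in the .Renviron example above
vars <- c("D4TAlink_author", "D4TAlink_sponsor", "D4TAlink_root",
          "D4TAlink_rmdtempl", "D4TAlink_rscripttempl", "D4TAlink_pathgen")

# With unset = NA, variables that are not defined come back as NA
values <- Sys.getenv(vars, unset = NA)

# Summarise which parameters are set in this session
status <- data.frame(variable = vars,
                     is_set   = !is.na(values),
                     value    = unname(values),
                     stringsAsFactors = FALSE)
print(status)
```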
A data analysis workflow usually comprises a succession of distinct analysis tasks. A typical analysis workflow would comprise the following tasks:
Coding these successive tasks in a single analysis script is bad practice for multiple reasons. Firstly, the analysis script becomes lengthy and thus difficult to write, review and maintain. Further, it prevents code reuse and hinders project agility. Finally, it complicates collaboration between stakeholders.
D4TAlink defines the ‘analysis task’ as a central concept. A data analysis workflow consists of a succession of tasks, which may branch into a tree.
Each task is assigned to a work package, which is assigned to a project, and each project is assigned to a sponsor.
To create an analysis task in R, use the following calls.
# Set the base parameters
library(D4TAlink.light)
setTaskAuthor("Doe Johns")
setTaskSponsor("mySponsor")
setTaskRoot(file.path(tempdir(),"D4TAlink_example001"),dirCreate=TRUE)
# create a task
task <- initTask(project = "myProject",
                 package = "myPackage",
                 taskname = sprintf("%s_myTask", format(Sys.time(), "%Y%m%d")))
print(listTasks())
Each task has its own directory structure. The task contains storage for five types of data:
The locations of these data can be obtained using, respectively, the functions reportDir, datasourceDir, progDir, docDir, and binaryDir.
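As a minimal sketch, the five accessor functions named above could be called on a task object as follows. It assumes each function accepts the task as its first argument (the text does not spell out the signatures), and it only runs when D4TAlink.light is installed. The project, package and task names are placeholders.

```r
# Sketch only: executes when D4TAlink.light is installed, otherwise does nothing
dirs <- NULL
if (requireNamespace("D4TAlink.light", quietly = TRUE)) {
  library(D4TAlink.light)
  setTaskAuthor("Doe Johns")
  setTaskSponsor("mySponsor")
  setTaskRoot(file.path(tempdir(), "D4TAlink_example001"), dirCreate = TRUE)
  task <- initTask(project = "myProject",
                   package = "myPackage",
                   taskname = "20220801_myTask")
  # ASSUMPTION: each accessor takes the task object as its first argument
  dirs <- c(report     = reportDir(task),
            datasource = datasourceDir(task),
            prog       = progDir(task),
            doc        = docDir(task),
            binary     = binaryDir(task))
  print(dirs)
}
```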
For traceability, the files within a task are named following the format [TASK_NAME]_[DATA_TYPE].[EXTENSION], where DATA_TYPE is a short string describing the content of the file and EXTENSION is the file type (e.g., pdf or xlsx). By convention, TASK_NAME has a date as prefix, with format %Y%m%d_, and DATA_TYPE contains neither underscores (_) nor dots (.).
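Because DATA_TYPE contains neither underscores nor dots, a file name can be split back into its parts unambiguously. The following base R sketch illustrates this; parseTaskFileName is a hypothetical helper written for this example, not a function of D4TAlink.light, and the file name is made up.

```r
# Hypothetical helper: split "[TASK_NAME]_[DATA_TYPE].[EXTENSION]" into its parts
parseTaskFileName <- function(filename) {
  b <- basename(filename)
  # TASK_NAME may contain underscores, so the greedy "(.+)" stops at the LAST
  # underscore; DATA_TYPE has no "_" or "."; EXTENSION follows the final dot
  m <- regmatches(b, regexec("^(.+)_([^_.]+)\\.([^.]+)$", b))[[1]]
  if (length(m) == 0) return(NULL)  # name does not follow the convention
  list(task      = m[2],
       type      = m[3],
       extension = m[4],
       # Date prefix in %Y%m%d format; NA if the prefix is not a valid date
       date      = as.Date(substr(m[2], 1, 8), format = "%Y%m%d"))
}

parts <- parseTaskFileName("20220801_parentTask_someData.csv")
str(parts)
```

Names that do not match the convention, such as "notes.txt", yield NULL, which makes the helper usable as a simple filter.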
The function listTaskFiles returns a list of files associated with a task:
Similarly, the function listTasks returns a list of all tasks in the repository:
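A minimal sketch of both listing calls is given here. The call listTasks() without arguments appears earlier in this text; passing the task object to listTaskFiles is an assumption. The block only runs when D4TAlink.light is installed, and the names used are placeholders.

```r
# Sketch only: executes when D4TAlink.light is installed, otherwise does nothing
tasks <- NULL
if (requireNamespace("D4TAlink.light", quietly = TRUE)) {
  library(D4TAlink.light)
  setTaskAuthor("Doe Johns")
  setTaskSponsor("mySponsor")
  setTaskRoot(file.path(tempdir(), "D4TAlink_example001"), dirCreate = TRUE)
  task <- initTask(project = "myProject",
                   package = "myPackage",
                   taskname = "20220801_myTask")
  # ASSUMPTION: listTaskFiles takes the task object as its argument
  files <- listTaskFiles(task)
  print(files)
  # All tasks found under the repository root
  tasks <- listTasks()
  print(tasks)
}
```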
Documentation of a task is typically authored using R markdown files (Rmd). D4TAlink recommends having one Rmd file per task. D4TAlink.light provides functions to create and render these files.
Creation of an R markdown file from template:
Rendering of the markdown file into the task documentation directory:
For some tasks an R script may also be needed. A task script can be created from the default template:
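The three steps just described (creating the Rmd file from the template, rendering it, and creating a task script) might look as follows. renderTaskRmd is used elsewhere in this text; the names initTaskRmd and initTaskRscript are not mentioned here and are an assumption based on the package's function index, so they should be checked against the installed version. The block only runs when D4TAlink.light is installed.

```r
# Sketch only: executes when D4TAlink.light is installed, otherwise does nothing
docfile <- NULL
if (requireNamespace("D4TAlink.light", quietly = TRUE)) {
  library(D4TAlink.light)
  setTaskAuthor("Doe Johns")
  setTaskSponsor("mySponsor")
  setTaskRoot(file.path(tempdir(), "D4TAlink_example001"), dirCreate = TRUE)
  task <- initTask(project = "myProject",
                   package = "myPackage",
                   taskname = "20220801_docTask")
  # ASSUMPTION: initTaskRmd() creates the task's Rmd file from the template
  rmdfile <- initTaskRmd(task)
  # ASSUMPTION: initTaskRscript() creates the task's R script from the template
  rfile <- initTaskRscript(task)
  # Rendering may require having run 'tinytex::install_tinytex()';
  # failures are reported as a message instead of stopping the script
  docfile <- tryCatch(renderTaskRmd(task),
                      error = function(e) {
                        message("Rendering failed: ", conditionMessage(e))
                        NULL
                      })
}
```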
To output a report file in the output directory of a task, use the following.
XLSX
d <- list(letters=data.frame(a=LETTERS,b=letters,c=1:length(letters)),
other=data.frame(a=1:3,b=11:13))
file <- saveReportXls(d,task,"tables")
print(file)
PDF
file <- pdfReport(task, c("plots", 1), dim = c(100, 100))
hist(rnorm(100))
dev.off()
# Open the PDF only if Biobase is available
if (require("Biobase", quietly = TRUE)) Biobase::openPDF(file)
PNG
JPEG
Other
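For the PNG, JPEG and other formats listed above, one possible approach is to obtain a file name in the task's output directory with reportFn (used earlier in this text) and write to it with a standard R device or writer. This is a sketch under that assumption; the package may offer dedicated helpers for these formats that are not shown here. The block only runs when D4TAlink.light is installed.

```r
# Sketch only: executes when D4TAlink.light is installed, otherwise does nothing
outfiles <- NULL
if (requireNamespace("D4TAlink.light", quietly = TRUE)) {
  library(D4TAlink.light)
  setTaskAuthor("Doe Johns")
  setTaskSponsor("mySponsor")
  setTaskRoot(file.path(tempdir(), "D4TAlink_example001"), dirCreate = TRUE)
  task <- initTask(project = "myProject",
                   package = "myPackage",
                   taskname = "20220801_reportTask")

  # PNG: base graphics device writing to a task report file (100mm x 100mm)
  pngfile <- reportFn(task, "histogram", "png")
  png(pngfile, width = 100, height = 100, units = "mm", res = 150)
  hist(rnorm(100))
  dev.off()

  # JPEG: same pattern with the jpeg device
  jpgfile <- reportFn(task, "histogram", "jpeg")
  jpeg(jpgfile, width = 100, height = 100, units = "mm", res = 150)
  hist(rnorm(100))
  dev.off()

  # Other: any writer that accepts a file name, here a CSV file
  csvfile <- reportFn(task, "someData", "csv")
  write.csv(data.frame(a = 1:3, b = 11:13), csvfile, row.names = FALSE)

  outfiles <- c(pngfile, jpgfile, csvfile)
  print(outfiles)
}
```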
Each task constitutes one element of a stepwise analysis process. Data can be transferred from one task to another. To do so, R objects must be stored by the parent task using the call saveBinary(object,task,"objectType"). The child task may then load the data from the parent task using the call readBinary(loadTask(...),"objectType").
Saving data in a parent task:
d <- list(letters = data.frame(a=LETTERS,b=letters,c=1:length(letters)),
other = data.frame(a=1:3,b=11:13))
task <- initTask(project="myProject",
package="myPackage",
taskname="20220801_parentTask")
file <- saveBinary(d,task,"someData")
print(file)
Loading the data from within a child task:
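A minimal round trip between a parent and a child task could look as follows. saveBinary and loadTask are used as shown elsewhere in this text; readBinary is assumed to be the loading counterpart of saveBinary, taking the task and the object type. The block only runs when D4TAlink.light is installed, and the task names are placeholders.

```r
# Sketch only: executes when D4TAlink.light is installed, otherwise does nothing
d <- list(letters = data.frame(a = LETTERS, b = letters, c = seq_along(letters)),
          other   = data.frame(a = 1:3, b = 11:13))
restored <- NULL
if (requireNamespace("D4TAlink.light", quietly = TRUE)) {
  library(D4TAlink.light)
  setTaskAuthor("Doe Johns")
  setTaskSponsor("mySponsor")
  setTaskRoot(file.path(tempdir(), "D4TAlink_example001"), dirCreate = TRUE)

  # Parent task stores the object
  parent <- initTask(project = "myProject",
                     package = "myPackage",
                     taskname = "20220801_parentTask")
  saveBinary(d, parent, "someData")

  # Child task reloads the parent and reads the stored object back
  child <- initTask(project = "myProject",
                    package = "myPackage",
                    taskname = "20220802_childTask")
  parentTask <- loadTask(project = "myProject",
                         package = "myPackage",
                         taskname = "20220801_parentTask")
  # ASSUMPTION: readBinary(task, type) mirrors saveBinary(object, task, type)
  restored <- readBinary(parentTask, "someData")
  str(restored)
}
```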
The R markdown and script templates can be set using the functions setTaskRmdTemplate and setTaskRscriptTemplate as follows.
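As a sketch, and assuming both functions take the path to a template file (which matches the .Renviron entries D4TAlink_rmdtempl and D4TAlink_rscripttempl), the calls might look like this. The template contents written below are placeholders for illustration only; a real template may need fields that are not described in this text. The setter calls only run when D4TAlink.light is installed.

```r
# Placeholder template files, created in a temporary directory for illustration
rmdTemplate <- file.path(tempdir(), "my.Rmd")
writeLines(c("---", "title: \"Task report\"", "---", "", "Analysis notes."),
           rmdTemplate)
rTemplate <- file.path(tempdir(), "my.R")
writeLines("# Task script", rTemplate)

if (requireNamespace("D4TAlink.light", quietly = TRUE)) {
  library(D4TAlink.light)
  # ASSUMPTION: both setters accept the path to the template file
  setTaskRmdTemplate(rmdTemplate)
  setTaskRscriptTemplate(rTemplate)
}
```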
Further, the paths to the templates can be set in the .Renviron file:
D4TAlink_rmdtempl="/SOME/WHERE/my.Rmd"
D4TAlink_rscripttempl="/SOME/WHERE/my.R"
The directory structure can be customized by defining a function that returns the paths of a task and registering it with the command setTaskStructure, as follows.
fun <- function(project, package, taskname, sponsor) {
  basePath <- file.path("%ROOT%", sponsor, project, package)
  # Return the named list of paths explicitly
  list(
    root    = "%ROOT%",
    datasrc = file.path(basePath, "raw", "data_source"),
    data    = file.path(basePath, "output", "adhoc", taskname),
    bin     = file.path(basePath, "output", "adhoc", taskname, "bin"),
    code    = file.path(basePath, "progs"),
    doc     = file.path(basePath, "docs"),
    log     = file.path(basePath, "output", "log")
  )
}
setTaskStructure(fun)
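Before registering a path generator, it can be useful to call it directly and inspect the result. The base R sketch below repeats the generator so that it is self-contained, then substitutes the %ROOT% placeholder by hand. That substitution is an assumption about what the placeholder stands for (the repository root); the root path and the project, package, task and sponsor names are example values.

```r
# Same generator as above, repeated so the sketch is self-contained
fun <- function(project, package, taskname, sponsor) {
  basePath <- file.path("%ROOT%", sponsor, project, package)
  list(
    root    = "%ROOT%",
    datasrc = file.path(basePath, "raw", "data_source"),
    data    = file.path(basePath, "output", "adhoc", taskname),
    bin     = file.path(basePath, "output", "adhoc", taskname, "bin"),
    code    = file.path(basePath, "progs"),
    doc     = file.path(basePath, "docs"),
    log     = file.path(basePath, "output", "log")
  )
}

# Call the generator with example values
paths <- fun("myProject", "myPackage", "20220801_myTask", "mySponsor")

# ASSUMPTION: %ROOT% stands for the repository root; replace it by hand here
root <- "/SOME/WHERE/D4TAlink_example001"
resolved <- lapply(paths, function(p) sub("%ROOT%", root, p, fixed = TRUE))
str(resolved)
```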
The available path generation functions are pathsDefault, pathsGLPG, and pathsPMS.
Further, the path generator can be set in the .Renviron file, using the name of one of these functions:
D4TAlink_pathgen="pathsDefault"