The aim of andurinha is provides tools to make spectroscopic data processing easier and faster. It allows to find and select peaks based on the second derivative or absorbance sum spectrum. Furthermore, it supplies functions for graphic support, which makes the workflow more user friendly.
# install.packages("devtools")
::install_github("noemiallefs/andurinha") devtools
There are two common situations when importing the spectroscopic data.
In the first case, data may be in the same file with the structure:
First column: wave numbers.
The following columns: the samples absorbances.
In that case, import the data with the most suitable function.
In the second case, data may be in separated files with the structure:
First column: wave numbers.
Second column: sample absorbance.
In that case, the function importSpectra()
can be used;
to do so the files extension should be .csv
and they must
be in the same directory - this folder must not contain any other
file.
<- importSpectra("/path/to/your/spectraFiles/", sep = ";")
spectra head(spectra)
#> WN A B C
#> 1 399 0.011 0.008 0.009
#> 2 401 0.008 0.006 0.006
#> 3 403 0.006 0.005 0.006
#> 4 405 0.005 0.005 0.005
#> 5 407 0.005 0.005 0.003
#> 6 409 0.003 0.004 0.002
The function findPeaks()
verifies the spectra quality,
finds peaks (surprise!) and allows to select the most relevant
ones based on the absorbance or second derivative sum spectrum. To use
it the data must be in the appropriate format, it means that the object
class must be a data frame
with the structure:
First column: wave numbers.
The following columns: samples absorbances.
This function has five arguments:
resolution
: the equipment measurement resolution,
which is by default 4 cm-1.
minAbs
: the cut off value to check spectra quality,
which is by default a spectrum absorbance maximum of 0.1.
cutOff
: the second derivative or absorbance sum
spectrum cut off to reduce the raw peaks table, which is by default
NULL
.
scale
: by default is TRUE
and data is
scaled as Z-scores. FALSE
should be used in case you do not
want to scale it.
ndd
: by default is TRUE
and peaks are
searched based on the second derivative sum spectrum. FALSE
should be used in case you want to search them based on absorbance sum
spectrum.
This function - with all the arguments by default - returns a
list
with four data frames
:
dataZ: the standardised data by Z-scores.
secondDerivative: the second derivative values of the data.
sumSpectrum_peaksTable: the peaks wave numbers and their second derivative or absorbance sum spectrum values.
peaksTable: the peaks wave numbers and their absorbance for each spectrum.
By default, if there is any spectrum with a maximum absorbance lower
than 0.1 a warning will be returned; in case this shows up and you want
to continue, you should modify the minAbs
value. Once the
quality control has been passed, by default the data is scaled - to skip
it use scale = FASLE
- the next steps will depend on the
selected method for finding peaks:
Absorbance sum spectrum: in this case the absorbance sum spectrum is calculated and the peaks are searched based on it.
Second derivative sum spectrum: in this case the second derivative of the absorbance data is calculated and then the peaks are searched based on the sum spectrum.
# Search peaks based on absorbance sum spectrum
# with standarised absorbance data
<- findPeaks(andurinhaData, ndd = FALSE)
fp.abs summary(fp.abs)
dim(fp.abs$sumSpectrum_peaksTable)
# Search peaks based on second derivative sum spectrum
# with standarised absorbance data
<- findPeaks(andurinhaData)
fp.ndd summary(fp.ndd)
dim(fp.ndd$sumSpectrum_peaksTable)
# Search peaks based on second derivative sum spectrum
# with no standarised absorbance data
<- findPeaks(andurinhaData, scale = FALSE)
fp.nZs summary(fp.nZs)
dim(fp.nZs$sumSpectrum_peaksTable)
To visualised both the raw data and the processed data by
findPeaks()
; the functions gOverview()
and
plotPeaks()
may be applied.
gOverview()
:Gives a graphic summary of the data. This function has the arguments:
data_abs
: to provide a data frame with the
absorbance data. The structure should be: wave numbers in the first
column and samples absorbance in the following columns.
data_ndd
: to provide a data frame with the second
derivative data. The structure should be: wave numbers in the first
column and samples second derivative values in the following
columns.
fontFamily
: to change the plot font.
# Graphic overview of the raw data
gOverview(andurinhaData)
# Graphic overview of the processed data
# Peaks searched based on the second derivative sum spectrum
# with standarised absorbance data
gOverview(fp.ndd$dataZ, fp.ndd$secondDerivative)
plotPeaks()
:Makes a graphic representation of the peaks that have been found over
the second derivative or absorbance sum spectra. This plot, together
with the sumSpectrum_peaksTable, allows to choose the
desired cutOff
value to reduce the peaks table by running
again findPeaks()
. This function has the arguments:
peaksWN
: to provide a vector with the peaks wave
numbers.
data_abs
: to provide a data frame with the
absorbance data. The structure should be: wave numbers in the first
column and samples absorbance in the following columns.
data_ndd
: to provide a data frame with the second
derivative data. The structure should be: wave numbers in the first
column and samples second derivative values in the following
columns.
fontFamily
: to change the plot font.
# Peaks searched based on absorbance sum spectrum
plotPeaks(fp.abs[[3]]$WN,
data_abs = fp.abs$dataZ)
# Peaks searched based on the second derivative sum spectrum
plotPeaks(fp.ndd[[4]]$WN,
data_abs = fp.ndd$dataZ,
data_ndd = fp.ndd$secondDerivative)
To reduce the peaks table a cut off must be selected; this may be
based on the second derivative or on the absorbance sum spectra values.
Therefore, the sumSpectrum_peaksTable
must be kept in mind;
to make the choice easier it may be ordered and then filtered. The
function plotPeaks()
may be very useful to make the
choice.
When the peaks search is made based on the absorbance sum spectrum, the number of peaks found will be lower than that obtained when using the second derivative sum spectrum. Due to the smaller number of peaks selection may be not needed. This makes the workflow easier, but this method might not find all relevant peaks. Owing to the search based on absorbance sum spectrum is less efficient than the search based on the second derivative sum spectrum, we recommend the second method. But this will need a little bit more user work. The following examples show the differences between both methods:
# Select cutOff based on absorbance sum spectrum
# to clean your peaks table
round(fp.abs$sumSpectrum_peaksTable, 2) %>%
arrange(desc(sumSpectrum))
# In that case cleaning may not be necesary
# Select cutOff based on second derivative sum spectrum
# to clean your peaks table
round(fp.ndd$sumSpectrum_peaksTable, 2) %>%
arrange(desc(sumSpectrum)) %>%
filter(sumSpectrum > 0.18)
# In that case a cut off of 0.25 my be selected
When the cut off has been chosen then run findPeaks()
changing the cutOff
value to the desired one.
# Run finPeaks() with the new cutOff
# based on the second derivative sum spectrum
<- findPeaks(andurinhaData, cutOff = 0.25) fp.ndd2
Let’s see the result!
# plotPeaks
# based on absorbance sum spectrum
# no cleaning needed
plotPeaks(fp.ndd2[[3]]$WN,
data_abs = fp.ndd2$dataZ)
# plotPeaks
# based on the second derivative sum spectrum
plotPeaks(fp.ndd2[[4]]$WN,
data_abs = fp.ndd2$dataZ,
data_ndd = fp.ndd2$secondDerivative)
Noemi Alvarez Fernandez and Antonio Martinez Cortizas (2020). andurinha: Make Spectroscopic Data Processing Easier. R package version 0.0.2. https://github.com/noemiallefs/andurinha