This package contains two functions for computing the incubation period distribution from outbreak data. The inputs for each patient are given as a data.frame or linelist object and must contain the date of symptom onset and the possible dates of exposure.
The function empirical_incubation_dist() computes the discrete probability distribution by giving equal weight to each patient. Thus, with N patients, each of the n possible exposure dates of a given patient gets the overall weight 1/(n*N). The function returns a data frame with a column incubation_period, containing the incubation periods in steps of one day, and a column relative_frequency.
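The weighting scheme can be illustrated with a minimal base R sketch. This is not the package's implementation. The toy dates and the object names (onset, exposures, emp) are hypothetical and chosen only for this example.

```r
# Toy data (hypothetical): one onset date and a set of candidate
# exposure dates per patient.
onset <- as.Date(c("2020-01-10", "2020-01-12", "2020-01-15"))
exposures <- list(
  as.Date(c("2020-01-06", "2020-01-07")),               # n = 2
  as.Date("2020-01-09"),                                # n = 1
  as.Date(c("2020-01-10", "2020-01-11", "2020-01-12"))  # n = 3
)
N <- length(onset)

# Incubation periods (in days) implied by each candidate exposure date.
periods <- unlist(lapply(seq_len(N), function(i) {
  as.integer(onset[i] - exposures[[i]])
}))

# Each of a patient's n candidate dates gets the weight 1 / (n * N).
weights <- unlist(lapply(exposures, function(e) {
  rep(1 / (length(e) * N), length(e))
}))

# Sum the weights per incubation period to get relative frequencies.
rel_freq <- tapply(weights, periods, sum)
emp <- data.frame(
  incubation_period = as.integer(names(rel_freq)),
  relative_frequency = as.numeric(rel_freq)
)
print(emp)
```

Because every patient contributes a total weight of 1/N, the relative frequencies sum to 1 regardless of how many candidate dates each patient has.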
The function fit_gamma_incubation_dist() takes the same inputs, but samples directly from the empirical distribution and fits a discrete gamma distribution to the sample by means of fit_disc_gamma().
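The two steps described here (sample from the empirical distribution, then fit) can be sketched as follows. This is an illustration under stated assumptions, not the internals of fit_gamma_incubation_dist(): the relative frequencies are copied from the example output shown later in this document, the sample size of 1000 is an assumption, and the fit is only attempted if epitrix is installed.

```r
set.seed(1)  # for reproducibility of this illustration only

# Empirical distribution copied from the example output in this document.
emp <- data.frame(
  incubation_period = 0:6,
  relative_frequency = c(0, 0.06, 0.107, 0.262, 0.312, 0.149, 0.11)
)

# Draw incubation periods with probability equal to their relative frequency.
# The sample size of 1000 is an assumption made for this sketch.
s <- sample(emp$incubation_period, size = 1000, replace = TRUE,
            prob = emp$relative_frequency)

# Fit a discrete gamma distribution to the sample, if epitrix is available.
if (requireNamespace("epitrix", quietly = TRUE)) {
  fit <- epitrix::fit_disc_gamma(s)
  print(fit$mu)
}
```

Since the fit is based on a random sample, repeated runs without a fixed seed give slightly different estimates.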
Load environment:
library(magrittr)
library(epitrix)
library(distcrete)
library(ggplot2)
Make a linelist object containing toy data with several possible exposure dates for each case:
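ll <- sim_linelist(15)
# Candidate incubation periods (days) and their discrete gamma probabilities
x <- 0:15
y <- distcrete("gamma", 1, shape = 12, rate = 3, w = 0)$d(x)
# For onset date i, draw 1 to 5 distinct incubation periods and subtract them
mkexposures <- function(i) {
  i - sample(x, size = sample.int(5, size = 1), replace = FALSE, prob = y)
}
exposures <- sapply(ll$date_of_onset, mkexposures)
ll$dates_exposure <- exposures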
print(ll)
#> id date_of_onset date_of_report gender outcome
#> 1 1 2020-01-23 2020-02-01 male recovery
#> 2 2 2020-02-14 2020-02-18 male death
#> 3 3 2020-01-25 2020-01-29 female recovery
#> 4 4 2020-01-16 2020-01-30 male recovery
#> 5 5 2020-01-22 2020-01-28 male death
#> 6 6 2020-01-26 2020-01-31 male recovery
#> 7 7 2020-02-09 2020-02-16 female recovery
#> 8 8 2020-02-17 2020-02-24 female recovery
#> 9 9 2020-01-14 2020-01-20 male recovery
#> 10 10 2020-02-22 2020-03-12 male recovery
#> 11 11 2020-02-26 2020-03-04 male recovery
#> 12 12 2020-01-06 2020-01-10 male recovery
#> 13 13 2020-02-23 2020-02-29 female recovery
#> 14 14 2020-01-08 2020-01-16 female recovery
#> 15 15 2020-01-21 2020-01-26 male recovery
#> dates_exposure
#> 1 18281, 18280
#> 2 18303, 18305
#> 3 18282
#> 4 18274, 18273, 18275, 18272, 18271
#> 5 18279
#> 6 18281
#> 7 18297, 18298
#> 8 18306, 18304, 18305, 18307
#> 9 18270, 18272
#> 10 18308, 18311, 18310, 18313, 18312
#> 11 18315, 18316, 18314, 18317, 18313
#> 12 18264, 18263, 18265, 18262
#> 13 18313, 18312, 18310, 18309
#> 14 18264, 18265, 18266
#> 15 18279, 18277, 18280, 18278
Empirical distribution:
incubation_period_dist <- empirical_incubation_dist(ll, date_of_onset, dates_exposure)
print(incubation_period_dist)
#> # A tibble: 7 × 2
#> incubation_period relative_frequency
#> <dbl> <dbl>
#> 1 0 0
#> 2 1 0.06
#> 3 2 0.107
#> 4 3 0.262
#> 5 4 0.312
#> 6 5 0.149
#> 7 6 0.11
ggplot(incubation_period_dist, aes(incubation_period, relative_frequency)) +
geom_col()
Fit discrete gamma:
fit <- fit_gamma_incubation_dist(ll, date_of_onset, dates_exposure)
print(fit)
#> $mu
#> [1] 4.229868
#>
#> $cv
#> [1] 0.32265
#>
#> $sd
#> [1] 1.364767
#>
#> $ll
#> [1] -1729.577
#>
#> $converged
#> [1] TRUE
#>
#> $distribution
#> A discrete distribution
#> name: gamma
#> parameters:
#> shape: 9.60586714704713
#> scale: 0.440342153837883
x <- 0:10
y <- fit$distribution$d(x)
ggplot(data.frame(x = x, y = y), aes(x, y)) +
  geom_col(data = incubation_period_dist, aes(incubation_period, relative_frequency)) +
  geom_point(stat = "identity", col = "red", size = 3) +
  geom_line(stat = "identity", col = "red")
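As a sanity check on the printed fit, the reported mean (mu) and coefficient of variation (cv) can be converted to the gamma shape and scale by hand, using the standard gamma identities shape = 1/cv^2 and scale = mu * cv^2. The sketch below uses the rounded values from the output above, so agreement is only approximate.

```r
# Values copied (rounded) from the printed fit above.
mu <- 4.229868
cv <- 0.32265

# Standard gamma identities: mean = shape * scale, cv = 1 / sqrt(shape).
shape <- 1 / cv^2
scale <- mu * cv^2
sd_fit <- mu * cv

# Compare with the shape, scale and sd reported in the output.
print(c(shape = shape, scale = scale, sd = sd_fit))
```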
Note that if the possible exposure dates are consecutive for all patients, then empirical_incubation_dist() and fit_gamma_incubation_dist() can take date ranges as inputs instead of lists of individual exposure dates (see help for details).
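To make the equivalence concrete, a date range can be thought of as shorthand for the list of consecutive days it covers. The base R sketch below expands hypothetical start and end dates into such lists. It does not show the package's own range interface, which is described in the help pages.

```r
# Hypothetical first and last possible exposure dates for two patients.
exposure_start <- as.Date(c("2020-01-05", "2020-01-08"))
exposure_end   <- as.Date(c("2020-01-07", "2020-01-08"))

# Expand each range into the individual consecutive dates it covers.
dates_exposure <- lapply(seq_along(exposure_start), function(i) {
  seq(exposure_start[i], exposure_end[i], by = "day")
})

print(dates_exposure)
```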