Several vignettes explored how to pool rows and/or columns to deal with the near singularity of the recapture matrix. A new function has been introduced SPAS.autopool() that attempts to automate this process.
The automatic pooling algorithm follows recommendations in Schwarz and Taylor (1998) and proceeds as follows:
The values of min.released,min.inspected,min.recaps can be passed as arguments to the function.
The function returns a suggested pooling vector for the rows and columns and a reduced data matrix if there are rows that are all zero or columns that are all zero.
The SPAS.fit.model() function also now has a autopool argument where an automated pooling will be attempted rather than you passing the pooling vectors.
The example from the Harrison River will be used:
library(SPAS)
#> ***** SPAS: Stratified Petersen Analysis System - Version 2024.1.31 2024-01-31) *****
#>
#> Help available with help(package='SPAS')
#> Several vignettes are available. See browseVignettes(package="SPAS")
This will load the SPAS fitting functions and any associated packages needed by SPAS.
The data should be stored as an \(s+1\) by \(t+1\) (before pooling) matrix. The \(s \times t\) upper left matrix is the number of animals released in row stratum \(i\) and recovered in column stratum \(j\). Row \(s+1\) contains the total number of UNMARKED animals recovered in column stratum \(j\). Column \(t+1\) contains the number of animals marked in each row stratum but not recovered in any column stratum. The \(rawdata[s+1, t+1]\) is not used and can be set to 0 or NA. The sum of the entries in each of the first \(s\) rows is then the number of animals marked in each row stratum. The sum of the entries in each of the first \(t\) columns is then the number of animals captured (marked and unmarked) in each column stratum. The row/column names of the matrix may be set to identify the entries in the output.
Here is the raw data for the Harrison River. Notice that very small number of releases and recoveries in week 6.
harrison.2011.chinook.F.csv <- textConnection("
4 , 2 , 1 , 1 , 0 , 0 , 130
12 , 7 , 14 , 1 , 3 , 0 , 330
7 , 11 , 41 , 9 , 1 , 1 , 790
1 , 13 , 40 , 12 , 9 , 1 , 667
0 , 1 , 8 , 8 , 3 , 0 , 309
0 , 0 , 0 , 0 , 0 , 1 , 65
744 , 1187 , 2136 , 951 , 608 , 127 , 0")
har.data <- as.matrix(read.csv(harrison.2011.chinook.F.csv, header=FALSE))
har.data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
A total of 138 fish were tagged and released in week 1. Of these fish, 4 were recovered down stream in column stratum 1; 2 were recovered in column stratum 2; and 130 were never seen again. A total of 744 UNMARKED fish were recovered in column stratum 1.
You can add row and column names to the matrix which will used when the data are printed.
We invoke the SPAS.autopool() function and look at the output:
res <- SPAS.autopool(har.data)
res
#> $rawdata
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> $reddata
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> $row.pool
#> [1] 1 2 3 4 6 6
#>
#> $col.pool
#> [1] 1 2 3 4 5 6
#>
#> $summary
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130 1
#> [2,] 12 7 14 1 3 0 330 2
#> [3,] 7 11 41 9 1 1 790 3
#> [4,] 1 13 40 12 9 1 667 4
#> [5,] 0 1 8 8 3 0 309 6
#> [6,] 0 0 0 0 0 1 65 6
#> [7,] 744 1187 2136 951 608 127 0 NA
#> [8,] 1 2 3 4 5 6 NA NA
There were no rows or columns that were all zero, so the reduced data (res) is the same as the original data.
The suggested pooling vector for columns indicates no pooling was done (all entries are the same). The suggested pooling vector for rows, suggests rows 5 and 6 be pooled as indicated by the duplicate entries of 6 in the pooling vector.
Finally, the reduced data and suggest pooling vectors are presented in one display.
We can use the autopool argument in the SPAS.fit.model to do automated pooling prior to the fit.
mod..1 <- SPAS.fit.model(har.data,
model.id="Automated pooling",
autopool=TRUE)
#> Using nlminb to find conditional MLE
#> outer mgc: 1881.784
#> outer mgc: 4389.186
#> outer mgc: 1468.236
#> outer mgc: 422.6119
#> outer mgc: 113.8934
#> outer mgc: 26.50151
#> outer mgc: 43.40223
#> outer mgc: 55.72318
#> outer mgc: 15.80935
#> outer mgc: 22.2736
#> outer mgc: 3.903875
#> outer mgc: 0.8192351
#> outer mgc: 10.92489
#> outer mgc: 6.158485
#> outer mgc: 2.88843
#> outer mgc: 9.989736
#> outer mgc: 7.862851
#> outer mgc: 19.72383
#> outer mgc: 4.490593
#> outer mgc: 1.984147
#> outer mgc: 1.565641
#> outer mgc: 5.72284
#> outer mgc: 15.25525
#> outer mgc: 5.787902
#> outer mgc: 16.57644
#> outer mgc: 4.078164
#> outer mgc: 19.07256
#> outer mgc: 2.186887
#> outer mgc: 3.40589
#> outer mgc: 3.06535
#> outer mgc: 7.242766
#> outer mgc: 2.785986
#> outer mgc: 5.301617
#> outer mgc: 1.128007
#> outer mgc: 2.102554
#> outer mgc: 0.5954426
#> outer mgc: 2.630831
#> outer mgc: 0.3571193
#> outer mgc: 3.785503
#> outer mgc: 0.1577663
#> outer mgc: 0.98001
#> outer mgc: 0.5945414
#> outer mgc: 1.18846
#> outer mgc: 0.3909604
#> outer mgc: 1.043338
#> outer mgc: 0.204174
#> outer mgc: 0.8648443
#> outer mgc: 0.1053468
#> outer mgc: 0.520781
#> outer mgc: 0.07018663
#> outer mgc: 0.2067039
#> outer mgc: 0.0523693
#> outer mgc: 0.05568652
#> outer mgc: 0.02337035
#> outer mgc: 0.01194841
#> outer mgc: 0.005044315
#> Convergence codes from nlminb 0 relative convergence (4)
#> Finding conditional estimate of N
SPAS.print.model(mod..1)
#> Model Name: Automated pooling
#> Date of Fit: 2024-01-25 12:19
#> Version of OPEN SPAS used : SPAS-R 2023-03-31
#>
#> Raw data
#> V1 V2 V3 V4 V5 V6 V7
#> [1,] 4 2 1 1 0 0 130
#> [2,] 12 7 14 1 3 0 330
#> [3,] 7 11 41 9 1 1 790
#> [4,] 1 13 40 12 9 1 667
#> [5,] 0 1 8 8 3 0 309
#> [6,] 0 0 0 0 0 1 65
#> [7,] 744 1187 2136 951 608 127 0
#>
#> Row pooling setup : 1 2 3 4 6 6
#> Col pooling setup : 1 2 3 4 5 6
#> Physical pooling : TRUE
#> Theta pooling : FALSE
#> CJS pooling : FALSE
#>
#>
#> Chapman estimator of population size 70135 (SE 4503 )
#>
#>
#> Raw data AFTER PHYSICAL (but not logical) POOLING
#> pool1 pool2 pool3 pool4 pool5 pool6 V7
#> pool1 4 2 1 1 0 0 130
#> pool2 12 7 14 1 3 0 330
#> pool3 7 11 41 9 1 1 790
#> pool4 1 13 40 12 9 1 667
#> pool6 0 1 8 8 3 1 374
#> 744 1187 2136 951 608 127 0
#>
#> Condition number of XX' where X= (physically) pooled matrix is 3461.147
#> Condition number of XX' after logical pooling 3461.147
#>
#> Large value of kappa (>1000) indicate that rows are approximately proportional which is not good
#>
#> Conditional Log-Likelihood: 47415.98 ; np: 40 ; AICc: -94751.97
#>
#> Code/Message from optimization is: 0 relative convergence (4)
#>
#> Estimates
#> pool1 pool2 pool3 pool4 pool5 pool6 psi cap.prob exp factor
#> pool1 3.6 2.7 0.8 0.8 0.0 0.1 130 0.005 195.8
#> pool2 12.0 7.0 14.0 1.0 3.0 0.0 330 1.000 0.0
#> pool3 7.0 11.0 41.0 9.0 1.0 1.0 790 1.000 0.0
#> pool4 1.0 13.8 38.0 11.3 10.6 1.3 667 0.022 44.1
#> pool6 0.0 1.1 7.6 7.6 3.5 1.3 374 0.025 39.6
#> est unmarked 744.0 1185.0 2139.0 952.0 606.0 126.0 0 NA NA
#> Pop Est
#> pool1 27162
#> pool2 367
#> pool3 860
#> pool4 33542
#> pool6 16035
#> est unmarked 77966
#>
#> SE of above estimates
#> pool1 pool2 pool3 pool4 pool5 pool6 psi cap.prob exp factor
#> pool1 1.5 1.3 0.8 0.8 0.0 0.5 11.4 0.002 85.5
#> pool2 3.5 2.6 3.7 1.0 1.7 0.0 18.2 0.000 0.0
#> pool3 2.6 3.3 6.4 3.0 1.0 1.0 28.1 0.000 0.0
#> pool4 1.0 3.7 6.2 3.2 2.5 1.3 25.8 0.009 17.6
#> pool6 0.0 1.1 2.6 3.0 2.0 1.3 19.3 0.035 57.1
#> est unmarked NA NA NA NA NA NA 0.0 NA NA
#> Pop Est
#> pool1 11794
#> pool2 0
#> pool3 0
#> pool4 13070
#> pool6 22539
#> est unmarked 15462
#>
#>
#> Chisquare gof cutoff : 0.1
#> Chisquare gof value : 1.131646
#> Chisquare gof df : 1
#> Chisquare gof p : 0.2874245
The automated pooling combined the last two rows, but the fit is not entire satisfactory because of the high condition number on X’X.
Some further bespoke pooling is likely necessary.
Schwarz, C. J., & Taylor, C. G. (1998). The use of the stratified-Petersen estimator in fisheries management with an illustration of estimating the number of pink salmon (Oncorhynchus gorbuscha) that return to spawn in the Fraser River. Canadian Journal of Fisheries and Aquatic Sciences, 55, 281–296. https://doi.org/10.1139/f97-238