thresholder {caret} | R Documentation |
This function uses the resampling results from a train
object to generate performance statistics over a set of probability
thresholds for two-class problems.
thresholder(x, threshold, final = TRUE)
x |
A train object. |
threshold |
A numeric vector of candidate probability thresholds in the interval [0, 1]. If the class probability corresponding to the first level of the outcome is greater than the threshold, the data point is classified as that level. |
final |
A logical: should only the final tuning parameters chosen by train be used? |
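The classification rule described for the threshold argument can be illustrated with a minimal base R sketch. The probabilities, the level names and the helper classify_at are hypothetical and are not part of caret; the strict "greater than" comparison follows the wording above, and how ties are handled internally is an assumption.

```r
# Hypothetical class probabilities for the first outcome level ("Class1").
prob_class1 <- c(0.95, 0.70, 0.55, 0.30, 0.10)
lev <- c("Class1", "Class2")

# Illustrative helper (not a caret function): assign the first level only
# when its probability is strictly greater than the threshold.
classify_at <- function(prob, threshold, levels) {
  factor(ifelse(prob > threshold, levels[1], levels[2]), levels = levels)
}

pred_50 <- classify_at(prob_class1, 0.50, lev)
pred_80 <- classify_at(prob_class1, 0.80, lev)

# Raising the threshold moves borderline points to the second level.
table(pred_50)
table(pred_80)
```

Higher thresholds make the first level harder to predict, which typically trades sensitivity for specificity when the first level is treated as the event of interest.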
A data frame with columns for each of the tuning parameters from the model along with an additional column called prob_threshold for the probability threshold. There are also columns for summary statistics averaged over resamples with column names Sensitivity, Specificity, J, and Dist. The last two correspond to Youden's J statistic and the distance to the best possible cutoff (i.e. perfect sensitivity and specificity).
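As a rough sketch of what the J and Dist columns represent, the following base R code computes Youden's J and the Euclidean distance to the ideal point (sensitivity = 1, specificity = 1) from made-up sensitivity and specificity values. The numbers are hypothetical, and the Dist formula is an assumption based on the description above rather than a copy of the package's internal code.

```r
# Hypothetical resample-averaged statistics at three thresholds.
stats <- data.frame(
  prob_threshold = c(0.50, 0.70, 0.90),
  Sensitivity    = c(0.90, 0.80, 0.60),
  Specificity    = c(0.60, 0.80, 0.95)
)

# Youden's J statistic: sensitivity + specificity - 1.
stats$J <- stats$Sensitivity + stats$Specificity - 1

# Distance to the best possible cutoff, assumed here to be the Euclidean
# distance to the point (Sensitivity = 1, Specificity = 1).
stats$Dist <- sqrt((1 - stats$Sensitivity)^2 + (1 - stats$Specificity)^2)

# A candidate threshold maximises J or, alternatively, minimises Dist.
best_J    <- stats$prob_threshold[which.max(stats$J)]
best_Dist <- stats$prob_threshold[which.min(stats$Dist)]
stats
```

The two criteria often agree but need not; J weighs sensitivity and specificity linearly, while Dist penalises a large shortfall in either one more heavily.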
## Not run:
set.seed(2444)
# Simulate an imbalanced two-class data set
dat <- twoClassSim(500, intercept = -10)
table(dat$Class)

# Class probabilities and saved predictions are kept for thresholding
ctrl <- trainControl(method = "cv",
                     classProbs = TRUE,
                     savePredictions = "all",
                     summaryFunction = twoClassSummary)

set.seed(2863)
mod <- train(Class ~ ., data = dat,
             method = "rda",
             tuneLength = 4,
             metric = "ROC",
             trControl = ctrl)

# Resampled statistics over a grid of probability thresholds
resample_stats <- thresholder(mod,
                              threshold = seq(.5, 1, by = 0.05),
                              final = TRUE)

ggplot(resample_stats, aes(x = prob_threshold, y = J)) +
  geom_point()

ggplot(resample_stats, aes(x = prob_threshold, y = Dist)) +
  geom_point()

# Sensitivity in black, specificity in red
ggplot(resample_stats, aes(x = prob_threshold, y = Sensitivity)) +
  geom_point() +
  geom_point(aes(y = Specificity), col = "red")
## End(Not run)