lm_tidiers {broom}    R Documentation

Description:

These methods tidy the coefficients of a linear model into a summary, augment the original data with information on the fitted values and residuals, and construct a one-row glance of the model's statistics.
Usage:

## S3 method for class 'lm'
tidy(x, conf.int = FALSE, conf.level = 0.95, exponentiate = FALSE,
  quick = FALSE, ...)

## S3 method for class 'summary.lm'
tidy(x, ...)

## S3 method for class 'lm'
augment(x, data = stats::model.frame(x), newdata, type.predict,
  type.residuals, ...)

## S3 method for class 'lm'
glance(x, ...)

## S3 method for class 'summary.lm'
glance(x, ...)
Arguments:

x               an "lm" object
conf.int        whether to include a confidence interval
conf.level      confidence level of the interval, used only if conf.int = TRUE
exponentiate    whether to exponentiate the coefficient estimates and
                confidence intervals (typical for logistic regression)
quick           whether to compute a smaller and faster version, containing
                only the term and estimate columns
...             extra arguments (not used)
data            original data; defaults to extracting it from the model
newdata         if provided, performs predictions on the new data
type.predict    type of prediction to compute for a GLM; passed on to
                stats::predict.glm
type.residuals  type of residuals to compute for a GLM; passed on to
                stats::residuals.glm
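For instance, a minimal sketch of the quick argument, using the built-in mtcars data (the comments describe the documented column sets, not verbatim output):

library(broom)
fit <- lm(mpg ~ wt + qsec, data = mtcars)
tidy(fit)                # term, estimate, std.error, statistic, p.value
tidy(fit, quick = TRUE)  # term and estimate only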
Details:

If you have missing values in your model data, you may need to refit the model with na.action = na.exclude.

If conf.int = TRUE, the confidence interval is computed with the confint function.

While tidy is supported for "mlm" objects, augment and glance are not.

When the modeling was performed with na.action = "na.omit" (as is the typical default), rows with NA in the initial data are omitted entirely from the augmented data frame. When the modeling was performed with na.action = "na.exclude", one should provide the original data as a second argument, at which point the augmented data will contain those rows (typically with NAs in place of the new columns). If the original data is not provided to augment and na.action = "na.exclude", a warning is raised and the incomplete rows are dropped. A small sketch of this behaviour follows below.
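A minimal sketch of the na.action behaviour, using a copy of mtcars with a couple of artificially introduced NAs (the rows chosen are arbitrary; the comments restate the documented behaviour rather than verbatim output):

library(broom)
mt <- mtcars
mt$mpg[c(3, 7)] <- NA                                   # introduce missing responses

fit_omit    <- lm(mpg ~ wt + qsec, data = mt)           # default: na.action = "na.omit"
fit_exclude <- lm(mpg ~ wt + qsec, data = mt, na.action = na.exclude)

nrow(augment(fit_omit))                 # incomplete rows omitted entirely
nrow(augment(fit_exclude, data = mt))   # all rows kept; NAs fill the new columns for incomplete rows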
Code and documentation for augment.lm originated in the ggplot2 package, where it was called fortify.lm.
Value:

All tidying methods return a data.frame without rownames. The structure depends on the method chosen.

tidy.lm returns one row for each coefficient, with five columns:

term        The term in the linear model being estimated and tested
estimate    The estimated coefficient
std.error   The standard error from the linear model
statistic   t-statistic
p.value     two-sided p-value
If the linear model is an "mlm" object (multiple linear model), there is an additional column:

response    Which response column the coefficients correspond to (typically Y1, Y2, etc.)

If conf.int = TRUE, it also includes columns conf.low and conf.high, computed with confint.
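For example, a minimal sketch of the extra interval columns (an ordinary linear model, so exponentiate is left at its default):

library(broom)
fit <- lm(mpg ~ wt + qsec, data = mtcars)
tidy(fit, conf.int = TRUE, conf.level = 0.95)
# the result gains conf.low and conf.high alongside the five columns above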
When newdata is not supplied, augment.lm returns one row for each observation, with seven columns added to the original data:

.hat        Diagonal of the hat matrix
.sigma      Estimate of residual standard deviation when the corresponding observation is dropped from the model
.cooksd     Cook's distance
.fitted     Fitted values of model
.se.fit     Standard errors of fitted values
.resid      Residuals
.std.resid  Standardised residuals

(Some unusual "lm" objects, such as "rlm" from MASS, may omit .cooksd and .std.resid. "gam" from mgcv omits .sigma.)
When newdata is supplied, augment.lm returns one row for each observation, with three columns added to the new data:

.fitted     Fitted values of model
.se.fit     Standard errors of fitted values
.resid      Residuals of fitted values on the new data
glance.lm returns a one-row data.frame with the following columns:

r.squared      The proportion of variance explained by the model
adj.r.squared  r.squared adjusted based on the degrees of freedom
sigma          The square root of the estimated residual variance
statistic      F-statistic
p.value        p-value from the F test, describing whether the full regression is significant
df             Degrees of freedom used by the coefficients
logLik         The data's log-likelihood under the model
AIC            The Akaike Information Criterion
BIC            The Bayesian Information Criterion
deviance       Deviance
df.residual    Residual degrees of freedom
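A minimal sketch of pulling individual fit statistics out of the one-row glance (column names as documented above):

library(broom)
fit <- lm(mpg ~ wt + qsec, data = mtcars)
g   <- glance(fit)
g$adj.r.squared   # adjusted R-squared as a plain number
g$p.value         # p-value of the overall F test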
Examples:

library(broom)
library(ggplot2)
library(dplyr)

mod <- lm(mpg ~ wt + qsec, data = mtcars)
tidy(mod)
glance(mod)

# coefficient plot
d <- tidy(mod) %>%
  mutate(low = estimate - std.error, high = estimate + std.error)
ggplot(d, aes(estimate, term, xmin = low, xmax = high, height = 0)) +
  geom_point() +
  geom_vline(xintercept = 0) +
  geom_errorbarh()

head(augment(mod))
head(augment(mod, mtcars))

# predict on new data
newdata <- mtcars %>%
  head(6) %>%
  mutate(wt = wt + 1)
augment(mod, newdata = newdata)

au <- augment(mod, data = mtcars)

# residuals vs fitted (compare plot(mod, which = 1))
plot(mod, which = 1)
qplot(.fitted, .resid, data = au) +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE)
qplot(.fitted, .std.resid, data = au) +
  geom_hline(yintercept = 0) +
  geom_smooth(se = FALSE)
qplot(.fitted, .std.resid, data = au, colour = factor(cyl))
qplot(mpg, .std.resid, data = au, colour = factor(cyl))

# normal Q-Q plot of standardised residuals (compare plot(mod, which = 2))
plot(mod, which = 2)
qplot(sample = .std.resid, data = au, stat = "qq") + geom_abline()

# scale-location plot (compare plot(mod, which = 3))
plot(mod, which = 3)
qplot(.fitted, sqrt(abs(.std.resid)), data = au) + geom_smooth(se = FALSE)

# Cook's distance (compare plot(mod, which = 4))
plot(mod, which = 4)
qplot(seq_along(.cooksd), .cooksd, data = au)

# residuals vs leverage (compare plot(mod, which = 5))
plot(mod, which = 5)
qplot(.hat, .std.resid, data = au) + geom_smooth(se = FALSE)
ggplot(au, aes(.hat, .std.resid)) +
  geom_vline(size = 2, colour = "white", xintercept = 0) +
  geom_hline(size = 2, colour = "white", yintercept = 0) +
  geom_point() +
  geom_smooth(se = FALSE)
qplot(.hat, .std.resid, data = au, size = .cooksd) +
  geom_smooth(se = FALSE, size = 0.5)

# Cook's distance vs leverage (compare plot(mod, which = 6))
plot(mod, which = 6)
ggplot(au, aes(.hat, .cooksd)) +
  geom_vline(xintercept = 0, colour = NA) +
  geom_abline(slope = seq(0, 3, by = 0.5), colour = "white") +
  geom_smooth(se = FALSE) +
  geom_point()
qplot(.hat, .cooksd, size = .cooksd / .hat, data = au) + scale_size_area()

# column-wise models ("mlm" objects)
a <- matrix(rnorm(20), nrow = 10)
b <- a + rnorm(length(a))
result <- lm(b ~ a)
tidy(result)