stm_tidiers {tidytext} | R Documentation |
Tidy topic models fit by the stm package. The arguments and return values are similar to lda_tidiers.
## S3 method for class 'STM'
tidy(x, matrix = c("beta", "gamma", "theta"), log = FALSE, document_names = NULL, ...)
## S3 method for class 'STM'
augment(x, data, ...)
## S3 method for class 'STM'
glance(x, ...)
x | An STM fitted model object, created by |
matrix | Whether to tidy the beta (per-term-per-topic, default) or the gamma/theta (per-document-per-topic) matrix. The stm package calls the per-document-per-topic matrix theta, while other topic modeling packages call it gamma. |
log | Whether beta/gamma/theta should be on a log scale; default FALSE. |
document_names | Optional vector of document names for use with per-document-per-topic tidying. |
... | Extra arguments, not used. |
data | For |
tidy returns a tidied version of either the beta or gamma matrix.
augment must be provided a data argument, either a dfm or a table containing one row per original document-term pair, such as is returned by tdm_tidiers, containing columns document and term. It returns that same data as a table with an additional column .topic with the topic assignment for that document-term combination.
glance always returns a one-row table, with columns
Number of topics in the model
Number of documents in the model
Number of terms in the model
Number of iterations used
If an LDA model, the parameter of the Dirichlet distribution for topics over documents
If matrix == "beta" (default), returns a table with one row per topic and term, with columns
Topic, as an integer
Term
Probability of a term generated from a topic according to the structural topic model
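As a rough illustration of the layout described above, the following base R sketch reshapes a small made-up topic-by-term probability matrix into one row per topic and term. The matrix values and the three-word vocabulary are invented for illustration, and the reshaping code is only a stand-in, not the code that tidy itself runs. The column names topic, term, and beta follow the ones used in the example code for this page.

```r
# Toy stand-in for a topic-by-term probability matrix: 2 topics x 3 terms.
# Values and vocabulary are hypothetical; each row sums to 1.
beta <- matrix(
  c(0.5, 0.3, 0.2,
    0.1, 0.1, 0.8),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("america", "people", "government"))
)

# Reshape to long format: one row per topic-term pair.
# as.vector() walks the matrix column by column, so the topic index
# cycles fastest and each term is repeated once per topic.
long_beta <- data.frame(
  topic = rep(seq_len(nrow(beta)), times = ncol(beta)),
  term  = rep(colnames(beta), each = nrow(beta)),
  beta  = as.vector(beta),
  stringsAsFactors = FALSE
)

# Setting log = TRUE would correspond to reporting log(beta) instead.
long_beta$log_beta <- log(long_beta$beta)

long_beta
```

A long table like this is what makes the grouped operations in the example code (group_by(topic), top_n(10, beta)) straightforward.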
If matrix == "gamma", returns a table with one row per topic and document, with columns
Topic, as an integer
Document name (if given in the document_names vector) or ID as an integer
Probability of topic given document
## Not run:
if (requireNamespace("stm", quietly = TRUE) && requireNamespace("quanteda", quietly = TRUE)) {
  library(dplyr)
  library(ggplot2)
  library(stm)
  library(quanteda)

  inaug <- dfm(data_corpus_inaugural, remove = stopwords("english"), remove_punct = TRUE)
  topic_model <- stm(inaug, K = 3, verbose = FALSE, init.type = "Spectral")

  # tidy the word-topic combinations
  td_beta <- tidy(topic_model)
  td_beta

  # Examine the three topics
  td_beta %>%
    group_by(topic) %>%
    top_n(10, beta) %>%
    ungroup() %>%
    ggplot(aes(term, beta)) +
    geom_col() +
    facet_wrap(~ topic, scales = "free") +
    coord_flip()

  # tidy the document-topic combinations, with optional document names
  td_gamma <- tidy(topic_model, matrix = "gamma", document_names = rownames(inaug))
  td_gamma

  # find the assignments of each word in each document
  assignments <- augment(topic_model, inaug)
  assignments
}
## End(Not run)