This vignette describes how time series can be derived from a topic model using the documents’ dates and, optionally, the documents’ sentiment. Please refer to the “Basic usage” vignette for an introduction to topic model estimation.

Internal dates and sentiment

The example dataset shipped with the package already contains two docvars: .date and .sentiment. Because they carry these exact names, sentopics treats them as internal dates and internal sentiment when a topic model is created. These values can be accessed or modified through the helper functions sentopics_date() and sentopics_sentiment().

library("xts")
library("data.table")
library("sentopics")
data("ECB_press_conferences_tokens")
head(docvars(ECB_press_conferences_tokens))
#        .date doc_id                                        title
# 1 1998-06-09      1 ECB Press conference: Introductory statement
# 2 1998-06-09      1 ECB Press conference: Introductory statement
# 3 1998-06-09      1 ECB Press conference: Introductory statement
# 4 1998-06-09      1 ECB Press conference: Introductory statement
# 5 1998-06-09      1 ECB Press conference: Introductory statement
# 6 1998-06-09      1 ECB Press conference: Introductory statement
#                                                               section_title
# 1 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 2 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 3 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 4 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 5 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 6 Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
#    .sentiment
# 1 -0.01470588
# 2 -0.02500000
# 3  0.00000000
# 4  0.00000000
# 5  0.00000000
# 6  0.00000000
set.seed(123)
lda <- LDA(ECB_press_conferences_tokens, K = 9, alpha = 1, beta = 0.001)
head(sentopics_date(lda))
#       .id      .date
#    <char>     <Date>
# 1:    1_1 1998-06-09
# 2:    1_2 1998-06-09
# 3:    1_3 1998-06-09
# 4:    1_4 1998-06-09
# 5:    1_5 1998-06-09
# 6:    1_6 1998-06-09
head(sentopics_sentiment(lda))
#       .id  .sentiment
#    <char>       <num>
# 1:    1_1 -0.01470588
# 2:    1_2 -0.02500000
# 3:    1_3  0.00000000
# 4:    1_4  0.00000000
# 5:    1_5  0.00000000
# 6:    1_6  0.00000000

For this example, the documents’ sentiment was computed using the sentometrics package. For further details on this computation, please refer to the script used in /data-raw/ on GitHub.

Now that the lda object contains dates and sentiment, there is already enough information to compute a sentiment index using sentiment_series(), which aggregates documents by period. By default, it returns an xts object.

xts_sent <- sentiment_series(lda, period = "month", rolling_window = 6)
plot(xts_sent)
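
To make the idea of period aggregation with a rolling window concrete, here is a minimal base R sketch on made-up data. It is not the package's implementation: the toy values, the plain monthly mean, and the trailing (backward-looking) window are all assumptions chosen for illustration.

```r
# Toy data: five documents with a date and a sentiment score (made-up values).
docs <- data.frame(
  date = as.Date(c("2020-01-10", "2020-01-25", "2020-02-05",
                   "2020-03-15", "2020-03-20")),
  sentiment = c(0.2, 0.4, -0.1, 0.3, 0.5)
)

# Step 1: average the sentiment of all documents within each calendar month.
docs$month <- format(docs$date, "%Y-%m")
monthly <- aggregate(sentiment ~ month, data = docs, FUN = mean)

# Step 2: smooth with a trailing rolling mean over `window` periods
# (assumption: the window only looks backward in time).
window <- 2
rolled <- stats::filter(monthly$sentiment, rep(1 / window, window), sides = 1)
monthly$rolling <- as.numeric(rolled)

monthly
# sentiment: 0.3, -0.1, 0.4
# rolling:   NA,  0.10, 0.15
```

With a trailing window, the first `window - 1` periods have no complete window and come out as NA, which would explain why the later examples in this vignette wrap their output in na.omit().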

Estimating the topic model makes it possible to enrich this sentiment series with topical content. The model should be estimated until it returns satisfactory topics. Labeling the topics makes the subsequent analysis easier to read.

lda <- fit(lda, 1000)
sentopics_labels(lda) <- list(
  topic = c(
    "Economic growth & Inflation", "Banking", "Payment services",
    "European single market", "Monetary policy & Negative rate",
    "Monetary policy & Price stability", "Others", "Banking supervision",
    "Financial markets"
  )
)
plot(lda)

The estimated topic model adds a layer of topical proportions to the existing documents, which is clearly visible when calling melt() on the model. Combining the topic and sentiment information at the document level, we can compute the share of each document’s sentiment that belongs to a given topic.

document_datas <- sentopics::melt(lda, include_docvars = TRUE)
head(document_datas)
#                          topic       prob      .date    .id doc_id
#                         <fctr>      <num>     <Date> <char> <char>
# 1: Economic growth & Inflation 0.07692308 1998-06-09    1_1      1
# 2: Economic growth & Inflation 0.03125000 1998-06-09    1_2      1
# 3: Economic growth & Inflation 0.06666667 1998-06-09    1_3      1
# 4: Economic growth & Inflation 0.10526316 1998-06-09    1_4      1
# 5: Economic growth & Inflation 0.05000000 1998-06-09    1_5      1
# 6: Economic growth & Inflation 0.04347826 1998-06-09    1_6      1
#                                           title
#                                          <char>
# 1: ECB Press conference: Introductory statement
# 2: ECB Press conference: Introductory statement
# 3: ECB Press conference: Introductory statement
# 4: ECB Press conference: Introductory statement
# 5: ECB Press conference: Introductory statement
# 6: ECB Press conference: Introductory statement
#                                                                section_title
#                                                                       <char>
# 1: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 2: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 3: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 4: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 5: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
# 6: Willem F. Duisenberg, President of the European Central Bank, 9 June 1998
#     .sentiment .sentiment_scaled
#          <num>             <num>
# 1: -0.01470588        -2.8778373
# 2: -0.02500000        -4.6258752
# 3:  0.00000000        -0.3806404
# 4:  0.00000000        -0.3806404
# 5:  0.00000000        -0.3806404
# 6:  0.00000000        -0.3806404
head(document_datas[, list(.date, topic, share_of_sentiment = prob * .sentiment), keyby = ".id"])
# Key: <.id>
#       .id      .date                             topic share_of_sentiment
#    <char>     <Date>                            <fctr>              <num>
# 1:  100_1 2006-06-08       Economic growth & Inflation        0.002545455
# 2:  100_1 2006-06-08                           Banking        0.005090909
# 3:  100_1 2006-06-08                  Payment services        0.002545455
# 4:  100_1 2006-06-08            European single market        0.002545455
# 5:  100_1 2006-06-08   Monetary policy & Negative rate        0.033090909
# 6:  100_1 2006-06-08 Monetary policy & Price stability        0.002545455
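
The share computed above is simply the topic proportion multiplied by the document's sentiment. Because the topic proportions of a document sum to one, the shares of a document add back up to its sentiment. The following base R sketch checks this on a hypothetical two-document, three-topic table; all names and values are made up for illustration.

```r
# Toy long-format table: two documents, three topics (made-up values).
# Within each document, the topic proportions sum to one.
toy <- data.frame(
  id = rep(c("d1", "d2"), each = 3),
  topic = rep(c("A", "B", "C"), times = 2),
  prob = c(0.5, 0.3, 0.2, 0.1, 0.1, 0.8),
  sentiment = rep(c(-0.4, 0.2), each = 3)
)

# Share of each document's sentiment attributed to each topic.
toy$share_of_sentiment <- toy$prob * toy$sentiment

# Summing the shares within a document recovers the document's sentiment.
recovered <- tapply(toy$share_of_sentiment, toy$id, sum)
recovered
# d1: -0.4, d2: 0.2
```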

Using this share of sentiment and the documents’ dates, one may compute two additional outputs: a breakdown of the sentiment time series and a time series of the sentiment expressed by each topic. The two outputs differ in how documents are aggregated. The breakdown averages the documents’ shares of sentiment with equal weights, whereas the sentiment expressed by a topic weights each document by its attention to that topic. These two aggregations are implemented by the sentiment_breakdown() and sentiment_topics() functions.

head(na.omit(sentiment_breakdown(lda, period = "month", rolling_window = 6)))
#             sentiment Economic growth & Inflation     Banking Payment services
# 1998-11-01 -1.0298569                  -0.1151266 -0.07707704      -0.12772970
# 1998-12-01 -1.0735310                  -0.1210672 -0.11188915      -0.13411588
# 1999-01-01 -0.9010599                  -0.1172294 -0.09564143      -0.08154807
# 1999-02-01 -1.1255928                  -0.1479729 -0.11047193      -0.09931550
# 1999-03-01 -1.2070330                  -0.1985380 -0.12178704      -0.08551730
# 1999-04-01 -1.4144708                  -0.2335188 -0.14502420      -0.10177635
#            European single market Monetary policy & Negative rate
# 1998-11-01            -0.07785681                      -0.2273722
# 1998-12-01            -0.08116835                      -0.1873457
# 1999-01-01            -0.07046946                      -0.1351671
# 1999-02-01            -0.09471892                      -0.1828060
# 1999-03-01            -0.11260427                      -0.1895354
# 1999-04-01            -0.11947086                      -0.2441200
#            Monetary policy & Price stability      Others Banking supervision
# 1998-11-01                       -0.06613374 -0.06733011          -0.1180358
# 1998-12-01                       -0.06897499 -0.06746480          -0.1268454
# 1999-01-01                       -0.04998243 -0.06777561          -0.1113562
# 1999-02-01                       -0.06554454 -0.08750756          -0.1362076
# 1999-03-01                       -0.06821770 -0.09840632          -0.1276992
# 1999-04-01                       -0.09015224 -0.11476663          -0.1471136
#            Financial markets
# 1998-11-01        -0.1531949
# 1998-12-01        -0.1746596
# 1999-01-01        -0.1718902
# 1999-02-01        -0.2010478
# 1999-03-01        -0.2047278
# 1999-04-01        -0.2185282
head(na.omit(sentiment_topics(lda, period = "month", rolling_window = 6)))
#            Economic growth & Inflation   Banking Payment services
# 1998-11-01                   -1.456447 -1.177686       -1.4257697
# 1998-12-01                   -1.552844 -1.441269       -1.5192832
# 1999-01-01                   -1.469062 -1.204640       -0.9547131
# 1999-02-01                   -1.768688 -1.208604       -1.2344803
# 1999-03-01                   -2.100009 -1.098580       -1.2065969
# 1999-04-01                   -2.405278 -1.266237       -1.4085298
#            European single market Monetary policy & Negative rate
# 1998-11-01              -1.149206                      -0.6988520
# 1998-12-01              -1.156467                      -0.6212429
# 1999-01-01              -1.055477                      -0.4440184
# 1999-02-01              -1.259723                      -0.7466193
# 1999-03-01              -1.303717                      -0.9473358
# 1999-04-01              -1.364087                      -1.2799911
#            Monetary policy & Price stability     Others Banking supervision
# 1998-11-01                        -0.8472344 -0.6396175           -1.632095
# 1998-12-01                        -0.8272929 -0.6404772           -1.796555
# 1999-01-01                        -0.6291579 -0.6114367           -1.571457
# 1999-02-01                        -0.7423391 -0.8948703           -1.734508
# 1999-03-01                        -0.7329459 -1.0225091           -1.388107
# 1999-04-01                        -0.9294124 -1.2413070           -1.598685
#            Financial markets
# 1998-11-01         -1.554975
# 1998-12-01         -1.873536
# 1999-01-01         -1.870179
# 1999-02-01         -2.071598
# 1999-03-01         -1.995909
# 1999-04-01         -2.080132
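
The difference between the two aggregations can be sketched for a single topic and a single period. The example below uses made-up values and plain averages; it illustrates the weighting logic described above and is not the package's internal code (in particular, it ignores any scaling or rolling window).

```r
# Toy period: three documents, their attention to one topic, and their
# sentiment (made-up values).
prob <- c(0.8, 0.1, 0.1)        # topic proportion in each document
sentiment <- c(-1.0, 0.5, 0.5)  # document-level sentiment

# Breakdown-style aggregation: equal-weight average of the shares
# prob * sentiment across documents.
breakdown_contribution <- mean(prob * sentiment)

# Topic-sentiment-style aggregation: average of the documents' sentiment,
# weighted by their attention to the topic.
topic_sentiment <- sum(prob * sentiment) / sum(prob)

# Same value using the base R helper.
topic_sentiment_check <- weighted.mean(sentiment, w = prob)

c(breakdown = breakdown_contribution, topic = topic_sentiment)
```

In this toy case the topic is dominated by the one negative document, so its attention-weighted sentiment (-0.7) is much more negative than its equal-weight contribution to the overall series (about -0.23). Consistent with the equal-weight logic, the topic columns of the sentiment_breakdown() output shown earlier add up to the sentiment column; for 1998-11-01 they sum to roughly -1.0299.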

Both functions also have plotting counterparts, directly accessible with the plot_ prefix.

plot_sentiment_breakdown(lda, period = "month", rolling_window = 6)

plot_sentiment_topics(lda, period = "month", rolling_window = 6)
