---
title: "WDL and WIG Model Specs"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{WDL and WIG Model Specs}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(rwig) |> suppressPackageStartupMessages()
```

In this vignette, I will show how to set up the control parameters (hyper-parameters) needed for the WDL and WIG models.

`wdl_specs()` gives a list of lists with five parts: `wdl_control`, `tokenizer_control`, `word2vec_control`, `barycenter_control`, and `optimizer_control`. `wig_specs()` is the same as `wdl_specs()`, with an additional `wig_control` list.

## `wig_control`

These options are needed only for `wig_specs()`. The defaults are:

```{r, eval=FALSE}
wig_control = list(
  group_unit = "month",
  svd_method = "docs",
  standardize = TRUE
)
```

1. `group_unit` sets the time unit by which documents are grouped; it is passed to `lubridate::floor_date()` as the `unit` argument. The default, "month", yields a monthly time-series index; other values follow the `unit` argument of `lubridate::floor_date()`.
2. `svd_method` can be either "docs" or "topics". With "docs", truncated SVD (TSVD) is applied to the reconstructed documents to get the index directly; with "topics", TSVD is applied to the topics matrix before the index is constructed. The latter is the method originally proposed in Xie (2020).
3. `standardize`: bool, whether to standardize the resulting index to mean 100 and standard deviation 1. Defaults to `TRUE`, following Baker et al. (2016) and Xie (2020).

## `wdl_control`

These options are supplied to the WDL model and are used by both `wdl_specs()` and `wig_specs()`.

1. `num_topics`: number of topics for the topic model
2. `batch_size`: batch size used during training
3. `epochs`: number of epochs (i.e.
number of passes) over the training data
4. `shuffle`: bool, whether to shuffle the input data randomly
5. `verbose`: bool, whether to print diagnostic information

## `tokenizer_control`

Arguments for `tokenizers::tokenize_word_stems()`.

## `word2vec_control`

Arguments for `word2vec::word2vec()`, but with the following default parameters:

```{r, eval=FALSE}
type = "cbow"
dim = 10
min_count = 1
```

## `barycenter_control`

Identical to `barycenter_control` in the `barycenter()` function, but with the default

```{r, eval=FALSE}
with_grad = TRUE
```

## `optimizer_control`

Parameters to control the optimizer (SGD, Adam, AdamW).

```{r, eval=FALSE}
optimizer_control = list(
  optimizer = "adamw",
  lr = .005,
  decay = .01,
  beta1 = .9,
  beta2 = .999,
  eps = 1e-8
)
```

The default optimizer is AdamW ("adamw"), but you can also choose vanilla SGD ("sgd") or vanilla Adam ("adam"). You can also tune the learning rate `lr` as part of your hyper-parameter search. Most users should leave the other defaults untouched unless they know exactly what they are doing. For reference, see Section 7.1 of Xie (2025) and the references therein.

## See Also

See `vignette("wdl-model")` and `vignette("wig-model")`.

## References

Baker, S. R., Bloom, N., & Davis, S. J. (2016). Measuring economic policy uncertainty. *The Quarterly Journal of Economics*, 131(4), 1593–1636. https://doi.org/10.1093/qje/qjw024

Peyré, G., & Cuturi, M. (2019). Computational Optimal Transport: With Applications to Data Science. *Foundations and Trends® in Machine Learning*, 11(5–6), 355–607. https://doi.org/10.1561/2200000073

Schmitz, M. A., Heitz, M., Bonneel, N., Ngolè, F., Coeurjolly, D., Cuturi, M., Peyré, G., & Starck, J.-L. (2018). Wasserstein dictionary learning: Optimal transport-based unsupervised nonlinear dictionary learning. *SIAM Journal on Imaging Sciences*, 11(1), 643–678. https://doi.org/10.1137/17M1140431

Xie, F. (2020).
Wasserstein index generation model: Automatic generation of time-series index with application to economic policy uncertainty. *Economics Letters*, 186, 108874. https://doi.org/10.1016/j.econlet.2019.108874

Xie, F. (2025). Deriving the Gradients of Some Popular Optimal Transport Algorithms (No. arXiv:2504.08722). *arXiv*. https://doi.org/10.48550/arXiv.2504.08722
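
## Example: overriding defaults in a list of lists

Because the specs are described above as a list of lists, changing one hyper-parameter means changing one entry of one inner list. The sketch below illustrates that idea with base R only. Note the assumptions: `default_specs` is a hand-built stand-in that copies the `wig_control` and `optimizer_control` defaults quoted in this vignette (it is not the output of `wig_specs()`), and `overrides` and `my_specs` are hypothetical names. How `wig_specs()` itself accepts overrides is not shown here; check its help page.

```r
# Hand-built stand-in for two of the control lists, using the defaults
# quoted in this vignette. This is NOT the output of wig_specs().
default_specs <- list(
  wig_control = list(
    group_unit = "month",
    svd_method = "docs",
    standardize = TRUE
  ),
  optimizer_control = list(
    optimizer = "adamw",
    lr = .005,
    decay = .01,
    beta1 = .9,
    beta2 = .999,
    eps = 1e-8
  )
)

# Hypothetical overrides: list only the entries you want to change.
overrides <- list(
  wig_control = list(svd_method = "topics"),
  optimizer_control = list(lr = .001)
)

# modifyList() merges nested lists recursively, so every default that is
# not mentioned in `overrides` is kept as-is.
my_specs <- utils::modifyList(default_specs, overrides)

# Inspect the merged structure.
str(my_specs)
```

Listing only the changed entries keeps a hyper-parameter search (for example, over `lr`) short, since each candidate setting is a small override applied to the same defaults.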