
<!-- README.md is generated from README.Rmd. Please edit that file -->

<!-- badges: start -->

[![CRAN
Status](https://www.r-pkg.org/badges/version/topics)](https://CRAN.R-project.org/package=topics)
[![DOI](https://zenodo.org/badge/785738351.svg)](https://zenodo.org/doi/10.5281/zenodo.11165377)
[![Github build
status](https://github.com/theharmonylab/topics/workflows/R-CMD-check/badge.svg)](https://github.com/theharmonylab/topics/actions)
[![Project Status:
Active](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active)
[![Lifecycle:
experimental](https://img.shields.io/badge/lifecycle-experimental-blue.svg)](https://lifecycle.r-lib.org/articles/stages.html#experimental)
[![codecov](https://codecov.io/gh/theharmonylab/topics/graph/badge.svg?token=7ZTWBNIVCX)](https://app.codecov.io/gh/theharmonylab/topics/)
[![CRAN
Downloads](https://cranlogs.r-pkg.org/badges/grand-total/topics)](https://CRAN.R-project.org/package=topics)
<!-- badges: end -->

# topics <a href="https://r-topics.org"><img src="man/figures/logo.png" align="right" height="138" alt="topics website" /></a>

## Overview

An R package for analyzing natural language that implements
Differential Language Analysis using words, phrases and topics. <br>
<br> Check out our tutorial paper: [Multiple Methods for Visualizing
Human Language: A Tutorial for Social and Behavioural
Scientists](https://osf.io/preprints/psyarxiv/nxfvr_v1/). If you use the
`topics` package, please cite this tutorial in your work. <br> <br> The
`topics` package is part of the *R Language Analysis Suite*, which
comprises `talk`, `text` and `topics`.

- [`talk`](https://www.r-talk.org/) transforms voice recordings into
  text, audio features, or embeddings.<br> <br>
- [`text`](https://www.r-text.org/) provides many language tasks such as
  converting digital text into word embeddings.<br> <br> `talk` and
  `text` offer access to Large Language Models from Hugging Face.<br>
  <br>
- [`topics`](https://www.r-topics.org/) visualizes language patterns as
  words, phrases or topics to generate psychological insights. <br>
  The `topics` package supports the `text` package in analysing and
  visualizing topics from BERTopic.<br> <br>

<img src="man/figures/talk_text_topics.svg" style="width:50.0%" />

<br> When using the `topics` package, please cite:

Ackermann L., Zhuojun G. & Kjell O.N.E. (2024). An R-package for
visualizing text in topics. <https://github.com/theharmonylab/topics>.
`DOI:zenodo.org/records/11165378`.

## Differential Language Analysis with Topics and N-grams

The `topics` pipeline is designed for a seamless transition from raw
text to statistically grounded visualizations. It is composed of the
following steps:

**1. Data Preprocessing** Transform raw text into a Document-Term Matrix
(DTM) or extract n-grams. This step handles cleaning, including the
removal of stopwords and punctuation, to prepare data for modeling or
frequency analysis.
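
To make the preprocessing step concrete, here is a minimal base R sketch of what building a DTM and extracting n-grams involves. It does not use the `topics` package and is not its implementation; the toy corpus, the short stopword list, and the helper names `tokenize()` and `bigrams()` are all assumptions made for illustration.

```r
# Toy corpus (hypothetical example data)
docs <- c(
  "I feel sad and tired today.",
  "The weather is sunny, and I feel happy!",
  "Tired of the rain; happy about the weekend."
)

# A tiny illustrative stopword list (real analyses use a fuller list)
stopwords <- c("i", "and", "the", "is", "of", "about")

# Lowercase, replace punctuation with spaces, split on whitespace, drop stopwords
tokenize <- function(doc) {
  cleaned <- gsub("[[:punct:]]", " ", tolower(doc))
  tokens <- strsplit(trimws(cleaned), "\\s+")[[1]]
  tokens[!tokens %in% stopwords]
}

tokens_per_doc <- lapply(docs, tokenize)

# Document-term matrix: one row per document, one column per term
vocab <- sort(unique(unlist(tokens_per_doc)))
dtm <- t(vapply(
  tokens_per_doc,
  function(tokens) as.integer(table(factor(tokens, levels = vocab))),
  integer(length(vocab))
))
dimnames(dtm) <- list(paste0("doc", seq_along(docs)), vocab)

# Bigrams: each token pasted to its right-hand neighbour
bigrams <- function(tokens) {
  if (length(tokens) < 2) return(character(0))
  paste(head(tokens, -1), tail(tokens, -1))
}

print(dtm)
print(bigrams(tokens_per_doc[[1]]))
```

Each row of `dtm` counts how often a term occurs in one document, which is the input format a topic model expects; the bigram vector is the simplest form of the phrase features used in n-gram analyses.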

**2. Model Training** For topic modeling, an LDA (Latent Dirichlet
Allocation) model is trained on the DTM. Users can specify the number of
topics and iterations to optimize the thematic representation of the
corpus.

**3. Model Inference** The trained LDA model is used to infer each
document's distribution over topics, converting qualitative text into
quantitative topic loadings.
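
The idea behind inference can be sketched in base R: with the topic-term probabilities held fixed, estimate how much each topic contributes to a new document. The sketch below uses a simple EM update for the mixture proportions, which is a swapped-in technique chosen for brevity and not necessarily the procedure the package uses. The matrix `phi`, the word counts, and the function `infer_topic_loadings()` are hypothetical.

```r
# Hypothetical topic-term probabilities (rows: topics, columns: terms; rows sum to 1)
phi <- rbind(
  mood    = c(sad = 0.4, tired = 0.4, sunny = 0.1, rain = 0.1),
  weather = c(sad = 0.1, tired = 0.1, sunny = 0.4, rain = 0.4)
)

# Word counts for one new document, in the same term order as the columns of phi
counts <- c(sad = 3, tired = 2, sunny = 2, rain = 1)

infer_topic_loadings <- function(counts, phi, n_iter = 200) {
  theta <- rep(1 / nrow(phi), nrow(phi))  # start from equal topic proportions
  for (i in seq_len(n_iter)) {
    # E-step: probability that an occurrence of each term came from each topic
    weighted <- phi * theta  # recycling scales row k of phi by theta[k]
    resp <- sweep(weighted, 2, colSums(weighted), "/")
    # M-step: re-estimate proportions from the expected per-topic word counts
    theta <- as.vector(resp %*% counts) / sum(counts)
  }
  setNames(theta, rownames(phi))
}

theta <- infer_topic_loadings(counts, phi)
print(round(theta, 3))
```

The resulting vector sums to one and plays the role of the document's topic loadings: one number per topic that can be carried into a regression in the next step.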

**4. Statistical Analysis** Perform Differential Language Analysis (DLA)
using `topicsTest()`. The analysis now supports:

- **Automatic Detection:** Intelligent per-variable method detection
  (e.g., automatically applying logistic regression for binary factors
  and linear regression for continuous data).
- **Multi-Element Analysis:** Statistically test both LDA topics and
  n-grams.
- **Rigorous Controls:** Support for control variables and various
  p-value adjustment methods for multiple comparisons (e.g., FDR,
  Bonferroni, Holm).
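
The statistical logic described in this step can be illustrated with base R alone. The sketch below is not the internals of `topicsTest()`; it only shows, on simulated data, how a method might be chosen from a variable's type, how a control variable enters the model, and how `p.adjust()` corrects for testing several topics. The data and the helpers `detect_method()` and `test_topic()` are hypothetical.

```r
set.seed(42)

# Simulated data (hypothetical): loadings on 5 topics for 100 documents
n <- 100
topic_loadings <- as.data.frame(matrix(runif(n * 5), nrow = n))
names(topic_loadings) <- paste0("topic_", 1:5)

outcomes <- data.frame(
  severity = rnorm(n),                               # continuous variable
  group = factor(rep(c("a", "b"), length.out = n)),  # binary factor
  age = rnorm(n, mean = 40, sd = 10)                 # control variable
)

# Pick a method from the variable's type (illustrative rule only)
detect_method <- function(y) {
  if (is.factor(y) && nlevels(y) == 2) "logistic_regression" else "linear_regression"
}

# Test one topic against one outcome, controlling for age; return the topic's p-value
test_topic <- function(topic, outcome) {
  df <- data.frame(
    y = outcomes[[outcome]],
    x = topic_loadings[[topic]],
    age = outcomes$age
  )
  fit <- if (detect_method(df$y) == "logistic_regression") {
    glm(y ~ x + age, data = df, family = binomial())
  } else {
    lm(y ~ x + age, data = df)
  }
  coef(summary(fit))["x", 4]  # the 4th column is the p-value for both model types
}

p_raw <- vapply(names(topic_loadings), test_topic, numeric(1), outcome = "severity")

# Adjust for multiple comparisons across the five topics
p_adjusted <- data.frame(
  raw = p_raw,
  fdr = p.adjust(p_raw, method = "fdr"),
  bonferroni = p.adjust(p_raw, method = "bonferroni"),
  holm = p.adjust(p_raw, method = "holm")
)
print(round(p_adjusted, 3))
```

Calling `test_topic("topic_1", "group")` would route the same topic through logistic regression instead, since `group` is a two-level factor. Bonferroni is the most conservative of the three adjustments shown, Holm is uniformly no stricter than Bonferroni, and FDR controls the expected share of false discoveries.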

**5. Visualization** Generate publication-ready visualizations of your
results:

- **Wordclouds:** Create clouds of significant topics where word size
  reflects the contribution to the theme.
- **N-gram Plots:** Directly visualize the relationship between specific
  phrases and your variables of interest.

<img src="man/figures/one_dim.png" style="width:75.0%"
alt="One-dimensional plots based on words and phrases (top) and LDA topics (bottom)." />
<br> <br> <br>

<img src="man/figures/two_dim.png" style="width:75.0%"
alt="A two-dimensional plot showing topics related to depression versus worry responses (x-axis) and low versus high depression severity (y-axis)." />
<br> <br>
