---
title: "Getting Started with pairwiseLLM"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting Started with pairwiseLLM}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE
)
library(pairwiseLLM)
library(dplyr)
```

# 1. Introduction

`pairwiseLLM` provides a unified workflow for generating and analyzing **pairwise comparisons of writing quality** using LLM APIs (OpenAI, Anthropic, Gemini, Together) and local models via Ollama.

A typical workflow:

1. Select writing samples
2. Construct pairwise comparison sets
3. Submit comparisons to an LLM (live or batch API)
4. Parse model outputs
5. Fit Bradley–Terry or Elo models to obtain latent writing-quality scores

For prompt evaluation and positional-bias diagnostics, see:

* [`vignette("prompt-template-bias")`](https://shmercer.github.io/pairwiseLLM/articles/prompt-template-bias.html)

For advanced batch-processing workflows, see:

* [`vignette("advanced-batch-workflows")`](https://shmercer.github.io/pairwiseLLM/articles/advanced-batch-workflows.html)

---

# 2. Setting API Keys

`pairwiseLLM` reads provider keys **only from environment variables**, never from R options or global variables.

| Provider | Environment Variable |
|----------|----------------------|
| [OpenAI](https://openai.com/api/) | OPENAI_API_KEY |
| [Anthropic](https://console.anthropic.com/) | ANTHROPIC_API_KEY |
| [Gemini](https://aistudio.google.com/) | GEMINI_API_KEY |
| [Together](https://www.together.ai/) | TOGETHER_API_KEY |

You should put these in your `~/.Renviron`:

```
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="..."
GEMINI_API_KEY="..."
TOGETHER_API_KEY="..."
```

Check which keys are available:

```
library(pairwiseLLM)
check_llm_api_keys()
#> All known LLM API keys are set: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, TOGETHER_API_KEY.
#> # A tibble: 4 × 4
#>   backend   service       env_var           has_key
#> 1 openai    OpenAI        OPENAI_API_KEY    TRUE
#> 2 anthropic Anthropic     ANTHROPIC_API_KEY TRUE
#> 3 gemini    Google Gemini GEMINI_API_KEY    TRUE
#> 4 together  Together.ai   TOGETHER_API_KEY  TRUE
```

[Ollama](https://ollama.com/) runs locally and does not require an API key; it only requires that the Ollama server is running.

---

# 3. Example Writing Data

The package ships with 20 authentic student writing samples:

```{r}
data("example_writing_samples", package = "pairwiseLLM")

dplyr::slice_head(example_writing_samples, n = 3)
```

Each sample has:

- `ID`
- `text`

---

# 4. Constructing Pairwise Comparisons

Create all unordered pairs:

```{r}
pairs <- example_writing_samples |>
  make_pairs()

dplyr::slice_head(pairs, n = 5)
```

Sample a subset of pairs:

```{r}
pairs_small <- sample_pairs(pairs, n_pairs = 10, seed = 123)
```

Randomize SAMPLE_1 / SAMPLE_2 order:

```{r}
pairs_small <- randomize_pair_order(pairs_small, seed = 99)
```

---

# 5. Traits and Prompt Templates

## 5.1 Using a built-in trait

```{r}
td <- trait_description("overall_quality")
td
```

Or define your own:

```{r}
td_custom <- trait_description(
  custom_name = "Clarity",
  custom_description = "How clearly and effectively ideas are expressed."
)
```

## 5.2 Using or customizing prompt templates

Load the default prompt template:

```{r}
tmpl <- set_prompt_template()
cat(substr(tmpl, 1, 300))
```

Every template must contain these placeholders (a minimal custom template is sketched at the end of this section):

- `{TRAIT_NAME}`
- `{TRAIT_DESCRIPTION}`
- `{SAMPLE_1}`
- `{SAMPLE_2}`

Load a template from file:

```{r, eval=FALSE}
set_prompt_template(file = "my_template.txt")
```
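For illustration, here is a minimal sketch of a custom template. The wording is hypothetical; it simply shows all four required placeholders being written to a file and loaded with `set_prompt_template(file = ...)`. A production template should also instruct the model to report its decision in the same machine-readable form as the default template, so the package can parse the output.

```{r, eval=FALSE}
# Hypothetical template text: illustrative wording only, but it contains
# all four required placeholders.
my_template <- "You are comparing two writing samples on {TRAIT_NAME}.
{TRAIT_DESCRIPTION}

Sample 1:
{SAMPLE_1}

Sample 2:
{SAMPLE_2}

Decide which sample is better on this trait."

# Write the template to a file, then load it like any file-based template.
tmpl_file <- file.path(tempdir(), "my_template.txt")
writeLines(my_template, tmpl_file)
tmpl_custom <- set_prompt_template(file = tmpl_file)
```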
---

# 6. Live Pairwise Comparisons

The unified wrapper works for **OpenAI, Anthropic, Gemini, Together, and Ollama**.

```{r, eval=FALSE}
res_live <- submit_llm_pairs(
  pairs = pairs_small,
  backend = "openai",  # also "anthropic", "gemini", "together", "ollama"
  model = "gpt-4o",
  trait_name = td$name,
  trait_description = td$description,
  prompt_template = tmpl
)
```

Preview results:

```{r, eval=FALSE}
dplyr::slice_head(res_live, n = 5)
```

Each row includes:

- `pair_id`
- `sample1_id`, `sample2_id`
- the parsed decision tag, giving `better_sample` and `better_id`
- (optionally) the raw model output

---

# 7. Preparing Data for BT or Elo Modeling

Convert the LLM output to a three-column BT dataset:

```{r, eval=FALSE}
# res_live: output from submit_llm_pairs()
bt_data <- build_bt_data(res_live)

dplyr::slice_head(bt_data, n = 5)
```

and/or a dataset for Elo modeling:

```{r, eval=FALSE}
# res_live: output from submit_llm_pairs()
elo_data <- build_elo_data(res_live)
```

---

# 8. Bradley–Terry Modeling

Fit the model:

```{r, eval=FALSE}
bt_fit <- fit_bt_model(bt_data)
```

Summarize the results:

```{r, eval=FALSE}
summarize_bt_fit(bt_fit)
```

The output includes:

- latent θ ability scores
- standard errors
- reliability (sirt engine)

---

# 9. Elo Modeling

```{r, eval=FALSE}
elo_fit <- fit_elo_model(elo_data, runs = 5)
elo_fit
```

Outputs:

- Elo ratings for each sample
- unweighted and weighted reliability
- trial counts

---

# 10. Batch APIs (Large Jobs)

## 10.1 Submit a batch

```{r, eval=FALSE}
batch <- llm_submit_pairs_batch(
  backend = "openai",
  model = "gpt-4o",
  pairs = pairs_small,
  trait_name = td$name,
  trait_description = td$description,
  prompt_template = tmpl
)
```

## 10.2 Download results

```{r, eval=FALSE}
res_batch <- llm_download_batch_results(batch)
head(res_batch)
```

---

# 11. Backend-Specific Tools

Most users can rely on the unified interface, but backend-specific helpers are also available.

## 11.1 OpenAI

- `submit_openai_pairs_live()`
- `build_openai_batch_requests()`
- `run_openai_batch_pipeline()`
- `parse_openai_batch_output()`

## 11.2 Anthropic

- `submit_anthropic_pairs_live()`
- `build_anthropic_batch_requests()`
- `run_anthropic_batch_pipeline()`
- `parse_anthropic_batch_output()`

## 11.3 Google Gemini

- `submit_gemini_pairs_live()`
- `build_gemini_batch_requests()`
- `run_gemini_batch_pipeline()`
- `parse_gemini_batch_output()`

## 11.4 Together.ai (live only)

- `together_compare_pair_live()`
- `submit_together_pairs_live()`

## 11.5 Ollama (local, live only)

- `ollama_compare_pair_live()`
- `submit_ollama_pairs_live()`

---

# 12. Troubleshooting

### Missing API keys

```{r}
check_llm_api_keys()
```

### Chain-of-thought leakage

Use the default template or set `include_thoughts = FALSE`.

### Timeouts

Use the batch APIs for jobs with more than 40 pairs.

### Positional bias

Use `compute_reverse_consistency()` + `check_positional_bias()` (see [`vignette("prompt-template-bias")`](https://shmercer.github.io/pairwiseLLM/articles/prompt-template-bias.html) for a full example).

---

# 13. Citation

> Mercer, S. (2025). *Getting started with pairwiseLLM* (Version 1.0.0) [R package vignette]. In *pairwiseLLM: Pairwise Comparison Tools for Large Language Model-Based Writing Evaluation*. https://shmercer.github.io/pairwiseLLM/
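---

# 14. Appendix: Full Workflow Sketch

To close, here is a consolidated sketch of the workflow from Sections 3–8, chaining only functions shown above: build and sample pairs, submit them live to one backend, and fit a Bradley–Terry model. The backend, model, seeds, and number of pairs are illustrative choices, and the code assumes a valid `OPENAI_API_KEY`.

```{r, eval=FALSE}
library(pairwiseLLM)

# 1. Example data and a sampled, order-randomized pairwise comparison set
data("example_writing_samples", package = "pairwiseLLM")

pairs_small <- example_writing_samples |>
  make_pairs() |>
  sample_pairs(n_pairs = 10, seed = 123) |>
  randomize_pair_order(seed = 99)

# 2. Trait and prompt template
td   <- trait_description("overall_quality")
tmpl <- set_prompt_template()

# 3. Live comparisons (requires OPENAI_API_KEY)
res_live <- submit_llm_pairs(
  pairs = pairs_small,
  backend = "openai",
  model = "gpt-4o",
  trait_name = td$name,
  trait_description = td$description,
  prompt_template = tmpl
)

# 4. Bradley–Terry scores
bt_data <- build_bt_data(res_live)
bt_fit  <- fit_bt_model(bt_data)
summarize_bt_fit(bt_fit)
```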