---
title: "Moving Averages for Trend Analysis"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Moving Averages for Trend Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 8,
  fig.height = 5,
  warning = FALSE,
  message = FALSE
)
```

```{r setup}
library(trendseries)
library(dplyr)
library(ggplot2)
library(tidyr)

# Load data
data("vehicles", "ibcbr", "electric", package = "trendseries")
```

## Introduction

Moving averages are one of the most intuitive and widely-used tools for extracting trends from time series data. The basic idea is simple: **average nearby observations to smooth out random fluctuations**.

This vignette explores the different types of moving averages available in `trendseries`, when to use each one, and how to choose appropriate parameters.

### When to Use Moving Averages

Moving averages work well when:
- You want a simple, interpretable trend
- Your data has short-term noise you want to filter out
- You're doing preliminary exploratory analysis
- You need a trend that's easy to explain to non-technical audiences

They're less suitable when:
- Your data has strong seasonal patterns (use STL instead)
- You need to preserve specific features like peaks or valleys (use Savitzky-Golay)
- You're analyzing business cycles (use HP, BK, or CF filters)

## Simple Moving Average: The Foundation

The simple moving average (MA) calculates the mean of the last n observations. It's the easiest method to understand and implement.

### How It Works

For a 12-month moving average, each point is the average of the current month plus the previous 11 months:

```
MA(t) = (X(t) + X(t-1) + X(t-2) + ... + X(t-11)) / 12
```

### Basic Example

Let's start with vehicle production data:

```{r ma-basic}
# Use recent data (last 5 years)
vehicles_recent <- vehicles |>
  slice_tail(n = 60)

# Apply 12-month moving average
vehicles_ma <- vehicles_recent |>
  augment_trends(
    value_col = "production",
    methods = "ma",
    window = 12
  )

# View results
head(vehicles_ma)
```

Let's visualize the smoothing effect:

```{r ma-plot}
# Prepare plot data
plot_data <- vehicles_ma |>
  select(date, production, trend_ma) |>
  pivot_longer(
    cols = c(production, trend_ma),
    names_to = "series",
    values_to = "value"
  ) |>
  mutate(
    series = ifelse(series == "production", "Original Data", "12-Month MA")
  )

# Plot
ggplot(plot_data, aes(x = date, y = value, color = series)) +
  geom_line(linewidth = 0.9) +
  labs(
    title = "Vehicle Production: Simple Moving Average",
    subtitle = "12-month window smooths out month-to-month variation",
    x = "Date",
    y = "Production (thousands of units)",
    color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

The moving average (in teal/blue) clearly shows the underlying trend by filtering out the month-to-month noise.

## Choosing the Right Window Size

The window size (period) determines how smooth your trend will be:

- **Small windows** (3-6): More responsive, track changes quickly, but may include noise
- **Medium windows** (12): Balance between smoothness and responsiveness (one year for monthly data)
- **Large windows** (24+): Very smooth, but slow to react to changes

Let's compare different window sizes:

```{r window-comparison}
# Apply different window sizes
windows_to_test <- c(3, 6, 12, 24)

# Start with original data
vehicles_windows <- vehicles_recent

# Add each window size
for (w in windows_to_test) {
  temp <- vehicles_recent |>
    augment_trends(value_col = "production", methods = "ma", window = w) |>
    select(trend_ma)

  names(temp) <- paste0("ma_", w, "m")
  vehicles_windows <- bind_cols(vehicles_windows, temp)
}

# Prepare for plotting
plot_data <- vehicles_windows |>
  select(date, production, starts_with("ma_")) |>
  pivot_longer(
    cols = c(production, starts_with("ma_")),
    names_to = "method",
    values_to = "value"
  ) |>
  mutate(
    method = case_when(
      method == "production" ~ "Original",
      method == "ma_3m" ~ "3-month MA",
      method == "ma_6m" ~ "6-month MA",
      method == "ma_12m" ~ "12-month MA",
      method == "ma_24m" ~ "24-month MA"
    ),
    method = factor(method, levels = c("Original", "3-month MA", "6-month MA",
                                       "12-month MA", "24-month MA"))
  )

# Plot
ggplot(plot_data, aes(x = date, y = value, color = method)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "Effect of Window Size on Moving Average",
    subtitle = "Larger windows = smoother trends, but slower to react",
    x = "Date",
    y = "Production (thousands of units)",
    color = "Method"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

Notice how the 24-month MA is very smooth but "lags" behind changes, while the 3-month MA tracks the data closely but still shows some fluctuation.

### Window Size Guidelines

For **monthly data**:
- Short-term analysis: 3-6 months
- Medium-term trends: 12 months (annual cycle)
- Long-term trends: 24-36 months

For **quarterly data**:
- Short-term: 2-4 quarters
- Medium-term: 4-8 quarters
- Long-term: 8-12 quarters

## Understanding Alignment: Center vs Right vs Left

Moving averages can be calculated with different **alignments**, which determines which observations are used to calculate each point. This is a critical choice that affects both the trend's properties and when NAs appear in the result.

### The Three Alignment Options

1. **Center alignment** (default): Uses observations both before and after each point
   - Most common for general trend extraction
   - Produces NAs at both the beginning and end of the series
   - **Non-causal**: uses future information

2. **Right alignment** (causal): Uses only past observations
   - Critical for real-time analysis and forecasting
   - Produces NAs only at the beginning
   - **No look-ahead bias**: suitable for backtesting strategies
   - Also called "trailing" or "backward-looking"

3. **Left alignment** (anti-causal): Uses only future observations
   - Rarely used in practice
   - Produces NAs only at the end
   - Useful for specific smoothing applications

### When to Use Each Alignment

**Use center alignment when:**
- Doing historical analysis where all data is available
- You want the smoothest possible trend
- The symmetric window makes sense for your application

**Use right alignment when:**
- Building forecasting models (avoid look-ahead bias)
- Backtesting trading strategies or economic indicators
- Analyzing data in real-time (can't use future data)
- Need causal filters for time series econometrics

**Use left alignment when:**
- Specific smoothing applications that need forward-looking averages
- Very rarely used in economic analysis

### Visualizing Different Alignments

Let's compare the three alignments using vehicle production data:

```{r align-comparison}
# Apply 12-month moving average with different alignments
vehicles_align <- vehicles_recent |>
  augment_trends(
    value_col = "production",
    methods = "ma",
    window = 12,
    align = "center"
  ) |>
  rename(trend_center = trend_ma)

# Add right alignment
vehicles_align <- vehicles_align |>
  augment_trends(
    value_col = "production",
    methods = "ma",
    window = 12,
    align = "right"
  ) |>
  rename(trend_right = trend_ma)

# Add left alignment
vehicles_align <- vehicles_align |>
  augment_trends(
    value_col = "production",
    methods = "ma",
    window = 12,
    align = "left"
  ) |>
  rename(trend_left = trend_ma)

# Prepare for plotting
plot_data <- vehicles_align |>
  select(date, production, starts_with("trend_")) |>
  pivot_longer(
    cols = starts_with("trend_"),
    names_to = "alignment",
    values_to = "value"
  ) |>
  mutate(
    alignment = case_when(
      alignment == "trend_center" ~ "Center (default)",
      alignment == "trend_right" ~ "Right (causal)",
      alignment == "trend_left" ~ "Left (anti-causal)"
    ),
    alignment = factor(
      alignment,
      levels = c("Center (default)", "Right (causal)", "Left (anti-causal)")
    )
  )

# Plot
ggplot(plot_data, aes(x = date, y = value, color = alignment)) +
  geom_line(linewidth = 0.9, alpha = 0.8) +
  labs(
    title = "Moving Average Alignment Comparison",
    subtitle = "12-month window with different alignments",
    x = "Date",
    y = "Production (thousands of units)",
    color = "Alignment"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

Notice how:
- **Center** is smoothest and symmetric
- **Right** lags behind center (uses only past data)
- **Left** leads ahead of center (uses only future data)

### Practical Example: Real-Time Forecasting

For real-time analysis, right alignment is essential. Let's simulate what a forecaster would have seen at different points in time:

```{r realtime-example}
# Simulate real-time analysis: what would we see in Dec 2022?
cutoff_date <- as.Date("2022-12-31")

# Data available up to cutoff
historical_data <- vehicles |>
  filter(date <= cutoff_date)

# Apply right-aligned MA (what we could compute in real-time)
realtime_ma <- historical_data |>
  augment_trends(
    value_col = "production",
    methods = "ma",
    window = 12,
    align = "right"
  )

# Show last 6 months of trend
realtime_ma |>
  slice_tail(n = 6) |>
  select(date, production, trend_ma)
```

With right alignment, the trend is available immediately as new data arrives, making it suitable for real-time monitoring dashboards and nowcasting applications.

### Alignment and Missing Values

Different alignments produce NAs in different locations:

```{r na-pattern}
# Check NA pattern for each alignment
na_summary <- vehicles_align |>
  summarise(
    center_nas = sum(is.na(trend_center)),
    right_nas = sum(is.na(trend_right)),
    left_nas = sum(is.na(trend_left))
  )

na_summary
```

For a 12-month window:
- **Center**: ~6 NAs at start and ~6 at end
- **Right**: ~11 NAs at start, none at end (can compute trend up to present)
- **Left**: None at start, ~11 NAs at end

## Exponentially Weighted Moving Average (EWMA)

Unlike simple MA which weights all observations equally, EWMA gives **more weight to recent observations**. This makes it more responsive to recent changes.

### How It Works

EWMA uses a smoothing parameter α (alpha) between 0 and 1:

```
EWMA(t) = α × X(t) + (1 - α) × EWMA(t-1)
```

- Higher α (e.g., 0.7): More responsive to recent data
- Lower α (e.g., 0.1): Smoother, similar to long-window MA

### Comparing MA and EWMA

```{r ewma-comparison}
# Apply both methods separately (EWMA cannot use both window and smoothing)
# First: MA with window parameter
vehicles_ma <- vehicles_recent |>
  augment_trends(
    value_col = "production",
    methods = "ma",
    window = 12
  )

# Second: EWMA with smoothing (alpha) parameter
vehicles_ewma <- vehicles_recent |>
  augment_trends(
    value_col = "production",
    methods = "ewma",
    smoothing = 0.3
  )

# Combine the results
vehicles_ma_ewma <- vehicles_recent |>
  left_join(
    select(vehicles_ma, date, trend_ma),
    by = "date"
  ) |>
  left_join(
    select(vehicles_ewma, date, trend_ewma),
    by = "date"
  )

# Prepare for plotting
plot_data <- vehicles_ma_ewma |>
  select(date, production, trend_ma, trend_ewma) |>
  pivot_longer(
    cols = c(production, trend_ma, trend_ewma),
    names_to = "method",
    values_to = "value"
  ) |>
  mutate(
    method = case_when(
      method == "production" ~ "Original",
      method == "trend_ma" ~ "12-month MA",
      method == "trend_ewma" ~ "EWMA (α=0.3)"
    )
  )

# Plot
ggplot(plot_data, aes(x = date, y = value, color = method)) +
  geom_line(linewidth = 0.9) +
  labs(
    title = "Simple MA vs EWMA",
    subtitle = "EWMA emphasizes recent observations more than simple MA",
    x = "Date",
    y = "Production (thousands of units)",
    color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

### Choosing Alpha for EWMA

Let's see how different alpha values affect the trend:

```{r ewma-alpha}
# Test different alpha values
alphas <- c(0.1, 0.3, 0.5, 0.8)

vehicles_alphas <- vehicles_recent

for (a in alphas) {
  temp <- vehicles_recent |>
    augment_trends(value_col = "production", methods = "ewma", smoothing = a) |>
    select(trend_ewma)

  names(temp) <- paste0("ewma_", a)
  vehicles_alphas <- bind_cols(vehicles_alphas, temp)
}

# Plot
plot_data <- vehicles_alphas |>
  select(date, production, starts_with("ewma_")) |>
  pivot_longer(
    cols = c(production, starts_with("ewma_")),
    names_to = "method",
    values_to = "value"
  ) |>
  mutate(
    method = case_when(
      method == "production" ~ "Original",
      method == "ewma_0.1" ~ "α = 0.1 (smooth)",
      method == "ewma_0.3" ~ "α = 0.3",
      method == "ewma_0.5" ~ "α = 0.5",
      method == "ewma_0.8" ~ "α = 0.8 (responsive)"
    )
  )

ggplot(plot_data, aes(x = date, y = value, color = method)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "EWMA with Different Alpha Values",
    subtitle = "Higher alpha = more weight on recent data",
    x = "Date",
    y = "Production (thousands of units)",
    color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

**Guidelines for alpha**:
- Smooth trend: α = 0.1 to 0.2
- Balanced: α = 0.3 to 0.4
- Responsive: α = 0.5 to 0.7
- Very responsive: α = 0.8+

## Advanced Moving Averages

The `trendseries` package includes several advanced MA methods designed to reduce lag while maintaining smoothness.

### Comparing Advanced Methods

```{r advanced-ma}
# Apply multiple advanced MA methods
# Note: EWMA uses smoothing, other methods use window
# Apply window-based methods
vehicles_window_methods <- vehicles_recent |>
  augment_trends(
    value_col = "production",
    methods = c("ma", "wma"),
    window = 12
  )

# Apply EWMA with smoothing parameter
vehicles_ewma_method <- vehicles_recent |>
  augment_trends(
    value_col = "production",
    methods = "ewma",
    smoothing = 0.3
  )

# Combine results
vehicles_advanced <- vehicles_recent |>
  left_join(
    select(vehicles_window_methods, date, starts_with("trend_")),
    by = "date"
  ) |>
  left_join(
    select(vehicles_ewma_method, date, trend_ewma),
    by = "date"
  )

# Prepare for plotting
plot_data <- vehicles_advanced |>
  select(date, production, starts_with("trend_")) |>
  pivot_longer(
    cols = c(production, starts_with("trend_")),
    names_to = "method",
    values_to = "value"
  ) |>
  mutate(
    method = case_when(
      method == "production" ~ "Original",
      method == "trend_ma" ~ "Simple MA",
      method == "trend_ewma" ~ "EWMA",
      method == "trend_wma" ~ "Weighted MA"
    )
  )

# Plot
ggplot(plot_data, aes(x = date, y = value, color = method)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "Advanced Moving Average Methods",
    subtitle = "Weighted MA and EWMA reduce lag compared to simple MA",
    x = "Date",
    y = "Production (thousands of units)",
    color = "Method"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

### Method Characteristics

| Method | Smoothness | Responsiveness | Complexity | Best For |
|--------|-----------|----------------|------------|----------|
| **MA** | High | Low | Very Simple | Stable trends, teaching |
| **EWMA** | Medium | Medium | Simple | General purpose, recent data matters |
| **Weighted MA** | Medium | Medium | Simple | Emphasizing recent observations |

## Practical Applications

### Application 1: Identifying Trend Changes

Moving averages help identify when trends change direction. Let's look at the IBC-Br economic activity index:

```{r trend-changes}
# Get recent IBC-Br data
ibcbr_recent <- ibcbr |>
  slice_tail(n = 72)

# Apply EWMA for responsiveness
ibcbr_trend <- ibcbr_recent |>
  augment_trends(
    value_col = "index",
    methods = "ewma",
    smoothing = 0.25
  )

# Prepare plot
plot_data <- ibcbr_trend |>
  select(date, index, trend_ewma) |>
  pivot_longer(
    cols = c(index, trend_ewma),
    names_to = "series",
    values_to = "value"
  ) |>
  mutate(
    series = ifelse(series == "index", "Original", "EWMA Trend")
  )

# Plot
ggplot(plot_data, aes(x = date, y = value, color = series)) +
  geom_line(linewidth = 0.9) +
  labs(
    title = "IBC-Br Economic Activity Index",
    subtitle = "EWMA trend helps identify economic turning points",
    x = "Date",
    y = "Index Value",
    color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

### Application 2: Seasonal vs Non-Seasonal Data

Moving averages work differently on seasonal data. Let's compare electricity consumption (seasonal) with vehicle production (less seasonal):

```{r seasonal-comparison}
# Get recent electricity data (seasonal)
electric_recent <- electric |>
  slice_tail(n = 60)

# Apply same 12-month MA to both series
electric_ma <- electric_recent |>
  augment_trends(value_col = "consumption", methods = "ma", window = 12)

vehicles_ma_comp <- vehicles_recent |>
  augment_trends(value_col = "production", methods = "ma", window = 12)

# Create plots
p1 <- electric_ma |>
  select(date, consumption, trend_ma) |>
  pivot_longer(cols = c(consumption, trend_ma), names_to = "series") |>
  mutate(series = ifelse(series == "consumption", "Original", "12-month MA")) |>
  ggplot(aes(x = date, y = value, color = series)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "Electricity (Seasonal)",
    x = NULL,
    y = "GWh",
    color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

p2 <- vehicles_ma_comp |>
  select(date, production, trend_ma) |>
  pivot_longer(cols = c(production, trend_ma), names_to = "series") |>
  mutate(series = ifelse(series == "production", "Original", "12-month MA")) |>
  ggplot(aes(x = date, y = value, color = series)) +
  geom_line(linewidth = 0.8) +
  labs(
    title = "Vehicles (Less Seasonal)",
    x = NULL,
    y = "Thousands",
    color = NULL
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

# Display plots
print(p1)
print(p2)
```

**Key insight**: For strongly seasonal data like electricity consumption, a 12-month MA removes the seasonal pattern effectively. For less seasonal data like vehicle production, the MA primarily smooths out irregular fluctuations.

### Application 3: Cross-Series Comparison

When comparing multiple economic indicators, moving averages help focus on the underlying trends:

```{r cross-series}
# Prepare data for three indicators
multi_series <- bind_rows(
  ibcbr_recent |>
    select(date, value = index) |>
    mutate(indicator = "Economic Activity"),

  vehicles_recent |>
    select(date, value = production) |>
    mutate(indicator = "Vehicle Production"),

  electric_recent |>
    select(date, value = consumption) |>
    mutate(indicator = "Electricity")
)

# Apply EWMA to all series
multi_trends <- multi_series |>
  group_by(indicator) |>
  augment_trends(
    value_col = "value",
    methods = "ewma",
    frequency = 12,
    smoothing = 0.2
  ) |>
  ungroup()

# Normalize trends to first observation = 100
multi_normalized <- multi_trends |>
  group_by(indicator) |>
  mutate(
    trend_normalized = (trend_ewma / first(trend_ewma)) * 100
  ) |>
  ungroup()

# Plot normalized trends
ggplot(multi_normalized, aes(x = date, y = trend_normalized, color = indicator)) +
  geom_line(linewidth = 1) +
  labs(
    title = "Comparing Economic Indicators: EWMA Trends",
    subtitle = "Normalized to first observation = 100",
    x = "Date",
    y = "Index (normalized)",
    color = "Indicator"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")
```

This reveals how different sectors of the economy moved together or diverged over time.

## Choosing the Right Moving Average

Here's a practical decision guide:

### Start Here: Basic Questions

1. **Do you need something simple and interpretable?**
   - → Use **Simple MA** with window = 12 (monthly) or 4 (quarterly)

2. **Does recent data matter more than old data?**
   - → Use **EWMA** with α = 0.2-0.4

3. **Is the trend changing quickly and you need to catch it?**
   - → Use **EWMA** with α = 0.5-0.7 or **Zero-Lag EMA**

4. **Do you need weighted averaging with more emphasis on recent data?**
   - → Use **Weighted MA** or **Zero-Lag EMA**

5. **Is your data strongly seasonal?**
   - → Consider **STL decomposition** instead (see advanced vignette)

### Parameter Selection Quick Reference

For **monthly data**:
```{r eval=FALSE}
# Conservative (smooth)
data |> augment_trends(value_col = "value", methods = "ma", window = 24)
data |> augment_trends(value_col = "value", methods = "ewma", smoothing = 0.15)

# Balanced (recommended starting point)
data |> augment_trends(value_col = "value", methods = "ma", window = 12)
data |> augment_trends(value_col = "value", methods = "ewma", smoothing = 0.3)

# Responsive (catches changes quickly)
data |> augment_trends(value_col = "value", methods = "ma", window = 6)
data |> augment_trends(value_col = "value", methods = "ewma", smoothing = 0.6)
```

For **quarterly data**:
```{r eval=FALSE}
# Conservative
data |> augment_trends(value_col = "value", methods = "ma", window = 8)

# Balanced
data |> augment_trends(value_col = "value", methods = "ma", window = 4)

# Responsive
data |> augment_trends(value_col = "value", methods = "ewma", smoothing = 0.5)
```

## Common Pitfalls and Solutions

### Pitfall 1: Window Too Small
**Problem**: Trend still looks noisy
**Solution**: Increase window size or use EWMA with lower α

### Pitfall 2: Window Too Large
**Problem**: Trend lags behind recent changes
**Solution**: Decrease window size, use EWMA/DEMA, or try Hull MA

### Pitfall 3: Missing Values at Edges
**Problem**: MA produces NA values at the start/end
**Solution**: This is expected - MAs need complete windows. Use methods like HP filter or Kalman smoother if you need values at edges.

### Pitfall 4: Using MA on Trending Data
**Problem**: MA doesn't remove overall upward/downward trend
**Solution**: Moving averages extract trends, they don't remove them. If you want to detrend data, consider first-differencing or HP filter gap analysis.

## Summary

Moving averages are versatile tools for trend extraction:

- **Simple MA**: Best for stable trends and when you need interpretability
- **EWMA**: Great general-purpose choice when recent data is more important
- **Weighted MA/Zero-Lag EMA**: Use when you need both smoothness and responsiveness

**Key parameters**:
- Window size (for MA): 12 months typical for monthly data
- Alpha (for EWMA): 0.2-0.4 for most applications

**Remember**: Always visualize your results and experiment with parameters. The "best" method and parameters depend on your specific data and analytical goals.

## Further Reading

- For seasonal data: See the "Advanced Methods" vignette on STL decomposition
- For business cycle analysis: See the "Economic Filters" vignette on HP, BK, and CF filters
- For general introduction: See the "Getting Started" vignette

## Appendix: Mathematical Details

For readers interested in the mathematical foundations:

### Simple Moving Average

$$\text{MA}_t = \frac{1}{n} \sum_{i=0}^{n-1} X_{t-i}$$

where $X_t$ is the value at time $t$, and $n$ is the window size.

### Exponentially Weighted Moving Average

$$\text{EWMA}_t = \alpha \cdot X_t + (1-\alpha) \cdot \text{EWMA}_{t-1}$$

where $0 < \alpha \leq 1$ is the smoothing parameter. Alternatively expressed as:

$$\text{EWMA}_t = \alpha \sum_{i=0}^{\infty} (1-\alpha)^i X_{t-i}$$

This shows EWMA as an infinite weighted sum with exponentially decaying weights.

### Weighted Moving Average

$$\text{WMA}_t = \frac{\sum_{i=0}^{n-1} w_i \cdot X_{t-i}}{\sum_{i=0}^{n-1} w_i}$$

where $w_i$ are the weights (typically $w_i = n-i$, giving more weight to recent observations) and $n$ is the window size.