--- title: "Hurdle Demand Models" author: "Brent Kaplan" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Hurdle Demand Models} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", dev = "png", dpi = 144, fig.width = 7, fig.height = 5, warning = FALSE, message = FALSE ) library(beezdemand) ``` ## Introduction The `beezdemand` package includes functionality for fitting **two-part mixed effects hurdle demand models** using Template Model Builder (TMB). This approach is particularly useful when: - Zero consumption values are **informative** (i.e., represent true non-consumption rather than censored data) - You want to model both the **probability of consuming** and the **intensity of consumption** simultaneously - Individual-level heterogeneity is important for both parts of the model ### When to Use Hurdle Models vs. Standard Models | Scenario | Recommended Approach | |----------|---------------------| | Few zeros, zeros are measurement artifacts | `fit_demand_fixed()` or `fit_demand_mixed()` | | Many zeros, zeros represent true non-consumption | `fit_demand_hurdle()` | | Need to understand factors affecting whether someone consumes | `fit_demand_hurdle()` | | Need individual-level estimates of "quitting price" | `fit_demand_hurdle()` | ## Model Specification The hurdle model has two parts: ### Part I: Binary Model (Probability of Zero Consumption) $$\text{logit}(\pi_{ij}) = \beta_0 + \beta_1 \cdot \log(\text{price} + \epsilon) + a_i$$ Where: - $\pi_{ij}$ = probability of zero consumption for subject $i$ at price $j$ - $\beta_0$ = intercept (baseline log-odds of zero consumption) - $\beta_1$ = slope (effect of log-price on log-odds of zero) - $\epsilon$ = small constant (default 0.001) for handling zero prices - $a_i$ = subject-specific random intercept ### Part II: Continuous Model (Consumption Given Positive) **With 3 random effects:** $$\log(Q_{ij}) = (\log Q_0 + b_i) + k \cdot (\exp(-(\alpha + c_i) \cdot \text{price}) - 1) + \varepsilon_{ij}$$ **With 2 random effects:** $$\log(Q_{ij}) = (\log Q_0 + b_i) + k \cdot (\exp(-\alpha \cdot \text{price}) - 1) + \varepsilon_{ij}$$ Where: - $Q_0$ = intensity (consumption at price 0) - $k$ = scaling parameter for exponential decay - $\alpha$ = elasticity parameter - $b_i$ = subject-specific random effect on intensity - $c_i$ = subject-specific random effect on elasticity (3-RE model only) ### Random Effects Structure The random effects follow a multivariate normal distribution: - **2-RE model:** $(a_i, b_i) \sim \text{MVN}(0, \Sigma_{2 \times 2})$ - **3-RE model:** $(a_i, b_i, c_i) \sim \text{MVN}(0, \Sigma_{3 \times 3})$ ## Getting Started ### Data Requirements Your data should be in **long format** with columns for: - Subject identifier - Price - Consumption (including zeros) ```{r data-example, echo=TRUE} library(beezdemand) # Example data structure knitr::kable( head(apt, 10), caption = "Example APT data structure (first 10 rows)" ) ``` ### Basic Model Fitting ```{r basic-fit} # Fit 2-RE model (simpler, faster) fit2 <- fit_demand_hurdle( data = apt, y_var = "y", x_var = "x", id_var = "id", random_effects = c("zeros", "q0"), # 2 random effects verbose = 0 ) ``` ```{r fit-3re, eval=FALSE} # Fit 3-RE model (more flexible) fit3 <- fit_demand_hurdle( data = apt, y_var = "y", x_var = "x", id_var = "id", random_effects = c("zeros", "q0", "alpha"), # 3 random effects verbose = 1 ) ``` ### Interpreting Output ```{r summary-example} # View summary summary(fit2) # Extract coefficients coef(fit2) # Standardized tidy summaries tidy(fit2) |> head() glance(fit2) # Get subject-specific parameters head(get_subject_pars(fit2)) ``` ## Diagnostics Use `check_demand_model()` and the plotting helpers for quick post-fit checks. ```{r diagnostics} check_demand_model(fit2) plot_residuals(fit2)$fitted plot_qq(fit2) ``` ## Understanding Results ### Fixed Effects | Parameter | Interpretation | |-----------|---------------| | `beta0` | Part I intercept: baseline log-odds of zero consumption | | `beta1` | Part I slope: change in log-odds per unit increase in log(price) | | `logQ0` | Log of intensity parameter (population average) | | `k` | Scaling parameter for demand decay | | `alpha` | Elasticity parameter (population average for 2-RE, mean for 3-RE) | ### Subject-Specific Parameters The `subject_pars` data frame contains: | Parameter | Description | |-----------|-------------| | `a_i` | Random effect for Part I (zeros probability) | | `b_i` | Random effect for Part II (intensity) | | `c_i` | Random effect for alpha (3-RE model only) | | `Q0` | Individual intensity: $\exp(\log Q_0 + b_i)$ | | `alpha` | Individual elasticity: $\alpha + c_i$ (or just $\alpha$ for 2-RE) | | `breakpoint` | Price where P(zero) = 0.5: $\exp(-(\beta_0 + a_i) / \beta_1) - \epsilon$ | | `Pmax` | Price at maximum expenditure | | `Omax` | Maximum expenditure | ## Model Selection: 2-RE vs 3-RE ### When to Use Each - **2-RE model**: When you believe elasticity ($\alpha$) is relatively constant across subjects - **3-RE model**: When you believe elasticity varies meaningfully between subjects ### Likelihood Ratio Test ```{r model-comparison, eval=FALSE} # Compare nested models compare_hurdle_models(fit3, fit2) # Unified model comparison (AIC/BIC + LRT when appropriate) compare_models(fit3, fit2) # Output: # Likelihood Ratio Test # ===================== # Model n_RE LogLik df AIC BIC # Full (3 RE) 3 -1234.56 12 2493.12 2543.21 # Reduced (2 RE) 2 -1245.78 9 2509.56 2547.89 # # LR statistic: 22.44 # df: 3 # p-value: 5.2e-05 ``` A significant p-value suggests the 3-RE model provides a better fit. ## Visualization ```{r plotting-demand, fig.cap="Population demand curve from 2-RE hurdle model."} # Population demand curve plot(fit2, type = "demand") ``` ```{r plotting-probability, fig.cap="Probability of zero consumption as a function of price."} # Probability of zero consumption plot(fit2, type = "probability") ``` ```{r plotting-extra, eval=FALSE} # Distribution of individual parameters plot(fit2, type = "parameters") plot(fit2, type = "parameters", parameters = c("Q0", "alpha", "Pmax")) # Individual demand curves plot(fit2, type = "individual", subjects = c("1", "2", "3", "4")) # Single subject with observed data plot_subject(fit2, subject_id = "1", show_data = TRUE, show_population = TRUE) ``` ## Simulation and Validation ### Simulating Data The `simulate_hurdle_data()` function generates data from the hurdle model: ```{r simulate-basic, eval=FALSE} # Simulate with default parameters sim_data <- simulate_hurdle_data( n_subjects = 100, seed = 123 ) head(sim_data) # id x y delta a_i b_i # 1 1 0.00 12.345678 0 -0.523456 0.1234567 # 2 1 0.50 11.234567 0 -0.523456 0.1234567 # ... # Custom parameters sim_custom <- simulate_hurdle_data( n_subjects = 100, logQ0 = log(15), # Q0 = 15 alpha = 0.1, # Lower elasticity k = 3, # Higher k (ensures Pmax exists) stop_at_zero = FALSE, # All prices for all subjects seed = 456 ) ``` ### Monte Carlo Simulation Studies The `run_hurdle_monte_carlo()` function assesses model performance through simulation. **Note**: Monte Carlo simulations are computationally intensive and not run during vignette building. The example below shows typical usage and expected output format. ```{r monte-carlo, eval=FALSE} # Run Monte Carlo study mc_results <- run_hurdle_monte_carlo( n_sim = 100, # Number of simulations n_subjects = 100, # Subjects per simulation n_random_effects = 2, # 2-RE model verbose = TRUE, seed = 123 ) # View summary print_mc_summary(mc_results) # Monte Carlo Simulation Summary # ============================== # # Simulations: 100 attempted, 95 converged (95.0%) # # Parameter True Mean_Est Bias Rel_Bias% Emp_SE Mean_SE SE_Ratio Coverage_95% N # beta0 -2.0 -2.01 -0.01 0.5 0.12 0.11 0.92 94.7 95 # beta1 1.0 1.02 0.02 2.0 0.08 0.08 1.00 95.8 95 # logQ0 2.3 2.29 -0.01 -0.4 0.05 0.05 1.00 94.7 95 # k 2.0 2.03 0.03 1.5 0.15 0.14 0.93 93.7 95 # alpha 0.5 0.51 0.01 2.0 0.04 0.04 1.00 95.8 95 # ... ``` ### Interpreting Monte Carlo Results | Metric | Target | Interpretation | |--------|--------|---------------| | Bias | ~0 | Estimates should be unbiased | | Relative Bias | < 5% | Acceptable bias relative to true value | | SE Ratio | ~1.0 | Model SEs match empirical variability | | Coverage 95% | ~95% | CIs should contain true value 95% of time | ## Integration with beezdemand Workflow ### Combining with Other Analyses ```{r integration, eval=FALSE} # Fit hurdle model hurdle_fit <- fit_demand_hurdle(data, y_var = "y", x_var = "x", id_var = "id", random_effects = c("zeros", "q0"), verbose = 0 ) # Extract subject parameters hurdle_pars <- get_subject_pars(hurdle_fit) # These can be merged with other analyses # e.g., correlate with individual characteristics cor(hurdle_pars$Q0, subject_characteristics$age) cor(hurdle_pars$breakpoint, subject_characteristics$dependence_score) ``` ### Exporting Results ```{r export, eval=FALSE} # Subject parameters write.csv(get_subject_pars(hurdle_fit), "hurdle_subject_parameters.csv") # Summary statistics summ <- get_hurdle_param_summary(hurdle_fit) print(summ) ``` ## Technical Details ### TMB Backend The hurdle model is implemented using Template Model Builder (TMB), which provides: - Efficient Laplace approximation for marginal likelihood - Automatic differentiation for fast optimization - Standard errors via the delta method ### Control Parameters ```{r control-params, eval=FALSE} fit <- fit_demand_hurdle( data, y_var = "y", x_var = "x", id_var = "id", epsilon = 0.001, # Constant for log(price + epsilon) tmb_control = list( max_iter = 200, # Maximum iterations eval_max = 1000, # Maximum function evaluations trace = 0 # Optimization trace level ), verbose = 1 # 0 = silent, 1 = progress, 2 = detailed ) ``` ### Custom Starting Values For difficult optimization problems: ```{r custom-starts, eval=FALSE} custom_starts <- list( beta0 = -3.0, beta1 = 1.5, logQ0 = log(8), k = 2.5, alpha = 0.1, logsigma_a = 0.5, logsigma_b = -0.5, logsigma_e = -1.0, rho_ab_raw = 0 ) fit <- fit_demand_hurdle(data, y_var = "y", x_var = "x", id_var = "id", start_values = custom_starts, verbose = 1 ) ``` ## References Zhao, T., Luo, X., Chu, H., Le, C.T., Epstein, L.H. and Thomas, J.L. (2016), A two-part mixed effects model for cigarette purchase task data. *Jrnl Exper Analysis Behavior*, 106: 242-253. https://doi.org/10.1002/jeab.228 Hursh, S. R., & Silberberg, A. (2008). Economic demand and essential value. *Psychological Review*, 115(1), 186-198. ## See Also - `vignette("beezdemand")` -- Getting started with beezdemand - `vignette("model-selection")` -- Choosing the right model class - `vignette("fixed-demand")` -- Fixed-effect demand modeling - `vignette("mixed-demand")` -- Mixed-effects nonlinear demand models - `vignette("mixed-demand-advanced")` -- Advanced mixed-effects topics - `vignette("cross-price-models")` -- Cross-price demand analysis - `vignette("group-comparisons")` -- Group comparisons - `vignette("migration-guide")` -- Migrating from `FitCurves()`