This subsection outlines the steps for constructing, running
simulations, and estimating a univariate Hawkes model. To begin, create
an hspec object, which defines the Hawkes model. The S4
class hspec contains slots for the model parameters:
mu, alpha, beta,
dimens, rmark, and impact.
In a univariate model, the basic parameters of the
model—mu, alpha, and beta—can be
given as numeric values. If numeric values are provided, they will be
converted to matrices. Below is an example of a univariate Hawkes model
without a mark.
mu1 <- 0.3; alpha1 <- 1.2; beta1 <- 1.5
hspec1 <- new("hspec", mu = mu1, alpha = alpha1, beta = beta1)
show(hspec1)
#> An object of class "hspec" of 1-dimensional Hawkes process
#>
#> Slot mu:
#> [,1]
#> [1,] 0.3
#>
#> Slot alpha:
#> [,1]
#> [1,] 1.2
#>
#> Slot beta:
#> [,1]
#> [1,] 1.5The function hsim implements simulation where the input
arguments are hspec, size, and the initial
values of the intensity component process,
lambda_component0, and the initial values of the Hawkes
processes, N0. More precisely, the intensity process of the
basic univariate Hawkes model is represented by
\[ \lambda(t) = \mu + \int_{-\infty}^t \alpha e^{-\beta (t-s)} d N(s) = \mu + \lambda_c(0) e^{-\beta t} + \int_0^t \alpha e^{-\beta (t-s)} d N(s) \]
where the lambda_component0 denotes
\[ \lambda_c(0) = \int_{-\infty}^0 \alpha e^{\beta s} d N(s). \]
If lambda_component0 is not provided, the internally
determined initial values for the intensity process are used. If
size is sufficiently large, the exact value of
lambda_component0 may not be important. The default initial
value of the counting process, N0, is zero.
set.seed(1107)
res1 <- hsim(hspec1, size = 1000)
summary(res1)
#> -------------------------------------------------------
#> Simulation result of exponential (marked) Hawkes model.
#> Realized path :
#> arrival N1 mark lambda1
#> [1,] 0.00000 0 0 0.90000
#> [2,] 0.97794 1 1 0.43838
#> [3,] 1.09001 2 1 1.43128
#> [4,] 1.28999 3 1 2.02711
#> [5,] 1.53225 4 1 2.33527
#> [6,] 1.65001 5 1 3.01139
#> [7,] 2.51807 6 1 1.36377
#> [8,] 2.81710 7 1 1.74553
#> [9,] 2.87547 8 1 2.72378
#> [10,] 3.16415 9 1 2.65016
#> [11,] 3.51378 10 1 2.40131
#> [12,] 4.22355 11 1 1.43843
#> [13,] 16.96752 12 1 0.30000
#> [14,] 17.71654 13 1 0.69015
#> [15,] 19.10293 14 1 0.49874
#> [16,] 24.06354 15 1 0.30082
#> [17,] 24.09256 16 1 1.44967
#> [18,] 28.40173 17 1 0.30366
#> [19,] 28.53743 18 1 1.28198
#> [20,] 28.56702 19 1 2.38725
#> ... with 980 more rows
#> -------------------------------------------------------The results of hsim is an S3 class hreal,
which consists of hspec, inter_arrival,
arrival, type, mark,
N, Nc, lambda,
lambda_component, rambda,
rambda_component.
hspec is the model specification.
inter_arrival is the inter-arrival time of every
event.
arrival is the cumulative sum of
inter_arrival.
type is the type of events, i.e., \(i\) for \(N_i\), and is used for a multivariate
model.
mark is a numeric vector that represents additional
information for the event.
lambda represents \(\lambda\), which is the left continuous and
right limit version.
The right continuous version of intensity is
rambda.
lambda_component represents \(\lambda_{ij}\), and
rambda_component is the right continuous version.
inter_arrival, type, mark,
N, and Nc start at zero. Using the
summary() function, one can print the first 20 elements of
arrival, N, and lambda. The
print() function can also be used.
By definition, we have
lambda == mu + lambda_component.
# first and third columns are the same
cbind(res1$lambda[1:5], res1$lambda_component[1:5], mu1 + res1$lambda_component[1:5])
#> [,1] [,2] [,3]
#> [1,] 0.900000 0.600000 0.900000
#> [2,] 0.438383 0.138383 0.438383
#> [3,] 1.431282 1.131282 1.431282
#> [4,] 2.027111 1.727111 2.027111
#> [5,] 2.335269 2.035269 2.335269For all rows except the first, rambda equals
lambda + alpha in this model.
# second and third columns are the same
cbind(res1$lambda[1:5], res1$rambda[1:5], res1$lambda[1:5] + alpha1)
#> [,1] [,2] [,3]
#> [1,] 0.900000 0.900000 2.100000
#> [2,] 0.438383 1.638383 1.638383
#> [3,] 1.431282 2.631282 2.631282
#> [4,] 2.027111 3.227111 3.227111
#> [5,] 2.335269 3.535269 3.535269Additionally, verify that the exponential decay is accurately represented in the model.
# By definition, the following two are equal:
res1$lambda[2:6]
#> [1] 0.438383 1.431282 2.027111 2.335269 3.011391
mu1 + (res1$rambda[1:5] - mu1) * exp(-beta1 * res1$inter_arrival[2:6])
#> [1] 0.438383 1.431282 2.027111 2.335269 3.011391The log-likelihood function is calculated using the
logLik method. In this context, the inter-arrival times and
hspec are provided as inputs to the function.
logLik(hspec1, inter_arrival = res1$inter_arrival)
#> The initial values for intensity processes are not provided. Internally determined initial values are used.
#> loglikelihood
#> -214.2385The likelihood estimation is performed using the hfit
function. The specification of the initial parameter values,
hspec0, is required. Note that only
inter_arrival is needed for this univariate model. For more
accurate simulation, it is recommended to specify lambda0,
the initial value of the lambda component. If lambda0 is
not provided, the function uses internally determined initial values. By
default, the BFGS method is employed for numerical optimization.
# initial value for numerical optimization
mu0 <- 0.5; alpha0 <- 1.0; beta0 <- 1.8
hspec0 <- new("hspec", mu = mu0, alpha = alpha0, beta = beta0)
# the intial values are provided through hspec
mle <- hfit(hspec0, inter_arrival = res1$inter_arrival)
summary(mle)
#> --------------------------------------------
#> Maximum Likelihood estimation
#> BFGS maximization, 24 iterations
#> Return code 0: successful convergence
#> Log-Likelihood: -213.4658
#> 3 free parameters
#> Estimates:
#> Estimate Std. error t value Pr(> t)
#> mu1 0.33641 0.03475 9.682 <2e-16 ***
#> alpha1 1.16654 0.09608 12.141 <2e-16 ***
#> beta1 1.52270 0.12468 12.213 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------The intensity process of a basic bivariate Hawkes model is defined by
\[ \lambda_1(t) = \mu_1 + \int_{-\infty}^t \alpha_{11} e^{-\beta_{11}(t-s)} d N_1(s) + \int_{-\infty}^t \alpha_{12} e^{-\beta_{12}(t-s)} d N_2(s), \]
\[ \lambda_2(t) = \mu_2 + \int_{-\infty}^t \alpha_{21} e^{-\beta_{21}(t-s)} d N_1(s) + \int_{-\infty}^t \alpha_{22} e^{-\beta_{22}(t-s)} d N_2(s). \]
In a bivariate model, the parameters within the slots of
hspec are matrices. Specifically, mu is a
2-by-1 matrix, while alpha and beta are 2-by-2
matrices.
\[ \mu = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix}, \quad \alpha = \begin{bmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{bmatrix}, \quad \beta = \begin{bmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{bmatrix} \]
rmark is a random number generating function for marks
and is not used in non-mark models. lambda_component0 is a
2-by-2 matrix that represents the initial values of
lambda_component, which includes the set of values
lambda11, lambda12, lambda21, and
lambda22. The intensity processes are represented by
\[ \lambda_1(t) = \mu_1 + \lambda_{11}(t) + \lambda_{12}(t), \]
\[ \lambda_2(t) = \mu_2 + \lambda_{21}(t) + \lambda_{22}(t). \]
The terms \(\lambda_{ij}\) are
referred to as lambda components, and lambda0 represents
$_{ij}(0). The parameterlambda_component0` can be omitted
in this model, in which case internally determined initial values will
be used.
mu2 <- matrix(c(0.2), nrow = 2)
alpha2 <- matrix(c(0.5, 0.9, 0.9, 0.5), nrow = 2, byrow = TRUE)
beta2 <- matrix(c(2.25, 2.25, 2.25, 2.25), nrow = 2, byrow = TRUE)
hspec2 <- new("hspec", mu=mu2, alpha=alpha2, beta=beta2)
print(hspec2)
#> An object of class "hspec" of 2-dimensional Hawkes process
#>
#> Slot mu:
#> [,1]
#> [1,] 0.2
#> [2,] 0.2
#>
#> Slot alpha:
#> [,1] [,2]
#> [1,] 0.5 0.9
#> [2,] 0.9 0.5
#>
#> Slot beta:
#> [,1] [,2]
#> [1,] 2.25 2.25
#> [2,] 2.25 2.25To perform a simulation, use the hsim function.
set.seed(1107)
res2 <- hsim(hspec2, size=1000)
summary(res2)
#> -------------------------------------------------------
#> Simulation result of exponential (marked) Hawkes model.
#> Realized path :
#> arrival N1 N2 mark lambda1 lambda2
#> [1,] 0.00000 0 0 0 0.52941 0.52941
#> [2,] 0.57028 1 0 1 0.29130 0.29130
#> [3,] 1.66175 1 1 1 0.25073 0.28505
#> [4,] 2.17979 1 2 1 0.49638 0.38238
#> [5,] 2.47685 1 3 1 0.81319 0.54975
#> [6,] 2.64001 2 3 1 1.24825 0.78866
#> [7,] 2.70249 3 3 1 1.54519 1.49341
#> [8,] 2.94547 4 3 1 1.26810 1.46968
#> [9,] 3.39313 4 4 1 0.77271 0.99242
#> [10,] 3.52533 4 5 1 1.29379 1.15989
#> [11,] 3.56971 5 5 1 2.00432 1.52115
#> [12,] 3.70761 5 6 1 1.88965 1.82866
#> [13,] 4.30122 5 7 1 0.88106 0.75983
#> [14,] 4.34337 6 7 1 1.63800 1.16393
#> [15,] 4.40222 7 7 1 1.89764 1.83275
#> [16,] 4.58943 8 7 1 1.64219 1.86211
#> [17,] 5.14665 9 7 1 0.75437 0.93131
#> [18,] 5.18186 9 8 1 1.17407 1.70707
#> [19,] 5.36167 9 9 1 1.45050 1.53925
#> [20,] 5.89118 10 9 1 0.85331 0.75875
#> ... with 980 more rows
#> -------------------------------------------------------In multivariate models, type is crucial as it represents
the type of event.
In multivariate models, the column names of N are
N1, N2, N3, and so on.
Similarly, the column names of lambda are
lambda1, lambda2, lambda3, and so
on.
res2$lambda[1:3, ]
#> lambda1 lambda2
#> [1,] 0.5294118 0.5294118
#> [2,] 0.2913028 0.2913028
#> [3,] 0.2507301 0.2850475The column names of lambda_component are
lambda_component11, lambda_component12,
lambda_component13, and so on.
res2$lambda_component[1:3, ]
#> lambda11 lambda12 lambda21 lambda22
#> [1,] 0.11764706 0.211764706 0.21176471 0.117647059
#> [2,] 0.03260813 0.058694641 0.05869464 0.032608134
#> [3,] 0.04569443 0.005035631 0.08224997 0.002797573By definition, the following two expressions are equivalent:
mu2[1] + rowSums(res2$lambda_component[1:5, c("lambda11", "lambda12")])
#> [1] 0.5294118 0.2913028 0.2507301 0.4963769 0.8131889
res2$lambda[1:5, "lambda1"]
#> [1] 0.5294118 0.2913028 0.2507301 0.4963769 0.8131889From the results, we obtain vectors of realized
inter_arrival and type. A bivariate model
requires both inter_arrival and type for
estimation.
The log-likelihood is computed using the logLik
function.
logLik(hspec2, inter_arrival = inter_arrival2, type = type2)
#> The initial values for intensity processes are not provided. Internally determined initial values are used.
#> loglikelihood
#> -974.2809Maximum log-likelihood estimation is performed using the
hfit function. In this process, the parameter values in
hspec0, such as mu, alpha, and
beta, serve as starting points for the numerical
optimization. For illustration purposes, we set
hspec0 <- hspec2. Since the true parameter values are
unknown in practical applications, these initial guesses are used. The
realized inter_arrival and type data are
utilized for estimation.
hspec0 <- hspec2
mle <- hfit(hspec0, inter_arrival = inter_arrival2, type = type2)
summary(mle)
#> --------------------------------------------
#> Maximum Likelihood estimation
#> BFGS maximization, 36 iterations
#> Return code 0: successful convergence
#> Log-Likelihood: -970.1408
#> 4 free parameters
#> Estimates:
#> Estimate Std. error t value Pr(> t)
#> mu1 0.19095 0.01636 11.671 < 2e-16 ***
#> alpha1.1 0.48217 0.07405 6.511 7.45e-11 ***
#> alpha1.2 0.98625 0.09495 10.387 < 2e-16 ***
#> beta1.1 2.07987 0.16952 12.269 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> --------------------------------------------miscTools::stdEr(mle)
#> mu1 alpha1.1 alpha1.2 beta1.1
#> 0.01636127 0.07405118 0.09495461 0.16952428Also see the extended vignette on GitHub.