The gmwmx2
R
package allows to estimate
linear model with correlated residuals in presence of missing data.
More precisely, we assume the following model:
where X ∈ ℝn × p is a design matrix of observed predictors, β ∈ ℝp is the regression parameter vector and ε = {εi}i = 1, …, n is a zero-mean process following an unspecified joint distribution ℱ with positive-definite covariance function Σ(γ) ∈ ℝn × n characterizing the second-order dependence structure of the process and parameterized by the vector γ ∈ Γ ⊂ ℝq.
We then define the a random variable Z = {Zi}i = 1, …, n which describes the missing observation mechanism. More specifically, the vector Z ∈ {0, 1}n is a binary-valued stationary process independent of Y with expectation μ(ϑ) = 𝔼[Zi] ∈ [0, 1), ∀ i, and covariance matrix Λ(ϑ) = 𝕍[Z] ∈ ℝn × n whose structure is assumed known up to the parameter vector ϑ ∈ Υ ⊂ ℝk
We then assume that we only observe the stochastic process $\tilde{\boldsymbol{Y}} = \boldsymbol{Z} \odot \boldsymbol{Y}$, where ⊙ denotes the Hadamard product. Hence, $\tilde{\boldsymbol{Y}} \in \mathbb{R}^n = \boldsymbol{Y} \odot \boldsymbol{Z}$ represents the observed process vector with null elements in the positions where observations are missing.
Using ⊗ to denote the Kronecker product, we then define $\tilde{\boldsymbol{X}} = \boldsymbol{Z} \otimes \boldsymbol{1}^T \odot \boldsymbol{X} \in \mathbb{R}^{n \times p}$ as the design matrix X with zero-valued vectors for the rows where observations are missing in Y (where 1 represents a vector of ones of dimension p).
We estimate the parameters β with the least square estimator:
We compute the estimated residuals as $\hat{\boldsymbol{\varepsilon}} = {\tilde{\boldsymbol{Y}}} - \tilde{{\boldsymbol{X}}} \hat{\boldsymbol{\beta}}$.
We then estimate with the Maximum Likelihood Estimator the parameters ϑ of the missingness process Z assuming that Z is generated from a Markov model with the following transition probabilities:
We then estimate the parameters γ using a Generalized method of Wavelet Moments approach (Guerrier et al. 2013) and using the fact that the variance-covariance matrix of $\hat{\boldsymbol{\varepsilon}}$ is given by:
$$\boldsymbol{\Sigma}_{\hat{\boldsymbol{\varepsilon}}}(\boldsymbol{\gamma}) = [\boldsymbol{\Lambda}(\hat{\boldsymbol{\vartheta}}) + \mu(\hat{\boldsymbol{\vartheta}})^2 \mathbf{1} \mathbf{1}^T ] \odot ( \boldsymbol{I} - \boldsymbol{P})\boldsymbol{\Sigma}(\boldsymbol{\gamma}) (\boldsymbol{I} - \boldsymbol{P})$$
where P = X(XTX)−1XT and I is the identity matrix of dimension n × n.
More precisely, we rely on the result of Xu et al. (2017) that provide a computationally efficient form of the theoretical Allan variance (equivalent to the Haar wavelet variance up to a constant) for zero-mean stochastic processes such as ε to avoid computing these large matrices multiplication in the objective function. Indeed in Xu et al. (2017) they generalize the results in Zhang (2008) to zero-mean non-stationary processes by using averages of the diagonals and super-diagonals of the covariance matrix of ε. What this implies is that the GMWM, which uses this form, does not require the storage of the n × n covariance matrix of ε, but only of a vector of dimension n which is then plugged into an explicit formula consisting in a linear combination of the elements of this vector (these elements being averages of the diagonal and super-diagonals of the covariance matrix).
While the GMWMX as described above and in more details in Voirol et al. (2024), is a general method for
estimating large linear model with complex dependence structure in
presence of missing data, the gmwmx2
R
package
is currently developed specifically to estimate tectonic velocities from
position times series in graticule distance coordinates system (GD)
provided by the Nevada geodetic Laboratory (Blewitt 2024; Blewitt, Hammond, and Kreemer
2018).
To estimate the trajectory model (see e.g., Bevis and Brown (2014) for more details), we construct the design matrix X such that i-th component of the vector Xβ can be described as follows with ti representing the ith ordered time point (epoch) indicated in Modified Julian Date and t0 representing the reference epoch located exactly in the middle of start and end point of the time series:
where a is the initial position at the reference epoch t0, b is the velocity parameter, ch and dh are the periodic motion parameters (h = 1 and h = 2 represent the annual and semi-annual seasonal terms, respectively with $f_1 = \frac{1}{365.25}$ and $f_2 = \frac{2}{365.25}$). The offset terms models earthquakes, equipment changes or human intervention in which ej is the magnitude of the step at epochs tj, r is the total number of offsets, H is the Heaviside step function defined as $H(x):= \begin{cases}1, & x \geq 0 \\ 0, & x<0\end{cases}$ and the last term allow to model post-seismic deformation (see e.g., Sobrero et al. (2020)) where s is the number of post seismic relaxation time specified, tk is the time when the relaxation k starts in Modified Julian Date (MJD), τk is the relaxation period in days for the post-seismic relaxation k and lk is the amplitude of the transient. Note that by default the estimates of the functional parameters are provided in unit/day.
When loading data from a specific station using the function
gmwmx2::download_station_ngl()
, we extract from the Nevada
Geodetic Laboratory the position time series in GD coordinates, the time
steps associated with a equipment or software change and the time steps
associated with an earthquake near the station. All these objects are
stored in a object of class gnss_ts_ngl
.
When applying the function gmwmx2::gmwmx2()
to an object
of class gnss_ts_ngl
, we construct the design matrix X by considering an offset
term for all equipment or software changes steps and all earthquakes
indicated by the NGL. We also specify a post-seismic relaxation term for
all earthquakes indicated by the NGL. If no relaxation time is specified
in the argument vec_earthquakes_relaxation_time
, we
consider a default relaxation time of 365.25 days.
It is generally recognized that noise in GNSS time series is best described by a combination of colored noise plus white noise (He et al. 2017; Langbein 2008; Williams et al. 2004; Bos et al. 2013) where the white noise generally model noise at high frequencies and the colored noise model the lower frequencies of the spectrum. In a large study on the noise properties of GNSS signals, concluded that the optimal noise models for 80–90% of GNSS time series signals are the power law and white noise model or white noise and flicker/pink noise with model. The package currently support both stochastic model specification.
More precisely, the power spectrum of a power-law noise has the following form: P(f) = P0(f/fs)κ where f is the frequency, P0 is a constant, fs the sampling frequency and the exponent κ is called the spectral index.
Many stochastic noise can be expressed as such, for example, a spectral index κ = 0 produces a white noise, a spectral index κ = −2 produces a red noise or random walk and a spectral index κ = −1 produce a flicker noise, also called pink noise.
Granger (1980) and Hosking (1981) showed that power-law noise with a spectral index between −1 and 1 can be obtained by using fractional differencing of Gaussian noise:
(1 − B)−κ/2v
where B is the backward-shift operator (Bvi = vi − 1) and v a vector with independent and identically distributed (IID) Gaussian noise.
Following from Hosking’s definition of the fractional differencing, we obtain
$$ \begin{aligned} (1-B)^{-\kappa / 2} & =\sum_{i=0}^{\infty}\binom{-\kappa / 2}{i}(-B)^i \\ & =1-\frac{\kappa}{2} B-\frac{1}{2} \frac{\kappa}{2}\left(1-\frac{\kappa}{2}\right) B^2+\ldots \\ & =\sum_{i=0}^{\infty} h_i \end{aligned} $$ with the coefficient hi that can be computed using the following recurrence relation (Kasdin and Walter 1992):
$$ \begin{aligned} h_0 & =1 \\ h_i & =\left(i-\frac{\kappa}{2}-1\right) \frac{h_{i-1}}{i} \quad \text { for } i>0 \end{aligned} $$
Assuming an infinite sequence of zero-mean white noise v, with variance σPL2, and a spectral index kappa > −1 then the autocovariance γ(τ) = Cov (Xt, Xt + τ) = 𝔼[(Xt − μ)(Xt + τ − μ)] is (Bos et al. 2008):
$$
\begin{aligned}
& \gamma(0)=\sigma_{p l}^2
\frac{\Gamma(1-\alpha)}{\left(\Gamma\left(1-\frac{\alpha}{2}\right)\right)^2}
\\
&
\gamma(\tau)=\frac{\frac{\alpha}{2}+\tau-1}{-\frac{\alpha}{\alpha}+\tau}
\gamma(\tau-1) \text { for } \tau>0
\end{aligned}
$$ When the argument stochastic_model
is set to
"wn + pl"
, the stochastic model considered includes both
white noise and power-law with the specified above stationary
autocovariance structure. The parameters estimated are: σWN2,
κ (constrained to be greater
than −1) and σPL2.
When the argument stochastic_model
is set to
"wn + fl"
, the stochastic model considered includes both
white noise and flicker noise (not stationary power-law noise with a
spectral index κ = −1) where
the variance covariance of the flicker noise ω is obtained as follows (see e.g.,
(Bos et al. 2008)):
Cov (ω) = σFL2UTU
where $$ \mathbf{U}=\left(\begin{array}{cccc} h_0 & h_1 & \ldots & h_n \\ 0 & h_0 & & h_{n-1} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & h_0 \end{array}\right) $$ with the coefficients hi computed considering a spectral index κ = −1.
The stochastic parameters estimated are: σWN2, σFL2 and κ is fixed to −1.
Let us showcase how to estimate the tectonic velocity in for one specific component (North, East or Vertical) of one station.
Let us first load the gmwmx2
package.
## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter Estimate Std_Deviation 95% CI Lower 95% CI Upper
## -------------------------------------------------------------
## Intercept -0.70924596 0.00876879 -0.72643246 -0.69205945
## Trend 0.00007198 0.00000833 0.00005565 0.00008831
## Sin (Annual) 0.00135377 0.00034179 0.00068387 0.00202367
## Cos (Annual) -0.00062838 0.00036176 -0.00133741 0.00008065
## Sin (Semi-Annual) 0.00017180 0.00023845 -0.00029556 0.00063915
## Cos (Semi-Annual) 0.00030877 0.00024595 -0.00017328 0.00079082
## Jump: MJD 56248 0.01588617 0.00156627 0.01281633 0.01895601
## Jump: MJD 55391 0.00780478 0.00153504 0.00479616 0.01081339
## Jump: MJD 55563 0.00230763 0.00174920 -0.00112074 0.00573600
## Jump: MJD 55603 0.00286880 0.00237791 -0.00179182 0.00752942
## Jump: MJD 55606 -0.00044386 0.00351636 -0.00733580 0.00644808
## Jump: MJD 56011 -0.00008945 0.00165713 -0.00333736 0.00315846
## Jump: MJD 56085 0.00073436 0.00170239 -0.00260227 0.00407099
## Jump: MJD 56748 -0.00038578 0.00139555 -0.00312101 0.00234945
## Jump: MJD 57281 -0.00145742 0.00153903 -0.00447386 0.00155901
## Earthquake: MJD 55391 0.01576163 0.00742010 0.00121850 0.03030476
## Earthquake: MJD 55563 0.00174787 0.02573410 -0.04869004 0.05218579
## Earthquake: MJD 55603 -0.28371260 0.53172586 -1.32587613 0.75845094
## Earthquake: MJD 55606 0.27719703 0.52581943 -0.75339012 1.30778419
## Earthquake: MJD 56011 -0.00480858 0.01325307 -0.03078411 0.02116696
## Earthquake: MJD 56085 -0.00772227 0.01245006 -0.03212393 0.01667940
## Earthquake: MJD 56748 -0.00910404 0.00542215 -0.01973126 0.00152318
## Earthquake: MJD 57281 -0.02236505 0.00645616 -0.03501889 -0.00971122
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
## White Noise Variance : 0.00000122
## Stationary powerlaw Spectral index: -0.83907240
## Stationary powerlaw Variance: 0.00000336
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
## P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
## P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
## \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 0.17 seconds
## -------------------------------------------------------------
By default, the estimated parameters are provided in m/day, we can
optionally scale the estimated functional parameters so that they are
returned in m/year with the argument scale_parameters
.
## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter Estimate Std_Deviation 95% CI Lower 95% CI Upper
## -------------------------------------------------------------
## Intercept -259.05208570 3.20279877 -265.32945595 -252.77471545
## Trend 0.02629027 0.00304291 0.02032629 0.03225426
## Sin (Annual) 0.49446304 0.12483968 0.24978175 0.73914432
## Cos (Annual) -0.22951487 0.13213206 -0.48848895 0.02945921
## Sin (Semi-Annual) 0.06274864 0.08709451 -0.10795347 0.23345075
## Cos (Semi-Annual) 0.11277837 0.08983307 -0.06329122 0.28884795
## Jump: MJD 56248 5.80242306 0.57208186 4.68116322 6.92368289
## Jump: MJD 55391 2.85069445 0.56067178 1.75179795 3.94959095
## Jump: MJD 55563 0.84286159 0.63889600 -0.40935155 2.09507473
## Jump: MJD 55603 1.04782942 0.86853158 -0.65446120 2.75012003
## Jump: MJD 55606 -0.16211875 1.28435053 -2.67939952 2.35516203
## Jump: MJD 56011 -0.03267118 0.60526627 -1.21897127 1.15362890
## Jump: MJD 56085 0.26822620 0.62179917 -0.95047779 1.48693018
## Jump: MJD 56748 -0.14090556 0.50972464 -1.13994749 0.85813637
## Jump: MJD 57281 -0.53232443 0.56212924 -1.63407749 0.56942863
## Earthquake: MJD 55391 5.75693615 2.71019228 0.44505689 11.06881541
## Earthquake: MJD 55563 0.63841028 9.39938149 -17.78403892 19.06085948
## Earthquake: MJD 55603 -103.62602673 194.21287019 -484.27625764 277.02420417
## Earthquake: MJD 55606 101.24621682 192.05554824 -275.17574076 477.66817439
## Earthquake: MJD 56011 -1.75633307 4.84068270 -11.24389683 7.73123069
## Earthquake: MJD 56085 -2.82055781 4.54738426 -11.73326718 6.09215156
## Earthquake: MJD 56748 -3.32525134 1.98044104 -7.20684446 0.55634177
## Earthquake: MJD 57281 -8.16883558 2.35811162 -12.79064944 -3.54702173
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
## White Noise Variance : 0.00000122
## Stationary powerlaw Spectral index: -0.83907240
## Stationary powerlaw Variance: 0.00000336
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
## P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
## P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
## \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 0.17 seconds
## -------------------------------------------------------------
## Summary of Estimated Model
## -------------------------------------------------------------
## Functional parameters
## -------------------------------------------------------------
## Parameter Estimate Std_Deviation 95% CI Lower 95% CI Upper
## -------------------------------------------------------------
## Intercept -0.70924596 0.01136945 -0.73152967 -0.68696225
## Trend 0.00007198 0.00001083 0.00005074 0.00009321
## Sin (Annual) 0.00135377 0.00043414 0.00050287 0.00220467
## Cos (Annual) -0.00062838 0.00045622 -0.00152254 0.00026579
## Sin (Semi-Annual) 0.00017180 0.00027821 -0.00037348 0.00071707
## Cos (Semi-Annual) 0.00030877 0.00028861 -0.00025689 0.00087443
## Jump: MJD 56248 0.01588617 0.00191916 0.01212469 0.01964764
## Jump: MJD 55391 0.00780478 0.00182956 0.00421891 0.01139065
## Jump: MJD 55563 0.00230763 0.00190189 -0.00142001 0.00603527
## Jump: MJD 55603 0.00286880 0.00235741 -0.00175163 0.00748923
## Jump: MJD 55606 -0.00044386 0.00358112 -0.00746272 0.00657500
## Jump: MJD 56011 -0.00008945 0.00192695 -0.00386620 0.00368730
## Jump: MJD 56085 0.00073436 0.00199984 -0.00318524 0.00465397
## Jump: MJD 56748 -0.00038578 0.00187908 -0.00406870 0.00329714
## Jump: MJD 57281 -0.00145742 0.00187987 -0.00514189 0.00222704
## Earthquake: MJD 55391 0.01576163 0.00930876 -0.00248320 0.03400647
## Earthquake: MJD 55563 0.00174787 0.02766816 -0.05248072 0.05597647
## Earthquake: MJD 55603 -0.28371260 0.52755951 -1.31771024 0.75028504
## Earthquake: MJD 55606 0.27719703 0.52174625 -0.74540682 1.29980089
## Earthquake: MJD 56011 -0.00480858 0.01509158 -0.03438754 0.02477039
## Earthquake: MJD 56085 -0.00772227 0.01454627 -0.03623244 0.02078791
## Earthquake: MJD 56748 -0.00910404 0.00725836 -0.02333016 0.00512207
## Earthquake: MJD 57281 -0.02236505 0.00811536 -0.03827086 -0.00645925
## -------------------------------------------------------------
## Stochastic parameters
## -------------------------------------------------------------
## White Noise Variance : 0.00000193
## Flicker Noise Variance: 0.00000251
## -------------------------------------------------------------
## Missingness parameters
## -------------------------------------------------------------
## P(Z_{i+1} = 0 | Z_{i} = 1): 0.00279460
## P(Z_{i+1} = 1 | Z_{i} = 0): 0.31578947
## \hat{E[Z]}: 0.99122807
## -------------------------------------------------------------
## Running time: 0.59 seconds
## -------------------------------------------------------------