Title: | Linear Quantile Mixture Models |
---|---|
Description: | Estimate linear quantile mixtures based on Time-Constant (TC) and/or Time-Varying (TV), discrete, random coefficients. |
Authors: | Maria Francesca Marino [aut, cre], Marco Alfo' [aut], Nicola Salvati [aut], Maria Giovanna Ranalli [aut] |
Maintainer: | Maria Francesca Marino <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0 |
Built: | 2025-01-31 03:22:59 UTC |
Source: | https://github.com/cran/lqmix |
lqmix
The lqmix
package allows for the estimation of finite mixtures of linear quantile regression models based on Time-Constant (TC) and/or Time-Varying (TV), discrete, random coefficients for the analysis of longitudinal data
lqmix
is an R
package devoted to the estimation of a class of linear quantile regression models for longitudinal data, in the presence of Time-Constant (TC) and/or Time-Varying (TV), unit-specific, random coefficients, having unspecific distribution.
The parameters of this distribution, together with all the others characterizing the model, are estimated in a maximum likelihood framework, via an extended Expectation-Maximization algorithm. This approach leads to the estimation of discrete distributions for the random coefficients, which give rise to a likelihood function similar to that of standard finite mixture models (in the case of TC random coefficients only), hidden Markov models (in the case of TV random coefficients only), or mixed hidden Markov models with discrete effects (in the case of both TC and TV random coefficients).
Parameters' standard errors are estimated via a block-bootstrap procedure, while model selection is performed by either maximizing the log-likelihood function, or minimizing the Akaike Information Criterion or the Bayesian Information Criterion.
Missing data are allowed and treated under a Missing at Random assumption.
Maria Francesca Marino [aut,cre], Marco Alfo' [aut], Nicola Salvati [aut], and Maria Giovanna Ranalli [aut]
Maintainer: Maria Francesca Marino <[email protected]>
Alfo' M, Salvati N, Ranalli MG (2017). “Finite Mixtures of Quantiles and M-quantile models.” Statistics and Computing, 27, 547-570.
Aitkin M (1996). “A general maximum likelihood analysis of overdispersion in generalized linear models.” Statistics and Computing, 6, 251-262.
Aitkin M (1999). “A general maximum likelihood analysis of variance components in generalized linear models.” Biometrics, 55, 117–128.
Farcomeni A (2012). “Quantile regression for longitudinal data based on latent Markov subject-specific parameters.” Statistics and Computing, 22.
Bartolucci F, Farcomeni A, Pennoni F (2012). Latent Markov models for longitudinal data. Taylor & Francis.
Zucchini W, MacDonald IL, Langrock R (2017). Hidden Markov models for time series, Monographs on Statistics and Applied Probability. Chapman and Hall/CRC.
Marino MF, Tzavidis N, Alfo' M (2018). “Mixed hidden Markov quantile regression models for longitudinal data with possibly incomplete sequences.” Statistical Methods in Medical Research, 27, 2231-2246.
Altman RJ (2007). “Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting.” Journal of the American Statistical Association, 102, 201–210.
Maruotti A (2011). “Mixed Hidden Markov Models for Longitudinal Data: An Overview.” International Statistical Review, 79. ISSN 1751-5823.
The cd4 data frame is made by a total of 2376 rows and 8 columns providing information on CD4 cell counts of 369 subjects followed for a maximum of 12 measurement occasions.
data(cd4)
data(cd4)
A data frame with 2376 observations on the following 8 variables:
sbj.id
subject id
time.id
time id
count
CD4 count
lcount
log(CD4 count + 1)
time
years since seroconversion
age
age (yrs) centered around 30
packs
packs of cigarettes per day
partners
number of sexual partners
drugs
recreational drug use indicator
cesd
depression score
Multi-center AIDS Cohort Study providing a total of 2376 CD4+ cell counts of 369 HIV-infected men covering a period of approximately eight and half years. The number of measurements for each individual varies from 1 to 12. The CD4+ cell data are highly unbalanced.
Zeger, Scott L., and Peter J. Diggle. "Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters." Biometrics (1994): 689-699.
lqmix
objectPrint the estimated fixed coefficients of a fitted model of class
lqmix
## S3 method for class 'lqmix' coef(object, ...)
## S3 method for class 'lqmix' coef(object, ...)
object |
an |
... |
not used |
Return the estimated fixed coefficients obtained at convergence of the EM algorithm for a fitted model of class
lqmix
lqr
objectPrint the estimated fixed coefficients of a fitted model of class
lqr
## S3 method for class 'lqr' coef(object, ...)
## S3 method for class 'lqr' coef(object, ...)
object |
an |
... |
not used |
Return the estimated coefficients obtained at convergence of the EM algorithm for a fitted model of class
lqr
search_lqmix
objectPrint the estimated fixed coefficients of the optimal fitted model stored in an object of class
search_lqmix
## S3 method for class 'search_lqmix' coef(object, ...)
## S3 method for class 'search_lqmix' coef(object, ...)
object |
an |
... |
not used |
Return the estimated fixed coefficients obtained at convergence of the EM algorithm for the optimal model stored in an object of class
search_lqmix
Compute the density for the three parameter Asymmetric Laplace Distribution
dal(y, mu = 0, sigma = 1, qtl = 0.5, log = FALSE)
dal(y, mu = 0, sigma = 1, qtl = 0.5, log = FALSE)
y |
vector of quantiles |
mu |
location parameter |
sigma |
scale parameter |
qtl |
skewness parameter |
log |
logical; if TRUE, probabilities are log-transformed |
The function computes the density of the Asymmetric Laplace distribution, with location , scale
and skewness
qtl = q
in (0,1),
as discussed by Koenker and Machado (1999) and Yu and Moyeed (2001), according to the following expression
Return the density for the asymmetric Laplace distribution
Koenker R, Machado JAF (1999). “Goodness of fit and related inference processes for quantile regression.” Journal of the american statistical association, 94, 1296–1310.
Yu K, Moyeed RA (2001). “Bayesian quantile regression.” Statistics & Probability Letters, 54, 437–447.
lqmix
objectPrint the log-likelihood of a fitted model of class
lqmix
## S3 method for class 'lqmix' logLik(object, ...)
## S3 method for class 'lqmix' logLik(object, ...)
object |
an |
... |
not used |
Return an object of class
logLik
providing the log-likelihood value at convergence of the EM algorithm for a fitted model of class
lqmix
lqr
objectPrint the log-likelihood of a fitted model of class
lqr
## S3 method for class 'lqr' logLik(object, ...)
## S3 method for class 'lqr' logLik(object, ...)
object |
an |
... |
not used |
Return an object of class
logLik
providing the log-likelihood for a fitted model of class
lqr
search_lqmix
objectPrint the log-likelihood of an optimal fitted model stored in an object of class
search_lqmix
## S3 method for class 'search_lqmix' logLik(object, ...)
## S3 method for class 'search_lqmix' logLik(object, ...)
object |
an |
... |
not used |
Return an object of class
logLik
providing the log-likelihood value at convergence of the EM algorithm for a fitted model of class
lqmix
Estimate a finite mixture of linear quantile regression models with TC and/or TV, discrete, random coefficients, for a given number of components and/or states
lqmix(formula, randomTC = NULL, randomTV = NULL, group, time, G = NULL, m = NULL, data, qtl = 0.5, eps = 10^-5, maxit = 1000, se = TRUE, R = 50, start = 0, parInit = list(betaf = NULL, betarTC = NULL, betarTV = NULL, pg = NULL, delta = NULL, Gamma = NULL, scale = NULL), verbose = TRUE, seed = NULL, parallel = FALSE)
lqmix(formula, randomTC = NULL, randomTV = NULL, group, time, G = NULL, m = NULL, data, qtl = 0.5, eps = 10^-5, maxit = 1000, se = TRUE, R = 50, start = 0, parInit = list(betaf = NULL, betarTC = NULL, betarTV = NULL, pg = NULL, delta = NULL, Gamma = NULL, scale = NULL), verbose = TRUE, seed = NULL, parallel = FALSE)
formula |
an object of class |
randomTC |
a one-sided formula of the form |
randomTV |
a one-sided formula of the form |
group |
a string indicating the grouping variable, i.e., the factor identifying the unit longitudinal measurements refer to |
time |
a string indicating the time variable |
G |
number of mixture components associated to TC random coefficients |
m |
number of states associated to the TV random coefficients |
data |
a data frame containing the variables named in |
qtl |
quantile to be estimated |
eps |
tolerance level for (relative) convergence of the EM algorithm |
maxit |
maximum number of iterations for the EM algorithm |
se |
standard error computation for the optimal model |
R |
number of bootstrap samples for computing standard errors |
start |
type of starting values (0 = deterministic, 1 = random, 2 = initial values in input) |
parInit |
list of initial model parameters when |
verbose |
if set to FALSE, no printed output is given during the function execution |
seed |
an integer value for random numbers generation |
parallel |
if set to TRUE, a parallelized code is use for standard error computation (if se=TRUE) |
The function computes ML estimates for the parameters of a linear quantile mixture model, based on TC and/or TV random coefficients. Estimates are derived by maximizing the (log-)likelihood of a Laplace regression where the location parameter is modeled as a function of fixed coefficients, together with TC and/or TV discrete random coefficients, as proposed by Alfo' et. al (2017), Farcomeni (2012), and Marino et. al (2018), respectively.
The function requires data in long-format and two additional columns indicating the group identifier and the time occasion.
The model is specified by means of the arguments formula
, formulaTC
, and formulaTV
:
formula
is associated to fixed coefficients; formulaTC
is associated to TC random coefficients; formulaTV
is associated to TV random coefficients.
In this latter, only TC variables (predictors) are allowed.
The function admits the presence of missing data, both in terms of drop-outs (monotone missing data) and intermittent missing, under a missing-at-random assumption. Note that, when TV random coefficients are considered, intermittent missingness may cause biased inference.
If se=TRUE
, standard errors based on a block bootstrap procedure are computed.
Return an object of class
lqmix
. This is a list containing the following elements:
betaf |
a vector containing fixed regression coefficients |
betarTC |
a matrix containing the TC random coefficients, if present in the model |
betarTV |
a matrix containing the TV random coefficients, if present in the model |
pg |
the prior probabilities of the finite mixture associated to TC random coefficients, if present in the model |
delta |
the initial probability vector of the hidden Markov chain associated to TV random coefficients, if present in the model |
Gamma |
the transition probability matrix of the hidden Markov chain associated to TV random coefficients, if present in the model |
scale |
the scale parameter |
sigma.e |
the standard deviation of error terms |
lk |
the log-likelihood at convergence of the EM algorithm |
npar |
the total number of model parameters |
aic |
the AIC value |
bic |
the BIC value |
qtl |
the estimated quantile |
G |
the number of mixture components associated to TC random coefficients (if present) |
m |
the number of hidden states associated to TV random coefficients (if present) |
nsbjs |
the number of subjects |
nobs |
the total number of observations |
se.betaf |
the standard errors for fixed regression coefficients |
se.betarTC |
the standard errors for TC random coefficients (if present) |
se.betarTV |
the standard errors for TV random coefficients (if present) |
se.Mprob |
the standard errors for the prior probabilities of the finite mixture associated to TC random coefficients (if present) |
se.Init |
the standard errors for the initial probabilities of the hidden Markov chain associated to TV random coefficients(if present) |
se.Trans |
the standard errors for the transition probabilities of the hidden Markov chain associated to TV random coefficients (if present) |
se.scale |
the standard error for the scale parameter |
miss |
the missingness type |
model |
the estimated model |
call |
the matched call |
Marino MF, Tzavidis N, Alfo' M (2018). “Mixed hidden Markov quantile regression models for longitudinal data with possibly incomplete sequences.” Statistical Methods in Medical Research, 27, 2231-2246.
Altman RJ (2007). “Mixed hidden Markov models: an extension of the hidden Markov model to the longitudinal data setting.” Journal of the American Statistical Association, 102, 201–210.
Maruotti A (2011). “Mixed Hidden Markov Models for Longitudinal Data: An Overview.” International Statistical Review, 79. ISSN 1751-5823.
outTC = lqmix(formula=meas~trt+time+trt:time,randomTC=~1, group="id",time="time",G=2,data=pain,se=TRUE,R=10) outTV = lqmix(formula=meas~trt+time+trt:time,randomTV=~1, group="id",time="time",m=2,data=pain,R=10) outTCTV = lqmix(formula=meas~trt+time+trt:time,randomTC=~time, randomTV=~1,group="id",time="time",m=2,G=2,data=pain,R=10)
outTC = lqmix(formula=meas~trt+time+trt:time,randomTC=~1, group="id",time="time",G=2,data=pain,se=TRUE,R=10) outTV = lqmix(formula=meas~trt+time+trt:time,randomTV=~1, group="id",time="time",m=2,data=pain,R=10) outTCTV = lqmix(formula=meas~trt+time+trt:time,randomTC=~time, randomTV=~1,group="id",time="time",m=2,G=2,data=pain,R=10)
Estimate a linear quantile regression model with no random coefficients
lqr(formula, data, qtl = 0.5, se = TRUE, R = 50, verbose = TRUE, ...)
lqr(formula, data, qtl = 0.5, se = TRUE, R = 50, verbose = TRUE, ...)
formula |
an object of class |
data |
a data frame containing the variables named in |
qtl |
quantile to be estimated |
se |
standard error computation |
R |
number of bootstrap samples for computing standard errors |
verbose |
if set to FALSE, no printed output is given during the function execution |
... |
further arguments to be passed to or from methods |
The function computes ML estimates for the parameters of a linear quantile regression model for independent observations. Estimates are derived by maximizing the (log-)likelihood of a Laplace regression, where the location parameter is modeled as a function of fixed coefficients only.
If se=TRUE
, standard errors based on a bootstrap procedure are computed.
Return an object of class
lqr
. This is a list containing the following elements:
betaf |
a vector containing fixed regression coefficients |
scale |
the scale parameter |
sigma.e |
the standard deviation of error terms |
lk |
the log-likelihood |
npar |
the total number of model parameters |
AIC |
the AIC value |
BIC |
the BIC value |
qtl |
the estimated quantile |
nobs |
the total number of observations |
se.betaf |
the standard errors for fixed regression coefficients |
se.scale |
the standard error for the scale parameter |
model |
the estimated model |
call |
the matched call |
Geraci M, Bottai M (2007). “Quantile regression for longitudinal data using the asymmetric Laplace distribution.” Biostatistics, 8, 140-54.
out0 = lqr(formula=meas~trt+time+trt:time,data=pain,se=TRUE,R=10)
out0 = lqr(formula=meas~trt+time+trt:time,data=pain,se=TRUE,R=10)
The pain data frame is made by a total of 357 rows and 4 columns providing information on pain of 83 women in labor followed for a maximum of 6 measurement occasions
data(pain)
data(pain)
A data frame with 357 observations on the following 5 variables:
id
woman id
meas
a numeric vector of self-reported pain scores on a 100mm line
trt
a dummy variable with values 1 for subjects who received a pain medication and 0 for subjects who received a placebo
time
a numeric vector of times (minutes since randomization) at which pain was measured
The data set consists of repeated measurements of self-reported pain on n = 83 women. 43 women were randomly assigned to a pain medication group and 40 to a placebo group. The response was measured every 30 minutes on a 100-mm line: 0 means no pain and 100 means extreme pain. The number of measurements for each woman varies from 1 to 6. Data are severely skewed, and the skewness changes magnitude, and even sign, over time.
Davis, Charles S. "Semi-parametric and non-parametric methods for the analysis of repeated measurements with applications to clinical trials." Statistics in medicine 10.12 (1991): 1959-1980.
Graphically display component and/or transition probabilities of a fitted model of class
lqmix
## S3 method for class 'lqmix' plot(x, ...)
## S3 method for class 'lqmix' plot(x, ...)
x |
an object of class search_lqmix |
... |
not used |
Graphically display model selection criteria and component and/or transition probabilities of the optimal fitted model of class
search_lqmix
## S3 method for class 'search_lqmix' plot(x, ...)
## S3 method for class 'search_lqmix' plot(x, ...)
x |
an object of class search_lqmix |
... |
not used |
lqmix
objectPrint an object of class
lqmix
## S3 method for class 'lqmix' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'lqmix' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
an |
digits |
a non-null value for digits specifying the minimum number of significant digits to be printed |
... |
not used |
Return an lqmix
object
lqr
objectPrint an object of class
lqr
## S3 method for class 'lqr' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'lqr' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
an |
digits |
a non-null value for digits specifying the minimum number of significant digits to be printed |
... |
not used |
Return an lqr
object
search_lqmix
objectPrint an object of class
search_lqmix
## S3 method for class 'search_lqmix' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'search_lqmix' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
a |
digits |
a non-null value for digits specifying the minimum number of significant digits to be printed |
... |
not used |
Return a search_lqmix
object
lqmix
objectPrint the summary of an object of class
lqmix
## S3 method for class 'summary.lqmix' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'summary.lqmix' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
a summary of an |
digits |
a non-null value for digits specifying the minimum number of significant digits to be printed |
... |
not used |
Return a summary of an lqmix
object
lqr
objectPrint the summary of an an object of class
lqr
## S3 method for class 'summary.lqr' print(x, digits = max(3, getOption("digits") - 3), ...)
## S3 method for class 'summary.lqr' print(x, digits = max(3, getOption("digits") - 3), ...)
x |
a summary of an |
digits |
a non-null value for digits specifying the minimum number of significant digits to be printed |
... |
not used |
Return a summary of an lqr
object
Search the global maximum of the log-likelihood function for a finite mixture of linear quantile regression models with TC and/or TV, discrete, random coefficients, for varying number of components and/or states
search_lqmix(formula, randomTC = NULL, randomTV = NULL, group, time, Gv = NULL, mv = NULL, data, method = "bic", nran = 0, qtl = 0.5, eps = 10^-5, maxit = 1000, se = TRUE, R = 50, verbose = TRUE, seed = NULL, parallel = FALSE)
search_lqmix(formula, randomTC = NULL, randomTV = NULL, group, time, Gv = NULL, mv = NULL, data, method = "bic", nran = 0, qtl = 0.5, eps = 10^-5, maxit = 1000, se = TRUE, R = 50, verbose = TRUE, seed = NULL, parallel = FALSE)
formula |
an object of |
randomTC |
a one-sided formula of the form |
randomTV |
a one-sided formula of the form |
group |
a string indicating the grouping variable, i.e., the factor identifying the unit longitudinal measurements refer to |
time |
a string indicating the time variable |
Gv |
vector of possible number of mixture components associated to TC random coefficients (if present) |
mv |
vector of possible number of states associated to the TV random coefficients (if present) |
data |
a data frame containing the variables named in |
method |
method to use for selecting the optimal model. Possible values are |
nran |
number of repetitions of each random initialization |
qtl |
quantile to be estimated |
eps |
tolerance level for (relative) convergence of the EM algorithm |
maxit |
maximum number of iterations for the EM algorithm |
se |
standard error computation for the optimal model |
R |
number of bootstrap samples for computing standard errors |
verbose |
if set to FALSE, no printed output is given during the function execution |
seed |
an integer value for random numbers generation |
parallel |
if set to TRUE, a parallelized code is use for standard error computation (if se=TRUE) |
The function allows to identify the optimal model specification in terms of number of mixture components and/or hidden states associated to TC and/or TV random coefficients, respectively. This is done by considering a multi-start strategy based on both deterministic and random starting points. The number or random tries is proportional to the number of mixture components and/or hidden states associated to the random coefficients in the model.
If method="lk"
, the optimal model selected by the function is that providing the highest log-likelihood value;
if method="AIC"
, (method="BIC"
, respectively), the optimal model selected by the function is that providing the lowest AIC (BIC, respectively) value.
If se=TRUE
, standard errors based on a block bootstrap procedure are computed for the identified optimal model.
Return an object of class
search_lqmix
. This is a list containing the following elements:
optimal |
the identified optimal model |
allmodels |
the output of each estimated model |
lkv |
the vector of likelihood values for each estimated model |
aicv |
the vector of AIC values for each estimated model |
bicv |
the vector of BIC values for each estimated model |
qtl |
the estimated quantile |
mv |
the vector of possible number of states associated to TV random coefficients (if present) |
Gv |
the vector of possible number of mixture components associated to TC random coefficients (if present) |
method |
the method used to select the optimal model |
call |
the matched call |
sTC = search_lqmix(formula=meas~trt+time+trt:time, randomTC=~1,group="id",time="time",Gv=1:3,method="bic",data=pain,se=FALSE) sTV = search_lqmix(formula=meas~trt+time+trt:time, randomTV=~1,group="id",time="time",mv=1:3,method="bic",data=pain,se=FALSE) sTCTV = search_lqmix(formula=meas~trt+time+trt:time, randomTC=~time,randomTV=~1,group="id",time="time",mv=1:3,Gv=1:3,method="bic",data=pain,se=FALSE)
sTC = search_lqmix(formula=meas~trt+time+trt:time, randomTC=~1,group="id",time="time",Gv=1:3,method="bic",data=pain,se=FALSE) sTV = search_lqmix(formula=meas~trt+time+trt:time, randomTV=~1,group="id",time="time",mv=1:3,method="bic",data=pain,se=FALSE) sTCTV = search_lqmix(formula=meas~trt+time+trt:time, randomTC=~time,randomTV=~1,group="id",time="time",mv=1:3,Gv=1:3,method="bic",data=pain,se=FALSE)
lqmix
ObjectSummary method for the class
lqmix
## S3 method for class 'lqmix' summary(object, ...)
## S3 method for class 'lqmix' summary(object, ...)
object |
an |
... |
not used |
Return an object of class
summary.lqmix
.
This is a list of summary statistics for the fitted linear quantile mixture model given in object
, with the following elements:
fix |
a matrix with estimates, standard errors, Z statistics, and p-values for the fixed regression coefficients |
ranTC |
a matrix with estimates, standard errors, Z statistics, and p-values for the TC random coefficients (if present) |
ranTV |
a matrix with estimates, standard errors, Z statistics, and p-values for the TV random coefficients (if present) |
pg |
a matrix with estimates and standard errors for the prior probabilities of the finite mixture associated to TC random coefficients (if present) |
delta |
a matrix with estimates and standard errors for the initial probabilities of the hidden Markov chain associated to TV random coefficients (if present) |
Gamma |
a matrix with estimates and standard errors for the transition probabilities of the hidden Markov chain associated to TV random coefficients (if present) |
scale |
the scale parameter |
sigma.e |
the standard deviation of error terms |
logLik |
the log-likelihood at convergence of the EM algorithm |
npar |
the total number of model parameters |
AIC |
the AIC value |
BIC |
the BIC value |
qtl |
the estimated quantile |
G |
the number of mixture components associated to TC random coefficients (if present) |
m |
the number of hidden states associated to TV random coefficients (if present) |
nsbj |
the number of subjects |
nobs |
the total number of observations |
miss |
the missingness type |
model |
the estimated model |
call |
the matched call |
lqr
objectSummary method for the class
lqr
## S3 method for class 'lqr' summary(object, ...)
## S3 method for class 'lqr' summary(object, ...)
object |
an |
... |
not used |
Return an object of class
summary.lqr
.
This is a list of summary statistics for the fitted linear quantile regression model given in object
, with the following elements:
fix |
a matrix with estimates, standard errors, Z statistics, and p-values for the regression coefficients |
scale |
the scale parameter |
sigma.e |
the standard deviation of error terms |
lk |
the log-likelihood |
npar |
the total number of model parameters |
aic |
the AIC value |
bic |
the BIC value |
qtl |
the estimated quantile |
nobs |
the total number of observations |
model |
the estimated model |
call |
the matched call |
Compute the variance for the asymmetric Laplace distribution
varAL(sigma, qtl)
varAL(sigma, qtl)
sigma |
scale parameter |
qtl |
skewness parameter |
Return the variance of Asymmetric Laplace random variables for given scale (sigma
) and skewness (qtl
) parameters