Title: | Estimation and Simulation of Drifting Semi-Markov Models |
---|---|
Description: | Performs parametric and non-parametric estimation and simulation of drifting semi-Markov processes. The definition of parametric and non-parametric model specifications is also possible. Furthermore, three different types of drifting semi-Markov models are considered. These models differ in the number of transition matrices and sojourn time distributions used for the computation of a number of semi-Markov kernels, which in turn characterize the drifting semi-Markov kernel. For the parametric model estimation and specification, several discrete distributions are considered for the sojourn times: Uniform, Poisson, Geometric, Discrete Weibull and Negative Binomial. The non-parametric model specification makes no assumptions about the shape of the sojourn time distributions. Semi-Markov models are described in: Barbu, V.S., Limnios, N. (2008) <doi:10.1007/978-0-387-73173-5>. Drifting Markov models are described in: Vergne, N. (2008) <doi:10.2202/1544-6115.1326>. Reliability indicators of Drifting Markov models are described in: Barbu, V. S., Vergne, N. (2019) <doi:10.1007/s11009-018-9682-8>. We acknowledge the DATALAB Project <https://lmrs-num.math.cnrs.fr/projet-datalab.html> (financed by the European Union with the European Regional Development fund (ERDF) and by the Normandy Region) and the HSMM-INCA Project (financed by the French Agence Nationale de la Recherche (ANR) under grant ANR-21-CE40-0005). |
Authors: | Vlad Stefan Barbu [aut] , Ioannis Mavrogiannis [aut, cre] , Nicolas Vergne [aut] |
Maintainer: | Ioannis Mavrogiannis <[email protected]> |
License: | GPL |
Version: | 1.0.6 |
Built: | 2024-11-05 04:57:50 UTC |
Source: | https://github.com/mavrogiannis-ioannis/dsmmr |
Performs parametric and non-parametric estimation and simulation of drifting semi-Markov processes. The definition of parametric and non-parametric model specifications is also possible. Furthermore, three different types of drifting semi-Markov models are considered. These models differ in the number of transition matrices and sojourn time distributions used for the computation of a number of semi-Markov kernels, which in turn characterize the drifting semi-Markov kernel.
Introduction
The difference between the Markov models and the semi-Markov models
concerns the modelling of the sojourn time distributions.
The Markov models (in discrete time) are modelled by a sojourn time
following the Geometric distribution. The semi-Markov models
are able to have a sojourn time distribution of arbitrary shape.
The further difference with a drifting semi-Markov model,
is that we have (arbitrary) sojourn time distributions
and
transition matrices (Model 1),
where
is defined as the polynomial degree.
Through them, we compute
semi-Markov kernels.
In this work, we also consider the possibility for obtaining these
semi-Markov kernels with
transition matrices and
sojourn time distribution (Model 2) or
sojourn time
distributions and
transition matrix (Model 3).
Definition
Drifting semi-Markov processes are particular non-homogeneous semi-Markov
chains for which the drifting semi-Markov kernel
is defined as
the probability that, given at the instance
the previous state is
, the next state state
will be
reached with a sojourn time of
:
where is the model size, defined as the length of the
embedded Markov chain
minus the
last state, where
is the state at the instant
and
is the sojourn time of the state
.
The drifting semi-Markov kernel
is a linear combination of the product of
semi-Markov kernels
, where every semi-Markov kernel is the product of
a transition matrix
and a sojourn time distribution
. We define the situation when both
and
are "drifting" between
fixed points of the model
as Model 1, and thus we will use the exponential
as a way to
refer to the drifting semi-Markov kernel
and corresponding
semi-Markov kernels
in this case.
For Model 2, we allow the transition matrix
to drift
but not the sojourn time distributions
, and for Model 3 we allow
the sojourn time distributions
to drift but not the transition
matrix
.
The exponential
or
will be used for signifying
Model 2 or Model 3, respectively.
In the general case an exponential will not be used.
Model 1
Both and
are drifting in this case.
Thus, the drifting semi-Markov kernel
is a
linear combination of the product of
semi-Markov kernels
, which are given by:
where for we have
Markov transition matrices
of the embedded Markov chain
,
and
sojourn time distributions
. Therefore, the drifting semi-Markov kernel
is described as:
where are
polynomials with degree
, which satisfy the conditions:
where the indicator function ,
if
,
otherwise.
Model 2
In this case, is drifting and
is not drifting.
Therefore, the drifting semi-Markov kernel is now described as:
Model 3
In this case, is drifting and
is not drifting.
Therefore, the drifting semi-Markov Kernel is now described as:
Parametric and non-parametric model specifications
In this package, we can define parametric and non-parametric drifting semi-Markov models.
For the parametric case, several discrete distributions are
considered for the modelling of the sojourn times:
Uniform, Geometric, Poisson, Discrete Weibull and Negative Binomial.
This is done from the function
parametric_dsmm
which returns an object of the
S3 class (dsmm_parametric
, dsmm
).
The non-parametric model specification concerns the sojourn
time distributions when no assumptions are done about the
shape of the distributions. This is done through the function called
nonparametric_dsmm()
, that returns an object of class
(dsmm_nonparametric
, dsmm
).
It is also possible to proceed with a parametric or non-parametric
estimation for a model on an existing sequence through the function
fit_dsmm()
, which returns an object with the S3 class
(dsmm_fit_parametric
, dsmm
) or
(dsmm_fit_nonparametric
, dsmm
) respectively, depending
on the given argument estimation = "parametric"
or
estimation = "nonparametric"
.
Therefore, the dsmm
class acts like a wrapper class
for drifting semi-Markov model specifications, while the classes
dsmm_fit_parametric
, dsmm_fit_nonparametric
,
dsmm_parametric
and dsmm_nonparametric
are exclusive to the functions that create the corresponding models,
and inherit methods from the dsmm
class.
In summary, based on an dsmm
object
it is possible to use the following methods:
Simulate a sequence through the function simulate.dsmm()
.
Get the drifting semi-Markov kernel
, for any choice of
or
,
through the function
get_kernel()
.
Restrictions
The following restrictions must be satisfied for every drifting semi-Markov model:
The drifting semi-Markov kernel ,
for every
and
, has its sums
over
and
, equal to
:
Therefore, we also get that for every and
, the semi-Markov kernel
has its sums over
and
equal to
:
Lastly, like in semi-Markov models, we do not allow sojourn times
equal to or passing into the same state:
Model specification restrictions
When we define a drifting semi-Markov model specification through the
functions parametric_dsmm
or nonparametric_dsmm
,
the following restrictions need to be satisfied.
Model 1
The semi-Markov kernels are equal to
. Therefore,
the sums
of
over
and the sums of
over
must be
equal to 1:
Model 2
The semi-Markov kernels are equal to . Therefore,
the sums of
over
and
the sums of
over
must be equal to 1:
Model 3
The semi-Markov kernels are equal to . Therefore,
the sums
of
over
and the sums of
over
must be
equal to 1:
For third parties wishing to contribute to the software, or to report issues or problems about the software, they can do so directly through the development github page of the package.
Automated tests are in place in order to aid the user with any false input made
and, furthermore, to ensure that the functions used return the expected output.
Moreover, through strict automated tests, it is made possible for the user to
properly define their own dsmm
objects and make use of them with the generic
functions of the package.
Maintainer: Ioannis Mavrogiannis [email protected]
Authors:
Vlad Stefan Barbu
Ioannis Mavrogiannis
Nicolas Vergne
Barbu, V. S., Limnios, N. (2008). Semi-Markov Chains and Hidden Semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics Molecular Biology 7 (1).
Barbu V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modelling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.
T. Nakagawa and S. Osaki. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.
Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., & Petersen, G. B.
(1982). Nucleotide sequence of bacteriophage DNA.
Journal of molecular biology, 162(4), 729-773.
For the estimation of a drifting semi-Markov model given a sequence: fit_dsmm.
For drifting semi-Markov model specifications: parametric_dsmm, nonparametric_dsmm.
For the simulation of sequences: simulate.dsmm, create_sequence.
For the retrieval of the drifting semi-Markov kernel through a
dsmm
object: get_kernel.
This is a wrapper function around sample()
.
create_sequence(states, len = 5000, probs = NULL, seed = NULL)
create_sequence(states, len = 5000, probs = NULL, seed = NULL)
states |
Character vector of unique values. If given the value "DNA" the values c("a", "c", "g", "t") are given instead. |
len |
Optional. Positive integer with the default value equal to 5000. |
probs |
Optional. Numeric vector with values interpreted as
probabilities for each of the states in |
seed |
Optional. Object specifying the initialization of the random
number generator (see more in |
A character sequence of length len
.
For the simulation of a sequence with a drifting semi-Markov kernel: simulate.dsmm.
The original function: sample
.
About random number generation in R: RNG
.
For the theoretical background of drifting semi-Markov models: dsmmR.
# This is equal to having the argument `probs = c(1/4, 1/4, 1/4, 1/4)`. rand_dna_seq <- create_sequence(states = "DNA") table(rand_dna_seq) random_letters <- sample(letters, size = 5, replace = FALSE) rand_dna_seq2 <- create_sequence( states = random_letters, probs = c(0.6, 0.3, 0.05, 0.025, 0.025), len = 10000) table(rand_dna_seq2)
# This is equal to having the argument `probs = c(1/4, 1/4, 1/4, 1/4)`. rand_dna_seq <- create_sequence(states = "DNA") table(rand_dna_seq) random_letters <- sample(letters, size = 5, replace = FALSE) rand_dna_seq2 <- create_sequence( states = random_letters, probs = c(0.6, 0.3, 0.05, 0.025, 0.025), len = 10000) table(rand_dna_seq2)
Estimation of a drifting semi-Markov chain, given one sequence of states. This estimation can be parametric or non-parametric and is available for the three types of drifting semi-Markov models.
fit_dsmm( sequence, degree, f_is_drifting, p_is_drifting, states = NULL, initial_dist = "unif", estimation = "nonparametric", f_dist = NULL )
fit_dsmm( sequence, degree, f_is_drifting, p_is_drifting, states = NULL, initial_dist = "unif", estimation = "nonparametric", f_dist = NULL )
sequence |
Character vector that represents a sequence of states
from the state space |
degree |
Positive integer that represents the polynomial degree
|
f_is_drifting |
Logical. Specifies if |
p_is_drifting |
Logical. Specifies if |
states |
Character vector that represents the state space
|
initial_dist |
Optional. Character that represents the method to estimate the initial distribution.
|
estimation |
Optional. Character. Represents whether the estimation will be nonparametric or parametric.
|
f_dist |
Optional. It can be defined in two ways:
|
This function estimates a drifting semi-Markov model in the parametric and non-parametric case. The parametric estimation can be achieved by the following steps:
We obtain the non-parametric estimation of the sojourn time distributions.
We estimate the parameters for the distributions defined in
the attribute f_dist
through the probabilities
that were obtained in step 1.
Three different models are possible for to be estimated for each case.
A normalization technique is used in order to correct estimation errors
from small sequences.
We will use the exponentials to distinguish between
the drifting semi-Markov kernel
and the
semi-Markov kernels
used in
Model 1, Model 2, Model 3.
More about the theory of drifting semi-Markov models in dsmmR.
Non-parametric Estimation
Model 1
When the transition matrix of the embedded Markov chain
and
the conditional sojourn time distribution
are both drifting,
the drifting semi-Markov kernel can be estimated as:
, where
is the maximum
sojourn time that was observed in the sequence and
are
polynomials with degree
(see dsmmR).
The semi-Markov kernels
,
are estimated through Least Squares Estimation (LSE) and are obtained
after solving the following system,
,
and
:
where the matrices are written as:
and we use the following indicator functions:
,
if at
the previous state is
,
otherwise.
,
if at
the previous state is
with sojourn time
and next state
,
otherwise.
In order to obtain the estimations of
and
, we use the following formulas:
Model 2
In this case, is drifting and
is not drifting. Therefore,
the estimated drifting semi-Markov kernel will be given by:
, where
is the maximum
sojourn time that was observed in the sequence and
are
polynomials with degree
(see dsmmR).
In order to obtain the estimators
and
,
we use the estimated semi-Markov kernels
from Model 1.
Since
is drifting, we define the estimation
the same way as we did in Model 1.
In total, we have the following estimations,
:
Thus, the estimated semi-Markov kernels for Model 2,
, can be written with
regards to the estimated semi-Markov kernels of Model 1,
, as in the following:
Model 3
In this case, is drifting and
is not drifting. Therefore,
the estimated drifting semi-Markov kernel will be given by:
, where
is the maximum
sojourn time that was observed in the sequence and
are
polynomials with degree
(see dsmmR).
In order to obtain the estimators
and
,
we use the estimated semi-Markov kernels
estimated semi-Markov kernels
from Model 1. Since
is drifting,
we define the estimation
the same way as we did in
Model 1. In total, we have the following estimations,
:
Thus, the estimated semi-Markov kernels for Model 3,
, can be written with
regards to the estimated semi-Markov kernels of Model 1,
, as in the following:
Parametric Estimation
In this package, the parametric estimation of the sojourn time distributions
defined in the attribute f_dist
is achieved as follows:
We obtain the non-parametric LSE of the sojourn time distributions
.
We estimate the parameters for the distributions defined in
f_dist
through the probabilities of , estimated
in previously in
.
The available distributions for the modelling of the conditional sojourn
times of the drifting semi-Markov model, defined from
the argument f_dist
, have their parameters estimated through the
following formulas:
Geometric :
, where
.
We estimate the probability of success
as such:
Poisson :
, where
. We estimate
as such:
Negative binomial :
, where
.
is the Gamma function,
is the probability of success and
is the parameter describing the
number of successful trials, or the dispersion parameter
(the shape parameter of the gamma mixing distribution).
We estimate them as such:
Discrete Weibull of type 1 :
, where
,
is the first parameter with
and
the second parameter. We estimate them as such:
Note that we require for estimating
.
Uniform :
where
, for
a
positive integer. We use a numerical method to obtain an estimator
for
in this case.
Returns an object of S3 class (dsmm_fit_nonparametric, dsmm)
or (dsmm_fit_parametric, dsmm)
. It has the following attributes:
dist
: List. Contains 2 or 3 arrays.
If estimation = "nonparametric"
we have 2 arrays:
p_drift
or p_notdrift
, corresponding to whether the
defined transition matrix is drifting or not.
f_drift
or f_notdrift
, corresponding to whether the
defined sojourn time distribution is drifting or not.
If estimation = "parametric"
we have 3 arrays:
p_drift
or p_notdrift
, corresponding to whether the
defined transition matrix is drifting or not.
f_drift_parametric
or f_notdrift_parametric
,
corresponding to whether the
defined sojourn time distribution is drifting or not.
f_drift_parameters
or f_notdrift_parameters
,
which are the defined sojourn time distribution parameters,
depending on whether
is drifting or not.
emc
: Character vector that contains the
embedded Markov chain
of the original sequence.
It is this attribute of the object that describes the size of the model
. Last state is also included, for a total length of
,
but it is not used for any calculation.
soj_times
: Numerical vector that contains the sojourn times
spent for each state in emc
before the jump to the next state.
Last state is also included, for a total length of ,
but it is not used for any calculation.
initial_dist
: Numerical vector that contains an estimation
for the initial distribution of the realized states in sequence
.
It always has values between and
.
states
: Character vector. Passing down from the arguments.
It contains the realized states given in the argument sequence
.
s
: Positive integer that contains the length of the character
vector given in the attribute states
, which is equal to .
degree
: Positive integer. Passing down from the arguments.
It contains the polynomial degree
considered for the drifting of the model.
k_max
: Numerical value that contains the maximum sojourn
time, which is the maximum value in soj_times
, excluding the last
state.
model_size
: Positive integer that contains the size of the
drifting semi-Markov model , which is equal to the length of the
embedded Markov chain
,
minus the last state.
It has a value of
length(emc) - 1
, for emc
as defined above.
f_is_drifting
: Logical. Passing down from the arguments.
Specifies if is drifting or not.
p_is_drifting
: Logical. Passing down from the arguments.
Specifies if is drifting or not.
Model
: Character. Possible values:
"Model_1"
: Both and
are drifting.
"Model_2"
: is drifting and
is not drifting.
"Model_3"
: is drifting and
is not drifting.
estimation
: Character. Specifies whether parametric or
nonparametric estimation was used.
A_i
: Numerical Matrix. Represents the polynomials
with degree
that were used for solving
the system
. Used for the methods defined for the
object. Not printed when viewing the object.
J_i
: Numerical Array. Represents the estimated semi-Markov
kernels of the first model
that were obtained after solving the system
.
Not printed when viewing the object.
V. S. Barbu, N. Limnios. (2008). semi-Markov Chains and Hidden semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics Molecular Biology 7 (1).
Barbu V. S., Vergne, N. (2019). Reliability and survival analysis for Drifting Markov models: modelling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.
T. Nakagawa and S. Osaki. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.
For the theoretical background of drifting semi-Markov models: dsmmR.
For sequence simulation: simulate.dsmm and create_sequence.
For drifting semi-Markov model specification: parametric_dsmm, nonparametric_dsmm
For the retrieval of the drifting semi-Markov kernel: get_kernel.
# Create a random sequence sequence <- create_sequence("DNA", len = 2000, seed = 1) ## Alternatively, we could obtain a sequence as follows: ## > data("lambda", package = "dsmmR") ## > sequence <- c(lambda) states <- sort(unique(sequence)) degree <- 3 # =========================================================================== # Nonparametric Estimation. # Fitting a random sequence under distributions of unknown shape. # =========================================================================== # --------------------------------------------------------------------------- # Both p and f are drifting - Model 1. # --------------------------------------------------------------------------- obj_model_1 <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = TRUE, p_is_drifting = TRUE, states = states, initial_dist = "freq", estimation = "nonparametric", # default value f_dist = NULL # default value ) cat(paste0("We fitted a sequence with ", obj_model_1$Model, ",\n", "model size: n = ", obj_model_1$model_size, ",\n", "length of state space: s = ", obj_model_1$s, ",\n", "maximum sojourn time: k_max = ", obj_model_1$k_max, " and\n", "polynomial (drifting) Degree: d = ", obj_model_1$degree, ".\n")) # Get the drifting p and f arrays. p_drift <- obj_model_1$dist$p_drift f_drift <- obj_model_1$dist$f_drift cat(paste0("Dimension of p_drift: (s, s, d + 1) = (", paste(dim(p_drift), collapse = ", "), ").\n", "Dimension of f_drift: (s, s, k_max, d + 1) = (", paste(dim(f_drift), collapse = ", "), ").\n")) # We can even check the embedded Markov chain and the sojourn times # directly from the returned object, if we wish to do so. # This is achieved through the `base::rle()` function, used on `sequence`. model_emc <- obj_model_1$emc model_sojourn_times <- obj_model_1$soj_times # --------------------------------------------------------------------------- # Fitting the sequence when p is drifting and f is not drifting - Model 2. # --------------------------------------------------------------------------- obj_model_2 <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = FALSE, p_is_drifting = TRUE) cat(paste0("We fitted a sequence with ", obj_model_2$Model, ".\n")) # Get the drifting p and non-drifting f arrays. p_drift_2 <- obj_model_2$dist$p_drift f_notdrift <- obj_model_2$dist$f_notdrift all.equal.numeric(p_drift, p_drift_2) # p is the same as in Model 1. cat(paste0("Dimension of f_notdrift: (s, s, k_max) = (", paste(dim(f_notdrift), collapse = ", "), ").\n")) # --------------------------------------------------------------------------- # Fitting the sequence when f is drifting and p is not drifting - Model 3. # --------------------------------------------------------------------------- obj_model_3 <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = TRUE, p_is_drifting = FALSE) cat(paste0("We fitted a sequence with ", obj_model_3$Model, ".\n")) # Get the drifting f and non-drifting p arrays. p_notdrift <- obj_model_3$dist$p_notdrift f_drift_3 <- obj_model_3$dist$f_drift all.equal.numeric(f_drift, f_drift_3) # f is the same as in Model 1. cat(paste0("Dimension of f_notdrift: (s, s) = (", paste(dim(p_notdrift), collapse = ", "), ").\n")) # =========================================================================== # Parametric Estimation # Fitting a random sequence under distributions of known shape. # =========================================================================== ### Comments ### 1. For the parametric estimation it is recommended to use a common set ### of distributions while only the parameters (of the sojourn times) ### are drifting. This results in (generally) higher accuracy. ### 2. This process is similar to that used in `dsmm_parametric()`. s <- length(states) # Getting the distributions for the states. # Rows correspond to previous state `u`. # Columns correspond to next state `v`. f_dist_1 <- matrix(c(NA, "unif", "dweibull", "nbinom", "pois", NA, "pois", "dweibull", "geom", "pois", NA, "geom", "dweibull", 'geom', "pois", NA), nrow = s, ncol = s, byrow = TRUE) f_dist <- array(f_dist_1, dim = c(s, s, degree + 1)) dim(f_dist) # --------------------------------------------------------------------------- # Both p and f are drifting - Model 1. # --------------------------------------------------------------------------- obj_fit_parametric <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = TRUE, p_is_drifting = TRUE, states = states, initial_dist = 'unif', estimation = 'parametric', f_dist = f_dist) cat("The class of `obj_fit_parametric` is : (", paste0(class(obj_fit_parametric), collapse = ', '), ").\n") # Estimated parameters. f_params <- obj_fit_parametric$dist$f_drift_parameters # The drifting sojourn time distribution parameters. f_0 <- f_params[,,,1] f_1.3 <- f_params[,,,2] f_2.3 <- f_params[,,,3] f_1 <- f_params[,,,4] params <- paste0('q = ', round(f_params["c", "t", 1, ], 3), ', beta = ', round(f_params["c", "t", 2, ], 3)) f_names <- c("f_0", paste0("f_", 1:(degree-1), "/", degree), "f_1") all_names <- paste(f_names, ":", params) cat("The drifting of the parameters for passing from \n", "`u` = 'c' to `v` = 't' under a discrete Weibull distribution is:", "\n", all_names[1], "\n", all_names[2], "\n", all_names[3], "\n", all_names[4]) # --------------------------------------------------------------------------- # f is not drifting, only p is drifting - Model 2. # --------------------------------------------------------------------------- obj_fit_parametric_2 <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = FALSE, p_is_drifting = TRUE, initial_dist = 'unif', estimation = 'parametric', f_dist = f_dist_1) cat("The class of `obj_fit_parametric_2` is : (", paste0(class(obj_fit_parametric_2), collapse = ', '), ").\n") # Estimated parameters. f_params_2 <- obj_fit_parametric_2$dist$f_notdrift_parameters params_2 <- paste0('q = ', round(f_params_2["c", "t", 1], 3), ', beta = ', round(f_params_2["c", "t", 2], 3)) cat("Not-drifting parameters for passing from ", "`u` = 'c' to `v` = 't' \n under a discrete Weibull distribution are:\n", paste("f :", params_2)) # =========================================================================== # `simulate()` and `get_kernel()` can be used for the two objects, # `dsmm_fit_nonparametric` and `dsmm_fit_parametric`. # =========================================================================== sim_seq_nonparametric <- simulate(obj_model_1, nsim = 10) str(sim_seq_nonparametric) kernel_drift_parametric <- get_kernel(obj_fit_parametric, klim = 10) str(kernel_drift_parametric)
# Create a random sequence sequence <- create_sequence("DNA", len = 2000, seed = 1) ## Alternatively, we could obtain a sequence as follows: ## > data("lambda", package = "dsmmR") ## > sequence <- c(lambda) states <- sort(unique(sequence)) degree <- 3 # =========================================================================== # Nonparametric Estimation. # Fitting a random sequence under distributions of unknown shape. # =========================================================================== # --------------------------------------------------------------------------- # Both p and f are drifting - Model 1. # --------------------------------------------------------------------------- obj_model_1 <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = TRUE, p_is_drifting = TRUE, states = states, initial_dist = "freq", estimation = "nonparametric", # default value f_dist = NULL # default value ) cat(paste0("We fitted a sequence with ", obj_model_1$Model, ",\n", "model size: n = ", obj_model_1$model_size, ",\n", "length of state space: s = ", obj_model_1$s, ",\n", "maximum sojourn time: k_max = ", obj_model_1$k_max, " and\n", "polynomial (drifting) Degree: d = ", obj_model_1$degree, ".\n")) # Get the drifting p and f arrays. p_drift <- obj_model_1$dist$p_drift f_drift <- obj_model_1$dist$f_drift cat(paste0("Dimension of p_drift: (s, s, d + 1) = (", paste(dim(p_drift), collapse = ", "), ").\n", "Dimension of f_drift: (s, s, k_max, d + 1) = (", paste(dim(f_drift), collapse = ", "), ").\n")) # We can even check the embedded Markov chain and the sojourn times # directly from the returned object, if we wish to do so. # This is achieved through the `base::rle()` function, used on `sequence`. model_emc <- obj_model_1$emc model_sojourn_times <- obj_model_1$soj_times # --------------------------------------------------------------------------- # Fitting the sequence when p is drifting and f is not drifting - Model 2. # --------------------------------------------------------------------------- obj_model_2 <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = FALSE, p_is_drifting = TRUE) cat(paste0("We fitted a sequence with ", obj_model_2$Model, ".\n")) # Get the drifting p and non-drifting f arrays. p_drift_2 <- obj_model_2$dist$p_drift f_notdrift <- obj_model_2$dist$f_notdrift all.equal.numeric(p_drift, p_drift_2) # p is the same as in Model 1. cat(paste0("Dimension of f_notdrift: (s, s, k_max) = (", paste(dim(f_notdrift), collapse = ", "), ").\n")) # --------------------------------------------------------------------------- # Fitting the sequence when f is drifting and p is not drifting - Model 3. # --------------------------------------------------------------------------- obj_model_3 <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = TRUE, p_is_drifting = FALSE) cat(paste0("We fitted a sequence with ", obj_model_3$Model, ".\n")) # Get the drifting f and non-drifting p arrays. p_notdrift <- obj_model_3$dist$p_notdrift f_drift_3 <- obj_model_3$dist$f_drift all.equal.numeric(f_drift, f_drift_3) # f is the same as in Model 1. cat(paste0("Dimension of f_notdrift: (s, s) = (", paste(dim(p_notdrift), collapse = ", "), ").\n")) # =========================================================================== # Parametric Estimation # Fitting a random sequence under distributions of known shape. # =========================================================================== ### Comments ### 1. For the parametric estimation it is recommended to use a common set ### of distributions while only the parameters (of the sojourn times) ### are drifting. This results in (generally) higher accuracy. ### 2. This process is similar to that used in `dsmm_parametric()`. s <- length(states) # Getting the distributions for the states. # Rows correspond to previous state `u`. # Columns correspond to next state `v`. f_dist_1 <- matrix(c(NA, "unif", "dweibull", "nbinom", "pois", NA, "pois", "dweibull", "geom", "pois", NA, "geom", "dweibull", 'geom', "pois", NA), nrow = s, ncol = s, byrow = TRUE) f_dist <- array(f_dist_1, dim = c(s, s, degree + 1)) dim(f_dist) # --------------------------------------------------------------------------- # Both p and f are drifting - Model 1. # --------------------------------------------------------------------------- obj_fit_parametric <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = TRUE, p_is_drifting = TRUE, states = states, initial_dist = 'unif', estimation = 'parametric', f_dist = f_dist) cat("The class of `obj_fit_parametric` is : (", paste0(class(obj_fit_parametric), collapse = ', '), ").\n") # Estimated parameters. f_params <- obj_fit_parametric$dist$f_drift_parameters # The drifting sojourn time distribution parameters. f_0 <- f_params[,,,1] f_1.3 <- f_params[,,,2] f_2.3 <- f_params[,,,3] f_1 <- f_params[,,,4] params <- paste0('q = ', round(f_params["c", "t", 1, ], 3), ', beta = ', round(f_params["c", "t", 2, ], 3)) f_names <- c("f_0", paste0("f_", 1:(degree-1), "/", degree), "f_1") all_names <- paste(f_names, ":", params) cat("The drifting of the parameters for passing from \n", "`u` = 'c' to `v` = 't' under a discrete Weibull distribution is:", "\n", all_names[1], "\n", all_names[2], "\n", all_names[3], "\n", all_names[4]) # --------------------------------------------------------------------------- # f is not drifting, only p is drifting - Model 2. # --------------------------------------------------------------------------- obj_fit_parametric_2 <- fit_dsmm(sequence = sequence, degree = degree, f_is_drifting = FALSE, p_is_drifting = TRUE, initial_dist = 'unif', estimation = 'parametric', f_dist = f_dist_1) cat("The class of `obj_fit_parametric_2` is : (", paste0(class(obj_fit_parametric_2), collapse = ', '), ").\n") # Estimated parameters. f_params_2 <- obj_fit_parametric_2$dist$f_notdrift_parameters params_2 <- paste0('q = ', round(f_params_2["c", "t", 1], 3), ', beta = ', round(f_params_2["c", "t", 2], 3)) cat("Not-drifting parameters for passing from ", "`u` = 'c' to `v` = 't' \n under a discrete Weibull distribution are:\n", paste("f :", params_2)) # =========================================================================== # `simulate()` and `get_kernel()` can be used for the two objects, # `dsmm_fit_nonparametric` and `dsmm_fit_parametric`. # =========================================================================== sim_seq_nonparametric <- simulate(obj_model_1, nsim = 10) str(sim_seq_nonparametric) kernel_drift_parametric <- get_kernel(obj_fit_parametric, klim = 10) str(kernel_drift_parametric)
This is a generic method that computes and returns the drifting semi-Markov kernel.
get_kernel(obj, t, u, v, l, klim = 100)
get_kernel(obj, t, u, v, l, klim = 100)
obj |
An object that inherits from the S3
classes |
t |
Optional, but recommended. Positive integer specifying
the instance |
u |
Optional. Can be either of the two options below:
|
v |
Optional. Can be either of the two options below:
|
l |
Optional. Positive integer specifying the sojourn time |
klim |
Optional. Positive integer. Used only when A larger value will result in a considerably larger
kernel, which has dimensions of |
The drifting semi-Markov kernel is given as the probability that,
given at the instance the previous state
is
, the next state state
will be reached
with a sojourn time of
:
where is the model size, defined as the length of the embedded
Markov chain
minus the last state,
is the visited state at the instant
and
is the sojourn time of the state
.
Specifically, it is given as the sum of a linear combination:
where are
polynomials with degree
that satisfy certain conditions (see dsmmR) and
are
semi-Markov kernels.
Three possible model specifications are described below.
We will use the exponentials
to distinguish between
the drifting semi-Markov kernel
and the
semi-Markov kernels
used in
Model 1, Model 2 and Model 3.
Model 1
In this case, both and
are "drifting" between
fixed points of the model, hence the "drifting" in drifting semi-Markov
models. Therefore, the semi-Markov kernels
are
equal to:
where for we have
Markov Transition
matrices
, and
sojourn time
distributions
, where
is the
polynomial degree.
Thus, the drifting semi-Markov kernel will be equal to:
Model 2
In this case, is drifting and
is not drifting.
Therefore, the semi-Markov kernels
are
equal to:
Thus, the drifting semi-Markov kernel will be equal to:
Model 3
In this case, is drifting and
is not drifting.
Therefore, the semi-Markov kernels
are now described as:
Thus, the drifting semi-Markov kernel will be equal to:
An array with dimensions of
, giving the
value of the drifting semi-Markov kernel
for
the corresponding
. If any of
or
are
specified, we obtain the element of the array for their given value.
For the objects required to calculate this kernel: fit_dsmm, parametric_dsmm, nonparametric_dsmm.
For sequence simulation through this kernel: simulate.dsmm.
For the theoretical background of drifting semi-Markov models: dsmmR.
# Setup. states <- c("Rouen", "Bucharest", "Samos", "Aigio", "Marseille") emc <- create_sequence(states, probs = c(0.3, 0.1, 0.1, 0.3, 0.2)) obj_model_2 <- fit_dsmm( sequence = emc, states = states, degree = 3, f_is_drifting = FALSE, p_is_drifting = TRUE ) # Get the kernel. kernel_model_2 <- get_kernel(obj_model_2) cat(paste0("If no further arguments are made, kernel has dimensions ", "for all u, v, l, t:\n", "(s, s, k_max, n + 1) = (", paste(dim(kernel_model_2), collapse = ", "), ")")) # Specifying `t`. kernel_model_2_t <- get_kernel(obj_model_2, t = 100) # kernel_model_2_t[ , , , t = 100] cat(paste0("If we specify t, the kernel has dimensions for ", "all the remaining u, v, l:\n(s, s, k_max) = (", paste(dim(kernel_model_2_t), collapse = ", "), ")")) # Specifying `t` and `u`. kernel_model_2_tu <- get_kernel(obj_model_2, t = 2, u = "Aigio") # kernel_model_2_tu["Aigio", , , t = 2] cat(paste0("If we specify t and u, the kernel has dimensions for ", "all the remaining v, l:\n(s, k_max) = (", paste(dim(kernel_model_2_tu), collapse = ", "), ")")) # Specifying `t`, `u` and `v`. kernel_model_2_tuv <- get_kernel(obj_model_2, t = 3, u = "Rouen", v = "Bucharest") # kernel_model_2_tuv["Rouen", "Bucharest", , t = 3] cat(paste0("If we specify t, u and v, the kernel has dimensions ", "for all l:\n(k_max) = (", paste(length(kernel_model_2_tuv), collapse = ", "), ")")) # It is possible to ask for any valid combination of `u`, `v`, `l` and `t`.
# Setup. states <- c("Rouen", "Bucharest", "Samos", "Aigio", "Marseille") emc <- create_sequence(states, probs = c(0.3, 0.1, 0.1, 0.3, 0.2)) obj_model_2 <- fit_dsmm( sequence = emc, states = states, degree = 3, f_is_drifting = FALSE, p_is_drifting = TRUE ) # Get the kernel. kernel_model_2 <- get_kernel(obj_model_2) cat(paste0("If no further arguments are made, kernel has dimensions ", "for all u, v, l, t:\n", "(s, s, k_max, n + 1) = (", paste(dim(kernel_model_2), collapse = ", "), ")")) # Specifying `t`. kernel_model_2_t <- get_kernel(obj_model_2, t = 100) # kernel_model_2_t[ , , , t = 100] cat(paste0("If we specify t, the kernel has dimensions for ", "all the remaining u, v, l:\n(s, s, k_max) = (", paste(dim(kernel_model_2_t), collapse = ", "), ")")) # Specifying `t` and `u`. kernel_model_2_tu <- get_kernel(obj_model_2, t = 2, u = "Aigio") # kernel_model_2_tu["Aigio", , , t = 2] cat(paste0("If we specify t and u, the kernel has dimensions for ", "all the remaining v, l:\n(s, k_max) = (", paste(dim(kernel_model_2_tu), collapse = ", "), ")")) # Specifying `t`, `u` and `v`. kernel_model_2_tuv <- get_kernel(obj_model_2, t = 3, u = "Rouen", v = "Bucharest") # kernel_model_2_tuv["Rouen", "Bucharest", , t = 3] cat(paste0("If we specify t, u and v, the kernel has dimensions ", "for all l:\n(k_max) = (", paste(length(kernel_model_2_tuv), collapse = ", "), ")")) # It is possible to ask for any valid combination of `u`, `v`, `l` and `t`.
dsmm
classChecks for the validity of the specified attributes and the
inheritance of the S3 class dsmm
. This class acts like a parent
class for the classes dsmm_fit_nonparametric,
dsmm_fit_parametric, dsmm_parametric
and dsmm_nonparametric
.
is.dsmm(obj)
is.dsmm(obj)
obj |
Arbitrary |
TRUE or FALSE.
is.dsmm_fit_nonparametric, is.dsmm_fit_nonparametric, is.dsmm_parametric, is.dsmm_nonparametric
dsmm_fit_nonparametric
classChecks for the validity of the specified attributes and the
inheritance of the S3 class dsmm_fit_nonparametric
.
This class inherits methods from the parent class dsmm
.
is.dsmm_fit_nonparametric(obj)
is.dsmm_fit_nonparametric(obj)
obj |
Arbitrary |
TRUE or FALSE.
is.dsmm, is.dsmm_fit_parametric, is.dsmm_nonparametric, is.dsmm_parametric
dsmm_fit_parametric
classChecks for the validity of the specified attributes and the
inheritance of the S3 class dsmm_fit_parametric
.
This class inherits methods from the parent class dsmm
.
is.dsmm_fit_parametric(obj)
is.dsmm_fit_parametric(obj)
obj |
Arbitrary |
TRUE or FALSE.
is.dsmm, is.dsmm_fit_nonparametric, is.dsmm_parametric, is.dsmm_nonparametric
dsmm_nonparametric
classChecks for the validity of the specified attributes and the
inheritance of the S3 class dsmm_nonparametric
.
This class inherits methods from the parent class dsmm
.
is.dsmm_nonparametric(obj)
is.dsmm_nonparametric(obj)
obj |
Arbitrary |
TRUE or FALSE.
is.dsmm, is.dsmm_fit_nonparametric, is.dsmm_fit_parametric, is.dsmm_parametric
dsmm_parametric
classChecks for the validity of the specified attributes and the
inheritance of the S3 class dsmm_parametric
.
This class inherits methods from the parent class dsmm
.
is.dsmm_parametric(obj)
is.dsmm_parametric(obj)
obj |
Arbitrary |
TRUE or FALSE.
is.dsmm, is.dsmm_fit_parametric, is.dsmm_fit_nonparametric, is.dsmm_nonparametric
Contains the complete genome of the Escherichia phage Lambda.
data("lambda", package = "dsmmR") data(lambda, package = "dsmmR") # equivalent. # The following requires the package to be loaded, # e.g. through `library(dsmmR)`. data("lambda") data(lambda)
data("lambda", package = "dsmmR") data(lambda, package = "dsmmR") # equivalent. # The following requires the package to be loaded, # e.g. through `library(dsmmR)`. data("lambda") data(lambda)
A vector object of type "character"
and length of 48502.
It has class of "Rdata"
.
Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., & Petersen, G. B.
(1982). Nucleotide sequence of bacteriophage DNA.
Journal of molecular biology, 162(4), 729-773.
data("lambda", package = "dsmmR") class(lambda) sequence <- c(lambda) # Convert to "character" class str(sequence)
data("lambda", package = "dsmmR") class(lambda) sequence <- c(lambda) # Convert to "character" class str(sequence)
Creates a non-parametric model specification for a drifting
semi-Markov model. Returns an object of class
(dsmm_nonparametric, dsmm)
.
nonparametric_dsmm( model_size, states, initial_dist, degree, k_max, f_is_drifting, p_is_drifting, p_dist, f_dist )
nonparametric_dsmm( model_size, states, initial_dist, degree, k_max, f_is_drifting, p_is_drifting, p_dist, f_dist )
model_size |
Positive integer that represents the size of
the drifting semi-Markov model |
states |
Character vector that represents the state space |
initial_dist |
Numerical vector of |
degree |
Positive integer that represents the polynomial degree |
k_max |
Positive integer that represents the maximum sojourn time of choice, for the drifting semi-Markov model. |
f_is_drifting |
Logical. Specifies if |
p_is_drifting |
Logical. Specifies if |
p_dist |
Numerical array, that represents the probabilities of the
transition matrix
|
f_dist |
Numerical array, that represents the probabilities of the
conditional sojourn time distributions
|
Defined Arguments
For the non-parametric case, we explicitly define:
The transition matrix of the embedded Markov chain
, given in the attribute
p_dist
:
If is not drifting, it contains the values:
given in an array with dimensions of ,
where the first dimension corresponds to the previous state
and the second dimension corresponds to the current state
.
If is drifting then, for
,
it contains the values:
given in an array with dimensions of ,
where the first and second dimensions are defined as in the
non-drifting case, and the third dimension corresponds to the
different matrices
The conditional sojourn time distribution, given in the
attribute f_dist
:
If is not drifting, it contains the values:
given in an array with dimensions of ,
where the first dimension corresponds to the previous state
,
the second dimension corresponds to the current state
,
and the third dimension correspond to the sojourn time
.
If is drifting then, for
,
it contains the values:
given in an array with dimensions of
,
where the first, second and third dimensions are defined as in the
non-drifting case, and the fourth dimension corresponds to the
different arrays
Returns an object of the S3 class
dsmm_nonparametric,dsmm
.
dist
: List. Contains 2 arrays,
passing down from the arguments:
p_drift
or p_notdrift
, corresponding to whether the
defined transition matrix is drifting or not.
f_drift
or f_notdrift
, corresponding to whether the
defined sojourn time distribution is drifting or not.
initial_dist
: Numerical vector. Passing down from the arguments.
It contains the initial distribution of the drifting semi-Markov model.
states
: Character vector. Passing down from the arguments.
It contains the state space .
s
: Positive integer. It contains the number of states in the
state space, , which is given in the attribute
states
.
degree
: Positive integer. Passing down from the arguments.
It contains the polynomial degree considered for the drifting of
the model.
k_max
: Numerical value. Passing down from the arguments.
It contains the maximum sojourn time, for the drifting semi-Markov
model.
model_size
: Positive integer. Passing down from the arguments.
It contains the size of the drifting semi-Markov model , which
represents the length of the embedded Markov chain
, without the last state.
f_is_drifting
: Logical. Passing down from the arguments.
Specifies if is drifting or not.
p_is_drifting
: Logical. Passing down from the arguments.
Specifies if is drifting or not.
Model
: Character. Possible values:
"Model_1"
: Both and
are drifting.
"Model_2"
: is drifting and
is not drifting.
"Model_3"
: is drifting and
is not drifting.
A_i
: Numerical Matrix. Represents the polynomials
with degree
that are used for solving
the system
. Used for the methods defined for the
object. Not printed when viewing the object.
V. S. Barbu, N. Limnios. (2008). semi-Markov Chains and Hidden semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics Molecular Biology 7 (1).
Barbu V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modeling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.
Methods applied to this object: simulate.dsmm, get_kernel.
For the parametric drifting semi-Markov model specification: parametric_dsmm.
For the theoretical background of drifting semi-Markov models: dsmmR.
# Setup. states <- c("AA", "AC", "CC") s <- length(states) d <- 2 k_max <- 3 # =========================================================================== # Defining non-parametric drifting semi-Markov models. # =========================================================================== # --------------------------------------------------------------------------- # Defining distributions for Model 1 - both p and f are drifting. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s, d + 1). # Sums over v must be 1 for all u and i = 0, ..., d. p_dist_1 <- matrix(c(0, 0.1, 0.9, 0.5, 0, 0.5, 0.3, 0.7, 0), ncol = s, byrow = TRUE) p_dist_2 <- matrix(c(0, 0.6, 0.4, 0.7, 0, 0.3, 0.6, 0.4, 0), ncol = s, byrow = TRUE) p_dist_3 <- matrix(c(0, 0.2, 0.8, 0.6, 0, 0.4, 0.7, 0.3, 0), ncol = s, byrow = TRUE) # Get `p_dist` as an array of p_dist_1, p_dist_2 and p_dist_3. p_dist <- array(c(p_dist_1, p_dist_2, p_dist_3), dim = c(s, s, d + 1)) # `f_dist` has dimensions of: (s, s, k_max, d + 1). # First f distribution. Dimensions: (s, s, k_max). # Sums over l must be 1, for every u, v and i = 0, ..., d. f_dist_1_l_1 <- matrix(c(0, 0.2, 0.7, 0.3, 0, 0.4, 0.2, 0.8, 0), ncol = s, byrow = TRUE) f_dist_1_l_2 <- matrix(c(0, 0.3, 0.2, 0.2, 0, 0.5, 0.1, 0.15, 0), ncol = s, byrow = TRUE) f_dist_1_l_3 <- matrix(c(0, 0.5, 0.1, 0.5, 0, 0.1, 0.7, 0.05, 0), ncol = s, byrow = TRUE) # Get f_dist_1 f_dist_1 <- array(c(f_dist_1_l_1, f_dist_1_l_2, f_dist_1_l_3), dim = c(s, s, k_max)) # Second f distribution. Dimensions: (s, s, k_max) f_dist_2_l_1 <- matrix(c(0, 1/3, 0.4, 0.3, 0, 0.4, 0.2, 0.1, 0), ncol = s, byrow = TRUE) f_dist_2_l_2 <- matrix(c(0, 1/3, 0.4, 0.4, 0, 0.2, 0.3, 0.4, 0), ncol = s, byrow = TRUE) f_dist_2_l_3 <- matrix(c(0, 1/3, 0.2, 0.3, 0, 0.4, 0.5, 0.5, 0), ncol = s, byrow = TRUE) # Get f_dist_2 f_dist_2 <- array(c(f_dist_2_l_1, f_dist_2_l_2, f_dist_2_l_3), dim = c(s, s, k_max)) # Third f distribution. Dimensions: (s, s, k_max) f_dist_3_l_1 <- matrix(c(0, 0.3, 0.3, 0.3, 0, 0.5, 0.05, 0.1, 0), ncol = s, byrow = TRUE) f_dist_3_l_2 <- matrix(c(0, 0.2, 0.6, 0.3, 0, 0.35, 0.9, 0.2, 0), ncol = s, byrow = TRUE) f_dist_3_l_3 <- matrix(c(0, 0.5, 0.1, 0.4, 0, 0.15, 0.05, 0.7, 0), ncol = s, byrow = TRUE) # Get f_dist_3 f_dist_3 <- array(c(f_dist_3_l_1, f_dist_3_l_2, f_dist_3_l_3), dim = c(s, s, k_max)) # Get f_dist as an array of f_dist_1, f_dist_2 and f_dist_3. f_dist <- array(c(f_dist_1, f_dist_2, f_dist_3), dim = c(s, s, k_max, d + 1)) # --------------------------------------------------------------------------- # Non-Parametric object for Model 1. # --------------------------------------------------------------------------- obj_nonpar_model_1 <- nonparametric_dsmm( model_size = 8000, states = states, initial_dist = c(0.3, 0.5, 0.2), degree = d, k_max = k_max, p_dist = p_dist, f_dist = f_dist, p_is_drifting = TRUE, f_is_drifting = TRUE ) # p drifting array. p_drift <- obj_nonpar_model_1$dist$p_drift p_drift # f distribution. f_drift <- obj_nonpar_model_1$dist$f_drift f_drift # --------------------------------------------------------------------------- # Defining Model 2 - p is drifting, f is not drifting. # --------------------------------------------------------------------------- # p_dist has the same dimensions as in Model 1: (s, s, d + 1). p_dist_model_2 <- array(c(p_dist_1, p_dist_2, p_dist_3), dim = c(s, s, d + 1)) # f_dist has dimensions of: (s,s,k_{max}). f_dist_model_2 <- f_dist_2 # --------------------------------------------------------------------------- # Non-Parametric object for Model 2. # --------------------------------------------------------------------------- obj_nonpar_model_2 <- nonparametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.7, 0.1, 0.2), degree = d, k_max = k_max, p_dist = p_dist_model_2, f_dist = f_dist_model_2, p_is_drifting = TRUE, f_is_drifting = FALSE ) # p drifting array. p_drift <- obj_nonpar_model_2$dist$p_drift p_drift # f distribution array. f_notdrift <- obj_nonpar_model_2$dist$f_notdrift f_notdrift # --------------------------------------------------------------------------- # Defining Model 3 - f is drifting, p is not drifting. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s, d + 1). p_dist_model_3 <- p_dist_3 # `f_dist` has the same dimensions as in Model 1: (s, s, d + 1). f_dist_model_3 <- array(c(f_dist_1, f_dist_2, f_dist_3), dim = c(s, s, k_max, d + 1)) # --------------------------------------------------------------------------- # Non-Parametric object for Model 3. # --------------------------------------------------------------------------- obj_nonpar_model_3 <- nonparametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.3, 0.4, 0.3), degree = d, k_max = k_max, p_dist = p_dist_model_3, f_dist = f_dist_model_3, p_is_drifting = FALSE, f_is_drifting = TRUE ) # p distribution matrix. p_notdrift <- obj_nonpar_model_3$dist$p_notdrift p_notdrift # f distribution array. f_drift <- obj_nonpar_model_3$dist$f_drift f_drift # =========================================================================== # Using methods for non-parametric objects. # =========================================================================== kernel_parametric <- get_kernel(obj_nonpar_model_3) str(kernel_parametric) sim_seq_par <- simulate(obj_nonpar_model_3, nsim = 50) str(sim_seq_par)
# Setup. states <- c("AA", "AC", "CC") s <- length(states) d <- 2 k_max <- 3 # =========================================================================== # Defining non-parametric drifting semi-Markov models. # =========================================================================== # --------------------------------------------------------------------------- # Defining distributions for Model 1 - both p and f are drifting. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s, d + 1). # Sums over v must be 1 for all u and i = 0, ..., d. p_dist_1 <- matrix(c(0, 0.1, 0.9, 0.5, 0, 0.5, 0.3, 0.7, 0), ncol = s, byrow = TRUE) p_dist_2 <- matrix(c(0, 0.6, 0.4, 0.7, 0, 0.3, 0.6, 0.4, 0), ncol = s, byrow = TRUE) p_dist_3 <- matrix(c(0, 0.2, 0.8, 0.6, 0, 0.4, 0.7, 0.3, 0), ncol = s, byrow = TRUE) # Get `p_dist` as an array of p_dist_1, p_dist_2 and p_dist_3. p_dist <- array(c(p_dist_1, p_dist_2, p_dist_3), dim = c(s, s, d + 1)) # `f_dist` has dimensions of: (s, s, k_max, d + 1). # First f distribution. Dimensions: (s, s, k_max). # Sums over l must be 1, for every u, v and i = 0, ..., d. f_dist_1_l_1 <- matrix(c(0, 0.2, 0.7, 0.3, 0, 0.4, 0.2, 0.8, 0), ncol = s, byrow = TRUE) f_dist_1_l_2 <- matrix(c(0, 0.3, 0.2, 0.2, 0, 0.5, 0.1, 0.15, 0), ncol = s, byrow = TRUE) f_dist_1_l_3 <- matrix(c(0, 0.5, 0.1, 0.5, 0, 0.1, 0.7, 0.05, 0), ncol = s, byrow = TRUE) # Get f_dist_1 f_dist_1 <- array(c(f_dist_1_l_1, f_dist_1_l_2, f_dist_1_l_3), dim = c(s, s, k_max)) # Second f distribution. Dimensions: (s, s, k_max) f_dist_2_l_1 <- matrix(c(0, 1/3, 0.4, 0.3, 0, 0.4, 0.2, 0.1, 0), ncol = s, byrow = TRUE) f_dist_2_l_2 <- matrix(c(0, 1/3, 0.4, 0.4, 0, 0.2, 0.3, 0.4, 0), ncol = s, byrow = TRUE) f_dist_2_l_3 <- matrix(c(0, 1/3, 0.2, 0.3, 0, 0.4, 0.5, 0.5, 0), ncol = s, byrow = TRUE) # Get f_dist_2 f_dist_2 <- array(c(f_dist_2_l_1, f_dist_2_l_2, f_dist_2_l_3), dim = c(s, s, k_max)) # Third f distribution. Dimensions: (s, s, k_max) f_dist_3_l_1 <- matrix(c(0, 0.3, 0.3, 0.3, 0, 0.5, 0.05, 0.1, 0), ncol = s, byrow = TRUE) f_dist_3_l_2 <- matrix(c(0, 0.2, 0.6, 0.3, 0, 0.35, 0.9, 0.2, 0), ncol = s, byrow = TRUE) f_dist_3_l_3 <- matrix(c(0, 0.5, 0.1, 0.4, 0, 0.15, 0.05, 0.7, 0), ncol = s, byrow = TRUE) # Get f_dist_3 f_dist_3 <- array(c(f_dist_3_l_1, f_dist_3_l_2, f_dist_3_l_3), dim = c(s, s, k_max)) # Get f_dist as an array of f_dist_1, f_dist_2 and f_dist_3. f_dist <- array(c(f_dist_1, f_dist_2, f_dist_3), dim = c(s, s, k_max, d + 1)) # --------------------------------------------------------------------------- # Non-Parametric object for Model 1. # --------------------------------------------------------------------------- obj_nonpar_model_1 <- nonparametric_dsmm( model_size = 8000, states = states, initial_dist = c(0.3, 0.5, 0.2), degree = d, k_max = k_max, p_dist = p_dist, f_dist = f_dist, p_is_drifting = TRUE, f_is_drifting = TRUE ) # p drifting array. p_drift <- obj_nonpar_model_1$dist$p_drift p_drift # f distribution. f_drift <- obj_nonpar_model_1$dist$f_drift f_drift # --------------------------------------------------------------------------- # Defining Model 2 - p is drifting, f is not drifting. # --------------------------------------------------------------------------- # p_dist has the same dimensions as in Model 1: (s, s, d + 1). p_dist_model_2 <- array(c(p_dist_1, p_dist_2, p_dist_3), dim = c(s, s, d + 1)) # f_dist has dimensions of: (s,s,k_{max}). f_dist_model_2 <- f_dist_2 # --------------------------------------------------------------------------- # Non-Parametric object for Model 2. # --------------------------------------------------------------------------- obj_nonpar_model_2 <- nonparametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.7, 0.1, 0.2), degree = d, k_max = k_max, p_dist = p_dist_model_2, f_dist = f_dist_model_2, p_is_drifting = TRUE, f_is_drifting = FALSE ) # p drifting array. p_drift <- obj_nonpar_model_2$dist$p_drift p_drift # f distribution array. f_notdrift <- obj_nonpar_model_2$dist$f_notdrift f_notdrift # --------------------------------------------------------------------------- # Defining Model 3 - f is drifting, p is not drifting. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s, d + 1). p_dist_model_3 <- p_dist_3 # `f_dist` has the same dimensions as in Model 1: (s, s, d + 1). f_dist_model_3 <- array(c(f_dist_1, f_dist_2, f_dist_3), dim = c(s, s, k_max, d + 1)) # --------------------------------------------------------------------------- # Non-Parametric object for Model 3. # --------------------------------------------------------------------------- obj_nonpar_model_3 <- nonparametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.3, 0.4, 0.3), degree = d, k_max = k_max, p_dist = p_dist_model_3, f_dist = f_dist_model_3, p_is_drifting = FALSE, f_is_drifting = TRUE ) # p distribution matrix. p_notdrift <- obj_nonpar_model_3$dist$p_notdrift p_notdrift # f distribution array. f_drift <- obj_nonpar_model_3$dist$f_drift f_drift # =========================================================================== # Using methods for non-parametric objects. # =========================================================================== kernel_parametric <- get_kernel(obj_nonpar_model_3) str(kernel_parametric) sim_seq_par <- simulate(obj_nonpar_model_3, nsim = 50) str(sim_seq_par)
Creates a parametric model specification for a drifting
semi-Markov model. Returns an object of class
(dsmm_parametric, dsmm)
.
parametric_dsmm( model_size, states, initial_dist, degree, f_is_drifting, p_is_drifting, p_dist, f_dist, f_dist_pars )
parametric_dsmm( model_size, states, initial_dist, degree, f_is_drifting, p_is_drifting, p_dist, f_dist, f_dist_pars )
model_size |
Positive integer that represents the size of
the drifting semi-Markov model |
states |
Character vector that represents the state space |
initial_dist |
Numerical vector of |
degree |
Positive integer that represents the polynomial degree |
f_is_drifting |
Logical. Specifies if |
p_is_drifting |
Logical. Specifies if |
p_dist |
Numerical array, that represents the probabilities of the
transition matrix
|
f_dist |
Character array, that represents the discrete sojourn time
distribution
|
f_dist_pars |
Numerical array, that represents the parameters of the
sojourn time distributions given in
|
Defined Arguments
For the parametric case, we explicitly define:
The transition matrix of the embedded Markov chain
, given in
the attribute
p_dist
:
If is not drifting, it contains the values:
given in an array with dimensions of ,
where the first dimension corresponds to the previous state
and
the second dimension corresponds to the current state
.
If is drifting, for
,
it contains the values:
given in an array with dimensions of ,
where the first and second dimensions are defined as in the non-drifting
case, and the third dimension corresponds to the
different
matrices
The conditional sojourn time distribution, given in the
attribute f_dist
:
If is not drifting, it contains the discrete
distribution names (as characters or
NA
), given in an
array with dimensions of ,
where the first dimension corresponds to the previous state
,
the second dimension corresponds to the current state
.
If is drifting, it contains the discrete
distribution names (as characters or
NA
) given in an
array with dimensions of ,
where the first and second dimensions are defined as in the
non-drifting case, and the third dimension corresponds to the
different arrays
The conditional sojourn time distribution parameters,
given in the attribute f_dist_pars
:
If is not drifting, it contains the
numerical values (or
NA
) of the corresponding
distributions defined in f_dist
, given in
an array with dimensions of ,
where the first dimension corresponds to the previous state
,
the second dimension corresponds to the current state
.
If is drifting, it contains the
numerical values (or
NA
) of the corresponding
distributions defined in f_dist
, given in an array
with dimensions of ,
where the first and second dimensions are defined as in the
non-drifting case, and the third dimension corresponds to the
different arrays
Sojourn time distributions
In this package, the available distributions for the modeling of the
conditional sojourn times, of the drifting semi-Markov model, used through
the argument f_dist
, are the following:
Uniform :
, for
, where
is a
positive integer.
This can be specified through the following:
f_dist = "unif"
f_dist_pars
= (,
NA
)
( as defined here).
Geometric :
, for
where
is the probability of success.
This can be specified through the following:
f_dist
= "geom"
f_dist_pars
= (,
NA
)
( as defined here).
Poisson :
, for
where
.
This can be specified through the following:
f_dist
= "pois"
f_dist_pars
= (,
NA
)
Negative binomial :
, for
where
is the Gamma function,
is the parameter describing the
target for number of successful trials, or the dispersion parameter
(the shape parameter of the gamma mixing distribution).
is the probability of success,
.
f_dist
= "nbinom"
f_dist_pars
= ()
(
as defined here)
Discrete Weibull of type 1 :
, for
with
is the first parameter (probability)
and
is the second parameter.
This can be specified through the following:
f_dist
= "dweibull"
f_dist_pars
= ()
(
as defined here)
From these discrete distributions, by using "dweibull", "nbinom"
we require two parameters. It's for this reason that the attribute
f_dist_pars
is an array of dimensions
if
is not drifting or
if
is drifting.
Returns an object of the S3 class dsmm_parametric, dsmm
.
It has the following attributes:
dist
: List. Contains 3 arrays, passing down from the arguments:
p_drift
or p_notdrift
, corresponding to whether the
defined transition matrix is drifting or not.
f_drift_parametric
or f_notdrift_parametric
,
corresponding to whether the
defined sojourn time distribution is drifting or not.
f_drift_parameters
or f_notdrift_parameters
,
which are the defined sojourn time distribution parameters,
depending on whether
is drifting or not.
initial_dist
: Numerical vector. Passing down from the arguments.
It contains the initial distribution of the drifting semi-Markov model.
states
: Character vector. Passing down from the arguments.
It contains the state space .
s
: Positive integer. It contains the number of states in the
state space, , which is given in the attribute
states
.
degree
: Positive integer. Passing down from the arguments.
It contains the polynomial degree considered for the drifting of
the model.
model_size
: Positive integer. Passing down from the arguments.
It contains the size of the drifting semi-Markov model , which
represents the length of the embedded Markov chain
, without the last state.
f_is_drifting
: Logical. Passing down from the arguments.
Specifies if is drifting or not.
p_is_drifting
: Logical. Passing down from the arguments.
Specifies if is drifting or not.
Model
: Character. Possible values:
"Model_1"
: Both and
are drifting.
"Model_2"
: is drifting and
is not drifting.
"Model_3"
: is drifting and
is not drifting.
A_i
: Numerical matrix. Represents the polynomials
with degree
that are used for solving
the system
. Used for the methods defined for the
object. Not printed when viewing the object.
V. S. Barbu, N. Limnios. (2008). semi-Markov Chains and Hidden semi-Markov Models Toward Applications - Their Use in Reliability and DNA Analysis. New York: Lecture Notes in Statistics, vol. 191, Springer.
Vergne, N. (2008). Drifting Markov models with Polynomial Drift and Applications to DNA Sequences. Statistical Applications in Genetics Molecular Biology 7 (1).
Barbu V. S., Vergne, N. (2019). Reliability and survival analysis for drifting Markov models: modeling and estimation. Methodology and Computing in Applied Probability, 21(4), 1407-1429.
T. Nakagawa and S. Osaki. (1975). The discrete Weibull distribution. IEEE Transactions on Reliability, R-24, 300-301.
Methods applied to this object: simulate.dsmm, get_kernel.
For the non-parametric drifting semi-Markov model specification: nonparametric_dsmm.
For the theoretical background of drifting semi-Markov models: dsmmR.
# We can also define states in a flexible way, including spaces. states <- c("Dollar $", " /1'2'3/ ", " Z E T A ", "O_M_E_G_A") s <- length(states) d <- 1 # =========================================================================== # Defining parametric drifting semi-Markov models. # =========================================================================== # --------------------------------------------------------------------------- # Defining the drifting distributions for Model 1. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s, d + 1). # Sums over v must be 1 for all u and i = 0, ..., d. # First matrix. p_dist_1 <- matrix(c(0, 0.1, 0.4, 0.5, 0.5, 0, 0.3, 0.2, 0.3, 0.4, 0, 0.3, 0.8, 0.1, 0.1, 0), ncol = s, byrow = TRUE) # Second matrix. p_dist_2 <- matrix(c(0, 0.3, 0.6, 0.1, 0.3, 0, 0.4, 0.3, 0.5, 0.3, 0, 0.2, 0.2, 0.3, 0.5, 0), ncol = s, byrow = TRUE) # get `p_dist` as an array of p_dist_1 and p_dist_2. p_dist_model_1 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1)) # `f_dist` has dimensions of: (s, s, d + 1). # First matrix. f_dist_1 <- matrix(c(NA, "unif", "dweibull", "nbinom", "geom", NA, "pois", "dweibull", "dweibull", "pois", NA, "geom", "pois", NA, "geom", NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_2 <- matrix(c(NA, "pois", "geom", "nbinom", "geom", NA, "pois", "dweibull", "unif", "geom", NA, "geom", "pois", "pois", "geom", NA), nrow = s, ncol = s, byrow = TRUE) # get `f_dist` as an array of `f_dist_1` and `f_dist_2` f_dist_model_1 <- array(c(f_dist_1, f_dist_2), dim = c(s, s, d + 1)) # `f_dist_pars` has dimensions of: (s, s, 2, d + 1). # First array of coefficients, corresponding to `f_dist_1`. # First matrix. f_dist_1_pars_1 <- matrix(c(NA, 5, 0.4, 4, 0.7, NA, 5, 0.6, 0.2, 3, NA, 0.6, 4, NA, 0.4, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_1_pars_2 <- matrix(c(NA, NA, 0.2, 0.6, NA, NA, NA, 0.8, 0.6, NA, NA, NA, NA, NA, NA, NA), nrow = s, ncol = s, byrow = TRUE) # Second array of coefficients, corresponding to `f_dist_2`. # First matrix. f_dist_2_pars_1 <- matrix(c(NA, 6, 0.4, 3, 0.7, NA, 2, 0.5, 3, 0.6, NA, 0.7, 6, 0.2, 0.7, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_2_pars_2 <- matrix(c(NA, NA, NA, 0.6, NA, NA, NA, 0.8, NA, NA, NA, NA, NA, NA, NA, NA), nrow = s, ncol = s, byrow = TRUE) # Get `f_dist_pars`. f_dist_pars_model_1 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2, f_dist_2_pars_1, f_dist_2_pars_2), dim = c(s, s, 2, d + 1)) # --------------------------------------------------------------------------- # Parametric object for Model 1. # --------------------------------------------------------------------------- obj_par_model_1 <- parametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.8, 0.1, 0.1, 0), degree = d, p_dist = p_dist_model_1, f_dist = f_dist_model_1, f_dist_pars = f_dist_pars_model_1, p_is_drifting = TRUE, f_is_drifting = TRUE ) # p drifting array. p_drift <- obj_par_model_1$dist$p_drift p_drift # f distribution. f_dist_drift <- obj_par_model_1$dist$f_drift_parametric f_dist_drift # parameters for the f distribution. f_dist_pars_drift <- obj_par_model_1$dist$f_drift_parameters f_dist_pars_drift # --------------------------------------------------------------------------- # Defining Model 2 - p is drifting, f is not drifting. # --------------------------------------------------------------------------- # `p_dist` has the same dimensions as in Model 1: (s, s, d + 1). p_dist_model_2 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1)) # `f_dist` has dimensions of: (s, s). f_dist_model_2 <- matrix(c( NA, "pois", NA, "nbinom", "geom", NA, "geom", "dweibull", "unif", "geom", NA, "geom", "nbinom", "unif", "dweibull", NA), nrow = s, ncol = s, byrow = TRUE) # `f_dist_pars` has dimensions of: (s, s, 2), # corresponding to `f_dist_model_2`. # First matrix. f_dist_pars_1_model_2 <- matrix(c(NA, 0.2, NA, 3, 0.2, NA, 0.2, 0.5, 3, 0.4, NA, 0.7, 2, 3, 0.7, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_pars_2_model_2 <- matrix(c(NA, NA, NA, 0.6, NA, NA, NA, 0.8, NA, NA, NA, NA, 0.2, NA, 0.3, NA), nrow = s, ncol = s, byrow = TRUE) # Get `f_dist_pars`. f_dist_pars_model_2 <- array(c(f_dist_pars_1_model_2, f_dist_pars_2_model_2), dim = c(s, s, 2)) # --------------------------------------------------------------------------- # Parametric object for Model 2. # --------------------------------------------------------------------------- obj_par_model_2 <- parametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.8, 0.1, 0.1, 0), degree = d, p_dist = p_dist_model_2, f_dist = f_dist_model_2, f_dist_pars = f_dist_pars_model_2, p_is_drifting = TRUE, f_is_drifting = FALSE ) # p drifting array. p_drift <- obj_par_model_2$dist$p_drift p_drift # f distribution. f_dist_notdrift <- obj_par_model_2$dist$f_notdrift_parametric f_dist_notdrift # parameters for the f distribution. f_dist_pars_notdrift <- obj_par_model_2$dist$f_notdrift_parameters f_dist_pars_notdrift # --------------------------------------------------------------------------- # Defining Model 3 - f is drifting, p is not drifting. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s). p_dist_model_3 <- matrix(c(0, 0.1, 0.3, 0.6, 0.4, 0, 0.1, 0.5, 0.4, 0.3, 0, 0.3, 0.9, 0.01, 0.09, 0), ncol = s, byrow = TRUE) # `f_dist` has the same dimensions as in Model 1: (s, s, d + 1). f_dist_model_3 <- array(c(f_dist_1, f_dist_2), dim = c(s, s, d + 1)) # `f_dist_pars` has the same dimensions as in Model 1: (s, s, 2, d + 1). f_dist_pars_model_3 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2, f_dist_2_pars_1, f_dist_2_pars_2), dim = c(s, s, 2, d + 1)) # --------------------------------------------------------------------------- # Parametric object for Model 3. # --------------------------------------------------------------------------- obj_par_model_3 <- parametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.3, 0.2, 0.2, 0.3), degree = d, p_dist = p_dist_model_3, f_dist = f_dist_model_3, f_dist_pars = f_dist_pars_model_3, p_is_drifting = FALSE, f_is_drifting = TRUE ) # p drifting array. p_notdrift <- obj_par_model_3$dist$p_notdrift p_notdrift # f distribution. f_dist_drift <- obj_par_model_3$dist$f_drift_parametric f_dist_drift # parameters for the f distribution. f_dist_pars_drift <- obj_par_model_3$dist$f_drift_parameters f_dist_pars_drift # =========================================================================== # Parametric estimation using methods corresponding to an object # which inherits from the class `dsmm_parametric`. # =========================================================================== ### Comments ### 1. Using a larger `klim` and a larger `model_size` will increase the ### accuracy of the model, with the need of larger memory requirements ### and computational cost. ### 2. For the parametric estimation it is recommended to use a common set ### of distributions while only the parameters are drifting. This results ### in higher accuracy. # --------------------------------------------------------------------------- # Defining the distributions for Model 1 - both p and f are drifting. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s, d + 1). # First matrix. p_dist_1 <- matrix(c(0, 0.2, 0.4, 0.4, 0.5, 0, 0.3, 0.2, 0.3, 0.4, 0, 0.3, 0.5, 0.3, 0.2, 0), ncol = s, byrow = TRUE) # Second matrix. p_dist_2 <- matrix(c(0, 0.3, 0.5, 0.2, 0.3, 0, 0.4, 0.3, 0.5, 0.3, 0, 0.2, 0.2, 0.4, 0.4, 0), ncol = s, byrow = TRUE) # get `p_dist` as an array of p_dist_1 and p_dist_2. p_dist_model_1 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1)) # `f_dist` has dimensions of: (s, s, d + 1). # We will use the same sojourn time distributions. f_dist_1 <- matrix(c( NA, "unif", "dweibull", "nbinom", "geom", NA, "pois", "dweibull", "dweibull", "pois", NA, "geom", "pois", 'nbinom', "geom", NA), nrow = s, ncol = s, byrow = TRUE) # get `f_dist` f_dist_model_1 <- array(f_dist_1, dim = c(s, s, d + 1)) # `f_dist_pars` has dimensions of: (s, s, 2, d + 1). # First array of coefficients, corresponding to `f_dist_1`. # First matrix. f_dist_1_pars_1 <- matrix(c(NA, 7, 0.4, 4, 0.7, NA, 5, 0.6, 0.2, 3, NA, 0.6, 4, 4, 0.4, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_1_pars_2 <- matrix(c(NA, NA, 0.2, 0.6, NA, NA, NA, 0.8, 0.6, NA, NA, NA, NA, 0.3, NA, NA), nrow = s, ncol = s, byrow = TRUE) # Second array of coefficients, corresponding to `f_dist_2`. # First matrix. f_dist_2_pars_1 <- matrix(c(NA, 6, 0.5, 3, 0.5, NA, 4, 0.5, 0.4, 5, NA, 0.7, 6, 5, 0.7, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_2_pars_2 <- matrix(c(NA, NA, 0.4, 0.5, NA, NA, NA, 0.6, 0.5, NA, NA, NA, NA, 0.4, NA, NA), nrow = s, ncol = s, byrow = TRUE) # Get `f_dist_pars`. f_dist_pars_model_1 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2, f_dist_2_pars_1, f_dist_2_pars_2), dim = c(s, s, 2, d + 1)) # --------------------------------------------------------------------------- # Defining the parametric object for Model 1. # --------------------------------------------------------------------------- obj_par_model_1 <- parametric_dsmm( model_size = 4000, states = states, initial_dist = c(0.8, 0.1, 0.1, 0), degree = d, p_dist = p_dist_model_1, f_dist = f_dist_model_1, f_dist_pars = f_dist_pars_model_1, p_is_drifting = TRUE, f_is_drifting = TRUE ) cat("The object has class of (", paste0(class(obj_par_model_1), collapse = ', '), ").") # --------------------------------------------------------------------------- # Generating a sequence from the parametric object. # --------------------------------------------------------------------------- # A larger klim will lead to an increase in accuracy. klim <- 20 sim_seq <- simulate(obj_par_model_1, klim = klim, seed = 1) # --------------------------------------------------------------------------- # Fitting the generated sequence under the same distributions. # --------------------------------------------------------------------------- fit_par_model1 <- fit_dsmm(sequence = sim_seq, states = states, degree = d, f_is_drifting = TRUE, p_is_drifting = TRUE, estimation = 'parametric', f_dist = f_dist_model_1) cat("The object has class of (", paste0(class(fit_par_model1), collapse = ', '), ").") cat("\nThe estimated parameters are:\n") fit_par_model1$dist$f_drift_parameters
# We can also define states in a flexible way, including spaces. states <- c("Dollar $", " /1'2'3/ ", " Z E T A ", "O_M_E_G_A") s <- length(states) d <- 1 # =========================================================================== # Defining parametric drifting semi-Markov models. # =========================================================================== # --------------------------------------------------------------------------- # Defining the drifting distributions for Model 1. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s, d + 1). # Sums over v must be 1 for all u and i = 0, ..., d. # First matrix. p_dist_1 <- matrix(c(0, 0.1, 0.4, 0.5, 0.5, 0, 0.3, 0.2, 0.3, 0.4, 0, 0.3, 0.8, 0.1, 0.1, 0), ncol = s, byrow = TRUE) # Second matrix. p_dist_2 <- matrix(c(0, 0.3, 0.6, 0.1, 0.3, 0, 0.4, 0.3, 0.5, 0.3, 0, 0.2, 0.2, 0.3, 0.5, 0), ncol = s, byrow = TRUE) # get `p_dist` as an array of p_dist_1 and p_dist_2. p_dist_model_1 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1)) # `f_dist` has dimensions of: (s, s, d + 1). # First matrix. f_dist_1 <- matrix(c(NA, "unif", "dweibull", "nbinom", "geom", NA, "pois", "dweibull", "dweibull", "pois", NA, "geom", "pois", NA, "geom", NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_2 <- matrix(c(NA, "pois", "geom", "nbinom", "geom", NA, "pois", "dweibull", "unif", "geom", NA, "geom", "pois", "pois", "geom", NA), nrow = s, ncol = s, byrow = TRUE) # get `f_dist` as an array of `f_dist_1` and `f_dist_2` f_dist_model_1 <- array(c(f_dist_1, f_dist_2), dim = c(s, s, d + 1)) # `f_dist_pars` has dimensions of: (s, s, 2, d + 1). # First array of coefficients, corresponding to `f_dist_1`. # First matrix. f_dist_1_pars_1 <- matrix(c(NA, 5, 0.4, 4, 0.7, NA, 5, 0.6, 0.2, 3, NA, 0.6, 4, NA, 0.4, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_1_pars_2 <- matrix(c(NA, NA, 0.2, 0.6, NA, NA, NA, 0.8, 0.6, NA, NA, NA, NA, NA, NA, NA), nrow = s, ncol = s, byrow = TRUE) # Second array of coefficients, corresponding to `f_dist_2`. # First matrix. f_dist_2_pars_1 <- matrix(c(NA, 6, 0.4, 3, 0.7, NA, 2, 0.5, 3, 0.6, NA, 0.7, 6, 0.2, 0.7, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_2_pars_2 <- matrix(c(NA, NA, NA, 0.6, NA, NA, NA, 0.8, NA, NA, NA, NA, NA, NA, NA, NA), nrow = s, ncol = s, byrow = TRUE) # Get `f_dist_pars`. f_dist_pars_model_1 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2, f_dist_2_pars_1, f_dist_2_pars_2), dim = c(s, s, 2, d + 1)) # --------------------------------------------------------------------------- # Parametric object for Model 1. # --------------------------------------------------------------------------- obj_par_model_1 <- parametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.8, 0.1, 0.1, 0), degree = d, p_dist = p_dist_model_1, f_dist = f_dist_model_1, f_dist_pars = f_dist_pars_model_1, p_is_drifting = TRUE, f_is_drifting = TRUE ) # p drifting array. p_drift <- obj_par_model_1$dist$p_drift p_drift # f distribution. f_dist_drift <- obj_par_model_1$dist$f_drift_parametric f_dist_drift # parameters for the f distribution. f_dist_pars_drift <- obj_par_model_1$dist$f_drift_parameters f_dist_pars_drift # --------------------------------------------------------------------------- # Defining Model 2 - p is drifting, f is not drifting. # --------------------------------------------------------------------------- # `p_dist` has the same dimensions as in Model 1: (s, s, d + 1). p_dist_model_2 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1)) # `f_dist` has dimensions of: (s, s). f_dist_model_2 <- matrix(c( NA, "pois", NA, "nbinom", "geom", NA, "geom", "dweibull", "unif", "geom", NA, "geom", "nbinom", "unif", "dweibull", NA), nrow = s, ncol = s, byrow = TRUE) # `f_dist_pars` has dimensions of: (s, s, 2), # corresponding to `f_dist_model_2`. # First matrix. f_dist_pars_1_model_2 <- matrix(c(NA, 0.2, NA, 3, 0.2, NA, 0.2, 0.5, 3, 0.4, NA, 0.7, 2, 3, 0.7, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_pars_2_model_2 <- matrix(c(NA, NA, NA, 0.6, NA, NA, NA, 0.8, NA, NA, NA, NA, 0.2, NA, 0.3, NA), nrow = s, ncol = s, byrow = TRUE) # Get `f_dist_pars`. f_dist_pars_model_2 <- array(c(f_dist_pars_1_model_2, f_dist_pars_2_model_2), dim = c(s, s, 2)) # --------------------------------------------------------------------------- # Parametric object for Model 2. # --------------------------------------------------------------------------- obj_par_model_2 <- parametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.8, 0.1, 0.1, 0), degree = d, p_dist = p_dist_model_2, f_dist = f_dist_model_2, f_dist_pars = f_dist_pars_model_2, p_is_drifting = TRUE, f_is_drifting = FALSE ) # p drifting array. p_drift <- obj_par_model_2$dist$p_drift p_drift # f distribution. f_dist_notdrift <- obj_par_model_2$dist$f_notdrift_parametric f_dist_notdrift # parameters for the f distribution. f_dist_pars_notdrift <- obj_par_model_2$dist$f_notdrift_parameters f_dist_pars_notdrift # --------------------------------------------------------------------------- # Defining Model 3 - f is drifting, p is not drifting. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s). p_dist_model_3 <- matrix(c(0, 0.1, 0.3, 0.6, 0.4, 0, 0.1, 0.5, 0.4, 0.3, 0, 0.3, 0.9, 0.01, 0.09, 0), ncol = s, byrow = TRUE) # `f_dist` has the same dimensions as in Model 1: (s, s, d + 1). f_dist_model_3 <- array(c(f_dist_1, f_dist_2), dim = c(s, s, d + 1)) # `f_dist_pars` has the same dimensions as in Model 1: (s, s, 2, d + 1). f_dist_pars_model_3 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2, f_dist_2_pars_1, f_dist_2_pars_2), dim = c(s, s, 2, d + 1)) # --------------------------------------------------------------------------- # Parametric object for Model 3. # --------------------------------------------------------------------------- obj_par_model_3 <- parametric_dsmm( model_size = 10000, states = states, initial_dist = c(0.3, 0.2, 0.2, 0.3), degree = d, p_dist = p_dist_model_3, f_dist = f_dist_model_3, f_dist_pars = f_dist_pars_model_3, p_is_drifting = FALSE, f_is_drifting = TRUE ) # p drifting array. p_notdrift <- obj_par_model_3$dist$p_notdrift p_notdrift # f distribution. f_dist_drift <- obj_par_model_3$dist$f_drift_parametric f_dist_drift # parameters for the f distribution. f_dist_pars_drift <- obj_par_model_3$dist$f_drift_parameters f_dist_pars_drift # =========================================================================== # Parametric estimation using methods corresponding to an object # which inherits from the class `dsmm_parametric`. # =========================================================================== ### Comments ### 1. Using a larger `klim` and a larger `model_size` will increase the ### accuracy of the model, with the need of larger memory requirements ### and computational cost. ### 2. For the parametric estimation it is recommended to use a common set ### of distributions while only the parameters are drifting. This results ### in higher accuracy. # --------------------------------------------------------------------------- # Defining the distributions for Model 1 - both p and f are drifting. # --------------------------------------------------------------------------- # `p_dist` has dimensions of: (s, s, d + 1). # First matrix. p_dist_1 <- matrix(c(0, 0.2, 0.4, 0.4, 0.5, 0, 0.3, 0.2, 0.3, 0.4, 0, 0.3, 0.5, 0.3, 0.2, 0), ncol = s, byrow = TRUE) # Second matrix. p_dist_2 <- matrix(c(0, 0.3, 0.5, 0.2, 0.3, 0, 0.4, 0.3, 0.5, 0.3, 0, 0.2, 0.2, 0.4, 0.4, 0), ncol = s, byrow = TRUE) # get `p_dist` as an array of p_dist_1 and p_dist_2. p_dist_model_1 <- array(c(p_dist_1, p_dist_2), dim = c(s, s, d + 1)) # `f_dist` has dimensions of: (s, s, d + 1). # We will use the same sojourn time distributions. f_dist_1 <- matrix(c( NA, "unif", "dweibull", "nbinom", "geom", NA, "pois", "dweibull", "dweibull", "pois", NA, "geom", "pois", 'nbinom', "geom", NA), nrow = s, ncol = s, byrow = TRUE) # get `f_dist` f_dist_model_1 <- array(f_dist_1, dim = c(s, s, d + 1)) # `f_dist_pars` has dimensions of: (s, s, 2, d + 1). # First array of coefficients, corresponding to `f_dist_1`. # First matrix. f_dist_1_pars_1 <- matrix(c(NA, 7, 0.4, 4, 0.7, NA, 5, 0.6, 0.2, 3, NA, 0.6, 4, 4, 0.4, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_1_pars_2 <- matrix(c(NA, NA, 0.2, 0.6, NA, NA, NA, 0.8, 0.6, NA, NA, NA, NA, 0.3, NA, NA), nrow = s, ncol = s, byrow = TRUE) # Second array of coefficients, corresponding to `f_dist_2`. # First matrix. f_dist_2_pars_1 <- matrix(c(NA, 6, 0.5, 3, 0.5, NA, 4, 0.5, 0.4, 5, NA, 0.7, 6, 5, 0.7, NA), nrow = s, ncol = s, byrow = TRUE) # Second matrix. f_dist_2_pars_2 <- matrix(c(NA, NA, 0.4, 0.5, NA, NA, NA, 0.6, 0.5, NA, NA, NA, NA, 0.4, NA, NA), nrow = s, ncol = s, byrow = TRUE) # Get `f_dist_pars`. f_dist_pars_model_1 <- array(c(f_dist_1_pars_1, f_dist_1_pars_2, f_dist_2_pars_1, f_dist_2_pars_2), dim = c(s, s, 2, d + 1)) # --------------------------------------------------------------------------- # Defining the parametric object for Model 1. # --------------------------------------------------------------------------- obj_par_model_1 <- parametric_dsmm( model_size = 4000, states = states, initial_dist = c(0.8, 0.1, 0.1, 0), degree = d, p_dist = p_dist_model_1, f_dist = f_dist_model_1, f_dist_pars = f_dist_pars_model_1, p_is_drifting = TRUE, f_is_drifting = TRUE ) cat("The object has class of (", paste0(class(obj_par_model_1), collapse = ', '), ").") # --------------------------------------------------------------------------- # Generating a sequence from the parametric object. # --------------------------------------------------------------------------- # A larger klim will lead to an increase in accuracy. klim <- 20 sim_seq <- simulate(obj_par_model_1, klim = klim, seed = 1) # --------------------------------------------------------------------------- # Fitting the generated sequence under the same distributions. # --------------------------------------------------------------------------- fit_par_model1 <- fit_dsmm(sequence = sim_seq, states = states, degree = d, f_is_drifting = TRUE, p_is_drifting = TRUE, estimation = 'parametric', f_dist = f_dist_model_1) cat("The object has class of (", paste0(class(fit_par_model1), collapse = ', '), ").") cat("\nThe estimated parameters are:\n") fit_par_model1$dist$f_drift_parameters
Generic function that simulates a sequence under the rule of a
drifting semi-Markov kernel. The number of simulated states is nsim
,
while the kernel is retrieved from the object obj
via inheritance
from the S3 class dsmm
.
## S3 method for class 'dsmm' simulate( object, nsim = NULL, seed = NULL, max_seq_length = NULL, klim = 100, ... )
## S3 method for class 'dsmm' simulate( object, nsim = NULL, seed = NULL, max_seq_length = NULL, klim = 100, ... )
object |
An object of S3 class |
nsim |
Optional. An integer specifying the number of simulations to be made
from the drifting semi-Markov kernel. The maximum value of |
seed |
Optional. An integer specifying the initialization of the random number generator. |
max_seq_length |
Optional. A positive integer that will ensure the simulated
sequence will not have a maximum total length greater than
|
klim |
Optional. Positive integer. Passed down to |
... |
Optional. Attributes passed down from the |
Returns the simulated sequence for the given drifting
semi-Markov model. It is a character vector based on nsim
simulations,
with a maximum length of max_seq_length
.
This sequence is not to be confused with the embedded Markov chain. The user
can apply the base::rle()
function on this simulated sequence, if he wishes
to obtain the corresponding embedded Markov chain and the sojourn times.
About random number generation in R: RNG
.
Fitting a model through a sequence from this function: fit_dsmm.
For the theoretical background of drifting semi-Markov models: dsmmR.
For obtaining the lengths and values of equals values in a vector:
rle
.
# Setup. sequence <- create_sequence("DNA", len = 1000) states <- sort(unique(sequence)) d <- 1 obj_model_3 <- fit_dsmm(sequence = sequence, states = states, degree = d, f_is_drifting = TRUE, p_is_drifting = FALSE) # Using the method `simulate.dsmm()`. simulated_seq <- simulate(obj_model_3, seed = 1) short_sim <- simulate(obj = obj_model_3, nsim = 10, seed = 1) cut_sim <- simulate(obj = obj_model_3, max_seq_length = 10, seed = 1) str(simulated_seq) str(short_sim) str(cut_sim) # To obtain the embedded Markov chain (EMC) and the corresponding sojourn times # of any simulated sequence, we can simply use the `base::rle()` function. sim_seq_emc <- base::rle(cut_sim)$values # embedded Markov chain sim_seq_sojourn_times <- base::rle(cut_sim)$lengths # sojourn times cat("Start of the simulated sequence: ", head(cut_sim), "...\nThe embedded Markov chain: ", head(sim_seq_emc), "...\nThe sojourn times: ", head(sim_seq_sojourn_times), "...")
# Setup. sequence <- create_sequence("DNA", len = 1000) states <- sort(unique(sequence)) d <- 1 obj_model_3 <- fit_dsmm(sequence = sequence, states = states, degree = d, f_is_drifting = TRUE, p_is_drifting = FALSE) # Using the method `simulate.dsmm()`. simulated_seq <- simulate(obj_model_3, seed = 1) short_sim <- simulate(obj = obj_model_3, nsim = 10, seed = 1) cut_sim <- simulate(obj = obj_model_3, max_seq_length = 10, seed = 1) str(simulated_seq) str(short_sim) str(cut_sim) # To obtain the embedded Markov chain (EMC) and the corresponding sojourn times # of any simulated sequence, we can simply use the `base::rle()` function. sim_seq_emc <- base::rle(cut_sim)$values # embedded Markov chain sim_seq_sojourn_times <- base::rle(cut_sim)$lengths # sojourn times cat("Start of the simulated sequence: ", head(cut_sim), "...\nThe embedded Markov chain: ", head(sim_seq_emc), "...\nThe sojourn times: ", head(sim_seq_sojourn_times), "...")