The package provides approximate maximum likelihood estimation of regression coefficients and an unstructured covariance matrix parameterizing dependence between responses of mixed types. A function for approximate likelihood-ratio testing is also included.
The package can be installed from GitHub, using devtools.
## devtools::install_github("koekvall/mmrr")
library(mmrr)
Let y1,…,yn be independent r−dimensional response vectors whose elements are possibly of mixed types, some continuous and some discrete for example. Let X1,…,Xn be corresponding r×p design matrices. The package fits the regression model E(yi∣wi)=g−1(wi), wi∼Nr(Xiβ,Σ), where g is an Rr-valued link function, wi is a latent (unobservable) variable, β∈Rp, and Σ∈Rr×r is a covariance matrix. Details are in the manuscript ``Mixed-type multivariate response regression with covariance estimation" by Ekvall and Molstad.
The package currently supports responses such that, for every i=1,…,n and j=1,…,r, yij∣wi is either (a) normal with mean wij and variance ψj, (b) quasi-Poisson with mean exp(wij) and variance ψjexp(wij), or (c) Bernoulli with success probability 1/{1+exp(−wij)}. The vector ψ=(ψ1,…,ψr) is assumed known and provided as an argument to the fitting function.
If yij is a Bernoulli response, ψj=1 and Σjj=1 is enforced for identifiability.
We generate n=100 observations of a 3-dimensional response vector and then fit the model. The matrix of predictors is of size nr×p, where the first r rows correspond to the design matrix X1 for the response vector y1, the next r rows correspond to X2, and so on. That is, X is the design matrix for the vector of all responses [y′1,…,y′n]′∈Rnr. We take Xi=I3, the 3×3 identity matrix, corresponding to a separate intercept for each of the three responses. The responses are stored in an n×r matrix where the ith row is the ith response vector. The first response is normal, the second Bernoulli, and the third Poisson (conditionally on the latent vector).
library(mmrr)
# Generate data
n <- 100; r <- 3
set.seed(4)
X <- kronecker(rep(1, n), diag(3)) # Each response has its own intercept
type <- 1:3 # First response normal, second Bernoulli, third Poisson
psi <- rep(1, 3) # psi_j = 1 for all j
Beta0 <- runif(3, -1, 1) # True coefficient vector
Sigma0 <- cov2cor(crossprod(matrix(rnorm(9), 3, 3))) + diag(c(1.5, 0, 0.5)) # True covariance matrix
R0 <- chol(Sigma0) # Function for generating data uses Cholesky root
Y <- generate_mmrr(X = X, Beta = Beta0, R = R0, type = type, psi = psi)
# Fit model
fit <- mmrr(Y = Y, X = X, type = type, psi = psi)
# Get estimates
fit$Beta
## [1] 0.1318782 -0.6950476 -0.2211912
fit$Sigma
## [,1] [,2] [,3]
## [1,] 2.822656 1.6749202 1.0834566
## [2,] 1.674920 1.0000005 0.7222296
## [3,] 1.083457 0.7222296 1.4425031
Suppose we want to test whether the Normal and Bernoulli variable are independent. That is equivalent to testing whether Σ12=Σ21=0. We fit the null model by incorporating the elementwise restrictions on Σ using the argument M. We also restrict Σ22=1 for identifiability because the second response is Bernoulli (this restriction is automatic when the user does not supply M).
M <- matrix(NA, 3, 3) # NA means no restriction
M[1, 2] <- M[2, 1] <- 0 # Set covariance restriction
M[2, 2] <- 1 # Set variance restriction
# Inspect M
M
## [,1] [,2] [,3]
## [1,] NA 0 NA
## [2,] 0 1 NA
## [3,] NA NA NA
# Fit model
fit_null <- mmrr(Y = Y, X = X, type = type, psi = psi, M = M)
# Inspect null estimate
fit_null$Sigma
## [,1] [,2] [,3]
## [1,] 2.753878e+00 1.498801e-15 0.9691641
## [2,] 1.554312e-15 1.000000e+00 0.2859969
## [3,] 9.691641e-01 2.859969e-01 1.2902496
# Do approximate likelihood ratio test
lrt_approx(fit_null = fit_null, fit_full = fit)
## $stat
## [1] 14.5007
##
## $p
## [1] 0.0001401071
##
## $df
## [1] 1
Because the predictors (an intercept only) do not depend on i, the marginal moments of the yi∈R3 do not depend on i. With the estimates of β and Σ, we can get an estimate of the mean vector and covariance matrix of the yi as follows.
# True marginal moments
mean_y <- predict_mmrr(X = diag(3), Beta = Beta0,
sigma = sqrt(diag(Sigma0)),
type = type, num_nodes = 10)
cov_y <- cov_mmrr(X = diag(3), Beta = Beta0, Sigma = Sigma0, psi = psi,
type = type, num_nodes = 10)
# Estimates of marginal moments
mean_y_hat <- predict_mmrr(X = diag(3), Beta = fit$Beta,
sigma = sqrt(diag(fit$Sigma)),
type = type, num_nodes = 10)
cov_y_hat <- cov_mmrr(X = diag(3), Beta = fit$Beta, Sigma = fit$Sigma, psi = psi,
type = type, num_nodes = 10)
# Compare
mean_y; mean_y_hat
## [,1] [,2] [,3]
## [1,] 0.1716006 0.3064572 1.40141
## [,1] [,2] [,3]
## [1,] 0.1318782 0.3597954 1.648821
cov_y; cov_y_hat
## [,1] [,2] [,3]
## [1,] 3.5000000 0.1697016 1.0248625
## [2,] 0.1697016 0.2125412 0.1959248
## [3,] 1.0248625 0.1959248 8.2392785
## [,1] [,2] [,3]
## [1,] 3.8226563 0.3218452 1.7864258
## [2,] 0.3218452 0.2303427 0.2404325
## [3,] 1.7864258 0.2404325 10.4333959