Processing math: 100%

Overview

The package provides approximate maximum likelihood estimation of regression coefficients and an unstructured covariance matrix parameterizing dependence between responses of mixed types. A function for approximate likelihood-ratio testing is also included.

Installation

The package can be installed from GitHub, using devtools.

## devtools::install_github("koekvall/mmrr")
library(mmrr)

Setting

Let y1,,yn be independent rdimensional response vectors whose elements are possibly of mixed types, some continuous and some discrete for example. Let X1,,Xn be corresponding r×p design matrices. The package fits the regression model E(yiwi)=g1(wi),  wiNr(Xiβ,Σ), where g is an Rr-valued link function, wi is a latent (unobservable) variable, βRp, and ΣRr×r is a covariance matrix. Details are in the manuscript ``Mixed-type multivariate response regression with covariance estimation" by Ekvall and Molstad.

The package currently supports responses such that, for every i=1,,n and j=1,,r, yijwi is either (a) normal with mean wij and variance ψj, (b) quasi-Poisson with mean exp(wij) and variance ψjexp(wij), or (c) Bernoulli with success probability 1/{1+exp(wij)}. The vector ψ=(ψ1,,ψr) is assumed known and provided as an argument to the fitting function.

If yij is a Bernoulli response, ψj=1 and Σjj=1 is enforced for identifiability.

Example

Model fitting

We generate n=100 observations of a 3-dimensional response vector and then fit the model. The matrix of predictors is of size nr×p, where the first r rows correspond to the design matrix X1 for the response vector y1, the next r rows correspond to X2, and so on. That is, X is the design matrix for the vector of all responses [y1,,yn]Rnr. We take Xi=I3, the 3×3 identity matrix, corresponding to a separate intercept for each of the three responses. The responses are stored in an n×r matrix where the ith row is the ith response vector. The first response is normal, the second Bernoulli, and the third Poisson (conditionally on the latent vector).

library(mmrr)

# Generate data
n <- 100; r <- 3
set.seed(4)
X <- kronecker(rep(1, n), diag(3)) # Each response has its own intercept
type <- 1:3 # First response normal, second Bernoulli, third Poisson
psi <- rep(1, 3) # psi_j = 1 for all j
Beta0 <- runif(3, -1, 1) # True coefficient vector
Sigma0 <- cov2cor(crossprod(matrix(rnorm(9), 3, 3))) + diag(c(1.5, 0, 0.5)) # True covariance matrix
R0 <- chol(Sigma0) # Function for generating data uses Cholesky root
Y <- generate_mmrr(X = X, Beta = Beta0, R = R0, type = type, psi = psi)

# Fit model
fit <- mmrr(Y = Y, X = X, type = type, psi = psi)

# Get estimates
fit$Beta
## [1]  0.1318782 -0.6950476 -0.2211912
fit$Sigma
##          [,1]      [,2]      [,3]
## [1,] 2.822656 1.6749202 1.0834566
## [2,] 1.674920 1.0000005 0.7222296
## [3,] 1.083457 0.7222296 1.4425031

Testing

Suppose we want to test whether the Normal and Bernoulli variable are independent. That is equivalent to testing whether Σ12=Σ21=0. We fit the null model by incorporating the elementwise restrictions on Σ using the argument M. We also restrict Σ22=1 for identifiability because the second response is Bernoulli (this restriction is automatic when the user does not supply M).

M <- matrix(NA, 3, 3) # NA means no restriction
M[1, 2] <- M[2, 1] <- 0 # Set covariance restriction
M[2, 2] <- 1 # Set variance restriction
# Inspect M
M
##      [,1] [,2] [,3]
## [1,]   NA    0   NA
## [2,]    0    1   NA
## [3,]   NA   NA   NA
# Fit model
fit_null <- mmrr(Y = Y, X = X, type = type, psi = psi, M = M)

# Inspect null estimate
fit_null$Sigma
##              [,1]         [,2]      [,3]
## [1,] 2.753878e+00 1.498801e-15 0.9691641
## [2,] 1.554312e-15 1.000000e+00 0.2859969
## [3,] 9.691641e-01 2.859969e-01 1.2902496
# Do approximate likelihood ratio test
lrt_approx(fit_null = fit_null, fit_full = fit)
## $stat
## [1] 14.5007
## 
## $p
## [1] 0.0001401071
## 
## $df
## [1] 1

Marginal moments

Because the predictors (an intercept only) do not depend on i, the marginal moments of the yiR3 do not depend on i. With the estimates of β and Σ, we can get an estimate of the mean vector and covariance matrix of the yi as follows.

# True marginal moments
mean_y <- predict_mmrr(X = diag(3), Beta = Beta0,
                      sigma = sqrt(diag(Sigma0)),
                      type = type, num_nodes = 10)
cov_y <- cov_mmrr(X = diag(3), Beta = Beta0, Sigma = Sigma0, psi = psi,
                  type = type, num_nodes = 10)

# Estimates of marginal moments
mean_y_hat <- predict_mmrr(X = diag(3), Beta = fit$Beta,
                      sigma = sqrt(diag(fit$Sigma)),
                      type = type, num_nodes = 10)
cov_y_hat <- cov_mmrr(X = diag(3), Beta = fit$Beta, Sigma = fit$Sigma, psi = psi,
                  type = type, num_nodes = 10)

# Compare
mean_y; mean_y_hat
##           [,1]      [,2]    [,3]
## [1,] 0.1716006 0.3064572 1.40141
##           [,1]      [,2]     [,3]
## [1,] 0.1318782 0.3597954 1.648821
cov_y; cov_y_hat
##           [,1]      [,2]      [,3]
## [1,] 3.5000000 0.1697016 1.0248625
## [2,] 0.1697016 0.2125412 0.1959248
## [3,] 1.0248625 0.1959248 8.2392785
##           [,1]      [,2]       [,3]
## [1,] 3.8226563 0.3218452  1.7864258
## [2,] 0.3218452 0.2303427  0.2404325
## [3,] 1.7864258 0.2404325 10.4333959