This function fits Mixed Effects Random Forests (MERFs) by combining a random forest
from ranger with a model capturing random effects from lme4. The MERF algorithm is an
iterative procedure reminiscent of the EM algorithm (see Details). The function is the
base function for the wrapping function (SAEforest_model) and should not be used directly
by the ordinary user. Recommended exceptions are applications exceeding the scope of the
existing wrapper functions, or further research. The function MERFranger allows modelling
of complex patterns of structural relations (see Examples). It returns an object of class
MERFranger, which can be used to produce unit-level predictions. In contrast to the
wrapping functions, this function does not directly provide SAE estimates on domain-specific indicators.
Usage
MERFranger(
Y,
X,
random,
data,
importance = "none",
initialRandomEffects = 0,
ErrorTolerance = 1e-04,
MaxIterations = 25,
na.rm = TRUE,
...
)
Arguments
- Y
Continuous input value of target variable.
- X
Matrix of predictive covariates.
- random
Specification of random effects terms following the syntax of lmer. Random effect terms are specified by vertical bars (|) separating expressions for design matrices from grouping factors. For further details see lmer and the example below.
- data
data.frame of sample data including the specified elements of Y and X.
- importance
Variable importance mode processed by the random forest from ranger. Must be one of 'none', 'impurity', 'impurity_corrected' or 'permutation'. For further details see ranger.
- initialRandomEffects
Numeric value or vector of initial estimate of random effects. Defaults to 0.
- ErrorTolerance
Numeric value to monitor the MERF algorithm's convergence. Defaults to 1e-04.
- MaxIterations
Numeric value specifying the maximum number of iterations for the MERF algorithm. Defaults to 25.
- na.rm
Logical. Whether missing values should be removed. Defaults to TRUE.
- ...
Additional parameters passed directly to the random forest from ranger. The most important parameters are, for instance, mtry (number of variables to possibly split at in each node) and num.trees (number of trees). For further details on possible parameters see ranger and the example below.
Value
An object of class MERFranger includes the following elements:
Forest
A random forest of class ranger modelling fixed effects of the model.
EffectModel
A model of random effects of class merMod capturing structural components of MERFs and modeling random components.
RandomEffects
List element containing the values of random intercepts from EffectModel.
RanEffSD
Numeric value of the standard deviation of random intercepts.
ErrorSD
Numeric value of standard deviation of unit-level errors.
VarianceCovariance
VarCorr matrix from EffectModel.
LogLik
Vector with numerical entries showing the log-likelihood of the MERF algorithm.
IterationsUsed
Number of iterations used until convergence of the MERF algorithm.
OOBresiduals
Vector of OOB-residuals.
Random
Character specifying the random intercept in the random effects model.
ErrorTolerance
Numerical value to monitor the MERF algorithm's convergence.
initialRandomEffects
Numeric value or vector of initial specification of random effects.
MaxIterations
Numeric value specifying the maximum number of iterations for the MERF algorithm.
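As a hedged illustration of how the elements listed above might be inspected, the following sketch refits the model from the Examples section and reads a few convergence diagnostics. It assumes that the SAEforest package and its eusilcA_smp data are installed and that the elements can be accessed with `$` under the names given above; the object name `fit` is illustrative only.

```r
library(SAEforest)

# Sample data shipped with SAEforest (same preparation as in the Examples)
data("eusilcA_smp")
income  <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# Fit a MERF with a random intercept for district; num.trees is passed to ranger
fit <- MERFranger(Y = income, X = X_covar, random = "(1|district)",
                  data = eusilcA_smp, num.trees = 50)

# Convergence diagnostics of the MERF algorithm
fit$IterationsUsed   # iterations used until the algorithm stopped
fit$LogLik           # log-likelihood values recorded by the algorithm

# Estimated variance components
fit$RanEffSD         # standard deviation of the random intercepts
fit$ErrorSD          # standard deviation of the unit-level errors

# First few out-of-bag residuals
head(fit$OOBresiduals)
```

If IterationsUsed equals MaxIterations, the algorithm may have stopped before the log-likelihood stabilised, so a larger MaxIterations or a different ErrorTolerance may be worth trying.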
Details
A predict method exists for objects obtained by MERFranger.
The MERF algorithm iterates between two steps: a) it fits the random forest, assuming the current random effects term to be correct, and b) it estimates the random effects part, assuming the OOB-predictions from the forest to be correct. Overall convergence of the algorithm is monitored by the log-likelihood of a joint model of both components. For further details see Krennmair & Schmid (2022) or Hajjem et al. (2014).
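The two alternating steps can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the package's internal code: the data are simulated, all object names (`dat`, `random_part`, `re_fit`, `tol`, `max_iter`) are hypothetical, the random part is a single random intercept, and convergence is checked through the relative change of the log-likelihood of the random effects model.

```r
library(ranger)
library(lme4)

set.seed(1)

# Simulated clustered data (purely illustrative)
n_groups <- 20
n_per    <- 30
dat <- data.frame(
  g  = factor(rep(seq_len(n_groups), each = n_per)),
  x1 = runif(n_groups * n_per),
  x2 = runif(n_groups * n_per)
)
u     <- rnorm(n_groups, sd = 1)  # true random intercepts
dat$y <- sin(2 * pi * dat$x1) + 2 * dat$x2^2 +
  u[as.integer(dat$g)] + rnorm(nrow(dat), sd = 0.5)

random_part <- rep(0, nrow(dat))  # initial random effects (assumed 0)
old_ll      <- NA                 # no log-likelihood before the first pass
tol         <- 1e-4               # assumed relative tolerance
max_iter    <- 25                 # assumed iteration cap

for (iter in seq_len(max_iter)) {
  # (a) Forest step: fit the forest to the target net of the current random effects
  rf <- ranger(x = dat[, c("x1", "x2")], y = dat$y - random_part,
               num.trees = 100)
  dat$oob <- rf$predictions       # out-of-bag predictions

  # (b) Random effects step: random intercept model with the OOB
  #     predictions entering as a fixed offset
  re_fit <- lmer(y ~ -1 + (1 | g), data = dat, REML = FALSE, offset = oob)
  random_part <- ranef(re_fit)$g[as.character(dat$g), "(Intercept)"]

  # Stop once the relative change in the log-likelihood is small
  new_ll <- as.numeric(logLik(re_fit))
  if (!is.na(old_ll) && abs((new_ll - old_ll) / old_ll) < tol) break
  old_ll <- new_ll
}

iter                       # number of passes through the loop
VarCorr(re_fit)            # estimated variance components
```

Using out-of-bag rather than in-sample forest predictions in step (b) keeps the forest from absorbing the group-level variation that the random effects model is meant to capture.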
Note that the MERFranger object is a composition of elements from a random forest of class
ranger and a random effects model of class merMod. Thus, all generic functions for these
classes are applicable to the corresponding elements. For further details on generic functions
see ranger and lmer as well as the examples below.
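A short sketch of how generic functions might be applied to the two components follows. It assumes that SAEforest, ranger and lme4 are installed, that the Forest and EffectModel elements carry the classes described under Value, and that the forest was fitted with an importance mode other than 'none'; the object name `fit` is illustrative.

```r
library(SAEforest)

data("eusilcA_smp")
income  <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# Request impurity importance so that ranger::importance() has values to return
fit <- MERFranger(Y = income, X = X_covar, random = "(1|district)",
                  data = eusilcA_smp, importance = "impurity",
                  num.trees = 50)

# Generics for the ranger component
print(fit$Forest)
head(sort(ranger::importance(fit$Forest), decreasing = TRUE))

# Generics for the merMod component
lme4::VarCorr(fit$EffectModel)
head(lme4::ranef(fit$EffectModel)$district)
```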
References
Hajjem, A., Bellavance, F., & Larocque, D. (2014). Mixed-Effects Random Forest for Clustered Data. Journal of Statistical Computation and Simulation, 84 (6), 1313–1328.
Krennmair, P., & Schmid, T. (2022). Flexible Domain Prediction Using Mixed Effects Random Forests. Journal of the Royal Statistical Society: Series C (Applied Statistics) (forthcoming).
Examples
# \donttest{
# Load Data
data("eusilcA_pop")
data("eusilcA_smp")
income <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]
# Example 1:
# Calculating general model used in wrapper functions
model1 <- MERFranger(Y = income, X = X_covar, random = "(1|district)",
                     data = eusilcA_smp, num.trees = 50)
# get individual predictions:
ind_pred <- predict(model1, eusilcA_pop)
# }