
This function enables the use of Mixed Effects Random Forests (MERFs) by combining a random forest from ranger with a model capturing random effects from lme4. The MERF algorithm is an iterative procedure reminiscent of an EM-algorithm (see Details). The function is the base function for the wrapping function (SAEforest_model) and should not be used directly by the ordinary user. Recommended exceptions are applications exceeding the scope of the existing wrapper functions, or further research. MERFranger allows users to model complex patterns of structural relations (see Examples). The function returns an object of class MERFranger, which can be used to produce unit-level predictions. In contrast to the wrapping functions, this function does not directly provide SAE estimates on domain-specific indicators.

Usage

MERFranger(
  Y,
  X,
  random,
  data,
  importance = "none",
  initialRandomEffects = 0,
  ErrorTolerance = 1e-04,
  MaxIterations = 25,
  na.rm = TRUE,
  ...
)

Arguments

Y

Continuous input value of target variable.

X

Matrix of predictive covariates.

random

Specification of random effects terms following the syntax of lmer. Random effect terms are specified by vertical bars (|) separating expressions for design matrices from grouping factors. For further details see lmer and the example below.
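
As a rough illustration of the lmer-style syntax, the sketch below builds two candidate strings for random and uses lme4::findbars to count the random-effects terms lme4 recognises in each. The response name y and the second grouping variable state are hypothetical, and whether MERFranger handles anything beyond a single random intercept is an assumption that is not confirmed here.

```r
library(lme4)

# Candidate strings for the `random` argument. `district` appears in the
# Examples section; `state` is a hypothetical second grouping variable.
random_specs <- c(
  intercept = "(1|district)",            # one random intercept per district
  crossed   = "(1|state) + (1|district)" # two separate random intercepts (assumption)
)

# Count the random-effects terms lme4 finds in each specification.
# The response `y` is a placeholder; no data are needed to parse the formula.
n_terms <- vapply(random_specs, function(spec) {
  f <- stats::as.formula(paste("y ~ -1 +", spec))
  length(lme4::findbars(f))
}, integer(1))

n_terms
```

The expression to the left of each vertical bar describes the design matrix (here 1, an intercept) and the expression to the right names the grouping factor.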

data

data.frame of sample data including the specified elements of Y and X.

importance

Variable importance mode used by the random forest from ranger. Must be one of 'none', 'impurity', 'impurity_corrected' or 'permutation'. For further details see ranger.

initialRandomEffects

Numeric value or vector of initial estimate of random effects. Defaults to 0.

ErrorTolerance

Numeric value to monitor the MERF algorithm's convergence. Defaults to 1e-04.

MaxIterations

Numeric value specifying the maximum number of iterations for the MERF algorithm. Defaults to 25.

na.rm

Logical. Whether missing values should be removed. Defaults to TRUE.

...

Additional parameters passed directly to the random forest function ranger. The most important parameters are, for instance, mtry (number of variables to possibly split at in each node) and num.trees (number of trees). For further details on possible parameters see ranger and the example below.
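
A minimal sketch of forwarding ranger parameters through the dots, assuming the SAEforest package and its eusilcA_smp sample data are available. The values chosen for num.trees and mtry are arbitrary illustrations, not recommendations.

```r
library(SAEforest)

data("eusilcA_smp")
income  <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# num.trees and mtry are not formal arguments of MERFranger;
# they travel through `...` to ranger.
model_tuned <- MERFranger(
  Y = income, X = X_covar, random = "(1|district)", data = eusilcA_smp,
  num.trees = 50, # illustrative value
  mtry = 3        # illustrative value
)

# The fitted ranger object records the settings it was grown with.
model_tuned$Forest$num.trees
model_tuned$Forest$mtry
```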

Value

An object of class MERFranger includes the following elements:

Forest

A random forest of class ranger modelling fixed effects of the model.

EffectModel

A model of random effects of class merMod capturing structural components of MERFs and modeling random components.

RandomEffects

List element containing the values of random intercepts from EffectModel.

RanEffSD

Numeric value of the standard deviation of random intercepts.

ErrorSD

Numeric value of standard deviation of unit-level errors.

VarianceCovariance

VarCorr matrix from EffectModel.

LogLik

Numeric vector containing the log-likelihood values recorded by the MERF algorithm.

IterationsUsed

Number of iterations used until convergence of the MERF algorithm.

OOBresiduals

Vector of OOB-residuals.

Random

Character specifying the random intercept in the random effects model.

ErrorTolerance

Numerical value to monitor the MERF algorithm's convergence.

initialRandomEffects

Numeric value or vector of initial specification of random effects.

MaxIterations

Numeric value specifying the maximum number of iterations for the MERF algorithm.

Details

A predict method is available for objects obtained by MERFranger.

The MERF algorithm alternates between two steps: (a) fitting the random forest, taking the current estimates of the random effects as given, and (b) estimating the random effects, taking the OOB-predictions from the forest as given. Overall convergence of the algorithm is monitored by the log-likelihood of a joint model of both components. For further details see Krennmair & Schmid (2022) or Hajjem et al. (2014).
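
The two steps can be sketched on simulated data as follows. This is an illustrative loop built directly on ranger and lme4, not the package's implementation: the simulated data, all variable names, and the relative log-likelihood stopping rule are assumptions made for this sketch.

```r
library(ranger)
library(lme4)

set.seed(1)

# Simulated clustered data (hypothetical): 20 groups with 30 units each.
n_groups <- 20
n_per    <- 30
d <- data.frame(
  g  = factor(rep(seq_len(n_groups), each = n_per)),
  x1 = runif(n_groups * n_per),
  x2 = runif(n_groups * n_per)
)
u   <- rnorm(n_groups, sd = 1) # true random intercepts
d$y <- sin(2 * pi * d$x1) + d$x2^2 + u[as.integer(d$g)] +
  rnorm(nrow(d), sd = 0.5)

tol      <- 1e-4
max_iter <- 25
ran_eff  <- rep(0, nrow(d)) # initial random effects
loglik   <- numeric(0)

for (iter in seq_len(max_iter)) {
  # (a) Fit the forest to the target net of the current random effects.
  rf <- ranger(x = d[, c("x1", "x2")], y = d$y - ran_eff, num.trees = 100)
  d$oob <- rf$predictions # OOB predictions for the training units

  # (b) Estimate random intercepts with the OOB predictions as a fixed offset.
  fit <- lmer(y ~ -1 + (1 | g), data = d, REML = FALSE, offset = oob)

  # Map each group's estimated intercept back to its units.
  ran_eff <- ranef(fit)$g[as.character(d$g), "(Intercept)"]

  # Stop once the relative change in log-likelihood is small (assumed rule).
  loglik <- c(loglik, as.numeric(logLik(fit)))
  if (iter > 1 &&
      abs((loglik[iter] - loglik[iter - 1]) / loglik[iter - 1]) < tol) {
    break
  }
}

length(loglik) # iterations actually used
```

Because the forest is refitted with fresh bootstrap samples in every pass, the log-likelihood fluctuates slightly between iterations, which is why an iteration cap is kept alongside the tolerance.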

Note that a MERFranger object is composed of a random forest of class ranger and a random effects model of class merMod. Thus, all generic functions available for these classes can be applied to the corresponding elements (Forest and EffectModel). For further details on generic functions see ranger and lmer as well as the examples below.
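
A short sketch of applying class-specific generics to the two components, assuming the SAEforest package and its eusilcA_smp data are available. The choice of importance = "impurity" is only there so that variable importance can be extracted afterwards.

```r
library(SAEforest)

data("eusilcA_smp")
income  <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

fit <- MERFranger(Y = income, X = X_covar, random = "(1|district)",
                  data = eusilcA_smp, importance = "impurity",
                  num.trees = 50)

# Forest component: methods for class ranger
print(fit$Forest)
vi <- ranger::importance(fit$Forest)
head(sort(vi, decreasing = TRUE))

# Random effects component: methods for class merMod
summary(fit$EffectModel)
re <- lme4::ranef(fit$EffectModel)
lme4::VarCorr(fit$EffectModel)
```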

References

Hajjem, A., Bellavance, F., & Larocque, D. (2014). Mixed-Effects Random Forest for Clustered Data. Journal of Statistical Computation and Simulation, 84 (6), 1313–1328.

Krennmair, P., & Schmid, T. (2022). Flexible Domain Prediction Using Mixed Effects Random Forests. Journal of the Royal Statistical Society: Series C (Applied Statistics) (forthcoming).

Examples

# \donttest{
# Load Data
data("eusilcA_pop")
data("eusilcA_smp")

income <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# Example 1:
# Calculating general model used in wrapper functions

model1 <- MERFranger(Y = income, X = X_covar, random = "(1|district)",
                     data = eusilcA_smp, num.trees = 50)

# get individual predictions:

ind_pred <- predict(model1, eusilcA_pop)
# }