
The function tune_parameters tunes the hyperparameters of the implemented MERF method. In essence, it is a modified wrapper around train from the package caret that treats MERFs as a custom method.

Usage

tune_parameters(
  Y,
  X,
  data,
  dName,
  trControl,
  tuneGrid,
  seed = 11235,
  gg_theme = theme_minimal(),
  plot_res = TRUE,
  return_plot = FALSE,
  na.rm = TRUE,
  ...
)

Arguments

Y

Continuous input value of target variable.

X

Matrix or data.frame of predictive covariates.

data

data.frame of survey sample data including the specified elements of Y and X.

dName

Character specifying the name of domain identifier, for which random intercepts are modeled.

trControl

Control parameters passed to train. The most important ones are method ("repeatedcv" for repeated k-fold cross-validation), number (the number of folds) and repeats (the number of repetitions). For further details see trainControl and the example below.

tuneGrid

A data.frame with possible tuning values. The columns must have the same names as the tuning parameters. For this tuning function the grid must comprise entries for the following parameters: num.trees, mtry, min.node.size, splitrule.

seed

Seed that makes cross-validation and tuning reproducible. Defaults to 11235.

gg_theme

Specify a predefined theme from ggplot2. Defaults to theme_minimal.

plot_res

Optional logical. If TRUE, the plot with results of cross-validation and tuning is shown. Defaults to TRUE.

return_plot

Logical. If TRUE, a list containing the comparative plot produced by ggplot2 is returned for further customization and processing. Defaults to FALSE.

na.rm

Logical. Whether missing values should be removed. Defaults to TRUE.

...

Additional parameters are directly passed to the random forest ranger and/or the training function train. For further details on possible parameters and examples see ranger or train.

Value

Prints requested optimal tuning parameters and (if requested) an additional comparative plot produced by ggplot2.

Details

Tuning can be performed on the following four parameters: num.trees (the number of trees in a forest), mtry (the number of variables considered as split candidates in each node), min.node.size (the minimal individual node size) and splitrule (the general splitting rule). For details see ranger.
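
As a minimal sketch of how a grid covering all four parameters might be built, the base R function expand.grid can cross several candidate values per parameter. The object name merfGridWide and the specific candidate values are illustrative assumptions, not recommendations; the splitrule values are taken from those ranger documents for regression, and whether each is suitable for a given MERF fit should be checked against ranger.

```r
# Hypothetical candidate values, chosen only for illustration.
merfGridWide <- expand.grid(
  num.trees     = c(50, 100),
  mtry          = c(3, 7, 9),
  min.node.size = c(5, 10),
  splitrule     = c("variance", "extratrees"),
  stringsAsFactors = FALSE
)

# The column names must match the tuning parameter names exactly.
names(merfGridWide)

# Each row is one candidate combination: 2 * 3 * 2 * 2 = 24 rows.
nrow(merfGridWide)
```

Every row of the grid is fitted once per resample, so the grid size multiplies directly with number and repeats in trControl; with 5 folds and 1 repetition, the 24 rows above would imply 120 model fits.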

Examples

# \donttest{
# Loading data
data("eusilcA_pop")
data("eusilcA_smp")
library(caret)
#> Loading required package: ggplot2
#> Loading required package: lattice

income <- eusilcA_smp$eqIncome
X_covar <- eusilcA_smp[, -c(1, 16, 17, 18)]

# Cross-validation settings
fitControl <- trainControl(method = "repeatedcv", number = 5,
                           repeats = 1)

# Define a tuning-grid
merfGrid <- expand.grid(num.trees = 50, mtry = c(3, 7, 9),
                        min.node.size = 10, splitrule = "variance")

tune_parameters(Y = income, X = X_covar, data = eusilcA_smp,
                dName = "district", trControl = fitControl,
                tuneGrid = merfGrid)
#> Warning: model fit failed for Fold1.Rep1: num.trees=50, mtry=3, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold1.Rep1: num.trees=50, mtry=7, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold1.Rep1: num.trees=50, mtry=9, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold2.Rep1: num.trees=50, mtry=3, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold2.Rep1: num.trees=50, mtry=7, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold2.Rep1: num.trees=50, mtry=9, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold3.Rep1: num.trees=50, mtry=3, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold3.Rep1: num.trees=50, mtry=7, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold3.Rep1: num.trees=50, mtry=9, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold4.Rep1: num.trees=50, mtry=3, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold4.Rep1: num.trees=50, mtry=7, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold4.Rep1: num.trees=50, mtry=9, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold5.Rep1: num.trees=50, mtry=3, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold5.Rep1: num.trees=50, mtry=7, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: model fit failed for Fold5.Rep1: num.trees=50, mtry=9, min.node.size=10, splitrule=variance Error in initializePtr() : 
#>   function 'cholmod_factor_ldetA' not provided by package 'Matrix'
#> Warning: There were missing values in resampled performance measures.
#> Something is wrong; all the RMSE metric values are missing:
#>       RMSE        Rsquared        MAE     
#>  Min.   : NA   Min.   : NA   Min.   : NA  
#>  1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
#>  Median : NA   Median : NA   Median : NA  
#>  Mean   :NaN   Mean   :NaN   Mean   :NaN  
#>  3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
#>  Max.   : NA   Max.   : NA   Max.   : NA  
#>  NA's   :3     NA's   :3     NA's   :3    
#> Error: Stopping
# }