This guide provides an introduction to applying the multicate package in health data analyses. The explanations below are based on the original paper that is based on for development of this package: Comparison of methods that combine multiple randomized trials to estimate heterogeneous treatment effects. Statistics in medicine (Brantner et al., 2024).1 The detailed github code and readme are in this link(https://github.com/dobengjhu/multicate).

1. Which dataset is suitable?

 

Q. How do I input my data into the function parameters?

To estiamte CATE, you can use the function estiamte_cate() and this function requires parameter like below:

  • estimation_method (string): Choose ‘slearner’ or ‘causalforest’
  • aggregation_method (String): Choose one of options below.
      1. studyindicator: Pool all data together but keep study ID as an indicator, including them as covariates in the single-study method.
      1. ensembleforest: Fit the estimation model within each study, apply each model to all indivudals across all studies, then fit ensemble random forest to augmented data.
      1. studyspecific: Fit the estimation model separately for each study, and report out study-specific estimates.
  • study_col (String): Name of the column of study indicator.Note that study_col can be NULL only when aggregation method is studyspecific. In other cases, study_col parameter should be filled with the specific column name, like ‘studyid’.
  • treatment_col (String): Name of the column of treatment.
  • outcome_col (String): Name of the column of outcome.
  • covariate_col (Vector): Names of the columns of covariates. The default is NULL. The length of string should be the same with the number of covariates. You can type the list like c('var1','var2','var3').
  • drop_col (Vector): Name of the columns to be deleted. You can type the list like c('var1','var2','var3'). Default is NULL.
  • extra_args (List): Add additional arguments for each estimation method. For example, when you choose causal forest for the estimation method you may adjust extra arguments to compelte splitting tress faster by setting a smaller number of trees. (e.g.extra_args = list(num.trees = 50))

 

Q. What kinds of objects are returned?

The result objects of estimate_cate() and summary() provide different information.

  • estimate_cate() result object:
    • ‘GRF forest object of type causal_forest’ (e.g. Number of trees, Number of training samples)
    • ‘variance importance tibble’
    • ‘original data with additional two columns, tau hat and variance of estimates’.
  • summary() of the result object of estimate_cate():
    • overall ATE (object ate)
    • its standard error
    • study-specific ATE (object studycate) with minimum, median, maximum CATE.

 

Q. What kinds of data should I have?

While the primary purpose of this package is for RCTs, but the base dataset can be from randomized controlled trials, observational studies, or a combination of both. Our functions are primarily designed for IPD (Individual Patient Data). The dataset must contain at least 4 key components, a study indicator variable, a treatment indicator variable, An outcome variable, and covariate variables.

 

Q. What data format is required?

You should be careful to use which data formats your data has. For this package, the treatment variable should be numeric (e.g. 0 or 1), the study ID should be character (factor is allowed.), and the outcome must be numeric. Note that this package is primarily designed for continuous outcome, and covariates should be all numeric.

Data column Data format
treatment numeric(0,1)
studyid character
outcome numeric
covariates numeric

 

Q. Can I have missing values?

Packages like grf (used for causal_forest) will support missing values in a specific way. The explanations below are from the grf github reference page.2

Missing values GRF can handle missing covariate values, meaning that a the input covariate matrix can contain NA in some cells instead of numerical values.

These missing values can be informative, in the sense that the fact that a covariate is missing can be predictive of the unobserved value itself or of the outcome Y, or it can be non-informative. For instance, an individual’s annual income could be missing because this individual is wealthy and prefers to conceal his/her income and this in turn is likely to be predictive, for instance, of the type of housing of this individual.

GRF handles missing values implicitly using the missing incorporated in attributes criterion (MIA; Twala et al., 2008), meaning that as soon as there is a missing value in a variable j, there are three candidate splits for a given threshold.

  • split the samples with observed value for j according to the threshold and send all samples with missing value for j to the left,

  • split the samples with observed value for j according to the threshold and send all samples with missing value for j to the right,

  • send all samples with observed value for j to the left and all samples with missing value for j to the right.

Note that by default, missing values are sent to the left, which implies that if the covariates from the training set are completely observed but there are missing values occurring in a new covariate matrix from the test set, then the prediction functions will not produce an error but simply send the incomplete observations to the left side of the corresponding nodes.

Missing values occur almost inevitably in most applications and the way to handle them depends on the type of analysis one wishes to perform and on the mechanism that generated these missing values. For a broader discussion on treatment effect estimation with missing attributes please see Mayer et al. (2019).

 

Q. What if I have categorical or factor variables in my covariates?

If you need to include factor variables, we recommend the following:

The brief summary of solutions suggested by the grf vignett is as follows:

  1. One-hot encoding: Use fastdummies to convert categorical variables into dummy variables easily.
  • Ensure that values do not contain spaces or special characters. For example, convert ‘study 1’ to ‘study_1’.
  1. Means encoding: Use the sufrep package for target/means encoding.

  2. Oridinal encoding: If your categorical variable has a natural order (e.g. months), encode it as a numeric variable accordingly.

See the grf vignette on categorical inputs for mode deail.3

 

Q. I applied one-hot encoding but it is still not working-Why?

Ensure that the column names or formulas passed to estimate_cate() do not contain spaces or logical connectors (like ‘and’). The janitor package is helpful for cleaning column names by replacing spaces with underscores (e.g., ‘variable name’ becomes ‘variable_name’). This step is especially important when using the ensemble forest method.

 

Q. Why did I encounter errors when using the causal forest with the study-specific method?

These errors may be caused by lack of sufficient variation or sparsity in one or more covariates. When a variable has very few observations in a particular category (e.g., a binary variable with only a handful of “Yes” values), it can lead to unstable splits in tree-based models like causal forest, or to overfitting, especially in small subgroups.

Before applying the model to data, it is recommended to examine the distribution of each covariate to identify rare levels or highly imbalanced variables that may affect model performance.

 

Q. What other packages are used internally?

  • slearner: Based on the dbarts and grfpackages.
    • The detailed arguments can be found in the paper by Hill (2011)4 and Künzel (2019)5. The default arguments are as follows: keeptrees is set to TRUE, and verbose is set to FALSE when aggregation_method is set to ensembleforest.
  • causalforest: Based on the package grf’s causal_forest.
    • You can refer to the detailed arguments in the paper by Athey et al. (2019)6. This is estimated by averaging the treatment effect across patients, accounting for counterfactual assignments. The default arguments are as follows: importance is set to impurity, and keep.inbag is set to TRUE.

 

Q. Why can’t I get Plot 4 (BLP) with ensemble forest?

Plot4 BLP(Best Linear projection figure) is not available for the ensemble forest method. You can refer to the table below to see which plots are supported by each combination of aggregation and estimation methods.

No Plots from plot() Aggregation Method Estimation Method
1 Histogram of estimated CATEs Available to all Available to all
2 Boxplot of CATEs by study ID Available to all Available to all
3 95% CI for all CATEs Study specific, Study indicator Available to all
4 Best Linear Projection Study specific, Study indicator Causal forest
5 Interpretation tree Available to all Available to all

 

2. Application

1) Load packages

To get started, install and load the multicate package using either install.packages(multicate) or pak::pak("dobengjhu/multicate"). If you wouldl like to know what kinds of data would be needed, for the exploration purpose, you can refer to our fake dataset dummy_tbl. This fake dataset includes three previously conducted trials comparing the same two treatments. It contains:

  • An outcome variable called response

  • A treatment indicator tx

  • Five signal covariates (var to var5) that may act as effect modifiers

You can view a summary of the dataset with summary(dummy_tbl).

In this document, we used the packages as follows.

# 1) Install multicate package
#install.packages("multicate")
#pak::pak("dobengjhu/multicate")
library(multicate)

#2) Additional packages for data cleansing
library(tibble)
#install.packages("tableone")
library(tableone)
library(tidyverse)
#install.packages("fastDummies")  # if not already installed
library(fastDummies)
#install.packages("kableExtra")
library(kableExtra)
#install.packages("performance")
library(performance)

 

2) RHC dataset

In this example, we use data from a study of Right Heart Catheterization (RHC), a diagnostic procedure used to measure cardiac function in critically ill patients. While RHC can guide urgent and ongoing treatment decisions, it also carries a risk of serious complications. The observational dataset originates from Murphy and Cluff (1990), was later reanalyzed by Connors et al. (1996), and has since become a benchmark dataset in causal inference research.

  • Data7

  • Dictionary8

The dataset (rhc.xls) includes information on 5,735 hospitalized adult patients across five U.S. medical centers. The treatment variable (column 2) indicates whether RHC was administered within 24 hours of admission (TRUE for treatment; FALSE for control). The outcome variable of this study is originally dth30 (column 54), records 30-day post-admission mortality (TRUE for death; FALSE for survival), but in this example, we set the outcome Glasgow Coma Score (scoma1 variable) as our continuous coutcome.

Covariate information (column 3 to 53): age, sex, race (black, white, other), years of education,income, type of medical insurance (private, Medicare, Medicaid, private and Medicare, Medicare and Medicaid, or none), primary disease category, secondary disease category, 10 categories of admission diagnosis, activities of daily living (ADL) and Duke Activity Status Index (DASI) 2 weeks before admission, do-not-resuscitate status on day 1 of admission, cancer (none, localized, metastatic), an estimate of the probability of surviving 2 months, acute physiology component of the APACHE III score, Glasgow Coma Score, weight, temperature, mean blood pressure, respiratory rate, heart rate, PaO2=FIO2 ratio, PaCO2, pH, WBC count, hematocrit, sodium, potassium, creatinine, bilirubin, albumin, urine output, 12 categories of comorbidity illness, and whether the patient transferred from another hospital. Since the number of covariates is too large for demonstration purposes, we will use a smaller subset in the analysis. In practice, however, you can use datasets with many variables.

To demonstrate the functionality of the multicate package in the context of multiple studies, we simulate study membership by arbitrarily dividing the dataset into five groups, treating them as if they came from five distinct medical centers. Although the original dataset does not indicate which patient belongs to which hospital, this simulated structure allows us to explore treatment effect heterogeneity across centers and individual-level covariates.

#Load data
rhc <- read.csv(file="rhc.xls", header=TRUE)
rhc <- rhc %>% mutate(across(where(is.logical), as.numeric))
rhc <- rhc %>% subset(select = c(-ID))
rhc$center <- rep(1:5, each=nrow(rhc)/5)
rhc$center <- as.character(rhc$center)

#Choose covariates to be used for this study
rhc <- rhc %>% select(treatment, center, scoma1,  
                      race, ninsclas,
                      hrt1, resp1, bili1, pafi1, aps1, temp1, das2d3pc, sod1, liverhx)
center treatment scoma1 race ninsclas hrt1 resp1 bili1 pafi1 aps1 temp1 das2d3pc sod1 liverhx
1 0 0 white Medicare 124 10 1.0097656 68.0000 46 38.69531 23.50000 145 0
1 1 0 white Private & Medicare 137 38 0.6999512 218.3125 50 38.89844 14.75195 137 0
1 1 0 white Private 130 40 1.0097656 275.5000 82 36.39844 18.13672 146 0
1 0 0 white Private & Medicare 58 26 0.3999634 156.6562 48 35.79688 22.92969 117 0
1 1 41 white Medicare 125 27 1.0097656 478.0000 72 34.79688 21.05078 126 0
1 0 0 white Medicare 134 36 1.0097656 184.1875 38 39.19531 17.50000 138 0
1 0 26 white Private 135 10 1.0097656 148.7500 29 39.39844 15.40625 136 0
1 0 100 white Private 102 34 0.8999023 240.0000 25 39.19531 29.14453 136 0
1 0 0 white Private 118 30 1.0998535 333.3125 47 38.39844 26.28906 136 0
1 1 0 white Medicaid 141 40 0.5000000 68.0000 48 38.50000 23.25781 146 0

 

3) Data preparation

As described above, the multicate package builds on other modeling packages, which require the input data to be fully numeric and free of missing (NA) values. To meet these requirements, we removed the missing values in the dataset accordingly.

After removing missing values (NA) and simplifying categorical variable levels, we convert all factor and character covariates variables into numeric or dummy variables using fastDummies package. The janitor package’s function, janitor::clean_names() is helpful for standardizing column names by replacing spaces and special characters (e.g., converting condition_no symptoms to condition_no_symptoms).

This is the table to show how dummy variables are created via fast_dummies.

race_other race_white ninsclas_medicare ninsclas_medicare_medicaid ninsclas_no_insurance ninsclas_private ninsclas_private_medicare
0 1 1 0 0 0 0
0 1 0 0 0 0 1
0 1 0 0 0 1 0
0 1 0 0 0 0 1
0 1 1 0 0 0 0

 

This table is the final table about the data we will use for analysis.

Dictionary of variables used in analysis
Category Variables Data.type Description
Treatment treatment num [treatment] RHC was administered within 24 hours of admission (Yes = 1, No = 0)
Studies (fake) center factor [studies] Arbitarily created variable referring to the hospitals/centers that RHC was administered at
Outcome scoma1 int [outcome] Glasgow Coma Score. It is used to assess a neurological status of a patient, with higher indicating a better prognosis.
Covariates hrt1 int Heart rate
resp1 num Respiratroy rate
bili1 num Bilirubin in critical illness. High levels are a marker of severity.
pafi1 num PaO2/FIO2 ratio (P/F ratio) is a measure of oxygenation in the blood used to assess respiratory failure and the severity of conditions like Acute Respiratory Distress Syndrome (ARDS).
aps1 int Acute physiology component of the APACHE score is a value from 0 to 60 that measures a critically ill patients physiological state based on 12 clinical variables collected within the first 24 hours of ICU admission.
temp1 num Temperature
das2d3pc num DASI (Duke Activity Status Index) is a 12-question, self-administered questionnaire used to measure a patient functional capacity, especially those with cardiovascular disease.
sod1 int Sodium
liverhx num Cirrhosis, Hepatic Failure
race_other int Race Other
race_white int Race White
ninsclas_medicare int Type of medical insurance - Medicare
ninsclas_medicare_medicaid int Type of medical insurance - Medicare and Medicaid
ninsclas_no_insurance int Type of medical insurance - No insurance
ninsclas_private int Type of medical insurance - Private
ninsclas_private_medicare int Type of medical insurance - Private and Medicare

4) Choose aggregation and estimation method

The multicate package provides tools to combine multiple RCTs to estimate CATE. It supports three options for handling multiple studies: (1) pooling trials using a study indicator ("studyindicator"), (2) aggregating datasets via an ensemble forest approach ("ensembleforest"), and (3) estimating effects separately within each study without pooling ("studyspecific"). Technically, the last option is not ‘aggregation of multiple studies’, but it is included for settings where trials are too heterogeneous to justify combining information.

For CATE estimation, the package implements two methods: (1) the S-learner, which fits a single model for the outcome, treating the treatment as one of covariates ("slearner"), and (2) the causal forest, which recursively partitions covariates to best split based on treatment effect heterogeneity. Further details can be found in Brantner et al.(2024)9 or in the ‘Background’ section of the website10

 

4-1) Studyindicator

The example below is the sample with causal forest for the estimation method, and study indicator for the aggregation method. Study indicator method pools all data together but keep study ID as an indicator variable and includes them as covariates in the single-study method selected in estiamtion method.

set.seed(100)
cate_mod2 <- estimate_cate(rhc_clean,
                          estimation_method = "causalforest",
                          aggregation_method = "studyindicator",
                          study_col = "center",
                          treatment_col = "treatment",
                          outcome_col = "scoma1",
                          covariate_col = NULL,
                          drop_col = c("race","ninsclas"))
summary_cate_mod2 <- summary(cate_mod2)

#result
#cate_mod2
#result summary
summary_cate_mod2
#> 
#> Estimation Method: causalforest
#> Aggregation Method: studyindicator
#> 
#> Variable Importance:
#>             hrt1     resp1     bili1      pafi1      aps1      temp1   das2d3pc
#> Value 0.08616683 0.2156197 0.1305946 0.07425747 0.2340573 0.08996657 0.06862788
#>             sod1    liverhx   race_other  race_white ninsclas_medicare
#> Value 0.04009024 0.02242125 0.0003493235 0.002782086       0.002467816
#>       ninsclas_medicare_medicaid ninsclas_no_insurance ninsclas_private
#> Value                  0.0011903           0.004070067      0.005487103
#>       ninsclas_private_medicare   center_1    center_2   center_3    center_4
#> Value               0.005270401 0.00320616 0.003143155 0.00296119 0.002510454
#>          center_5
#> Value 0.004759996
#> 
#> Overall ATE:
#>   Estimate Std. Error 
#>    -3.4392     0.7547 
#> 
#> Study-Specific ATE:
#>   Minimum CATE Median CATE Maximum CATE
#> 1       -11.55      -2.735        2.789
#> 2       -11.89      -2.738        2.523
#> 3       -12.25      -2.842        2.693
#> 4       -12.39      -2.959        2.738
#> 5       -11.21      -2.799        2.960

 

4-2) Comparison with Study-specific

Study specific method fits the estimation method separately for each study and report out study-specific estiamtes.

set.seed(100)
cate_mod <- estimate_cate(rhc_clean,
                          estimation_method = "causalforest",
                          aggregation_method = "studyspecific",
                          study_col = "center",
                          treatment_col = "treatment",
                          outcome_col = "scoma1",
                          covariate_col = NULL,
                          drop_col = c("race","ninsclas"))
summary_cate_mod <- summary(cate_mod)

#result
summary_cate_mod
#> 
#> Estimation Method: causalforest
#> Aggregation Method: studyspecific
#> 
#> Variable Importance:
#>                                       1            2            3           4
#> hrt1                       8.815784e-02 0.1578055817 0.1274604448 0.095335491
#> resp1                      3.590424e-01 0.0822063168 0.1319580095 0.105403118
#> bili1                      5.913912e-02 0.1064171989 0.0436751343 0.172664202
#> pafi1                      8.506052e-02 0.1229695073 0.1898693456 0.149199309
#> aps1                       1.004332e-01 0.1530942033 0.1619463391 0.128054155
#> temp1                      9.885726e-02 0.1599027173 0.1048710218 0.152842805
#> das2d3pc                   8.559751e-02 0.0671526922 0.1203708546 0.061798845
#> sod1                       6.227833e-02 0.0827432365 0.0943740709 0.091291067
#> liverhx                    1.825143e-02 0.0012113027 0.0033732315 0.018979862
#> race_other                 1.350108e-03 0.0009375085 0.0001988937 0.001009043
#> race_white                 3.908517e-03 0.0081592309 0.0027869287 0.002858875
#> ninsclas_medicare          3.787342e-03 0.0027478012 0.0055582432 0.002197582
#> ninsclas_medicare_medicaid 8.774907e-05 0.0027714856 0.0002542968 0.001523082
#> ninsclas_no_insurance      5.122805e-05 0.0083558126 0.0046291151 0.010114266
#> ninsclas_private           2.587322e-02 0.0247227159 0.0050555514 0.005159331
#> ninsclas_private_medicare  8.124280e-03 0.0188026885 0.0036185189 0.001568968
#>                                       5
#> hrt1                       0.1572957473
#> resp1                      0.1122038054
#> bili1                      0.1253152619
#> pafi1                      0.0861340315
#> aps1                       0.2649563450
#> temp1                      0.0881977771
#> das2d3pc                   0.0664309899
#> sod1                       0.0661750486
#> liverhx                    0.0006311238
#> race_other                 0.0017621565
#> race_white                 0.0032868101
#> ninsclas_medicare          0.0048008674
#> ninsclas_medicare_medicaid 0.0012967847
#> ninsclas_no_insurance      0.0086710562
#> ninsclas_private           0.0073342039
#> ninsclas_private_medicare  0.0055079907
#> 
#> Overall ATE:
#>   Estimate Std. Error 
#>     -3.417         NA 
#> 
#> Study-Specific ATE:
#>   Minimum CATE Median CATE Maximum CATE
#> 1      -22.339      -2.734       5.4008
#> 2       -6.629      -1.218       4.5192
#> 3      -14.692      -5.160       0.7346
#> 4      -10.975      -4.400       3.6490
#> 5      -11.593      -2.068       8.8241

Study indicator aggregation method does not provide standard error.

#study indicator
summary_cate_mod$ate
#>   Estimate Std. Error 
#>   -3.41695         NA

Study specific aggregation method does provide standard error information.

#study specific
summary_cate_mod2$ate
#>   Estimate Std. Error 
#> -3.4391637  0.7546843

5) Visualization of estimation

The plot() funciton with estimate_cate() objects provides five visualizations to help interpret the estiamted CATEs:

  1. Histogram of the CATEs
  • This plot displays the distribution of the estimated conditional average treatment effects (CATEs) across all individuals. The histogram helps assess the overall shape, spread, and variability of treatment effect heterogeneity. A wide distribution indicates meaningful variation in estimated effects, while a narrow distribution suggests limited heterogeneity.
  1. Boxplot of CATEs stratified by study membership
  • When multiple trials are analyzed together, this plot compares the distribution of CATE estimates across studies. Each box represents a study and summarizes its median, interquartile range, and overall spread. Differences in the distributions highlight how treatment effect heterogeneity may vary across trials.
  1. Confidence interval plot shoing 95% confidence intervals for all CATEs, sorted by their estiamted CATEs
  • This plot shows individual-level CATE estimates along with their 95% confidence intervals, ordered from smallest to largest estimated effect. It provides a detailed view of uncertainty around each estimate and allows users to visually assess the range of plausible treatment effects for each observation.
  1. Best linear projection (available only when estimation_method = "causalforest")
  • This plot summarizes how each covariate is linearly associated with the estimated CATE using the Best Linear Projection (BLP) from generalized random forests. Each point represents a covariate’s estimated contribution to treatment effect heterogeneity, with error bars showing 95% confidence intervals. Panels correspond to different studies, enabling comparison of effect modification patterns across trials.

For example, in the fourth plot below, study 1 shows a BLP coefficient for hrt1 (heart rate) close to zero, whereas study 2 shows a coefficient of approximately 6. This indicates that in study 2, individuals with higher heart rates tend to have larger treatment effect heterogeneity, while heart rate appears to have little relationship with treatment effect heterogeneity in study 1.

  1. Interpretation tree for visualizing how covariates drive treatment effect heterogeneity.
  • This tree provides a simplified decision-tree how the model partitions the covariate space to explain treatment effect heterogeneity. It can offer an interpretable summary of key covariates and splits that drive differences in estimated treatment effects, helping users visualize major sources of heterogeneity.

In the example below, resp1(respiratory rate) and aps1(acute physiology component of the APACHE score, meaning a critical illness level within the first 24 hours of ICU admission) appear as the primary splitting variables, indicating they are key modifiers of the treatment effect.

#plot(cate_mod)

Additionally, the plot_vteffect() provides a more focused visualization if you are interested in exploring treatment effect heterogeneity with respect to a specific continuous covariate. It plots the estimated CATE (tau_hat) for each observation against the selected covariate, allowing you to see how treatment effects vary across its range and across different studies.

For example, if you’re interested in how treatment effects differ by aps1 across studies, you can specify this variable using the covariate_name = "aps1" parameter.

plot_vteffect(cate_mod, covariate_name = "aps1")

 


  1. Brantner, C. L., Nguyen, T. Q., Tang, T., Zhao, C., Hong, H., & Stuart, E. A. (2024). Comparison of methods that combine multiple randomized trials to estimate heterogeneous treatment effects. Statistics in medicine, 43(7), 1291-1314. https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9955↩︎

  2. https://grf-labs.github.io/grf/REFERENCE.html#missing-values-1↩︎

  3. https://grf-labs.github.io/grf/articles/categorical_inputs.html↩︎

  4. Hill, J. L. (2011). Bayesian Nonparametric Modeling for Causal Inference. Journal of Computational and Graphical Statistics, 20(1), 217–240. https://doi.org/10.1198/jcgs.2010.08162↩︎

  5. Künzel, S. R., Sekhon, J. S., Bickel, P. J., & Yu, B. (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116(10), 4156-4165. https://www.pnas.org/doi/abs/10.1073/pnas.1804597116↩︎

  6. Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. https://arxiv.org/abs/1610.01271↩︎

  7. https://hbiostat.org/data↩︎

  8. https://hbiostat.org/data/repo/rhc↩︎

  9. Brantner, C. L., Nguyen, T. Q., Tang, T., Zhao, C., Hong, H., & Stuart, E. A. (2024). Comparison of methods that combine multiple randomized trials to estimate heterogeneous treatment effects. Statistics in medicine, 43(7), 1291-1314. https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9955↩︎

  10. https://www.elizabethstuart.org/hte/↩︎