This guide provides an introduction to applying the multicate package in health data analyses. The explanations below are based on the original paper that is based on for development of this package: Comparison of methods that combine multiple randomized trials to estimate heterogeneous treatment effects. Statistics in medicine (Brantner et al., 2024).¹ The detailed github code and readme are in this link(https://github.com/dobengjhu/multicate).

1. Which dataset is suitable?

Q. How do I input my data into the function parameters?

To estiamte CATE, you can use the function estiamte_cate() and this function requires parameter like below:

estimation_method (string): Choose ‘slearner’ or ‘causalforest’
aggregation_method (String): Choose one of options below.
- 1. studyindicator: Pool all data together but keep study ID as an indicator, including them as covariates in the single-study method.
- 1. ensembleforest: Fit the estimation model within each study, apply each model to all indivudals across all studies, then fit ensemble random forest to augmented data.
- 1. studyspecific: Fit the estimation model separately for each study, and report out study-specific estimates.
study_col (String): Name of the column of study indicator.Note that study_col can be NULL only when aggregation method is studyspecific. In other cases, study_col parameter should be filled with the specific column name, like ‘studyid’.
treatment_col (String): Name of the column of treatment.
outcome_col (String): Name of the column of outcome.
covariate_col (Vector): Names of the columns of covariates. The default is NULL. The length of string should be the same with the number of covariates. You can type the list like c('var1','var2','var3').
drop_col (Vector): Name of the columns to be deleted. You can type the list like c('var1','var2','var3'). Default is NULL.
extra_args (List): Add additional arguments for each estimation method. For example, when you choose causal forest for the estimation method you may adjust extra arguments to compelte splitting tress faster by setting a smaller number of trees. (e.g.extra_args = list(num.trees = 50))

Q. What kinds of objects are returned?

The result objects of estimate_cate() and summary() provide different information.

estimate_cate() result object:
- ‘GRF forest object of type causal_forest’ (e.g. Number of trees, Number of training samples)
- ‘variance importance tibble’
- ‘original data with additional two columns, tau hat and variance of estimates’.
summary() of the result object of estimate_cate():
- overall ATE (object ate)
- its standard error
- study-specific ATE (object studycate) with minimum, median, maximum CATE.

Q. What kinds of data should I have?

While the primary purpose of this package is for RCTs, but the base dataset can be from randomized controlled trials, observational studies, or a combination of both. Our functions are primarily designed for IPD (Individual Patient Data). The dataset must contain at least 4 key components, a study indicator variable, a treatment indicator variable, An outcome variable, and covariate variables.

Q. What data format is required?

You should be careful to use which data formats your data has. For this package, the treatment variable should be numeric (e.g. 0 or 1), the study ID should be character (factor is allowed.), and the outcome must be numeric. Note that this package is primarily designed for continuous outcome, and covariates should be all numeric.

Data column	Data format
treatment	numeric(0,1)
studyid	character
outcome	numeric
covariates	numeric

Q. Can I have missing values?

Packages like grf (used for causal_forest) will support missing values in a specific way. The explanations below are from the grf github reference page.²

Missing values GRF can handle missing covariate values, meaning that a the input covariate matrix can contain NA in some cells instead of numerical values.

These missing values can be informative, in the sense that the fact that a covariate is missing can be predictive of the unobserved value itself or of the outcome Y, or it can be non-informative. For instance, an individual’s annual income could be missing because this individual is wealthy and prefers to conceal his/her income and this in turn is likely to be predictive, for instance, of the type of housing of this individual.

GRF handles missing values implicitly using the missing incorporated in attributes criterion (MIA; Twala et al., 2008), meaning that as soon as there is a missing value in a variable j, there are three candidate splits for a given threshold.

split the samples with observed value for j according to the threshold and send all samples with missing value for j to the left,
split the samples with observed value for j according to the threshold and send all samples with missing value for j to the right,
send all samples with observed value for j to the left and all samples with missing value for j to the right.

Note that by default, missing values are sent to the left, which implies that if the covariates from the training set are completely observed but there are missing values occurring in a new covariate matrix from the test set, then the prediction functions will not produce an error but simply send the incomplete observations to the left side of the corresponding nodes.

Missing values occur almost inevitably in most applications and the way to handle them depends on the type of analysis one wishes to perform and on the mechanism that generated these missing values. For a broader discussion on treatment effect estimation with missing attributes please see Mayer et al. (2019).

Q. What if I have categorical or factor variables in my covariates?

If you need to include factor variables, we recommend the following:

The brief summary of solutions suggested by the grf vignett is as follows:

One-hot encoding: Use fastdummies to convert categorical variables into dummy variables easily.

Ensure that values do not contain spaces or special characters. For example, convert ‘study 1’ to ‘study_1’.

Means encoding: Use the sufrep package for target/means encoding.
Oridinal encoding: If your categorical variable has a natural order (e.g. months), encode it as a numeric variable accordingly.

See the grf vignette on categorical inputs for mode deail.³

Q. I applied one-hot encoding but it is still not working-Why?

Ensure that the column names or formulas passed to estimate_cate() do not contain spaces or logical connectors (like ‘and’). The janitor package is helpful for cleaning column names by replacing spaces with underscores (e.g., ‘variable name’ becomes ‘variable_name’). This step is especially important when using the ensemble forest method.

Q. Why did I encounter errors when using the causal forest with the study-specific method?

These errors may be caused by lack of sufficient variation or sparsity in one or more covariates. When a variable has very few observations in a particular category (e.g., a binary variable with only a handful of “Yes” values), it can lead to unstable splits in tree-based models like causal forest, or to overfitting, especially in small subgroups.

Before applying the model to data, it is recommended to examine the distribution of each covariate to identify rare levels or highly imbalanced variables that may affect model performance.

Q. What other packages are used internally?

slearner: Based on the dbarts and grfpackages.
- The detailed arguments can be found in the paper by Hill (2011)⁴ and Künzel (2019)⁵. The default arguments are as follows: keeptrees is set to TRUE, and verbose is set to FALSE when aggregation_method is set to ensembleforest.
causalforest: Based on the package grf’s causal_forest.
- You can refer to the detailed arguments in the paper by Athey et al. (2019)⁶. This is estimated by averaging the treatment effect across patients, accounting for counterfactual assignments. The default arguments are as follows: importance is set to impurity, and keep.inbag is set to TRUE.

Q. Why can’t I get Plot 4 (BLP) with ensemble forest?

Plot4 BLP(Best Linear projection figure) is not available for the ensemble forest method. You can refer to the table below to see which plots are supported by each combination of aggregation and estimation methods.

No	Plots from `plot()`	Aggregation Method	Estimation Method
1	Histogram of estimated CATEs	Available to all	Available to all
2	Boxplot of CATEs by study ID	Available to all	Available to all
3	95% CI for all CATEs	Study specific, Study indicator	Available to all
4	Best Linear Projection	Study specific, Study indicator	Causal forest
5	Interpretation tree	Available to all	Available to all

2. Application

1) Load packages

To get started, install and load the multicate package using either install.packages(multicate) or pak::pak("dobengjhu/multicate"). If you wouldl like to know what kinds of data would be needed, for the exploration purpose, you can refer to our fake dataset dummy_tbl. This fake dataset includes three previously conducted trials comparing the same two treatments. It contains:

An outcome variable called response
A treatment indicator tx
Five signal covariates (var to var5) that may act as effect modifiers

You can view a summary of the dataset with summary(dummy_tbl).

In this document, we used the packages as follows.

# 1) Install multicate package
#install.packages("multicate")
#pak::pak("dobengjhu/multicate")
library(multicate)

#2) Additional packages for data cleansing
library(tibble)
#install.packages("tableone")
library(tableone)
library(tidyverse)
#install.packages("fastDummies")  # if not already installed
library(fastDummies)
#install.packages("kableExtra")
library(kableExtra)
#install.packages("performance")
library(performance)

2) RHC dataset

In this example, we use data from a study of Right Heart Catheterization (RHC), a diagnostic procedure used to measure cardiac function in critically ill patients. While RHC can guide urgent and ongoing treatment decisions, it also carries a risk of serious complications. The observational dataset originates from Murphy and Cluff (1990), was later reanalyzed by Connors et al. (1996), and has since become a benchmark dataset in causal inference research.

Data⁷
Dictionary⁸

The dataset (rhc.xls) includes information on 5,735 hospitalized adult patients across five U.S. medical centers. The treatment variable (column 2) indicates whether RHC was administered within 24 hours of admission (TRUE for treatment; FALSE for control). The outcome variable of this study is originally dth30 (column 54), records 30-day post-admission mortality (TRUE for death; FALSE for survival), but in this example, we set the outcome Glasgow Coma Score (scoma1 variable) as our continuous coutcome.

Covariate information (column 3 to 53): age, sex, race (black, white, other), years of education,income, type of medical insurance (private, Medicare, Medicaid, private and Medicare, Medicare and Medicaid, or none), primary disease category, secondary disease category, 10 categories of admission diagnosis, activities of daily living (ADL) and Duke Activity Status Index (DASI) 2 weeks before admission, do-not-resuscitate status on day 1 of admission, cancer (none, localized, metastatic), an estimate of the probability of surviving 2 months, acute physiology component of the APACHE III score, Glasgow Coma Score, weight, temperature, mean blood pressure, respiratory rate, heart rate, PaO2=FIO2 ratio, PaCO2, pH, WBC count, hematocrit, sodium, potassium, creatinine, bilirubin, albumin, urine output, 12 categories of comorbidity illness, and whether the patient transferred from another hospital. Since the number of covariates is too large for demonstration purposes, we will use a smaller subset in the analysis. In practice, however, you can use datasets with many variables.

To demonstrate the functionality of the multicate package in the context of multiple studies, we simulate study membership by arbitrarily dividing the dataset into five groups, treating them as if they came from five distinct medical centers. Although the original dataset does not indicate which patient belongs to which hospital, this simulated structure allows us to explore treatment effect heterogeneity across centers and individual-level covariates.

#Load data
rhc <- read.csv(file="rhc.xls", header=TRUE)
rhc <- rhc %>% mutate(across(where(is.logical), as.numeric))
rhc <- rhc %>% subset(select = c(-ID))
rhc$center <- rep(1:5, each=nrow(rhc)/5)
rhc$center <- as.character(rhc$center)

#Choose covariates to be used for this study
rhc <- rhc %>% select(treatment, center, scoma1,  
                      race, ninsclas,
                      hrt1, resp1, bili1, pafi1, aps1, temp1, das2d3pc, sod1, liverhx)

center	treatment	scoma1	race	ninsclas	hrt1	resp1	bili1	pafi1	aps1	temp1	das2d3pc	sod1
1	0	0	white	Medicare	124	10	1.0097656	68.0000	46	38.69531	23.50000	145
1	1	0	white	Private & Medicare	137	38	0.6999512	218.3125	50	38.89844	14.75195	137
1	1	0	white	Private	130	40	1.0097656	275.5000	82	36.39844	18.13672	146
1	0	0	white	Private & Medicare	58	26	0.3999634	156.6562	48	35.79688	22.92969	117
1	1	41	white	Medicare	125	27	1.0097656	478.0000	72	34.79688	21.05078	126
1	0	0	white	Medicare	134	36	1.0097656	184.1875	38	39.19531	17.50000	138
1	0	26	white	Private	135	10	1.0097656	148.7500	29	39.39844	15.40625	136
1	0	100	white	Private	102	34	0.8999023	240.0000	25	39.19531	29.14453	136
1	0	0	white	Private	118	30	1.0998535	333.3125	47	38.39844	26.28906	136
1	1	0	white	Medicaid	141	40	0.5000000	68.0000	48	38.50000	23.25781	146

3) Data preparation

As described above, the multicate package builds on other modeling packages, which require the input data to be fully numeric and free of missing (NA) values. To meet these requirements, we removed the missing values in the dataset accordingly.

After removing missing values (NA) and simplifying categorical variable levels, we convert all factor and character covariates variables into numeric or dummy variables using fastDummies package. The janitor package’s function, janitor::clean_names() is helpful for standardizing column names by replacing spaces and special characters (e.g., converting condition_no symptoms to condition_no_symptoms).

This is the table to show how dummy variables are created via fast_dummies.

race_white	ninsclas_medicare	ninsclas_private	ninsclas_private_medicare
1	1	0	0
1	0	0	1
1	0	1	0
1	0	0	1
1	1	0	0

This table is the final table about the data we will use for analysis.

Dictionary of variables used in analysis
Category	Variables	Data.type	Description
Treatment	treatment	num	[treatment] RHC was administered within 24 hours of admission (Yes = 1, No = 0)
Studies	(fake) center	factor	[studies] Arbitarily created variable referring to the hospitals/centers that RHC was administered at
Outcome	scoma1	int	[outcome] Glasgow Coma Score. It is used to assess a neurological status of a patient, with higher indicating a better prognosis.
Covariates	hrt1	int	Heart rate
	resp1	num	Respiratroy rate
	bili1	num	Bilirubin in critical illness. High levels are a marker of severity.
	pafi1	num	PaO2/FIO2 ratio (P/F ratio) is a measure of oxygenation in the blood used to assess respiratory failure and the severity of conditions like Acute Respiratory Distress Syndrome (ARDS).
	aps1	int	Acute physiology component of the APACHE score is a value from 0 to 60 that measures a critically ill patients physiological state based on 12 clinical variables collected within the first 24 hours of ICU admission.
	temp1	num	Temperature
	das2d3pc	num	DASI (Duke Activity Status Index) is a 12-question, self-administered questionnaire used to measure a patient functional capacity, especially those with cardiovascular disease.
	sod1	int	Sodium
	liverhx	num	Cirrhosis, Hepatic Failure
	race_other	int	Race Other
	race_white	int	Race White
	ninsclas_medicare	int	Type of medical insurance - Medicare
	ninsclas_medicare_medicaid	int	Type of medical insurance - Medicare and Medicaid
	ninsclas_no_insurance	int	Type of medical insurance - No insurance
	ninsclas_private	int	Type of medical insurance - Private
	ninsclas_private_medicare	int	Type of medical insurance - Private and Medicare

4) Choose aggregation and estimation method

The multicate package provides tools to combine multiple RCTs to estimate CATE. It supports three options for handling multiple studies: (1) pooling trials using a study indicator ("studyindicator"), (2) aggregating datasets via an ensemble forest approach ("ensembleforest"), and (3) estimating effects separately within each study without pooling ("studyspecific"). Technically, the last option is not ‘aggregation of multiple studies’, but it is included for settings where trials are too heterogeneous to justify combining information.

For CATE estimation, the package implements two methods: (1) the S-learner, which fits a single model for the outcome, treating the treatment as one of covariates ("slearner"), and (2) the causal forest, which recursively partitions covariates to best split based on treatment effect heterogeneity. Further details can be found in Brantner et al.(2024)⁹ or in the ‘Background’ section of the website¹⁰

4-1) Studyindicator

The example below is the sample with causal forest for the estimation method, and study indicator for the aggregation method. Study indicator method pools all data together but keep study ID as an indicator variable and includes them as covariates in the single-study method selected in estiamtion method.

set.seed(100)
cate_mod2 <- estimate_cate(rhc_clean,
                          estimation_method = "causalforest",
                          aggregation_method = "studyindicator",
                          study_col = "center",
                          treatment_col = "treatment",
                          outcome_col = "scoma1",
                          covariate_col = NULL,
                          drop_col = c("race","ninsclas"))
summary_cate_mod2 <- summary(cate_mod2)

#result
#cate_mod2

#result summary
summary_cate_mod2
#> 
#> Estimation Method: causalforest
#> Aggregation Method: studyindicator
#> 
#> Variable Importance:
#>             hrt1     resp1     bili1      pafi1      aps1      temp1   das2d3pc
#> Value 0.08616683 0.2156197 0.1305946 0.07425747 0.2340573 0.08996657 0.06862788
#>             sod1    liverhx   race_other  race_white ninsclas_medicare
#> Value 0.04009024 0.02242125 0.0003493235 0.002782086       0.002467816
#>       ninsclas_medicare_medicaid ninsclas_no_insurance ninsclas_private
#> Value                  0.0011903           0.004070067      0.005487103
#>       ninsclas_private_medicare   center_1    center_2   center_3    center_4
#> Value               0.005270401 0.00320616 0.003143155 0.00296119 0.002510454
#>          center_5
#> Value 0.004759996
#> 
#> Overall ATE:
#>   Estimate Std. Error 
#>    -3.4392     0.7547 
#> 
#> Study-Specific ATE:
#>   Minimum CATE Median CATE Maximum CATE
#> 1       -11.55      -2.735        2.789
#> 2       -11.89      -2.738        2.523
#> 3       -12.25      -2.842        2.693
#> 4       -12.39      -2.959        2.738
#> 5       -11.21      -2.799        2.960

4-2) Comparison with Study-specific

Study specific method fits the estimation method separately for each study and report out study-specific estiamtes.

set.seed(100)
cate_mod <- estimate_cate(rhc_clean,
                          estimation_method = "causalforest",
                          aggregation_method = "studyspecific",
                          study_col = "center",
                          treatment_col = "treatment",
                          outcome_col = "scoma1",
                          covariate_col = NULL,
                          drop_col = c("race","ninsclas"))
summary_cate_mod <- summary(cate_mod)

#result
summary_cate_mod
#> 
#> Estimation Method: causalforest
#> Aggregation Method: studyspecific
#> 
#> Variable Importance:
#>                                       1            2            3           4
#> hrt1                       8.815784e-02 0.1578055817 0.1274604448 0.095335491
#> resp1                      3.590424e-01 0.0822063168 0.1319580095 0.105403118
#> bili1                      5.913912e-02 0.1064171989 0.0436751343 0.172664202
#> pafi1                      8.506052e-02 0.1229695073 0.1898693456 0.149199309
#> aps1                       1.004332e-01 0.1530942033 0.1619463391 0.128054155
#> temp1                      9.885726e-02 0.1599027173 0.1048710218 0.152842805
#> das2d3pc                   8.559751e-02 0.0671526922 0.1203708546 0.061798845
#> sod1                       6.227833e-02 0.0827432365 0.0943740709 0.091291067
#> liverhx                    1.825143e-02 0.0012113027 0.0033732315 0.018979862
#> race_other                 1.350108e-03 0.0009375085 0.0001988937 0.001009043
#> race_white                 3.908517e-03 0.0081592309 0.0027869287 0.002858875
#> ninsclas_medicare          3.787342e-03 0.0027478012 0.0055582432 0.002197582
#> ninsclas_medicare_medicaid 8.774907e-05 0.0027714856 0.0002542968 0.001523082
#> ninsclas_no_insurance      5.122805e-05 0.0083558126 0.0046291151 0.010114266
#> ninsclas_private           2.587322e-02 0.0247227159 0.0050555514 0.005159331
#> ninsclas_private_medicare  8.124280e-03 0.0188026885 0.0036185189 0.001568968
#>                                       5
#> hrt1                       0.1572957473
#> resp1                      0.1122038054
#> bili1                      0.1253152619
#> pafi1                      0.0861340315
#> aps1                       0.2649563450
#> temp1                      0.0881977771
#> das2d3pc                   0.0664309899
#> sod1                       0.0661750486
#> liverhx                    0.0006311238
#> race_other                 0.0017621565
#> race_white                 0.0032868101
#> ninsclas_medicare          0.0048008674
#> ninsclas_medicare_medicaid 0.0012967847
#> ninsclas_no_insurance      0.0086710562
#> ninsclas_private           0.0073342039
#> ninsclas_private_medicare  0.0055079907
#> 
#> Overall ATE:
#>   Estimate Std. Error 
#>     -3.417         NA 
#> 
#> Study-Specific ATE:
#>   Minimum CATE Median CATE Maximum CATE
#> 1      -22.339      -2.734       5.4008
#> 2       -6.629      -1.218       4.5192
#> 3      -14.692      -5.160       0.7346
#> 4      -10.975      -4.400       3.6490
#> 5      -11.593      -2.068       8.8241

Study indicator aggregation method does not provide standard error.

#study indicator
summary_cate_mod$ate
#>   Estimate Std. Error 
#>   -3.41695         NA

Study specific aggregation method does provide standard error information.

#study specific
summary_cate_mod2$ate
#>   Estimate Std. Error 
#> -3.4391637  0.7546843

5) Visualization of estimation

The plot() funciton with estimate_cate() objects provides five visualizations to help interpret the estiamted CATEs:

Histogram of the CATEs

This plot displays the distribution of the estimated conditional average treatment effects (CATEs) across all individuals. The histogram helps assess the overall shape, spread, and variability of treatment effect heterogeneity. A wide distribution indicates meaningful variation in estimated effects, while a narrow distribution suggests limited heterogeneity.

Boxplot of CATEs stratified by study membership

When multiple trials are analyzed together, this plot compares the distribution of CATE estimates across studies. Each box represents a study and summarizes its median, interquartile range, and overall spread. Differences in the distributions highlight how treatment effect heterogeneity may vary across trials.

Confidence interval plot shoing 95% confidence intervals for all CATEs, sorted by their estiamted CATEs

This plot shows individual-level CATE estimates along with their 95% confidence intervals, ordered from smallest to largest estimated effect. It provides a detailed view of uncertainty around each estimate and allows users to visually assess the range of plausible treatment effects for each observation.

Best linear projection (available only when estimation_method = "causalforest")

This plot summarizes how each covariate is linearly associated with the estimated CATE using the Best Linear Projection (BLP) from generalized random forests. Each point represents a covariate’s estimated contribution to treatment effect heterogeneity, with error bars showing 95% confidence intervals. Panels correspond to different studies, enabling comparison of effect modification patterns across trials.

For example, in the fourth plot below, study 1 shows a BLP coefficient for hrt1 (heart rate) close to zero, whereas study 2 shows a coefficient of approximately 6. This indicates that in study 2, individuals with higher heart rates tend to have larger treatment effect heterogeneity, while heart rate appears to have little relationship with treatment effect heterogeneity in study 1.

Interpretation tree for visualizing how covariates drive treatment effect heterogeneity.

This tree provides a simplified decision-tree how the model partitions the covariate space to explain treatment effect heterogeneity. It can offer an interpretable summary of key covariates and splits that drive differences in estimated treatment effects, helping users visualize major sources of heterogeneity.

In the example below, resp1(respiratory rate) and aps1(acute physiology component of the APACHE score, meaning a critical illness level within the first 24 hours of ICU admission) appear as the primary splitting variables, indicating they are key modifiers of the treatment effect.

#plot(cate_mod)

Additionally, the plot_vteffect() provides a more focused visualization if you are interested in exploring treatment effect heterogeneity with respect to a specific continuous covariate. It plots the estimated CATE (tau_hat) for each observation against the selected covariate, allowing you to see how treatment effects vary across its range and across different studies.

For example, if you’re interested in how treatment effects differ by aps1 across studies, you can specify this variable using the covariate_name = "aps1" parameter.

plot_vteffect(cate_mod, covariate_name = "aps1")

Brantner, C. L., Nguyen, T. Q., Tang, T., Zhao, C., Hong, H., & Stuart, E. A. (2024). Comparison of methods that combine multiple randomized trials to estimate heterogeneous treatment effects. Statistics in medicine, 43(7), 1291-1314. https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9955 ↩︎
https://grf-labs.github.io/grf/REFERENCE.html#missing-values-1 ↩︎
https://grf-labs.github.io/grf/articles/categorical_inputs.html ↩︎
Hill, J. L. (2011). Bayesian Nonparametric Modeling for Causal Inference. Journal of Computational and Graphical Statistics, 20(1), 217–240. https://doi.org/10.1198/jcgs.2010.08162 ↩︎
Künzel, S. R., Sekhon, J. S., Bickel, P. J., & Yu, B. (2019). Metalearners for estimating heterogeneous treatment effects using machine learning. Proceedings of the national academy of sciences, 116(10), 4156-4165. https://www.pnas.org/doi/abs/10.1073/pnas.1804597116 ↩︎
Athey, S., Tibshirani, J., & Wager, S. (2019). Generalized random forests. https://arxiv.org/abs/1610.01271 ↩︎
https://hbiostat.org/data ↩︎
https://hbiostat.org/data/repo/rhc ↩︎
Brantner, C. L., Nguyen, T. Q., Tang, T., Zhao, C., Hong, H., & Stuart, E. A. (2024). Comparison of methods that combine multiple randomized trials to estimate heterogeneous treatment effects. Statistics in medicine, 43(7), 1291-1314. https://onlinelibrary.wiley.com/doi/abs/10.1002/sim.9955 ↩︎
https://www.elizabethstuart.org/hte/↩︎

Applying multicate

Kyungeun Jeon, Carly Brantner, Daniel Obeng, Elizabeth Stuart

2025-11-27