Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor 
Graduation Model
Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Envigado: Fondo Editorial EIA, 2021.
© 2021 Fondo Editorial EIA
© 2021 Universidad EIA
Envigado, junio de 2021
Dirección editorial: Mauricio Andrés Misas Ruiz
Diagramación: Marcela Londoño Gómez 
Fondo Editorial EIA
Sede Las Palmas: Km. 2 + 200 vía al Aeropuerto José María Córdova. Envigado, 
Colombia.
Código Postal: 055428
Tel.: (57 + 4) 3549090 - opc. 1. Ext. 223 - 314 / Correo electrónico: editorial@eia.edu.co
http://www.eia.edu.co/fondoeditorial
Se permite la reproducción de este documento bajo la debida citación de sus 
autores.
Este documento es responsabilidad de sus autores y no compromete el 
pensamiento de la Universidad EIA ni de ninguno de sus órganos de gobierno.
Grow the pie or have it? 
Using machine learning for impact 
heterogeneity in the Ultra-poor 
Graduation Model*
   Reajul Chowdhury1* 
Federico Ceballos-Sierra2 
Munshi Sulaiman3
Abstract
Anti-poverty interventions often face a trade-off between immediate reduction in poverty, 
measured by consumption, and building assets for longer-term gains. An “Ultra-poor Gra-
duation” model, found effective on both dimensions in several rigorous studies, generally 
leans towards asset building. By using data from a large-scale RCT in Bangladesh, we find sig-
nificant variation in impact on assets where the top quintile gainers experience asset growth 
of 344% while asset growth is only 192% for the bottom quintile. Heterogeneity in impact on 
household expenditures is found to be present but of lower magnitude than that of assets. Im-
portantly, the machine learning techniques we apply reveal contrasts in characteristics of be-
neficiaries who made the most in assets vs. consumption. The results identify beneficiary cha-
racteristics that can be used in targeting households either to maximize impact on the desired 
dimension and/or to customize interventions for balancing the asset and consumption trade-off. 
 
Keywords: Ultra-poor, Impact heterogeneity, Machine Learning, Bangladesh. 
JEL Classification: O12, I39
1   mac7@illinois.edu. Ph.D. Student, Department of Agricultural and Consumer Economics. Universi-
ty of Illinois at Urbana-Champaign. Urbana, Illinois, USA
2  federico.ceballos@eia.edu.co. Ph.D. Student, Department of Agricultural and Consumer Economics 
University of Illinois at Urbana-Champaign Urbana, Illinois, USA. Assistant Professor, Department of 
Economic and Managerial Sciences. Universidad EIA. Envigado, Antioquia
3  munshi.sulaiman@brac.net. Regional Research Lead BRAC International. Kampala, Uganda.
* Corresponding author 
* We are grateful to Dr. Benjamin Crost, Dr. Alex Winter-Nelson, and Dr. Imran Matin for their guid-
ance and feedback on the analysis. Special thanks to Dr. Stefan Wager for prompt responses on use 
of the GRF package and interpreting the coefficients. We also acknowledge BRAC and LSE research 
team who worked on the project that our data comes from. All errors are ours. Corresponding author 
Reajul Chowdhury mac7@illinois.edu
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
I.  Introduction
Achieving the global ambition of ending poverty in all its forms everywhere by 2030, as pos-
tulated in the first sustainable development goal (SDG), will require scaling up successful 
poverty alleviating programs that lift people out of poverty and sustain their impacts over 
the long-term. However, there is an inherent trade-off in anti-poverty programs between 
supporting immediate reduction in poverty (generally measured by consumption or expen-
diture) and encouraging asset accumulation for relatively longer-term change in poverty. 
This trade-off is indirectly discussed in poverty trap literature where the empirical evidence 
is mixed, but asset-based poverty dynamic generally found to be more salient than consump-
tion or income-based measures (e.g. Ikegami et al. 2016; Carter and Barrett 2006; Quisumb-
ing and Baulch 2013). In a more recent paper, Balboni et al (2021) find evidence of the pov-
erty traps by looking at the impact of asset transfer, where being above or below a threshold 
results in asset accumulation or depletion. In terms of impact analysis of different types of 
interventions, a meta-analysis of cost-effectiveness of alternative livelihood support inter-
ventions by Sulaiman (2018) also reveals this trade-off whereby unconditional cash trans-
fers are more attractive in the short-run whereas more comprehensive interventions fare 
better in the long run. Therefore, being able to identify characteristics of households who 
are likely to have lower impacts on assets can improve the efficiency of programs that are 
focused on livelihoods development by giving more emphasis on their asset building through 
varying transfer amounts and/or technical supports. 
In this paper, we look at this trade-off among the beneficiaries of an ultra-poor grad-
uation model, which is considered as an effective approach for addressing poverty in the 
short run, and the impacts are also sustained over several years post interventions. Al-
though the average effects of this model are generally positive for both asset accumula-
tion and consumption, there still exists a trade-off between the two domains. Using ma-
chine learning tools, we investigate whether there are systematic differences between the 
participants of a graduation program in Bangladesh who gain more in either household 
expenditure or asset accumulation and vice versa. Characterizing households by their re-
sponses in this way can help the policymakers and the implementing agencies in design-
ing more targeted interventions to fit the needs of different subgroups of the extreme poor. 
Several studies have shown robust evidence of the graduation model being successful 
in reducing extreme poverty in a wide range of contexts (Bandiera et al. 2017; Banerjee et al. 
2015).1  The intervention model is composed of a sequence of supports including a grant of 
productive assets, hands-on coaching for 12-24 months, life-skills training, short-term con-
sumption support, and access to financial services. The goal of this model is to develop mi-
cro-enterprise from the transferred assets while all the other components are related to pro-
1. Bandiera et al (2017) evaluate the model in Bangladesh and Banerjee et al (2015) evaluate the 
same approach in six countries.
4
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
tecting their enterprise and/or increasing productivity.  Developed by BRAC, this model has 
shown significant impacts on household asset accumulation, consumption, labor supply, in-
come, and food security status in Bangladesh (Bandiera et al. 2017; Banerjee et al. 2015). More 
importantly, these impacts sustain well beyond the 2-year intervention period. A six-country 
replication of the model has also shown similar positive results. Long-term follow-up studies 
show the impacts not only persist but also grow over 7 to 10 years in West Bengal (Banerjee 
et al. 2016; Duflo 2020) and up to 14 years in Bangladesh (Balboni et al. 2020). Evaluations of 
variations of the graduation model also produce similar positive results (Blattman et al. 2016; 
Gobin et al 2016; Sedlmayr et al 2020). Currently, the model has been adopted by various NGOs 
and in government social protection schemes in 75 countries by 2020 (Andrews et al. 2021).
Because of its multifaceted nature, the graduation model generally costs substantially 
more than alternative poverty alleviation approaches (Sulaiman 2018). One of the avenues 
of improving cost-effectiveness is to better customize interventions to the needs of specific 
sub-groups within the target population. Existing studies find a large degree of heterogene-
ity even within the narrow group of the poorest households (e.g. Bandiera et al. 2017; Baner-
jee et al. 2015). All previous studies use the conventional econometric approaches of evalu-
ating heterogeneity and are restricted to considering only a few predetermined covariates. 
These analytical approaches, therefore, likely to leave potential sources of heterogeneity 
unexplored. Understanding the source of heterogeneity in the effects of the graduation mod-
el has implications for implementing agencies in their targeting approaches as well as ef-
forts to customize intervention packages to fit the needs of different sub-groups of the poor. 
The typical approach of analyzing heterogeneous effects involves fitting a lin-
ear model which includes interactions between treatment and the covariates, essential-
ly measuring the treatment effects for subgroups. However, this econometric approach is 
challenged in terms of efficiency and robustness as deciding on a few variables to create 
the subgroups involves the risk of overfitting the estimates (i.e. selecting only those vari-
ables on which we see heterogeneity) and throwing away the rich set of baseline infor-
mation available (Chernozhukov et al. 2020). Executing such a model grows increasingly 
problematic as the number of covariates increases; including all the potential interaction 
terms becomes infeasible unless the sample size is sufficiently large relative to the num-
ber of covariates and their interaction terms (Foster, Taylor, and Ruberg 2011; Green and 
Kern 2012; Imai and Ratkovic 2013; Schiltz et al. 2018). To overcome these weaknesses, 
several recent studies proposed using techniques from machine learning (ML) to better 
understand heterogeneous effects (Athey and Imbens 2017; Chernozhukov et al. 2020). 
This new and growing literature has proposed several parametric, semi-parametric, and 
non-parametric approaches that utilize a larger array of covariates, are computationally 
feasible, and avoid the risk of overfitting. Consequently, the use of ML in randomized con-
trol trials (RCT) to make inferences on heterogeneous treatment effects is receiving in-
creasing attention (Chernozhukov et al. 2020; Foster et al. 2011; Imai and Ratkovic 2013).
5
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
In this paper, we apply two ML approaches to investigate the heterogeneity in the 
effects of the graduation model from an RCT in Bangladesh. We use three rounds of panel 
data from 5,491 ultra-poor households, who were randomized into a treatment and con-
trol groups. We begin with estimating the Conditional Average Treatment Effect (CATE) 
of the graduation program using the Honest Causal Forest (HCF) algorithm proposed by 
Susan Athey and Wager (2019). We favor the HCF method for two reasons:  first, by con-
struction, it allows us to flexibly model complex interactions and discontinuous relation-
ships between independent variables, and second, it allows for valid hypothesis testing 
and the estimation of standard errors and confidence intervals. Next, we use the ML ap-
proach proposed by Chernozhukov et al. (2020), which involves estimating proxy predic-
tors of CATE and then developing valid inference on key features of the CATE. While the 
causal forest method focuses only on tree-based random forest tools to produce consistent 
estimates of CATE and explore heterogeneity, the approach proposed by Chernozhukov et 
al. (2020) is more general and can be applied to any ML methods to predict and make in-
ference on heterogeneous effects. Since both approaches resolve the fundamental problem 
of nonparametric inference of ML methods and propose strategies that produce uniform-
ly valid inference, we use them as complementary to each other. Subsequently, we conduct 
classification analysis to identify baseline characteristics that are associated with the im-
pacts to understand the trade-off and policy implications on customizing interventions. 
For measuring heterogeneous impacts, we focus on two outcomes – household wealth 
and expenditure. Our results detect a large degree of heterogeneity in treatment effects 
on both assets and expenditures. However, the difference between the highest and lowest 
gainers in household wealth is much larger than the corresponding difference in expendi-
ture. Looking at heterogeneity by baseline characteristics, the results indicate a trade-off 
between the gains in wealth vs consumption. We find 15 common variables that are im-
portant determinants of impact heterogeneity in both assets and expenditure, and the di-
rection of their relationships with the impact size for almost all the indicators is reversed 
across the two outcomes. In terms of specific indicators, the age of participants is found 
to be an important factor whereby top gainers of wealth are more likely to include older 
participants whereas those who show a high impact on consumption are more likely to be 
younger beneficiaries. This trend of older beneficiaries accumulating wealth while young-
er beneficiaries having higher consumption gain is contrary to a common understand-
ing of the graduation model being less effective for older people. In another dimension of 
heterogeneity, we find that women with greater involvement in household decision-mak-
ing at baseline are more likely to be in the high impact groups when it comes to expendi-
ture but the opposite is the case for wealth impact. Besides these participant character-
istics, other factors showing significant heterogeneity in impacts in opposing directions 
of assets and consumption are households’ baseline level of savings, assets, and expendi-
ture, community-level variables of distance to market, and paved roads.  We also find in-
6
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
equality in livestock ownership within the communities having significant impact het-
erogeneity whereby households with high expenditure gains are more likely to reside in 
communities with high asset inequality, but no significant difference in impact on assets. 
Our results of reversed impact heterogeneity for expenditure and wealth by 
most baseline characteristics demonstrate the trade-off between achieving impacts 
on immediate poverty reduction by increasing consumption and more long-term im-
pacts through asset accumulation. The rest of the paper is organized as follows. Sec-
tion II gives a brief overview of the Graduation Model and the evidence of its impact. 
Section III describes the data used in this paper. Section IV description of the two ML 
approaches that we apply in this paper for exploring the heterogeneity in treatment 
effects. Study results are discussed in Section V and the conclusion in Section VI. 
II. Graduation model and evidence of its impact
Over the last few decades, development organizations learned that bringing people out of 
ultra-poverty requires simultaneously addressing multiple constraints that they face in 
moving towards a sustainable livelihood. Building on this insight, BRAC, an NGO originating 
in Bangladesh, pioneered a program called Targeting the Ultra-poor (TUP) to build secure, 
sustainable, and resilient livelihoods for the ultra-poor (Matin et al. 2008; Morel and Chowd-
hury 2015). The approach in the TUP program, now better known as the “graduation model”, 
is to combine multifaceted support services addressing both the immediate needs of the 
ultra-poor by giving them consumption supports, and their long-term need for a sustain-
able livelihood by providing them a grant of productive assets with technical skills training. 
This is complemented by a time-bound (typically 18-24 months) intensive coaching, access 
to finance, and health supports to both improve their productivity and prevent the need for 
distress sales. BRAC started implementing the program in 2002 in Bangladesh. 
Several non-experimental studies (e.g. Ahmed, Sulaiman, and Das 2009; Matin and 
Hulme 2003; Mallick 2013) found the program very effective in increasing household con-
sumption, asset holdings, and self-employment among the ultra-poor. The holistic treatment 
of poverty in the graduation approach drew the attention of the donor community and other 
stakeholders in low-income countries. The model has been replicated and adapted by at least 
219 programs in 75 countries by NGOs, governments, and donor organizations (Banerjee et 
al. 2015; Andrews et al. 2021). Taking advantage of the large-scale replication of the model 
in low-income countries, several high-quality randomized trial studies have been conducted 
to assess the impact of the model (Bandiera et al. 2013, 2017; Banerjee et al. 2015). In Ban-
gladesh, Bandiera et al. (2013) found that after four years of the program inception, the ben-
eficiary households expanded their self-employment activities, increased labor supply, ac-
cumulated more productive assets, which led to increased household income and per capita 
consumption. A follow-up survey on the same households seven years after the program be-
7
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
gan, found that the long-term effect of the program is at least as large as the four-years effect 
(Bandiera et al. 2017). Banerjee et al. (2015) documented the findings from 6 randomized 
studies assessing the impact of the graduation model implemented in 6 countries. The study 
found the effects of the program on income, household asset accumulation, food security, and 
consumption similar to the Bangladesh study albeit with some variations across the 6 sites.
While the effects of the graduation model have been found to be positive and dura-
ble in a wide range of geographical and cultural contexts, existing studies report a high de-
gree of heterogeneity in the effects. For instance, Bandiera et al. (2017) showed that the 
effects on consumption, savings, and productive assets accumulation at 95th percentiles 
were at least 10 times larger than the effect at the 5th percentile of the distribution. Simi-
lar significant variation in treatment effects on household income, consumption, food secu-
rity, and financial inclusion was also reported in Emran, Robano, and Smith (2014); Raza, 
Das, and Misha (2012); and Banerjee et al. (2015). The large degree of heterogeneity in 
treatment effects implies that even with the narrow group of the ultra-poor, there could 
be subgroups who are not benefitting as much. However, the quantile regression approach 
in these studies has the limitations of only measuring impact across different percentile on 
a continuum of an outcome indicator and does not confirm whether these are associated 
with any baseline characteristics. This paper aims to identify and define these subgroups. 
III. Data
We use the data from a cluster-randomized trial by Bandiera et al (2017) that assessed the 
impact of the graduation model implemented in Bangladesh by BRAC. Starting in 2007, the 
study randomly assigned 40 BRAC branch offices serving 1,309 villages to the treatment or 
control group. We use the data from three rounds of surveys - baseline in 2007 followed by 
midline in 2009 and endline in 2011.2 The baseline survey was preceded by a participatory 
wealth ranking exercise in both treatment and control villages, which classified households 
into four groups: ultra-poor, near-poor, middle-class, and upper-class. Although the impact 
evaluation paper looked at spillover on these groups, we focus only on the ultra-poor group 
as they are the targeted beneficiaries in the graduation model and received support. Our fi-
nal sample size comprises 5,315 households of whom 3,082 from treated branches and 2,233 
from control branches. Our analysis explores heterogeneity in treatment effects on two out-
comes: the value of per-capita wealth, and per-capita household expenditure. We use the log 
of both outcomes. Household wealth has been calculated summing the monetary value of 
land, business assets, non-business assets, and savings. The household expenditure outcome 
includes household expenses on food (both purchased and produced), fuel, cosmetics, enter-
tainment, transportation, utilities, clothing, footwear, utensils, textiles, dowries, education, 
2. Bandiera et al (2017) also used a fourth round of survey conducted 2014. However, we do not use 
this since some of the households from control group were also treated after the endline.
8
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
charity, and legal expenses. While our analysis is intended to capture possible trade-offs be-
tween wealth and consumption, both outcomes are among the key performance indicators 
for the graduation model. We complement our main analysis of these two outcomes with 
the analysis of two additional outcomes – household savings, and self-employment income. 
The treatment variable is a dummy indicating if a household resides in villages un-
der a treated BRAC branch office. The covariates for heterogeneity include baseline infor-
mation on respondents’ characteristics, demographic and socio-economic characteristics 
at the household level, and several cluster-level characteristics. Initially, we started with 
103 covariates and after filtering out variables with near-zero variance, multicollineari-
ty, and a high number of missing values, we are left with 50 covariates. The list of these 
covariates along with some descriptive statistics are presented in Annex (Table A1). 
III. ML Method
Our empirical strategy combines two machine learning approaches; the honest causal forest 
algorithm proposed by Wager and Susan Athey (2018), and an agnostic approach proposed 
by Chernozhukov et al. (2020). The honest casual forest method builds on the causal tree al-
gorithm proposed by  Susan Athey and Imbens (2016), which partitions the data into a set of 
subgroups such that treatment effect heterogeneity across subgroups is maximized (Athey 
and Imbens 2016). In estimating the treatment effect, the causal tree algorithm follows an 
“honest” approach, whereby one sample is used to construct the partition (i.e. building the 
tree) and another to estimate treatment effects for the subgroups. More specifically, the caus-
al forest algorithm starts by drawing a random subsample of training data and then splitting 
the training data into two halves I and J. The algorithm then grows a tree by using the J-sam-
ple data to partition the data space, while holding out the I-sample data for within-leaf esti-
mation. When choosing a split, the algorithm seeks to maximize the difference in treatment 
effect [τ(X)] between the two child leaves. The treatment effect is estimated simply by taking 
the difference between the outcomes of the treated and control observations within a leaf: 
1 1
τ" X = | 𝑖𝑖:W = 1,X Є	𝐿𝐿 |/ Y* 				− 				 𝑖𝑖 : W = 0,X Є	𝐿𝐿 / Y* 																														 (1)	* * *:1234,52Є	6 * * *:1239,52Є	6
In equation 1, W is the treatment indicator taking value 1 for treated obser-
vations, X is the covariate space, Y is the outcome variable, and L is the leaf with-
in a tree. Wager and Susan Athey (2018) showed that the honest approach of tree 
building produces consistent estimates by eliminating bias in the CATE and en-
ables centered confidence intervals that allow for valid statistical inference. 
While the causal forests approach uses only one specific ML tool (i.e. tree-based al-
gorithm) and relies on an honest approach to produce consistent estimates of CATE and ex-
plore heterogeneity, Chernozhukov et al. (2020) proposed a different approach that allows 
9
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
applying generic ML methods to estimate causal effects and draw a statistical inference. 
The empirical strategy of this approach starts by building a proxy predictor of CATE using 
generic ML methods, and then develop valid inference on some key features of the CATE 
based on this proxy predictor. Instead of obtaining consistent estimation and uniformly val-
id inference on the CATE itself, this approach focuses on providing valid estimation and in-
ference on certain features of CATE. Referring to it as an agnostic approach, Chernozhukov 
et al. (2020) argued that by focusing on key features of CATE rather than CATE itself, this 
approach avoids making strong assumptions about the properties of ML estimators and still 
obtain uniformly valid inference on some features of the estimators. Particularly, this ap-
proach targets to develop valid inference on three features namely – Best Linear Predictor 
(BLP), Sorted Group Average Treatment Effects (GATES), and Classification Analysis (CLAN).
The algorithm of this agnostic approach involves repeatedly splitting the data into 
two samples, namely the main sample (DataM) and the auxiliary sample (DataA), and for 
each split training ML methods and predict the outcome variable on the treated and un-
treated observations separately using the DataA. Applying the trained ML models on DataA, 
the algorithm then estimates two potential outcomes [Y(0), Y(1)], and obtain treatment 
effect estimates, S(X), and baseline effect estimates, B(X), for each observation in DataM. 
The baseline effect and the treatment effect are estimated using the following equations:
B(X)	=	E[Y|W=0,	X]																				(2)
S(X)	=	E[Y|W=1,	X]	–	E[Y|W=0,	X]														(3)
The algorithm then involves testing for heterogeneity in the treatment effects using the 
following weighted ordinary least squared (OLS) or the Best Linear Predictor (BLP) model:
Yi =	α	Xi	+	α1 B(Xi)	+β1 (Wi	– p(Xi))	+	β2 [Wi – p(Xi)]	[Si	(Xi)	– ES)	+	𝜀𝜀,	
with	weights	ω X = 	 > 	 (4)
{@(A)	[>B	@(A)]}
where ES	=	% 	ƩS(Xi),	and	p(X)=	% Ʃp(Wi	=1	|X) or		1	789:79;  for a randomized trial study.  & 1 1
 
In our case W	=	$%&' =	0.58 . 
($)(
β1 in equation 4 indicates the average treatment effect. β2, the main coeffi-
cient of our interest, indicates the degree to which the estimated treatment effect, 
Si(X), serves as a proxy for the true treatment effect or CATE. Rejecting the null hy-
pothesis β2 = 0 means that there is heterogeneity and Si(X) is a relevant predictor. 
The second feature, Group Average Treatment Effect (GATE), involves dividing 
the main sample into non-overlapping groups G1 to GK based on the predicted treat-
ment effect Si(X). If we decide to have k=5, then the resulting group G1 will be the 20% 
10
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
of the data with the lowest treatment effect estimates and G5 will be the group with the 
highest treatment effect estimates. The GATE parameters are estimated as follows: 
E[	Si(X)	|	Gk ],		 k	=	1,	..…	K (5)
Where Gk is non-overlapping intervals dividing the Si(X) into K groups. 
Finally, the third feature, Classification Analysis (CLAN), helps to characterize the 
most and least affected groups by identifying the baseline covariates on which the groups 
differ from each other. Assuming g(Y, X) is a vector of characteristics of an observational 
unit, the average characteristics of the most and least affected groups can be denoted by the 
following parameters: 
ϒ1 =	E	[g	(Y,	X)	|	G1]	 	and		ϒk =	E[g	(Y,	X)	|	Gk]												(6)
Our main results are the treatment effects estimation from the causal forest meth-
od, which we use to construct the group average treatment effect (GATE) and the classifi-
cation analysis (CLAN). We also complement our analysis of causal forest results with the 
results from other generic ML methods. We use the causal forest as our preferred method 
of estimation for two reasons. First, the treatment effect estimates from the causal forest 
are unbiased and allow for valid statistical inference. The ‘honest’ approach used in the 
causal forest model addresses the fundamental problem of causal inference and allows for 
a direct estimation of causal effect while eliminating bias from the estimates. Second, the 
causal forest follows a data-driven approach in identifying the most important variables 
from a large set of predictors used in growing the forest. We use this subset of predictors to 
perform the classification analysis to characterize the most and least affected groups and 
avoid the clumsiness of using the large set of baseline predictors for classification analysis. 
We fit our models to reflect the heterogeneity in treatment effect at the household 
level rather than at the cluster level (e.g. branch or village level). Since the treatment was 
randomly assigned at the branch office level, we assume that the branch level effect or vil-
lage level effect on TUP beneficiaries is normally distributed. Therefore, we train the caus-
al forest model without clustering by branch or villages. However, we include subdistricts 
fixed effects, which was used for stratifying the treatment-control assignment. We also 
include some spot-level Gini coefficients as covariates in our model to see if the distribu-
tion of wealth, income, productive assets, and household assets in the immediate neighbor-
hood of the beneficiaries influence treatment effect and heterogeneity in treatment effect. 
We grow the causal forest following the generalized random forest (GRF) framework 
proposed by Susan Athey, Tibshirani, and Wager (2019). We first orthogonalize the treatment 
and the outcome variables by fitting a regression forest to estimate the expected outcome 
marginalizing over treatment (see Susan Athey, Tibshirani, and Wager 2019 for detail). Using 
the estimates from this regression forest, the GRF then makes out-of-bag predictions to be 
11
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
used as inputs in our causal forests. Following Basu et al. (2018) and Susan Athey and Wager 
(2019), we also train a pilot causal forest on all covariates. Then, we train our final causal forest 
using only those covariates that were found important in growing the pilot forest. In select-
ing the important covariates, we use the ‘variable_importance’ function of the GRF package 
which assigns a score for each covariate by taking a simple weighted sum of how many times 
the covariate was chosen by the algorithm in building trees. We select those covariates whose 
‘variable_importance’ score is above average. This approach improves the precision of our 
estimation as it enables the forest to make more splits on the most important features (Athey 
and Wager 2019). These important features identified from the pilot forest have also been used 
for our classification analysis (CLAN) to characterize the most and least affected households. 
As a complement to causal forest estimations, we used four generic machine learn-
ing (ML) methods namely Elastic Net, Boosting Tree, Neural Network, and Random For-
est method in estimating CATE. Similar to how we trained the Causal Forest model, we 
also controlled for subdistrict level fixed effects in training these generic ML methods. 
V. Results and Discussion
Following the ML approaches described above, our discussion focuses on two sets of 
results - degree of heterogeneity (HET) in treatment effect and classification analy-
sis (CLAN) for the heterogeneity in treatment effects on per-capita household wealth 
and per-capita household expenditures. Among the four generic ML methods, we rely 
more on the results from the random forest and elastic net methods since these two 
methods outperformed the other two (boosting tree and neural network) in terms 
of their ability to detect greater heterogeneity in the treatment effect estimates.3 
 
a. Average vs. Heterogeneous Treatment Effects 
Table 1 presents coefficients of the average treatment effect (ATE) and the degree of 
heterogeneity (HET) for the two outcomes. The ATE estimates for both outcomes are 
large, positive, and significant across the causal forest model and the two generic ML 
methods. The causal forest estimates of ATE coefficients for the log value of per-cap-
ita wealth and household expenditures are 2.54 and 0.14 respectively indicating that 
the per-capita wealth and expenditures increased by 254% and 14% among the ul-
tra-poor women in the treated areas relative to the control areas, both significant at 
1% level. Reassuringly, the estimated ATE coefficients from all four ML methods close-
ly match the OLS estimates, both in terms of magnitude and statistical significance. 
3. Following Chernozhukov et a. (2020), we choose the best ML methods that maximize the criterion 
function Λ = |β2|2 Var(S(Xi))
12
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Table 1.   Average Treatment Effects (ATE) and Heterogeneous Treatment.
Method	 		 Per	Capita	Wealth	(log)	 Per	Capita	Expenditure	(log)	
Causal	Forest	 ATE	 2.54	 0.14	
[2.13							2.95]	 [0.07							0.21]	
(0.000)	 (0.000)	
HET	 1.30	 0.96	
[1.00							2.00]	 [-1.00							3.00]	
(0.000)	 (0.123)	
Random	Forest	 ATE	 2.45	 0.14	
[2.33							2.56]	 [0.12							0.17]	
(0.000)	 (0.000)	
HET	 0.91	 0.23	
[0.77,	1.05]	 [0.01							0.43]	
(0.000)	 (0.073)	
Elastic	Net	 ATE	 2.44	 0.14	
[2.33							2.57]	 [0.12							0.17]	
(0.000)	 (0.000)	
HET	 1.311	 0.20	
[1.10							1.51]	 [-0.03							0.48]	
(0.000)	 (0.184)	
Neural	Network	 ATE	 2.44	 0.14	
[2.32							2.56]	 [0.12							0.16]	
(0.000)	 (0.000)	
HET	 0.30	 0.08	
[0.24							0.37]	 [-0.004							0.18]	
(0.000)	 (0.127)	
Boosting	Tree	 ATE	 2.46	 0.14	
[2.34							2.58]	 [0.12							0.16]	
(0.000)	 (0.000)	
HET	 0.40	 0.10	
[0.32							0.47]	 [0.004							0.19]	
(0.000)	 (0.083)	
OLS	 ATE	 2.51	 0.13	
[2.24							2.79]	 [0.10							0.16]	
(0.000)	 (0.000)	
 	
Note: 95% confidence intervals in brackets, and p-value in parenthesis. In estimating the ATE for the causal for-
est, we used the built-in function in the GRF package. For the generic ML methods, we applied the BLP test that 
estimates both ATE and HET using equation 4. The covariates used in the OLS models include respondent’s age, 
respondent’s education years, household head’s gender, log per-capita expenditure, log per capita food consump-
tion, log per capita wealth, wage income, income from self-employment activities, Gini score for livestock, and 
distance to the nearest market.
13
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
Table 1 also reports the treatment effect estimates from ordinary least square (OLS) 
regression models fitted on each outcome with the treatment variable and several covariates 
at respondent, household, and neighborhood levels. The OLS estimates of the average treat-
ment effects for the two outcomes are 251%and 13%, both significant at 1% level, which 
consistently matches with the estimates from machine learning methods. These results ba-
sically reproduce the conclusions drawn by Bandiera et al (2017) from this data. 
Turning to the heterogeneity in treatment effect, the coefficients for HET are 1.30 for 
per-capita wealth and 0.96 for per-capita expenditure when we use the causal forest meth-
od. The non-zero coefficients indicate that the causal forest estimates of the treatment ef-
fects are important relevant predictors.4 Based on the p-value of the HET coefficient for the 
wealth outcome, we reject the hypothesis of no heterogeneity at the 1% level, suggesting 
that there is a significant heterogeneous effect of the TUP intervention on this outcome. The 
heterogeneity coefficients from the generic ML methods also confirm the same high level of 
heterogeneity in the treatment effect in per-capita wealth. However, the p-value associated 
with the HET coefficient of the expenditure outcome failed to reject the hypothesis of no 
heterogeneity at 10% level of significance by causal forest estimate (p value=0.123). Simi-
larly, elastic net and neural network methods also report an insignificant level of heteroge-
neity and are consistent with the causal forest estimation, while random forest and boosting 
trees show a weak degree of treatment effect heterogeneity (at 10% level). In other words, 
graduation interventions produced highly diverse levels of impact in asset accumulation by 
beneficiary households while the impact variations for consumption are less pronounced. 
b. Group Average Treatment Effects (GATES):
While the HET coefficients are useful in understanding the existence of heterogeneity, they 
do not reveal the magnitudes of differences. To check the magnitude, we use group average 
treatment effects (GAES) that involve dividing the observations into different subgroups 
according to their effect sizes. Specifically, we divide the sample into 5 groups based on the 
quintiles of the estimated "τ X  from our causal forest model and the generic ML proxy pre-
dictor Si(x) and estimate the average effect for each group. Next, we compare the GATES be-
tween the top and the bottom quantiles, alternatively called the most and the least affected 
groups. 
 As shown in Table 2, the causal forest estimates show that the differences of av-
erage treatment effects between the most and least affected groups are significantly 
4. See Chernozhukov et al. 2020 for more technical details on BLP estimates. However, the coeffi-
cient value greater than 1 implies that the random forest predictions are over-shrunk, and the CATE 
estimates from the forest under-estimate the true treatment heterogeneity. For example, suppose the 
random forest gives us a CATE estimate "τ X 	≈ τ 𝑋𝑋2 . Then calibration would give us a coefficient of 
roughly 2. (We are grateful to Stefan Wager, Assistant Professor of Statistics in Stanford University, for 
this explanation).
14
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
different from zero at 1% level for both wealth and expenditure outcomes. The average 
treatment effect among the most affected households on log per-capita wealth is 3.44, 
which is 79% higher than the average treatment effect among the least affected house-
holds. Likewise, the average treatment effect of the most affected groups on log per-cap-
ita expenditure is 63% higher compared to that in the least-affected households. For ge-
neric ML methods, we present the results from the random forest and elastic net only 
as the estimations from these two methods have been found more efficient in detect-
ing heterogeneity in our data. The GATES estimates from the random forest and elas-
tic net report weak differences between the two groups in the expenditure outcome.
In Figure 1, we present the box-plot distribution of GATES scores and confidence 
bands for the five quantile groups using the causal forest estimates. The figure also shows 
ATE and associated confidence intervals obtained from the casual forest. This is apparent 
from the box-plot distribution that the treatment effects are positive on both outcomes for 
all sub-group of households, and the graduation model did not adversely affect any bene-
ficiary households.5 The figure also reveals that the top and bottom 20th quantile groups 
(group 1, and 5) are less symmetric than the ones in the middle, with positive skew on the 
top quantile group and negative skew in the bottom quantile group. We looked into whether 
the differences in treatment effects between the mid-quantile groups (e.g. group 3rd vs 1st, 
and 4th vs 2nd) are statistically significant or not (see Table A2 in the appendix). The test 
shows significant differences between the mid-quantile groups on both outcomes.  
5. Although this appears contradictory to the findings of asset depletion for those below threshold 
Balboni et al (2021), but the key distinction is in timeline. Their long-term follow-up look at asset dy-
namics after the endline and asset depletion does not imply non-positive long-term impact.
15
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
Table 2.   Group Average Treatment Affect (Top 20% vs Bottom 20%).
		 Per	Capita	Wealth	 Per	Capita	Expenditure		 (log)	 (log)	
Causal	Forest	 ATE	of	Most	20%	 3.44	 0.18	
[3.32											3.55]	 [0.17										0.18]	
ATE	of	Least	20%	 1.92	 0.11	
[1.81										2.04]	 [0.11										0.12]	
Diff.	(Most	vs	Least)	 1.52	 0.06	
[1.35										1.68]	 [0.05										0.07]	
(0.000)	 (0.000)	
Random	Forest	 ATE	of	Most	20%	 3.64	 0.18	
[3.38											3.90]	 [0.13										0.23]	
ATE	of	Least	20%	 1.67	 0.10	
[1.39											1.94]	 [0.05										0.15]	
Diff.	(Most	vs	Least)	 1.96	 0.07	
[1.58											2.34]	 [0.00										0.15]	
(0.000)	 (0.092)	
Elastic	Net	 ATE	of	Most	20%	 3.64	 0.17	
	 [3.82										3.90]	 [0.12										0.22]	
ATE	of	Least	20%	 1.67	 0.12	
	 [1.40											1.94]	 [0.07										0.17]	
Diff.	(Most	vs	Least)	 1.98	 0.05	
	 [1.60											2.35]	 [-0.02										0.12]	
	 (0.000)	 (0.353)	
	
Note: 95% confidence intervals in brackets, and p-value in parenthesis. The group average treatment effects have 
been estimated using equation 5 specified in the Method section. 
Overall, our heterogeneity analyses show consistent results for assets and less so 
for expenditure. Heterogeneity test using the best linear predictor (BLP) model applied 
on the estimates of the causal forest, and generic ML methods found detectable hetero-
geneity in per-capita wealth outcome. We did not find strong heterogeneity in treatment 
effects on per-capita total expenditure from the causal forest and the majority of the ge-
neric ML methods. The group average treatment effect analysis (GATES) from the causal 
forest and some of the generic ML methods produced similar results in both outcomes. 
16
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Figure 1.   Box-plot of causal forest estimates of GATES by quantiles.
c. Classification analysis: 
While the analysis presented so far reveals significant heterogeneity in wealth and rela-
tively less consistent heterogeneity in consumption, the HET coefficients do not tell us who 
are the most and least affected households. To understand the differences in characteris-
tics of the most and least affected groups, we look at the average characteristics of the two 
groups by Classification Analysis (CLAN) developed by Chernozhukov, Fernández-Val, and 
Luo (2018). For this, we use the estimations from the causal forest method since it offers a 
data-driven approach in determining which variables are most important in estimating the 
treatment effect in the sample. Following the selection criteria for the most important co-
variates (as explained in the method section), we get a list of a total of 15 variables. Figure 2 
below shows the list of these important variables and their corresponding importance score. 
These variables fall into three groups: individual or primary beneficiary level character-
istics, household-level characteristics, and neighborhood or community-level characteris-
tics. The most important baseline characteristic is their wealth level followed by participant 
woman’s voice in household decision making. These two indicators have much larger impor-
tance scores compared to all others. The important factor from household demography is 
the number of members aged above 10 at baseline. Three community-level variables that are 
in the middle of this important variable list are – distance to paved roads and market and as-
set inequality. The age of the participant and household head, which are often considered im-
portant dimensions of heterogeneity in conventional analysis, are ranked lower in this list. 
17
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
While this list shows us the important factors behind impact heterogeneity, the next 
step is to understand their direction of relationship with impact estimates. In Figure 3, we 
visualize the classification analysis by plotting the differences between the most and least 
affected groups on these fifteen baseline variables.6 The figure shows whether the difference 
between the two groups is positive or negative, where a positive difference means the aver-
age of the most affected group is higher than that of the least affected group, and vice versa. 
Figure 2.  Covariates used most often in building trees.
The primary beneficiaries (respondents) among the most benefitted households 
in per-capita wealth outcome were relatively older, were more dependent on wage in-
come (mostly from agricultural labor works), had less involvement in self-employment 
activities, and had lower participation in household decisions making at baseline. More 
specifically, they were 19 years older, and 21 percent more likely to work as agricul-
ture labor than the respondents from the least affected households on this outcome. 
 
6. see Table A3 in Appendix for detail estimates. 
18
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Log of their income from daily wage activities was 8.09 greater, and from self-employment 
activities was 2.69 lower compared to the respondents from the least affected households. 
Their score on a matrix measuring their participation in household decision-making was 
5.36 points less; their empowerment score was 1.91 compared to 7.27 of the other group. 
Looking at the household level characteristic, the top gainers in wealth have house-
hold heads who are on average 14 years older than the heads in the least affected group. 
These households also have fewer members above 10 years old (by 1.34 members), who were 
less likely (by 1.28 percent) to migrate out of villages for work. This group of top gainers had 
higher per-capita household expenditures, and lower savings at the baseline. Compared to 
their counterparts in the least-gainers group, these households’ log value of per-capita ex-
penditures was higher by 0.18 (though not statistically significant), and the log value of sav-
ings was lower by 3.21. Finally, the log value of livestock of the top gainers was lower by 4.73. 
Regarding the community-level characteristics, the households that benefited the 
most in wealth accumulation are more likely to live in communities far away from paved 
road and markets; their communities are 0.11 km farther away from the nearest markets 
(not significant at conventional level), and 0.33 km farther away from paved roads, com-
pared to the communities of the least affected households. The Gini coefficient, measuring 
the distribution of livestock value in neighborhoods, shows that the most affected house-
holds for the wealth outcome live in communities with lower inequality (by 0.01 points). 
19
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
Figure 3.  Classification analysis of most and least affected groups.
When it comes to gains made in per-capita household expenditure, we find that the 
characteristics of the most affected households are generally in opposite direction (for 14 out 
of the 15 variables) of what we found in the wealth accumulation outcome. The primary re-
spondents of the most affected households in this outcome are younger by 7 years compared 
to those in the least affected households. They also differ from the top gainers in the wealth 
outcome in terms of having less wage income (by 11.34 points), more self-employment income 
(by 5.71 points), and higher participation in household decision making (by 2.44 points). 
20
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Turning to the household level variables, most affected households for this outcome 
are headed by relatively younger  (by 4 years) people, had more members older than 10 
years, and had members who were more likely (by 56%) to migrate out. These households 
had lower per-capita expenditures at the baseline: their baseline expenditures were lower 
by 0.3 points than that among the least affected households. They also had higher savings 
and livestock assets. Their savings and livestock assets value, respectively, was higher by 
0.33 points (not significant at conventional level), and by 1.37 points (p <0.01). At the com-
munity level characteristics, unlike the differences between most and least gainers in the 
wealth outcome, households that increased their expenditure the most were living in com-
munities closer to markets and paved road, by 1 kilometer and 0.49 kilometers respectively. 
Finally, unlike the top gainers in wealth outcome, the most affected households in expendi-
tures outcome live in communities with greater inequality in the distribution of livestock 
assets (by 0.09 points). 
Finally, top gainers of both asset and consumption had less per-capita wealth at the 
baseline compared to the least affected group. This variable was scored as the most import-
ant (Figure 2) and the only variable that has the same direction of relationship with impacts 
on both asset and consumption. This result highlights the possibility of maximizing impact 
by targeting explicitly asset ownership. 
One might wonder whether the top gainers on these two outcomes also differ from 
each other on other welfare dimensions. It may be that households that increased their con-
sumptions by most had been able to generate more income or accumulate more savings than 
those who gained most in assets, and vice versa. To explore this possibility, we investigated 
the treatment effect heterogeneity on the log of household savings and self-employment in-
come (i.e., income from livestock and small businesses) and conducted the CLAN for these 
two outcomes using the same variables we used for the CLAN of assets and consumption. 
Figure A1 in Appendix shows the CLAN for household savings and self-employment income. 
The top gainers in the savings outcome closely resemble the top gainers in assets accumula-
tion in almost all of the baseline characteristics, implying that households that gained most 
in assets also accumulated savings. On the other hand, the CLAN for the self-employment 
income failed to show significant differences between the top and bottom quantiles in half 
of the baseline characteristics (7 out of 15). On the remaining eight characteristics, the top 
gainers in income do not distinctively match with the top gainers in either assets or con-
sumption outcomes. Our results also showed weak heterogeneity in the treatment effects 
for the self-employment income: the estimates of the HET coefficient from the causal forest, 
random forest, and elastic net are non-significant at the conventional levels (see Table A4 in 
Appendix). Consequently, we conclude that most of the beneficiary households experienced 
a uniform increase in their income from self-employment activities while a group of them 
focused on assets accumulation and the others on smoothing their consumptions.  
21
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
VI. Conclusion
Our results from the causal forest, as well as generic ML methods, report positive and signif-
icant effects of the graduation model on wealth accumulation and household consumption. 
These findings are consistent with the existing studies that also report a strong positive 
effect of the graduation model on asset accumulation, and household expenditure out-
comes. We find significant heterogeneity in impact on assets while the results for hetero-
geneity in impact on consumption are less robust. Our classification analysis that seeks to 
characterize the households which benefited most from the graduation model shows that 
there is a trade-off between accumulating assets and increasing household expenditures. 
Characteristics that are associated with higher gain in asset accumulation show the 
opposite direction of association with consumption gain. Households that benefited most 
in asset accumulation were relatively poorer at the baseline compared to the most affected 
households on the expenditure outcome. The most affected households for the asset outcome 
had primary beneficiaries who were older, were more dependent on wage income and had less 
self-employment income at the baseline. In contrast, the most affected households for the ex-
penditure outcomes had younger beneficiaries with higher income from self-employment ac-
tivities and less income from daily wage activities at the baseline. In terms of community-level 
characteristics, proximity to roads and markets helps in consumption gain over asset accumu-
lation. The lower level of baseline wealth (combining all productive and non-productive assets 
and savings) is the only variable that shows a higher impact on both asset and consumption. 
Besides demonstrating a trade-off in the impact between asset and consumption, 
these results identify potential ways the graduation program can improve long-term ef-
fectiveness in the context of Bangladesh. Firstly, keeping overall asset ownership of the 
household as a stricter targeting criterion can help in improving the impact on both out-
comes. Secondly, the coaching component can be customized to mitigate the asset-con-
sumption trade-off, e.g. by targeting more intensive support for younger beneficiaries with 
more decision-making power and closer to markets, to improve on asset accumulation. 
References 
Ahmed, Akhter U., Mehnaz Rabbani, Munshi Sulaiman, and Narayan C. Das. 2009. The Impact of Asset 
Transfer on Livelihoods of the Ultra Poor in Bangladesh. BRAC Research Monograph Series (29). Vol. 
7.
Andrews, Colin, Aude de Montesquiou, Inés Arévalo Sánchez, Puja Vasudeva Dutta, Boban Varghese 
Paul, Sadna Samaranayake, Janet Heisey, Timothy Clay and Sarang Chaudhary. 2021. "The State of 
Economic Inclusion Report 2021: The Potential to Scale". World Bank Publications.
Athey, Susan, and Guido W. Imbens. 2017. “The Econometrics of Randomized Experiments.” Handbook 
of economic field experiments 1:73–140.
22
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Athey, Susan and Guido Imbens. 2016. “Recursive Partitioning for Heterogeneous Causal Effects.” Proce-
edings of the National Academy of Sciences of the United States of America 113(27):7353–60.
Athey, Susan, Julie Tibshirani, and Stefan Wager. 2019. “Generalized Random Forests.” Annals of Statis-
tics 47(2):1179–1203.
Athey, Susan and Stefan Wager. 2019. “Estimating Treatment Effects with Causal Forests: An Applica-
tion.” ArXiv 1–15.
Balboni, Clare, Oriana Bandiera, Robin Burgess, Maitreesh Ghatak, and Anton Heil. 2020. “Why Do Peo-
ple Stay Poor?” Working Paper.
Bandiera, Oriana, Robin Burgess, Narayan Das, Selim Gulesci, Imran Rasul, and Munshi Sulaiman. 2017. 
"Labor markets and poverty in village economies." The Quarterly Journal of Economics 132, no. 2: 
811-870.
Bandiera, Oriana, Robin Burgess, Narayan Das, Selim Gulesci, Imran Rasul, Munshi Sulaiman, , Fran-
cisco Buera, Bronwen Burgess, Anne Case, Arun Chandrasekhar, Angus Deaton, Greg Fischer, Guy 
Michaels, Ted Miguel, Mush Mobarak, Benjamin Olken, Steve Pischke, and Mark Rosenzweig. 2013. 
"Can basic entrepreneurship transform the economic lives of the poor?". Working Paper.
Banerjee, Abhijit, Esther Duflo, Raghabendra Chattopadhyay, and Jeremy Shapiro. 2016. “The Long 
Term Impacts of a ‘Graduation’ Program: Evidence from West Bengal.” Working Paper Massachuse-
tts Institute of Technology (September):1–25.
Banerjee, Abhijit, Esther Duflo, Nathanael Goldberg, Dean Karlan, Robert Osei, William Parienté, Jeremy 
Shapiro, Bram Thuysbaert, and Christopher Udry. 2015. “A Multifaceted Program Causes Lasting 
Progress for the Very Poor: Evidence from Six Countries.” Science 348(6236).
Basu, Sumanta, Karl Kumbier, James B. Brown, and Bin Yu. 2018. “Iterative Random Forests to Discover 
Predictive and Stable High-Order Interactions.” Proceedings of the National Academy of Sciences of 
the United States of America 115(8):1943–48.
Blattman, Christopher, Eric P. Green, Julian Jamison, M. Christian Lehmann, and Jeannie Annan. 2016. 
“The Returns to Microenterprise Support among the Ultrapoor: A Field Experiment in Postwar 
Uganda.” American Economic Journal: Applied Economics 8(2):35–64.
Carter, Michael R. and Christopher B. Barrett. 2006. “The Economics of Poverty Traps and Persistent 
Poverty: An Asset-Based Approach.” Journal of Development Studies 42(2):178–99.
Chernozhukov, Victor, Mert Demirer, Esther Duflo, and Ivan Fernandez-Val. 2020. "Generic Machine 
Learning Inference On Heterogenous Treatment Effects In Randomized Experiments ".  arXi-
v:1712.04802v5
Chernozhukov, Victor, Iván Fernández-Val, and Ye Luo. 2018. “The Sorted Effects Method: Discovering 
Heterogeneous Effects Beyond Their Averages.” Econometrica 86(6):1911–38.
Duflo, Esther. 2020. “Long-Term Effects of the Targeting the Ultra Poor Program.” NBER Working Paper 
Series 17.
Emran, M. Shahe, Virginia Robano, and Stephen C. Smith. 2014. “Assessing the Frontiers of Ultrapo-
verty Reduction: Evidence from Challenging the Frontiers of Poverty Reduction/Targeting the 
Ultra-Poor, an Innovative Program in Bangladesh.” Economic Development and Cultural Change 
62(2):339–80.
Foster, Jared C., Jeremy M. G. Taylor, and Stephen J. Ruberg. 2011. “Subgroup Identification from Rando-
mized Clinical Trial Data.” Statistics in Medicine 30(24):2867–80.
23
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
Gobin, Vilas J., Paulo Santos, and Russell Toth. 2016. "Poverty graduation with cash transfers: a rando-
mized evaluation." Department of Economics Discussion Paper 23: 16.
Green, Donald P. and Holger L. Kern. 2012. “Modeling Heterogeneous Treatment Effects in Survey Expe-
riments with Bayesian Additive Regression Trees.” Public Opinion Quarterly 76(3):491–511.
Ikegami, Munenobu, Michael R. Carter, Christopher B. Barrett, and Sarah Janzen. 2016. “Poverty Traps 
and the Social Protecion Paradox.” The Economics of Poverty Traps (December):223–56.
Imai, Kosuke and Marc Ratkovic. 2013. “Estimating Treatment Effect Heterogeneity in Randomized 
Program Evaluation.” Annals of Applied Statistics 7(1):443–70.
Lee, Myoung Jae. 2009. “Non-Parametric Tests for Distributional Treatment Effect for Randomly Censo-
red Responses.” Journal of the Royal Statistical Society. Series B: Statistical Methodology 71(1):243–
64.
Lin, Yi and Yongho Jeon. 2006. “Random Forests and Adaptive Nearest Neighbors.” Journal of the Ameri-
can Statistical Association 101(474):578–90.
Mallick, Debdulal. 2013. “How Effective Is a Big Push to the Small? Evidence from a Quasi-Experiment.” 
World Development 41(1):168–82.
Matin, Imran and David Hulme. 2003. “Programs for the Poorest: Learning from the IGVGD Program in 
Bangladesh.” World Development 31(3):647–65.
Matin, Imran, Munshi Sulaiman, and Evaluation Division. 2008. Working Paper Crafting a Graduation 
Pathway for the Ultra Poor : Lessons and Evidence from a BRAC Programme.
Morel, Ricardo, and Reajul Chowdhury. 2015. "Reaching the Ultra-Poor: Adapting Targeting Strategy in 
the Context of South Sudan." Journal of International Development 27, no. 7: 987-1011.
Quisumbing, Agnes R. and Bob Baulch. 2013. “Assets and Poverty Traps in Rural Bangladesh.” Journal of 
Development Studies 49(7):898–916.
Raza, Wameq A., Narayan C. Das, and Farzana A. Misha. 2012. “Can Ultra-Poverty Be Sustainably Impro-
ved? Evidence from BRAC in Bangladesh.” Journal of Development Effectiveness 4(2):257–76.
Crump, Richard K., V. Joseph Hotz, Guido W. Imbens, and Oscar A. Mitnik. 2008. "Nonparametric tests 
for treatment effect heterogeneity." The Review of Economics and Statistics 90, no. 3:389-405.
Schiltz, Fritz, Chiara Masci, Tommaso Agasisti, and Daniel Horn. 2018. “Using Regression Tree Ensem-
bles to Model Interaction Effects: A Graphical Approach.” Applied Economics 50(58):6341–54.
Sedlmayr, Richard, Anuj Shah, and Munshi Sulaiman. 2020. “Cash-plus: Poverty Impacts of Alternative 
Transfer-Based Approaches.” Journal of Development Economics 144(November 2019):102418.
Sulaiman, Munshi. 2018. “Graduation Approaches : How Do They Compare in Terms Of.” Pp. 102–20 in 
In Boosting growth to end hunger by 2025: The role of social protection, edited by F. S. W. Taffesse 
and A. Seyoum. Washington, DC: International Food Policy Research Institute (IFPRI).
Thomas, Marius, Björn Bornkamp, and Heidi Seibold. 2018. “Subgroup Identification in Dose-Finding 
Trials via Model-Based Recursive Partitioning.” Statistics in Medicine 37(10):1608–24.
Wager, Stefan and Susan Athey. 2018. “Estimation and Inference of Heterogeneous Treatment Effects 
Using Random Forests.” Journal of the American Statistical Association 113(523):1228–42.
Willke, Richard J., Zhiyuan Zheng, Prasun Subedi, Rikard Althin, and C. Daniel Mullins. 2012. “From Con-
cepts, Theory, and Evidence of Heterogeneity of Treatment Effects to Methodological Approaches: 
A Primer.” BMC Medical Research Methodology 12.
24
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Appendix A
Figure A1.  Classification analysis for most and least affected groups for savings and self-em-
ployment income .
25
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
Table A1.  Descriptive Statistics of the Baseline Covariates
Baseline	Characteristics	 N	 Mean	 SD	 P25	 P50	 P75		
Primary	Beneficiary	Level	Characteristics		 	 	 	 	 	 	
Beneficiary’s	Age	 5315	 39.61	 13.36	 28.00	 38.00	 50.00	
Beneficiary	never	married	 5315	 0.01	 0.11	 0.00	 0.00	 0.00	
Beneficiary	divorced	 5315	 0.02	 0.12	 0.00	 0.00	 0.00	
Beneficiary	married	 5315	 0.61	 0.49	 0.00	 1.00	 1.00	
Beneficiary	widow	 5315	 0.29	 0.46	 0.00	 0.00	 1.00	
Beneficiary	NGO	participation	none	 5315	 0.87	 0.34	 1.00	 1.00	 1.00	
Beneficiary	used	to	participate	in	NGO	 5315	 0.02	 0.15	 0.00	 0.00	 0.00	
Beneficiary’s	education	years	 5315	 0.56	 1.61	 0.00	 0.00	 0.00	
Any	future	plan	for	self-employment	 5315	 0.46	 0.50	 0.00	 0.00	 1.00	
Empowerment	Score	(decision	making)		 5315	 5.43	 3.20	 3.75	 6.00	 7.75	
Empowerment	Score	(mobility)	 5315	 5.22	 3.70	 0.00	 8.00	 8.00	
Any	past	business	activity	 5315	 0.13	 0.34	 0.00	 0.00	 0.00	
Log	wage	income		 5315	 0.74	 8.85	 -9.21	 7.50	 8.85	
Had	a	small	business	 5315	 0.00	 0.02	 0.00	 0.00	 0.00	
Log	income	from	small	business	 5315	 2.37	 104.41	 0.00	 0.00	 0.00	
Worked	as	agricultural	day	labor	 5315	 0.27	 0.44	 0.00	 0.00	 1.00	
Log	self-employment	income		 5315	 -2.65	 8.00	 -9.21	 -9.21	 6.21	
Undernourished	 5315	 0.56	 0.5	 0	 1	 1	
Household	(HH)	Level	Characteristic	 	 	 	 	 	 	
HH	head	is	a	male	 5315	 0.61	 0.49	 0.00	 1.00	 1.00	
Wealth	ranked	as	bottom	 5315	 0.91	 0.29	 1.00	 1.00	 1.00	
Wealth	ranked	medium	 5315	 0.09	 0.29	 0.00	 0.00	 0.00	
Number	of	HH	members	below	10	years	 5315	 0.89	 1.01	 0.00	 1.00	 2.00	
Number	of	HH	members	above	10	years	 5315	 2.48	 1.19	 2.00	 2.00	 3.00	
HH	Head’s	Age	 5315	 44.99	 13.77	 35.00	 45.00	 55.00	
Fraction	Muslim	 5315	 0.83	 0.38	 1.00	 1.00	 1.00	
Fraction	Hindu	 5315	 0.16	 0.37	 0.00	 0.00	 0.00	
Any	HH	member	participating	in	NGO	 5315	 0.08	 0.27	 0.00	 0.00	 0.00	
HH	members	never	participated	in	NGO	 5315	 0.96	 0.21	 1.00	 1.00	 1.00	
HH	receive	any	govt.	benefits	 5315	 0.19	 0.40	 0.00	 0.00	 0.00	
Any	HH	members	migrated	out	for	work	 5315	 0.24	 0.43	 0.00	 0.00	 0.00	
HH	Head	migrated	out	for	work	 5315	 0.17	 0.38	 0.00	 0.00	 0.00	
Total	HH	members	migrated	out	for	work	 5315	 0.27	 0.51	 0.00	 0.00	 0.00	
HH	members	likely	to	migrate	for	work		 5315	 2.33	 1.08	 2.00	 2.00	 3.00	
HH	head’s	education	 5315	 0.63	 1.78	 0.00	 0.00	 0.00	
Number	of	first-degree	family	members	
living	in	same	neighborhood	 5315	 8.69	 3.69	 6.00	 9.00	 11.00	
Number	of	first-degree	family	members	
living	in	same	village	 5315	 2.71	 2.06	 1.00	 2.00	 4.00	
Per-capita	annual	expenditure	(log)	 5315	 9.29	 0.36	 9.06	 9.27	 9.49	
Log	per-capita	food	expenditure	 5315	 9.00	 0.38	 8.78	 8.99	 9.22	
Log	per-capita	non-food	annual	
expenditure	 5315	 7.78	 0.50	 7.48	 7.77	 8.07	
Log	per-capita	wealth	 5315	 5.12	 3.73	 4.83	 5.70	 6.61	
Had	any	livestock	 5315	 0.26	 0.44	 0.00	 0.00	 1.00	
Cultivable	land	size	(in	decimals)	 5315	 3.11	 17.99	 0.00	 0.00	 0.00	
Log	livestock	value	 5315	 3.04	 3.27	 0.00	 0.00	 5.71	
Log	per-capita	education	expenditures	 5315	 -4.81	 6.50	 -9.21	 -9.21	 3.76	
Log	total	savings		 5315	 -6.6	 6.44	 -11.51	 -11.5	 1	
Community	Level	Characteristics		 	 	 	 	 	 	
Gini	landside	 5315	 0.79	 0.07	 0.74	 0.79	 0.83	
26
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Table A2.  GATES by Different Quantiles (Using Causal Forest Estimation).
		 Per	Capita	Wealth	(log)	 Per	Capita	Expenditure	(log)	
2.41	 0.15	
ATE	of	3rd	Quantile	Group	 [2.27										2.54]	 [0.14										0.15]	
st 1.85	 0.12	ATE	of	1 	Quantile	Group	 [1.85											1.86]	 [0.11										0.12]	
0.482	 0.034	
Diff.	(3rd	vs	1st)	 [0.30										0.67]	 [0.02										0.04]	
(0.000)	 (0.000)	
ATE	of	4th	Quantile	Group	 2.77	 0.16	[2.641										2.89]	 [0.15										0.17]	
ATE	of	2nd	Quantile	Group	 2.15	 0.14	[2.02										2.27]	 [0.13										0.14]	
0.62	 0.02	
Diff.	(4th	vs	2nd)	 [0.44										0.79]	 [0.01										0.03]	
(0.000)	 (0.000)	
	
27
Grow the pie or have it? 
Using machine learning for impact heterogeneity in the Ultra-poor Graduation Model
Table A3.  Classification Analysis.
Covariates	 Stats	 Per	Capita	Wealth	[log]	 Per	Capita	Expenditure	[log]	Top	20%	 Bottom	20%	 Difference	 Top	20%	 Bottom	20%	 Difference	
Primary	Beneficiary	Level	Characteristics	 		 		 		 		 		 		
Mean	 50.21	 31.69	 18.52***	 37.39	 44.49	 -7.1***	
Respondent's	Age	
95%	CI	 [49.43						51.00]	 [31.15				32.24]	 [17.57					19.47]	 [36.59			38.20]	 [43.71				45.28]	 [-8.22				-5.98]	
Mean	 0.372	 0.158	 0.214***	 0.101	 0.524	 -0.423***	
Was	agri.	day	laborer	
95%	CI	 [0.342						0.40]	 [0.136						0.18]	 [0.177						0.25]	 0.083						0.119	 0.494						0.554	 -0.458						-0.388	
Mean	 4.407	 -3.684	 8.091***	 -4.173	 7.166	 -11.339***	
Resp.'s	Wage	Income	
95%	CI	 [3.94						4.87]	 [-4.174						-3.193]	 [7.416						8.765]	 [-4.648						-3.699]	 [6.841						7.491]	 [-11.914						-10.764]	
Resp.'s	Self-employment	 Mean	 -3.684	 -0.999	 -2.685***	 0.295	 -5.412	 5.707***	
Income(log)	 95%	CI	 [-4.17						-3.20]	 [-1.489						-0.51]	 [-3.371						-1.999]	 [-0.202						0.792]	 [-5.817						-5.008]	 [5.066						6.347]	
Mean	 1.909	 7.265	 -5.355***	 6.163	 3.723	 2.44***	
Participation	in	decision	making	
95%	CI	 [1.73						2.09]	 [7.155						7.375]	 [-5.564						-5.147]	 [6.001						6.325]	 [3.521						3.925]	 [2.181						2.698]	
Household	Level	Characteristics	 		 		 		 		 		 		
Mean	 52.304	 38.576	 13.728	 43.941	 47.582	 -3.642	
HH	Head's	Age	 95%	CI	 [51.53						53.08]	 37.948						39.203	 12.733						14.723	 43.092						44.79	 46.792						48.372	 -4.801						-2.483	
Num.	of	HH	members	older	than	 Mean	 1.548	 2.937	 -1.389***	 2.653	 2.011	 0.642***	
10	years	 95%	CI	 [1.50						1.60]	 [2.873						3.001]	 [-1.472						-1.305]	 [2.587						2.719]	 [1.943						2.079]	 [0.547						0.736]	
Num.	of	HH	members	likely	to	 Mean	 1.486	 2.763	 -1.277***	 2.486	 1.923	 0.563***	
migrate	 95%	CI	 [1.44						1.53]	 [2.705						2.821]	 [-1.352						-1.201]	 [2.426						2.546]	 [1.859						1.987]	 [0.476						0.651]	
Per	Capita	Wealth	(log)	 Mean	 1.262	 7.381	 -6.119***	 4.739	 5.169	 -0.43**	
95%	CI	 [0.88						1.64]	 [7.3						7.462]	 [-6.51						-5.728]	 [4.482						4.996]	 [4.964						5.373]	 [-0.758						-0.102]	
Total	Per	Cápita	Exp.	(log)	 Mean	 9.349	 9.331	 0.018	 9.124	 9.427	 -0.303***	
95%	CI	 [9.32						9.37]	 [9.311						9.35]	 [-0.013						0.049]	 [9.104						9.143]	 [9.406						9.449]	 [-0.332						-0.274]	
Total	Household	Savings	(log)	 Mean	 -8.137	 -4.929	 -3.208***	 -6.314	 -6.64	 0.326	
95%	CI	 [-8.47						-7.81]	 [-5.349						-4.51]	 [-3.742						-2.673]	 [-6.71						-5.918]	 [-7.022						-6.257]	 [-0.224						0.876]	
Mean	 0.78	 5.508	 -4.728***	 3.544	 2.172	 1.372***	
Baseline	Livestock	Value	
95%	CI	 [0.67						0.89]	 [5.311						5.705]	 [-4.953						-4.502]	 [3.344						3.745]	 [1.995						2.35]	 [1.104						1.64]	
Community/Spot	Level	Characteristics	 		 		 		 		 		 		
Mean	 2.034	 1.706	 0.327***	 1.739	 2.225	 -0.486***	
Distance	to	Pave	Road	
95%	CI	 [1.90						2.17]	 [1.594						1.818]	 [0.155						0.5]	 [1.617						1.861]	 [2.094						2.356]	 [-0.665						-0.307]	
Mean	 2.644	 2.532	 0.113	 2.174	 3.131	 -0.957***	
Distance	to	Nearest	Market	
95%	CI	 [2.52						2.77]	 [2.387						2.676]	 [-0.077						0.302]	 [2.033						2.315]	 [3.001						3.261]	 [-1.149						-0.765]	
Mean	 0.729	 0.739	 -0.011***	 0.781	 0.696	 0.085***	
Gini	Livestock	Value	
95%	CI	 [0.72						0.73]	 [0.734						0.745]	 [-0.018						-0.003]	 [0.776						0.786]	 [0.691						0.702]	 [0.077						0.092]	
	
28
 Reajul Chowdhury, Federico Ceballos-Sierra, Munshi Sulaiman
Table A4.  Average Treatment Effects (ATE) and Heterogeneous Treatment Effect (HET) for 
Savings and Self-employment Income.
Method	 		 Savings	(log)	 Self-employment	Income	(log)	
Causal	Forest	 ATE	 9.57	 5.66	
[8.18							10.96]	 [4.17							7.15]	
(0.000)	 (0.000)	
HET	 0.99	 0.99	
[-2.00							4.00]	 [-3.00							5.00]	
(0.241)	 (0.33)	
Random	Forest	 ATE	 9.03	 5.23	
[8.75							9.32]	 [4.77							5.69]	
(0.000)	 (0.000)	
HET	 0.74	 0.22	
[0.55							0.93]	 [-0.03							0.46]	
(0.000)	 (0.179)	
Elastic	Net	 ATE	 9.14	 5.00	
[8.85							9.43]	 [4.54							5.47]	
(0.000)	 (0.000)	
HET	 1.09	 0.43	
[0.64							1.63]	 [-0.28							1.02]	
(0.000)	 (0.212)	
	
29