ESSAYS IN CAUSAL INFERENCE AND SYNTHETIC CONTROL
by
SIMEON A. MINARD
A DISSERTATION
Presented to the Department of Economics
and the Graduate School of the University of Oregon
in partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
June 2019
DISSERTATION APPROVAL PAGE
Student: Simeon A. Minard
Title: Essays in Causal Inference and Synthetic Control
This dissertation has been accepted and approved in partial fulfillment of the
requirements for the Doctor of Philosophy degree in the Department of Economics
by:
Glen R. Waddell Chair
Jeremy Piger Core Member
Ben Hansen Core Member
David Wagner Institutional Representative
and
Janet Woodruff-Borden Vice Provost and Dean of the Graduate School
Original approval signatures are on file with the University of Oregon Graduate
School.
Degree awarded June 2019
© 2019 Simeon A. Minard
DISSERTATION ABSTRACT
Simeon A. Minard
Doctor of Philosophy
Department of Economics
June 2019
Title: Essays in Causal Inference and Synthetic Control
This dissertation includes previously unpublished co-authored material. The
first chapter of this dissertation outlines a new method of estimating synthetic
controls that has a number of desirable properties. The second chapter
causally infers a positive property crime impact of supply side drug intervention,
an important policy result. The third and final chapter outlines a method of
variable selection in linear regression to be used to decrease bias and increase
precision.
CURRICULUM VITAE
NAME OF AUTHOR: Simeon A. Minard
GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED:
University of Oregon, Eugene, OR
Western Washington University, Bellingham, WA
DEGREES AWARDED:
Doctor of Philosophy, Economics, 2019, University of Oregon
Master of Science, Economics, 2015, University of Oregon
Bachelor of Arts, Economics, 2013, Western Washington University
AREAS OF SPECIAL INTEREST:
Applied Econometrics
Applied Microeconomics
Labor
GRANTS, AWARDS AND HONORS:
Department of Economics Graduate Teaching Award, University of Oregon,
2017
Kleinsorge Fellowship Award, University of Oregon, 2014
ACKNOWLEDGEMENTS
I would like to thank my family and friends, and all members of my
committee, especially Dave.
To my family, and Dave.
TABLE OF CONTENTS
Chapter Page
I. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
II. DISPERSION WEIGHTED SYNTHETIC CONTROLS . . . . . . . . 3
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
A new synthetic control procedure . . . . . . . . . . . . . . . . . . 8
Empirics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
III. THE UNINTENDED CONSEQUENCES OF SUPPLY SIDE DRUG INTERVENTION:
EVIDENCE FROM DEA CHEMICAL CLASSIFICATION . . . . . 43
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Identification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Further Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
IV. BIAS REDUCTION THROUGH VARIABLE SELECTION . . . . . . 70
Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Model selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
V. CONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
REFERENCES CITED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
LIST OF FIGURES
Figure Page
1. Examples of “overall” and “relative” dispersion . . . . . . . . . . . . . 13
2. Contamination: all available donors . . . . . . . . . . . . . . . . . . . . 18
3. Contamination: Treatment and MSE . . . . . . . . . . . . . . . . . . . 19
4. Contamination: kernel densities . . . . . . . . . . . . . . . . . . . . . . 20
5. Unobserved trends: all available donors . . . . . . . . . . . . . . . . . . 23
6. Unobserved trends: treatment and MSE . . . . . . . . . . . . . . . . . 24
7. Unobserved trends: kernel densities . . . . . . . . . . . . . . . . . . . . 25
8. Unobserved error: all available donors . . . . . . . . . . . . . . . . . . 27
9. Unobserved error: precision properties . . . . . . . . . . . . . . . . . . 28
10. DWSC with a “Bad controls” problem . . . . . . . . . . . . . . . . . . 30
11. Per-capita cigarette sales (packs) in California, 1970 to 2000 . . . . . . 32
12. California smoking: donor inclusion . . . . . . . . . . . . . . . . . . . . 34
13. California: DWSC robustness . . . . . . . . . . . . . . . . . . . . . . . 35
14. Reported rape offences in Rhode Island, 1970 to 2009 . . . . . . . . . . 38
15. Rhode Island: DWSC robustness . . . . . . . . . . . . . . . . . . . . . 39
16. Rhode Island: donor inclusion . . . . . . . . . . . . . . . . . . . . . . . 40
17. Frequency histogram: cocaine admission proportion . . . . . . . . . . . 51
18. Frequency histogram: total admissions and cocaine admissions . . . . . 52
19. Geographic representation of treatment . . . . . . . . . . . . . . . . . . 54
20. Time series representation of treatment . . . . . . . . . . . . . . . . . . 56
21. Event study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
22. Pre-post crime against admissions . . . . . . . . . . . . . . . . . . . . . 64
23. Removal of low addiction states . . . . . . . . . . . . . . . . . . . . . . 65
24. Treatment estimates with more pre-treatment years . . . . . . . . . . . 66
25. Distributional co-variate balance tests: Baseline data-generating process 76
26. Bias entering through an omitted interaction (of x2 and x3), which correlates
differently among treated units . . . . . . . . . . . . . . . . . . . . 78
27. Bias entering through an omitted non-linearity (x1^2) that varies differently
among treated units . . . . . . . . . . . . . . . . . . . . . . . . . . 80
28. Bias entering through multiple omitted interactions (x1x2 and x4^2) that vary
differently among treated units . . . . . . . . . . . . . . . . . . . . 82
29. Variable selection can reduce bias entering through unobservables: Positive
bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
30. Variable selection can reduce bias entering through unobservables: Negative
bias . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
31. Where there is no bias, variable selection increases precision . . . . . . 85
LIST OF TABLES
Table Page
1. Cocaine Usage in 2006 (Percentage of Respondents NSDUH 2006) . . . 48
2. Pre-Treatment Summary Statistics . . . . . . . . . . . . . . . . . . . . 49
3. Property Crime Rate (2005 Cocaine Admissions per 100,000 population) 53
4. Property Crime Rate (2005 Cocaine Admission Proportion) . . . . . . 59
5. Property Crime Rate (2005 Cocaine Admission Proportion) (Population
Weighted) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6. Property Crime Rate (2005 Cocaine Admission Proportion) . . . . . . 62
7. Violent Crime Rates (2005 Cocaine Admission Proportion) . . . . . . . 63
8. Property Crime Rate (2005 Alcohol Admission Proportion) . . . . . . . 67
CHAPTER I
INTRODUCTION
In Dispersion Weighted Synthetic Controls (co-authored with Glen Waddell)
we propose a new approach to synthetic-control methods, through which we
regularize the consideration of variation in available control units and the stability
properties of the synthetic control. Specifically, we introduce two penalties directly
into the objective function, allowing for the endogenous down-weighting of donors
to the synthetic control with outcomes that exhibit different patterns of variation
before and after treatment, and donors tending to be distant from the synthetic
control's average each period. While nesting a typical approach, we offer an
intuitively appealing method for applied researchers to evaluate the reasonableness
of a variety of synthetic controls and consider the sensitivity of results.
The Unintended Consequences of Supply-Side Drug Intervention: Evidence
from DEA Chemical Classification (JMP) focuses on the crime impacts of supply-
side drug policy. Supply-side drug intervention plays a central role in both local
and federal anti-drug policy, yet disruption of drug markets may lead to unintended
changes in the behavior of individuals who participate in these markets. Specifically,
addicted users facing increased prices and reduced availability may turn to
financially motivated crime in order to continue their consumption patterns. I find
evidence that states with high levels of cocaine addiction experienced significant
increases in property crime relative to states with lower levels of addiction after
a nationwide supply shock in the cocaine market that reduced availability. This
effect should be accounted for in both formulation and execution of supply side
interventions as well as in the cost-benefit assessment of supply side drug policy as
a whole.
In Bias Reduction through Variable Selection (co-authored with Glen
Waddell) we focus on variable selection techniques to remove bias from regression
parameters. In regressions that seek to determine a causal impact, the treatment of
potentially confounding variables is very important for assuring unbiased estimates,
and is often approached through the so-called balance test. We show that while
observable covariates may appear balanced in terms of means, there are other
dimensions where imbalance can lead to bias, specifically higher-order polynomials
and interactions of these variables. As the number of potential controls may be
large (and may even exceed the number of observations), we advocate the use
of model-selection procedures: the Bayesian Information Criterion (BIC) or the Akaike
Information Criterion (AIC). We motivate this process through a number of
simulated environments, showing decreases in bias and improvements in precision
in cases of both omitted observable interactions and omitted unobservable variables.
CHAPTER II
DISPERSION WEIGHTED SYNTHETIC CONTROLS
I would like to acknowledge Glen Waddell, who contributed to the
early stages of developing the initial concept, and provided guidance through
the development of the estimator. He also contributed to the writing and
presentation of the paper. I formulated the estimator mathematically, developed
it computationally, performed all computational simulations and created all figures,
and contributed to the writing and presentation of the paper.
Introduction
Athey and Imbens (2017) call synthetic control methods (SCMs) “the most
important innovation in the evaluation literature in the last 15 years.” Generally,
SCMs refer to the construction of a weighted average of untreated units that
projects the path that a treated unit would have followed in the absence of
treatment, and they are quickly becoming a preferred approach to policy evaluation
in the absence of appropriate individual control units. However, as approaches to SCMs
have yet to be standardized, the scope for researchers to make specification choices
can undermine SCM results.1
In traditional approaches to estimating synthetic controls, it is assumed
that the relationship being identified is stable. That is, one assumes that the pre-
treatment fit with the treated unit is sufficient to imply that the post-treatment
1 Ferman et al. (2018) (i.e., “the cherry picking paper”) will surely stand as one of the
formative papers in the SCM literature, and points directly to this in suggesting that “with no
clear guidance on the choice of predictor variables used to estimate the [synthetic control] weights,
there are opportunities for the researcher to search for specifications with statistically significant
results, undermining one of the main advantages of the method.”
levels of the synthetic control approximate the treated entity’s counterfactual
levels in the treatment period. While one cannot benchmark a synthetic control
to the post-treatment behavior of the treated unit absent treatment, given the
fundamental econometric problem, there is interpretable information in the
behavior of untreated units in these periods.
While nesting a typical approach to SCMs, we offer an intuitively appealing
flexibility for applied researchers to evaluate the reasonableness and stability
properties of synthetic controls in their chosen environment, and consider the
sensitivity of estimated treatment effects amid the choices inherent to SCMs.
Specifically, we propose a systematic regularization across two dimensions,
motivated by properties that we anticipate being particularly desirable in a control
group—one’s tolerance for untreated units contributing to the synthetic control to
be distant from the synthetic control’s average each period (i.e., dispersion), and
one’s tolerance for the outcomes of untreated units to exhibit different patterns of
variation before and after treatment (i.e., relative dispersion).2
In Figure 1, for example, we plot a 2 × 2 matrix of four abstract notions of
high and low overall dispersion, and high and low relative dispersion. To our eye,
a set of controls with low dispersion in both dimensions might be appealing—
the bottom-right cell of the figure. The point being, as one approaches a set of
untreated units that looks other than that depicted in that preferred cell, one can
endogenize the down-weighting of donors to the synthetic control who are generally
distant (moving top to bottom in the figure), or the down-weighting of donors who
2 While estimating standard errors in the context of SCMs is itself a developing literature,
we imagine that increases in the precision of an estimated synthetic control imply increases,
ultimately, in the precision of estimated treatment effects.
are systematically more- or less-distant from the synthetic control after treatment
than they were before (moving left to right in the figure).
To be clear, we do not think of this as a fix, necessarily, but as a procedure
that reveals the sensitivity of SCM-derived estimates of treatment to the
consideration of synthetic-control stability. In the end, we recommend that
researchers plot estimates of the effect of treatment across the two parameters we
introduce (i.e., controlling “dispersion” and “relative dispersion”)—this also implies
that one considers a very large number of synthetic controls. Conveniently, over
the range of parameters, our procedure is entirely driven by pre-treatment MSE
at one end and converges toward identifying the best-available individual control
at the other.3 Further, we see a comparison of the treated unit to this “best” or
“closest” control as directly informative—we might consider asking of SCM papers,
more generally, whether adopting SCMs is responsible for attenuating or amplifying
the estimated treatment effect relative to such a benchmark. For example, in
the California experiment of Abadie et al. (2010), comparing California cigarette
sales to the individual control with lowest pre-treatment MSE (i.e., Montana)
yields an estimated treatment of 25.4 fewer packs per capita, annually, with the
1988 anti-smoking campaign—this is the largest from among the point estimates
retrieved over the range of our two parameters. In the Rhode Island experiment
of Cunningham and Shah (2018), it’s quite the opposite, where comparing Rhode
Island rape reports to the individual control with lowest pre-treatment MSE (i.e.,
New Hampshire) yields a relative decline in the treated state of only 1.6 rapes
per 100,000, which is the smallest estimate we retrieve of the effect of the 2003
3 Among the early innovators of SCMs, Abadie et al. (2010) retrieves an estimate of the causal
effect of a 1988 anti-smoking campaign in California on per-capita smoking sales. We return to
consider this result as part of our empirical applications. (See Abadie and Gardeazabal (2003),
and Abadie et al. (2015) for other formative SCM analyses.)
decriminalization of prostitution. Any additions to “Synthetic Rhode Island”
increase the magnitude of what one would find in this best-available case. This is a
distinction worth noting, we believe (as will be the slope of the estimated treatment
effect in parameters, which we discuss below with the figures we propose).
With the post-treatment variation of individual donors directly contributing
to the estimated treatment effect in proportion to their weight, it seems
uncontroversial that we take exceptional care over post-treatment behavior of
individual contributors as we determine those weights. However, given the near-
absolute importance that typical SCMs place on the pre-treatment fit of the
synthetic to the treated unit, it is noteworthy that we allow for the post-treatment
variation in potential donors directly in the determination of synthetic-control
weights. Yet, in our procedure, pre-treatment MSE plays no-less important a role,
as we propose that the presentation of treatment effects always be accompanied by
the presentation of pre-treatment MSE, where we can evaluate their co-movements
as we vary the ex ante importance of the synthetic control’s stability in the
objective function. In this way, we more-fully explore the variation in available
control units in the building of appropriate efficacy tests, while maintaining the
ability to speak back into more-standard approaches.
Our procedure also moves somewhat toward transparency. For example,
SCMs can often lead to multiple synthetic controls having near-equivalent measures
of pre-treatment fit while at the same time producing estimated treatment effects
that differ wildly. This is not uncommon, in our experience, and a source of
sensitivity that has led to doubt in our own attempts to evaluate SCM results.
In our procedure, we will argue that near-equivalence in estimated treatment and
in pre-treatment MSE (despite putting increasing weight on stability properties)
should breed confidence, and is something that should be demonstrated in SCM
analyses. Likewise, to the extent a particular application produces bounds across
a parameter space, we have learned something. Or, to the extent a single point
estimate is still desired from a given analysis, we find it an appealing feature of our
procedure that we can potentially foreclose on some synthetic controls over others
on principled grounds, while adding transparency to the context from which that
inference is made.
As we approach our procedure, we acknowledge the broader literature and
direct readers to Doudchenko and Imbens (2016) for foundational context regarding
the relations between synthetic controls, difference-in-differences, and matching
methods. In what follows, we first walk through the specifics of what we refer to as
a dispersion-weighted synthetic control (DWSC), which in large part comes from
our own attempts to build confidence in our own policy evaluations with synthetic-
control methods. We then, in Section 2.3, report the results of our procedure
applied in two policy-relevant empirical settings. First, we apply the procedure
to simulated environments—in one, we consider a “contaminated-control” problem,
in another, we consider scenarios where type is unobservable, and untreated units
of different “type” are similar enough in pre-treatment to be given weight in the
synthetic control by traditional approaches, and in another, we consider SCM with
only “bad controls” available that each violate “common trends.” Second, we apply
DWSC to one of the canonical SCM settings: the 1988 anti-smoking campaign in
California evaluated in Abadie et al. (2010). Third, we consider a more-recent
SCM result: the change in reported-rape offences in Rhode Island around its
decriminalization of prostitution, evaluated in Cunningham and Shah (2018). We offer some
concluding remarks in Section V.
A new synthetic control procedure
In seeking out both transparency and an informative regularization of a
synthetic control procedure, we allow for two penalties in the objective function
that yields the synthetic control. First, we allow for a penalty on individual donors
who exhibit changes (before and after treatment) in how they fit within a given
synthetic control. That is, we build into our methodology the endogenous down-
weighting of individual donors to the extent they behave differently in the post-
treatment period than they had in the pre-treatment period. This has strong
intuition, we believe, insofar as one is more confident that one has established a
“reasonable control” when the estimator produces a set of donors who at least vary
similarly with each other before and after treatment. Second, we allow for a penalty
on individual donors according to their overall mean-squared distance from the
synthetic control. That is, in the determination of donor weights, we allow for the
endogenous down-weighting of donors who are relative outliers within the synthetic
control, allowing researchers to consider robustness to “tighter” synthetic controls.4
The objective function
Consider the typical context in which the use of SCMs for identification
seems advantageous: observations of outcomes Yit for a treated unit and a set of K
untreated entities over the same time interval t ∈ [1, T ], with treatment occurring
after some T0 > 1 (i.e., T0 is the last pre-treatment observation). As per usual,
4 To one’s concern that we may introduce potentially undesirable consequences as we consider
distance to the synthetic control, note that we will still rely on pre-treatment MSE capturing in
the objective function the difference to the treated unit itself. As comparison to the treated unit
in the post-treatment period is untenable, neither of the two channels through which we will allow
for the down-weighting of donors can use that as a benchmark for comparison in the post period.
It’s this that drives us to using the synthetic control itself.
the treated unit is not observed in the treatment period absent treatment, and
SCMs handle this fundamental econometric problem with a convex combination
of control entities generating a single “synthetic control” that by assumption then
forecasts the counterfactual series that the treated unit would have followed in the
absence of treatment. Any deviations from this counterfactual are then attributed
to treatment. As we’ve already noted, weighted post-treatment movements in donor
entities are hardwired into the estimated treatment effect, which will motivate that
we consider post-treatment movements in the synthetic control more directly.
As a measure of pre-treatment fit, we follow the convention of adopting pre-
treatment mean squared error (MSE). Specifically, with vector w = [w1, w2, ..., wK]
collecting the set of weights to be determined across K untreated entities, and Y0t
notating the outcome of the treated unit itself, we define pre-treatment MSE as
$$
M(w) = \frac{\sum_{t=1}^{T_0} \Bigl( Y_{0t} - \sum_{i=1}^{K} w_i Y_{it} \Bigr)^{2}}{T_0} . \tag{2.1}
$$
The solution to minimizing M(w) we can define as some set of weights w̄.5 We
augment this simple minimization of pre-treatment MSE with the addition of two
penalties.
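As a concrete reading of (2.1), the following is a minimal sketch in plain Python. The function names, the row-per-period data layout, and the toy numbers are illustrative assumptions and are not taken from the dissertation.

```python
def synthetic_path(donors, w):
    """Weighted donor average in each period; donors[t][i] is donor i at time t."""
    return [sum(w_i * y for w_i, y in zip(w, row)) for row in donors]


def pre_treatment_mse(treated, donors, w, t0):
    """M(w) as in (2.1): mean squared gap between the treated unit and the
    synthetic control over the first t0 (pre-treatment) periods."""
    synth = synthetic_path(donors, w)
    return sum((treated[t] - synth[t]) ** 2 for t in range(t0)) / t0


# Toy data (made up for illustration): T = 4 periods, K = 2 donors, T0 = 3.
donors = [[1.0, 3.0],
          [2.0, 4.0],
          [3.0, 5.0],
          [4.0, 6.0]]
treated = [2.0, 3.0, 4.0, 9.0]

m_equal = pre_treatment_mse(treated, donors, [0.5, 0.5], t0=3)   # equal weights
m_single = pre_treatment_mse(treated, donors, [1.0, 0.0], t0=3)  # first donor only
```

In this toy case the equal-weight combination reproduces the treated unit's pre-treatment path exactly, while the first donor alone is off by one unit in every pre-treatment period.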
First, in the objective function we allow for the endogenous down-weighting
of donors when their post-treatment behavior differs from their pre-treatment
5 Implied in our procedure will be that the unobservable components of outcomes are of first-
order importance, more so than particular covariate movements. Though important to distinguish,
this is second-order to our contribution. There are examples of synthetic-control
estimation that do not rely on covariate inclusion (e.g., see Doudchenko and Imbens (2016));
Ferman et al. (2018) offer a discussion of the role of covariates and, ultimately, arguments
supporting the exclusion of covariates in SCMs; and Ferman and Botosaru (2017) demonstrate
the unbiasedness of synthetic controls matching on pre-treatment outcomes. (Moreover, note that
the inclusion of covariates requires that some pre-treatment periods be excluded (Kaul et al.,
2018), which introduces significant scope to the econometrician’s choice of which outcomes to
include.)
behavior. The intuition is straightforward, we believe—it is reasonable to question
why some donors to the synthetic control may be behaving differently after
treatment, and to consider the sensitivity of a single point estimate to the down-
weighting of deviants. In this way, we think of this as not a prescription for a fix,
per se, but as a procedure that reveals the sensitivity of SC-derived point estimates
to the consideration of synthetic-control stability. In this dimension, we enable the
down-weighting of donors who are more (or less) distant from the synthetic control
after treatment than they were before treatment. Specifically, we can define the
relative dispersion of each donor, Rj(w), as
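$$
R_j(w) = \Biggl[ \frac{\sum_{t=1}^{T_0} \Bigl( Y_{jt} - \sum_{i=1}^{K} w_i Y_{it} \Bigr)^{2}}{T_0}
 - \frac{\sum_{t=T_0+1}^{T} \Bigl( Y_{jt} - \sum_{i=1}^{K} w_i Y_{it} \Bigr)^{2}}{T - T_0} \Biggr]^{2} , \tag{2.2}
$$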
and, summing across donors, define a measure of relative dispersion, R(w), as
$$
R(w) = \sum_{j=1}^{K} w_j R_j(w) , \tag{2.3}
$$
which assures that only weighted donors contribute to the objective function. We
parameterize this potential penalty in the objective function with ρ ∈ [0, 1).
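The relative-dispersion penalty in (2.2) and (2.3) can be sketched in plain Python as follows. This assumes the outer square in (2.2) applies to the difference between the pre- and post-treatment mean squared distances; the function name, data layout, and toy numbers are illustrative.

```python
def relative_dispersion(donors, w, t0):
    """Return ([R_1(w), ..., R_K(w)], R(w)) following (2.2) and (2.3).

    donors[t][j] is donor j at time t; the first t0 rows are pre-treatment.
    """
    n_periods = len(donors)
    synth = [sum(w_i * y for w_i, y in zip(w, row)) for row in donors]
    r = []
    for j in range(len(w)):
        sq_dist = [(donors[t][j] - synth[t]) ** 2 for t in range(n_periods)]
        pre = sum(sq_dist[:t0]) / t0                  # mean squared distance before
        post = sum(sq_dist[t0:]) / (n_periods - t0)   # mean squared distance after
        r.append((pre - post) ** 2)
    total = sum(w_j * r_j for w_j, r_j in zip(w, r))  # only weighted donors count
    return r, total


# Toy data (made up): the second donor drifts away after period T0 = 2.
drifting = [[0.0, 2.0], [0.0, 2.0], [0.0, 4.0], [0.0, 4.0]]
r_j, r_total = relative_dispersion(drifting, [0.5, 0.5], t0=2)

# Same donors without the drift: distances to the synthetic control are stable.
stable = [[0.0, 2.0]] * 4
r_j_stable, r_total_stable = relative_dispersion(stable, [0.5, 0.5], t0=2)
```

Note that in the drifting case both donors receive the same R_j, because each is measured against the synthetic control, which itself moves with the drifting donor.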
Second, we allow for the endogenous down-weighting of donors contributing
excessively to the overall dispersion of the synthetic control. That is, defining the
overall variation within the synthetic control as
$$
D(w) = \frac{\sum_{t=1}^{T} \sum_{j=1}^{K} w_j \Bigl( Y_{jt} - \sum_{i=1}^{K} w_i Y_{it} \Bigr)^{2}}{T} , \tag{2.4}
$$
we allow for the down-weighting of donors k who contribute excessively to this
variation, generally, which produces a tighter-fitting synthetic control (with no
particular notion of pre/post balance). We parameterize the relative importance
of this more-general dispersion in the synthetic control with δ ∈ [0, 1).
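A minimal sketch of the overall-dispersion measure in (2.4), again in plain Python with an illustrative function name and made-up toy data:

```python
def overall_dispersion(donors, w):
    """D(w) as in (2.4): weighted mean squared distance of donors from the
    synthetic control, averaged over all T periods (pre and post)."""
    total = 0.0
    for row in donors:                                   # one row per period
        synth_t = sum(w_i * y for w_i, y in zip(w, row))
        total += sum(w_j * (y - synth_t) ** 2 for w_j, y in zip(w, row))
    return total / len(donors)


# Toy data (made up): T = 3 periods, K = 2 donors.
donors = [[0.0, 2.0], [0.0, 2.0], [0.0, 4.0]]

d_mixed = overall_dispersion(donors, [0.5, 0.5])
d_single = overall_dispersion(donors, [1.0, 0.0])   # all weight on one donor
```

Placing all weight on a single donor makes the synthetic control coincide with that donor, so the measure is zero; spreading weight across donors that sit far apart raises it.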
Together, then, (for given ρ and δ) we can cast the estimation procedure with
the objective function,
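$$
\begin{aligned}
\arg\min_{w} \quad &
\underbrace{(1 - \rho - \delta)\, M(w)}_{\text{Pre-treatment MSE}}
+ \underbrace{\rho\, \frac{M(\bar{w})}{R(\bar{w})}\, R(w)}_{\text{Pre/post relative (“rho”) dispersion}}
+ \underbrace{\delta\, \frac{M(\bar{w})}{D(\bar{w})}\, D(w)}_{\text{Overall (“delta”) dispersion}} \\
\text{s.t.} \quad & \rho \ge 0 ,\ \delta \ge 0 ,\ \rho + \delta < 1 ,\ \sum_{i=1}^{K} w_i = 1 ,\ w_i \ge 0 \ \forall\, i .
\end{aligned}
\tag{2.5}
$$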
In (2.5), weights w are chosen by standard numerical optimization techniques
to minimize a convex combination of pre-treatment mean squared error and our
measures of dispersion, R and D. In order to make the size of the penalties and
MSE comparable, we scale both penalties by the ratio of pre-treatment MSE to the
penalty, evaluated at the solution to minimizing the pre-treatment MSE in (2.1),
which we notate as w̄.6
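To make the pieces of (2.5) concrete, here is a minimal end-to-end sketch in plain Python. The text only says that weights are chosen by standard numerical optimization; the coarse grid search over the weight simplex used here is a stand-in assumption that is practical only for a handful of donors. All function names, the zero-penalty guard, and the toy data are illustrative.

```python
from itertools import product


def synth(donors, w):
    return [sum(w_i * y for w_i, y in zip(w, row)) for row in donors]


def mse(treated, donors, w, t0):               # M(w), eq. (2.1)
    s = synth(donors, w)
    return sum((treated[t] - s[t]) ** 2 for t in range(t0)) / t0


def rel_disp(donors, w, t0):                   # R(w), eqs. (2.2)-(2.3)
    s, total = synth(donors, w), 0.0
    for j, w_j in enumerate(w):
        d2 = [(row[j] - s_t) ** 2 for row, s_t in zip(donors, s)]
        pre = sum(d2[:t0]) / t0
        post = sum(d2[t0:]) / (len(d2) - t0)
        total += w_j * (pre - post) ** 2
    return total


def ovr_disp(donors, w):                       # D(w), eq. (2.4)
    s = synth(donors, w)
    return sum(w_j * (row[j] - s_t) ** 2
               for row, s_t in zip(donors, s)
               for j, w_j in enumerate(w)) / len(donors)


def simplex_grid(k, steps):
    """Weight vectors with entries in {0, 1/steps, ..., 1} that sum to one."""
    for combo in product(range(steps + 1), repeat=k - 1):
        if sum(combo) <= steps:
            yield [c / steps for c in combo] + [(steps - sum(combo)) / steps]


def dwsc_weights(treated, donors, t0, rho, delta, steps=20):
    """Grid-search minimizer of the objective in (2.5)."""
    if rho < 0 or delta < 0 or rho + delta >= 1:
        raise ValueError("need rho >= 0, delta >= 0, rho + delta < 1")
    grid = list(simplex_grid(len(donors[0]), steps))
    w_bar = min(grid, key=lambda w: mse(treated, donors, w, t0))  # MSE-only weights
    m_bar = mse(treated, donors, w_bar, t0)
    r_bar, d_bar = rel_disp(donors, w_bar, t0), ovr_disp(donors, w_bar)
    # Assumption: if a penalty is exactly zero at w_bar its scale is undefined,
    # so that penalty is switched off rather than dividing by zero.
    r_scale = m_bar / r_bar if r_bar > 0 else 0.0
    d_scale = m_bar / d_bar if d_bar > 0 else 0.0

    def objective(w):
        return ((1 - rho - delta) * mse(treated, donors, w, t0)
                + rho * r_scale * rel_disp(donors, w, t0)
                + delta * d_scale * ovr_disp(donors, w))

    return min(grid, key=objective)


# Toy data (made up): T0 = 2; the second donor jumps after treatment.
treated = [10.0, 12.0, 8.0, 9.0]
donors = [[9.0, 13.0], [11.0, 13.0], [10.0, 20.0], [12.0, 20.0]]

w_mse = dwsc_weights(treated, donors, t0=2, rho=0.0, delta=0.0)
w_stable = dwsc_weights(treated, donors, t0=2, rho=0.99, delta=0.0)
```

With rho = delta = 0 the sketch returns the weights that minimize pre-treatment MSE alone. With rho close to one, all weight moves to the first donor, which is the single donor with the better pre-treatment fit in this toy setup.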
Some things of note
We have five quick notes to make, before proceeding to consider applications.
First, we note that the procedure we propose is not unlike that of
Doudchenko and Imbens (2016), who introduce penalties on the L1 (sum of
absolute value of weights) and L2 (sum-of-squared weights) norms of the weights.
6 Recall that the set of weights w̄ that minimizes M(w) is the same as that which minimizes (2.5)
given ρ = δ = 0.
In our procedure, while also increasing transparency and demonstrating robustness
across a range of estimates, we retain the weight restrictions of the original Abadie
et al. (2010) but allow the researcher to weigh more heavily the importance of
stability among donors to the synthetic control. In general, we share the
prior that relaxing these restrictions can lead to substantial improvements in
the estimator. However, penalties such as those we introduce are difficult to
conceptualize without those restrictions. Negative weights, for example, create
problems given our notions of what properties we value in a synthetic control
and we feel comfortable penalizing (similarity in levels, and pre/post
similarity). Put in other words, we are willingly sacrificing the benefits to relaxing
weight restrictions in exchange for desirable and informative properties in the
procedure.
Second, now that we have introduced ρ and δ as parameters, Figure 1 can
be recast with the added intuition that, as one approaches data with properties
other than that depicted in that preferred cell, it is with increases in δ that one
achieves lower overall dispersion of the data contributing to the synthetic (moving
down) and with increases in ρ that one achieves lower relative dispersion of the
data contributing to the synthetic (moving to the right).
Third, note that benchmarking the penalties to the synthetic control itself
will protect against the down-weighting of donors to the synthetic control that
truly do represent the counterfactual. Our procedure allows for the down-weighting
of untreated units to the extent they are outliers in the set of all potential donors.
To the extent their movement is common across potential donors, the penalty does
not so quickly bind and they are not down-weighted. We address related concerns
in one of our simulated environments below, assigning the treated unit to one of
FIGURE 1. Examples of “overall” and “relative” dispersion. Panels (2 × 2, left to right, top to bottom): High overall / High relative; High overall / Low relative; Low overall / High relative; Low overall / Low relative.
two unobservable types—the procedure we introduce increasingly weights those of
the same type as the treated unit, while the traditional method does not.
Fourth, we simply wish to formalize parameter conditions. Recall that
choosing ρ = 0 and δ = 0 nests an estimator that matches only on pre-treatment
outcomes, and in that way captures the typical synthetic control design.7 Note,
however, that there are K potential solutions to (2.5) when either ρ = 1 or δ = 1.
That is, at either limit, any degenerate set of weights (i.e., all weight on a single
donor) yields an objective function that evaluates to zero and the estimator cannot
7 The synth package, for example, will produce estimates that minimize (2.5) subject to ρ =
δ = 0, when all pre-treatment periods are included. (synth is currently available in Matlab, R,
and Stata, through Jens Hainmueller.)
distinguish any of the K donors from one another. However, as one approaches
ρ = 1 or δ = 1 and the residual weight of 1 − ρ − δ is still on pre-treatment
MSE, the estimator will collapse on the single donor that best-matches the pre-
treatment outcomes of the treated unit itself, and thereby avoid choosing donors
with attractive dispersion properties but poor fit with the treated unit. Formally,
then, we define the parameter space as ρ + δ ∈ [0, 1), recalling that the value in
our procedure is, again, in the variation in the estimated treatment effect across
the range of these parameters.8 By extension, across this parameter space, pre-
treatment MSE cannot be larger than the pre-treatment MSE of the single best-
fitting control.
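The claim that any degenerate set of weights zeroes out both penalties can be checked directly from the definitions of R(w) and D(w). The short derivation below is a sketch in which e_k (our notation, not the dissertation's) denotes the weight vector that places all weight on donor k.

```latex
\begin{align*}
w = e_k \;&\Rightarrow\; \sum_{i=1}^{K} w_i Y_{it} = Y_{kt} \quad \text{for all } t, \\
R(e_k) &= \sum_{j=1}^{K} w_j R_j(e_k) = R_k(e_k)
  = \Biggl[ \frac{\sum_{t=1}^{T_0} (Y_{kt} - Y_{kt})^{2}}{T_0}
          - \frac{\sum_{t=T_0+1}^{T} (Y_{kt} - Y_{kt})^{2}}{T - T_0} \Biggr]^{2} = 0, \\
D(e_k) &= \frac{1}{T} \sum_{t=1}^{T} \sum_{j=1}^{K} w_j (Y_{jt} - Y_{kt})^{2}
  = \frac{1}{T} \sum_{t=1}^{T} (Y_{kt} - Y_{kt})^{2} = 0 .
\end{align*}
```

Because this holds for every k, the two penalties alone cannot rank the K donors. Only the residual weight 1 − ρ − δ on pre-treatment MSE distinguishes among them, which is why the parameter space excludes ρ + δ = 1.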
Finally, the synthetic control method is often referred to as a generalization of
the difference-in-differences (DD) method. However, this is only true if data are
de-meaned before estimation, accounting for the “fixed effect” implied in difference-
in-differences estimators. We follow the general convention in synthetic controls
of not demeaning our data, with recognition that de-meaning may well become
commonplace (Ferman and Pinto, 2016). Importantly, we expect that in a context
where data are de-meaned, the importance of ρ would rise, while the importance of
δ would fall. Further, the interpretation of “best-available” individual control would
move toward aligning with what one might think of as the best available for a DD
design—similar in co-movements, but not necessarily in levels.
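As a small illustration of the de-meaning point, the sketch below subtracts each unit's own pre-treatment mean. This is one simple reading of "de-meaning" adopted here as an assumption; it is not necessarily the exact transformation in Ferman and Pinto (2016), and the function name and numbers are made up.

```python
def demean_by_pre_period(series, t0):
    """Subtract the unit's own pre-treatment mean (first t0 periods)."""
    pre_mean = sum(series[:t0]) / t0
    return [y - pre_mean for y in series]


# Toy series (made up): the donor sits 10 units above the treated unit in
# levels but moves identically before treatment (T0 = 2).
treated = [10.0, 12.0, 11.0, 15.0]
donor = [20.0, 22.0, 21.0, 21.0]

treated_dm = demean_by_pre_period(treated, t0=2)
donor_dm = demean_by_pre_period(donor, t0=2)
```

After de-meaning, the two pre-treatment paths coincide even though the raw levels differ, so a donor that is a poor match in levels can become the best-available individual control in the difference-in-differences sense.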
Empirics
In this section, we produce three sets of results. First, we consider the
estimator’s behavior in three simulated environments—an environment where a
8 We have no strong prior on the relative importance one might imagine for ρ and δ.
subset of controls is contaminated by treatment, an environment where the donors
become increasingly dispersed, and an environment where donors are
unlikely to have common trends with the treated unit. The bias in the traditional
synthetic control is revealed in a lack of robustness across the allowable ranges of ρ
and δ.
Second, we reconsider the Abadie et al. (2010) (ADH) analysis of California’s
1988 anti-smoking campaign, retrieving estimates of the causal effect of the policy
intervention on per-capita smoking sales that substantiate the original result, and
with some added confidence (we argue). In short, we find even the point estimate
to be insensitive to our procedure: as we increase the importance of either ρ- or
δ-type dispersion in the determination of the synthetic control, there is very little
movement in the estimated treatment effect and, if anything, the point estimate
increases in magnitude on approach to the best-available individual control.
Third, we reconsider one of the results in Cunningham and Shah (2018),
a well-known and recent SCM paper in which the 2003 decriminalization of
prostitution led to a decrease in reported rape offences. Considering ρ- and δ-type
dispersions in this example reveals a sensitivity that traditional SCMs do not, with
point estimates that seemingly decrease in magnitude on approach to the
best-available individual control.
In all cases, we propose that in the plotting of the estimated treatment
effect and the MSE across permutations of ρ and δ, researchers can evaluate the
reasonableness and sensitivity of a synthetic control in their chosen environment.
Simulated environments
It is easy to introduce bias into SCMs that only consider pre-treatment MSE.
Here, we provide four cases that strike us as instructive—a contaminated-control
problem, two scenarios in which there are unobservable types within the potential-
donor pool, and one in which there are just a bunch of “bad controls” that would
each fail the “common trends” assumption of difference-in-differences designs.
In each, and for each parameter {ρ, δ} in 0(.01).99, we summarize the relevant
properties based on 1,000 simulated draws of the idiosyncratic components of the
associated data-generating processes.
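The parameter grid described above can be sketched as follows. We read 0(.01).99 as Stata-style numlist notation (start, step in parentheses, stop); that reading, the helper name numlist, and the three path names are assumptions made for illustration. The ρ = δ path is indexed by the sum ρ + δ, in line with the figure notes.

```python
def numlist(start, step, stop):
    """Expand start(step)stop, e.g. 0(.01).99, into a list of floats."""
    n_steps = int(round((stop - start) / step))
    # Rounding guards against floating-point drift such as 0.30000000000000004.
    return [round(start + i * step, 10) for i in range(n_steps + 1)]


grid = numlist(0.0, 0.01, 0.99)

# Three one-dimensional paths through the parameter space rho + delta in [0, 1).
rho_path = [(p, 0.0) for p in grid]      # vary rho, hold delta at 0
delta_path = [(0.0, p) for p in grid]    # vary delta, hold rho at 0
equal_path = [(round(p / 2, 10), round(p / 2, 10)) for p in grid]  # rho = delta
```

Each (ρ, δ) pair would then be paired with 1,000 simulated draws, as in the text.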
Contamination among potential donors
Data used in comparative case studies can often be subject to the criticism
that some of the potential control entities may also have experienced treatment,
which leads to bias in the estimated treatment effect. For the purpose of our
simulation we care less about why this may be—typically it is a geographic
neighbour, for example, that we anticipate experiencing some fallout from
treatment. For example, Oregon’s legalization of recreational marijuana introduced
variation in marijuana-related outcomes in Washington—retailers along the Oregon
border experienced a 41-percent decline in sales following Oregon’s market opening
(Hansen et al., 2017). We fully anticipate that institutional knowledge will continue
to be brought to bear on analysis, and that the elimination of controls thought to
be contaminated can proceed as per usual. However, where institutional knowledge
is expensive, or we wish to communicate beyond those who have this knowledge, or
where transparency will be appreciated, our procedure endogenously down-weights
contaminated controls to the extent that there are evident changes in how potential
donors contribute to the SC’s dispersion after treatment. In this simulated context,
we show how the introduction of a relative-dispersion penalty, in particular, can
endogenously down-weight the contaminated control and eliminate bias, if not
merely identify the sensitivity.
Here, we simulate a simple environment in which there is a true treatment effect
(which we set equal to one), but it spills into another untreated unit: a
“contaminated-control” problem. The simulated data consist of one treated unit and eight
potential controls, one of which experiences some fraction of the treatment
experienced by the true treated unit. We otherwise let outcomes be sensitive only
to a randomly determined intercept (from N(0,1)) and idiosyncratic shocks (from
N(0,0.1)). The outcomes are observed for 40 periods, with treatment occurring
at time 20, which introduces a level increase of 1 only for the treated unit, and
0.8 for the single “contaminated” control.9 (A single random draw from this data-
generating process is depicted in Figure 2.)
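A single draw from the data-generating process just described can be sketched as below. Two readings are assumptions rather than statements from the text: the second argument of N(0,0.1) is taken to be a variance, and periods are indexed 0 to 39 with treatment beginning at index 20. Function and argument names are hypothetical.

```python
import math
import random


def simulate_contamination(rng, n_donors=8, periods=40, first_treated=20,
                           effect=1.0, spill=0.8, shock_var=0.1):
    """Return (treated, donors); donors[0] is the contaminated control."""
    shock_sd = math.sqrt(shock_var)  # assumes N(0, 0.1) reports a variance

    def unit(level_shift):
        intercept = rng.gauss(0.0, 1.0)  # unit-specific level, N(0, 1)
        return [
            intercept
            + (level_shift if t >= first_treated else 0.0)
            + rng.gauss(0.0, shock_sd)
            for t in range(periods)
        ]

    treated = unit(effect)
    donors = [unit(spill if k == 0 else 0.0) for k in range(n_donors)]
    return treated, donors


def level_change(series, first_treated=20):
    """Post-treatment mean minus pre-treatment mean for one unit."""
    pre, post = series[:first_treated], series[first_treated:]
    return sum(post) / len(post) - sum(pre) / len(pre)


rng = random.Random(2019)  # seeded only so that the draw is reproducible
treated, donors = simulate_contamination(rng)
```

Setting shock_var to zero isolates the level shifts: 1 for the treated unit, 0.8 for the contaminated control, and 0 for the remaining seven donors.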
In Panel A of Figure 3, we plot the estimated treatment effect across
parameters, identifying the true treatment effect with a solid line, which the DWSC
approaches smoothly in our simulated environment. This data-generating process
also reveals a potential risk, evident in the declining treatment effect at extreme
parameter values. That is, the estimator can tend toward single entities at some
level, which we are inclined to think of as finding the single-best control, which
would be desirable. However, in a data-generating process such as this, note that
the best-available control could well be the contaminated control itself, which
emphasizes that caution still be brought to bear. With high ρ, for example, the
synthetic control here converges to the contaminated control 1/8th of the time,
9 Contaminations of larger (smaller) sizes are likewise eliminated, but at lower (higher) values
of ρ.
FIGURE 2.
Contamination among potential controls: A single draw of the treated unit and all
available donors
Notes: Data-generating process described in Section 2.3.
which is why the average estimated treatment drops again. (At ρ = δ = 0, the
contaminated control is given weight of .125. This is equivalent to the mean across
iterations as δ → 1, where 1/8th of the time the weight given to the contaminated
control approaches 1.) In Panel B of Figure 3, we evidence the tradeoff of higher
penalties on dispersion (i.e., higher pre-treatment MSE).
In Figure 4, we produce kernel densities of the treatment-effect estimates,
across select parameter values. In both panels—in Panel A we vary ρ, and in Panel
B we vary δ—the unpenalized SC draws from the set of potential donors similarly,
with the contaminated control receiving weight in the synthetic control with equal
probability. (Therein lies the problem, and the source of bias.) As was implied in
Figure 3, as we increase ρ, the contaminated control is down-weighted; when it is
eliminated fully, precision around the true treatment parameter arises. However,
when the contaminated control is not eliminated, it can end up representing a
FIGURE 3.
Contamination among potential controls: Estimated treatment and pre-treatment MSE
Panel A: Mean treatment effect (true = 1), across parameters
Panel B: Pre-treatment MSE (normalized to 1 at ρ = δ = 0)
Notes: In all reported results we plot the range of parameter spaces across
0(.1).99. Data-generating process described in Section 2.3.
larger share of the synthetic control, and thereby induce a bi-modality—this
second mode is precisely at the difference between the true treatment effect and
the outcome of the contaminated control (i.e., 1-.8=.2). This is more evident
in Panel B, across increases in δ. As the overall dispersion in this simulated
environment is driven largely by the level differences across donors, the exclusion
of the contaminated control need not contribute significantly to reducing overall
dispersion. On the other hand, the contamination does contribute substantially
FIGURE 4.
Contamination among potential controls: The effect of parameters on kernel densities
around mean estimated treatment effect (true = 1)
Panel A: ρ permutations Panel B: δ permutations
Notes: In all reported results we plot the range of parameter spaces across 0(.1).99.
Data-generating process described in Section 2.3.
to relative dispersion, which leaves ρ as the more-direct and more-effective margin
when there are contaminated controls among the potential donors.
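The arithmetic behind the second mode, and behind the equivalence of the mean estimate at ρ = δ = 0 and as δ → 1, can be checked with a short sketch. It assumes away idiosyncratic noise and assumes the synthetic control otherwise tracks the treated unit's untreated path, so that the only distortion is the share of the 0.8 spillover carried by the contaminated control. The function name is hypothetical.

```python
def expected_estimate(weight_on_contaminated, effect=1.0, spill=0.8):
    """Noise-free estimate when the contaminated control carries the given weight."""
    return effect - weight_on_contaminated * spill


# Unpenalized case: the contaminated control receives weight 1/8.
at_origin = expected_estimate(1 / 8)

# Full collapse onto the contaminated control: the second mode.
second_mode = expected_estimate(1.0)

# As delta -> 1: collapse onto the contaminated control 1/8th of the time,
# and onto an uncontaminated control otherwise.
mean_at_limit = (7 / 8) * expected_estimate(0.0) + (1 / 8) * expected_estimate(1.0)
```

Under these assumptions the estimate is roughly 0.9 both at the origin and, on average, in the limit, while the second mode sits at roughly 0.2, consistent with 1 − .8 = .2 above.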
Unobservable types, different in their trends
Here, we imagine an unobserved heterogeneity that one might anticipate
being challenging to a synthetic control environment—the treated unit being
one of two unobservable types, largely overlapping in the pre-treatment period.
In this case, we assume that the true treatment effect is zero. While all entities
in the data-generating process will follow similar paths—hence, the challenge—
we put one-third of potential donors on a path that slowly drifts away from the
other two-thirds of potential donors, as though there are unobservable components
contributing to their outcomes that do make them less and less comparable over
time. Though smooth, we purposefully configure the divergence so as to become
increasingly apparent, which highlights the potential for an MSE-driven synthetic
control to give weight to these entities when not considering their post-treatment
behavior.
Specifically, we posit 21 entities, each observed over 40 periods, with
“treatment” falling in the last half of observations. Of these, 14 entities are “A” types,
following $Y^A_{it} = 0.02t - 0.002t^2 + \epsilon_{it}$, and 7 entities are “B” types, following
$Y^B_{it} = 0.07t - 0.005t^2 + \epsilon_{it}$, where the $\epsilon_{it}$ are drawn from N(0,0.1). Including the
“treated” unit among the “A” types tips toward having plenty of donors similar
to the treated unit from which to construct a reasonable synthetic control. Likewise,
then, including the “treated” unit among the “B” types tips toward having fewer
donors similar to the treated unit. We will consider both scenarios, and in so doing find
that the unpenalized environment assigns weight to donors of the “wrong” type in
both scenarios, while penalizing dispersion down-weights “wrong” types—again,
this is true regardless of which type we assign to the treated unit. (As anticipated,
given the data-generating process in this environment, there is greater sensitivity
evident around permutations of δ, controlling overall dispersion, than to ρ.) This
scenario is depicted in Figure 5, as both the theoretical data-generating process and
single representative draws.
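The two type paths, and how far apart they sit before and after treatment, can be sketched as follows. Periods are assumed to be indexed 1 to 40 with the first 20 being pre-treatment, and N(0,0.1) is again assumed to report a variance; the function names are hypothetical.

```python
import math
import random


def path_a(t):
    """Noise-free path of an "A" type."""
    return 0.02 * t - 0.002 * t ** 2


def path_b(t):
    """Noise-free path of a "B" type."""
    return 0.07 * t - 0.005 * t ** 2


def simulate_types(rng, n_a=14, n_b=7, periods=40, shock_var=0.1):
    """Return (a_units, b_units), each a list of simulated outcome series."""
    shock_sd = math.sqrt(shock_var)  # assumes N(0, 0.1) reports a variance

    def draw(path):
        return [path(t) + rng.gauss(0.0, shock_sd) for t in range(1, periods + 1)]

    return [draw(path_a) for _ in range(n_a)], [draw(path_b) for _ in range(n_b)]


# Gap between the type means in each period, t = 1, ..., 40.
gaps = [path_b(t) - path_a(t) for t in range(1, 41)]
max_pre_gap = max(abs(g) for g in gaps[:20])    # periods 1-20
max_post_gap = max(abs(g) for g in gaps[20:])   # periods 21-40

a_units, b_units = simulate_types(random.Random(2019))
```

Under these assumptions the type means never differ by more than about 0.21 before treatment, which is small relative to the shocks, but differ by 2.8 in the final period. This is the sense in which the types overlap early and diverge late.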
In Figure 6, we plot the estimated treatment effects and pre-treatment MSE
across the parameter space, in each case identifying the true treatment effect
(zero) with a solid line. In Panel A, the treated unit is assigned to Type A. In
Panel B, the treated unit is assigned to Type B. In both, the traditional approach
to estimating synthetic controls is biased away from the true treatment effect,
as “other” types receive positive weight to the extent that they help minimize
pre-treatment MSE. However, in either case, permutations of our dispersion-
weighting parameters have the DWSC approach the true treatment effect smoothly.
Moreover, in parameters, the fraction of weight given to donors of the same type
increases—this is shown in the second row of plots, with convergence to A types
on the left and B types on the right. Where a traditional approach to deriving a
synthetic control struggles to identify type appropriately, regardless of the treated
unit’s type, our procedure has the synthetic control collapsing on the same—the set
of potential donors that we would anticipate being the appropriate counterfactuals,
a priori. Likewise, the estimated treatment converges to its true level as either
parameter increases. In the third row, we demonstrate the fundamental tradeoff
associated with greater synthetic-control stability: higher pre-treatment MSE.
In Figure 7, we produce kernel densities that speak to the distribution around
the mean treatment effect. In this setting it is informative, as it reveals a certain
tension in the procedure. First, these kernels make evident the bias in the traditional
approach (i.e., ρ = 0 and δ = 0, each in yellow). Second, with increases in either
dispersion penalty, the expected parameter clearly moves toward unbiasedness,
but there is also the initial suggestion of a bi-modality, as the underlying types
are sorted out. Third, as one might anticipate, the distribution of treatment
parameters is collapsing faster when we penalize overall dispersion (δ) than when
we penalize relative dispersion (ρ). Yet, also evident is the increase in variation
around the point estimate for extreme parameter values—this is also anticipated,
as it coincides with the synthetic control collapsing on the best-available control
as ρ → 1, or δ → 1. However, this eventual loss of precision is notably less
pronounced in extreme values of ρ than in extreme values of δ, consistent with
the synthetic control (and the estimated treatment effect itself) being less sensitive
to ρ permutations (see Figure 6). While less generalizable, note also that there are
FIGURE 5.
Unobserved types, different in trends: The treated unit and all available donors
Panel A: Theoretical data-generating processes
Panel B: A representative draw when treated unit is assigned to Type A
Panel C: A representative draw when treated unit is assigned to Type B
Notes: Data-generating processes described in Section 2.3.
FIGURE 6.
Unobserved types, different in trends: Estimated treatment and pre-treatment MSE
Panel A: Treated unit is Type A
Row 1: Mean treatment effect (true = 0)
Row 2: Fraction of weight on Type A donors
Row 3: Pre-treatment MSE (normalized to 1 at ρ = δ = 0)
Panel B: Treated unit is Type B
Row 1: Mean treatment effect (true = 0)
Row 2: Fraction of weight on Type B donors
Row 3: Pre-treatment MSE (normalized to 1 at ρ = δ = 0)
Notes: In all reported results we plot the range of parameter spaces across 0(.1).99.
Data-generating process described in Section 2.3.
more “A” types available in the donor pool than “B” types. As such, the synthetic
control is initially less sensitive to parameters when the treated unit is an A Type,
and collapses to the best-available control at higher parameter values—when the
treated unit is a B Type, the best-available is reached sooner.
FIGURE 7.
Unobserved type, different in trends: The effect of parameters on kernel densities around
mean estimated treatment effect (true = 0)
Panel A: Treated unit is Type A (ρ permutations; δ permutations)
Panel B: Treated unit is Type B (ρ permutations; δ permutations)
Notes: In all reported results we plot the range of parameter spaces across 0(.1).99.
Data-generating process described in Section 2.3.
Unobservable types, different in their idiosyncratic errors
We now imagine an unobserved heterogeneity in which δ may play the first-
order role. Specifically, we imagine the treated unit being one of two unobservable
types, where instead of those types diverging and creating bias proportional to their
weight in the synthetic control, we consider the estimator’s behavior in light of
having added noise to the system.
Specifically, we posit 21 entities, each observed over 40 periods, with
“treatment” (true=1) falling in the last half of observations. Of these, 10 entities are
“high-variance” types, following $Y^H_{it} = \epsilon^H_{it}$, where $\epsilon^H_{it}$ is drawn from N(0,4), and
10 entities are “low-variance” types, following $Y^L_{it} = \epsilon^L_{it}$, where $\epsilon^L_{it}$ is drawn from N(0,1).
Unlike the problem of Section 2.3, where our procedure had the synthetic control
collapse on the same type as the treated unit, here we will find the procedure
having the synthetic control converge to those donors with the smaller variance,
whether the treated unit is itself found among the low- or high-variance types.
(Representative draws from this data-generating process are depicted in Figure
8.)
In Figure 9, we plot the relevant properties of the estimator across 1,000
draws. In Panel A, we assign the treated unit to be among the low-variance types.
In Panel B we assign the treated unit to be among the high-variance types. Unlike
the environment above—recall that DWSC converged toward using donors of the
same type as the treated unit—in the first row we see that penalizing dispersion
has the estimator converge to using low-variance donors, regardless of the treated
unit’s type. In the second row, we plot the variance in estimated treatment,
also across parameters, demonstrating an increase in precision as the synthetic
control collapses on low-variance types. However, there is a distinct tradeoff in
this environment, that can ultimately show up as increasing variation in the
estimated treatment effect. Namely—in the third row we produce plots of the
number of donors receiving positive weight—the number of donors contributing
to the synthetic control is clearly decreasing in parameters. As such, the effect
FIGURE 8.
Unobserved types, different in idiosyncratic error: The treated unit and all available
donors
Panel A: A representative draw when the treated unit is a low-variance type (true = 1)
Panel B: A representative draw when the treated unit is a high-variance type (true = 1)
Notes: Data-generating processes described in Section 2.3.
on precision need not be beneficial. In this environment, the “number of donors”
ultimately overtakes the precision associated with the DWSC finding low-variance
FIGURE 9.
Unobserved types, different in idiosyncratic error: Precision-related properties
Panel A: Treated unit low-variance
Row 1: Fraction of weight on low-variance donors
Row 2: Variance in treatment effect (normalized to 1 at ρ = δ = 0)
Row 3: Number of weighted donors
Panel B: Treated unit high-variance
Row 1: Fraction of weight on low-variance donors
Row 2: Variance in treatment effect (normalized to 1 at ρ = δ = 0)
Row 3: Number of weighted donors
Notes: In all reported results we plot the range of parameter spaces across 0(.1).99.
Data-generating process described in Section 2.3.
donors—this tradeoff is more likely to bind the closer are the two types in their
variance properties.10
10 As the difference in variance across types increases, the benefits associated with “finding
low-types” decrease yet the costs associated with “number of donors” falling remain.
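The precision tradeoff in this environment, between favoring low-variance donors and retaining enough donors to average over, can be illustrated with the variance of a weighted average of independent donors. This sketch assumes donors are independent and that N(0,4) and N(0,1) report variances; the weight vectors are hypothetical and are not weights produced by the estimator.

```python
def synthetic_noise_variance(weights, variances):
    """Variance of a weighted average of independent donors: sum of w_i^2 * var_i."""
    return sum(w * w * v for w, v in zip(weights, variances))


# Ten high-variance donors followed by ten low-variance donors.
variances = [4.0] * 10 + [1.0] * 10

all_equal = [1 / 20] * 20                  # every donor weighted equally
low_only = [0.0] * 10 + [1 / 10] * 10      # only the low-variance donors
single_low = [0.0] * 19 + [1.0]            # collapsed onto one low-variance donor

var_all = synthetic_noise_variance(all_equal, variances)
var_low = synthetic_noise_variance(low_only, variances)
var_single = synthetic_noise_variance(single_low, variances)
```

Under these assumptions, dropping the high-variance donors lowers the variance of the synthetic control from 0.125 to 0.1, but collapsing further onto a single low-variance donor raises it to 1, which is the sense in which the number of donors can ultimately overtake the gain from finding low-variance types.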
“Bad” pre-trends
As one last simulation, we imagine the condition in which the potential
donors are trending differently and where SCM might be the go-to approach to
evaluating the effect of treatment. In some sense, we imagine hardwiring what some
have described as the typical picture of “bad” controls that SCM can potentially
work within. Within a simulated environment, of course, the mean behavior of
an SCM approach will not be biased—if potential donors are merely on different
linear trends, this is not a “bad controls” problem at all. As such, documenting the
variance properties of the estimator is our interest here. A representative draw
of this environment is depicted in Panel A of Figure 10, where we posit 6 potential
donors, each observed over 40 periods, with “treatment” (true=1) falling in the last
half of observations, where, otherwise, the treated unit follows $Y_t = \epsilon_t$, with $\epsilon_t \sim N(0,0.1)$.
Each donor follows $Y_{it} = \alpha_i + \beta_i t + \epsilon_{it}$, where $\alpha_i \sim N(0,1.5)$, $\beta_i \sim N(0,0.2)$, and
$\epsilon_{it} \sim N(0,0.1)$.
Given the linear path of each potential donor, there is a set of weights w such
that the linear combination of donors well-approximates the path of the treated
group for any given set of trending donors—given that they do not deviate from
their paths in our simulated environment (other than through an idiosyncratic
term), the counterfactual is also well-approximated in the post-treatment period
and the estimated treatment is unbiased (Panel B). As anticipated, our procedure
identifies treatment effect with increasing variance around the 1,000 point estimates
we simulate, which we show in Panel C.
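The unbiasedness argument above rests on a simple property of linear trends: a weighted average of lines is itself a line, so weights that match the treated unit's pre-treatment path also match its untreated post-treatment path. A noise-free sketch follows, with hypothetical donor intercepts, slopes, and weights chosen by hand rather than estimated.

```python
def synthetic_line(weights, intercepts, slopes):
    """Intercept and slope of the weighted average of linear donor paths."""
    intercept = sum(w * a for w, a in zip(weights, intercepts))
    slope = sum(w * b for w, b in zip(weights, slopes))
    return intercept, slope


# Hypothetical donors on different linear trends.
intercepts = [1.0, -1.0, 0.5]
slopes = [0.2, -0.2, 0.1]

# Non-negative weights summing to one that offset the first two donors.
weights = [0.5, 0.5, 0.0]

intercept, slope = synthetic_line(weights, intercepts, slopes)

# The untreated treated-unit path has mean zero in every period, so the
# noise-free gap is the synthetic line itself, over periods 1 to 40.
gaps = [intercept + slope * t for t in range(1, 41)]
```

Here the synthetic line has zero intercept and zero slope, so the gap is zero both before and after period 20; only the idiosyncratic terms, and hence the variance of the estimate, are left to respond to ρ and δ.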
Without different patterns of behavior among donors before and after
treatment, the standard SCM is not ill-equipped to retrieve the treatment effect
from such a data-generating process. That said, DWSC applied to such a data-
FIGURE 10.
DWSC with a “Bad controls” problem
Panel A: A representative draw (true = 10)
Panel B: Mean treatment effect (true = 10)
Panel C: Variance of treatment effect
(normalized to 1 at ρ = δ = 0)
Notes: Data-generating processes described in Section 2.3.
generating process will still afford a confidence, insofar as the estimated effect of
treatment is robust to ρ and δ permutations.
Cigarette sales and anti-smoking initiatives
Here, we apply our estimator to the data of Abadie et al. (2010), re-
considering the effect of California’s 1988 anti-smoking initiative known as
Proposition 99. In Figure 11, we offer 15 different plots—together they span the
ρ + δ parameter space, while in each we produce the time series of California
cigarette sales (solid black), cigarette sales in the Synthetic California (dashed
blue), and cigarette sales in each of the donors to that synthetic (solid blue). For
these individual donors, we plot them with levels of intensity that are proportional
to the weights (w) given to each state in the Synthetic California.
While the top row (i.e., ρ + δ = 0) is qualitatively similar to that of Abadie
et al. (2010), our weights do differ somewhat, which highlights the role played by
covariates in their analysis.11 In the second through fourth rows, we have allowed
for the endogenous down-weighting of donors to the synthetic control subject to
ρ and δ dispersions—we plot select examples for permutations of ρ (Panel A), of
ρ = δ (Panel B), and of δ alone (Panel C). Recall that it is donors with differences
in dispersion that are down-weighted as ρ increases and we move down Panel A, as
the estimator increasingly prefers donors that contribute similarly to SC-dispersion
on either side of treatment. As δ increases (moving down Panel C), the estimator
prefers donors who are, on average, proximate to the SC’s average each period. In
our experience, the limiting case of ρ + δ → 1 is helpful in identifying something of
a single best-available control. In the last row, we see that in two of the three columns
11 It is the addition of covariates (and necessarily, then, the dropping of several observations
of outcomes) that explains the exclusion of New Hampshire, for example. In all of our analysis
we exclude covariates but include all pre-treatment outcomes, eliminating researcher choice and
thereby increasing comparability across analyses.
FIGURE 11.
Per-capita cigarette sales (packs) in California, 1970 to 2000
Columns: (A) ρ permutations; (B) ρ = δ permutations; (C) δ permutations
Row 1: (A) ρ = 0 and δ = 0; (B) ρ = δ = 0; (C) ρ = 0 and δ = 0
Row 2: (A) ρ = .10 and δ = 0; (B) ρ = δ = .05; (C) ρ = 0 and δ = .10
Row 3: (A) ρ = .50 and δ = 0; (B) ρ = δ = .25; (C) ρ = 0 and δ = .50
Row 4: (A) ρ = .90 and δ = 0; (B) ρ = δ = .45; (C) ρ = 0 and δ = .90
Row 5: (A) ρ → 1 and δ = 0; (B) ρ = δ → 1; (C) ρ = 0 and δ → 1
Notes: For given parameters, we plot the surviving donors from the analysis described in
Section 2.3, with each line’s intensity proportional to the assigned weights.
the synthetic control has collapsed to the single best-available control from among
those in the set of potential donors.
While changes in the composition of the synthetic control are evident, the
overall picture in this application is one of stability in the combination of donor-
states, and even in their relative weights—they just don’t change very much. The
robust nature of the synthetic control in terms of its composition of states across
parameters is also made evident in Figure 12, where we offer a different view of the
donor weights resulting from our permutations.
In Panel A of Figure 13, we produce treatment effects across three separate
conditions—changing the importance of relative dispersion (ρ), overall dispersion
(δ) and increasing the dispersion penalties subject to an equal-weighted condition
(i.e., ρ = δ).12 In each of their origins (i.e., at ρ = δ = 0), we are approximating
the typical synthetic control, as we have highlighted above. (The point estimate
at our ρ = δ = 0 origin is -19.5, and very close to the point estimate of Abadie
et al. (2010), which is -20.) The noteworthy point, however, is the robustness of
the estimated treatment effect to our permutations, so much so that the three
plots are largely on top of one another, and not straying from the published result.
This panel speaks to a stability in the original inference of Abadie et al. (2010)—if
anything, we come away from this exercise imagining that “20-percent reductions
in cigarette sales” is a lower bound on the estimated treatment effect were one to
penalize within-SC changes in dispersion.
Possibly as important, though implicit in the stability of the estimated
treatment, is how little pre-treatment fit is given up in these parameter
permutations. The fundamental tradeoff as ρ or δ increase is to give up pre-
12 In all reported results we plot the range of parameter spaces across 0(.1).99.
FIGURE 12.
Donor-inclusion plots: Contributions to the dispersion-weighted synthetic controls around
California’s anti-smoking campaign
Panel A: ρ permutations
Panel B: δ permutations
Panel C: ρ = δ permutations
Notes: In all reported results we plot the range of parameter spaces across 0(.1).99. The
synthetic control in Abadie et al. (2010) is the weighted average of Utah (0.334), Nevada
(0.234), Montana (0.199), Colorado (0.164), and Connecticut (0.069).
FIGURE 13.
The robustness of the Abadie et al. (2010) “California result” to dispersion-weighted
synthetic control (DWSC)
Panel A: Mean treatment effect, across parameters a
Panel B: Pre-treatment MSE b
Notes: In each panel, we plot the estimated treatment effect across [0, 1), for ρ ∈ [0, 1) (while
setting δ = 0), δ ∈ [0, 1) (while setting ρ = 0), and ρ+ δ ∈ [0, 1) (while setting ρ = δ). In all
reported results we plot the range of parameter spaces across 0(.1).99. a Abadie et al. (2010)
reports that cigarette consumption was reduced by an average of almost 25.4 packs (per capita,
annually). b Abadie et al. (2010) reports a pre-treatment MSE of roughly 3.
treatment fit for greater SC-stability. In Panel B of Figure 13, we capture the
changes in MSE across the same space, and the flatness in Panel B is encouraging,
and consistent with the cost side of increasing the importance of either dispersion
penalty not being overly onerous in these data. To our eye, we would like to see
this sort of robustness, as it coincides with an appealing stability in the control
group to which California is ultimately compared.
While instructive, Figure 11 is exceedingly inefficient for presentation
purposes, as might be Figure 12. Thus, unless there are particular points to be
made, we propose that researchers communicate something akin to Figure 13.
Reported rape offences and the decriminalization of prostitution
The recent SCM paper of Cunningham and Shah (2018) is very well known,
and thus provides an appealing opportunity to consider the DWSC procedure
around the unexpected decriminalization of prostitution in 2003 on reported rape
offences. It is a nice example of the sort of policy variation that one with an eye for
clever sources of variation should jump on, and to which SCMs might naturally be
applied.13 Below, we follow our procedure in estimating the effect of Rhode Island’s
decriminalization on rape reports, with data from the Uniform Crime Reports,
Federal Bureau of Investigation.14
In Figure 14, we produce our collection of plots, across parameters, which
ultimately identifies New Hampshire as the single-best control. However, the
instability in building a synthetic control out of the available time series of donors
around this natural experiment is most evident when we plot the estimated
13 In short, the effect of decriminalization on reported rape offences is an empirical question, as
decriminalization yields safer work spaces to the extent firms are more willing to invest, but sex
workers could also be more willing to report to and cooperate with police upon assault.
14 Defining a synthetic control for Rhode Island (the treated state), among the conclusions
made in Cunningham and Shah (2018) is that reports of rape fall by roughly 32 percent with
the decriminalization of prostitution, from 34.1 per 100,000, to 20. As we proceed, we
knowingly make several departures from Cunningham and Shah (2018), even when conditioning
on ρ = δ = 0. For example, in the determination of weights, we do include all pre-treatment rape
reports between 1970 and 2002 (they match only on 1979, 1995, 2001, 2002, and 2003). We do
not include averages of any rape reports (they match on mean rape over 1992 through 1995, mean
rape over 2001 and 2002, and mean rape over 2002 and 2003). We model outcomes in levels (while
in their preferred specification, they model two-year averages “to minimize the volatility in the
series”). Our procedure applied to the smoothed data suggests that the deviation coincident with
Rhode Island’s decriminalization of prostitution is smaller than that in the level-data.
treatment and MSE across parameters, which we do in Figure 15. The sensitivity
of the SC’s composition as we allow for more weight on the stability properties of
the synthetic control is somewhat striking, as tipping points are clearly reached and
individual donor states are discarded from the set of donors. This was not at all a
property of the California smoking data, which was quite well behaved through all
of our perturbations. This context alone is important to drawing inference from the
particular case of ρ = 0 and δ = 0.
Again, there are only slight pre-treatment MSE costs suffered over large
regions of the parameter space. Yet, over the same space, treatment estimates
vary significantly. Unless there are very strong priors made on what contributes
to a good control, of the sort that would preclude consideration of the stability
properties as we have done, this is indicative of an environment that leaves the
estimated treatment itself the discriminating element across a very wide range of
potential SCs that are similar in their pre-treatment MSE.
While our analysis supports that, relative to a synthetic control, reported
rapes fall with decriminalization in Rhode Island, our range of estimates is
somewhat wide—declines of roughly 9 per 100,000 without any consideration
for dispersion, falling quickly as we increase the stability of the synthetic control
in either ρ or δ dimensions (to roughly -5), and, as dispersion is weighted more
heavily, estimates are as low as 1.6 fewer rape offences reported per 100,000. The
sensitivity of the composition of the synthetic control across parameters is also
made evident in Figure 16. While New Hampshire remains a significant contributor
throughout most of the parameter space (and to a lesser extent Iowa), the jumps in
estimated treatment in Figure 15 are clearly coinciding with the exit and entry of
other states from the set of weighted donors.
FIGURE 14.
Reported rape offences in Rhode Island, 1970 to 2009
Columns: (A) ρ permutations; (B) ρ = δ permutations; (C) δ permutations
Row 1: (A) ρ = 0 and δ = 0; (B) ρ = δ = 0; (C) ρ = 0 and δ = 0
Row 2: (A) ρ = .10 and δ = 0; (B) ρ = δ = .05; (C) ρ = 0 and δ = .10
Row 3: (A) ρ = .50 and δ = 0; (B) ρ = δ = .25; (C) ρ = 0 and δ = .50
Row 4: (A) ρ = .75 and δ = 0; (B) ρ = δ = .375; (C) ρ = 0 and δ = .75
Row 5: (A) ρ → 1 and δ = 0; (B) ρ = δ → 1; (C) ρ = 0 and δ → 1
Notes: Reported rape offences per 100,000. For given parameters, we plot the surviving donors
from the analysis described in Section 2.3, with each line’s intensity proportional to the
assigned weights.
FIGURE 15.
Reported rape offenses and Rhode Island’s decriminalization of prostitution, robustness to
dispersion-weighted synthetic control (DWSC)
Panel A: Mean treatment effect, across parameters
Panel B: Pre-treatment MSE
Notes: In each panel, we plot the estimated treatment effect across [0, 1), for ρ ∈ [0, 1) (while
setting δ = 0), δ ∈ [0, 1) (while setting ρ = 0), and ρ+ δ ∈ [0, 1) (while setting ρ = δ). In all
reported results we plot the range of parameter spaces across 0(.1).99.
FIGURE 16.
Donor-inclusion plots: Contributions to the dispersion-weighted synthetic controls around
Rhode Island’s decriminalization of prostitution
Panel A: ρ permutations
Panel B: δ permutations
Panel C: ρ = δ permutations
Notes: In all reported results we plot the range of parameter spaces across 0(.1).99. The
synthetic control in Cunningham and Shah (2018) is the weighted average of South Dakota
(0.356), Idaho (0.342), New Hampshire (0.162), and North Dakota (0.140).
Conclusion
Our intent is not to introduce one new synthetic control to fix all synthetic
controls—there is much work still being done in the area, with different emphases
and approaches (Athey et al., 2017; Xu, 2017) and with the researcher community
still refining our understanding of how individual approaches map across each other
(Doudchenko and Imbens, 2016; Rothstein et al., 2018). Rather, we propose a
procedure that reasonably nests a common approach to running synthetic controls,
while allowing for parameter choices that control the weight given to stability
among the weighted donors—a dispersion-weighted synthetic control (DWSC). We
ultimately propose that researchers produce figures that plot estimated treatment
effects and pre-treatment MSEs across the feasible ranges of these
stability-influencing parameters (something like Figure 13 or Figure 15), although
“donor-inclusion plots” (e.g., Figures 12 and 16) are also informative.15
While we assume, throughout, that institutional knowledge will remain
informative in assessing the potential credibility of individual donors, relying
on institutional knowledge (or ocular econometrics) to catch such events is not
efficiently repeatable, and it leaves researchers a dimension of discretion
that may itself be concerning (Ferman et al., 2018). To be clear, then, the value
in our procedure is not in either limit but rather in the behavior of the estimated
treatment effect across parameters, that is, its sensitivity to the ex ante importance
of SC-stability, to the researcher or as is suitable in the environment. If there is a
reasonable stability to the synthetic control across the parameter spaces we allow
for, then our confidence should increase. (We imagine confidence in the California
smoking result increasing here, for example.) Overall, if the figures we propose
reveal an instability in the synthetic control itself, or that point estimates change
even though pre-treatment MSE does not, or that estimated treatment is falling in
magnitude as the stability of the synthetic control increases, then
we imagine additional limits being put on resulting conclusions.
15 We also anticipate that inference will be performed through permutation tests, as in Abadie
et al. (2010) and much of the synthetic-control literature to date. We leave the particulars of that
area of investigation to future work.
(A Python script to implement the above procedure is available from the authors’
webpages.)
CHAPTER III
THE UNINTENDED CONSEQUENCES OF SUPPLY SIDE DRUG
INTERVENTION: EVIDENCE FROM DEA CHEMICAL CLASSIFICATION
Introduction
The negative social consequences of illicit-drug markets are well
documented and wide ranging, stemming from the behavior of both traffickers and
users of the substances. Illegal drug markets are often associated with all manner
of crime. For example, Evans et al. (2018) find that crack-cocaine markets are
associated with violence long after their inception, while Castillo et al. (2014) find
that cocaine scarcity leads to violence in areas of Mexico most associated with
cocaine trafficking. Drug users (and addicts especially) are often thought to commit
crimes at a higher rate than non-users, and are especially prone to committing
financially motivated crime. Nurco et al. (1991) observe that the use of heroin and
cocaine is strongly associated with criminality, with use of cocaine in particular
being substantially higher among prisoners, parolees, probationers, and arrestees
than in the general population.
In light of this, law enforcement agencies worldwide devote substantial
resources to disrupting the manufacture, transportation, and distribution of
illicit substances. These types of intervention are generally intended to decrease
drug use by reducing the availability of drugs and weakening the structure of
their markets. Reducing supply should raise the price of drugs and lower the
equilibrium quantity consumed. This philosophy underlies an
enormous portion of state attempts to combat the negative consequences of drug
use. Unfortunately, this approach to policy may not work as intended. Rather than
substituting away from drug use, users may select into criminal activity (or increase
the intensity of their current criminal activity) to pay the higher prices.
This drug price/crime effect is examined in Silverman and Spruill (1977),
where the authors find a positive elasticity of property crime with respect to heroin
prices in the city of Detroit. However, this analysis suffers from two problems. First,
the analysis is constrained to a single city, which may not represent behavior on
a larger scale. Second, as the price variation is not associated with supply side
movement, we cannot interpret this relationship causally. The positive correlation
of prices and crime may occur because increases in property crime yield higher drug
prices and not vice versa.1
Here, I analyze a national supply side shock to the cocaine market in the
United States stemming from the DEA placing regulations on the manufacture
and distribution of sodium permanganate, a chemical used in the production of
cocaine. Similar shocks on the methamphetamine market are examined in Dobkin
and Nicosia (2009) and Dobkin et al. (2014). While the authors find some short
term effectiveness of pseudoephedrine regulation in terms of increasing prices and
decreasing purity, they find that prices and purity rebound relatively quickly to
levels close to their pre-treatment values. In terms of impacts on criminal behavior,
they report mild increases in robbery, but do not generally observe across the board
increases in financially motivated crime.
In this analysis, I leverage variation in addition to the supply shock, in the
form of geographic differences in addiction rates. This is driven by the observation
that areas with higher levels of cocaine addiction should be more sensitive, in
expectation, to national drug prices. When accounting for this additional
variation, I find that the Drug Enforcement Administration’s (DEA) regulation of
sodium permanganate led to higher levels of property crime in areas with more
cocaine addiction, compared to areas with low addiction. This impact is on the
order of 6.8% to 9.1% of pretreatment property crime rates. No impact appears to
be present in violent crime, nor does it manifest when interacting the shock with
geographic variation in alcohol addiction, further suggesting that the systematic
variation observed in this analysis is particular to the cocaine drug channel.
1For example, increases in crime unrelated to drug prices could lead to higher income for drug
users, and thus higher drug demand.
In Section 3.2, I provide background on the policy variation in question, while
in Section 3.3, I describe the origin and structure of the data used for the analysis.
The approach to econometric identification is presented in Section 3.4. Empirical
results are presented in Section 3.5, and further discussed in Section 3.6. Section 3.7
concludes.
Background
In December of 2004, the DEA proposed the classification of sodium
permanganate as a List II chemical under the Controlled Substances Act of 1970.
Classification of this nature institutes “know your customer” responsibilities,
manufacturing inventory and use reports, 15-day advance DEA notice for imports
and exports, effective security controls, and required reporting for unusual sales
or losses.2 These measures are intended to avoid diversion of the chemical to the
illicit production of cocaine. A public comment period lasted until May 2005,
the decision was finalized in September 2006, and firms were expected to be in
compliance in December of 2006.
2Sometimes shortened as KYC, know your customer laws require that producers and
distributors of the chemical verify the identity of their customers and assess the likelihood
that the product is used for illegal purposes.
Motivating this proposal was sodium permanganate’s “direct substitutability
for potassium permanganate in the illicit production of cocaine,” as well as recent
cocaine related drug busts where the chemical was found.3 Cunningham et al.
(2015) evaluate four different cocaine precursor chemical regulations of this
nature, and conclude that they did, to various extents, disrupt the cocaine market.
The sodium permanganate regulation in particular was found to be especially
effective in both decreasing purity and increasing prices. Thus, I hope to exploit
the variation in price resulting from this intervention, along with geographically
distributed measures of cocaine addiction, to assess any evidence of the drug
price/crime effect.
The magnitude of this effect will depend heavily on the elasticity of drug
demand. Casual drug users may be price sensitive, though users with addiction
problems are likely much less so. Saffer and Chaloupka (1999) find evidence of
extremely price inelastic demand for cocaine, estimating it at -0.28. This stands
in contrast to measures of price elasticity of methamphetamine, which are found to
be substantially larger in magnitude at -1.766, even among methamphetamine dependent
users (Chalmers et al., 2009). Given the strong association between cocaine use and
criminal behavior, increases in drug price may merely place financial pressure on users
of the drug, and in doing so increase their probability of committing financially motivated
crime.
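To make the role of the demand elasticity concrete, the following sketch (not part of the original analysis) applies the two point estimates cited above to a hypothetical 10% price increase. It assumes the elasticity is constant over the price change and uses a simple linear approximation; the function name and the 10% figure are illustrative choices only.

```python
def spending_change(elasticity: float, price_change: float) -> float:
    """Approximate proportional change in total spending on the drug.

    Assumes quantity changes by elasticity * price_change (a linear
    approximation around the initial price), so spending changes by
    (1 + price_change) * (1 + elasticity * price_change) - 1.
    """
    quantity_change = elasticity * price_change
    return (1 + price_change) * (1 + quantity_change) - 1


# Elasticities cited in the text; the 10% price increase is hypothetical.
cocaine = spending_change(-0.28, 0.10)    # inelastic demand: spending rises
meth = spending_change(-1.766, 0.10)      # elastic demand: spending falls

print(f"cocaine: {cocaine:+.1%}, methamphetamine: {meth:+.1%}")
```

Under these assumptions, spending on cocaine rises by roughly 6.9% while spending on methamphetamine falls by roughly 9.4%, which is the sense in which inelastic demand translates a price increase into financial pressure on users.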
Cocaine use is often thought to be associated with persons of higher
socioeconomic status, who are much less likely to fund a substance habit with
property crime. This may, however, be a misconception. Table 1 displays data
from the 2006 National Survey on Drug Use and Health (NSDUH), a national survey
with a wide variety of questions on drug use. When surveyed
on cocaine use in the last month, unemployed persons above the age of 18 had
the highest proportion of cocaine use (3.4%) compared to those with part time
employment (1.3%) and those with full time employment (1.0%). For the same
question, those with less than a high school education had the highest proportion
of cocaine use in the last month as well (1.4%) compared to those with a high
school education (1.0%), those with some college (1.3%), and those with a college
degree (0.7%). Further, use of crack cocaine, often associated with property crime,
was still high with 0.3% of persons reporting use in the past month, and 2.1% of
unemployed respondents reporting use in the past month. Both the proportion
of users and their demographic characteristics lend credence to the idea that
disruptions in the cocaine market could plausibly impact crime rates.
3Potassium permanganate, a similar chemical, can also be used in the production of cocaine, and
underwent the same classification much earlier, in 1989.
Data
The data used in this analysis come from a variety of sources. Crime data
are pulled from the FBI’s Uniform Crime Reports (UCR), which collect known
index crimes from police agencies across the United States. They include measures
of overall property and violent crime, as well as the individual crime categories
that make up these aggregates. Because each individual policing agency can choose
when (and even if) to report, monthly data can be exceptionally noisy and prone
to measurement error. Crime measures are thus aggregated to the state-year level,
necessitating that this become the unit of observation. Crimes are measured in
rates, specifically the number of crimes per 100,000 population. The individual
categories of crime considered are larceny, burglary, motor vehicle theft, robbery,
homicide, aggravated assault, and rape.
TABLE 1.
Cocaine Usage in 2006 (Percentage of Respondents, NSDUH 2006)
                      Cocaine   Crack Cocaine
Unemployed            3.4       2.1
Part Time             1.3       0.3
Full Time             1.0       0.2
< High School         1.4       0.5
High School Degree    1.0       0.2
Some College          1.3       0.4
College Degree        0.7       0.2
Addiction measures are constructed using the Substance Abuse and
Mental Health Services Administration’s (SAMHSA) Treatment Episode Data Set
– Admissions (TEDS-A).4 This dataset catalogues every individual admission
into a substance abuse rehabilitation center in the United States.5 A variety of
information is collected on each individual, including geographic identifiers and
substances related to their admission.
Economic controls are pulled from the Federal Reserve Economic Data
(FRED) website, while population and demographic information come from the
Surveillance, Epidemiology and End Results Program (SEER). These controls
include the unemployment rate, minimum wage, real median household income,
homeownership rate, and ethnic background for each state. The dataset is limited
to the years 2000-2011 in order to minimize conflict with other similar chemical
regulations occurring in the mid 1990s.
4The exact measures are explained in Section 3.4.
5Missing data in crucial years from the District of Columbia and Arizona exclude these states
from this analysis.
TABLE 2.
Pre-Treatment Summary Statistics
                               Mean       Standard Deviation   Minimum    Maximum
Property Crime Rate            3499.738   858.7925             1767       5849.8
Larceny Theft Rate             2426.393   555.9958             1336       3977.2
Burglary Rate                  707.0099   231.0313             309.3      1241.6
Motor Vehicle Theft Rate       366.3357   187.2067             94.5       1116
Violent Crime Rate             405.9429   181.2505             78.2       828.1
Murder Rate                    4.680272   2.499123             .6         13.2
Aggravated Assault Rate        262.0337   132.4219             42.6       627
Rape Rate                      33.08776   9.226726             13.9       55.5
Robbery Rate                   106.1459   59.17499             6.8        257.2
Cocaine Addiction Proportion   .2636611   .1245882             .0517621   .5195239
Population                     5878864    6373103              493754     3.62e+07
Percent White Population       83.47112   12.56999             24.29768   97.87047
Percent Black Population       10.77166   9.625486             .4487852   36.99424
Unemployment Rate              4.84657    1.092486             2.3        8.141666
Real Median Household Income   58103.99   8686.396             40116      79735
State Minimum Wage             5.302517   .7503486             2.65       7.35
Home Ownership                 70.33571   5.098342             53.4       81.3
N                              294
Identification
In this analysis, I wish to identify the effect of a supply side shock in the
cocaine market on financially motivated crime through a measure of the intensity of
cocaine addiction. The measure of cocaine addiction used is of central importance
to the analysis. While one may be tempted to measure addiction using the rate
of cocaine related admissions per 100,000 population, I argue that this measure
in isolation is problematic. The overall number of admissions in a state can stem
from a variety of factors other than the number of addicted users. Specifically,
state funding for substance abuse treatment, cultural attitudes towards treatment,
and the propensity for the justice system to refer offenders to treatment can all
substantially impact the number of admissions. In order to avoid merely measuring
the impact of overall treatment admissions, I construct a cocaine admission
proportion measure, dividing the number of cocaine admissions by the number
of overall admissions.6 This then measures the extent of cocaine use among the
“addicted” population and assures that identification is not driven instead by
variation in overall admissions. If the number of substance abuse admissions is
not truly a measure of the extent of overall addiction in a state (and is instead a
function of the above-mentioned factors), the cocaine admission proportion is a
better measure of the extent of cocaine addiction in a location.
6While this is the addiction measure used in a majority of econometric specifications, I could
use the number of cocaine related admissions per 100,000 population and control for other,
non-cocaine related admissions. I find qualitatively similar results when estimating this model,
which I report in Table 3.
FIGURE 17.
Frequency histogram of cocaine admission proportion, the central measure of sensitivity
used in Table 4
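The cocaine admission proportion described above can be sketched as a simple ratio of counts by state. The records and field layout below are hypothetical and do not mirror the actual TEDS-A variables; the sketch also assumes, for simplicity, that each admission is tagged with a single substance.

```python
from collections import Counter

# Hypothetical admission records: (state, substance). The layout is
# illustrative only and is not the TEDS-A file format.
admissions = [
    ("OR", "cocaine"), ("OR", "alcohol"), ("OR", "alcohol"), ("OR", "heroin"),
    ("RI", "cocaine"), ("RI", "cocaine"), ("RI", "alcohol"), ("RI", "heroin"),
]


def cocaine_admission_proportion(records):
    """Return {state: cocaine admissions / all admissions}."""
    total = Counter(state for state, _ in records)
    cocaine = Counter(state for state, substance in records
                      if substance == "cocaine")
    return {state: cocaine[state] / total[state] for state in total}


proportions = cocaine_admission_proportion(admissions)
print(proportions)
```

Because the denominator is the state's own admission count, a state that simply funds more treatment (and so records more admissions of every kind) does not mechanically receive a higher value of the measure.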
Figures 17 and 18 show the distribution of both this measure and its
components, while Figure 19 represents these measures geographically along with a
representation of the pre-post treatment difference in property crime.7 A noticeable
relationship between cocaine admission proportion and property crime differences is
present.
7Admissions per 100,000 population and Cocaine Admissions per 100,000.
FIGURE 18.
Frequency histogram of total admissions per 100,000 population (left) and total cocaine
admissions per 100,000 population (right), the denominator and numerator (respectively)
of the sensitivity measure above.
TABLE 3.
Property Crime Rate (2005 Cocaine Admissions per 100,000 population)
Dependent variable in all columns: Property Crime Rate
                                          (1)        (2)        (3)        (4)
Cocaine Admissions × Post Treatment       0.767∗∗    0.755∗∗    0.821∗∗    0.724∗∗
                                          (0.349)    (0.283)    (0.327)    (0.282)
Non Cocaine Admissions × Post Treatment   -0.231     -0.310∗    -0.316     -0.307
                                          (0.197)    (0.174)    (0.199)    (0.187)
Effect Size (Pre-Treatment SD)            0.155      0.153      0.166      0.147
Impact Size (At Pre-Treatment Mean)       3.92%      3.86%      4.20%      3.70%
N                                         588        588        588        588
State & Year FE                           Y          Y          Y          Y
Controls                                  N          N          Y          Y
State Specific Trends                     N          Y          N          Y
Standard errors in parentheses
∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
FIGURE 19.
Geographic representations of treatment and outcome, treatment admissions per 100,000 population (upper left), cocaine 
related admissions per 100,000 (upper right), proportion of admissions cocaine related (lower left), and pre-post difference in 
property crime rates (lower right).
Figure 20 illustrates the benefits of this approach to identification. In the first
panel of Figure 20, I show the average property crime rate8 across all states in the
sample. No discernible change is apparent there. The second panel
splits states into terciles based on their cocaine admission proportion
(referred to as “Low addiction”, “Mid addiction”, and “High addiction”), and plots
the average behavior of their property crime rates over time. This panel suggests
that the evolution of property crime is different across states with different rates
of cocaine admission. Specifically, states with higher cocaine
addiction have higher property crime rates than those with lower cocaine addiction
in the post treatment period. The three groups seem to display relatively similar
pre-treatment behavior, not showing evidence of a pre-existing difference in their
property crime rates.
To identify this effect econometrically, I use cocaine admission proportions in
2005, and estimate the following difference-in-differences model:
\[
\begin{aligned}
\text{Property Crime Rate}_{it} ={}& \beta \cdot \text{Admission Proportion}_{i,2005} \times \text{Post Treatment}_{t} \\
&+ \gamma \cdot \text{Admission Proportion}_{i,2005} + \theta \cdot \text{Post Treatment}_{t}
 + \alpha \cdot X_{it} + \delta_{t} + F_{i} + \varepsilon_{it},
\end{aligned}
\tag{3.1}
\]
where $X_{it}$ is a vector of controls (as described in Section 3.3) with coefficient vector
$\alpha$, $\delta_t$ is a year fixed effect, $F_i$ is a state fixed effect, and $\varepsilon_{it}$
is the error term. The coefficient of interest is β, which measures
the impact that pre-treatment cocaine addiction proportion had on property crime
rates in the post treatment period. Interpretation of this coefficient requires scaling
by the specific state’s cocaine admission proportion. As such, any reported impact
or effect sizes will use the sample mean of this proportion.
8Property Crimes Per 100,000 population
FIGURE 20.
Time series of average property crime rates split across cocaine addiction terciles. National property crime rate average
(upper left), national property crime rate average split across addiction terciles (upper right), national property crime rate
average split across addiction terciles with pre-treatment mean subtracted (lower left), and national property crime rate
average split across addiction terciles with pre-treatment trend removed (lower right)
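As a rough illustration of how β in equation 3.1 can be recovered, the sketch below simulates a balanced state-year panel and estimates the interaction coefficient with a two-way within transformation, which in a balanced panel is equivalent to including state and year dummies. Everything here is an assumption made for illustration: the data are simulated, the true effect of 900 is arbitrary, the post period is taken to start in 2006, and controls, state-specific trends, and standard errors are omitted. It is not the code used for the reported estimates.

```python
import random

random.seed(0)

N_STATES, YEARS = 40, list(range(2000, 2012))
TRUE_BETA = 900.0  # hypothetical effect used only to generate the data

# Simulated panel: y = state effect + year effect + beta * prop_i * post_t + noise
props = [random.uniform(0.05, 0.50) for _ in range(N_STATES)]
state_fe = [random.gauss(3500, 800) for _ in range(N_STATES)]
year_fe = {t: random.gauss(0, 100) for t in YEARS}

x, y = [], []  # x holds the interaction term: proportion_i * post_t
for i in range(N_STATES):
    x.append([props[i] * (t >= 2006) for t in YEARS])
    y.append([state_fe[i] + year_fe[t] + TRUE_BETA * props[i] * (t >= 2006)
              + random.gauss(0, 10) for t in YEARS])


def two_way_demean(panel):
    """Subtract state means and year means, adding back the grand mean."""
    n, T = len(panel), len(panel[0])
    row = [sum(r) / T for r in panel]
    col = [sum(panel[i][t] for i in range(n)) / n for t in range(T)]
    grand = sum(row) / n
    return [[panel[i][t] - row[i] - col[t] + grand for t in range(T)]
            for i in range(n)]


# The proportion and post main effects are absorbed by the fixed effects,
# so only the interaction coefficient remains to be estimated.
xt, yt = two_way_demean(x), two_way_demean(y)
num = sum(a * b for rx, ry in zip(xt, yt) for a, b in zip(rx, ry))
den = sum(a * a for rx in xt for a in rx)
beta_hat = num / den
print(round(beta_hat, 1))
```

The estimate should land close to the simulated value of 900. One design point the sketch makes visible is that, once state and year fixed effects are included, the γ and θ terms in equation 3.1 are not separately identified.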
Results
Estimating this model (as shown in Table 4) shows substantial increases
in property crime in higher cocaine admission proportion states (relative to those
with low cocaine admission proportion). The effect is robust across multiple
specifications, including those with and without controls, and those with and
without state specific time trends. The impact size ranges from a 6.8% to a 9.1%
increase in property crimes per 100,000 per year, a considerable rise in crime.9 The
estimates remain effectively unchanged when weighting observations by population (Table 5).
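The scaling from coefficient to impact and effect size can be illustrated with the column (4) estimate from Table 4. The sketch below assumes the scaling is the coefficient times the mean proportion, divided by the mean (or standard deviation) of the property crime rate, and it uses the pooled pre-treatment moments from Table 2 as stand-ins; the exact moments used for the reported figures are not stated, so the results match them only approximately.

```python
# Column (4) of Table 4 and pooled pre-treatment moments from Table 2.
beta = 921.5                 # coefficient on proportion x post treatment
mean_proportion = 0.2636611  # mean cocaine admission proportion
mean_crime = 3499.738        # mean property crime rate per 100,000
sd_crime = 858.7925          # standard deviation of the property crime rate

extra_crimes = beta * mean_proportion     # additional crimes per 100,000
impact_size = extra_crimes / mean_crime   # share of the pre-treatment mean
effect_size = extra_crimes / sd_crime     # in pre-treatment standard deviations

print(f"{extra_crimes:.0f} per 100,000; impact {impact_size:.2%}; "
      f"effect {effect_size:.3f} SD")
```

With these inputs the impact is about 6.9% and the effect about 0.28 standard deviations, near the reported 6.86% and 0.271. The small gap presumably reflects slightly different sample moments in the original calculation, which is a guess rather than something the text confirms.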
Turning to the dynamics of treatment, Figure 21 splits treatment into an
event study.10 There is little to no impact of cocaine admission proportion in the
pre-treatment years with a marked rise immediately following treatment that
continues into 2008, and then levels off (yet stays large and persistent) for the
remainder of the sample.
Dividing property crime into its components (Table 6), I see statistically and
economically significant impacts on both larceny and burglary, with the impact
on motor vehicle theft being not statistically significant. When analyzing the
effects on violent crime (Table 7), I see no effects on the overall rate, murder rate,
or assault rate. However, I do see significant increases in the robbery rate, and significant
decreases in rape. The across the board increases in financially motivated crime
(barring motor vehicle theft, which is still estimated as a positive impact) are
consistent with a disruption in the cocaine market driving users to financially
motivated crime. The disruption does not seem to impact violent crime, save for
the decrease in rape. However, this could still be consistent with an addiction story.
Cunningham and Shah (2017) find that increases in prostitution stemming from
inadvertent legalization of indoor sex work led to decreases in rape. If some users
turn to prostitution rather than larceny, the decrease in rape could still be a result
of the supply-side intervention, operating through increased prostitution.
9Effects calculated at the pretreatment mean of both property crime and cocaine proportion.
10This specification includes state and year fixed effects, the full set of controls, and state
specific time trends.
FIGURE 21.
Event study, treatment defined as 2005 cocaine admission proportion, including state and
year fixed effects, state specific time trends, and controls. Estimated treatment
parameters are scaled into impact size for interpretability.
TABLE 4.
Property Crime Rate (2005 Cocaine Admission Proportion)
Dependent variable in all columns: Property Crime Rate
                                       (1)        (2)         (3)         (4)
Cocaine Proportion × Post Treatment    991.2∗∗    999.3∗∗∗    1222.2∗∗∗   921.5∗∗∗
                                       (430.0)    (317.4)     (415.3)     (338.5)
Effect Size (Pre-Treatment SD)         0.292      0.294       0.360       0.271
Impact Size (At Pre-Treatment Mean)    7.38%      7.44%       9.09%       6.86%
N                                      588        588         588         588
State & Year FE                        Y          Y           Y           Y
Controls                               N          N           Y           Y
State Specific Trends                  N          Y           N           Y
Standard errors in parentheses
∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
TABLE 5.
Property Crime Rate (2005 Cocaine Admission Proportion) (Population Weighted)
Dependent variable in all columns: Property Crime Rate
                                       (1)        (2)         (3)         (4)
Cocaine Proportion × Post Treatment    879.9∗∗    1056.4∗∗∗   972.2∗∗     967.0∗∗
                                       (429.3)    (336.6)     (425.0)     (363.6)
Effect Size (Pre-Treatment SD)         0.259      0.311       0.286       0.285
Impact Size (At Pre-Treatment Mean)    6.55%      7.86%       7.23%       7.1%
N                                      588        588         588         588
State & Year FE                        Y          Y           Y           Y
Controls                               N          N           Y           Y
State Specific Trends                  N          Y           N           Y
Standard errors in parentheses
∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Further Analysis
The cocaine addiction proportion in the above specifications is calculated
using only admissions from 2005, the last year before treatment. This is done
in order to gain an addiction measure as close in time to treatment as possible.
However, the estimated results could merely be an artifact of noisy or idiosyncratic
variation in that particular year of the TEDS-A dataset. In Figure 24 the main
regression is run with the addiction proportion calculated by successively adding
more and more pre-treatment years to the proportion measure. This should to
some extent smooth our measure of addiction. The estimated treatment effect
and its statistical significance change very little, if at all, offering reassurance
that results do not stem merely from the selection of the addiction rate in 2005
as treatment.
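One way to read the smoothing exercise above is that admission counts are pooled over a growing window of pre-treatment years before the proportion is taken. The sketch below follows that reading with hypothetical counts for a single state; averaging yearly proportions instead would be an equally plausible reading, and the text does not say which was used.

```python
# Hypothetical admission counts for one state: year -> (cocaine, total).
counts = {2002: (300, 1200), 2003: (280, 1100),
          2004: (260, 1000), 2005: (250, 1000)}


def pooled_proportion(counts, first_year, last_year=2005):
    """Cocaine share of all admissions pooled over first_year..last_year."""
    years = [t for t in counts if first_year <= t <= last_year]
    cocaine = sum(counts[t][0] for t in years)
    total = sum(counts[t][1] for t in years)
    return cocaine / total


# Start with 2005 only, then add earlier pre-treatment years one at a time.
measures = {start: pooled_proportion(counts, start)
            for start in (2005, 2004, 2003, 2002)}
print(measures)
```

Each value in `measures` would then replace the 2005 proportion in equation 3.1, and the model would be re-estimated once per window.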
The estimated result could also stem from a factor related to using any
addiction proportion in this model. Were I to construct a similar measure for a
different substance and see similar statistical results, this would cast serious doubt
on the mechanism driving the estimates being truly related to the cocaine market
intervention. To test this, I run a specification identical to equation 3.1, but with
the admission proportion calculated using alcohol related admissions.
Small and statistically insignificant effects are found across the same variety of
specifications and are presented in Table 8.
In Figure 19, the geographic variation in cocaine admission proportion is
shown alongside the geographic pre-post treatment change in property crime rates.
To better illustrate how this relationship translates into an estimated treatment
effect, these two quantities are included in a scatter plot in Figure 22. In order
to ensure that the estimated effect is not based entirely on the comparison of
extreme values, I gradually remove the bottom of the sample (states with low cocaine
admission proportion), and re-estimate equation 3.1. This removes
observations and lowers the precision of the estimator, but the behavior of the
estimator across these different subsets should give insight into the estimator’s
reliance on low values of admission proportion. While the statistical significance
of the estimator does not withstand the entire procedure, it stays very similar in
sign and magnitude. Estimates and confidence intervals are shown in Figure 23.
TABLE 6.
Property Crime Rate (2005 Cocaine Admission Proportion)
                                       (1)              (2)             (3)         (4)
                                       Property Crime   Larceny Theft   Burglary    Motor Vehicle
                                       Rate             Rate            Rate        Theft Rate
Cocaine Proportion × Post Treatment    921.5∗∗∗         628.7∗∗∗        161.2∗∗     128.8
                                       (338.5)          (214.7)         (67.49)     (78.71)
Effect Size (Pre-Treatment SD)         0.271            0.293           0.1866      0.141
Impact Size (At Pre-Treatment Mean)    6.86%            6.77%           6.01%       8.74%
N                                      588              588             588         588
Standard errors in parentheses
∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
All specifications include controls, state and year fixed effects, and state-specific time trends
TABLE 7.
Violent Crime Rates (2005 Cocaine Admission Proportion)
                                       (1)             (2)       (3)       (4)        (5)
                                       Violent Crime   Murder    Assault   Rape       Robbery
                                       Rate            Rate      Rate      Rate       Rate
Cocaine Proportion × Post Treatment    31.37           0.649     2.869     -14.63∗∗   42.49∗∗
                                       (49.91)         (0.933)   (34.55)   (6.606)    (19.65)
Effect Size (Pre-Treatment SD)         0.034           .031      0.005     -0.336     0.119
Impact Size (At Pre-Treatment Mean)    1.89%           3.17%     0.27%     -11.28%    9.65%
N                                      588             588       588       588        588
Standard errors in parentheses
∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
All specifications include controls, state and year fixed effects, and state-specific time trends
FIGURE 22.
Pre-post property crime differences plotted against 2005 proportion of admissions cocaine
related.
FIGURE 23.
Gradual Removal of Low Addiction States, coefficient estimate vs. minimum cocaine
admission proportion, with number of observations represented in the bar graph
FIGURE 24.
Treatment effect estimates from specification calculating treatment with successively more
pretreatment years.
TABLE 8.
Property Crime Rate (2005 Alcohol Admission Proportion)
Dependent variable in all columns: Property Crime Rate
                                       (1)        (2)        (3)        (4)
Alcohol Proportion × Post Treatment    291.6      -122.5     106.7      -120.4
                                       (529.9)    (343.4)    (566.9)    (385.8)
Effect Size (Pre-Treatment SD)         0.203      -0.085     0.074      -0.084
Impact Size (At Pre-Treatment Mean)    5.13%      -2.15%     1.87%      -2.21%
N                                      588        588        588        588
State & Year FE                        Y          Y          Y          Y
Controls                               N          N          Y          Y
State Specific Trends                  N          Y          N          Y
Standard errors in parentheses
∗ p < 0.10, ∗∗ p < 0.05, ∗∗∗ p < 0.01
Conclusion
The evidence presented here raises significant concerns about the
continued use of supply-side drug policy. The estimated parameters imply that this
intervention led to roughly 699,913 more larceny cases per year in the United
States.11 Using the most-conservative estimates of the cost of crime collected in
McCollister et al. (2010), I find the yearly cost in larceny alone to be 563 million
dollars.
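As a quick consistency check on the figures just quoted, the arithmetic below backs out the per-case cost and the share of baseline larcenies that they imply. The per-case cost is derived from the two totals in the text and is not quoted from McCollister et al. (2010); the 2005 larceny count is the one given in footnote 11.

```python
extra_larcenies = 699_913    # additional larceny cases per year (from the text)
yearly_cost = 563_000_000    # dollars per year (from the text)
larcenies_2005 = 6_780_000   # UCR larceny cases in 2005 (footnote 11)

implied_cost_per_case = yearly_cost / extra_larcenies
share_of_2005_larcenies = extra_larcenies / larcenies_2005

print(f"${implied_cost_per_case:,.0f} per case; "
      f"{share_of_2005_larcenies:.1%} of 2005 larcenies")
```

The totals imply a cost of roughly $804 per larceny and an increase equal to about 10.3% of the 2005 larceny count.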
The changes in property crime are large, and indicate that policies of this
nature impose serious indirect costs on society. The direct costs of the policy,
by every indication, seemed relatively minor, as DEA reports indicated that
manufacturers and distributors offered little to no resistance to the change.
However, the costs associated with the drug price/crime effect are tied directly
to the effectiveness of the policy itself, regardless of the manner in which it
is implemented. This implies that significant costs of this type of policy are
unavoidable.
Mitigation may be possible, through either law enforcement activity and
vigilance in areas most likely to be affected, or increased support and treatment
for addicts leading up to, or in the wake of, a serious supply side intervention.
Generally, many of the negative consequences of illegal drug use could be avoided
by decreasing the number of addicted users, which would also diminish the negative
effects of existing supply side policy. As it stands, placing pricing pressure alone
does not seem to achieve this goal, and merely perpetuates the criminality of these
individuals, with major costs spilling over to the rest of society. Moving forward, a
re-emphasis on, and re-allocation of resources to, demand side drug policy seems a
more effective approach.
11There were roughly 6.78 million larceny cases reported in the UCR in 2005, the last year
before the treatment.
CHAPTER IV
BIAS REDUCTION THROUGH VARIABLE SELECTION
I would like to acknowledge Glen Waddell, who contributed to the
computational simulations and figure making, as well as the writing and
presentation of the paper. I developed the procedure explained in the paper,
contributed to the computational simulation and figures, as well as the writing and
presentation of the paper.
Introduction
In the policy evaluation literature, omitted-variable bias is classically
described as the omission of some x_i that correlates both with treatment
assignment (D_i = 1) and with y_i, such that its omission from a model of y_i leads to
the effect of x_i loading on to treatment assignment. Moreover, we fail to retrieve an
unbiased estimate of treatment to the extent corr(x_i, ε_i) ≠ 0 and corr(x_i, D_i) ≠ 0.
Variables that meet these conditions are generally referred to as confounders.
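A small simulation can make the confounding mechanism above concrete. The sketch below is an illustration under assumed values (a true treatment effect of 1, a confounder loading of 2, and treatment assigned more often when the confounder is high); none of these numbers come from the chapter. It compares the regression of the outcome on treatment alone with the regression that also includes the confounder.

```python
import random

random.seed(1)
TAU, GAMMA, N = 1.0, 2.0, 20_000  # hypothetical effect, confounder loading, sample size

x = [random.gauss(0, 1) for _ in range(N)]
# Treatment is more likely when x is high, so corr(x, D) != 0.
d = [1.0 if xi + random.gauss(0, 1) > 0 else 0.0 for xi in x]
# x also drives the outcome, so omitting it pushes x into the error term.
y = [TAU * di + GAMMA * xi + random.gauss(0, 1) for di, xi in zip(d, x)]


def cov(a, b):
    """Population-style covariance of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


# Short regression: y on D only (x omitted).
tau_short = cov(d, y) / cov(d, d)

# Long regression: y on D and x, solved from the 2x2 normal equations.
det = cov(d, d) * cov(x, x) - cov(d, x) ** 2
tau_long = (cov(d, y) * cov(x, x) - cov(x, y) * cov(d, x)) / det

# Textbook omitted-variable-bias term: GAMMA * cov(x, D) / var(D).
predicted_bias = GAMMA * cov(x, d) / cov(d, d)
print(round(tau_short, 2), round(tau_long, 2), round(predicted_bias, 2))
```

In this setup the short regression overstates the effect by roughly the textbook bias term, while the long regression recovers a value close to the assumed effect of 1.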
It would suffice, of course, for corr(x_i, ε_i) or for corr(x_i, D_i) to be zero; in
theory we well understand that our omitted-variable bias (OVB) suspicions subside
with either. Yet, we note the heavy weighting in practice toward establishing only
that corr(x_i, D_i) = 0 before proceeding to consider causal interpretations of β̂.
Researchers report “balance tests,” for example, as demonstrations of there being
no significant differences in observable covariates across treatment and control
groups. To establish that such balance exists (fingers crossed) is typically the
entry fee paid by all empirical micro-economists. If race and gender do differ in
levels between the treated and control group, for example, and maybe education
a little bit, we include those factors in what follows. If they don’t, we often throw
them in anyway, with a nod to prior literature, or to theory, or to something. As a
notion, though, we are hopeful that not many observable factors differ, as it breeds
suspicion that unobservables may also differ systematically between treatment and
control, a problem from which there is little comeback. Random assignment of
treatment assuages this worry, of course. However, even then we practice similar
traditions.1
We suggest that a second, complementary practice be adopted as part
of this tradition. Clearly, we should be convinced of the causal interpretation
available whether it is through convincing ourselves that corr(x_i, D_i) = 0 or that
corr(x_i, ε_i) = 0. With the support of that logic, then, we recommend the use of
simple and approachable variable-selection techniques, focusing on the explanatory
power of potential covariates on the outcome in question. If certain covariates (or
functions of covariates) are shown to be predictive of y_i, they should be included
in estimation; in expectation, this can increase the precision with which we
estimate the effect of treatment, for example. To the contrary, if one focuses solely
on the correlation of treatment with covariates (as one tends to in balance-test
justifications for covariate selection), including a variable for which corr(x_i, D_i) ≠ 0
but corr(x_i, ε_i) = 0 will needlessly decrease the precision with which we estimate
the effect of treatment. Regardless, the somewhat haphazard approach to covariate
selection—including a set of covariates whether or not they are balanced across
treatment and control—can easily be improved upon.
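The precision argument can be illustrated with a small simulation. This is a minimal sketch, not the procedure used in this chapter: the variable names, the coefficient 0.9, the seed, and the standard-normal errors are all assumptions made for illustration. Here `x` predicts the outcome but is unrelated to treatment, while `z` is related to treatment but plays no role in the outcome.

```python
import numpy as np

def ols_se_of_treatment(y, D, controls):
    """OLS of y on [1, D, controls]; return (tau_hat, conventional SE of tau_hat)."""
    X = np.column_stack([np.ones_like(y), D] + list(controls))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # homoskedastic variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

rng = np.random.default_rng(0)        # hypothetical seed
n, tau = 2000, 1.0
x = rng.normal(size=n)                # predicts y, unrelated to D
z = rng.normal(size=n)                # related to D, absent from y
D = (0.9 * z + rng.normal(size=n) > 0).astype(float)
y = 1.0 + tau * D + x + rng.normal(size=n)

_, se_none = ols_se_of_treatment(y, D, [])
_, se_with_x = ols_se_of_treatment(y, D, [x])
_, se_with_z = ols_se_of_treatment(y, D, [z])
print(f"no controls: {se_none:.4f}  with x: {se_with_x:.4f}  with z: {se_with_z:.4f}")
```

Under these assumptions, adding `x` shrinks the standard error on treatment because it absorbs outcome variation, whereas adding `z` inflates it because `z` is collinear with treatment and explains nothing in `y`.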
In the end, with the assistance of some machinery, we will demonstrate that
even biases that originate in the unobservable component of the data-generating
process are at times partially correctable, making this approach to model selection
informative as a sensitivity exercise, if not profoundly powerful as a diagnostic
exercise.
1 For more on balance tests (largely regarding their inappropriate use, frankly) see Mutz
et al. (2019); Begg (1990); Senn (1994).
In Section 4.2 we consider the methods and the computational apparatus we
have in mind. In Section 4.3 we consider several different simulated environments,
with known properties that allow us to walk through the technique in ways that
applied researchers might be inclined to think of these sorts of problems. We then
offer brief concluding thoughts.
Model selection
There are many approaches to model selection, ranging from likelihood-based
selection criteria like the Bayesian Information Criterion (BIC) (Schwarz, 1978)
and the Akaike Information Criterion (AIC) (Akaike, 1998) to
penalized-regression methods like Lasso (Tibshirani, 1996; Shortreed and Ertefaie,
2017) and Ridge regressions. Due to its widespread use, we focus on the Bayesian
Information Criterion.2
The BIC is increasing in the number of variables included in the model, k,
and is decreasing in fit, and takes the form
$$\mathrm{BIC} = n \cdot \ln\left(\frac{\mathrm{RSS}}{n}\right) + k \cdot \ln(n),$$
where n is the sample size and RSS is the residual sum of squares; lower values
of BIC are preferred. In general, such a rule is an actionable procedure by which a
variable must justify its inclusion in the model by adding enough fit.3
2 Ertefaie et al. (2018) demonstrate a technique that considers outcome and treatment
assignment simultaneously.
In our simulated environments below, we will be running many iterations,
recording how often covariates and their interactions are chosen for inclusion.
Of course, running the procedure in practice produces a list of covariates and
interactions that are to be included in the predictive model—a binary indication—
having evaluated all possible models and arriving at the one producing the lowest
BIC.4
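The exhaustive search described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the code behind the figures: the function names `bic` and `best_subset_by_bic`, the three-covariate example, the seed, and the standard-normal errors are hypothetical. The intercept and the treatment indicator are always kept, which adds the same constant to every model's penalty and so leaves the ranking unchanged.

```python
from itertools import combinations
import numpy as np

def bic(y, X):
    """BIC = n*ln(RSS/n) + k*ln(n), with k the number of columns in X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def best_subset_by_bic(y, forced, candidates):
    """Evaluate every subset of `candidates` (dict: name -> column), always
    keeping the columns in `forced`; return (lowest BIC, chosen names)."""
    names = list(candidates)
    best_score, best_names = np.inf, ()
    for r in range(len(names) + 1):
        for subset in combinations(names, r):
            X = np.column_stack(forced + [candidates[c] for c in subset])
            score = bic(y, X)
            if score < best_score:
                best_score, best_names = score, subset
    return best_score, best_names

# Illustrative data: three covariates and one relevant interaction (x2*x3).
rng = np.random.default_rng(1)
n = 500
x1, x2, x3 = rng.normal(size=(3, n))
D = (rng.permutation(n) < n // 2).astype(float)   # half the sample treated
y = 1 + D + x1 + x2 + x3 + x2 * x3 + rng.normal(size=n)

candidates = {
    "x1": x1, "x2": x2, "x3": x3,
    "x1*x2": x1 * x2, "x1*x3": x1 * x3, "x2*x3": x2 * x3,
    "x1^2": x1 ** 2, "x2^2": x2 ** 2, "x3^2": x3 ** 2,
}
score, chosen = best_subset_by_bic(y, [np.ones(n), D], candidates)
print(chosen)
```

With nine candidates the search covers 2^9 = 512 models, which is quick. The count doubles with each added candidate, which is why a stepwise search becomes attractive for larger candidate sets.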
Simulations
Below, we will consider environments in which we have engineered omitted-
variable biases that originate in the observable domain—omitting a relevant
interaction of covariates, omitting a higher-order polynomial, and then a
combination of these two. We then consider bias originating in the unobservable
domain, where we again reflect on the ability to control meaningfully for variation
in outcomes that better enables the identification of the treatment parameter,
despite originating in the unobservable component.
3 While this is generally perceived as a need for a variable to be sufficiently predictive of
y, many criteria of this form exist; AIC and adjusted-R2 are other variants, but differ in the
magnitude of their penalty for variable inclusion. BIC has the largest penalty of the three, and is
thereby set up to choose the most parsimonious model.
4 While our simulated environments do not suffer from this problem, we imagine that in some
data-generating processes the set of all feasible models will be too large to be computationally
feasible. In such instances, we recommend forward or backward selection procedures, stepwise
processes that consider only a subset of potential models.
In each of the simulated environments we consider here, one can imagine the
following operating as somewhat of a baseline data-generating process (DGP). In
each setting, we run 2,500 iterations, drawing samples of 2,000 units each time. In
each of those 2,500 iterations, we assign treatment randomly to half the sample,
and randomly draw covariates $x_k$ from joint normals with means $\mu_k$, variances
$\sigma_k^2$, and known correlations. For example, where true treatment is captured in $\tau$, the
DGP may be
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon,$$
having chosen parameters $\beta_k$, $\mu_k$, $\sigma_k^2$ and a variance-covariance matrix. That
variance-covariance matrix will be key, and with each scenario we will be explicit
about our definition of those relationships. Importantly, though, in every simulated
environment we produce, all co-variates pass a traditional balance test: the $x_k$ are
not only equal in mean across treatment and control, but are in fact equivalently
distributed. For example, in Figure 25 we reproduce the full distributions of each
covariate in a baseline DGP, having set $\beta_k = 1$, $\mu_k = 0$, $\sigma_k^2 = 1$, and
$\mathrm{cov}(x_j, x_k) = 0 \; \forall \, j \neq k$. As prescribed, treatment is random with respect to covariates.
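The baseline DGP can be sketched in a few lines. This is a minimal illustration, not the code used for Figure 25: the function names, the seed, the standard-normal error term (the text does not state the error distribution), and the use of 200 rather than 2,500 iterations are assumptions made to keep the example short.

```python
import numpy as np

def draw_baseline(rng, n=2000, tau=1.0):
    """One sample from the baseline DGP: beta_k = 1, mu_k = 0, sigma_k^2 = 1,
    zero covariances, and treatment assigned at random to exactly half the units."""
    D = (rng.permutation(n) < n // 2).astype(float)
    X = rng.multivariate_normal(np.zeros(4), np.eye(4), size=n)
    y = 1.0 + tau * D + X.sum(axis=1) + rng.normal(size=n)   # assumed N(0, 1) error
    return y, D, X

def naive_tau(y, D, X):
    """Coefficient on treatment from OLS of y on [1, D, x1..x4]."""
    Z = np.column_stack([np.ones(len(y)), D, X])
    return np.linalg.lstsq(Z, y, rcond=None)[0][1]

rng = np.random.default_rng(2019)     # hypothetical seed
tau_hats = np.array([naive_tau(*draw_baseline(rng)) for _ in range(200)])
print(f"mean of tau_hat: {tau_hats.mean():.3f}, sd: {tau_hats.std():.3f}")
```

With nothing omitted and covariates independent of treatment, the estimates should be centered on the true τ = 1; the later scenarios modify this baseline one feature at a time.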
OVB originating in non-linear transformations of observables
Omitted interactions
In Figure 26, we consider a case of bias entering through an omitted
interaction (we’ll use $x_2$ and $x_3$ for this) that correlates differently among treated
units. In each iteration of this setting we assign treatment randomly to half the
sample, and randomly draw covariates $x_1$ through $x_4$ from joint normals with
means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = 1$, and correlations
as illustrated in Panel A. In essence, we keep things clean other than to introduce
some correlation among treated units in the interaction of $x_2$ and $x_3$, through which
the bias enters. The DGP is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \beta_{23} x_2 x_3 + \varepsilon,$$
where $\tau = 1$, $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = 1$, and $\beta_{23} = 1$. With all co-variates passing
a balance test, the naive model we imagine running is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
In Panel B of Figure 26, we summarize 2,500 iterations of this DGP—in
particular, the frequencies with which variable selection has chosen each level
and potential interaction to be included in fitting a model of y. For ease, we’ve
highlighted the “offending” interaction that the BIC procedure has (rightly) chosen
to include in the model of y, 2,500 of 2,500 times—here, that variable is x2x3.
Notably, no other interaction has been included more often than 19 times.
In Panel C, we produce two kernel densities, one capturing the collection of
τ̂ identified in iterations of the naive model, and the other capturing the collection
of τ̂ identified in iterations of the model that includes the variables selected by
BIC. As we’ve suggested, in every case, this selection includes relaxing the implicit
restriction that $\beta_{23} = 0$, estimating a parameter on $x_2 x_3$. In the small number
of iterations (25) where variable selection has included other interactions, we do
include them when they are chosen. Given the DGP we’ve engineered, the naive
model retrieves, as it should, an estimated treatment parameter that is biased
upward. However, following the prescriptions of the variable-selection model leads
to τ̂ that are centered around the true value (τ = 1). The parameter is also
estimated more precisely.
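The mechanism can be reproduced in a short simulation. This is a sketch under assumptions, not the code behind Figure 26: the treated-group correlation `rho = 0.5` between $x_2$ and $x_3$ is a hypothetical value (the text reports the correlations only graphically in Panel A), the error is assumed standard normal, and 200 iterations stand in for 2,500. Because $x_2 x_3$ has mean `rho` among treated units and mean zero among controls, the naive estimate should center near τ + β23·rho.

```python
import numpy as np

def draw_interaction_dgp(rng, n=2000, rho=0.5, tau=1.0, beta23=1.0):
    """Treated units have corr(x2, x3) = rho (assumed value); controls have none."""
    half = n // 2
    D = np.r_[np.ones(half), np.zeros(n - half)]
    cov_treated = np.eye(4)
    cov_treated[1, 2] = cov_treated[2, 1] = rho
    X = np.vstack([
        rng.multivariate_normal(np.zeros(4), cov_treated, size=half),
        rng.multivariate_normal(np.zeros(4), np.eye(4), size=n - half),
    ])
    y = (1.0 + tau * D + X.sum(axis=1)
         + beta23 * X[:, 1] * X[:, 2] + rng.normal(size=n))
    return y, D, X

def tau_hat(y, D, cols):
    """Coefficient on treatment from OLS of y on [1, D, cols...]."""
    Z = np.column_stack([np.ones(len(y)), D] + cols)
    return np.linalg.lstsq(Z, y, rcond=None)[0][1]

rng = np.random.default_rng(26)       # hypothetical seed
naive, augmented = [], []
for _ in range(200):
    y, D, X = draw_interaction_dgp(rng)
    naive.append(tau_hat(y, D, [X]))                          # levels only
    augmented.append(tau_hat(y, D, [X, X[:, 1] * X[:, 2]]))   # adds x2*x3
print(f"naive: {np.mean(naive):.3f}  with x2*x3: {np.mean(augmented):.3f}")
```

Every marginal distribution is standard normal in both groups, so a covariate-by-covariate balance test would not flag the problem; only the joint behavior of $x_2$ and $x_3$ differs.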
FIGURE 25.
Distributional co-variate balance tests: Baseline data-generating process
In each iteration we assign treatment randomly to half the sample, and randomly draw covariates $x_1$ through $x_4$ from joint
normals with means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = 1$, and
$\mathrm{cov}(x_j, x_k) = 0 \; \forall \, j \neq k$. The DGP is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon,$$
where $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = 1$.
Omitted higher-order polynomials
In Figure 27, we consider a variant of omitted interaction: the bias entering
through an omitted non-linearity ($x_1^2$) that varies differently among treated units.
In each iteration of this setting the DGP is given by
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \beta_{11} x_1^2 + \varepsilon,$$
where $\beta_{11} = 0.1$, and all other parameters are unchanged. Again all co-variates pass
a balance test, so the naive model we imagine estimating is
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
In Panel B of Figure 27, we summarize 2,500 iterations of the DGP and
model, and the associated frequencies with which variable selection has chosen
each level and potential interaction to be included in fitting a model of y. We’ve
again highlighted the “offending” omission, which the BIC procedure has chosen to
include in the model 2,500 of 2,500 times. (Here, no other interaction has been
included more often than 25 times.) In Panel C we produce kernel densities for the
naive and for the “preferred” model identified through variable selection; in most
instances, this simply adds $x_1^2$, but in some instances it also adds other
interactions. Where the naive model retrieves an inflated notion of the efficacy of
treatment, following the variable-selection prescription has produced an unbiased
estimate of τ = 1. As before, precision around this parameter has also increased.
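The size of the naive bias in this setting can be worked out by hand. The following is a large-sample sketch using the variances reported in the caption of Figure 27 ($\sigma_1^2 = 2$ among treated units and $\sigma_1^2 = 1$ among controls); the subscripts T and C are notation introduced here for the two groups. Since $x_1$ has mean zero in both groups, $\mathrm{E}[x_1^2]$ equals the group variance, and since $x_1^2$ is uncorrelated with the included levels under joint normality, its group-mean difference loads onto treatment.

```latex
\operatorname{plim}\, \hat{\tau}_{\text{naive}}
  = \tau + \beta_{11}\left( \mathrm{E}[x_1^2 \mid \text{Treated}] - \mathrm{E}[x_1^2 \mid \text{Control}] \right)
  = \tau + \beta_{11}\left( \sigma_{1,T}^2 - \sigma_{1,C}^2 \right)
  = 1 + 0.1\,(2 - 1) = 1.1 .
```

Under these assumptions the naive model overstates the treatment effect by roughly 0.1, even though $x_1$ is equal in mean across the two groups.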
Multiple omitted variables
In Figure 28, we consider bias entering through multiple omitted variables
($x_1 x_2$ and $x_4^2$) that vary differently among treated units. While we’re at it, we also
muddy up the variance-covariance matrix somewhat, which we report in Panel A;
we note that correlations differ for treated and control groups. In each iteration of
this setting, the DGP is given by
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \beta_{12} x_1 x_2 + \beta_{44} x_4^2 + \varepsilon,$$
where $\beta_{12} = \beta_{44} = 1$, and all other parameters are unchanged. Despite the
additional noise and complexity, variable selection has chosen to include $x_1 x_2$ and
$x_4^2$, with no other interaction being picked up more often than 25 times. In Panel
C, the kernel densities indicate that the inclusion of these interactions corrects the
OVB present in the naive model.5
FIGURE 26.
Bias entering through an omitted interaction (of $x_2$ and $x_3$), which correlates differently
among treated units
In each iteration we assign treatment randomly to half the sample, and randomly draw covariates $x_1$ through $x_4$ from joint
normals with means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = 1$, and correlations as illustrated in Panel A. The
DGP is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \beta_{23} x_2 x_3 + \varepsilon,$$
where $\tau = 1$, $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = 1$, and $\beta_{23} = 1$. With all co-variates passing a balance test, the naive model estimated
is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
Panel A: Covariate correlation, (i) treated observations and (ii) control observations.
Panel B: Inclusion probabilities. Panel C: Treatment-effect estimates.
OVB through unobservables
In Figures 29 and 30, we consider biases entering through unobservable
components. In these settings, we again assign treatment randomly to half the
sample, and randomly draw covariates from joint normals. However, here we draw
a total of five, $x_1$ through $x_5$.6 In the DGP we adopt, we allow for a level effect of
$x_5$, and its interaction with other covariates in the model,
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{5} \beta_k x_k + \sum_{j=1}^{4} \beta_{j5} x_j x_5 + \varepsilon.$$
5 Vansteelandt et al. (2012) address the case in which there is a large number of confounders
that each have small predictive power in y. In this case, the joint exclusion of these many small
contributors can lead to bias. To guard against this, one could, for example, force the inclusion
of the full set of covariates linearly, and then perform variable selection only on the higher-order
terms—this would result in a “no worse than” position compared to the default approach that is
absent variable-selection procedures.
6 We parameterize as we have before, with means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = 0$, variances
$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma_5^2 = 1$, and correlations (all off-diagonals are nonzero) as illustrated in
Panel A of Figure ??. We also set $\tau = 1$, $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 1$, and set
$\beta_{15} = \beta_{25} = \beta_{35} = \beta_{45} = 1$.
FIGURE 27.
Bias entering through an omitted non-linearity ($x_1^2$) that varies differently among treated
units
In each iteration we assign treatment randomly to half the sample, and randomly draw covariates $x_1$ through $x_4$ from
joint normals with means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = 1$ among control observations,
and $\sigma_1^2 = 2$ and $\sigma_2^2 = \sigma_3^2 = \sigma_4^2 = 1$ among treated observations. Correlations are zero for both treatment and control
observations. The DGP is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \beta_{11} x_1^2 + \varepsilon,$$
where $\tau = 1$, $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = 1$, and $\beta_{11} = 0.1$. With all co-variates passing a balance test, the naive model
estimated is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
Panel A: Covariate correlation, (i) treated observations and (ii) control observations.
Panel B: Inclusion probabilities. Panel C: Treatment-effect estimates.
Supposing that $x_5$ is unobservable to the econometrician, however, we’ve created a
setting in which the naive model remains
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
(Notably, all co-variates, $x_5$ among them, still pass a balance test.) Moreover,
we’re now in a setting in which the variable-selection procedure, which is also
restricted to the observables, cannot weight $x_5$ or its interactions directly.
In Figure 29 we hardwire into the data-generating process unambiguously
positive bias. As $x_5$ and $\varepsilon$ correlate differently in treated and control units, the
estimate of τ is biased up. Yet, even though $x_5$ is unobservable, its influence on y
is partially estimable: given its correlation with observable covariates, the variable-
selection routine better explains variation in y with weight on interactions of the
observables. Moreover, the same source of variation that plagues the estimation of
τ (that the x covary differently among treated and control units) then allows for the
absorption of that variation so that it does not load onto τ̂. In Figure 30 we instead hardwire
into the DGP an unambiguously negative bias; variable selection partially corrects
this bias.
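A stripped-down simulation shows how an interaction of observables can absorb part of a bias that originates in an unobservable. This is a sketch under loudly stated assumptions, not the design behind Figures 29 and 30: it uses only two observables ($x_1$, $x_2$) plus the unobserved $x_5$, sets every pairwise correlation among treated units to a hypothetical `rho = 0.5` (zero among controls), assumes standard-normal errors, and adds $x_1 x_2$ by hand rather than rerunning the BIC search.

```python
import numpy as np

def draw_unobserved_dgp(rng, n=2000, rho=0.5, tau=1.0):
    """x5 enters y in levels and through x1*x5 and x2*x5 but is never returned,
    mimicking a variable the econometrician cannot observe."""
    half = n // 2
    D = np.r_[np.ones(half), np.zeros(n - half)]
    cov_treated = np.full((3, 3), rho)
    np.fill_diagonal(cov_treated, 1.0)
    V = np.vstack([
        rng.multivariate_normal(np.zeros(3), cov_treated, size=half),
        rng.multivariate_normal(np.zeros(3), np.eye(3), size=n - half),
    ])
    x1, x2, x5 = V.T
    y = 1.0 + tau * D + x1 + x2 + x5 + x1 * x5 + x2 * x5 + rng.normal(size=n)
    return y, D, x1, x2

def tau_hat(y, D, cols):
    """Coefficient on treatment from OLS of y on [1, D, cols...]."""
    Z = np.column_stack([np.ones(len(y)), D] + cols)
    return np.linalg.lstsq(Z, y, rcond=None)[0][1]

rng = np.random.default_rng(29)       # hypothetical seed
naive, augmented = [], []
for _ in range(200):
    y, D, x1, x2 = draw_unobserved_dgp(rng)
    naive.append(tau_hat(y, D, [x1, x2]))               # observables in levels
    augmented.append(tau_hat(y, D, [x1, x2, x1 * x2]))  # adds observable interaction
print(f"naive: {np.mean(naive):.3f}  with x1*x2: {np.mean(augmented):.3f}")
```

In this design the omitted terms $x_1 x_5$ and $x_2 x_5$ each have mean `rho` among treated units, so the naive estimate centers near τ + 2·rho. The observable product $x_1 x_2$ is correlated with those terms and is itself imbalanced across groups, so including it pulls the estimate back toward τ without fully removing the bias.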
In Figure 31 we consider a mix of offsetting biases that net out to zero
(or come close, on average). In this case, we see evidence of gains in precision.
However, inclusion of a variable need not cancel offsetting biases equally, which can
result in τ̂ moving away from τ (Steiner and Kim, 2016). While direct diagnosis of
this problem is impossible, were different parameter estimates evident across this
and baseline specifications, one would imagine a certain lack of confidence in having
identified treatment.
FIGURE 28.
Bias entering through multiple omitted interactions ($x_1 x_2$ and $x_4^2$) that vary differently
among treated units
In each iteration we assign treatment randomly to half the sample, and randomly draw covariates $x_1$ through $x_4$ from
joint normals with means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = 1$, and correlations as illustrated in
Panel A. The DGP is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \beta_{12} x_1 x_2 + \beta_{44} x_4^2 + \varepsilon,$$
where $\tau = 1$, $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = 1$, and $\beta_{12} = \beta_{44} = 1$. With all co-variates passing a balance test, the naive
model estimated is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
Panel A: Covariate correlation, (i) treated observations and (ii) control observations.
Panel B: Inclusion probabilities. Panel C: Treatment-effect estimates.
FIGURE 29.
Variable selection can reduce bias entering through unobservables: Positive bias
In each iteration we assign treatment randomly to half the sample, and randomly draw covariates $x_1$ through $x_5$ from
joint normals with means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma_5^2 = 1$, and correlations as
illustrated in Panel A. The DGP is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{5} \beta_k x_k + \sum_{j=1}^{4} \beta_{j5} x_j x_5 + \varepsilon,$$
where $\tau = 1$, $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 1$, and $\beta_{15} = \beta_{25} = \beta_{35} = \beta_{45} = 1$. As $x_5$ is assumed to be
unobservable to the econometrician, the naive model estimated is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
Notably, all co-variates, $x_5$ among them, pass a balance test.
Panel A: Covariate correlation, (i) treated observations and (ii) control observations.
Panel B: Inclusion probabilities. Panel C: Treatment-effect estimates.
FIGURE 30.
Variable selection can reduce bias entering through unobservables: Negative bias
In each iteration we assign treatment randomly to half the sample, and randomly draw covariates $x_1$ through $x_5$ from
joint normals with means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma_5^2 = 1$, and correlations as
illustrated in Panel A. The DGP is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{5} \beta_k x_k + \sum_{j=1}^{4} \beta_{j5} x_j x_5 + \varepsilon,$$
where $\tau = 1$, $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 1$, and $\beta_{15} = \beta_{25} = \beta_{35} = \beta_{45} = 1$. As $x_5$ is assumed to be
unobservable to the econometrician, the naive model estimated is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
Notably, all co-variates, $x_5$ among them, pass a balance test.
Panel A: Covariate correlation, (i) treated observations and (ii) control observations.
Panel B: Inclusion probabilities. Panel C: Treatment-effect estimates.
FIGURE 31.
Where there is no bias, variable selection increases precision
In each iteration we assign treatment randomly to half the sample, and randomly draw covariates $x_1$ through $x_5$ from
joint normals with means $\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = 0$, variances $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma_5^2 = 1$, and correlations as
illustrated in Panel A. The DGP is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{5} \beta_k x_k + \sum_{j=1}^{4} \beta_{j5} x_j x_5 + \varepsilon,$$
where $\tau = 1$, $\beta_0 = \beta_1 = \beta_2 = \beta_3 = \beta_4 = \beta_5 = 1$, and $\beta_{15} = \beta_{25} = \beta_{35} = \beta_{45} = 1$. As $x_5$ is assumed to be
unobservable to the econometrician, the naive model estimated is then
$$y = \beta_0 + \tau \, \mathbb{1}(\text{Treated}) + \sum_{k=1}^{4} \beta_k x_k + \varepsilon.$$
Notably, all co-variates, $x_5$ among them, pass a balance test.
Panel A: Covariate correlation, (i) treated observations and (ii) control observations.
Panel B: Inclusion probabilities. Panel C: Treatment-effect estimates.
Conclusion
As applied researchers, we are surprisingly unguided in our approach to
variable selection. However, available procedures are low-cost, with minimal
risks, and can 1) identify sources of bias due to functions of observables, 2)
increase precision in estimated treatment effects by more efficiently conditioning
outcomes, or 3) proxy for bias-inducing variation originating in the unobservables,
to the extent that this variation correlates systematically with functions of observables.
CHAPTER V
CONCLUSION
In chapter 1, a technique is proposed for improving estimation of synthetic
controls, while in chapter 2, I find an important unintended consequence of
supply-side drug policy. Finally, in chapter 3, a technique is proposed for improving
variable selection in regressions seeking to find treatment effects. Together, these
papers contribute to the discipline of causal inference in economics.
REFERENCES CITED
Abadie, A., Diamond, A., and Hainmueller, J. (2010). Synthetic control methods
for comparative case studies: Estimating the effect of California’s tobacco
control program. Journal of the American Statistical Association,
105(490):493–505.
Abadie, A., Diamond, A., and Hainmueller, J. (2015). Comparative politics and the
synthetic control method. American Journal of Political Science,
59(2):495–510.
Abadie, A. and Gardeazabal, J. (2003). The economic costs of conflict: A case
study of the Basque Country. American Economic Review, 93(1):113–132.
Akaike, H. (1998). Information theory and an extension of the maximum likelihood
principle. In Selected Papers of Hirotugu Akaike, pages 199–213. Springer.
Athey, S., Bayati, M., Doudchenko, N., Imbens, G., and Khosravi, K. (2017).
Matrix completion methods for causal panel data models.
Athey, S. and Imbens, G. (2017). The state of applied econometrics: Causality and
policy evaluation. Journal of Economic Perspectives, 31(2):3–32.
Begg, C. B. (1990). Significance tests of covariate imbalance in clinical trials.
Controlled clinical trials, 11(4):223–225.
Castillo, J., Mejía, D., and Restrepo, P. (2014). Scarcity without leviathan: The
violent effects of cocaine supply shortages in the Mexican drug war.
Chalmers, J., Bradford, D., Jones, C., et al. (2009). How do methamphetamine
users respond to changes in methamphetamine price? BOCSAR NSW Crime
and Justice Bulletins, page 16.
Cunningham, J. K., Callaghan, R. C., and Liu, L.-M. (2015). US federal cocaine
essential (precursor) chemical regulation impacts on US cocaine availability:
an intervention time–series analysis with temporal replication. Addiction,
110(5):805–820.
Cunningham, S. and Shah, M. (2017). Decriminalizing indoor prostitution:
Implications for sexual violence and public health. The Review of Economic
Studies, 85(3):1683–1715.
Cunningham, S. and Shah, M. (2018). Decriminalizing indoor prostitution:
Implications for sexual violence and public health. The Review of Economic
Studies, 85(3):1683–1715.
Dobkin, C. and Nicosia, N. (2009). The war on drugs: methamphetamine, public
health, and crime. American Economic Review, 99(1):324–49.
Dobkin, C., Nicosia, N., and Weinberg, M. (2014). Are supply-side drug control
efforts effective? Evaluating OTC regulations targeting methamphetamine
precursors. Journal of Public Economics, 120:48–61.
Doudchenko, N. and Imbens, G. (2016). Balancing, regression,
difference-in-differences and synthetic control methods: A synthesis. Working
Paper 22791, National Bureau of Economic Research.
Ertefaie, A., Asgharian, M., and Stephens, D. A. (2018). Variable selection in
causal inference using a simultaneous penalization method. Journal of Causal
Inference, 6(1).
Evans, W. N., Garthwaite, C., and Moore, T. J. (2018). Guns and violence: The
enduring impact of crack cocaine markets on young black males. Technical
report, National Bureau of Economic Research.
Ferman, B. and Botosaru, I. (2017). On the role of covariates in the synthetic
control method.
Ferman, B. and Pinto, C. (2016). Revisiting the synthetic control estimator.
Ferman, B., Pinto, C., and Possebom, V. (2018). Cherry picking with synthetic
controls.
Hansen, B., Miller, K., and Weber, C. (2017). Drug trafficking under partial
prohibition: Evidence from recreational marijuana.
Kaul, A., Klößner, S., Pfeifer, G., and Schieler, M. (2018). Synthetic control
methods: Never use all pre-intervention outcomes together with covariates.
McCollister, K. E., French, M. T., and Fang, H. (2010). The cost of crime to
society: New crime-specific estimates for policy and program evaluation. Drug
and alcohol dependence, 108(1):98–109.
Mutz, D. C., Pemantle, R., and Pham, P. (2019). The perils of balance testing in
experimental design: Messy analyses of clean data. The American
Statistician, 73(1):32–42.
Nurco, D. N., Hanlon, T. E., and Kinlock, T. W. (1991). Recent research on the
relationship between illicit drug use and crime. Behavioral Sciences & the
Law, 9(3):221–242.
Rothstein, J., Ben-Michael, E., and Feller, A. (2018). The role of the propensity
score in the synthetic control method.
Saffer, H. and Chaloupka, F. (1999). The demand for illicit drugs. Economic
inquiry, 37(3):401–411.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics,
6(2):461–464.
Senn, S. (1994). Testing for baseline balance in clinical trials. Statistics in
medicine, 13(17):1715–1726.
Shortreed, S. M. and Ertefaie, A. (2017). Outcome-adaptive lasso: Variable
selection for causal inference. Biometrics, 73(4):1111–1122.
Silverman, L. P. and Spruill, N. L. (1977). Urban crime and the price of heroin.
Journal of Urban Economics, 4(1):80–103.
Steiner, P. M. and Kim, Y. (2016). The mechanics of omitted variable bias: Bias
amplification and cancellation of offsetting biases. Journal of causal inference,
4(2).
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of
the Royal Statistical Society: Series B (Methodological), 58(1):267–288.
Vansteelandt, S., Bekaert, M., and Claeskens, G. (2012). On model selection and
model misspecification in causal inference. Statistical methods in medical
research, 21(1):7–30.
Xu, Y. (2017). Generalized synthetic control method: Causal inference with
interactive fixed effects models. Political Analysis, 25(1):57–76.