An Exploratory Data Analysis for Average Treatment Effect Estimation based on Partial Balancing and Simultaneous Inference of Regression Models
To deliver meaningful outcomes, health care professionals, medical practitioners, and policy-makers must obtain evidence of the effectiveness of different treatments and programs. This is most commonly done by comparing treatment and control groups and determining whether the treatment has a causal effect on the outcome. Ideally, treatment is assigned through randomization so that the groups formed are comparable with respect to their features. However, factors such as cost, time, and ethical issues surrounding the treatment may make it difficult to assign treatments at random. Observational studies are then used instead of randomized studies to assess the causal effect. An observational study has the same intent as a randomized study, which is to estimate a causal effect, but it differs in one major design aspect: units are not randomly allocated to the treatment and control groups. As a result, systematic differences may exist between the covariates of the treatment and control groups, which poses an inherent problem in estimating the average treatment effect (ATE). While the propensity score is the standard tool in this situation, this paper presents an exploratory method for determining the ATE in groups that are made homogeneous a priori with respect to the categorical variables. The proposed method begins by forming subgroups that are homogeneous in their qualitative features, which reduces the bias induced by systematic differences in the covariates between groups. Regression models given the continuous covariates are then fitted separately for the treatment and control groups. The magnitude of the difference between the two models is then assessed using simultaneous inference in regression, which yields confidence bands that represent the ATE graphically within each subgroup formed.
This procedure has the advantage of identifying the covariate values for which the ATE is significantly positive; hence, solutions can be tailored at a more personalized level. Two real data sets are used to illustrate the proposed procedure. The data analyses show that the method can establish the covariate regions in which the treatment is or is not effective.
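The three steps described above (subgrouping on a categorical variable, fitting the treatment and control arms separately, and banding the difference) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the paper. The data are synthetic, with a single continuous covariate and a common error variance across arms. The band is a Scheffé-type band over the whole covariate line, swapped in for whichever band the study uses. Its critical constant is approximated by Monte Carlo so that only the Python standard library is needed. All function and variable names are hypothetical.

```python
import math
import random
from collections import defaultdict


def fit_line(x, y):
    """Least-squares fit of y = a + b*x, plus the summaries the band needs."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return {"a": a, "b": b, "n": n, "xbar": xbar, "sxx": sxx, "rss": rss}


def scheffe_constant(p, df, alpha=0.05, draws=20000, seed=0):
    """Monte Carlo approximation of sqrt(p * F_{p, df; 1 - alpha})."""
    rng = random.Random(seed)
    # An F(p, df) draw is a ratio of scaled chi-squares; chi2_k = Gamma(k/2, scale 2).
    f_draws = sorted(
        (rng.gammavariate(p / 2, 2) / p) / (rng.gammavariate(df / 2, 2) / df)
        for _ in range(draws)
    )
    return math.sqrt(p * f_draws[min(draws - 1, int((1 - alpha) * draws))])


def difference_band(treated, control, grid, alpha=0.05):
    """Simultaneous band for the treated-minus-control regression line on grid."""
    ft, fc = fit_line(*treated), fit_line(*control)
    df = ft["n"] + fc["n"] - 4                    # two lines, two parameters each
    s = math.sqrt((ft["rss"] + fc["rss"]) / df)   # pooled SD (equal-variance assumption)
    c = scheffe_constant(2, df, alpha)
    band = []
    for x0 in grid:
        diff = (ft["a"] + ft["b"] * x0) - (fc["a"] + fc["b"] * x0)
        var = sum(1 / f["n"] + (x0 - f["xbar"]) ** 2 / f["sxx"] for f in (ft, fc))
        half = c * s * math.sqrt(var)
        band.append((x0, diff - half, diff + half))
    return band


# Synthetic records: (categorical stratum, treated flag, covariate x, outcome y).
# In stratum "A" the treatment effect grows with x; in stratum "B" it is zero.
rng = random.Random(1)
records = []
for stratum, slope_gain in (("A", 1.0), ("B", 0.0)):
    for _ in range(200):
        t = rng.random() < 0.5
        x = rng.uniform(0, 10)
        y = 1 + 0.5 * x + (slope_gain * x if t else 0.0) + rng.gauss(0, 1)
        records.append((stratum, t, x, y))

# Step 1: form subgroups that are homogeneous in the categorical variable.
groups = defaultdict(lambda: {True: ([], []), False: ([], [])})
for stratum, t, x, y in records:
    xs, ys = groups[stratum][t]
    xs.append(x)
    ys.append(y)

# Steps 2 and 3: fit each arm separately, then band the difference per subgroup.
grid = [i * 0.5 for i in range(21)]
bands = {g: difference_band(arms[True], arms[False], grid) for g, arms in groups.items()}
for g, band in bands.items():
    positive = [x0 for x0, lo, hi in band if lo > 0]
    print(g, "band lies above zero at x =", positive)
```

Covariate values where the lower limit of the band exceeds zero are those where the treatment effect is judged significantly positive, simultaneously over the whole grid. Restricting the band to a finite covariate interval would call for a different critical constant than the one used in this sketch.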