Applying Propensity Score Methods with Elizabeth Stuart PhD

By Sarianne Gruber
Twitter: @subtleimpact

A statistician by training, Elizabeth Stuart, PhD, is a professor of Mental Health, Biostatistics, and Health Policy and Management at the Johns Hopkins Bloomberg School of Public Health. In the age of Electronic Health Records (EHRs), her research on developing and applying methods to estimate causal effects is a hot-button topic. Having recently read Dr. Stuart’s article, Estimating Causal Effects in Observational Studies Using Electronic Health Data: Challenges and (Some) Solutions, I found that it sheds light on the new dilemmas that many researchers are encountering with EHR data. The opportunities to answer questions with healthcare’s “Big Data” are vast, but non-randomized EHR data require a different approach than the statistics used in a conventional clinical trial design. At the International Conference on Health Policy Statistics, I had the chance to meet with Dr. Stuart and learn the differences between EHR and clinical trial data, and which statistical methods help to better estimate “causal effects” in non-experimental EHR data studies.

Here is the continuation of Dr. Elizabeth Stuart’s explanations and methodological approach.

A Case Management Scenario: Let’s use a case management example to illustrate how propensity score methods are applied. We have a physician practice implementing a new case management program, and it is available to everyone in the practice. We can imagine that the practice has data on people who did participate and another group of people who didn’t. We would have some sort of index date, for example, the date that they started participating. We also know the characteristics of the people who began participating: age, gender, comorbidities, depression scale scores and measures of their clinical symptoms. Propensity score methods help to equate the treatment and comparison groups on the (selected) set of observed characteristics. Essentially, the propensity score itself is the probability of receiving a “treatment”, and we fit a model where we predict “receiving case management” as a function of these observed characteristics. The propensity model is used to find individuals who did not receive the program but who look similar to those who did. For example, for every person in the case management program, we try to find or “match” someone with a similar propensity score who was not in the case management program. By doing this, we should be able to find groups that are similar to each other with respect to the observed characteristics, except for the fact that one was in case management and the other was not.
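To make the scenario concrete, here is a minimal sketch in Python of the two steps described above: estimating a propensity score with logistic regression, then matching each participant to the non-participant with the closest score. Everything in it is an assumption made for illustration: the simulated patients, the two covariates (standardized age and comorbidity count), and the helper names `fit_logistic`, `predict` and `match_nearest`. The logistic regression is fit with a bare-bones gradient routine only to keep the example dependency-free; in practice one would use an established statistics package.

```python
import math
import random

def predict(w, x):
    """Predicted probability of participating, given weights w and covariates x."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit logistic regression by plain batch gradient ascent (illustrative only)."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)  # intercept followed by one coefficient per covariate
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - predict(w, xi)
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def match_nearest(ps, treated):
    """Greedy 1:1 nearest-neighbor matching on the propensity score, without replacement."""
    available = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i, t in enumerate(treated):
        if t == 1 and available:
            j = min(available, key=lambda c: abs(ps[c] - ps[i]))
            pairs.append((i, j))
            available.remove(j)  # each comparison patient is used at most once
    return pairs

# Simulated practice (all numbers hypothetical): older, sicker patients enroll more often.
random.seed(0)
X, treated = [], []
for _ in range(200):
    age_z = random.gauss(0, 1)       # standardized age
    comorb = random.randint(0, 4)    # number of comorbidities
    p_true = 1 / (1 + math.exp(-(-1.5 + 0.5 * age_z + 0.4 * comorb)))
    X.append([age_z, comorb])
    treated.append(1 if random.random() < p_true else 0)

w = fit_logistic(X, treated)
ps = [predict(w, x) for x in X]      # estimated propensity scores
pairs = match_nearest(ps, treated)   # (participant index, matched comparison index)
print(len(pairs), "matched pairs")
```

Greedy matching without replacement is only one of several possible matching rules; it is used here because it is the simplest to read, not because the article prescribes it.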

Propensity Score Methods: Weighting is another way of using the propensity score. This is similar in spirit to survey sampling weights, where we weight a sample to look like some population. Propensity scores can be used in a similar way, where we weight the people who didn’t get the case management to look like the people who did. Again, we are using the propensity scores as a tool to try to create groups that look similar to one another on the observed characteristics. When looking at the outcomes and seeing if people who got case management have better clinical outcomes six months later, rather than just running a regression and predicting the outcome as a function of whether they had case management, propensity score methods are used to “preprocess” the sample so that the groups look more similar. The regression strategy, where we just try to adjust for the covariates in a regression model, can lead to bias if the people who got case management look very different from the people who didn’t, particularly if they are a lot sicker or have many more comorbidities. Propensity score methods like weighting or matching are a way to bring the groups closer together and make them look more like a randomized trial (at least with respect to observed characteristics). Once we have done that, we compare outcomes in the groups that have been equated using the propensity score.
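The weighting idea can be sketched with a small worked example. In this Python snippet, participants keep a weight of 1 and each non-participant is weighted by the odds of participating, ps / (1 - ps), which is one common way to weight a comparison group to resemble the treated group. The records, the ages, the propensity scores and the six-month symptom scores (lower is better) are all made up for illustration.

```python
def weighted_mean(values, weights):
    """Mean of values where each value counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical records: (in_program, age, propensity_score, symptom_score_at_6_months)
rows = (
    [(1, 40, 0.2, 12.0)] * 2 + [(0, 40, 0.2, 14.0)] * 8 +
    [(1, 70, 0.6, 18.0)] * 6 + [(0, 70, 0.6, 21.0)] * 4
)

# Participants keep weight 1; non-participants get the odds ps / (1 - ps),
# which weights the comparison group to resemble the participants.
weights = [1.0 if t == 1 else ps / (1.0 - ps) for t, _, ps, _ in rows]

treated = [(r, w) for r, w in zip(rows, weights) if r[0] == 1]
control = [(r, w) for r, w in zip(rows, weights) if r[0] == 0]

def mean_of(group, col, use_weights):
    """Mean of one column in a group, optionally using the propensity weights."""
    vals = [r[col] for r, _ in group]
    wts = [w if use_weights else 1.0 for _, w in group]
    return weighted_mean(vals, wts)

AGE, OUTCOME = 1, 3
age_treated = mean_of(treated, AGE, False)           # 62.5
age_control_raw = mean_of(control, AGE, False)       # 50.0
age_control_weighted = mean_of(control, AGE, True)   # about 62.5 after weighting

naive_diff = mean_of(treated, OUTCOME, False) - mean_of(control, OUTCOME, False)
weighted_diff = mean_of(treated, OUTCOME, False) - mean_of(control, OUTCOME, True)
print(round(naive_diff, 2), round(weighted_diff, 2))  # prints 0.17 -2.75
```

In this toy data the unweighted comparison suggests the program makes almost no difference, because participants are older and sicker to begin with. Once the comparison group is weighted to have the same age mix as the participants, the participants score 2.75 points lower.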

Theory: Usually the propensity score is estimated using logistic regression, predicting participation in case management as a function of observed characteristics. Those characteristics may combine in different ways to give two people the same probability of participating in case management, even though their underlying characteristics might be somewhat different. The theory underlying propensity models shows that this is OK. The analysis that we are doing is not really comparing within a pair, so we don’t directly compare the two people who were matched; rather, we are comparing groups of individuals. The theory shows that by using these techniques you can find groups of individuals whose distributions of the covariates are similar, so the average age would be similar and the average number of comorbidities would be similar between the case management and non-case management groups, even if individual pairs may actually look different on those specific variables.
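A small Python illustration of this point, under stated assumptions: the propensity model coefficients below are invented, and the three matched pairs are contrived so that the effect is easy to see. Within a pair the two people have the same propensity score but quite different ages and comorbidity counts, yet the groups as a whole are balanced. Balance is summarized with the standardized mean difference (difference in group means divided by a pooled standard deviation), a commonly used diagnostic; the helper names `propensity` and `std_mean_diff` are hypothetical.

```python
import math
import statistics

def propensity(age, comorbidities):
    """Toy propensity model; the coefficients are made up for illustration."""
    logit = -4.0 + 0.05 * age + 0.5 * comorbidities
    return 1.0 / (1.0 + math.exp(-logit))

def std_mean_diff(a, b):
    """Standardized mean difference: gap in means over the pooled standard deviation."""
    pooled_sd = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd

# Hypothetical matched pairs: ((age, comorbidities) of the participant,
#                              (age, comorbidities) of the matched comparison).
pairs = [((70, 1), (50, 3)), ((50, 3), (70, 1)), ((60, 2), (60, 2))]

# Same score within each pair, even where the individual profiles differ.
for t, c in pairs:
    print(t, c, round(propensity(*t), 3), round(propensity(*c), 3))

ages_t = [t[0] for t, _ in pairs]
ages_c = [c[0] for _, c in pairs]
comorb_t = [t[1] for t, _ in pairs]
comorb_c = [c[1] for _, c in pairs]

# Group-level balance: both standardized differences are zero in this toy data.
smd_age = std_mean_diff(ages_t, ages_c)
smd_comorb = std_mean_diff(comorb_t, comorb_c)
print(smd_age, smd_comorb)
```

Real matched samples will not balance this perfectly; the usual practice is to check that the standardized differences are small after matching rather than exactly zero.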

Strategies: Rank-ordering propensity scores and creating deciles is one strategy, called sub-classification or stratification. The idea is that everyone, both those in case management and those not in case management, has a propensity score. So we create deciles based on the propensity score. We take the ten percent of the sample with the lowest propensity scores and compare outcomes between the case management and non-case management groups in that decile, and we repeat that for each of the deciles. The main strategies for propensity models are sub-classification, matching and weighting. Matching is the easiest for people to understand: for each case management person, or each treated person, you find one non-treated person who has a similar propensity score. It is very obvious that you are creating groups that look similar. You don’t need a lot of program participants, although it works best if you have a large number of comparison individuals so there are enough people to choose from to find the matches.
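The decile approach can be sketched in a few lines of Python. The function name `stratified_effect` and the data are hypothetical: 100 contrived patients whose propensity score rises steadily, whose outcome gets worse with each decile of sickness, and for whom the program lowers the outcome by exactly 2 points. Within-stratum differences are averaged using stratum size as the weight, which is one reasonable choice among several.

```python
def stratified_effect(ps, treated, outcome, n_strata=10):
    """Average the within-stratum outcome differences across propensity score strata."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])  # rank-order by propensity score
    size = len(order) / n_strata
    total, used = 0.0, 0
    for s in range(n_strata):
        members = order[round(s * size): round((s + 1) * size)]
        y1 = [outcome[i] for i in members if treated[i] == 1]
        y0 = [outcome[i] for i in members if treated[i] == 0]
        if not y1 or not y0:
            continue  # no comparison is possible in a stratum missing one group
        total += len(members) * (sum(y1) / len(y1) - sum(y0) / len(y0))
        used += len(members)
    return total / used

# Contrived data: sicker deciles have higher scores, more participants, worse outcomes.
ps, treated, outcome = [], [], []
for i in range(100):
    decile = i // 10
    n_treated_here = 1 + decile * 8 // 9     # from 1 of 10 up to 9 of 10 participate
    t = 1 if i % 10 < n_treated_here else 0
    ps.append((i + 0.5) / 100)
    treated.append(t)
    outcome.append(20.0 + decile - 2.0 * t)  # true program effect is -2 by construction

y1 = [y for y, t in zip(outcome, treated) if t == 1]
y0 = [y for y, t in zip(outcome, treated) if t == 0]
naive = sum(y1) / len(y1) - sum(y0) / len(y0)
stratified = stratified_effect(ps, treated, outcome)
print(round(naive, 2), stratified)  # prints 1.14 -2.0
```

Here the raw comparison makes the program look harmful, because participants are concentrated in the sicker deciles, while the stratified comparison recovers the effect that was built into the data. With real data some confounding remains inside each stratum, so the recovery would be approximate.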

Summary: I would like to mention the real benefits of using these data. The drawback, as we have been discussing, is that you don’t randomize. You are not guaranteed that the groups are similar, and there may be information that is unobserved in the EMR data that differs between the participants and nonparticipants. That is the Achilles heel of non-experimental studies. However, one of the real benefits of these data sources is that they are so comprehensive in terms of the patients they include that we are able to make inferences to much more representative and much more policy-relevant groups than is generally possible in a randomized trial. Your sample in the trial may be very restrictive and may not reflect what is happening in the real world. I think propensity score methods are a great tool for people to use to study how things are working in actual practice and in more “real world” settings.

For those who want to start learning about these models, I would recommend a new textbook, Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, by Guido Imbens and Don Rubin. There are also resources like tutorials, papers and webinars that you can listen to for a one- or two-hour introduction. I also suggest short courses taught at conferences like ICHPS and AcademyHealth.

Recommended reading: Estimating Causal Effects in Observational Studies Using Electronic Health Data:  Challenges and (Some) Solutions, Elizabeth A. Stuart, PhD et al.

AcademyHealth webinar: Applied Propensity Score Analysis I & II featuring Dr. Elizabeth Stuart and Dr. Michael Oakes.