I wrote a tutorial on how to perform simple prepost analysis using R, which is available on my RPubs page. It covers how to compare two differences (change in value before and after an interention) using independent t test and linear regression approaches. However, it doesn’t cover how to address correlation between two dependent values. Part 2 of prepost analysis will cover those issues.
Some cool website on study design and biostatistics
This month (November 2024), I wanted to take a break from writing tutorial and articles. Instead, I wanted update myself on (and share) some very helpful/useful online resources.
A colleague of mine introduced me to website called Datamethods. It’s mainly a discussion forum, but it has some useful resources. This particular post contains references that are very useful for anyone who is interested in study design and biostatistics (link). It is a collection of papers and articles that addresses common myths and practices regarding the application of biostatistics in study designs.
Another great website is Scott Cunningham’s Mixed Taped Sessions. He has a book called Mixed Taped Session about causal inference, and he has regular workshops. I attended his Causal Inference Part 2 workshop, and it was amazing. We learned about the basic difference-in-differences methods (coding in R and Stata), and the innovations surrounding these methods (e.g., Callaway & Sant’Anna’s staggered difference-in-differences approach). Scott also provides the historical perspectives on these methods, which are insightful as they are entertaining. Moreover, he conducts interviews with prominant econometricians, which he posts on his YouTube channel.
Hopefully, these sites are useful for you as they have been for me.
Staggered difference-in-differences using R
I was interested in learning how to apply the Callaway & Sant'Anna staggered difference-in-differences framework to my work. After reading several papers and watching the video by Sant'Anna, I wrote a short tutorial on how to apply this framework to a simulated data. The tutorial is located on my RPubs site.
This is a unique method that used the R “did” package, which is based on the paper by Callaway & Sant’Anna.
Mediation analysis using R
It’s not uncommon to see covariates in a regression model that should not be there. For example, measurements that occur after the treatment assignment are included into a regression model as baseline covariates. Rather, one should consider a mediation analysis.
I wrote a tutorial on how to perform mediation analysis using R on my RPubs site (link).
I know that I make this mistake at times. This tutorial helped me to carefully consider which covariates to include in a regression model and which ones to consider for mediation analysis.
MEPS tutorial on interrupted time series analysis in R
I wrote a short tutorial on how to perform an interrupted time series analysis in R. I had a challenging time working on this because I wasn’t familiar with all the nuances of the ITSA. More importantly, I wasn’t able to leverage my Stata skills to do this in R. I’m used to the Stata margins command, which is great for creating constrasts. R has its own version of the margins command, but it lacks some of Stata’s features such as the pwcompare, which I use a lot in Stata. However, I found a workaround with linear splines, and I have uploaded this to my RPubs site (link). I hope you find this useful. I also saved my R Markdown code on my GitHub site (link).
MEPS tutorials on linkage files and trend analysis
I create two MEPS tutorials recently. One is on the use of condition-event linkage files to capture the disease-specific costs. I used migraine as a motivating example. In this tutorial, I go through the steps to identify migraine-related costs assocaited with office-based visits and inpatient night stays. In the second tutorial, I review how to perform simple trend analysis with linear regressio models. I pooled MEPS data from 2016 to 2021 and apply the approriate primary sampling units and strata from the pooled file.
The first tutorial is located on my RPubs page (MEPS Tutorial 4 - Using condition-event link (CLNK
) file: A case study with migraine). The R Markdown code to create the tutorial is located in my GitHub repository (link).
The second tutorial is also located on my Rpubs page (MEPS Tutorial 5 - Simple Trend Analysis with Linear Models). The R Markdown code to create the tutorial is located in my GitHub repository (link).
Interrupted time series analysis (ITSA) with Stata
Interrupted time series analysis (ITSA) is a study design used to study the effects of an intervention across time. An important feature of the ITSA is the time when the intevention occurs. The time before and after the intervention are of interest because we want to visualize if the trends are similar or different. Additionally, we want to visualize the change immediately after the intervention is implemenated. I call this period the index date.
In this article, I’ll review the single-group ITSA and multiple groups ITSA. Then I’ll review how to perform an ITSA in Stata
.
You can view the complete tutorial on my RPubs site.
Tweedie GLM model in R for Cost Data
I wrote a tutorial on using a Tweedie distribution for a GLM gamma model for cost data in R. Unlike Stata, R is very particular with zeroes when constructing GLM models. Hence, I opted to use the Tweedie distribution to mix and match the link function with the Gamma distribution. I settled on the identity link because it doesn’t involve retransformation and is each to interpret.
My tutorial is available on my RPubs site and GitHub site.
Two-part models in R - Application with cost data
I created a tutorial on how to use two-part models in R for cost data. I used the healthcare expenditures from the Medical Expenditure Panel Survey in 2017 as a motivating example. Normally, I use Stata when I construct two-part models. But I wanted to learn how I could do this in R. Fortunately, R has a package called twopartm that was developed by Duan and colleagues. You can find their document for the twopartm package here.
The tutorial I created is located on my GitHub page and RPubs page.
Interpreting regression models
I wrote a short explanation on how to interpret regression models.
I have posted this on RPubs, and the code is saved on my GitHub site.