R programming

Literary Cafe series: Policy analysis (Part 2) - Interrupted Time Series Analysis with publicly available data

I’m back with some Literary Cafe series updates.

I regularly have informal discussions with my students about interesting papers in the biomedical sciences. Recently, we discussed a great paper by Jurecka and colleagues on the impact of a statewide law to change the definition of fentanyl possession on opioid-related overdose death rates.

Jurecka and colleagues used publicly available data to perform their research, and I wanted to show my students how this was done using CDC WONDER data. Hence, I started this Literary Cafe series to document these exercises for others to learn from.

Last month, I wrote an article on how to get data from the CDC WONDER site, which you can read here. I considered this Part 1 (Getting the data).

This is the second part of a two-part series that illustrates how to use publicly available data to replicate the findings from a published study. In Part 2, I use the data from Part 1 to analyze the impact of the statewide fentanyl possession law on opioid-related overdose death rates using an interrupted time series analysis. I posted this on my RPubs site (link) along with Part 1 (link).
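As a rough illustration of the idea, an interrupted time series is often fit as a segmented regression with a baseline trend, a level change at the intervention, and a change in slope afterward. The sketch below uses simulated monthly rates, not the CDC WONDER data from Part 1; the variable names, the intervention month, and the effect sizes are all assumptions made for the example, and it ignores autocorrelation, which a real analysis would need to check.

```r
# Segmented regression sketch on SIMULATED data (not the Part 1 data).
set.seed(123)
n_months  <- 48
law_month <- 25   # hypothetical first month under the law

its_data <- data.frame(time = 1:n_months)
its_data$post       <- as.integer(its_data$time >= law_month)  # 1 after the law
its_data$time_since <- pmax(0, its_data$time - law_month)      # months since the law

# Assumed true values: baseline slope 0.10, level change -1.5, slope change -0.05
its_data$rate <- 10 + 0.10 * its_data$time - 1.5 * its_data$post -
  0.05 * its_data$time_since + rnorm(n_months, sd = 0.5)

# time       = pre-law trend
# post       = immediate level change when the law takes effect
# time_since = change in trend after the law
its_fit <- lm(rate ~ time + post + time_since, data = its_data)
summary(its_fit)$coefficients
```

The coefficient on post estimates the immediate shift in the rate, and the coefficient on time_since estimates how the monthly trend changed after the law.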

R - Tips and Tricks (Guide) - Part 2

I wrote a second R guide to help students navigate and use R and RStudio in their biostatistics course. I focused on creating vectors, matrices, and dataframes.

The guide can be found on my RPubs site.
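For readers who want a quick taste before opening the guide, here is a small sketch of the three structures it covers. The object names and values are made up for illustration and are not taken from the guide itself.

```r
# Vectors: ordered values of a single type
ids  <- c("A", "B", "C")
ages <- c(34, 51, 47)

# Matrix: a 2 x 3 grid, filled column by column (R's default)
m <- matrix(1:6, nrow = 2, ncol = 3)

# Data frame: columns of equal length that may differ in type
patients <- data.frame(id = ids, age = ages)

mean(ages)       # average age
m[2, 1]          # row 2, column 1 of the matrix
patients$age     # pull one column out of the data frame
```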

Ratio of risk ratios in R

I ran into a problem where I had two risk ratios, but I wanted to evaluate the statistical difference between them. I couldn’t find an R package, but I found a paper by Altman and Bland that goes through the step-by-step process. I wrote a tutorial on how to perform this method using R, which is available on my RPubs page (link).

Reference:

Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ. 2003 Jan 25;326(7382):219. doi: 10.1136/bmj.326.7382.219. PMID: 12543843; PMCID: PMC1125071.
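The Altman and Bland approach can be sketched in a few lines: work on the log scale, recover each standard error from its confidence interval, and compare the difference with its combined standard error. The risk ratios and intervals below are invented numbers used only to show the arithmetic, the helper name se_from_ci is hypothetical, and the method assumes the two estimates come from independent samples.

```r
# Hypothetical risk ratios with 95% CIs (illustrative numbers only)
rr1 <- 0.80; lo1 <- 0.65; hi1 <- 0.98
rr2 <- 1.10; lo2 <- 0.90; hi2 <- 1.34

# Standard error of log(RR), recovered from the width of the 95% CI
se_from_ci <- function(lower, upper) {
  (log(upper) - log(lower)) / (2 * qnorm(0.975))
}
se1 <- se_from_ci(lo1, hi1)
se2 <- se_from_ci(lo2, hi2)

# Difference on the log scale and its standard error
diff_log <- log(rr1) - log(rr2)
se_diff  <- sqrt(se1^2 + se2^2)

# z test for the difference
z       <- diff_log / se_diff
p_value <- 2 * pnorm(-abs(z))

# Back-transform to get the ratio of risk ratios and its 95% CI
rrr    <- exp(diff_log)
rrr_ci <- exp(diff_log + c(-1, 1) * qnorm(0.975) * se_diff)

round(c(RRR = rrr, lower = rrr_ci[1], upper = rrr_ci[2], p = p_value), 3)
```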

Transform data from wide to long format using R

Often, when we enter data into a spreadsheet, we use the wide format, where each repeated measurement of a variable gets its own column. But when we perform longitudinal analyses, we need to transform this to the long format, where each row holds one measurement for one subject at one time point.

Sometimes, I forget how to do this in R, so I wrote a tutorial on using the pivot_longer() function to transform data from the wide to the long format in preparation for longitudinal data analysis. The tutorial is located on my RPubs page.
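A minimal sketch of the transformation, assuming the tidyr package is installed. The data frame and its column names (id, sbp_1 to sbp_3) are made up for the example and do not come from the tutorial.

```r
library(tidyr)

# Wide format: one row per subject, one column per visit (made-up values)
wide <- data.frame(
  id    = 1:3,
  sbp_1 = c(120, 135, 128),
  sbp_2 = c(118, 130, 125),
  sbp_3 = c(115, 128, 126)
)

# Long format: one row per subject per visit
long <- pivot_longer(
  wide,
  cols         = sbp_1:sbp_3,   # columns to stack
  names_to     = "visit",       # new column holding the old column names
  names_prefix = "sbp_",        # strip the prefix so only the visit number remains
  values_to    = "sbp"          # new column holding the measurements
)
long$visit <- as.integer(long$visit)

long
```

The three subjects by three visits become nine rows, which is the shape that mixed models and other longitudinal methods usually expect.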

Propensity score matching in R

I wrote an introductory tutorial on how to perform propensity score matching using R, which has been posted on my RPubs site (link).

Propensity score matching is a statistical approach to balancing the observed covariates between groups. In observational studies, this method can mitigate confounding from those observed covariates and help support causal interpretations. However, there are many approaches and nuances. This introductory tutorial presents the basics of propensity score methods and how we can use these in our conventional analyses.
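To make the idea concrete, here is a bare-bones sketch in base R: estimate the propensity score with logistic regression, pair each treated subject with the nearest unmatched control, and check balance with a standardized mean difference. The data are simulated, every variable name is an assumption, and the greedy matching loop is a simplification for illustration; it is not necessarily the method used in the tutorial, and dedicated packages such as MatchIt handle this far more carefully.

```r
# SIMULATED data: treatment depends on age and sex, which creates confounding
set.seed(42)
n   <- 500
age <- rnorm(n, mean = 50, sd = 10)
sex <- rbinom(n, 1, 0.5)
treat <- rbinom(n, 1, plogis(-3.5 + 0.05 * age + 0.4 * sex))
dat <- data.frame(age, sex, treat)

# Step 1: propensity score = predicted probability of treatment
ps_model <- glm(treat ~ age + sex, family = binomial, data = dat)
dat$ps   <- fitted(ps_model)

# Step 2: greedy 1:1 nearest-neighbor matching without replacement
treated  <- which(dat$treat == 1)
controls <- which(dat$treat == 0)
stopifnot(length(treated) <= length(controls))  # need enough controls

matched_control <- integer(length(treated))
for (i in seq_along(treated)) {
  dist <- abs(dat$ps[controls] - dat$ps[treated[i]])
  best <- which.min(dist)
  matched_control[i] <- controls[best]
  controls <- controls[-best]   # each control can be used only once
}
matched <- dat[c(treated, matched_control), ]

# Step 3: standardized mean difference before and after matching
smd <- function(x, g) {
  (mean(x[g == 1]) - mean(x[g == 0])) /
    sqrt((var(x[g == 1]) + var(x[g == 0])) / 2)
}
c(before = smd(dat$age, dat$treat), after = smd(matched$age, matched$treat))
```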

Prepost analysis with continuous data using R - Part 1

I wrote a tutorial on how to perform simple prepost analysis using R, which is available on my RPubs page. It covers how to compare two differences (change in value before and after an intervention) using independent t-test and linear regression approaches. However, it doesn’t cover how to address correlation between two dependent values. Part 2 of prepost analysis will cover those issues.
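The two approaches can be sketched side by side on simulated data. The group labels, sample size, and effect size below are assumptions for the example. With equal variances assumed, the t-test on change scores and the regression of change on group give the same estimate and p-value.

```r
# SIMULATED pre/post measurements for two groups
set.seed(2024)
n     <- 40
group <- rep(c("control", "treatment"), each = n)
pre   <- rnorm(2 * n, mean = 100, sd = 10)
post  <- pre + ifelse(group == "treatment", -5, 0) + rnorm(2 * n, sd = 6)

dat <- data.frame(group = factor(group), pre, post)
dat$change <- dat$post - dat$pre   # change from before to after

# Approach 1: independent t-test comparing mean change between groups
t_res <- t.test(change ~ group, data = dat, var.equal = TRUE)

# Approach 2: linear regression of change on group
lm_fit <- lm(change ~ group, data = dat)

t_res$estimate                 # mean change in each group
coef(summary(lm_fit))          # 'grouptreatment' = difference in mean change
```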

Staggered difference-in-differences using R

I was interested in learning how to apply the Callaway & Sant'Anna staggered difference-in-differences framework to my work. After reading several papers and watching the video by Sant'Anna, I wrote a short tutorial on how to apply this framework to simulated data. The tutorial is located on my RPubs site.

The method uses the R “did” package, which is based on the paper by Callaway & Sant’Anna.
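A short sketch of the workflow with the did package, assuming it is installed. Instead of the simulated data from the tutorial, this example uses mpdta, the example dataset that ships with the package, so the variable names below belong to that dataset and not to the tutorial.

```r
library(did)

# Example county-level panel bundled with the package
data(mpdta)

# Group-time average treatment effects:
#   yname  = outcome, tname = time period, idname = unit id,
#   gname  = first period in which a unit is treated (0 = never treated)
att <- att_gt(
  yname         = "lemp",
  tname         = "year",
  idname        = "countyreal",
  gname         = "first.treat",
  data          = mpdta,
  control_group = "nevertreated"
)
summary(att)

# Aggregate into an event-study (effects by time since treatment)
es <- aggte(att, type = "dynamic")
summary(es)
```

Each group-time estimate compares a cohort first treated in a given year with units that are never treated, which avoids the problematic comparisons that a single two-way fixed effects model can make when treatment timing is staggered.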

Mediation analysis using R

It’s not uncommon to see covariates in a regression model that should not be there. For example, measurements that occur after the treatment assignment are sometimes included in a regression model as baseline covariates. Instead, one should consider a mediation analysis.

I wrote a tutorial on how to perform mediation analysis using R, which is available on my RPubs site (link).

I know that I make this mistake at times. This tutorial helped me to carefully consider which covariates to include in a regression model and which ones to consider for mediation analysis.
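The distinction can be sketched with three linear models on simulated data, using the classic product-of-coefficients decomposition. The variable names and effect sizes are assumptions for the example, and this is not necessarily the approach taken in the tutorial; the mediation package offers a more general framework with confidence intervals.

```r
# SIMULATED data: treatment affects the outcome directly and through a mediator
set.seed(1)
n        <- 300
treat    <- rbinom(n, 1, 0.5)
mediator <- 0.5 * treat + rnorm(n)                     # measured AFTER treatment
outcome  <- 0.3 * treat + 0.7 * mediator + rnorm(n)

total_fit    <- lm(outcome ~ treat)              # total effect of treatment
mediator_fit <- lm(mediator ~ treat)             # path a: treatment -> mediator
outcome_fit  <- lm(outcome ~ treat + mediator)   # path b and the direct effect

total_effect    <- coef(total_fit)["treat"]
a               <- coef(mediator_fit)["treat"]
b               <- coef(outcome_fit)["mediator"]
direct_effect   <- coef(outcome_fit)["treat"]
indirect_effect <- a * b                         # effect carried by the mediator

round(c(total = unname(total_effect),
        direct = unname(direct_effect),
        indirect = unname(indirect_effect)), 3)
```

Adjusting for the mediator as if it were a baseline covariate would report only the direct effect and hide the part of the treatment effect that flows through the mediator.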