statistics

Two-Part Model with Bootstrap using R

In this article, I expand on a previous post that describes using a two-part model to model cost (or total expenditure) as an outcome with data from the Agency for Healthcare Research and Quality (AHRQ) Medical Expenditure Panel Survey (MEPS). In the previous article, I used the twopartm package, which makes it easy to fit two-part models. However, it does not appear to handle data from complex survey designs like MEPS.

The best way to handle complex survey design data with weights using the two-part model approach is to perform the estimations for each part separately and then combine them.

With a little help from some AI chatbots, I was able to write working code that not only estimates and combines both parts of the two-part model, but also bootstraps the results to generate 95% confidence intervals (CI).
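As a rough illustration of the "estimate each part separately, then combine" idea, here is a minimal sketch in base R. It is not the code from the RPubs article: the data are simulated, the variable names (`x`, `cost`, `has_cost`) and the helper `two_part_mean()` are hypothetical, and the plain row-resampling bootstrap shown here ignores survey weights, strata, and clusters, so it would need to be replaced with design-based estimation for MEPS.

```r
# Simulated data only; nothing here comes from MEPS.
set.seed(123)
n <- 500
dat <- data.frame(x = rbinom(n, 1, 0.5))           # hypothetical binary exposure
spent <- rbinom(n, 1, plogis(-0.5 + 0.8 * dat$x))  # whether any cost was incurred
# Positive costs follow a gamma distribution with mean exp(7 + 0.3 * x)
dat$cost <- spent * rgamma(n, shape = 2, rate = 2 / exp(7 + 0.3 * dat$x))

# Hypothetical helper: fit both parts and combine the predictions
two_part_mean <- function(d) {
  d$has_cost <- as.integer(d$cost > 0)
  # Part 1: probability of any cost (logistic regression)
  part1 <- glm(has_cost ~ x, family = binomial(), data = d)
  # Part 2: mean cost among those with positive cost (gamma GLM, log link)
  part2 <- glm(cost ~ x, family = Gamma(link = "log"),
               data = d[d$cost > 0, ])
  newd <- data.frame(x = c(0, 1))
  p  <- predict(part1, newdata = newd, type = "response")
  mu <- predict(part2, newdata = newd, type = "response")
  ev <- p * mu  # combined expected cost: Pr(cost > 0) * E[cost | cost > 0]
  c(mean_x0 = ev[[1]], mean_x1 = ev[[2]], diff = ev[[2]] - ev[[1]])
}

est <- two_part_mean(dat)

# Simple nonparametric bootstrap: resample rows, refit both parts each time
B <- 200
boot <- replicate(B, two_part_mean(dat[sample(nrow(dat), replace = TRUE), ]))

# Percentile 95% confidence intervals for each quantity
ci <- apply(boot, 1, quantile, probs = c(0.025, 0.975))
est
ci
```

Refitting both parts inside each bootstrap replicate is what lets the interval reflect uncertainty from the two models jointly, rather than from either part alone.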

The complete article on how to construct a two-part model with bootstrap using R is available on my RPubs site (link).

One-sample z-test of proportions in R

There are situations where you will be asked to compare the performance of your institution with another institution. This is commonly done with projects that I’m on where data collection occurs at a single site, and stakeholders want to compare the single site’s findings with a reference site. More commonly, stakeholders want to compare their performance to a published paper’s findings. In other words, we want to compare an observed finding to a theoretical one.

In the case of proportions, we can compare the proportion of individuals who experienced an event at a single site to the proportion from a published study. To do that, we can use the one-sample z-test of proportions.
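For readers who want to see the mechanics, here is a minimal sketch in base R. The counts (`x`, `n`) and the reference proportion (`p0`) are made-up values for illustration, not figures from any study.

```r
# Hypothetical inputs: 45 events among 200 patients at the single site,
# compared against a published (expected) proportion of 0.30
x  <- 45
n  <- 200
p0 <- 0.30

p_hat <- x / n

# z statistic uses the standard error under the null proportion p0
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value <- 2 * pnorm(-abs(z))  # two-sided p-value

# Cross-check with prop.test(); without the continuity correction,
# its X-squared statistic is the square of z
pt <- prop.test(x, n, p = p0, correct = FALSE)

c(z = z, p_value = p_value)
pt
```

The hand calculation and `prop.test()` should agree on the p-value, which is a useful sanity check before reporting the result.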

I wrote a guide on how to perform a one-sample z-test of proportions to determine if the proportion of events observed is significantly different from an expected proportion, which is available on my RPubs site (link).

R - Tips and Tricks (Guide) - Part 2

I wrote a second R guide to help students navigate and use R and RStudio in their biostatistics course. I focused on creating vectors, matrices, and dataframes.

The guide can be found on my RPubs site.

Ratio of risk ratios in R

I ran into a problem where I had two risk ratios, but I wanted to evaluate the statistical difference between them. I couldn’t find an R package, but I found a paper by Altman and Bland that goes over the step-by-step process. I wrote a tutorial on how to perform this method using R, which is available on my RPubs page (link).
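The general idea can be sketched in a few lines of base R: work on the log scale, recover each standard error from its 95% CI, and test the difference between the two log risk ratios. The risk ratios and CIs below are hypothetical, the helper `se_from_ci()` is a name made up for this sketch, and the calculation assumes the two estimates come from independent samples.

```r
# Hypothetical risk ratios with 95% confidence limits
rr1 <- 1.50; lo1 <- 1.10; hi1 <- 2.05
rr2 <- 1.05; lo2 <- 0.80; hi2 <- 1.38

# Hypothetical helper: standard error of log(RR) recovered from a 95% CI
se_from_ci <- function(lo, hi) (log(hi) - log(lo)) / (2 * qnorm(0.975))

# Difference between the log risk ratios and its standard error
d    <- log(rr1) - log(rr2)
se_d <- sqrt(se_from_ci(lo1, hi1)^2 + se_from_ci(lo2, hi2)^2)

# z-test for the difference
z       <- d / se_d
p_value <- 2 * pnorm(-abs(z))

# Back-transform to a ratio of risk ratios with a 95% CI
rrr    <- exp(d)
rrr_ci <- exp(d + c(-1, 1) * qnorm(0.975) * se_d)

c(rrr = rrr, lower = rrr_ci[1], upper = rrr_ci[2], p_value = p_value)
```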

Reference:

Altman DG, Bland JM. Interaction revisited: the difference between two estimates. BMJ. 2003 Jan 25;326(7382):219. doi: 10.1136/bmj.326.7382.219. PMID: 12543843; PMCID: PMC1125071.

Propensity score matching in R

I wrote an introductory tutorial on how to perform propensity score matching using R, which has been posted on my RPubs site (link).

Propensity score matching is a statistical approach to balancing the observed covariates between groups. In observational studies, this method has the potential to mitigate confounding from those observed covariates and allow us to make causal interpretations. However, there are a lot of approaches and nuances. This introductory tutorial presents the basics of propensity score methods and how we can use these in our conventional analyses.
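To make the basic steps concrete, here is a minimal sketch in base R: estimate the propensity score with logistic regression, match each treated unit to its nearest control, and check covariate balance. It is not the tutorial's code. The data are simulated, the covariates (`age`, `female`) and the helper `smd()` are hypothetical, and the greedy 1:1 matching without replacement is just one simple choice among many.

```r
# Simulated data only
set.seed(2024)
n <- 300
dat <- data.frame(age = rnorm(n, 50, 10), female = rbinom(n, 1, 0.5))
# Treatment is more likely for older and female subjects (confounding by design)
dat$treat <- rbinom(n, 1, plogis(-4 + 0.06 * dat$age + 0.4 * dat$female))

# Step 1: estimate the propensity score with logistic regression
ps_model <- glm(treat ~ age + female, family = binomial(), data = dat)
dat$ps <- fitted(ps_model)

# Step 2: greedy 1:1 nearest-neighbor matching without replacement
treated <- which(dat$treat == 1)
pool    <- which(dat$treat == 0)
matched_control <- integer(0)
for (i in treated) {
  if (length(pool) == 0) break
  j <- pool[which.min(abs(dat$ps[pool] - dat$ps[i]))]  # closest control
  matched_control <- c(matched_control, j)
  pool <- setdiff(pool, j)                             # remove it from the pool
}
matched <- dat[c(treated[seq_along(matched_control)], matched_control), ]

# Step 3: check balance with the standardized mean difference (SMD)
smd <- function(v, t) {
  (mean(v[t == 1]) - mean(v[t == 0])) /
    sqrt((var(v[t == 1]) + var(v[t == 0])) / 2)
}
smd_before <- smd(dat$age, dat$treat)
smd_after  <- smd(matched$age, matched$treat)
c(before = smd_before, after = smd_after)
```

Comparing the SMD before and after matching is the usual first check that the matched groups are more alike on the observed covariates than the original groups were.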

Stata - marginsplot & mplotoffset commands for plotting average marginal effects

In Stata, users have a lot of flexibility with creating plots, particularly after the margins command has been executed. Once a regression command has been run, users can estimate the average marginal effect of a factor with respect to another variable using the margins command in Stata. Once the average marginal effect has been estimated, users can plot this using the marginsplot or mplotoffset commands. These are powerful tools that allow us to visualize the average marginal effects, particularly when we have interaction terms.

I posted a tutorial on my RPubs site that reviews some basic features of the marginsplot and mplotoffset commands and provides some practical examples of customization.

Prepost analysis with continuous data using R - Part 1

I wrote a tutorial on how to perform a simple prepost analysis using R, which is available on my RPubs page. It covers how to compare two differences (change in value before and after an intervention) using independent t-test and linear regression approaches. However, it doesn’t cover how to address correlation between two dependent values. Part 2 of prepost analysis will cover those issues.
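A minimal sketch of the two approaches in base R, using simulated data with hypothetical variable names (`group`, `pre`, `post`, `change`). With a single binary group variable, the equal-variance t-test on the change scores and the linear regression of change on group test the same hypothesis, so the p-values should match.

```r
# Simulated prepost data: 30 control and 30 treated subjects
set.seed(42)
n <- 60
dat <- data.frame(
  group = rep(c("control", "treated"), each = n / 2),
  pre   = rnorm(n, mean = 50, sd = 10)
)
# Treated subjects improve by about 5 units, controls by about 1
dat$post   <- dat$pre + ifelse(dat$group == "treated", 5, 1) + rnorm(n, 0, 4)
dat$change <- dat$post - dat$pre

# Approach 1: independent t-test comparing the change between groups
tt <- t.test(change ~ group, data = dat, var.equal = TRUE)

# Approach 2: linear regression of the change score on group
fit <- lm(change ~ group, data = dat)

tt
coef(summary(fit))
```

The t statistics have opposite signs only because `t.test()` reports control minus treated while the regression coefficient is treated minus control.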

Linear spline (piecewise) models in Stata

I wrote a tutorial on how to construct linear spline (also known as piecewise) models using Stata, which has been uploaded to my RPubs site.

Previously, I developed a tutorial on using the linear spline method for interrupted time series analysis with Stata. However, I did not properly go over the mkspline command.

In this tutorial, I review the mkspline command and the marginal option to generate coefficients that could be interpreted as the slope within each segment or the change in slope between segments, respectively.
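The difference between the two codings can be mimicked in base R. This is a sketch of the underlying arithmetic, not Stata output: the data are simulated, the knot at 5 is arbitrary, and the variable names (`s1`, `s2`, `m1`, `m2`) are made up for this example.

```r
# Simulated data with a true slope of 1 before the knot and 3 after it
set.seed(1)
x <- runif(200, 0, 10)
k <- 5  # hypothetical knot location
y <- 2 + 1.0 * pmin(x, k) + 3.0 * pmax(x - k, 0) + rnorm(200)

# Default-style coding: each coefficient is the slope within its segment
s1 <- pmin(x, k)
s2 <- pmax(x - k, 0)
fit_slopes <- lm(y ~ s1 + s2)

# Marginal-style coding: the second coefficient is the change in slope
m1 <- x
m2 <- pmax(x - k, 0)
fit_marginal <- lm(y ~ m1 + m2)

coef(fit_slopes)    # s1 = slope before the knot, s2 = slope after the knot
coef(fit_marginal)  # m1 = slope before the knot, m2 = change in slope at the knot
```

Both models give identical fitted values. They differ only in how the coefficients are read, which is why the coefficient on `m2` equals the difference between the `s2` and `s1` coefficients.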

Staggered difference-in-differences using R

I was interested in learning how to apply the Callaway & Sant'Anna staggered difference-in-differences framework to my work. After reading several papers and watching the video by Sant'Anna, I wrote a short tutorial on how to apply this framework to simulated data. The tutorial is located on my RPubs site.

The tutorial uses the R “did” package, which is based on the paper by Callaway & Sant’Anna.