Statistics & Probability

Exact matching using R - MatchIt package

Recently, I was asked to help create a matching algorithm for a retrospective cohort study. The request was to perform an exact match on a single variable using a 2 to 1 ratio (unexposed to exposed). Normally, I would use a propensity score match (PSM) approach, but the data did not have enough variables for each unique subject. With PSM, I tend to build a logit (or probit) model using variables that would be theoretically associated with the treatment assignment. However, this approach requires enough observable variables to construct these PSM models. For this request, there were a few variables for each subjects; the only variable available were the unique identifier, site, and a continuous variable.

This problem led to a tutorial on how to perform an exact match using the MatchIt package in R, which can be viewed here in my RPubs page.

In this tutorial, you will learn how to perform an exact match with a single variable using a hypothetical dataset with 30 subjects.

Tweedie GLM model in R for Cost Data

I wrote a tutorial on using a Tweedie distribution for a GLM gamma model for cost data in R. Unlike Stata, R is very particular with zeroes when constructing GLM models. Hence, I opted to use the Tweedie distribution to mix and match the link function with the Gamma distribution. I settled on the identity link because it doesn’t involve retransformation and is each to interpret.

My tutorial is available on my RPubs site and GitHub site.

Sample size estimation using the odds ratio in a case-control study

I wrote a short tutorial on how to use an odds ratio to estimate the sample size needed for a case-control study.

The tutorial is located on my RPubs page (link)

The R Markdown source code is located on my GitHub site (link)

Survival analysis in R

I wrote a tutorial on survival analysis using R, which is located on my RPubs page. The R Markdown code is located on my GitHub site.

I provide an introduction to survivor and hazard functions, Kaplan-Meier curves, and Cox proportional hazards model.

R tutorials on confounding/interaction and linear regression model - Updates

Last year, I created several tutorials on how to use R for identifying confounding/interaction and visualizing linear regression models. I updated these tutorials recently to address some errors and mistakes. They have received new hyperlinks:

R tutorial on confounding and interactions using the epitool and epiR packages is located on my RPubs page here. The R Markdown code is located on my GitHub page here.

R tutorial on linear regression model is located on my RPubs page here. The R Markdown code is located on my GitHub page here.

Sample size estimation and Power analysis in R

I wrote a tutorial on how to perform sample size estimations and power analysis using R “pwr” package. These are simple examples that will hopefully lead to more complicated estimations. The tutorial is available on RPubs (link). The R Markdown code is available on my Github site (link).

Logistic regression in R - Part 2 (Goodness of fit tests)

In a previous tutorial, I discussed how to perform logistic regression using R. I wrote a follow-up tutorial on how to conduct goodness of fit tests for logistic regression models in R and posted it on RPubs. The R Markdown code is available on my Github site.

I’ve learned how to assess model fit using Pearson correlations, deviance, and modified Hosmer-Lemeshow Goodness Of Fit (GOF) tests. I think these are important tools when assessing the fit of a logistic regression model. However, I wanted to focus on the HL GOF tests for this tutorial because there are a lot of nuances that I learned and wanted to share.

Additionally, I added the usefulness of visualizing whether the model over- or under-predicts the actual observed data using the calibration plot in R.

Logistic regression in R

I wrote a tutorial on how to construct logistic regression models in R. This tutorial was published on RPubs (link). I go through the use of the glm() command to perform a crude logistic regression model and a multivariable logistic regression model. The data (diabetes.csv) that I used for the motivating example is located here.

The RMarkdown code I used to create the tutorial is located on my GitHub site (link).

Visualizing linear regression models using R - Part 1

I wrote a tutorial on how to visualize linear regression models using R. In the tutorial I used the lm() command and the predict3d package to generate the models and visualize them using R. You can view the RPubs tutorial here. (NOTE: on 30 January 2022, I updated this tutorial and it can be found in my RPubs page here.) I created this tutorial using R Markdown, and the codes are available on my GitHub site (link).