Monte Carlo Simulation: An Invaluable Tool for 21st Century Statisticians

class: bottom, left, inverse, title-slide

.title[
# Monte Carlo Simulation: An <em>Invaluable</em> Tool for 21<sup>st</sup> Century Statisticians
]
.author[
### Alessandro Gasparini · <a href = 'mailto:alessandro.gasparini@ki.se'>alessandro.gasparini(at)ki.se</a>
]

---

class: outline

# Outline

## 1. Introduction

## 2. Planning a simulation study

## 3. Coding a simulation study

## 4. Analysing a simulation study

## 5. Reporting and visualising results

## 6. Wrap-up

---
class: center, middle, inverse

# 1. Introduction

---

# About Me:

* Currently a post-doctoral researcher in biostatistics at the Department of Medical Epidemiology and Biostatistics, Karolinska Institutet;

* Graduated from my PhD in January 2020 at the University of Leicester, UK. 
  My thesis focussed on multilevel models for healthcare utilisation data;
  
--

* Graduated from the MSc in Milano-Bicocca in 2015, after spending a few months at KI to work on my thesis (supervised by Prof. Bellocco);

* (Some) other research interests: survival analysis (simple and complex), multilevel and longitudinal models, simulation and microsimulation methods, statistical issues in electronic health records research, tools development in R, kidney disease epidemiology, statistical modelling, ...

---

# About Me:

* .grey[Currently a post-doctoral researcher in biostatistics at the Department of Medical Epidemiology and Biostatistics, Karolinska Institutet;]

* .grey[Graduated from my PhD in January 2020 at the University of Leicester, UK. My thesis focussed on multilevel models for healthcare utilisation data;]

* .grey[Graduated from the MSc in Milano-Bicocca in 2015, after spending a few months at KI to work on my thesis (supervised by Prof. Bellocco);]

* .grey[(Some) other research interests: survival analysis (simple and complex), multilevel and longitudinal models,] <span style = "text-decoration: underline;">simulation and microsimulation methods</span>, .grey[statistical issues in electronic health records research, tools development in R, kidney disease epidemiology, statistical modelling, ...]

---
class: center, middle

## What are Monte Carlo simulation studies?

---
class: center, middle

# Monte Carlo simulation studies are computer experiments that involve generating data by pseudo-random sampling.

---
class: center, middle

.large[
Note that, from now on, I will be using the terms <br/> .accent[Monte Carlo simulation study] and .accent[simulation study] interchangeably.
]

---

# The Process&trade;

In a simulation study, we design an experiment such that:

1. Generate data from a known distribution/model (so that we know the .accent[truth]);

2. Analyse the data;

3. Compare the analysis results with the truth;

4. Then, we (generally) repeat steps 1-3 .accent[B] times.

> After running the experiment .accent[B] times, we collect and analyse the results to draw our conclusions.

---

# Simulation studies are useful...

Monte Carlo simulation studies provide an invaluable tool for statistical and biostatistical research — and are great for learning/understanding.

They can help to answer questions such as:

* Is an estimator biased in a finite sample?

* Do confidence intervals for a given parameter achieve the desired nominal level of coverage?

* How does a newly developed method compare to an established one?

* What is the power to detect a desired effect size under complex experimental settings and analysis methods?

* ...you name it.

---

# ...and increasingly popular!

.footnote[.tiny[
Web of Science search key: `TS = ("simulation study") AND WC = ("Statistics & Probability")`]]

---
class: center, middle

# Throughout this seminar we will see how to plan, code, analyse,  report/visualise simulation studies.

---

# Tutorial paper

---
class: center, middle, inverse

# 2. Planning a simulation study

---

# There are issues...

1. Monte Carlo standard errors of performance measures are often not computed/reported;

2. Reproducibility of results;

3. Dissemination of results.

> As statisticians, we should follow our advice — and could do better — when it comes to the design, reporting, and reproducibility of our own .accent[experiments].

> The problem is: the .accent[average] statistical training does not teach .accent[enough] about practical aspects of simulation studies.

---

# Enter the ADEMP framework

* .accent[A] is for _Aims_

* .accent[D] is for _Data-generating mechanisms_

* .accent[E] is for _Estimands_

* .accent[M] is for _Methods_

* .accent[P] is for _Performance measures_

> Carefully thinking and planning a simulation study makes your life easier (and more reproducible) in the long run.

---

# ADEMP: Aims

Before starting with a simulation study, we need to .accent[know what we want to learn].

For instance:

* Whether/when an estimation procedure converges?

* Consistency? (large-sample properties)

* Small-sample issues?

* Correct variance formula? (e.g. when derived from asymptotic theory)

* Comparison with other approaches?

* Robustness to model misspecification?

---

# ADEMP: Data-generating mechanisms

The data-generating mechanisms [DGM] are processes that we assume generate the data under study.
For instance, the process underlying a clinical trial (eligibility, randomisation, etc.) or an observational study.

Defining each DGM is aim-specific, but we aim to vary several factors at once: sample size, true value of the estimand of interest, additional DGM-related parameters, ...

We can vary them factorially, or use e.g. fractional designs for greater efficiency: we do need to be pragmatic, after all.

---

# ADEMP: Estimands

The .accent[estimand] is what we are estimating, the target of our inference.

It typically is a parameter (or a function of parameters), but it might be a summary measure (e.g. predictive performance) or a null hypothesis.

For instance, the estimand of interest in a simulation study mimicking a clinical trial might be the treatment effect.

This depends once again on the aims of the simulation study, hence the importance of defining that from the start.

---

# ADEMP: Methods

Methods are often what we are studying, e.g. a newly-developed method or a method that we aim to apply in unusual settings.

A .accent[comparator] is more often than not included, e.g. a gold standard.
However, it is not always necessary to include one.

It is also recommended to include familiar methods as a check.

---

# ADEMP: Performance measures

Let's define the chosen estimand as `$\theta$`, which is estimated by a method as `$\hat{\theta}$` with standard error `$\widehat{\text{se}}(\hat{\theta})$`.

Performance measures of interest might target:

* Properties of the estimator, such as .accent[bias], .accent[empirical standard error], .accent[mean squared error];

* Properties of the standard error of the estimator, such as .accent[average model-based standard error];

* Properties of a confidence interval for `$\hat{\theta}$`, such as .accent[coverage probability] and .accent[power];

* Convergence;

* Computational speed.

---

# In brief...

_Bias_ estimates how close the average estimate `$\hat{\theta}$` is to the true value `$\theta$`: `$E(\hat{\theta}) - \theta$`.
For instance, the maximum likelihood estimator is asymptotically unbiased.

The _empirical standard error_ describes the precision of an estimator, and is defined as `$\sqrt{\text{var}(\hat{\theta})}$`.
Ideally, we are happier with _precise_ estimators with low variance.

The _mean squared error_ (MSE) combines bias and emp. SE in an _overall_ performance measure
It is defined as bias<sup>2</sup> + (emp. SE)<sup>2</sup>.

_Coverage probability_ of a confidence interval is defined as the proportion of confidence intervals containing the true value: `$P(\hat{\theta}_{\text{low}}, \hat{\theta}_{\text{upp}})$`. 
For a 95% CI, this probability targets 95%; poor coverage might arise because of (1) biased estimator, (2) biased model SEs, (3) poor distributional properties.

---
class: center, middle

## All of these performance measures are described in more detail in the tutorial paper. Make sure to check that out and read carefully before planning your first simulation study.

---
class: center, middle, inverse

# Example

---

# Confounding

.pull-left[

We have a study which can be summarised by the following DAG:

Where `$X$` is the exposure, `$Y$` the outcome, and `$Z$` a confounder.

]

.pull-right[

> Note that the exposure and the outcome are independent: there is no edge between `$X$` and `$Y$`.

]

---

# Aim

The aim of this simulation study is pedagogical.

We know that there should be no association between `$X$` and `$Y$` once we adjust for `$Z$` in our analysis.

> However:

1. Is it actually true that once we adjust for `$Z$` we observe no association between `$X$` and `$Y$`?

1. What happens if we fail to adjust for a `$Z$` in our analysis?

---

# Data-generating mechanisms

* `$X$` is a binary treatment variable, `$Z$` is a continuous confounder, and `$Y$` is a continuous outcome;

* `$Z$` is simulated from a standard normal distribution;

* `$X$` depends on `$Z$` according to a logistic model:
$$
\log \frac{P(X = 1)}{P(X = 0)} = \gamma_0 + \gamma_1 Z
$$

* `$Y$` depends on `$Z$` according to a linear model:
$$
Y = \alpha_0 + \alpha_1 Z + \varepsilon
$$
with `$\varepsilon$` following a standard normal distribution.

---

# Data-generating mechanisms

We fix the parameters `$\gamma_0 = 1$`, `$\gamma_1 = 3$`, `$\alpha_0 = 10$`.

For `$\alpha_1$`, we actually simulate two scenarios:

1. `$\alpha_1 = 5$`: in this scenario (.accent[DGM = 1]), `$Z$` is actually a confounder;

1. `$\alpha_1 = 0$`: in the second scenario (.accent[DGM = 2]), we remove the edge between `$Z$` and `$Y$` as well. Here `$Z$` is not a confounder anymore.

Finally, we generate `$N = 200$` independent subjects per each simulated dataset.

---

# Estimands

We have a single estimand of interest here, the regression coefficient `$\beta_1$` (which we define how to estimate in the next slide) that quantifies the association between `$X$` and `$Y$`.

Under certain structural assumptions (see e.g. [John et. al. (2019), DOI: 10.1186/s12874-019-0858-x](https://doi.org/10.1186/s12874-019-0858-x)), the regression coefficient `$\beta_1$` can be interpreted as the average causal effect (ACE) of the exposure `$X$` on the outcome `$Y$`.

> Remember that, according to our data-generating mechanisms, we expect .accent[no association] between `$X$` and `$Y$`.

---

# Methods

We estimate `$\beta_1$` using linear regression.
Specifically, we fit the following two models:

1. A model that fails to adjust for the observed confounder `$Z$` (.accent[model 1]):
$$
Y = \beta_0 + \beta_1 X + \varepsilon
$$

1. A model that properly adjusts for `$Z$` (.accent[model 2]):
$$
Y = \beta_0 + \beta_1 X + \beta_2 Z + \varepsilon
$$

---

# Performance measures

The key performance measure of interest is .accent[bias], defined as
$$
E(\hat{\beta_1})
$$
as we know that — according to our DGMs — the true `$\beta_1 = 0$`.

We might also be interested in estimating the .accent[power] of a significance test for `$\hat{\beta_1}$` at the `$\alpha = 0.05$` level.
In our example, think of that as a test for the null hypothesis `$\beta_1 = 0$`.

---

# Number of replications

The rationale is that we need .accent[enough replications] to be able to estimate the key performance measures with .accent[good precision].

> Think of this as having a .accent[large enough] dataset for an epidemiological study, or power calculations for a clinical trial.
> It is the same concept!

For instance, our example is simple so we don't need to worry too much about computational time. 
We can then run a large number of replications, e.g., `$B = 10,000$` or more, depending on how long we can wait for the results to be ready.

---

# It's complicated...

In general, choosing the number of replications `$B = n_{\text{sim}}$` is not trivial:

> Precision ↔ Computational Effort

Here is an example from a paper we recently published:

---
class: center, middle, inverse

# 3. Coding a simulation study

---

# Coding a simulation study

The most important thing to remember is that .accent[there is no right way] of coding a simulation study.

However, your code .accent[should] be:

* Reproducible;

* Immune to the [bus factor](https://en.wikipedia.org/wiki/Bus_factor);

* Openly available (as much as possible).

---
class: center, middle

# It's time to switch to a live demo.

### We are going to use R for now, but code is available in Stata as well. All material is available online: <br/> https://tinyurl.com/mcsim-example-code
---
class: center, middle, inverse

# 4. Analysing a simulation study

---

# Analysing a simulation study

The analysis of a simulation study is not different from a clinical study.

You should start with descriptive statistics:

1. Inspect your data,

1. Compute summary statistics (mean, median, etc.),

1. Create some descriptive plots e.g. to check for outliers.

Then, you can move onto estimating the performance measures of interest: .accent[bias], .accent[coverage], etc.

Now, if you feel fancy, you can go into regression modelling — but that's a whole big topic for another day.
It is also a much less established practice.

---
class: center, middle

# Let's go back to the example.

---
class: center, middle, inverse

# 5. Reporting and visualising results

---

# Reporting results

There are potentially 4 dimensions (D times E times M times P), so complexity can grow quickly.

You want to think about:

* Which is the comparison of interest?

* Across DGMs or across methods?

* Can my reporting strategy .accent[effectively] tell the story of the study?

As with coding before, there is .accent[no right way] to do this.
Generally, I recommend using plots rather than tables, as tables can easily become daunting.

---

# Example 1

---

# Example 2

See the supplementary material for the paper titled _Impact of model misspecification in shared frailty survival models_ by Gasparini et al., available online at:

* https://github.com/ellessenne/frailtymcsim/blob/master/suppl.pdf

.footnote[

**Warning:** it's not pretty!

]

---
class: center, middle

## Plots are (more often than not) a better option, and {rsimsum} can help with that. <br/> <br/>
## Let's go back to the example.

---

# Interactivity?

With large simulation studies, we really want/need interactive graphics that allow the user to control what is displayed (or not).

Furthermore, more info (e.g., estimated values) could appear when hovering over plots.

Frameworks that let you do that are (among others) d3.js, Shiny, Plotly, etc.

We actually did develop an interactive Shiny app that does some of that:

* GitHub repository: https://github.com/ellessenne/interest

* Paper: https://doi.org/10.52933/jdssv.v1i4.9

---
class: center, middle, inverse

# 6. Wrap-up

---

# Replicability and reproducibility

Compared to, e.g., clinical trials or observational studies, simulation studies offer one of the most straightforward opportunities for replication.

Therefore:

1. Write up your study always assuming that other people will try to replicate it.
   Many actually do, see e.g. the [Simulation Replication Challenge](https://www.replisims.org/).
   The .accent[A-D-E-M-P] framework can help with this.

1. Another suggestion is to .accent[release the full code], e.g. on GitHub or other repositories.
   This allows direct replicability (and reproducibility!), and forces you to check (and organise, and document) your code thoroughly.
   Some journals even require you to share code nowadays.

---

# Key points

* Plan your simulation study on paper before starting to code it;

* Structure and describe your study assuming that other people will try to reproduce it;

* Build your code slowly, start small and only then generalise/expand it;

* Report Monte Carlo standard errors (which are displayed automatically by {rsimsum} in R and `simsum` in Stata);

* Make sure you're .accent[selling the story] of your study well, with engaging/effective visualisations and reporting.

---

# A Swiss Army Knife?

> ...yes!
> Let's recap some key uses:

1. Invaluable tool for learning (and teaching!) statistics;

2. Helpful to test/challenge your understanding of advanced statistical concepts;

3. Helpful to test/evaluate new methods you develop;

4. Helpful to compare several methods that have been proposed for a certain analysis task;

5. Helpful to assess the performance of methods when key assumptions are violated;

6. ...and many more.

---

# Some additional food for thought...

.pull-left[
<iframe width="400" height="450" src="https://www.youtube.com/embed/qaU2jXW2xcE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

.footnotesize[
URL: https://youtu.be/qaU2jXW2xcE
]
]

.pull-right[
<iframe width="400" height="450" src="https://www.youtube.com/embed/_ahUlw0HzSc" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>

.footnotesize[
URL: https://youtu.be/_ahUlw0HzSc
]
]

---

# Material

All material you have seen today is openly available online.

* Slides: https://tinyurl.com/simulation-10-27

* Code: https://github.com/ellessenne/simulation-study-example

Feel free to get in touch if you have questions, I am always happy to chat about this topic.