class: bottom, left, inverse, title-slide # Open Tools to Analyse and Report Monte Carlo Simulation Studies ### Alessandro Gasparini ###
alessandro.gasparini@ki.se
·
@ellessenne
### FMS Spring Meeting · 2020-05-29 --- class: outline <script type="text/x-mathjax-config"> MathJax.Hub.Config({ TeX: { Macros: { '\\_': '_' } } }); </script> # Outline ## 1. Monte Carlo simulation studies ## 2. The {`rsimsum`} R package ## 3. The INTEREST Shiny app ## 4. Some considerations and conclusions --- class: center, middle, inverse # Monte Carlo simulation studies --- class: center, middle # Monte Carlo simulation studies are computer experiments that involve generating data by pseudo-random sampling. --- class: center, middle .large[ From now on, I will be using the terms _Monte Carlo simulation study_ and _simulation study_ interchangeably. ] --- # The Process™ In a simulation study, we design an experiment such that: -- 1. Generate data from a known distribution/model (so that we know the _truth_); -- 2. Analyse the data; -- 3. Compare the analysis results with the truth; -- 4. Then, we (generally) repeat steps 1-3 _B_ times. -- > After running the experiment _B_ times, we collect and analyse the results to draw our conclusions. --- # Simulation studies are useful... Monte Carlo simulation studies provide an invaluable tool for statistical and biostatistical research. -- They can help to answer questions such as: * Is an estimator biased in a finite sample? * Do confidence intervals for a given parameter achieve the desired nominal level of coverage? * How does a newly developed method compare to an established one? * What is the power to detect a desired effect size under complex experimental settings and analysis methods? * ...you name it! --- # ...and increasingly popular! <img src="fms-spring-meeting-ag-20200529_files/figure-html/scopus-1.png" width="90%" style="display: block; margin: auto;" /> .footnote[.tiny[ Scopus search key: `TITLE-ABS-KEY ("simulation study") AND SUBJAREA (math)` ]] --- # There are issues... 1. Monte Carlo standard errors of performance measures are often not computed/reported; -- 2. Reproducibility of results; -- 3. Dissemination of results. -- > As statisticians, we should follow our advice — and could do better — when it comes to the design, reporting, and reproducibility of our own _experiments_. --- class: center, middle # Disclaimer: This talk will not cover how to plan, design, and run a Monte Carlo simulation study. A full example can be found in the [tutorial paper by Morris, White, and Crowther](https://onlinelibrary.wiley.com/doi/10.1002/sim.8086) (2019). --- class: center, middle, inverse # The {`rsimsum`} R package --- # Enter `rsimsum` > Why _yet_ another R package? -- * There is a similar package in [Stata](https://www.stata-journal.com/article.html?article=st0200), but I could not find anything comparable in R; * Most performance measures are supported out of the box — no need to code tedious (and error-prone) calculations by hand; * Monte Carlo standard errors are computed and displayed by default; * Several quick plotting options for fast iteration/exploration. --- # Example (1) We compare three different methods for estimating the hazard ratio in a randomised trial with a survival outcome; it's the same example from the tutorial paper by Morris, White, and Crowther (2019). Specifically, we fit the proportional hazards model .center[.large[ h<sub>i</sub>(t) = h<sub>0</sub>(t)exp(X<sub>i</sub>θ) ]] where X<sub>i</sub> is a binary treatment and θ is the log-hazard ratio. The dataset with the results of the simulation is readily available online: ```r library(haven) df <- haven::read_dta(file = "http://tiny.cc/example-data-dta") ``` --- # Example (2) ## Aims of the simulation study Evaluating the impact of model misspecification when using proportional hazards survival models to estimate the hazard ratio. -- ## Data-generating mechanisms 500 patients, with X<sub>i</sub> a binary treatment with simple randomisation and equal allocation ratio and θ = -0.5. Survival times are simulated assuming 1. An exponential baseline hazard with scale parameter λ = 0.1; 1. A Weibull baseline hazard with scale λ = 0.1 and shape γ = 1.5. --- # Example (3) ## Estimand(s) The log-hazard ratio θ. -- ## Methods An (1) exponential, (2) Weibull, and (3) Cox proportional hazards model. -- ## Performance measure(s) The study focuses on convergence, bias, coverage, empirical and model-based standard errors for θ. 1,600 replications per DGM are required to control Monte Carlo SE of bias to be lower than 0.005 (assuming Var(θ) ≤ 0.04). --- # `rsimsum`'s main function: `simsum` ```r str(rsimsum::simsum) ``` ``` ## function (data, estvarname, se = NULL, true = NULL, ## methodvar = NULL, ref = NULL, by = NULL, ci.limits = NULL, ## df = NULL, dropbig = FALSE, x = FALSE, control = list()) ``` > The full documentation is available online at https://ellessenne.github.io/rsimsum/ --- # Summarising a simulation study (1) ```r s <- rsimsum::simsum( data = df, estvarname = "theta", se = "se", true = -0.5, methodvar = "method", by = "dgm", x = TRUE ) s ``` ``` ## Summary of a simulation study with a single estimand. ## True value of the estimand: -0.5 ## ## Method variable: method ## Unique methods: Exp, Wei, Cox ## Reference method: Exp ## ## By factors: dgm ## ## Monte Carlo standard errors were computed. ``` --- # Summarising a simulation study (2) ```r summary(s, stats = c("bias", "cover")) ``` ``` ## Values are: ## Point Estimate (Monte Carlo Standard Error) ## ## Bias in point estimate: ## dgm Exp Wei Cox ## 1 -0.0029 (0.0052) -0.0032 (0.0052) -0.0021 (0.0052) ## 2 0.0494 (0.0035) 0.0048 (0.0038) 0.0062 (0.0038) ## ## Coverage of nominal 95% confidence interval: ## dgm Exp Wei Cox ## 1 0.9544 (0.0052) 0.9544 (0.0052) 0.9544 (0.0052) ## 2 0.9600 (0.0049) 0.9556 (0.0051) 0.9575 (0.0050) ``` --- # Performance measures in `rsimsum` The following performance measures are implemented in `rsimsum`: * Bias; * Empirical SE, relative % increase in precision, model-based SE, and relative % error in model-based SE; * Mean squared error (MSE); * Coverage probability and bias-corrected coverage probability; * Power of type I error. Each performance measure is described in more detail in the [tutorial paper](https://onlinelibrary.wiley.com/doi/10.1002/sim.8086). --- class: center, middle # `rsimsum` supports a variety of (opinionated) fast plotting options for point estimates, standard errors, summary statistics --- # Plotting point estimates (1) ```r autoplot(object = s, type = "est") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-est-1.png" width="90%" style="display: block; margin: auto;" /> --- # Plotting point estimates (2) ```r autoplot(object = s, type = "est_density") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-est-density-1.png" width="90%" style="display: block; margin: auto;" /> --- # Plotting point estimates (3) ```r autoplot(object = s, type = "est_hex") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-est-hex-1.png" width="90%" style="display: block; margin: auto;" /> --- # Plotting point estimates (4) ```r autoplot(object = s, type = "est_ba") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-est-ba-1.png" width="90%" style="display: block; margin: auto;" /> --- # Plotting point estimates (5) ```r autoplot(object = s, type = "est_ridge") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-est-ridge-1.png" width="90%" style="display: block; margin: auto;" /> --- # Same plots available for SEs too! ```r autoplot(object = s, type = "se_ba") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-se-1.png" width="90%" style="display: block; margin: auto;" /> --- # Plotting summary statistics (1) ```r autoplot(object = summary(s), type = "forest", stats = "bias") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-forest-1.png" width="90%" style="display: block; margin: auto;" /> --- # Plotting summary statistics (2) ```r autoplot(object = summary(s), type = "lolly", stats = "cover") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-lolly-1.png" width="90%" style="display: block; margin: auto;" /> --- # Plotting summary statistics (3) ```r autoplot(object = s, type = "heat", stats = "bias") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-heat-1.png" width="90%" style="display: block; margin: auto;" /> --- # Plotting coverage probability ```r autoplot(object = s, type = "zip", zoom = 0.1) ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-zip-1.png" width="90%" style="display: block; margin: auto;" /> --- # Nested loop plots ```r autoplot(object = obj, type = "nlp", stats = "bias") ``` <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-nlp-1.png" width="90%" style="display: block; margin: auto;" /> .footnote[ This uses a different dataset: `data("nlp", package = "rsimsum")` ] --- # Customising plots ```r * autoplot(object = s, type = "est_ridge") + viridis::scale_fill_viridis(discrete = TRUE) + viridis::scale_color_viridis(discrete = TRUE) + ggplot2::scale_x_continuous(labels = scales::comma) + ggplot2::theme_minimal( base_size = 12, base_family = "Arial Narrow" ) + ggplot2::theme(legend.position = "top") + ggplot2::labs( x = expression(Point ~ estimates ~ of ~ theta), y = "DGM", color = "", fill = "" ) ``` --- class: center, middle <img src="fms-spring-meeting-ag-20200529_files/figure-html/autoplot-heat-viridis-display-1.png" width="100%" style="display: block; margin: auto;" /> --- # Multiple estimands ```r str(rsimsum::multisimsum) ``` ``` ## function (data, par, estvarname, se = NULL, true = NULL, ## methodvar = NULL, ref = NULL, by = NULL, ci.limits = NULL, ## df = NULL, dropbig = FALSE, x = FALSE, control = list()) ``` > This supports multiple estimands at once via the `par` argument. > Everything else works analogously as with `simsum`, including plotting! --- class: center, middle, inverse # The INTEREST Shiny app --- class: center, middle # INTEREST: ## **IN**teractive **T**ool for **E**xploring **RE**sults ## from **S**imulation s**T**udies --- # INTEREST: Why? > Why a Shiny app? * Dissemination of results and open science; * Fast(er) iteration and exploration of results, with a point-and-click interface; * Supporting devices where R does not run natively (smartphones, iPads, Chromebooks, ...); * Supporting devices where you are not allowed to install R (e.g. because of IT policies). --- # INTEREST: Homepage <img src="homepage.png" width="80%" style="display: block; margin: auto;" /> --- # INTEREST: Workflow <img src="interest-workflow.png" width="90%" style="display: block; margin: auto;" /> --- class: center, middle # Demo: ## http://interest.shinyapps.io/interest/ --- class: center, middle, inverse # Considerations and conclusions --- # How to get `rsimsum` and INTEREST? `rsimsum` can be installed directly from CRAN: ```r install.packages("rsimsum") # Development version on GitHub: # require("remotes") # remotes::install_github(repo = "ellessenne/rsimsum") ``` INTEREST is on GitHub: ```r # require("remotes") remotes::install_github(repo = "ellessenne/interest") ``` --- class: center, middle ## `rsimsum` and INTEREST are actively developed: bugs are getting squished, new features are being implemented, ... ### If there is anything you'd like to see implemented, feel free to [e-mail me](mailto:alessandro.gasparini@ki.se) or [open an issue on GitHub](https://github.com/ellessenne). --- class: center, middle ## Report _Monte Carlo standard errors_ — how would you feel if a paper in NEJM did not report standard errors of effect estimates? --- class: center, middle ## The importance of _reproducibility_ and _dissemination_ of research findings --- class: center, middle ## The opportunities of _open-source software_ --- # That's all — thank you! Some references: * _Using simulation studies to evaluate statistical methods_. Morris TP, White IR, and Crowther MJ (2019). Statistics in Medicine 38(11):2074--2102, DOI: 10.1002/sim.8086 * _rsimsum: Summarise results from Monte Carlo simulation studies_. Gasparini A (2018). Journal of Open Source Software 3(26):739, DOI: 10.21105/joss.00739 * _INTEREST: INteractive Tool for Exploring REsults from Simulation sTudies_. Gasparini A, Morris TP, Crowther MJ (2020). arXiv:1909.03813 * `rsimsum`'s website: https://ellessenne.github.io/rsimsum/