class: center, middle, inverse, title-slide

# Version Control Will Save Your Life
### Alessandro Gasparini <ag475@leicester.ac.uk>
### May 16th, 2018
Health Sciences Postgraduate Forum 2018

---

# What is version control?

> [...] version control [...] is the management of changes to documents, computer programs, large web sites, and other collections of information

.footnote[Source: https://en.wikipedia.org/wiki/Version_control]

--

> Changes are usually identified by a number or letter code, termed the "revision number", "revision level", or simply "revision". For example, an initial set of files is "revision 1". When the first change is made, the resulting set is "revision 2", and so on.

.center[
<img src="Figures/git-when-revisions-updated.svg" style="display: block; margin: auto;" />
]

--

> Each revision is associated with a timestamp and the person making the change. Revisions can be compared, restored, and with some types of files, merged.

---

# Why version control?

1. Backup (sort of);
2. Versioning;
3. Contributing to open source projects;
4. Reproducibility and transparency in science [[doi: 10.1186/1751-0473-8-7](https://doi.org/10.1186/1751-0473-8-7)].

???

So Why Do We Need A Version Control System (VCS)?

Our shared folder/naming system is fine for class projects or one-time papers. But software projects? Not a chance.

A good VCS does the following:

* __Backup and Restore__: Files are saved as they are edited, and you can jump to any moment in time. Need that file as it was on Feb 23, 2007? No problem.
* __Synchronization__: Lets people share files and stay up-to-date with the latest version.
* __Short-term undo__: Monkeying with a file and messed it up? (That’s just like you, isn’t it?). Throw away your changes and go back to the “last known good” version in the database.
* __Long-term undo__: Sometimes we mess up bad. Suppose you made a change a year ago, and it had a bug. Jump back to the old version, and see what change was made that day.
* __Track Changes__: As files are updated, you can leave messages explaining why the change happened (stored in the VCS, not the file).
This makes it easy to see how a file is evolving over time, and why.
* __Track Ownership__: A VCS tags every change with the name of the person who made it. Helpful for ~~blamestorming~~ giving credit.
* __Sandboxing, or insurance against yourself__: Making a big change? You can make temporary changes in an isolated area, test and work out the kinks before “checking in” your changes.
* __Branching and merging__: A larger sandbox. You can branch a copy of your code into a separate area and modify it in isolation (tracking changes separately). Later, you can merge your work back into the common area.

---

# Backup and versioning

<img src="Figures/vc-xkcd.jpg" width="100%" style="display: block; margin: auto;" />

.footnote[Source: http://smutch.github.io/VersionControlTutorial/]

---

# Backup and versioning

> The fundamental idea of version control is to manage multiple revisions of the same unit of information. The idea of backup is to copy the latest version of information to a safe place, where it may be used to restore the original after a data loss event.

--

.pull-left[
<img src="Figures/drive0.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="Figures/version-control-1.jpg" width="80%" style="display: block; margin: auto;" />
]

.footnote[Source: http://geek-and-poke.com/ and me, just a few years ago]

???

With version control we actually preserve the history of all changes to a file! In a single file! Whoop whoop!

This is bad. With a VCS:

1. it is more elegant: there is a single file "script.R";
2. there are automatic diff tools;
3. there is no need to actually open all files.

---

# Backup and versioning

What kind of content is suitable for version control?

* Data-cleaning script;
* Analysis script;
* Manuscript draft;
* ...any text-based document, really!

--

Do you write any kind of software?

Do you analyse data using R/Stata/SAS/SPSS/...?

Do you write documents in LaTeX/Markdown?

Do you use Bash scripts on ALICE/SPECTRE?

> You _need_ version control!
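---

# Backup and versioning

The ideas of numbered revisions, "who changed what and when", restoring, and automatic diffs can be sketched in a few lines of Python. This is only a toy, assuming a single text file kept in memory; it is not how Git or any real VCS stores data, and `Revision`, `ToyHistory`, the authors, and the file contents are all made up for illustration.

```python
import difflib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    """One saved state of a file, plus who saved it, when, and why."""
    number: int
    author: str
    message: str
    content: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ToyHistory:
    """Hypothetical in-memory history of one text file (illustration only)."""

    def __init__(self):
        self.revisions = []

    def commit(self, content, author, message):
        """Store a new revision; numbering starts at 1."""
        revision = Revision(len(self.revisions) + 1, author, message, content)
        self.revisions.append(revision)
        return revision.number

    def restore(self, number):
        """Return the file content as it was at the given revision number."""
        if not 1 <= number <= len(self.revisions):
            raise ValueError(f"no such revision: {number}")
        return self.revisions[number - 1].content

    def compare(self, old, new):
        """Return a unified diff between two revisions as a list of lines."""
        return list(difflib.unified_diff(
            self.restore(old).splitlines(),
            self.restore(new).splitlines(),
            fromfile=f"script.R (revision {old})",
            tofile=f"script.R (revision {new})",
            lineterm="",
        ))


history = ToyHistory()
history.commit("x <- 1\ny <- 2\n", "Alice", "First draft")
history.commit("x <- 1\ny <- 3\n", "Bob", "Fix the value of y")

diff_lines = history.compare(1, 2)   # what changed between revisions 1 and 2
print("\n".join(diff_lines))

restored = history.restore(1)        # the file exactly as it was at revision 1
```

There is still a single "script.R"; every earlier version stays reachable through the history. Git records the same kind of information (author, timestamp, message) for every commit, but for a whole project and on disk.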
---

# Collaboration

.pull-left[
The _headache_ way:

<img src="Figures/phd101212s.gif" width="90%" style="display: block; margin: auto;" />
]

--

.pull-right[
The _Version Control System_ (VCS) way:

![https://wac-cdn.atlassian.com/dam/jcr:0869c664-5bc1-4bf2-bef0-12f3814b3187/01.svg](https://wac-cdn.atlassian.com/dam/jcr:0869c664-5bc1-4bf2-bef0-12f3814b3187/01.svg)

* Each collaborator can suggest changes;
* Each change can be _merged_ into the main project by the person in charge;
* Resolving conflicts is easy (-ish).
]

---

# Contributing to open source projects

<img src="Figures/ggplot2-1.PNG" width="95%" style="display: block; margin: auto;" />

---
count: false

# Contributing to open source projects

<img src="Figures/ggplot2-2.PNG" width="95%" style="display: block; margin: auto;" />

---
count: false

# Contributing to open source projects

<img src="Figures/ggplot2-3.PNG" width="95%" style="display: block; margin: auto;" />

---

# Version control systems [VCS]

.pull-left[
<img src="Figures/vcs-c.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="Figures/vcs-d.png" width="100%" style="display: block; margin: auto;" />
]

.footnote[Source: https://homes.cs.washington.edu/~mernst/advice/version-control.html]

.large[
Popular VCS: Git, Mercurial (Hg), Subversion (SVN), ...
]

---

# Git

.pull-left[
![https://git-scm.com/images/logos/downloads/Git-Icon-1788C.png](https://git-scm.com/images/logos/downloads/Git-Icon-1788C.png)

Homepage: [https://git-scm.com/](https://git-scm.com/)
]

.pull-right[
.Large[
Why Git?
]

* Git is good:
    1. Performance
    2. Security
    3. Flexibility
* Git is a de-facto standard;
* GitHub (and Bitbucket, GitLab, ...)!
* Loads of tutorials online.
]

---

# Git workflow

.center[
<img src="Figures/git_everthing_is_local.png" width="80%" style="display: block; margin: auto;" />
]

.footnote[Source: https://www.silverpeas.org/]

---

# Git is hard though...
.pull-left[
![https://imgs.xkcd.com/comics/git.png](https://imgs.xkcd.com/comics/git.png)

Source: https://xkcd.com/1597/
]

.pull-right[
Some references:

* Quick start tutorial: https://try.github.io
* Git book: https://git-scm.com/book/en/v2
* ARCHER national supercomputing service training: http://hpcarcher.github.io/git-novice/
* PGR workshop (possibly)
]

---

# Take-home message

Version control will not _actually_ save your life...

--

... but if:

1. you stick with it, and
2. you fight through, and
3. you deal with the occasional quirks

.center[
.Large[
... it will make your .tiny[(research)] life easier .tiny[(and safer)]!
]
]

--

> .Huge[Thank you for listening!]