Using_Git_with_R


Repo for Intro to R for Biologists Bootcamp

View the Project on GitHub bioinformatics-core-shared-training/Using_Git_with_R

Using GitHub with R (and RStudio)

Aims of this section:
1) To introduce the concept of version control, name some version control software and focus on one (git).
2) To widen your knowledge of GitHub, i.e. what it is and why you should be using it.
3) To show how we can integrate the use of git into RStudio.

Version Control - we have a problem.
Anyone who has edited any non-trivial document (think a thesis here) or a collaborative document (think a paper here) will have experienced the problem of having to revert to an earlier version of the manuscript e.g. upon discovering a mistake.

Many workarounds have evolved, usually convoluted naming and numbering schemes that rely on human effort and knowledge to operate, e.g. Finaldoc4.5_final.Doc, as this classic PHDComics cartoon illustrates.
Final from PHDComics.com.
Enter version control.
The above scenarios are commonly encountered in software development and consequently systems have been put into place to allow consistent transition between versions of documents.
Versioning software is similar to the incremental backup system often used on computers. In that system a full backup of the disk is made initially and thereafter periodic (smaller and quicker) backups of the changes are created. The restoration step takes the initial reference backup and ‘replays’ the incremental ones to re-create the last backed-up state. Commonly used version control systems are Subversion and git (which is used in developing the Linux operating system).
git
Git stores the initial files. The user then adds changes (marks them to be committed), commits them (with a message explaining the purpose of the changes) and pushes the commits to a shared repository. Git stores all of these changes in a network structure that can have branches off the main trunk. git can be driven by text commands (but don’t panic - there are other options).

Example command-line git commands:

git config --global user.name 'Your Name'
git config --global user.email 'your@email.com'
git status
git add [filename(s)]
git commit -m "[meaningful message]"
git push
git pull
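As a minimal sketch of how these commands fit together, the session below runs them in a throwaway directory. The directory, the file name analysis.R and the commit messages are made up for illustration, and the identity is set for this one repository only (no --global) so that your real settings are left alone. git push and git pull are omitted because they need a second, shared repository.

```shell
set -eu

# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create an empty repository (hypothetical name: demo).
git init -q demo
cd demo

# Identity for this repository only; add --global to set it everywhere.
git config user.name 'Your Name'
git config user.email 'your@email.com'

# Create a file, check what git sees, then stage and commit it.
echo 'x <- c(1, 2, 3)' > analysis.R
git status --short      # analysis.R is listed as untracked (??)
git add analysis.R
git commit -q -m 'Add first version of analysis script'

# Edit the file and record a second version.
echo 'mean(x)' >> analysis.R
git add analysis.R
git commit -q -m 'Compute the mean of x'

# Show the history, one line per commit, newest first.
git log --oneline
```

Each commit is a point on the timeline that you can return to later; the commit messages are what make that timeline readable.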

The above are examples of using a git command-line tool. There are many different tools that we can use with git: the command line, the GitHub web interface, the GitHub Desktop program and RStudio itself. Due to time constraints, this course uses only the web and RStudio git tools. You can learn about the others by following the links in the bibliography. Regardless of the tool used, we aim to demonstrate the principles and usefulness of git versioning.
By storing the initial files and the differences introduced by each subsequent commit, git lets us traverse this versioning timeline. This enables a member of a collaborating group to work separately on one aspect of the files and then use git to merge the branched version back into the main trunk. git can also help the maintainer (owner) to resolve conflicts, e.g. where two people working on different branches make differing changes to the same file.
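The branch-and-merge idea can be sketched on the command line as follows. This is an illustration only: the repository name demo, the file paper.txt and the branch name add-results are made up, and it shows the simple case in which nobody else has changed the trunk, so no conflict arises.

```shell
set -eu

# Throwaway repository with a per-repository identity.
workdir=$(mktemp -d)
cd "$workdir"
git init -q demo
cd demo
git config user.name 'Your Name'
git config user.email 'your@email.com'

# First commit on the main trunk.
echo 'Methods' > paper.txt
git add paper.txt
git commit -q -m 'Start the paper'

# Remember what the trunk is called (master or main, depending on your git setup).
trunk=$(git rev-parse --abbrev-ref HEAD)

# Work separately on a branch.
git checkout -q -b add-results
echo 'Results' >> paper.txt
git add paper.txt
git commit -q -m 'Add results section'

# Back on the trunk the file still contains only the line "Methods".
git checkout -q "$trunk"
cat paper.txt

# Merge the branch back into the trunk; the file now has both lines.
git merge -q add-results
cat paper.txt
```

If the trunk had also changed the same part of paper.txt in the meantime, git merge would stop and ask for the conflict to be resolved by hand.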
Versioning.

What is this GitHub of which you speak?
GitHub can be thought of as a collaborative web-hosting service for git repositories: a place to store (and even to share) code and materials. GitHub lets you decide whether a repo is private or public, and who can create and edit materials within it. It builds on git being a distributed version control system: you can work offline on your local copy of the repository and then, upon reconnecting to the Internet, ‘push’ your changes to the GitHub repo.
distributed repositories
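The push and pull cycle between local copies and a shared repository can be tried without a GitHub account. In the sketch below a local ‘bare’ repository (hub.git) stands in for the copy hosted on GitHub, and two collaborators (Alice and Bob) each have their own clone. All of these names, and the file notes.md, are assumptions made for illustration.

```shell
set -eu

# Throwaway directory; nothing here touches GitHub or your real projects.
workdir=$(mktemp -d)
cd "$workdir"

# A bare repository stands in for the copy hosted on GitHub.
git init -q --bare hub.git

# Alice clones it, which gives her a complete local repository.
git clone -q hub.git alice
cd alice
git config user.name 'Alice'
git config user.email 'alice@example.com'

# She commits locally (this needs no network connection)...
echo 'Sample notes' > notes.md
git add notes.md
git commit -q -m 'Add sample notes'

# ...and pushes once she can reach the shared copy again.
git push -q origin HEAD

# Bob clones the shared copy and receives Alice's commit.
cd "$workdir"
git clone -q hub.git bob

# Alice adds another commit and pushes it.
cd "$workdir/alice"
echo 'More notes' >> notes.md
git add notes.md
git commit -q -m 'Extend the notes'
git push -q origin HEAD

# Bob pulls to bring his local copy up to date; the file now has both lines.
cd "$workdir/bob"
git pull -q
cat notes.md
```

With a real GitHub repository the only difference is the address you clone from; the commit, push and pull steps stay the same.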
GitHub also enables you to create web pages and blogs by writing material in Markdown and HyperText Markup Language (HTML). RStudio uses a flavour of Markdown (called, unexcitingly enough, R Markdown) to allow annotation of R scripts. GitHub can render such a file (a .Rmd file) as a web page. Hopefully you will have encountered Markdown and some of its uses earlier this week.
In fact, this week you have been using materials that are hosted in GitHub repositories. GitHub also lets you create wikis to document your projects and provides an issue tracker where people can report problems that need fixing.
GitHub has been used to crowdsource science, e.g. during the European EHEC E. coli outbreak.

Practical 1.

We will now use the GitHub web interface to create a GitHub repository and a document that we can edit, to show how git records edits.

Using git within RStudio
RStudio supports the git and Subversion version control systems. We will restrict ourselves to the former, so we can access all of the git goodness without leaving the RStudio environment.
To use either of them within RStudio you must first install the relevant version control system. For git you should download the relevant version from here. Instructions for various operating systems can be found at the end of this material.
In RStudio go to the Tools menu and select Global Options. Then click Git/SVN and tick ‘Enable version control interface for RStudio projects’. If SSH is needed then you can add an RSA key here as well.
The git support revolves around the concept of RStudio Projects (how RStudio organises your code).


Summary

Reference materials