Development¶
Thanks for your interest in helping us develop the GA4GH reference implementation! There are lots of ways to contribute, and it’s easy to get up and running. This page should provide the basic information required to get started; if you encounter any difficulties please let us know
Warning
This guide is a work in progress, and is incomplete.
Development environment¶
We need a development Python 2.7 installation, Git, and some basic libraries. On Debian or Ubuntu, we can install these using
$ sudo apt-get install python-dev zlib1g-dev git
Note
TODO: Document this basic step for other platforms? We definitely want to tell people how to do this with Brew or ports on a Mac.
If you don’t have admin access to your machine, please contact your system administrator, and ask them to install the development version of Python 2.7 and the development headers for zlib.
Once these basic prerequisites are in place, we can then bootstrap our local Python installation so that we have all of the packages we require and we can keep them up to date. Because we use the functionality of the recent versions of pip and other tools, it is important to use our own version of it and not any older versions that may be already on the system.
$ wget https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py --user
This creates a user specific site-packages installation for Python, which is based in your ~/.local directory. This means that you can now install any Python packages you like without needing to either bother your sysadmin or worry about breaking your system Python installation. To use this, you need to add the newly installed version of pip to your PATH. This can be done by adding something like
export PATH=$HOME/.local/bin:$PATH
to your ~/.bashrc file. (This will be slightly different if you use another shell like csh or zsh.)
We then need to activate this configuration by logging out, and logging back in. Then, test this by running:
$ pip --version
pip 6.0.8 from /home/username/.local/lib/python2.7/site-packages (python 2.7)
We are now ready to start developing!
GitHub workflow¶
First, go to https://github.com/ga4gh/server and click on the ‘Fork’ button in the top right-hand corner. This will allow you to create your own private fork of the server project where you can work. See the GitHub documentation for help on forking repositories. Once you have created your own fork on GitHub, you’ll need to clone a local copy of this repo. This might look something like:
$ git clone git@github.com:username/server.git
We can then install all of the packages that we need for developing the GA4GH reference server:
$ cd server
$ pip install -r requirements.txt --user
This will take a little time as the libraries that we require are fetched from PyPI and built.
It is also important to set up an upstream remote for your repo so that you can sync up with the changes that other people are making:
$ git remote add upstream https://github.com/ga4gh/server.git
All development is done against the develop branch. The stable branch is meant to be kept stable since it is the branch releases are based on – don’t touch it! These are the two mainline branches. Our branching model is loosely based on the one described here.
All development should be done in a topic branch. That is, a branch that the developer creates him or herself. These steps will create a topic branch (replace TOPIC_BRANCH_NAME appropriately):
$ git fetch --all
$ git checkout develop
$ git merge --ff-only upstream/develop
$ git checkout -b TOPIC_BRANCH_NAME
Topic branch names should include the issue number (if there is a tracked issue this change is addressing) and provide some hint as to what the changes include. For instance, a branch that addresses the (imaginary) tracked issue with issue number #123 to add more widgets to the code might be named 123_more_widgets.
At this point, you are ready to start adding, editing and deleting files. Stage changes with git add. Afterwards, checkpoint your progress by making commits:
$ git commit -m 'Awesome changes'
(You can also pass the --amend flag to git commit if you want to incorporate staged changes into the most recent commit.)
Once you have changes that you want to share with others, push your topic branch to GitHub:
$ git push origin TOPIC_BRANCH_NAME
Then create a pull request using the GitHub interface. This pull request should be against the develop branch (this should happen automatically).
At this point, other developers will weigh in on your changes and will likely suggest modifications before the change can be merged into develop. When you get around to incorporating these suggestions, it is likely that more commits will have been added to the develop branch. Since you (almost) always want to be developing off of the latest version of the code, you need to perform a rebase to incorporate the most recent changes from develop into your branch.
$ git fetch --all
$ git checkout develop
$ git merge --ff-only upstream/develop
$ git checkout TOPIC_BRANCH_NAME
$ git rebase develop
At this point, several things could happen. In the best case, the rebase will complete without problems and you can continue developing. In other cases, the rebase will stop midway and report a merge conflict. That is, git has determined that it is impossible for it to determine how to combine the changes from the new commits in the develop branch and your changes in your topic branch and needs manual intervention to proceed. GitHub has some documentation on how to resolve rebase merge conflicts.
Once you have updated your branch to the point where you think that you want to re-submit the code for other developers to consider, push the branch again, this time using the force flag:
$ git push --force origin TOPIC_BRANCH_NAME
If you had tried to push the topic branch without using the force flag, it would have failed. This is because non-force pushes only succeed when you are only adding new commits to the tip of the existing remote branch. When you want to do something other than that, such as insert commits in the middle of the branch history (what git rebase does), or modify a commit (what git commit --amend does) you need to blow away the remote version of your branch and replace it with the local version. This is exactly what a force push does.
Warning
Never use the force flag to push to upstream. Never use the force flag to push to develop or stable. Only use the force flag on your repository and on your topic branches. Otherwise you run the risk of screwing up the mainline branches, which will require manual intervention by a senior developer and manual changes by every downstream developer. That is a recoverable situation, but also one that we would rather avoid. (Note: a hint that this has happened is that one of the above listed merge commands that uses the --ff-only flag to merge a remote mainline branch into a local mainline branch fails.)
One task that you might be asked to do before your topic branch can be merged is “squashing your commits.” We want the git history to be clean and informative, and we do that by crafting one and only one commit message per logical change. In the normal course of development (unless one is constantly committing with the --amend flag) many intermediate commits can be created that should be squashed down to (usually) one before it can be merged. Do this with (assuming you are in your topic branch):
$ git rebase -i develop
This will launch an editor that will give you control over how you want to structure your commits. Usually you just want to “pick” the first commit and “squash” all of the subsequent commits, and then ensure that the final commit message is clean (best practice is to give a short summary of the change on the first line, a blank line, and then a more detailed description of the change following, with the issue number – if there is one – in the detailed description). More information about the interactive rebase process can be found here. Once the commits are to your liking, you can push the branch to your remote repository (which will require a force push if you reordered or deleted commits that existed in the remote version of the branch).
(It usually is a good idea to squash commits before rebasing your topic branch on top of a mainline branch. Any commit in your topic branch could cause a merge conflict, and it’s usually easier to ensure only one merge conflict will potentially occur, rather than performing a merge conflict resolution for each commit in your topic branch – the worst case.)
Once your pull request has been merged into develop, you can close the pull request and delete the remote branch in the GitHub interface. Locally, run this command to delete the topic branch:
$ git branch -D TOPIC_BRANCH_NAME
Only the tip of the iceberg of git and GitHub has been covered in this section, and much more can be learned by browsing their documentation. For instance, get help on the git commit command by running:
$ git help commit
Contributing¶
See the files CONTRIBUTING.md and STYLE.md for an overview of the processes for contributing code and the style guidelines that we use.
Development utilities¶
All of the command line interface utilities have local scripts that simplify development: for example, we can run the local version of the ga2sam program by using:
$ python ga2sam_dev.py
To run the server locally in development mode, we can use the server_dev.py script, e.g.:
$ python server_dev.py
will run a server using the default configuration. This default configuration expects a data hierarchy to exist in the ga4gh-example-data directory. This default configuration can be changed by providing a (fully qualified) path to a configuration file (see the Configuration section for details).
Organisation¶
The code for the project is held in the ga4gh package, which corresponds to the ga4gh directory in the project root. Within this package, the functionality is split between the client, server, protocol and cli modules. The cli module contains the definitions for the ga4gh_client and ga4gh_server programs.