Commit abf4e87a authored by ADGDT

Initial commit

Please delete the text below before submitting your contribution.
---
Thanks for contributing! If this contribution is for instructor training, please send an email to checkout@carpentries.org with a link to this contribution so we can record your progress. You’ve completed your contribution step for instructor checkout just by submitting this contribution.
Please keep in mind that lesson maintainers are volunteers and it may be some time before they can respond to your contribution. Although not all contributions can be incorporated into the lesson materials, we appreciate your time and effort to improve the curriculum. If you have any questions about the lesson maintenance process or would like to volunteer your time as a contribution reviewer, please contact Kate Hertweck (k8hertweck@gmail.com).
---
*.pyc
*~
.DS_Store
.ipynb_checkpoints
.sass-cache
__pycache__
_site
Abigail Cabunoc Mayes <abigail.cabunoc@gmail.com>
Abigail Cabunoc Mayes <abigail.cabunoc@gmail.com> <abigail.cabunoc@oicr.on.ca>
Alois Heilmaier <a.heilmaier@hotmail.com>
Andrew Lonsdale <andrew.lonsdale@lonsbio.com.au>
Andrew Rohl <a.rohl@curtin.edu.au>
Ariel Rokem <arokem@gmail.com>
Arnstein Orten <arnstein.orten@gmail.com>
Bennet Fauber <justbennet@users.noreply.github.com>
Bérénice Batut <berenice.batut@gmail.com>
Bérénice Batut <berenice.batut@gmail.com> <berenice.batut@udamail.fr>
Bernhard Konrad <bernhard.konrad@gmail.com>
Bill Mills <mills.wj@gmail.com>
Brenna O'Brien <info@brennaobrien.com>
Chris Pawsey <chris.bording@pawsey.org.au>
Christoph Junghans <christoph.junghans@gmail.com> <junghans@votca.org>
Daisie Huang <daisieh@gmail.com>
Danielle Traphagen <dtrapezoid@gmail.com>
Dorota Jarecka <djarecka@gmail.com>
Emily Dolson <emilyldolson@gmail.com>
Emmanouil Farsarakis <farsarakis@gmail.com> <farsarakis@epcc.ed.ac.uk>
Erin Becker <erinstellabecker@gmail.com>
Evan P. Williamson <evanpeterw@gmail.com>
François Michonneau <francois.michonneau@gmail.com>
Greg Watson <g.watson@computer.org>
Greg Wilson <gvwilson@software-carpentry.org> <gvwilson@third-bit.com>
Ivan Gonzalez <iglpdc@gmail.com> <iglpdc@users.noreply.github.com>
James Allen <james@sharelatex.com> <jamesallen0108@gmail.com>
Jane Charlesworth <janepipistrelle@gmail.com>
Kate Lee <kl167@le.ac.uk>
Luke W. Johnston <lwjohnst@gmail.com>
Marisa Guarinello <mguarinello@gmail.com>
Mark Wheelhouse <mjw03@doc.ic.ac.uk>
Mary C. Kinniburgh <mckinniburgh@gmail.com>
Mateusz Kuzak <mateusz.kuzak@gmail.com>
Matthias Haeni <datamat@users.noreply.github.com>
Maxim Belkin <maxim.belkin@gmail.com> <maxim-belkin@users.noreply.github.com>
Michael Panitz <michpa@hotmail.com>
Mike Jackson <m.jackson@software.ac.uk> <michaelj@epcc.ed.ac.uk>
Nicholas Hannah <nchlshnnh@gmail.com> <nicholash@users.noreply.github.com>
Nicola Soranzo <nicola.soranzo@earlham.ac.uk> <nsoranzo@tiscali.it>
Patrick C. Shriwise <shriwise@wisc.edu>
Pauline Barmby <pbarmby@uwo.ca>
Peter Steinbach <steinbac@mpi-cbg.de> <steinbach@scionics.de>
Raniere Silva <raniere@rgaiacs.com> <ra092767@ime.unicamp.br>
Raniere Silva <raniere@rgaiacs.com> <raniere@ime.unicamp.br>
Rémi Emonet <remi@heeere.com> <remi.emonet@reverse--com.heeere>
Rémi Emonet <remi@heeere.com> <twitwi@users.noreply.github.com>
Sarah Stevens <ssteven2@wisc.edu> <sstevens2@wisc.edu>
Sean Aubin <saubin@uwaterloo.ca>
Steve Vandervalk <steven.vandervalk@jcu.edu.au> <stevenvandervalk@users.noreply.github.com>
Tiffany Timbers <tiffany.timbers@gmail.com>
Timothée Poisot <t.poisot@gmail.com> <tim@poisotlab.io>
Tom Kelly <tomkellygenetics@gmail.com>
Vijay P. Nagraj <vpnagraj@virginia.edu>
Yuandra Ismiraldi <me@yuandraismiraldi.net>
zz-abracarambar <abracarambar@users.noreply.github.com>
[project]
vcs: Git
[files]
authors: yes
files: no
Alison Appling
Sean Aubin
Pete Bachant
Daniel Baird
Pauline Barmby
Bérénice Batut
Maxim Belkin
Madeleine Bonsma
Jon Borrelli
Andy Boughton
Daina Bouquin
Rudi Brauning
Matthew Brett
Amy Brown
Jane Charlesworth
Billy Charlton
Daniel Chen
Garret Christensen
Ruth Collings
Marianne Corvellec
Matt Davis
Emily Dolson
Laurent Duchesne
Jonah Duckles
Rémi Emonet
Loïc Estève
Emmanouil Farsarakis
Bennet Fauber
Anne Fouilloux
Stuart Geiger
Ivan Gonzalez
Marisa Guarinello
Stéphane Guillou
Jamie Hadwin
Matthias Haeni
Pierre Haessig
Nicholas Hannah
Sumana Harihareswara
Alois Heilmaier
Martin Heroux
Kate Hertweck
Daisie Huang
Yuandra Ismiraldi
Christian Jacobs
Dorota Jarecka
Luke W. Johnston
David Jones
Zbigniew Jędrzejewski-Szmek
Tom Kelly
W. Trevor King
Thomas Kluyver
Bernhard Konrad
Mateusz Kuzak
Arne Küderle
Kathleen Labrie
Hilmar Lapp
Mark Laufersweiler
David LeBauer
Kate Lee
Matthias Liffers
Clara Llebot
Catrina Loucks
Keith Ma
Kunal Marwaha
Ryan May
Bill Mills
Andreas Mueller
Madicken Munk
Juan Nunez-Iglesias
Brenna O'Brien
Catherine Olsson
Michael Panitz
Chris Pawsey
Stefan Pfenninger
Paul Preney
Timothy Rice
Kristina Riemer
Annika Rockenberger
Andrew Rohl
Ariel Rokem
Bill Sacks
Michael Sarahan
Sebastian Schmeier
Hartmut Schmider
Peter Shellito
Patrick C. Shriwise
Raniere Silva
Brendan Smithyman
Nicola Soranzo
Peter Steinbach
Sarah Stevens
Oliver Stueker
Benjamin Stuermer
Tiffany Timbers
Danielle Traphagen
Tim Tröndle
Anelda van der Walt
Steve Vandervalk
Greg Watson
Belinda Weaver
Mark Wheelhouse
Ethan White
Greg Wilson
Steven Wu
Qingpeng Zhang
Andrew Lonsdale
Arnstein Orten
Ben Bolker
Christoph Junghans
David Jennings
James Tocknell
Jonathan Cooper
Katrin Leinweber
Leo Browning
Mary C. Kinniburgh
Matthew Bourque
Matthew Hartley
Raphaël Grolimund
Scott Bailey
Todd Gamblin
Tommy Keswick
Vijay P. Nagraj
Will Usher
Please cite as:
Daisie Huang and Ivan Gonzalez (eds): "Software Carpentry: Version
Control with Git." Version 2016.06, June 2016,
https://github.com/swcarpentry/git-novice, 10.5281/zenodo.57467.
---
layout: page
title: "Contributor Code of Conduct"
permalink: /conduct/
---
As contributors and maintainers of this project,
we pledge to respect all people who contribute through reporting issues,
posting feature requests,
updating documentation,
submitting pull requests or patches,
and other activities.
We are committed to making participation in this project a harassment-free experience for everyone,
regardless of level of experience,
gender,
gender identity and expression,
sexual orientation,
disability,
personal appearance,
body size,
race,
ethnicity,
age,
or religion.
Examples of unacceptable behavior by participants include the use of sexual language or imagery,
derogatory comments or personal attacks,
trolling,
public or private harassment,
insults,
or other unprofessional conduct.
Project maintainers have the right and responsibility to remove, edit, or reject
comments, commits, code, wiki edits, issues, and other contributions
that are not aligned to our [Code of Conduct][coc].
Project maintainers who do not follow the Code of Conduct may be removed from the project team.
Instances of abusive, harassing, or otherwise unacceptable behavior
may be reported by following our [reporting guidelines][coc-reporting].
- [Software and Data Carpentry Code of Conduct][coc]
- [Code of Conduct Reporting Guide][coc-reporting]
{% include links.md %}
# Contributing
[Software Carpentry][swc-site] and [Data Carpentry][dc-site] are open source projects,
and we welcome contributions of all kinds:
new lessons,
fixes to existing material,
bug reports,
and reviews of proposed changes.
## Contributor Agreement
By contributing,
you agree that we may redistribute your work under [our license](LICENSE.md).
In exchange,
we will address your issues and/or assess your change proposal as promptly as we can,
and help you become a member of our community.
Everyone involved in [Software Carpentry][swc-site] and [Data Carpentry][dc-site]
agrees to abide by our [code of conduct](CONDUCT.md).
## How to Contribute
The easiest way to get started is to file an issue
to tell us about a spelling mistake,
some awkward wording,
or a factual error.
This is a good way to introduce yourself
and to meet some of our community members.
1. If you do not have a [GitHub][github] account,
you can [send us comments by email][contact].
However,
we will be able to respond more quickly if you use one of the other methods described below.
2. If you have a [GitHub][github] account,
or are willing to [create one][github-join],
but do not know how to use Git,
you can report problems or suggest improvements by [creating an issue][new-issue].
This allows us to assign the item to someone
and to respond to it in a threaded discussion.
3. If you are comfortable with Git,
and would like to add or change material,
you can submit a pull request (PR).
Instructions for doing this are [included below](#using-github).
## Where to Contribute
1. If you wish to change this lesson,
please work in <https://github.com/swcarpentry/git-novice>,
which can be viewed at <https://swcarpentry.github.io/git-novice>.
2. If you wish to change the example lesson,
please work in <https://github.com/swcarpentry/lesson-example>,
which documents the format of our lessons
and can be viewed at <https://swcarpentry.github.io/lesson-example>.
3. If you wish to change the template used for workshop websites,
please work in <https://github.com/swcarpentry/workshop-template>.
The home page of that repository explains how to set up workshop websites,
while the extra pages in <https://swcarpentry.github.io/workshop-template>
provide more background on our design choices.
4. If you wish to change CSS style files, tools,
or HTML boilerplate for lessons or workshops stored in `_includes` or `_layouts`,
please work in <https://github.com/swcarpentry/styles>.
## What to Contribute
There are many ways to contribute,
from writing new exercises and improving existing ones
to updating or filling in the documentation
and submitting [bug reports][new-issue]
about things that don't work, aren't clear, or are missing.
If you are looking for ideas,
please see [the list of issues for this repository][issues],
or the issues for [Data Carpentry][dc-issues]
and [Software Carpentry][swc-issues] projects.
Comments on issues and reviews of pull requests are just as welcome:
we are smarter together than we are on our own.
Reviews from novices and newcomers are particularly valuable:
it's easy for people who have been using these lessons for a while
to forget how impenetrable some of this material can be,
so fresh eyes are always welcome.
## What *Not* to Contribute
Our lessons already contain more material than we can cover in a typical workshop,
so we are usually *not* looking for more concepts or tools to add to them.
As a rule,
if you want to introduce a new idea,
you must (a) estimate how long it will take to teach
and (b) explain what you would take out to make room for it.
The first encourages contributors to be honest about requirements;
the second, to think hard about priorities.
We are also not looking for exercises or other material that only run on one platform.
Our workshops typically contain a mixture of Windows, Mac OS X, and Linux users;
in order to be usable,
our lessons must run equally well on all three.
## Using GitHub
If you choose to contribute via GitHub,
you may want to look at
[How to Contribute to an Open Source Project on GitHub][how-contribute].
In brief:
1. The published copy of the lesson is in the `gh-pages` branch of the repository
(so that GitHub will regenerate it automatically).
Please create all branches from that,
and merge the [master repository][repo]'s `gh-pages` branch into your `gh-pages` branch
before starting work.
Please do *not* work directly in your `gh-pages` branch,
since that will make it difficult for you to work on other contributions.
2. We use [GitHub flow][github-flow] to manage changes:
    1. Create a new branch in your desktop copy of this repository for each significant change.
    2. Commit the change in that branch.
    3. Push that branch to your fork of this repository on GitHub.
    4. Submit a pull request from that branch to the [master repository][repo].
    5. If you receive feedback,
       make changes on your desktop and push to your branch on GitHub:
       the pull request will update automatically.
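
The local half of the steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: a bare repository in a temporary directory stands in for your fork on GitHub, and the branch name `fix-typo`, the file `index.md`, and the identity used are all made up for the example. Steps 4 and 5 (opening and updating the pull request) happen on the GitHub website and are not shown.

```shell
# Throwaway workspace; a local bare repository plays the role of your GitHub fork (assumption).
work=$(mktemp -d)
git init -q --bare "$work/fork.git"
git init -q "$work/desktop"
cd "$work/desktop"
git remote add origin "$work/fork.git"

# Identity for this scratch repository only (hypothetical values).
git config user.name "Vlad Dracula"
git config user.email "vlad@tran.sylvan.ia"

# Seed a gh-pages branch so there is something to branch from.
git checkout -q -b gh-pages
echo "lesson text with a tpyo" > index.md
git add index.md
git commit -q -m "Add lesson text"
git push -q origin gh-pages

# 1. Create a new branch from gh-pages for the change.
git checkout -q -b fix-typo gh-pages

# 2. Commit the change in that branch.
echo "lesson text with a typo fixed" > index.md
git commit -q -a -m "Fix typo in index.md"

# 3. Push that branch to the fork.
git push -q origin fix-typo

# List the branches that now exist on the stand-in fork.
git ls-remote --heads origin
```

Because the work happens on `fix-typo` rather than on `gh-pages`, the local `gh-pages` branch stays clean and can be used as the starting point for other contributions.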
Each lesson has two maintainers who review issues and pull requests
or encourage others to do so.
The maintainers are community volunteers,
and have final say over what gets merged into the lesson.
## Other Resources
General discussion of [Software Carpentry][swc-site] and [Data Carpentry][dc-site]
happens on the [discussion mailing list][discuss-list],
which everyone is welcome to join.
You can also [reach us by email][contact].
[contact]: mailto:admin@software-carpentry.org
[dc-issues]: https://github.com/issues?q=user%3Adatacarpentry
[dc-lessons]: http://datacarpentry.org/lessons/
[dc-site]: http://datacarpentry.org/
[discuss-list]: http://lists.software-carpentry.org/listinfo/discuss
[github]: http://github.com
[github-flow]: https://guides.github.com/introduction/flow/
[github-join]: https://github.com/join
[how-contribute]: https://egghead.io/series/how-to-contribute-to-an-open-source-project-on-github
[new-issue]: https://github.com/swcarpentry/git-novice/issues/new
[issues]: https://github.com/swcarpentry/git-novice/issues/
[repo]: https://github.com/swcarpentry/git-novice/
[swc-issues]: https://github.com/issues?q=user%3Aswcarpentry
[swc-lessons]: http://software-carpentry.org/lessons/
[swc-site]: http://software-carpentry.org/
---
layout: page
title: "Licenses"
permalink: /license/
---
## Instructional Material
All Software Carpentry and Data Carpentry instructional material is
made available under the [Creative Commons Attribution
license][cc-by-human]. The following is a human-readable summary of
(and not a substitute for) the [full legal text of the CC BY 4.0
license][cc-by-legal].
You are free:
* to **Share**---copy and redistribute the material in any medium or format
* to **Adapt**---remix, transform, and build upon the material
for any purpose, even commercially.
The licensor cannot revoke these freedoms as long as you follow the
license terms.
Under the following terms:
* **Attribution**---You must give appropriate credit (mentioning that
your work is derived from work that is Copyright © Software
Carpentry and, where practical, linking to
http://software-carpentry.org/), provide a [link to the
license][cc-by-human], and indicate if changes were made. You may do
so in any reasonable manner, but not in any way that suggests the
licensor endorses you or your use.
* **No additional restrictions**---You may not apply legal terms or
technological measures that legally restrict others from doing
anything the license permits.
Notices:
* You do not have to comply with the license for elements of the
material in the public domain or where your use is permitted by an
applicable exception or limitation.
* No warranties are given. The license may not give you all of the
permissions necessary for your intended use. For example, other
rights such as publicity, privacy, or moral rights may limit how you
use the material.
## Software
Except where otherwise noted, the example programs and other software
provided by Software Carpentry and Data Carpentry are made available under the
[OSI][osi]-approved
[MIT license][mit-license].
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
## Trademark
"Software Carpentry" and "Data Carpentry" and their respective logos
are registered trademarks of [NumFOCUS][numfocus].
[cc-by-human]: https://creativecommons.org/licenses/by/4.0/
[cc-by-legal]: https://creativecommons.org/licenses/by/4.0/legalcode
[mit-license]: http://opensource.org/licenses/mit-license.html
[numfocus]: http://numfocus.org/
[osi]: http://opensource.org
## ========================================
## Commands for both workshop and lesson websites.

# Settings
MAKEFILES=Makefile $(wildcard *.mk)
JEKYLL=jekyll
PARSER=bin/markdown_ast.rb
DST=_site

# Controls
.PHONY : commands clean files
.NOTPARALLEL:
all : commands

## commands : show all commands.
commands :
	@grep -h -E '^##' ${MAKEFILES} | sed -e 's/## //g'

## serve : run a local server.
serve : lesson-md
	${JEKYLL} serve

## site : build files but do not run a server.
site : lesson-md
	${JEKYLL} build

# repo-check : check repository settings.
repo-check :
	@bin/repo_check.py -s .

## clean : clean up junk files.
clean :
	@rm -rf ${DST}
	@rm -rf .sass-cache
	@rm -rf bin/__pycache__
	@find . -name .DS_Store -exec rm {} \;
	@find . -name '*~' -exec rm {} \;
	@find . -name '*.pyc' -exec rm {} \;

## clean-rmd : clean intermediate R files (that need to be committed to the repo).
clean-rmd :
	@rm -rf ${RMD_DST}
	@rm -rf fig/rmd-*

## ----------------------------------------
## Commands specific to workshop websites.

.PHONY : workshop-check

## workshop-check : check workshop homepage.
workshop-check :
	@bin/workshop_check.py .

## ----------------------------------------
## Commands specific to lesson websites.

.PHONY : lesson-check lesson-md lesson-files lesson-fixme

# RMarkdown files
RMD_SRC = $(wildcard _episodes_rmd/??-*.Rmd)
RMD_DST = $(patsubst _episodes_rmd/%.Rmd,_episodes/%.md,$(RMD_SRC))

# Lesson source files in the order they appear in the navigation menu.
MARKDOWN_SRC = \
  index.md \
  CONDUCT.md \
  setup.md \
  $(wildcard _episodes/*.md) \
  reference.md \
  $(wildcard _extras/*.md) \
  LICENSE.md

# Generated lesson files in the order they appear in the navigation menu.
HTML_DST = \
  ${DST}/index.html \
  ${DST}/conduct/index.html \
  ${DST}/setup/index.html \
  $(patsubst _episodes/%.md,${DST}/%/index.html,$(wildcard _episodes/*.md)) \
  ${DST}/reference/index.html \
  $(patsubst _extras/%.md,${DST}/%/index.html,$(wildcard _extras/*.md)) \
  ${DST}/license/index.html

## lesson-md : convert Rmarkdown files to markdown
lesson-md : ${RMD_DST}

# Use of .NOTPARALLEL makes rule execute only once
${RMD_DST} : ${RMD_SRC}
	@bin/knit_lessons.sh ${RMD_SRC}

## lesson-check : validate lesson Markdown.
lesson-check :
	@bin/lesson_check.py -s . -p ${PARSER} -r _includes/links.md

## lesson-check-all : validate lesson Markdown, checking line lengths and trailing whitespace.
lesson-check-all :
	@bin/lesson_check.py -s . -p ${PARSER} -l -w

## lesson-figures : re-generate inclusion displaying all figures.
lesson-figures :
	@bin/extract_figures.py -p ${PARSER} ${MARKDOWN_SRC} > _includes/all_figures.html

## unittest : run unit tests on checking tools.
unittest :
	python bin/test_lesson_check.py

## lesson-files : show expected names of generated files for debugging.
lesson-files :
	@echo 'RMD_SRC:' ${RMD_SRC}
	@echo 'RMD_DST:' ${RMD_DST}
	@echo 'MARKDOWN_SRC:' ${MARKDOWN_SRC}
	@echo 'HTML_DST:' ${HTML_DST}

## lesson-fixme : show FIXME markers embedded in source files.
lesson-fixme :
	@fgrep -i -n FIXME ${MARKDOWN_SRC} || true

#-------------------------------------------------------------------------------
# Include extra commands if available.
#-------------------------------------------------------------------------------

-include commands.mk
git-novice
==========
An introduction to version control for novices using Git.
Please see <https://swcarpentry.github.io/git-novice/> for a rendered version of this material,
[the lesson template documentation][lesson-example]
for instructions on formatting, building, and submitting material,
or run `make` in this directory for a list of helpful commands.
Maintainers:
* [Ivan Gonzalez][gonzalez_ivan]
* [Daisie Huang][huang_daisie]
[gonzalez_ivan]: http://software-carpentry.org/team/#gonzalez_ivan
[huang_daisie]: http://software-carpentry.org/team/#huang_daisie
[lesson-example]: https://swcarpentry.github.io/lesson-example
#------------------------------------------------------------
# Values for this lesson.
#------------------------------------------------------------
# Which carpentry is this ("swc" or "dc")?
carpentry: "swc"
# Overall title for pages.
title: "Version Control with Git"
# Contact email address.
email: lessons@software-carpentry.org
#------------------------------------------------------------
# Generic settings (should not need to change).
#------------------------------------------------------------
# What kind of thing is this ("workshop" or "lesson")?
kind: "lesson"
# Magic to make URLs resolve both locally and on GitHub.
# See https://help.github.com/articles/repository-metadata-on-github-pages/.
repository: <USERNAME>/<PROJECT>
# Sites.
amy_site: "https://amy.software-carpentry.org/workshops"
dc_site: "http://datacarpentry.org"
swc_github: "https://github.com/swcarpentry"
swc_site: "https://software-carpentry.org"
swc_pages: "https://swcarpentry.github.io"
template_repo: "https://github.com/swcarpentry/styles"
example_repo: "https://github.com/swcarpentry/lesson-example"
example_site: "https://swcarpentry.github.io/lesson-example"
workshop_repo: "https://github.com/swcarpentry/workshop-template"
workshop_site: "https://swcarpentry.github.io/workshop-template"
training_site: "https://swcarpentry.github.io/instructor-training"
# Surveys.
pre_survey: "https://www.surveymonkey.com/r/swc_pre_workshop_v1?workshop_id="
post_survey: "https://www.surveymonkey.com/r/swc_post_workshop_v1?workshop_id="
# Start time in minutes (0 to be clock-independent, 540 to show a start at 09:00 am)
start_time: 0
# Specify that things in the episodes collection should be output.
collections:
  episodes:
    output: true
    permalink: /:path/
  extras:
    output: true
# Set the default layout for things in the episodes collection.
defaults:
  - values:
      root: ..
  - scope:
      path: ""
      type: episodes
    values:
      layout: episode
# Files and directories that are not to be copied.
exclude:
- Makefile
- bin
# Turn off built-in syntax highlighting.
highlighter: false
---
title: Automated Version Control
teaching: 5
exercises: 0
questions:
- "What is version control and why should I use it?"
objectives:
- "Understand the benefits of an automated version control system."
- "Understand the basics of how Git works."
keypoints:
- "Version control is like an unlimited 'undo'."
- "Version control also allows many people to work in parallel."
---
We'll start by exploring how version control can be used
to keep track of what one person did and when.
Even if you aren't collaborating with other people,
automated version control is much better than this situation:
[![Piled Higher and Deeper by Jorge Cham, http://www.phdcomics.com/comics/archive_print.php?comicid=1531](../fig/phd101212s.png)](http://www.phdcomics.com)
"Piled Higher and Deeper" by Jorge Cham, http://www.phdcomics.com
We've all been in this situation before: it seems ridiculous to have
multiple nearly-identical versions of the same document. Some word
processors let us deal with this a little better, such as Microsoft
Word's [Track Changes](https://support.office.com/en-us/article/Track-changes-in-Word-197ba630-0f5f-4a8e-9a77-3712475e806a), Google Docs' [version
history](https://support.google.com/docs/answer/190843?hl=en), or LibreOffice's [Recording and Displaying Changes](https://help.libreoffice.org/Common/Recording_and_Displaying_Changes).
Version control systems start with a base version of the document and
then save just the changes you made at each step of the way. You can
think of it as a tape: if you rewind the tape and start at the base
document, then you can play back each change and end up with your
latest version.
![Changes Are Saved Sequentially](../fig/play-changes.svg)
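
As a rough illustration of the tape analogy, the sketch below uses the standard `diff` and `patch` tools (not Git itself) to save a single change and then replay it onto the base document. The file names and their contents are made up for this example.

```shell
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# The base version of the document.
printf 'Cold and dry\n' > base.txt

# A later version with one more line.
printf 'Cold and dry\nTwo moons\n' > latest.txt

# Save only the change between the two versions.
# diff exits with status 1 when the files differ, so that status is ignored here.
diff -u base.txt latest.txt > change.patch || true

# "Play back" the saved change onto a copy of the base document.
cp base.txt replay.txt
patch replay.txt < change.patch > /dev/null

# cmp is silent when the files are identical, so the message confirms the replay worked.
cmp replay.txt latest.txt && echo "replayed copy matches the latest version"
```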
Once you think of changes as separate from the document itself, you
can then think about "playing back" different sets of changes onto the
base document and getting different versions of the document. For
example, two users can make independent sets of changes based on the
same document.
![Different Versions Can be Saved](../fig/versions.svg)
Unless there are conflicts, you can even play two sets of changes onto the same base document.
![Multiple Versions Can be Merged](../fig/merge.svg)
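
To make the idea of merging concrete, here is a minimal sketch that plays two independent sets of changes onto the same base document with `git merge-file`. The file names and contents are invented for the illustration; the two edits touch different lines, so there is no conflict.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# The shared base document.
printf 'Mercury\nVenus\nEarth\nMars\nJupiter\n' > base.txt

# One user edits the first line; another edits the last line.
printf 'Mercury: too hot\nVenus\nEarth\nMars\nJupiter\n' > wolfman.txt
printf 'Mercury\nVenus\nEarth\nMars\nJupiter: too big\n' > dracula.txt

# Combine both sets of changes relative to the base.
# -p prints the merged result instead of overwriting wolfman.txt.
git merge-file -p wolfman.txt base.txt dracula.txt > merged.txt

# The merged document contains both edits.
cat merged.txt
```

If both users had changed the same line, the command would instead report a conflict and mark the disputed region in its output.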
A version control system is a tool that keeps track of these changes for us and
helps us version and merge our files. It allows you to
decide which changes make up the next version, called a
[commit]({{ page.root }}/reference/#commit), and keeps useful metadata about them. The
complete history of commits for a particular project and their metadata make up
a [repository]({{ page.root }}/reference/#repository). Repositories can be kept in sync
across different computers, facilitating collaboration among different people.
> ## The Long History of Version Control Systems
>
> Automated version control systems are nothing new.
> Tools like RCS, CVS, or Subversion have been around since the early 1980s and are used by many large companies.
> However, many of these are now considered legacy systems because of limitations in their capabilities.
> More modern systems, such as Git and [Mercurial](http://swcarpentry.github.io/hg-novice/),
> are *distributed*, meaning that they do not need a centralized server to host the repository.
> These modern systems also include powerful merging tools that make it possible for multiple authors to work within
> the same files concurrently.
{: .callout}
> ## Paper Writing
>
> * Imagine you drafted an excellent paragraph for a paper you are writing, but later ruined it. How would you retrieve
> the *excellent* version of that paragraph? Is it even possible?
>
> * Imagine you have 5 co-authors. How would you manage the changes and comments they make to your paper?
> If you use LibreOffice Writer or Microsoft Word, what happens if you accept changes made using the
> `Track Changes` option? Do you have a history of those changes?
{: .challenge}
---
title: Setting Up Git
teaching: 5
exercises: 0
questions:
- "How do I get set up to use Git?"
objectives:
- "Configure `git` the first time it is used on a computer."
- "Understand the meaning of the `--global` configuration flag."
keypoints:
- "Use `git config` to configure a user name, email address, editor, and other preferences once per machine."
---
When we use Git on a new computer for the first time,
we need to configure a few things. Below are a few examples
of configurations we will set as we get started with Git:
* our name and email address,
* whether to colorize our output,
* what our preferred text editor is,
* and that we want to use these settings globally (i.e. for every project).
On a command line, Git commands are written as `git verb`,
where `verb` is what we actually want to do. So here is how
Dracula sets up his new laptop:
~~~
$ git config --global user.name "Vlad Dracula"
$ git config --global user.email "vlad@tran.sylvan.ia"
$ git config --global color.ui "auto"
~~~
{: .bash}
Please use your own name and email address instead of Dracula's. This user name and email will be associated with your subsequent Git activity,
which means that any changes pushed to
[GitHub](http://github.com/),
[BitBucket](http://bitbucket.org/),
[GitLab](http://gitlab.com/) or
another Git host server
in a later lesson will include this information.
For these lessons, we will be interacting with [GitHub](http://github.com/) and so the email address used should be the same as the one used when setting up your GitHub account. If you are concerned about privacy, please review [GitHub's instructions for keeping your email address private][git-privacy].
If you elect to use a private email address with GitHub, then use that same email address for the `user.email` value, e.g. `username@users.noreply.github.com` replacing `username` with your GitHub one. You can change the email address later on by using the `git config` command again.
Dracula also has to set his favorite text editor, following this table:
| Editor | Configuration command |
|:-------------------|:-------------------------------------------------|
| Atom | `$ git config --global core.editor "atom --wait"`|
| nano | `$ git config --global core.editor "nano -w"` |
| Text Wrangler (Mac) | `$ git config --global core.editor "edit -w"` |
| Sublime Text (Mac) | `$ git config --global core.editor "subl -n -w"` |
| Sublime Text (Win, 32-bit install) | `$ git config --global core.editor "'c:/program files (x86)/sublime text 3/sublime_text.exe' -w"` |
| Sublime Text (Win, 64-bit install) | `$ git config --global core.editor "'c:/program files/sublime text 3/sublime_text.exe' -w"` |
| Notepad++ (Win, 32-bit install) | `$ git config --global core.editor "'c:/program files (x86)/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"`|
| Notepad++ (Win, 64-bit install) | `$ git config --global core.editor "'c:/program files/Notepad++/notepad++.exe' -multiInst -notabbar -nosession -noPlugin"`|
| Kate (Linux) | `$ git config --global core.editor "kate"` |
| Gedit (Linux) | `$ git config --global core.editor "gedit --wait --new-window"` |
| Scratch (Linux) | `$ git config --global core.editor "scratch-text-editor"` |
| emacs | `$ git config --global core.editor "emacs"` |
| vim | `$ git config --global core.editor "vim"` |
It is possible to reconfigure the text editor for Git whenever you want to change it.
> ## Exiting Vim
>
> Note that `vim` is the default editor for many programs. If you haven't used `vim` before and wish to exit a session without saving, type `Esc`, then `:q!`, and then `Enter`.
{: .callout}
The four commands we just ran above only need to be run once: the flag `--global` tells Git
to use the settings for every project, in your user account, on this computer.
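
To see what `--global` means in practice, the following sketch compares a per-user setting with a setting stored in a single repository. It is an illustration only: it points `HOME` at an empty temporary directory so your real settings are not touched, and the repository and the name "Count Dracula" are made up for the example.

```shell
# Sandbox (assumption for this sketch): use an empty throwaway home directory
# and clear variables that could redirect Git to another per-user config file.
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

# A per-user setting, as in the lesson.
git config --global user.name "Vlad Dracula"

# Create a scratch repository and give it its own, repository-level name.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config --local user.name "Count Dracula"

# The per-user value is unchanged ...
git config --global user.name    # prints: Vlad Dracula

# ... but inside this repository the repository-level value takes precedence.
git config user.name             # prints: Count Dracula
```

Any other repository under the same user account would still see "Vlad Dracula", because only the `--global` value applies to every project.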
You can check your settings at any time:
~~~
$ git config --list
~~~
{: .bash}
You can change your configuration as many times as you want: just use the
same commands to choose another editor or update your email address.
> ## Proxy
>
> In some networks you need to use a
> [proxy](https://en.wikipedia.org/wiki/Proxy_server). If this is the case, you
> may also need to tell Git about the proxy:
>
> ~~~
> $ git config --global http.proxy proxy-url
> $ git config --global https.proxy proxy-url
> ~~~
> {: .bash}
>
> To disable the proxy, use
>
> ~~~
> $ git config --global --unset http.proxy
> $ git config --global --unset https.proxy
> ~~~
> {: .bash}
{: .callout}
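
If you are unsure whether a proxy is currently configured, you can ask Git for the value. The sketch below is a hedged illustration: it writes to a scratch file via `--file` so that real settings are left alone, and the proxy URL is a made-up placeholder. With your own settings you would use `--global` in place of `--file "$cfg"`.

```shell
# Scratch configuration file (assumption: stands in for your per-user settings).
cfg=$(mktemp)

# Set a placeholder proxy and read it back.
git config --file "$cfg" http.proxy "http://proxy.example.com:8080"
git config --file "$cfg" --get http.proxy    # prints: http://proxy.example.com:8080

# Remove the setting again.
git config --file "$cfg" --unset http.proxy

# --get exits with a non-zero status when the key is not set.
if git config --file "$cfg" --get http.proxy > /dev/null; then
    echo "proxy is set"
else
    echo "no proxy configured"
fi
```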
> ## Git Help and Manual
>
> Always remember that if you forget the details of a `git` command, you can see a summary of its options by using `-h` and open its manual page by using `--help`:
>
> ~~~
> $ git config -h
> $ git config --help
> ~~~
> {: .bash}
{: .callout}
[git-privacy]: https://help.github.com/articles/keeping-your-email-address-private/
---
title: Creating a Repository
teaching: 10
exercises: 0
questions:
- "Where does Git store information?"
objectives:
- "Create a local Git repository."
keypoints:
- "`git init` initializes a repository."
---
Once Git is configured,
we can start using it.
Let's create a directory for our work and then move into that directory:
~~~
$ mkdir planets
$ cd planets
~~~
{: .bash}
Then we tell Git to make `planets` a [repository]({{ page.root }}/reference/#repository)—a place where
Git can store versions of our files:
~~~
$ git init
~~~
{: .bash}
If we use `ls` to show the directory's contents,
it appears that nothing has changed:
~~~
$ ls
~~~
{: .bash}
But if we add the `-a` flag to show everything,
we can see that Git has created a hidden directory within `planets` called `.git`:
~~~
$ ls -a
~~~
{: .bash}
~~~
. .. .git
~~~
{: .output}
Git stores information about the project in this special sub-directory.
If we ever delete it,
we will lose the project's history.
We can check that everything is set up correctly
by asking Git to tell us the status of our project:
~~~
$ git status
~~~
{: .bash}
~~~
# On branch master
#
# Initial commit
#
nothing to commit (create/copy files and use "git add" to track)
~~~
{: .output}
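If you would rather have a yes-or-no answer than read the status message, `git rev-parse` can report the same facts. The sketch below works in a throwaway directory created with `mktemp` (an assumption made so that it does not touch your real `planets` directory).

```shell
# Work in a throwaway location (note: this changes the current directory)
workdir="$(mktemp -d)"
cd "$workdir"

# Create the directory and initialise it in one step
git init --quiet planets
cd planets

# Ask Git whether we are inside a repository and where its data lives
inside="$(git rev-parse --is-inside-work-tree)"
gitdir="$(git rev-parse --git-dir)"
echo "inside work tree: $inside"
echo "git directory:    $gitdir"
```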
> ## Places to Create Git Repositories
>
> Dracula starts a new project, `moons`, related to his `planets` project.
> Despite Wolfman's concerns, he enters the following sequence of commands to
> create one Git repository inside another:
>
> ~~~
> $ cd # return to home directory
> $ mkdir planets # make a new directory planets
> $ cd planets # go into planets
> $ git init # make the planets directory a Git repository
> $ mkdir moons # make a sub-directory planets/moons
> $ cd moons # go into planets/moons
> $ git init # make the moons sub-directory a Git repository
> ~~~
> {: .bash}
>
> Why is it a bad idea to do this? (Notice here that the `planets` project is now also tracking the entire `moons` repository.)
> How can Dracula undo his last `git init`?
>
> > ## Solution
> >
> > Git repositories can interfere with each other if they are "nested" in the
> > directory of another: the outer repository will try to version-control
> > the inner repository. Therefore, it's best to create each new Git
> > repository in a separate directory. To be sure that there is no conflicting
> > repository in the directory, check the output of `git status`. If it looks
> > like the following, you are good to go to create a new repository as shown
> > above:
> >
> > ~~~
> > $ git status
> > ~~~
> > {: .bash}
> > ~~~
> > fatal: Not a git repository (or any of the parent directories): .git
> > ~~~
> > {: .output}
> >
> > Note that we can track files in directories within a Git repository:
> >
> > ~~~
> > $ touch moon phobos deimos titan # create moon files
> > $ cd .. # return to planets directory
> > $ ls moons # list contents of the moons directory
> > $ git add moons/* # add all contents of planets/moons
> > $ git status # show moons files in staging area
> > $ git commit -m "add moon files" # commit planets/moons to planets Git repository
> > ~~~
> > {: .bash}
> >
> > Similarly, we can ignore (as discussed later) entire directories, such as the `moons` directory:
> >
> > ~~~
> > $ nano .gitignore # open the .gitignore file in the text editor to add the moons directory
> > $ cat .gitignore # if you run cat afterwards, it should look like this:
> > ~~~
> > {: .bash}
> >
> > ~~~
> > moons
> > ~~~
> > {: .output}
> >
> > To recover from this little mistake, Dracula can just remove the `.git`
> > folder in the `moons` subdirectory. To do so he can run the following command from inside the `planets` directory:
> >
> > ~~~
> > $ rm -rf moons/.git
> > ~~~
> > {: .bash}
> >
> > But be careful! Running this command in the wrong directory will remove
> > the entire Git history of a project you might want to keep. Therefore, always check your current directory using the
> > command `pwd`.
> {: .solution}
{: .challenge}
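One way to spot this situation before running `git init` is to ask Git which repository, if any, already encloses the current directory. The sketch below recreates Dracula's layout in a throwaway directory (an assumption, so nothing in your home directory is touched) and stops just before the second `git init`.

```shell
# Recreate the layout in a scratch location (this changes the current directory)
workdir="$(mktemp -d)"
cd "$workdir"
git init --quiet planets
mkdir planets/moons
cd planets/moons

# Before initialising anything here, check for an enclosing repository
top="$(git rev-parse --show-toplevel)"
enclosing="$(basename "$top")"
echo "moons is already inside the repository at: $top"
```

If the command prints a path, the directory is already under version control and a second `git init` would create a nested repository. If it fails with a "not a git repository" error, it is safe to initialise.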
---
title: Ignoring Things
teaching: 5
exercises: 0
questions:
- "How can I tell Git to ignore files I don't want to track?"
objectives:
- "Configure Git to ignore specific files."
- "Explain why ignoring files can be useful."
keypoints:
- "The `.gitignore` file tells Git what files to ignore."
---
What if we have files that we do not want Git to track for us,
like backup files created by our editor
or intermediate files created during data analysis?
Let's create a few dummy files:
~~~
$ mkdir results
$ touch a.dat b.dat c.dat results/a.out results/b.out
~~~
{: .bash}
and see what Git says:
~~~
$ git status
~~~
{: .bash}
~~~
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
a.dat
b.dat
c.dat
results/
nothing added to commit but untracked files present (use "git add" to track)
~~~
{: .output}
Putting these files under version control would be a waste of disk space.
What's worse,
having them all listed could distract us from changes that actually matter,
so let's tell Git to ignore them.
We do this by creating a file in the root directory of our project called `.gitignore`:
~~~
$ nano .gitignore
$ cat .gitignore
~~~
{: .bash}
~~~
*.dat
results/
~~~
{: .output}
These patterns tell Git to ignore any file whose name ends in `.dat`
and everything in the `results` directory.
(If any of these files were already being tracked,
Git would continue to track them.)
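To check which paths a set of patterns actually catches, `git check-ignore` can be handy. The sketch below rebuilds a small version of the example in a throwaway repository; the extra file `notes.txt` is an assumption added here to show a path that is not ignored.

```shell
# Throwaway repository (this changes the current directory)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# A few dummy files plus the two ignore patterns from above
mkdir results
touch a.dat results/a.out notes.txt
printf '*.dat\nresults/\n' > .gitignore

# check-ignore prints only the paths that match an ignore pattern
ignored="$(git check-ignore a.dat results/a.out notes.txt)"
echo "$ignored"
```

Only `a.dat` and `results/a.out` should be printed; `notes.txt` matches neither pattern.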
Once we have created this file,
the output of `git status` is much cleaner:
~~~
$ git status
~~~
{: .bash}
~~~
On branch master
Untracked files:
(use "git add <file>..." to include in what will be committed)
.gitignore
nothing added to commit but untracked files present (use "git add" to track)
~~~
{: .output}
The only thing Git notices now is the newly-created `.gitignore` file.
You might think we wouldn't want to track it,
but everyone we're sharing our repository with will probably want to ignore
the same things that we're ignoring.
Let's add and commit `.gitignore`:
~~~
$ git add .gitignore
$ git commit -m "Add the ignore file"
$ git status
~~~
{: .bash}
~~~
# On branch master
nothing to commit, working directory clean
~~~
{: .output}
As a bonus, using `.gitignore` helps us avoid accidentally adding to the repository files that we don't want to track:
~~~
$ git add a.dat
~~~
{: .bash}
~~~
The following paths are ignored by one of your .gitignore files:
a.dat
Use -f if you really want to add them.
~~~
{: .output}
If we really want to override our ignore settings,
we can use `git add -f` to force Git to add something. For example,
`git add -f a.dat`.
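The difference between the two forms of `git add` can be sketched in a throwaway repository (the temporary directory is an assumption so that the `planets` repository is left alone):

```shell
# Throwaway repository (this changes the current directory)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
printf '*.dat\n' > .gitignore
touch a.dat

# A plain add is refused because a.dat matches an ignore pattern
if git add a.dat 2>/dev/null; then
    plain="added"
else
    plain="refused"
fi

# Forcing the add stages the file despite the pattern
git add -f a.dat
staged="$(git ls-files)"
echo "plain add: $plain"
echo "staged after -f: $staged"
```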
We can also always see the status of ignored files if we want:
~~~
$ git status --ignored
~~~
{: .bash}
~~~
On branch master
Ignored files:
(use "git add -f <file>..." to include in what will be committed)
a.dat
b.dat
c.dat
results/
nothing to commit, working directory clean
~~~
{: .output}
> ## Ignoring Nested Files
>
> Given a directory structure that looks like:
>
> ~~~
> results/data
> results/plots
> ~~~
> {: .bash}
>
> How would you ignore only `results/plots` and not `results/data`?
>
> > ## Solution
> >
> > As with most programming issues, there are a few ways that you
> > could solve this. If you only want to ignore the contents of
> > `results/plots`, you can change your `.gitignore` to ignore
> > only the `/plots/` subfolder by adding the following line to
> > your .gitignore:
> >
> > `results/plots/`
> >
> > If, instead, you want to ignore everything in `/results/`, but wanted to track
> > `results/data`, then you can add `results/` to your .gitignore
> > and create an exception for the `results/data/` folder.
> > The next challenge will cover this type of solution.
> >
> > Sometimes the `**` pattern comes in handy, too, which matches
> > multiple directory levels. E.g. `**/results/plots/*` would make git ignore
> > the `results/plots` directory in any root directory.
> {: .solution}
{: .challenge}
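The first approach from the solution can be tried out in a scratch repository. The file names `run1.csv` and `fig1.png` below are invented for illustration.

```shell
# Throwaway repository (this changes the current directory)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Build the directory structure with one placeholder file in each folder
mkdir -p results/data results/plots
touch results/data/run1.csv results/plots/fig1.png

# Ignore only the plots subfolder
printf 'results/plots/\n' > .gitignore

# Only the file under results/plots should be reported as ignored
ignored="$(git check-ignore results/data/run1.csv results/plots/fig1.png)"
echo "$ignored"
```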
> ## Including Specific Files
>
> How would you ignore all `.data` files in your root directory except for
> `final.data`?
> Hint: Find out what `!` (the exclamation point operator) does.
>
> > ## Solution
> >
> > You would add the following two lines to your .gitignore:
> >
> > ~~~
> > # ignore all data files
> > *.data
> > # except final.data
> > !final.data
> > ~~~
> > {: .output}
> >
> > The exclamation point operator will include a previously excluded entry.
> {: .solution}
{: .challenge}
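To see the exception at work, the sketch below lists the files Git considers ignored in a throwaway repository. The file name `raw.data` is made up for the example, and `git ls-files --others --ignored --exclude-standard` is used here as one way of listing ignored, untracked files.

```shell
# Throwaway repository (this changes the current directory)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
touch raw.data final.data

# Ignore every .data file, then re-include final.data
printf '*.data\n!final.data\n' > .gitignore

# List untracked files that are ignored under the standard ignore rules
ignored="$(git ls-files --others --ignored --exclude-standard)"
echo "ignored: $ignored"
```

Only `raw.data` should appear in the list; `final.data` remains visible to Git as an ordinary untracked file.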
> ## Ignoring all data Files in a Directory
>
> Given a directory structure that looks like:
>
> ~~~
> results/data/position/gps/a.data
> results/data/position/gps/b.data
> results/data/position/gps/c.data
> results/data/position/gps/info.txt
> results/plots
> ~~~
> {: .bash}
>
> What's the shortest `.gitignore` rule you could write to ignore all `.data`
> files in `results/data/position/gps`? Do not ignore the `info.txt`.
>
> > ## Solution
> >
> > Appending `results/data/position/gps/*.data` will match every file in `results/data/position/gps` that ends with `.data`.
> > The file `results/data/position/gps/info.txt` will not be ignored.
> {: .solution}
{: .challenge}
> ## The Order of Rules
>
> Given a `.gitignore` file with the following contents:
>
> ~~~
> *.data
> !*.data
> ~~~
> {: .bash}
>
> What will be the result?
>
> > ## Solution
> >
> > The `!` modifier will negate an entry from a previously defined ignore pattern.
> > Because the `!*.data` entry negates all of the previous `.data` files in the `.gitignore`,
> > none of them will be ignored, and all `.data` files will be tracked.
> >
> {: .solution}
{: .challenge}
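Because the last matching pattern decides the outcome, swapping the two lines gives the opposite result. The sketch below compares both orders in a throwaway repository with two placeholder files.

```shell
# Throwaway repository (this changes the current directory)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
touch a.data b.data

# Order from the exercise: ignore first, then negate
printf '*.data\n!*.data\n' > .gitignore
first="$(git ls-files --others --ignored --exclude-standard)"

# Reversed order: negate first, then ignore
printf '!*.data\n*.data\n' > .gitignore
second="$(git ls-files --others --ignored --exclude-standard)"

echo "ignore then negate ignores: '${first}'"
echo "negate then ignore ignores: '${second}'"
```

With the original order nothing is ignored; with the reversed order both `.data` files are.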
> ## Log Files
>
> You wrote a script that creates many intermediate log-files of the form `log_01`, `log_02`, `log_03`, etc.
> You want to keep them but you do not want to track them through `git`.
>
> 1. Write **one** `.gitignore` entry that excludes files of the form `log_01`, `log_02`, etc.
>
> 2. Test your "ignore pattern" by creating some dummy files of the form `log_01`, etc.
>
> 3. You find that the file `log_01` is very important after all. Add it to the tracked files without changing the `.gitignore` again.
>
> 4. Discuss with your neighbor what other types of files could reside in your directory that you do not want to track and thus would exclude via `.gitignore`.
>
> > ## Solution
> >
> > 1. append either `log_*` or `log*` as a new entry in your .gitignore
> > 3. track `log_01` using `git add -f log_01`
> {: .solution}
{: .challenge}
---
title: Collaborating
teaching: 25
exercises: 0
questions:
- "How can I use version control to collaborate with other people?"
objectives:
- "Clone a remote repository."
- "Collaborate by pushing to a common repository."
keypoints:
- "`git clone` copies a remote repository to create a local repository with a remote called `origin` automatically set up."
---
For the next step, get into pairs. One person will be the "Owner" and the other
will be the "Collaborator". The goal is that the Collaborator adds changes to
the Owner's repository. We will switch roles at the end, so both people will
play Owner and Collaborator.
> ## Practicing By Yourself
>
> If you're working through this lesson on your own, you can carry on by opening
> a second terminal window.
> This window will represent your partner, working on another computer. You
> won't need to give anyone access on GitHub, because both 'partners' are you.
{: .callout}
The Owner needs to give the Collaborator access.
On GitHub, click the settings button on the right,
then select Collaborators, and enter your partner's username.
![Adding Collaborators on GitHub](../fig/github-add-collaborators.png)
To accept access to the Owner's repo, the Collaborator
needs to go to [https://github.com/notifications](https://github.com/notifications).
Once there she can accept access to the Owner's repo.
Next, the Collaborator needs to download a copy of the Owner's repository to her
machine. This is called "cloning a repo". To clone the Owner's repo into
her `Desktop` folder, the Collaborator enters:
~~~
$ git clone https://github.com/vlad/planets.git ~/Desktop/vlad-planets
~~~
{: .bash}
Replace 'vlad' with the Owner's username.
![After Creating Clone of Repository](../fig/github-collaboration.svg)
The Collaborator can now make a change in her clone of the Owner's repository,
exactly the same way as we've been doing before:
~~~
$ cd ~/Desktop/vlad-planets
$ nano pluto.txt
$ cat pluto.txt
~~~
{: .bash}
~~~
It is so a planet!
~~~
{: .output}
~~~
$ git add pluto.txt
$ git commit -m "Add notes about Pluto"
~~~
{: .bash}
~~~
1 file changed, 1 insertion(+)
create mode 100644 pluto.txt
~~~
{: .output}
Then push the change to the *Owner's repository* on GitHub:
~~~
$ git push origin master
~~~
{: .bash}
~~~
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 306 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To https://github.com/vlad/planets.git
9272da5..29aba7c master -> master
~~~
{: .output}
Note that we didn't have to create a remote called `origin`: Git uses this
name by default when we clone a repository. (This is why `origin` was a
sensible choice earlier when we were setting up remotes by hand.)
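You can confirm this without a network connection. In the sketch below a local bare repository named `owner.git` stands in for the repository on GitHub; that name and the temporary directory are assumptions made for the example.

```shell
# Scratch area (this changes the current directory)
work="$(mktemp -d)"
cd "$work"

# A bare repository plays the role of the copy hosted on GitHub
git init --quiet --bare owner.git

# Clone it; the warning about cloning an empty repository is discarded
git clone --quiet owner.git collaborator 2>/dev/null

# List the remotes that the clone knows about
remotes="$(git -C collaborator remote)"
echo "remotes in the clone: $remotes"
```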
Take a look at the Owner's repository on its GitHub website now (you may need
to refresh your browser). You should be able to see the new commit made by the
Collaborator.
To download the Collaborator's changes from GitHub, the Owner now enters:
~~~
$ git pull origin master
~~~
{: .bash}
~~~
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 3 (delta 0)
Unpacking objects: 100% (3/3), done.
From https://github.com/vlad/planets
* branch master -> FETCH_HEAD
Updating 9272da5..29aba7c
Fast-forward
pluto.txt | 1 +
1 file changed, 1 insertion(+)
create mode 100644 pluto.txt
~~~
{: .output}
Now the three repositories (Owner's local, Collaborator's local, and Owner's on
GitHub) are back in sync.
> ## A Basic Collaborative Workflow
>
> In practice, it is good to be sure that you have an updated version of the
> repository you are collaborating on, so you should `git pull` before making
> your changes. The basic collaborative workflow would be:
>
> * update your local repo with `git pull origin master`,
> * make your changes and stage them with `git add`,
> * commit your changes with `git commit -m`, and
> * upload the changes to GitHub with `git push origin master`
>
> It is better to make many commits with smaller changes rather than
> one commit with massive changes: small commits are easier to
> read and review.
{: .callout}
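The whole cycle can be rehearsed on one machine. This is only a sketch under some assumptions: a local bare repository `hub.git` stands in for GitHub, the names and email address are placeholders, and the branch name is read from the clone instead of being hard-coded as `master`, because the default branch name depends on the Git version and configuration.

```shell
# Scratch area (this changes the current directory)
work="$(mktemp -d)"
cd "$work"
git init --quiet --bare hub.git                   # stand-in for GitHub
git clone --quiet hub.git owner 2>/dev/null
git clone --quiet hub.git collaborator 2>/dev/null

# Collaborator: make a change, stage it, commit it, push it
cd collaborator
git config user.name "Collaborator"               # placeholder identity
git config user.email "collaborator@example.com"
branch="$(git symbolic-ref --short HEAD)"
echo "It is so a planet!" > pluto.txt
git add pluto.txt
git commit --quiet -m "Add notes about Pluto"
git push --quiet origin "$branch"

# Owner: pull the Collaborator's commit from the shared repository
cd ../owner
git pull --quiet origin "$branch"
content="$(cat pluto.txt)"
echo "Owner now has pluto.txt: $content"
```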
> ## Switch Roles and Repeat
>
> Switch roles and repeat the whole process.
{: .challenge}
> ## Review Changes
>
> The Owner pushes commits to the repository without giving any information
> to the Collaborator. How can the Collaborator find out what has changed on the
> command line? And on GitHub?
>
> > ## Solution
> > On the command line, the Collaborator can use `git fetch origin master`
> > to get the remote changes into the local repository, but without merging
> > them. Then by running `git diff master origin/master` the Collaborator
> > will see the changes output in the terminal.
> >
> > On GitHub, the Collaborator can go to their own fork of the repository and
> > look right above the light blue latest commit bar for a gray bar saying
> > "This branch is 1 commit behind Our-Repository:master." On the far right of
> > that gray bar is a Compare icon and link. On the Compare page the
> > Collaborator should change the base fork to their own repository, then click
> > the link in the paragraph above to "compare across forks", and finally
> > change the head fork to the main repository. This will show all the commits
> > that are different.
> {: .solution}
{: .challenge}
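The command-line half of that solution can be sketched offline. As before, a local bare repository `hub.git` plays the part of GitHub, the identity, file name, and commit messages are placeholders, and the branch name is looked up rather than assumed to be `master`.

```shell
# Scratch area (this changes the current directory)
work="$(mktemp -d)"
cd "$work"
git init --quiet --bare hub.git
git clone --quiet hub.git owner 2>/dev/null

# Owner publishes a first commit
cd owner
git config user.name "Owner"                      # placeholder identity
git config user.email "owner@example.com"
branch="$(git symbolic-ref --short HEAD)"
echo "Cold and dry" > mars.txt
git add mars.txt
git commit --quiet -m "Start notes on Mars"
git push --quiet origin "$branch"

# Collaborator clones at this point
cd ..
git clone --quiet hub.git collaborator

# Owner pushes another commit without telling anyone
cd owner
echo "Two moons" >> mars.txt
git commit --quiet -am "Add moons"
git push --quiet origin "$branch"

# Collaborator fetches without merging, then inspects the difference
cd ../collaborator
git fetch --quiet origin
behind="$(git rev-list --count "$branch..origin/$branch")"
echo "local branch is $behind commit(s) behind origin"
git diff "$branch" "origin/$branch"
```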
> ## Comment Changes in GitHub
>
> The Collaborator has some questions about one line change made by the Owner and
> has some suggestions to propose.
>
> With GitHub, it is possible to comment on the diff of a commit. Over the line of
> code to comment, a blue comment icon appears to open a comment window.
>
> The Collaborator posts their comments and suggestions using the GitHub interface.
{: .challenge}
> ## Version History, Backup, and Version Control
>
> Some backup software can keep a history of the versions of your files. It also
> allows you to recover specific versions. How is this functionality different from version control?
> What are some of the benefits of using version control, Git and GitHub?
{: .challenge}
---
title: Open Science
teaching: 5
exercises: 5
questions:
- "How can version control help me make my work more open?"
objectives:
- "Explain how a version control system can be leveraged as an electronic lab notebook for computational work."
keypoints:
- "Open scientific work is more useful and more highly cited than closed."
---
> The opposite of "open" isn't "closed".
> The opposite of "open" is "broken".
>
> --- John Wilbanks
{: .quotation}
Free sharing of information might be the ideal in science,
but the reality is often more complicated.
Normal practice today looks something like this:
* A scientist collects some data and stores it on a machine
that is occasionally backed up by her department.
* She then writes or modifies a few small programs
(which also reside on her machine)
to analyze that data.
* Once she has some results,
she writes them up and submits her paper.
She might include her data—a growing number of journals require this—but
she probably doesn't include her code.
* Time passes.
* The journal sends her reviews written anonymously by a handful of other people in her field.
She revises her paper to satisfy them,
during which time she might also modify the scripts she wrote earlier,
and resubmits.
* More time passes.
* The paper is eventually published.
It might include a link to an online copy of her data,
but the paper itself will be behind a paywall:
only people who have personal or institutional access
will be able to read it.
For a growing number of scientists,
though,
the process looks like this:
* The data that the scientist collects is stored in an open access repository
like [figshare](http://figshare.com/) or
[Zenodo](http://zenodo.org), possibly as soon as it's collected,
and given its own
[Digital Object Identifier](https://en.wikipedia.org/wiki/Digital_object_identifier) (DOI).
Or the data was already published and is stored in
[Dryad](http://datadryad.org/).
* The scientist creates a new repository on GitHub to hold her work.
* As she does her analysis,
she pushes changes to her scripts
(and possibly some output files)
to that repository.
She also uses the repository for her paper;
that repository is then the hub for collaboration with her colleagues.
* When she's happy with the state of her paper,
she posts a version to [arXiv](http://arxiv.org/)
or some other preprint server
to invite feedback from peers.
* Based on that feedback,
she may post several revisions
before finally submitting her paper to a journal.
* The published paper includes links to her preprint
and to her code and data repositories,
which makes it much easier for other scientists
to use her work as a starting point for their own research.
This open model accelerates discovery:
the more open work is,
[the more widely it is cited and re-used](http://dx.doi.org/10.1371/journal.pone.0000308).
However,
people who want to work this way need to make some decisions
about what exactly "open" means and how to do it. You can find more on the different aspects of Open Science in [this book](http://link.springer.com/book/10.1007/978-3-319-00026-8).
This is one of the (many) reasons we teach version control.
When used diligently,
it answers the "how" question
by acting as a shareable electronic lab notebook for computational work:
* The conceptual stages of your work are documented, including who did
what and when. Every step is stamped with an identifier (the commit ID)
that is for most intents and purposes unique.
* You can tie documentation of rationale, ideas, and other
intellectual work directly to the changes that spring from them.
* You can refer to what you used in your research to obtain your
computational results in a way that is unique and recoverable.
* With a distributed version control system such as Git, the version
control repository is easy to archive in perpetuity, and contains
the entire history.
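As a small illustration of the commit ID acting as a recoverable reference, the sketch below records the identifier of the commit that produced a result. The repository, file name, and identity are placeholders invented for the example.

```shell
# Throwaway repository (this changes the current directory)
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Vlad Dracula"               # placeholder identity
git config user.email "vlad@example.com"

# Commit one step of an analysis
echo "analysis step 1" > analysis.txt
git add analysis.txt
git commit --quiet -m "Record first analysis step"

# The full commit ID can be noted next to any result it produced
commit_id="$(git rev-parse HEAD)"
object_type="$(git cat-file -t "$commit_id")"
echo "results produced with commit $commit_id"
```

Quoting this identifier in a notebook or a paper lets someone with the repository check out exactly the version that was used.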
> ## Making Code Citable
>
> [This short guide](https://guides.github.com/activities/citable-code/) from GitHub
> explains how to create a Digital Object Identifier (DOI) for your code,
> your papers,
> or anything else hosted in a version control repository.
{: .callout}
> ## How Reproducible Is My Work?
>
> Ask one of your labmates to reproduce a result you recently obtained
> using only what they can find in your papers or on the web.
> Try to do the same for one of their results,
> then try to do it for a result from a lab you work with.
{: .challenge}
> ## How to Find an Appropriate Data Repository?
>
> Surf the internet for a couple of minutes and check out the data repositories
> mentioned above: [Figshare](http://figshare.com/), [Zenodo](http://zenodo.org),
> [Dryad](http://datadryad.org/). Depending on your field of research, you might
> find community-recognized repositories that are well-known in your field.
> You might also find useful [these data repositories recommended by Nature](
> http://www.nature.com/sdata/data-policies/repositories).
> Discuss with your neighbor which data repository you might want to
> approach for your current project and explain why.
{: .challenge}
> ## Can I Also Publish Code?
>
> There are many new ways to publish code and to make it citable. One
> way is described [on the homepage of GitHub itself](
> https://guides.github.com/activities/citable-code/).
> Basically it's a combination of GitHub (where the code is) and Zenodo (the
> repository creating the DOI). Read through this page while being aware
> that this is only one of many ways to make your code citable.
{: .challenge}
---
title: Licensing
teaching: 5
exercises: 0
questions:
- "What licensing information should I include with my work?"
objectives:
- "Explain why adding licensing information to a repository is important."
- "Choose a proper license."
- "Explain differences in licensing and social expectations."
keypoints:
- "People who incorporate GPL'd software into their own software must make their software also open under the GPL license; most other open licenses do not require this."
- "The Creative Commons family of licenses allow people to mix and match requirements and restrictions on attribution, creation of derivative works, further sharing, and commercialization."
- "People who are not lawyers should not try to write licenses from scratch."
---
When a repository with source code, a manuscript or other creative
works becomes public, it should include a file `LICENSE` or
`LICENSE.txt` in the base directory of the repository that clearly
states under which license the content is being made available. This
is because creative works are automatically eligible for intellectual
property (and thus copyright) protection. Reusing creative works
without a license is dangerous, because the copyright holders could
sue you for copyright infringement.
A license solves this problem by granting rights to others (the
licensees) that they would otherwise not have. What rights are being
granted under which conditions differs, often only slightly, from one
license to another. In practice, a few licenses are by far the most
popular, and [choosealicense.com](http://choosealicense.com/) will
help you find a common license that suits your needs. Important
considerations include:
* Whether you want to address patent rights.
* Whether you require people distributing derivative works to also
distribute their source code.
* Whether the content you are licensing is source code.
* Whether you want to license the code at all.
Choosing a license that is in common use makes life easier for
contributors and users, because they are more likely to already be
familiar with the license and don't have to wade through a bunch of
jargon to decide if they're ok with it. The [Open Source
Initiative](http://opensource.org/licenses) and [Free Software
Foundation](http://www.gnu.org/licenses/license-list.html) both
maintain lists of licenses which are good choices.
[This article][software-licensing] provides an excellent overview of
licensing and licensing options from the perspective of scientists who
also write code.
At the end of the day what matters is that there is a clear statement
as to what the license is. Also, the license is best chosen from the
get-go, even if for a repository that is not public. Pushing off the
decision only makes it more complicated later, because each time a new
collaborator starts contributing, they, too, hold copyright and will
thus need to be asked for approval once a license is chosen.
> ## Can I Use an Open License?
>
> Find out whether you are allowed to apply an open license to your software.
> Can you do this unilaterally,
> or do you need permission from someone in your institution?
> If so, who?
{: .challenge}
> ## What licenses have I already accepted?
>
> Many of the software tools we use on a daily basis (including in this workshop) are
> released as open-source software. Pick a project on GitHub from the list below, or
> one of your own choosing. Find its license (usually in a file called `LICENSE` or
> `COPYING`) and talk about how it restricts your use of the software. Is it one of
> the licenses discussed in this session? How is it different?
> - [Git](https://github.com/git/git), the source-code management tool
> - [CPython](https://github.com/python/cpython), the standard implementation of the Python language
> - [Jupyter](https://github.com/jupyter), the project behind the web-based Python notebooks we'll be using
> - [EtherPad](https://github.com/ether/etherpad-lite), a real-time collaborative editor
{: .challenge}
[software-licensing]: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1002598
---
title: Citation
teaching: 2
exercises: 0
questions:
- "How can I make my work easier to cite?"
objectives:
- "Make your work easy to cite"
keypoints:
- "Add a CITATION file to a repository to explain how you want your work cited."
---
You may want to include a file called `CITATION` or `CITATION.txt`
that describes how to reference your project;
the [one for Software
Carpentry](https://github.com/swcarpentry/website/blob/gh-pages/CITATION)
states:
~~~
To reference Software Carpentry in publications, please cite both of the following:
Greg Wilson: "Software Carpentry: Getting Scientists to Write Better
Code by Making Them More Productive". Computing in Science &
Engineering, Nov-Dec 2006.
Greg Wilson: "Software Carpentry: Lessons Learned". arXiv:1307.5448,
July 2013.
@article{wilson-software-carpentry-2006,
author = {Greg Wilson},
title = {Software Carpentry: Getting Scientists to Write Better Code by Making Them More Productive},
journal = {Computing in Science \& Engineering},
month = {November--December},
year = {2006},
}
@online{wilson-software-carpentry-2013,
author = {Greg Wilson},
title = {Software Carpentry: Lessons Learned},
version = {1},
date = {2013-07-20},
eprinttype = {arxiv},
eprint = {1307.5448}
}
~~~
{: .source}
---
title: Hosting
teaching: 10
exercises: 0
questions:
- "Where should I host my version control repositories?"
objectives:
- "Explain different options for hosting scientific work."
keypoints:
- "Projects can be hosted on university servers, on personal domains, or on public forges."
- "Rules regarding intellectual property and storage of sensitive information apply no matter where code and data are hosted."
---
The second big question for groups that want to open up their work is where to
host their code and data. One option is for the lab, the department, or the
university to provide a server, manage accounts and backups, and so on. The
main benefit of this is that it clarifies who owns what, which is particularly
important if any of the material is sensitive (e.g., relates to experiments
involving human subjects or may be used in a patent application). The main
drawbacks are the cost of providing the service and its longevity: a scientist
who has spent ten years collecting data would like to be sure that data will
still be available ten years from now, but that's well beyond the lifespan of
most of the grants that fund academic infrastructure.
Another option is to purchase a domain and pay an Internet service provider
(ISP) to host it. This gives the individual or group more control, and
sidesteps problems that can arise when moving from one institution to another,
but requires more time and effort to set up than either the option above or the
option below.
The third option is to use a public hosting service like
[GitHub](http://github.com), [GitLab](http://gitlab.com),
[BitBucket](http://bitbucket.org), or [SourceForge](http://sourceforge.net).
Each of these services provides a web interface that enables people to create,
view, and edit their code repositories. These services also provide
communication and project management tools including issue tracking, wiki pages,
email notifications, and code reviews. These services benefit from economies of
scale and network effects: it's easier to run one large service well than to run
many smaller services to the same standard. It's also easier for people to
collaborate. Using a popular service can help connect your project with
communities already using the same service.
As an example, Software Carpentry [is on
GitHub]({{ swc_github }}) where you can find the [source for this
page]({{page.root}}/_episodes/13-hosting.md).
Anyone with a GitHub account can suggest changes to this text.
Using large, well-established services can also help you quickly take advantage
of powerful tools. One such tool, continuous integration (CI), can
automatically run software builds and tests whenever code is committed or pull
requests are submitted. Direct integration of CI with an online hosting service
means this information is present in any pull request, and helps maintain code
integrity and quality standards. While CI is still available in self-hosted
situations, there is much less setup and maintenance involved with using an
online service. Furthermore, such tools are often provided free of charge to
open source projects, and are also available for private repositories for a fee.
> ## Institutional Barriers
>
> Sharing is the ideal for science,
> but many institutions place restrictions on sharing,
> for example to protect potentially patentable intellectual property.
> If you encounter such restrictions,
> it can be productive to inquire about the underlying motivations
> either to request an exception for a specific project or domain,
> or to push more broadly for institutional reform to support more open science.
{: .callout}
> ## Can My Work Be Public?
>
> Find out whether you are allowed to host your work openly on a public forge.
> Can you do this unilaterally,
> or do you need permission from someone in your institution?
> If so, who?
{: .challenge}
> ## Where Can I Share My Work?
>
> Does your institution have a repository or repositories that you can
> use to share your papers, data and software? How do institutional repositories
> differ from services like [arXiv](http://arxiv.org/), [figshare](http://figshare.com/) and [GitHub](http://github.com/)?
{: .challenge}
---
title: Using Git from RStudio
teaching: 10
exercises: 0
questions:
- "How can I use Git with RStudio?"
objectives:
- "Understand how to use Git from RStudio."
keypoints:
- "Create an RStudio project"
---
Since version control is so useful when developing scripts, RStudio has built-in
integration with Git. There are some more obscure Git features that you still
need to use the command line for, but RStudio has a nice interface for most
common operations.
RStudio lets you create a [project][rstudio-projects] associated with
a given directory. This is a way to keep track of related files. One
of the ways to keep track of them is via version control! To get
started using RStudio for version control, let's make a new project:
![](../fig/RStudio_screenshot_newproject.png)
This will pop up a window asking us how we want to create the project. We have
some options here. Let's say that we want to use RStudio with the planets
repository that we already made. Since that repository lives in a directory on
our computer, we'll choose "existing directory":
![](../fig/RStudio_screenshot_existingdirectory.png)
> ## Do You See a "Version Control" Option?
>
> Although we're not going to use it here, there should be a "version control"
> option on this menu. That is what you would click on if you wanted to
> create a project on your computer by cloning a repository from GitHub.
> If that option is not present, it probably means that RStudio doesn't know
> where your Git executable is. See
> [this page](https://stat545-ubc.github.io/git03_rstudio-meet-git.html)
> for some debugging advice. Even if you have Git installed, you may need
> to accept the Xcode license if you are using macOS.
{: .callout}
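If the "version control" option is missing, one quick check is whether a Git executable can be found from a terminal at all. The following is a minimal sketch, not RStudio's own lookup logic; the variable name `git_path` is only illustrative:

```shell
# Look for a Git executable on the PATH; the variable stays empty if none is found.
git_path="$(command -v git || true)"

if [ -n "$git_path" ]; then
    # Report where Git lives and which version it is.
    echo "Git found at: $git_path"
    git --version
else
    echo "Git not found on PATH"
fi
```

If a path is printed but RStudio still shows no Git tools, the path can usually be entered by hand in RStudio's Git/SVN settings (assumed here to be under Tools > Global Options; the exact location may differ between versions).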
Next, RStudio will ask which existing directory we want to use. Click "browse"
to navigate to the correct directory on your computer, then click "create
project":
![](../fig/RStudio_screenshot_navigateexisting.png)
Ta-da! Now you have an R project containing your repository. Notice the
vertical "Git" menu that is now on the menu bar. This means RStudio has
recognized that this directory is a Git repository, so it's giving you tools
to use Git:
![](../fig/RStudio_screenshot_afterclone.png)
To edit the files in your repository, you can click on them from the panel in
the lower right. Let's add some more information about Pluto:
![](../fig/RStudio_screenshot_editfiles.png)
Once we have saved our edited files, we can also use RStudio to commit these
changes. Go to the Git menu and click "commit":
![](../fig/RStudio_screenshot_commit.png)
This will bring up a screen where you can select which files to commit (check
the boxes in the "staged" column) and enter a commit message (in the upper
right). The icons in the "status" column indicate the current status of each
file. You can also see the changes to each file by clicking on its name. Once
everything is the way you want it, click "commit":
![](../fig/RStudio_screenshot_review.png)
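For comparison, the review-and-commit screen corresponds roughly to a few commands on the command line. This is a sketch in a throwaway repository; the file name `pluto.txt`, its contents, the commit messages, and the user details are placeholders rather than the lesson's actual data:

```shell
# Build a throwaway repository with one committed file.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.org"
echo "Notes about Pluto" > pluto.txt
git add pluto.txt
git commit --quiet -m "Start notes on Pluto"

# Edit the file, as we did in the RStudio editor.
echo "More notes about Pluto" >> pluto.txt

# The "status" column: M marks a modified file.
git status --short

# Clicking a file name: show its changes line by line.
git --no-pager diff pluto.txt

# Ticking the "staged" box, then clicking "commit".
git add pluto.txt
git commit --quiet -m "Add more notes on Pluto"

# The result is a second commit in the history.
git --no-pager log --oneline
```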
You can push these changes by selecting "push" from the Git menu. There are
also options there to pull from a remote version of the repository, and view
the history:
![](../fig/RStudio_screenshot_history.png)
> ## Are the Push/Pull Commands Grayed Out?
>
> If this is the case, it generally means that RStudio doesn't know the
> location of any other version of your repository (i.e. the one on GitHub).
> To fix this, open a terminal to the repository and enter the command:
> `git push -u origin master`. Then restart RStudio.
{: .callout}
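To see what the command in the callout above changes, the sketch below checks which remotes Git knows about and then pushes with `-u` so that the local branch tracks a remote one. It assumes a local bare repository as a stand-in for the copy on GitHub; the paths and user details are placeholders:

```shell
# Throwaway setup: a bare repository plays the role of the GitHub copy.
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git init --quiet "$work/planets"
cd "$work/planets"
git config user.name "Example User"          # placeholder identity
git config user.email "user@example.org"
git commit --quiet --allow-empty -m "Start repository"
git branch -M master                         # the lesson's branch name

# List the remotes Git knows about; no output means none is configured.
git remote -v

# Register the remote (a local path here; normally the GitHub URL).
git remote add origin "$work/remote.git"

# Push and record origin/master as the upstream of the local branch.
git push --quiet -u origin master

# Print the upstream branch that the current branch now tracks.
git rev-parse --abbrev-ref --symbolic-full-name '@{u}'
```

Once an upstream is recorded like this, RStudio has a remote branch to push to and pull from.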
If we click on "history", we can see a pretty graphical version of what
`git log` would tell us:
![](../fig/RStudio_screenshot_viewhistory.png)
RStudio creates some files that it uses to keep track of your project. You
generally don't want to track these, so adding them to your `.gitignore` file
is a good idea:
![](../fig/RStudio_screenshot_gitignore.png)
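The same can be done from a terminal. The sketch below assumes the files in question are the `.Rproj.user/` directory, `.Rhistory`, and `.RData`, which RStudio and R commonly create; your project may contain others, and `session-state` is only a stand-in file name:

```shell
# Work in a throwaway repository so nothing real is touched.
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Stand-ins for files that RStudio and R typically leave in a project.
mkdir .Rproj.user
touch .Rproj.user/session-state .Rhistory .RData

# Add one ignore pattern per line to .gitignore.
printf '%s\n' '.Rproj.user/' '.Rhistory' '.RData' >> .gitignore

# The ignored files no longer appear as untracked.
git status --porcelain
```

After this, `git status` lists `.gitignore` itself as the only untracked file, and that file is worth committing so collaborators share the same rules.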
There are many more features buried in the RStudio Git interface, but these
should be enough to get you started!
[rstudio-projects]: https://support.rstudio.com/hc/en-us/articles/200526207-Using-Projects
---
layout: page
title: About
permalink: /about/
---
{% include carpentries.html %}
---
layout: page
title: Figures
permalink: /figures/
---
{% include all_figures.html %}
<p><img alt="Adding Collaborators on GitHub" src="../fig/github-add-collaborators.png" /></p>
<hr/>
<p><img alt="After Creating Clone of Repository" src="../fig/github-collaboration.svg" /></p>
<hr/>
<p><img alt="The Git Staging Area" src="../fig/git-staging-area.svg" /></p>
<hr/>
<p><img alt="The Git Commit Workflow" src="../fig/git-committing.svg" /></p>
<hr/>
<p><img alt="Piled Higher and Deeper by Jorge Cham, http://www.phdcomics.com/comics/archive_print.php?comicid=1531" src="../fig/phd101212s.png" /></p>
<hr/>
<p><img alt="Changes Are Saved Sequentially" src="../fig/play-changes.svg" /></p>
<hr/>
<p><img alt="Different Versions Can be Saved" src="../fig/versions.svg" /></p>
<hr/>
<p><img alt="Multiple Versions Can be Merged" src="../fig/merge.svg" /></p>
<hr/>
<p><img alt="Git Checkout" src="../fig/git-checkout.svg" /></p>
<hr/>
<p><img alt="http://figshare.com/articles/How_Git_works_a_cartoon/1328266" src="../fig/git_staging.svg" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_newproject.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_existingdirectory.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_navigateexisting.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_afterclone.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_editfiles.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_commit.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_review.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_history.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_viewhistory.png" /></p>
<hr/>
<p><img alt="" src="../fig/RStudio_screenshot_gitignore.png" /></p>
<hr/>
<p><img alt="Creating a Repository on GitHub (Step 1)" src="../fig/github-create-repo-01.png" /></p>
<hr/>
<p><img alt="Creating a Repository on GitHub (Step 2)" src="../fig/github-create-repo-02.png" /></p>
<hr/>
<p><img alt="Creating a Repository on GitHub (Step 3)" src="../fig/github-create-repo-03.png" /></p>
<hr/>
<p><img alt="Freshly-Made GitHub Repository" src="../fig/git-freshly-made-github-repo.svg" /></p>
<hr/>
<p><img alt="Where to Find Repository URL on GitHub" src="../fig/github-find-repo-string.png" /></p>
<hr/>
<p><img alt="Changing the Repository URL on GitHub" src="../fig/github-change-repo-string.png" /></p>
<hr/>
<p><img alt="GitHub Repository After First Push" src="../fig/github-repo-after-first-push.svg" /></p>
<hr/>
<p><img alt="The Conflicting Changes" src="../fig/conflict.svg" /></p>
{% comment %}
Display key points of all episodes for reference.
{% endcomment %}
<h2>Key Points</h2>
<table class="table table-striped">
{% for episode in site.episodes %}
{% unless episode.break %}
<tr>
<td class="col-md-3">
<a href="{{ page.root }}{{ episode.url }}">{{ episode.title }}</a>
</td>
<td class="col-md-9">
<ul>
{% for keypoint in episode.keypoints %}
<li>{{ keypoint|markdownify }}</li>
{% endfor %}
</ul>
</td>
</tr>
{% endunless %}
{% endfor %}
</table>
{% comment %}
General description of Software and Data Carpentry.
{% endcomment %}
<div class="row">
<div class="col-md-2" align="center">
<a href="{{ site.swc_site }}"><img src="{{ page.root }}/assets/img/swc-icon-blue.svg" alt="Software Carpentry logo" /></a>
</div>
<div class="col-md-8">
Since 1998,
<a href="{{ site.swc_site }}">Software Carpentry</a>
has been teaching researchers in science, engineering, medicine, and related disciplines
the computing skills they need to get more done in less time and with less pain.
Its volunteer instructors have run hundreds of events
for thousands of learners in the past two and a half years.
</div>
</div>
<br/>
<div class="row">
<div class="col-md-2" align="center">
<a href="{{ site.dc_site }}"><img src="{{ page.root }}/assets/img/dc-icon-black.svg" alt="Data Carpentry logo" /></a>
</div>
<div class="col-md-8">
<a href="{{ site.dc_site }}">Data Carpentry</a> develops and teaches workshops on the fundamental data skills needed to conduct research.
Its target audience is researchers who have little to no prior computational experience,
and its lessons are domain specific,
building on learners' existing knowledge to enable them to quickly apply skills learned to their own research.
</div>
</div>
<br/>
<div class="row">
<div class="col-md-2" align="center">
<a href="{{ site.lc_site }}"><img src="{{ page.root }}/assets/img/lc-icon-black.svg" alt="Library Carpentry logo" /></a>
</div>
<div class="col-md-8">
Library Carpentry is made by librarians to help librarians
automate repetitive, boring, error-prone tasks;
create, maintain and analyse sustainable and reusable data;
work effectively with IT and systems colleagues;
better understand the use of software in research;
and much more.
Library Carpentry was the winner of the 2016
<a href="http://labs.bl.uk/British+Library+Labs+Awards">British Library Labs Teaching and Learning Award</a>.
</div>
</div>
<p>
<a href="{{site.dc_site}}">Data Carpentry</a>
aims to help researchers get their work done
in less time and with less pain
by teaching them basic research computing skills.
This hands-on workshop will cover basic concepts and tools,
including program design, version control, data management,
and task automation.
Participants will be encouraged to help one another
and to apply what they have learned to their own research problems.
</p>
<p align="center">
<em>
For more information on what we teach and why,
please see our paper
"<a href="http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745">Best Practices for Scientific Computing</a>".
</em>
</p>
<div class="row">
<div class="col-md-6">
<h3>Day 1</h3>
<table class="table table-striped">
<tr> <td>09:00</td> <td>Automating tasks with the Unix shell</td> </tr>
<tr> <td>10:30</td> <td>Coffee</td> </tr>
<tr> <td>12:00</td> <td>Lunch break</td> </tr>
<tr> <td>13:00</td> <td>Building programs with Python</td> </tr>
<tr> <td>14:30</td> <td>Coffee</td> </tr>
<tr> <td>16:00</td> <td>Wrap-up</td> </tr>
</table>
</div>
<div class="col-md-6">
<h3>Day 2</h3>
<table class="table table-striped">
<tr> <td>09:00</td> <td>Version control with Git</td> </tr>
<tr> <td>10:30</td> <td>Coffee</td> </tr>
<tr> <td>12:00</td> <td>Lunch break</td> </tr>
<tr> <td>13:00</td> <td>Managing data with SQL</td> </tr>
<tr> <td>14:30</td> <td>Coffee</td> </tr>
<tr> <td>16:00</td> <td>Wrap-up</td> </tr>
</table>
</div>
</div>
<p id="who">
<strong>Who:</strong>
The course is aimed at graduate students and other researchers.
<strong>
You don't need to have any previous knowledge of the tools
that will be presented at the workshop.
</strong>
</p>
{% comment %}
Display a break's timings in a box similar to a learning episode's.
{% endcomment %}
<blockquote class="objectives">
<h2>Overview</h2>
<div class="row">
<div class="col-md-3">
<strong>Break:</strong> {{ page.break }} min
</div>
<div class="col-md-9">
</div>
</div>
</blockquote>
{% comment %}
Display key points for an episode.
{% endcomment %}
<blockquote class="keypoints">
<h2>Key Points</h2>
<ul>
{% for keypoint in page.keypoints %}
<li>{{ keypoint|markdownify }}</li>
{% endfor %}
</ul>
</blockquote>
<div class="row">
<div class="col-md-1">
</div>
<div class="col-md-10">
<h1 class="maintitle">{{ page.title }}</h1>
</div>
<div class="col-md-1">
</div>
</div>
{% comment %}
Main title for lesson pages.
{% endcomment %}
<h1 class="maintitle"><a href="{{ page.root }}/">{{ site.title }}</a>{% if page.title %}: {{ page.title }}{% endif %}</h1>