One Way to Think About git

Recently a coworker told me that, when working with git, he was unclear on the benefits of feature and bugfix branches in particular, and on the concept and utility of branches in general. This is an expanded version of the explanation I gave him.

diff

The diff program compares files and produces an output explaining the differences.

So say you have two files:

$ cat a.txt
Hello world.
$ cat b.txt
Hello world.
Happy to be here.

Running diff on those files will show the difference between them.

$ diff a.txt b.txt 
1a2
> Happy to be here.

The first line of that output is a change command. The letter a stands for “append”, the 2 after it is the number of the line in the second file to append, and the 1 before it is the number of the line in the first file after which the new line is appended. In other words, “add line 2 of the second file after line 1 of the first file”. The second line of the output, marked with >, is the line to add.

Say these files change.

$ cat a.txt 
Hello world.
This is a great place.
I love it here.
$ cat b.txt 
Hello world.
I'm so happy to be here.
This is a great place.

Then this diff:

$ diff a.txt b.txt 
1a2
> I'm so happy to be here.
3d3
< I love it here.

contains two change instructions. The first says “append line two of the second file (shown on the line starting with >) after line one of the first file”, and the second says “delete line three of the first file (shown on the line starting with <), which would have appeared after line three of the second file if it hadn’t been deleted”.
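Besides a (append) and d (delete), the normal diff format has a third command, c, for “change”. Here is a minimal sketch of it. The file names old.txt and new.txt are made up for this illustration, and the files are written to a scratch directory so nothing existing is touched.

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

# Two hypothetical files that differ only on line 2.
printf 'Hello world.\nGoodbye.\n' > old.txt
printf 'Hello world.\nSee you soon.\n' > new.txt

# diff exits with status 1 when the files differ, so "|| true" keeps the
# script going if the shell is running with "set -e".
out=$(diff old.txt new.txt || true)
printf '%s\n' "$out"
# Prints:
# 2c2
# < Goodbye.
# ---
# > See you soon.
```

Here 2c2 reads as “change line 2 of the first file into line 2 of the second file”. The < line is the old text, the > line is the new text, and the --- line separates them.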

There’s a variation of this output called the “unified format”, and you can see this version by including the -u option in the diff command.

$ diff -u a.txt b.txt
1. --- a.txt       2016-04-08 14:15:30.000000000 -0700
2. +++ b.txt       2016-04-08 14:22:52.000000000 -0700
3. @@ -1,3 +1,3 @@
4.  Hello world.
5. +I'm so happy to be here.
6.  This is a great place.
7. -I love it here.

Line by line, this output shows (the line numbers don’t appear in the output—those were added to match the lines in this list):

  1. A reference to the “from” file.
  2. A reference to the “to” file.
  3. A “chunk” header (often called a “hunk” header). Each of the ranges -1,3 and +1,3 gives a starting line number followed by a line count: the chunk covers three lines starting at line 1 of the “from” file (the - range) and three lines starting at line 1 of the “to” file (the + range). The lines following a chunk header are change instructions.
  4. A space as the first character of a line in a change chunk indicates a line both files have in common. So this line is present in both files as the first line.
  5. A + as the first character of a line in a change chunk indicates a line present in the “to” file but absent in the “from” file.
  6. Another line both files have in common. In the “from” file, this line would be the second, since it lacks the preceding line. In the “to” file, this line would be the third.
  7. A - as the first character of a line in a change chunk indicates a line present in the “from” file but absent in the “to” file.

Another way to look at this output is as a set of instructions on how to change the “from” file into the “to” file. So to change a.txt into b.txt, you’d add the text on line 5 (as the new second line) and subtract the text in line 7 (the old third line). Changing files in this way—by using this sort of instruction set—is called “patching”.

patch

You can save a set of changes to a file:

$ diff -u a.txt b.txt > changes

And the patch program can read that file and perform those changes:

$ patch < changes
patching file a.txt
$ cat a.txt 
Hello world.
I'm so happy to be here.
This is a great place.
$ cat b.txt 
Hello world.
I'm so happy to be here.
This is a great place.

Now say that you didn’t have the file b.txt. With the changes patch file, you could still apply the changes to the original a.txt, turning it into what b.txt was (this example starts again from the unpatched a.txt):

$ rm b.txt
$ ls
a.txt           changes
$ cat a.txt
Hello world.
This is a great place.
I love it here.
$ patch a.txt < changes 
patching file a.txt
$ cat a.txt 
Hello world.
I'm so happy to be here.
This is a great place.
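Since a patch file records both the removed and the added lines, it can also be applied in reverse. The following is a small sketch of that round trip using patch's -R option. The file original.txt is a name I made up to hold a copy for comparison, and everything runs in a scratch directory.

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

printf 'Hello world.\nThis is a great place.\nI love it here.\n' > a.txt
printf "Hello world.\nI'm so happy to be here.\nThis is a great place.\n" > b.txt
cp a.txt original.txt   # hypothetical copy, kept only for comparison

# diff exits with status 1 when the files differ, hence "|| true".
diff -u a.txt b.txt > changes || true

# Apply the patch: a.txt now has the same contents as b.txt.
patch a.txt < changes
cmp a.txt b.txt && echo "a.txt matches b.txt"

# Apply the same patch in reverse: a.txt returns to its original contents.
patch -R a.txt < changes
cmp a.txt original.txt && echo "a.txt is back to its original contents"
```

This is roughly what “undoing a change” means in this way of thinking: the same instruction set, read backwards.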

Patch files are one way to update software: you could distribute patch files to people who have copies of your program, and those people could apply the patches to upgrade to your latest release.

Now imagine the logistics of using patch files to track your releases. To separate the development version of your program from the stable version you might keep two copies, program-stable/ and program-dev/. After you’ve added some new feature to the dev version, say you generate a series of patch files and zip them up, naming the file according to the new release number. Say the difference between your 0.1 and 0.2 release contains a dozen new files and dozens of patches.

A more robust method would be to keep copies of the program according to each release number, like program-0.1-stable, program-0.2-stable, program-0.2.1-stable, program-0.2.1-dev. You could then generate sets of patch files for upgrading from any version to any version (and your users might expect you to).
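To get a feel for that workflow, here is a rough sketch of generating one patch file between two release directories and applying it to a user's copy. The directory contents, the user-copy directory, and the patch file name are all invented for the example. The -N option (treat a missing file as empty) is an extension found in GNU and BSD diff, so I'm assuming one of those is installed.

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

# Two hypothetical release trees.
mkdir program-0.1-stable program-0.2-stable
printf 'Hello world.\n' > program-0.1-stable/main.txt
printf 'Hello world.\nHappy to be here.\n' > program-0.2-stable/main.txt
printf 'A file added in 0.2.\n' > program-0.2-stable/extra.txt

# -r recurses into directories, -u selects the unified format, and -N treats
# a file missing from one tree as empty, so new files end up in the patch.
diff -ruN program-0.1-stable program-0.2-stable > upgrade-0.1-to-0.2.patch || true

# A user's copy of release 0.1.
cp -R program-0.1-stable user-copy

# -p1 strips the leading directory name from the paths inside the patch.
(cd user-copy && patch -p1 < ../upgrade-0.1-to-0.2.patch)

# diff -r prints nothing and exits 0 when the two trees are identical.
diff -r user-copy program-0.2-stable && echo "user-copy is now at 0.2"
```

Even in this tiny case there are already three directories and a patch file to keep track of for a single upgrade path.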

The complexity of maintaining different versions and different sets of patch files would grow both with the complexity and with the longevity of your program. And this doesn’t factor in the complexity of collaboration, where someone might be in charge of program-0.2.2-new-feature-a and someone else in charge of program-0.2.2-new-feature-b. This also doesn’t factor in the disk space required by all these copies.

git

One way to think about git is as a Grand Unified diff and patch program. It manages all that complexity for you.

A repository is essentially a codebase. A commit is essentially a set of patches. A branch is essentially a set of commits. You can check out any branch to apply the patches up to the HEAD of that branch (HEAD being the currently active commit of the currently active branch), and you can then check out any previous commit in that branch. If your changes are good you can merge them into another branch (say your main stable branch), or you can revert them if they’re bad. When you’re ready to release your program, you can tag the commit you’ll be sending out with the version number.
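To connect this back to diff, here is a small sketch that records the two versions of a.txt from earlier as two commits and then asks git for the difference between them. The author name, email, commit messages, and tag name are placeholders I picked for the example, and it assumes git is installed.

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"
git init -q

# Identity settings local to this scratch repository (placeholder values).
git config user.name "Example Author"
git config user.email "author@example.com"

# First commit: the original a.txt.
printf 'Hello world.\nThis is a great place.\nI love it here.\n' > a.txt
git add a.txt
git commit -q -m "Add a.txt"

# Second commit: the contents that b.txt had.
printf "Hello world.\nI'm so happy to be here.\nThis is a great place.\n" > a.txt
git commit -q -am "Update a.txt"

# Tag the commit being released with a (hypothetical) version number.
git tag v0.2

# The difference between the two commits is printed in unified format,
# much like the output of "diff -u" on two copies of the file.
git --no-pager diff HEAD~1 HEAD
```

Among some git-specific header lines, the output contains the same +I'm so happy to be here. and -I love it here. lines that diff -u produced earlier.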

To make collaboration and working with backups easy, you can pull and push your codebase to other copies of the codebase. You’ll pull and push the code’s history along with the files, so your coworkers will be able to access the previous changes.
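As a rough sketch of what that looks like, the following uses a bare repository on the local disk as a stand-in for the shared copy. The names shared.git, alice, and bob, along with the identities and commit messages, are invented for the example, and it assumes git is installed.

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

# A bare repository stands in for the shared copy of the codebase.
git init -q --bare shared.git
git --git-dir=shared.git symbolic-ref HEAD refs/heads/main

# First working copy (hypothetical user "alice").
git init -q alice
cd alice
git config user.name "Alice Example"
git config user.email "alice@example.com"
printf 'Hello world.\n' > a.txt
git add a.txt
git commit -q -m "Add a.txt"
git branch -M main                    # make sure the branch is named "main"
git remote add origin ../shared.git
git push -q origin main
cd ..

# Second working copy (hypothetical user "bob"), cloned from the shared copy.
git clone -q shared.git bob

# Alice records another change and pushes it.
cd alice
printf 'Happy to be here.\n' >> a.txt
git commit -q -am "Add a second line"
git push -q origin main
cd ..

# Bob pulls it and gets the history along with the latest file contents.
cd bob
git pull -q --ff-only origin main
git --no-pager log --oneline          # lists both of Alice's commits
```

Bob never saw the first version of a.txt being written, but his copy of the repository still contains the commit that recorded it.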

Feature Branches

This is a long way of saying that bugfix and feature branches are useful because they let you isolate sets of related changes. Adding one feature could result in many commits. Though it’s true that merging a feature branch into the main branch will bring all of that feature’s commits into the main history, if you want to review the history of a certain feature it’s much more convenient to check out that feature’s branch, where the commits made since it branched off should be specific to that feature, than to hunt through the merged history of the entire project.
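Here is a minimal sketch of that workflow, assuming git is installed. The branch name feature-greeting, the identity, and the commit messages are made up for the example.

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"
git init -q
git config user.name "Example Author"      # placeholder identity
git config user.email "author@example.com"

# One commit on the main branch.
printf 'Hello world.\n' > a.txt
git add a.txt
git commit -q -m "Initial commit"
git branch -M main                         # make sure the branch is named "main"

# Two commits on a feature branch.
git checkout -q -b feature-greeting
printf 'Happy to be here.\n' >> a.txt
git commit -q -am "Add a greeting line"
printf 'This is a great place.\n' >> a.txt
git commit -q -am "Add a line about the place"

# List only the commits that are on the feature branch but not on main.
git --no-pager log --oneline main..feature-greeting

# Merge the feature into main, keeping a merge commit as a marker.
git checkout -q main
git merge -q --no-ff -m "Merge feature-greeting" feature-greeting
```

Before the merge, the main..feature-greeting range lists just the two feature commits. After the merge, main contains them too, along with the merge commit that ties them together.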