Recently a coworker told me that, working with git, he was unclear on the benefits of feature or bugfix branches in particular and not very clear on the concept and utility of branches in general. This is an expansion of the explanation I gave him.
The diff program compares files and produces output describing the differences.
So say you have two files:
    $ cat a.txt
    Hello world.
    $ cat b.txt
    Hello world.
    Happy to be here.
Running diff on those files will show the difference between them.
    $ diff a.txt b.txt
    1a2
    > Happy to be here.
The first line of that output is a change command: the letter a stands for “append”, the 2 after it is the line number of the second file to append to the first file, and the 1 before it is the line number of the first file after which the new lines will be appended. So, in other words, “add line 2 of the second file after the first line of the first file”. And the second line is the line to add.
Say these files change.
    $ cat a.txt
    Hello world.
    This is a great place.
    I love it here.
    $ cat b.txt
    Hello world.
    I'm so happy to be here.
    This is a great place.
    $ diff a.txt b.txt
    1a2
    > I'm so happy to be here.
    3d3
    < I love it here.
This output contains two change instructions. The first says “append line 2 of the second file (shown here on the line beginning with >) after line 1 of the first file”, and the second says “delete line 3 of the first file (shown here on the line beginning with <), which would have come after line 3 of the second file if it hadn’t been deleted”.
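To try this example yourself, a minimal shell sketch like the following should reproduce it. The scratch directory from mktemp and the normal.diff file name are assumptions added here for convenience; the file names and contents are the ones shown above.

```shell
#!/bin/sh
# Work in a throwaway directory (an assumption; any empty directory works).
cd "$(mktemp -d)"

# Recreate the two example files, one argument per line.
printf '%s\n' 'Hello world.' 'This is a great place.' 'I love it here.' > a.txt
printf '%s\n' 'Hello world.' "I'm so happy to be here." 'This is a great place.' > b.txt

# diff exits with status 1 when the files differ, so "|| true" keeps the
# script going if it is run under "set -e".
diff a.txt b.txt > normal.diff || true

# Show the change commands (an "a" for the append, a "d" for the delete).
cat normal.diff
```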
There’s a variation of this output called the “unified format”, and you can see this version by including the -u option in the diff command:
    $ diff -u a.txt b.txt
    1. --- a.txt 2016-04-08 14:15:30.000000000 -0700
    2. +++ b.txt 2016-04-08 14:22:52.000000000 -0700
    3. @@ -1,3 +1,3 @@
    4.  Hello world.
    5. +I'm so happy to be here.
    6.  This is a great place.
    7. -I love it here.
Line by line, this output shows the following (the line numbers don’t appear in the real output; they were added to match the lines in this list):
Lines 1 and 2 name the “from” file (after ---) and the “to” file (after +++), each with its modification time.
Line 3 is a chunk header. The ranges -1,3 and +1,3 show, by the numbers, the line to start the changes on and the number of lines from that one that the following changes apply to, and, by the - and + signs, that the ranges apply to the “from” and “to” files respectively. The lines following a chunk header are change instructions.
Lines 4 and 6 are unchanged context lines, present in both files.
A + as the first character of a line in a change chunk indicates a line present in the “to” file but absent in the “from” file.
A - as the first character of a line in a change chunk indicates a line present in the “from” file but absent in the “to” file.
Another way to look at this output is as a set of instructions on how to change the “from” file into the “to” file. So to change a.txt into b.txt, you’d add the text on line 5 (as the new second line) and remove the text on line 7 (the old third line). Changing files in this way, by using this sort of instruction set, is called “patching”.
You can save a set of changes to a file:
    $ diff -u a.txt b.txt > changes
The patch program can read that file and perform those changes:
    $ patch < changes
    patching file a.txt
    $ cat a.txt
    Hello world.
    I'm so happy to be here.
    This is a great place.
    $ cat b.txt
    Hello world.
    I'm so happy to be here.
    This is a great place.
Now say that you didn’t have the file b.txt. With the changes patch file, you could still apply the changes to a.txt, thereby changing it into what b.txt contained:
    $ rm b.txt
    $ ls
    a.txt changes
    $ cat a.txt
    Hello world.
    This is a great place.
    I love it here.
    $ patch a.txt < changes
    patching file a.txt
    $ cat a.txt
    Hello world.
    I'm so happy to be here.
    This is a great place.
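Because a patch is just a set of instructions, it can also be undone. The following is a rough sketch, assuming your patch supports the -R (reverse) option, which this post doesn’t otherwise use. The scratch directory and the a.orig, b.orig and a.patched copies are assumptions added only so the result can be checked.

```shell
#!/bin/sh
cd "$(mktemp -d)"   # scratch directory (an assumption for tidiness)

printf '%s\n' 'Hello world.' 'This is a great place.' 'I love it here.' > a.txt
printf '%s\n' 'Hello world.' "I'm so happy to be here." 'This is a great place.' > b.txt
cp a.txt a.orig     # pristine copies, kept only for comparison
cp b.txt b.orig

# Save the unified diff; diff exits 1 when the files differ.
diff -u a.txt b.txt > changes || true
rm b.txt            # from here on the patch file is all we need

# Apply the patch: a.txt takes on the contents b.txt had.
patch a.txt < changes
cp a.txt a.patched

# Apply the same patch in reverse: a.txt returns to its original contents.
patch -R a.txt < changes
cmp a.txt a.orig && echo "a.txt is back to its original contents"
```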
Patch files are one way to update software: you could distribute patch files to people who have copies of your program, and those people could apply the patches and thereby upgrade to your latest release.
Now imagine the logistics of using patch files to track your releases. To separate the development version of your program from the stable version you might keep two copies, with the development copy in program-dev/. After you’ve added some new feature to the dev version, say you generate a series of patch files and zip them up, naming the file according to the new release number. Say the difference between your previous release and your 0.2 release amounts to a dozen new files and dozens of patches.
A more robust method would be to keep copies of the program according to each release number, like program-0.2.1-dev. You could then generate sets of patch files for upgrading from any version to any version (and your users might expect you to).
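As a sketch of what generating one such upgrade might look like, the script below diffs two whole release directories and applies the result to a user’s copy. The directory names, file names and the upgrade.patch name are all hypothetical, and the -r, -N, -d and -p1 options are not used elsewhere in this post.

```shell
#!/bin/sh
cd "$(mktemp -d)"   # scratch directory (an assumption for tidiness)

# Two hypothetical release directories; names and contents are illustrative.
mkdir program-0.2.0 program-0.2.1
printf 'Hello world.\n' > program-0.2.0/greeting.txt
printf 'Hello world.\nHappy to be here.\n' > program-0.2.1/greeting.txt
printf 'New in this release.\n' > program-0.2.1/notes.txt

# -r recurses into directories, -u selects the unified format, and -N treats
# a file missing from one side as empty so new files end up in the patch.
# diff exits with status 1 when it finds differences, hence "|| true".
diff -ruN program-0.2.0 program-0.2.1 > upgrade.patch || true

# A user who only has the old release applies the patch to their copy.
cp -R program-0.2.0 user-copy
# -d changes into the directory first; -p1 strips the leading
# "program-0.2.x/" component from the paths recorded in the patch.
patch -d user-copy -p1 < upgrade.patch

# Confirm the upgraded copy matches the new release.
diff -r user-copy program-0.2.1 && echo "user-copy now matches program-0.2.1"
```

Even in this tiny case there is one patch file per pair of versions, which is the bookkeeping that multiplies as releases accumulate.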
The complexity of maintaining different versions and different sets of patch files would grow both with the complexity and with the longevity of your program. And this doesn’t factor in the complexity of collaboration, where someone might be in charge of program-0.2.2-new-feature-a and someone else in charge of program-0.2.2-new-feature-b. This also doesn’t factor in the disk space required by all these copies.
One way to think about git is as a Grand Unified patch program. It manages all that complexity for you.
A repository is essentially a codebase. A commit is essentially a set of patches. A branch is essentially a set of commits. You can checkout any branch to apply the patches up to the latest commit of that branch (which then becomes HEAD, the currently-active commit of the currently-active branch), and you can then checkout any previous commit in that branch. If your changes are good you can merge them into another branch (say your main stable branch), or you can revert them if they’re bad. When you’re ready to release your program, you can tag the commit you’ll be sending out with the version number.
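A minimal sketch of that workflow might look like the following. The branch name feature-greeting, the tag v0.2, the commit messages and the author identity are all hypothetical, chosen only for illustration.

```shell
#!/bin/sh
cd "$(mktemp -d)"   # scratch directory (an assumption for tidiness)

git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

# First commit on the main branch.
printf 'Hello world.\n' > a.txt
git add a.txt
git commit -q -m "Add greeting"

# Isolate a related set of changes on its own branch.
git checkout -q -b feature-greeting
printf 'Happy to be here.\n' >> a.txt
git commit -q -am "Extend greeting"

# The changes look good: merge them into main and tag the release.
git checkout -q main
git merge -q --no-ff -m "Merge feature-greeting" feature-greeting
git tag v0.2

# Show the resulting history as a graph.
git --no-pager log --oneline --graph
```

The --no-ff option is a choice made for this sketch: it forces a merge commit so the feature branch stays visible in the graph even though main had not moved on.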
To make collaboration and working with backups easy, you can push your codebase to other copies of the codebase. You’ll push the code’s history along with the files, so your coworkers will be able to access the previous changes.
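To see the history travel along with the files, here is a small sketch that pushes to a second copy on the same machine and then clones it as a coworker would. The names shared.git, mine and coworker, the remote name origin and the author identity are assumptions for illustration; a real shared copy would usually live on a server.

```shell
#!/bin/sh
work=$(mktemp -d)   # scratch directory (an assumption for tidiness)
cd "$work"

# A bare repository stands in for the shared copy of the codebase.
git init -q --bare shared.git

# Your own working copy, with two commits.
git init -q mine
cd mine
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"
printf 'Hello world.\n' > a.txt
git add a.txt
git commit -q -m "Add greeting"
printf 'Happy to be here.\n' >> a.txt
git commit -q -am "Extend greeting"

# Push the main branch, history included, to the shared copy.
git remote add origin "$work/shared.git"
git push -q origin main
cd "$work"

# A coworker clones the shared copy and gets both commits, not just the files.
git clone -q -b main shared.git coworker
git -C coworker --no-pager log --oneline
```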
This is a long way of saying that bugfix and feature branches are useful in that they enable you to isolate sets of related changes. Adding one feature could result in many commits. Though it’s true that merging a feature branch into the main branch will bring all of that feature’s commits into the main history, if you want to review the history of a certain feature it’s much more convenient to checkout that feature’s branch, where the commits specific to that feature sit together at the top of the history, than to hunt through the merged history of the entire project.
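One way to list only a feature’s own commits is git’s two-dot range notation, sketched below on an unmerged branch. The branch name, file names, commit messages and author identity are hypothetical.

```shell
#!/bin/sh
cd "$(mktemp -d)"   # scratch directory (an assumption for tidiness)

git init -q
git symbolic-ref HEAD refs/heads/main        # name the initial branch "main"
git config user.name "Example Author"        # placeholder identity
git config user.email "author@example.com"

printf 'Hello world.\n' > a.txt
git add a.txt
git commit -q -m "Add greeting"

# Two commits that belong to the feature.
git checkout -q -b feature-greeting
printf 'Happy to be here.\n' >> a.txt
git commit -q -am "Extend greeting"
printf 'This is a great place.\n' >> a.txt
git commit -q -am "Praise the place"

# Meanwhile an unrelated change lands on main.
git checkout -q main
printf 'notes\n' > notes.txt
git add notes.txt
git commit -q -m "Add notes"

# List only the commits reachable from the feature branch but not from main.
git --no-pager log --oneline main..feature-greeting
```

The log lists the two feature commits and leaves out both the shared first commit and the unrelated change on main.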