Merge vs Rebase: Part 1 - What is a commit hash?

One of the biggest things I struggled with when learning Git was the difference between a merge and a rebase. Most people understand the concept of a merge pretty quickly but get lost when trying to understand what a rebase does differently. In this three-part post I'm going to cover the differences in the simplest way I can. To do that, we first need to understand what a commit hash is.

If you've ever looked at your commit history in Git then you've probably seen something like this:

commit a9ca2c9f4e1e0061075aa47cbb97201a43b0f66f
Author: Alex Ford
Date:   Mon Sep 8 6:49:17 2014

Initial commit.

You've probably considered that long string of letters and numbers to be a simple unique ID for that particular commit. While you'd be right, what you may not have known is that it is a SHA-1 hash generated from the Git commit object. Without going into the gruesome details of Git commit objects, just know that it's a 40-character hexadecimal string computed directly from the information the commit contains. Because of that, a commit's hash can't be changed independently of its contents. The only way to get a different hash is to change some detail of the commit, which produces a whole new commit object with a brand new hash.
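To make that a little more concrete, here is a minimal Python sketch of how such a hash can be computed. It assumes the usual Git object layout, where the hash covers a small header (`commit`, the body's length in bytes, and a null byte) followed by the commit body. The body itself is hypothetical: the email address and timestamp are made up for illustration, and the tree line uses Git's well-known empty-tree hash, so the result will not match the hash shown above.

```python
import hashlib


def commit_hash(body: str) -> str:
    """Return the SHA-1 that would identify a commit object with this body."""
    data = body.encode("utf-8")
    # The hash covers a header ("commit <byte length>\0") plus the body.
    header = b"commit " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


# Hypothetical commit body: the email and timestamp are invented,
# and the tree line is the hash of an empty tree.
original = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    "author Alex Ford <alex@example.com> 1410158957 +0000\n"
    "committer Alex Ford <alex@example.com> 1410158957 +0000\n"
    "\n"
    "Initial commit.\n"
)

# Changing a single character of the message gives a different commit.
amended = original.replace("Initial commit.", "Initial commit!")

print(commit_hash(original))
print(commit_hash(amended))
```

The two printed hashes differ even though only one character of the message changed, which is why editing any detail of a commit effectively creates a new commit.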

In addition to the obvious information such as the commit author, the date, and the stored data, a commit also contains the hash of the commit before it, known as its parent. This is exactly how your commit history is built: every commit knows the hash of the commit that immediately preceded it.

In the above image you can see my SourceTree window open for a demo repository I created. I made three commits to that repository. SourceTree is smart enough to read each of the commits in my repository and build a graphical representation of this history. It can see that Commit 2 directly references Commit 1, which directly references Commit 0. Keep in mind that I'm referring to the commits by the commit messages I entered; I'm just using sequential numbers as messages to make the commits easier to talk about. Real commit messages would describe the changes being made in that commit.
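Here is a rough sketch of how a tool could rebuild that history from nothing but parent references. This is a toy model, not how SourceTree or Git is actually implemented: the short IDs `c0`, `c1`, and `c2` are hypothetical stand-ins for real 40-character hashes, and each commit is assumed to have at most one parent.

```python
# Toy model: each commit records its message and the ID of its parent.
# The IDs are hypothetical stand-ins for real commit hashes.
commits = {
    "c0": {"message": "Commit 0", "parent": None},
    "c1": {"message": "Commit 1", "parent": "c0"},
    "c2": {"message": "Commit 2", "parent": "c1"},
}


def history(commit_id):
    """Follow parent links from commit_id back to the first commit."""
    log = []
    while commit_id is not None:
        log.append(commits[commit_id]["message"])
        commit_id = commits[commit_id]["parent"]
    return log


print(history("c2"))  # ['Commit 2', 'Commit 1', 'Commit 0']
```

Starting from the newest commit and following one parent link at a time is enough to recover the whole straight-line history.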

Since my demo repository only contains three commits to the master branch, SourceTree's graph is a simple straight line from each commit to the next. Let's start making things a little more complicated by creating a separate branch to work on a feature.

In the above screenshot you can see that I created a branch called feature1, but the graph is the same. This is because I haven't made any new commits since creating my branch. A branch is really just a pointer to a specific commit. Right now both master and my feature1 branch point at the exact same commit. Now we'll make some changes and add a new commit to our feature1 branch.

You can see that our feature1 branch has moved to point to a new commit, Commit 3. You can also see that our graph is still a simple straight line. That's because so far there are only four total commits and each commit references the one immediately before it. If I were to merge feature1 into master right now then the only thing that would happen is the master branch would jump up to point at the same commit as feature1, which would be Commit 3. That is called a fast-forward merge because it just moves the master branch pointer up the graph to point at the newer commit.
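The idea of branches as pointers, and of a fast-forward as nothing more than moving one of those pointers, can be sketched in a few lines of Python. This is a simplified model under my own assumptions: the IDs `c0` to `c3` are hypothetical stand-ins for the hashes of Commit 0 to Commit 3, every commit has a single parent, and `fast_forward` is an illustrative helper, not a Git API.

```python
# Each commit ID maps to its parent's ID (hypothetical short IDs).
parents = {"c0": None, "c1": "c0", "c2": "c1", "c3": "c2"}

# A branch is just a name pointing at one commit.
branches = {"master": "c2", "feature1": "c3"}


def is_ancestor(ancestor, commit_id):
    """Return True if ancestor is reachable from commit_id via parent links."""
    while commit_id is not None:
        if commit_id == ancestor:
            return True
        commit_id = parents[commit_id]
    return False


def fast_forward(target, source):
    """Move the target branch up to source's commit if nothing has diverged."""
    if not is_ancestor(branches[target], branches[source]):
        return False  # histories diverged, so a plain pointer move is not enough
    branches[target] = branches[source]
    return True


print(fast_forward("master", "feature1"))  # True
print(branches["master"])  # c3
```

No new commit is created here. The master pointer simply jumps from Commit 2 to Commit 3 because Commit 2 is already part of feature1's history.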

Okay, so now we're happily over on our feature1 branch toiling away when suddenly the boss calls and says a new bug has been documented and it is top priority. We need to halt our work on feature1 and get a bug fix committed to master right away. To do that we need to check out master and make a commit. If the bug was large we might consider creating another branch and making multiple commits to that branch, but we'll pretend our bug is small and easily fixed in one commit.

Now things are looking a bit different. In the above screenshot I've called your attention to the graph. Notice that Commit 3 on our feature1 branch is now off on its own little path in the graph. The reason for this is simple: Commit 4 and Commit 3 have the exact same parent. Remember how each commit stores the hash of the commit that immediately preceded it? When we checked out master, we were back at Commit 2, since Commit 3 was only referenced by our feature1 branch pointer and the master branch pointer still pointed to Commit 2. Because of this, our hotfix commit (Commit 4) recorded Commit 2 as its previous commit.

The graph is showing us that both Commit 4 and Commit 3 reference Commit 2 as the previous commit. In this case we would call Commit 2 a common ancestor of Commits 3 and 4. Now that our hotfix is committed we can head back over to our feature1 branch and finish working.
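Finding that common ancestor is just a matter of walking both histories back until they meet. The sketch below is a toy version under the same assumptions as before: `c0` to `c4` are hypothetical IDs for Commit 0 to Commit 4, and each commit has exactly one parent, which holds for this demo repository but not for every real one.

```python
# Commit 3 and Commit 4 both list Commit 2 as their parent.
parents = {"c0": None, "c1": "c0", "c2": "c1", "c3": "c2", "c4": "c2"}


def ancestors(commit_id):
    """Return commit_id followed by all of its ancestors, newest first."""
    chain = []
    while commit_id is not None:
        chain.append(commit_id)
        commit_id = parents[commit_id]
    return chain


def common_ancestor(a, b):
    """Return the newest commit that appears in the history of both a and b."""
    seen = set(ancestors(a))
    for commit_id in ancestors(b):
        if commit_id in seen:
            return commit_id
    return None


print(common_ancestor("c3", "c4"))  # c2
```

Commit 1 and Commit 0 are also shared by both histories, but Commit 2 is the most recent one they have in common, which is the point where the graph forks.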

I've gone ahead and made two new commits to our feature1 branch, Commit 5 and Commit 6. Our feature is complete and it's now time to merge our feature1 branch back into master. At this point we can choose to merge our feature1 branch into master or we can rebase feature1 onto master. For now, let's explore what a merge is in part 2.