Welcome to the first of hopefully many blog posts that will come together to form a full-stack web development tutorial for beginners. I've been wanting to do this for a long time but never took the time to sit down and pull the trigger. Each of these blog posts will have an accompanying YouTube video which I will share here at the top of each post. I wanted to take the time to both write the post and record the video because I know many people learn very differently. I personally enjoy a mixture of video and written tutorials as it often helps really solidify a new concept for me.
So, what is Git? Git is a Version Control System (VCS), also known as Source Control. In the past there have been other forms of source control, each new one offering some benefit over an older one. In the history of source control systems there have been two that stand out the most: Subversion (SVN) and Git. I don't want to spend too much time talking about SVN, but I do want to briefly mention the big difference Git brought to the table that pushed SVN and other forms of source control largely out of the spotlight. Before we can do that, however, we must first discuss the word "repository". What is a repository?
In the VCS world a repository is basically a database. The goal of a VCS is to turn the regular data you're used to working with into a database with a version history. So if you normally create text files and put them in a folder on your computer, the goal of a VCS would be to turn that folder into a repository and track changes to those files. It would keep a history and allow you to rewind time to see what files looked like in the past. There are other features of a VCS as well, such as the ability to collaborate with others or to branch your work. We will cover those features later.
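To make the "database with a version history" idea a little more concrete, here is a tiny Python sketch. It is a toy I made up for illustration (the `ToyRepository` class and the file name are hypothetical) and it is nowhere near how Git actually stores data, but it shows the basic idea of saving snapshots and rewinding to an older one.

```python
class ToyRepository:
    """A toy version history: each save stores a full snapshot of the files."""

    def __init__(self):
        self.history = []  # oldest snapshot first

    def save(self, files):
        # Copy the dict so later edits to `files` don't rewrite history.
        self.history.append(dict(files))
        return len(self.history) - 1  # version number of this snapshot

    def rewind(self, version):
        # Return a copy of what the files looked like at that version.
        return dict(self.history[version])


repo = ToyRepository()
v0 = repo.save({"notes.txt": "first draft"})
v1 = repo.save({"notes.txt": "second draft"})
print(repo.rewind(v0)["notes.txt"])  # prints "first draft"
```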
The big difference between Subversion and Git is in their network architecture. SVN is a server-based VCS. What that means is that the repository that tracks your files actually lives on a remote server. You and your computer are simply a client that talks to that server to check out files, work on them, and check them back in. That's how SVN tracks changes to your files as you work.
The way SVN is designed was pretty great in a world without many other options. SVN could lock a file while one person worked on it so that nobody else accidentally changed it at the same time. That helped prevent teams from overwriting their colleagues' work. Unfortunately, it also meant that two people could not work on a locked file at the same time. SVN has other downsides too, such as the fact that nearly every interaction with SVN had to communicate over the network to talk to the server. Internet issues sometimes made it impossible to work on your project. Despite all its warts, it elegantly solved a lot of issues in its day. You can certainly still use it today if you choose, but its popularity has declined tremendously due to the introduction of Distributed Version Control Systems (DVCS).
Git is the most popular DVCS today. There are others, such as Mercurial, which work quite well, but for a variety of reasons Git swept through the dev world like fire over the last 10 years. Not least among those reasons is Git's popularity in the Linux community, and Microsoft's recent trend of embracing Linux tooling, including Git, has really sealed the deal on making Git the de facto DVCS of today. So what does "distributed" mean?
A distributed VCS is similar to a regular VCS except every client maintains their own version of the repository. In other words, each individual working on a project has their own copy (or clone) of the repository. As a new developer on a project you often start off by cloning the project repository to your computer as a first step. But what does this get us that SVN doesn't? A lot, actually. The first and most obvious is that your everyday work no longer depends on the network. When I say you clone the remote repository, I literally mean you clone it down to the last detail; it's the same repository as the one on the server at the moment you clone it. This allows you to work on files within the repository right there on your own computer without worrying about having to check those files out from a server via the internet or office network. You only need the network when you want to share your changes or pick up someone else's.
Right off the bat you may be able to recognize that having your own copy of the repository allows you to work on the same file as someone else on your team, at the same time. That might get you wondering what happens when two people try to save changes to the same file and push those changes to the remote repository. Well, if those two people both modified the same lines in a file, then when their changes are merged together we'd get a conflict. Conflicts are a topic for a later tutorial, but Git provides some useful tools for helping when that situation arises. SVN also had conflicts. When merging branches together in SVN you would still encounter collisions with changes to files. Resolving conflicts in the days of SVN was a bit of a nightmare, but thankfully with Git they are much simpler to deal with.
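If it helps to see the "same lines" rule spelled out, here is a rough Python sketch. The `find_conflicts` function and the sample lists are made up for this post, and it makes a big simplifying assumption that all three versions of the file have the same number of lines. Git's real merge logic is far more capable than this.

```python
def find_conflicts(base, mine, theirs):
    """Return the line numbers (starting at 1) that both sides changed differently.

    Simplifying assumption: all three versions have the same number of lines.
    """
    conflicts = []
    for number, (b, m, t) in enumerate(zip(base, mine, theirs), start=1):
        i_changed = m != b
        they_changed = t != b
        # Only a problem when both sides touched the line and disagree.
        if i_changed and they_changed and m != t:
            conflicts.append(number)
    return conflicts


base = ["Milk", "Eggs", "Bread"]
mine = ["Oat Milk", "Eggs", "Bread"]
theirs_ok = ["Milk", "Eggs", "Rye Bread"]      # they edited a different line
theirs_clash = ["Soy Milk", "Eggs", "Bread"]   # they edited the same line as me

print(find_conflicts(base, mine, theirs_ok))     # prints []
print(find_conflicts(base, mine, theirs_clash))  # prints [1]
```

When the two sides touch different lines, both edits can be kept. It's only when both sides change the same line in different ways that a human has to decide.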
So how do we get started with Git?
Enough with all the conceptual speak. How do we actually create a Git repository? Well there are quite a few ways as there is a variety of tools out there for working with Git. The primary and most common way of working with Git is via the command line.
The Git command line is pretty easy to use if you're already comfortable with a terminal. However, if you're like I was when starting out, then the terminal window is big and scary. While I highly encourage you to start getting used to the command line, don't let it stop you from pushing forward as a developer. For the first several years of my career I steered clear of the terminal. There are plenty of graphical interfaces out there for interacting with Git and many other common command line tools, so do feel free to take advantage of them. There are plenty of elitist developers out there who will shame you for not diving into a terminal straight away. Don't listen to them. You do whatever you're comfortable with when starting out because actually writing code is going to get complicated enough. Don't let others tell you you're not a real developer if you don't use the same tooling they do. Anyone who chastises you over stuff like that isn't worth paying attention to.
What I'm going to introduce you to today is a graphical Git interface called Sourcetree by Atlassian. It's completely free and is my favorite Graphical User Interface (GUI) for Git. However, it is only available on Windows and macOS, so you'll have to poke around and find another if you're on Linux. Though, chances are if you're comfortable in Linux then you're probably comfortable in the terminal ;)
If I recall correctly, the installer for Sourcetree forced me to create an account with Bitbucket, which is a hosting provider for Git. I don't use Bitbucket but it was free to create an account. Once installed you should end up with a screen similar to the one below.
As I stated earlier, the first thing you often do as a developer is clone a remote repository to your machine. That's why the software starts you on the remote tab so you can sign into your favorite Git hosting provider account and begin cloning any repositories you have with them. We however don't have any accounts on any Git hosting providers. We're going to start by creating a simple repository on our machine and not worry about hosting it anywhere yet. We'll discuss remote repositories in a later tutorial.
To start we are going to head over to the "Local" tab in Sourcetree and click "Create".
This will bring us to a screen that allows us to browse to a directory on our computer and turn it into a Git repository. It's a good idea to make sure that directory is empty.
Note: Leave the "Create Repository On Account" check box unchecked. We won't be dealing with that in these tutorials. That's just Sourcetree trying to make it convenient for you to also create a repository on a Git hosting provider.
Once you've picked your folder and clicked "Create" you should be presented with a screen that looks like the one below.
If you've made it this far then congratulations! You've created your very first Git repository. Before we move on to talk about adding files to our repository I first want to discuss a little bit about how Git works. Back in the day SVN used to litter your project with hidden files and folders that would tell SVN that the files in those directories were tracked by SVN. Git is much cleaner. There is literally one hidden directory called .git in the root of your project folder. If you can't see it then follow these instructions to enable the showing of hidden files in Windows. Showing hidden files on a Mac is a little more complicated because it requires some terminal usage but it's not too bad.
If you delete that .git folder in your project directory, it is the equivalent of removing Git from your project. You would be left with a normal directory with your project files in it and no more version control. It's quite nice to know that our projects aren't littered with metadata files like they used to be with SVN.
Note: Git is Git. Meaning, even if you create a repository with Sourcetree or another tool, you can still use other tools such as the command line to interact with it. As long as that .git folder is there then you can use any Git tool you choose to interact with that repository.
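Since that hidden folder is what makes a directory a repository, you can check for it yourself. Here is a small Python sketch; the function name `looks_like_git_repo` is my own invention, and it only covers the simple case from this tutorial. In some more advanced setups .git is a file rather than a folder, and this check would miss those.

```python
from pathlib import Path


def looks_like_git_repo(folder):
    """Return True if `folder` directly contains a .git directory."""
    return (Path(folder) / ".git").is_dir()


# Check the folder this script is run from.
print(looks_like_git_repo("."))
```

Run it from inside the folder you just turned into a repository and it should print True. Run it from an ordinary folder and it should print False.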
Now that we have our repository created, it's time to add some content to it. Normally we'd be tracking files full of code, but for the sake of this tutorial we'll just add a text file. You can do this quickly in Windows by simply right-clicking inside the project directory and going to New > Text Document.
Once you've done that go ahead and double-click it. It should open up in a Notepad window. I took the liberty of naming mine "Grocery List.txt" and I've added some items one might need to pick up at the grocery store.
After saving your file you can head back over to Sourcetree. You should notice that it suddenly looks different than it did earlier. It has detected that there is a new file in the working directory. The term "working directory" refers to your project folder as it currently exists on disk, including any changes that have yet to be committed to the repository. Our new file is one of those changes: Git can see it but isn't tracking it yet.
If you hover over the little question mark icon next to our file you can see that it says "untracked". In order to track it we want to click that little plus sign icon on the right side of our file's row. This will move the file from the "Unstaged files" section up to the "Staged files" section.
Our file still isn't committed to Git yet, it's just staged. Staging is a way to preview all the changes you're about to commit so you can be sure you're only committing what you want to. For example, if we had tons of files in the "Unstaged files" area we may not want to commit them all to Git yet. In that case we would only stage the files we were interested in, preview them in the "Staged files" area, and then commit them to Git.
Once we're satisfied that the staging area only contains the files we're interested in then we can commit. To do that we simply type a commit message in the box at the bottom of the window and click "Commit".
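Here is a toy Python model of that unstaged, staged, commit flow. The `ToyStagingArea` class, the second file name, and the commit message are all made up for illustration, and Git's real staging area works on file contents rather than just names, so treat this as a mental model only.

```python
class ToyStagingArea:
    """A toy model of the Unstaged -> Staged -> Commit flow."""

    def __init__(self):
        self.unstaged = set()
        self.staged = set()
        self.commits = []  # list of (message, file names) tuples

    def edit(self, name):
        # A new or changed file shows up as unstaged first.
        self.unstaged.add(name)

    def stage(self, name):
        # Like clicking the plus sign: move the file up to "Staged files".
        self.unstaged.remove(name)
        self.staged.add(name)

    def commit(self, message):
        if not self.staged:
            raise ValueError("nothing staged to commit")
        # Only staged files go into the commit.
        self.commits.append((message, sorted(self.staged)))
        self.staged.clear()


area = ToyStagingArea()
area.edit("Grocery List.txt")
area.edit("Scratch Notes.txt")
area.stage("Grocery List.txt")
area.commit("Add grocery list")

print(area.commits)   # prints [('Add grocery list', ['Grocery List.txt'])]
print(area.unstaged)  # prints {'Scratch Notes.txt'}
```

Notice that the file we never staged is still sitting in the unstaged set after the commit. That's the whole point of staging: you choose what goes in.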
Note: Always make sure your Git commit messages explain clearly and concisely the changes that you are committing. For the purposes of our demo it's not super important but in the development world it will be. Your colleagues should have a good idea of what your commit contains just by reading your commit message.
Once you've committed your changes you should see Sourcetree return to the way it looked earlier with a message saying "Nothing to commit".
So what did we just do? Well, click on over to the history tab and find out :D
The history tab shows us a history of our commits. You can see now that we've created our first commit. As such there isn't much of a history aside from our one single entry. Let's examine that one entry for a moment.
All the details of our commit can be found in that bottom section of the Sourcetree window. On the right you can see the contents of our file. Notice that they are colored green and have little plus signs to the left of each line. Git is telling us that those lines were added. Since the entire file is brand new in the repository all the text is green. If we had added one line to an existing file in the repo then you would only see that line colored green.
On the left side of that section you can see several details about our commit: the author's name and email, the date the commit was created, and our commit message. Notice also that there is a giant string of letters and numbers at the top next to "Commit:". That is our commit ID. That giant ID is a hash generated from everything the commit contains, including those details. What that means is that we cannot change anything about that commit or else the ID would have to change as well. For example, to change the commit message we would have to delete the commit entirely and recreate it with the new message. The new commit would have a different ID from the one we deleted. There is no way to modify a commit and keep the same ID.
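To see why changing anything changes the ID, here is a Python sketch that hashes a commit's details with SHA-1. It mimics the text layout Git uses for commit objects as I understand it, but treat it as an illustration rather than a reference. The `commit_id` function, the author, and the timestamp are made up, and the tree value is just a placeholder (Git's ID for an empty tree), so these will not match the IDs Sourcetree shows you.

```python
import hashlib

# Placeholder: Git's well-known ID for an empty tree (a folder with no files).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(message, author, timestamp, tree=EMPTY_TREE, parent=None):
    """Hash a commit's details into a 40-character ID."""
    lines = [f"tree {tree}"]
    if parent is not None:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    # The hash covers a small header plus the whole body.
    header = f"commit {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


author = "Jane Doe <jane@example.com>"  # hypothetical author
first = commit_id("Add grocery list", author, 1700000000)
reworded = commit_id("Add my grocery list", author, 1700000000)

print(first)
print(reworded)
print(first == reworded)  # prints False
```

The only difference between the two is one word in the message, yet the IDs have nothing in common.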
Let's go ahead and create one more commit so you can see how Git manages your repository as things change. I'm going to open my Grocery List.txt and add a couple more items to it. After saving the file I'm going to go back to Sourcetree to see if it detected those changes.
Notice now that the icon next to our file in the File Status tab is a yellow icon with three dots. If you hover over it you can see that it says "Modified". If you recall from above it previously said "Untracked". That is because the entire file itself had not yet been committed to Git. This time around Git is aware of our file and is detecting that it has been modified. In the right pane you can see that I added "Cat Food" and "Litter" to the list. However, you might be confused why it appears to show us removing and re-adding "Cucumbers". That is because "Cucumbers" used to be the last line of the file. To add more lines to the file we had to press enter and type out those new lines. To explain what's happening there we first need to discuss how text editors know where line breaks are in a text file.
A line break in a text file is a character just like any other, such as a letter or a space. The only difference is that it's a special character that tells the editor to drop down to a new line. This character is often referred to as a newline character. On Linux and macOS (which is Unix-based) there is just one invisible newline character at the end of a line, called a Line Feed (LF) character. Windows actually uses two invisible characters at the end of a line, a Carriage Return (CR) character followed by a Line Feed character. As such, Windows line endings are often referred to as CRLF. These characters come from way back in the day when monitors weren't yet common and you had to literally print out the output from a computer. If you want to learn more about this there is a good answer over on Stack Overflow.
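If you're curious which kind of line endings a file has, you can count them. Here's a small Python sketch that works on raw bytes. The function name and the sample grocery items are mine, purely for illustration.

```python
def count_line_endings(data):
    """Count Windows-style (CRLF) and Unix-style (LF) line endings in raw bytes."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # every CRLF also contains an LF
    return {"CRLF": crlf, "LF": lf}


windows_text = b"Milk\r\nEggs\r\nCucumbers"
unix_text = b"Milk\nEggs\nCucumbers"

print(count_line_endings(windows_text))  # prints {'CRLF': 2, 'LF': 0}
print(count_line_endings(unix_text))     # prints {'CRLF': 0, 'LF': 2}
```

To try it on a real file you could read it with `open(path, "rb").read()` and pass the result in.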
What we've done above, by pressing enter to type new lines into the file, is add a newline character to the end of the "Cucumbers" line. So Git is rightfully detecting that the line was modified, even though at first glance you might not have understood why, since the modification was an invisible newline character. Let's go ahead and commit this new change, remembering to stage our file first.
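You can reproduce the "Cucumbers" surprise with Python's built-in difflib module. To be clear, difflib is not the diff Git uses, and "Milk" is just a stand-in for whatever else is on your list, but comparing whole lines with their invisible endings included shows the same effect.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (removed, added) lines, comparing lines with their endings included."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    removed, added = [], []
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_lines[i1:i2])
        if tag in ("replace", "insert"):
            added.extend(new_lines[j1:j2])
    return removed, added


old = "Milk\nCucumbers"                    # "Cucumbers" has no newline after it
new = "Milk\nCucumbers\nCat Food\nLitter"  # now it does

removed, added = changed_lines(old, new)
print(removed)  # prints ['Cucumbers']
print(added)    # prints ['Cucumbers\n', 'Cat Food\n', 'Litter']
```

The old "Cucumbers" and the new "Cucumbers" followed by a newline are different strings, so the line counts as removed and added again.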
Now that we've committed our changes to the file we can take a look at the History tab again.
You can see above that Sourcetree is able to draw a line between our two commits in the history. Notice also that we have one new item in the commit details that we did not have on our first commit, a parent commit ID. That is how Sourcetree is able to draw the line graph from our second commit to our first commit. Note that I said from our second to our first rather than from our first to our second. That is because this graph is drawn backward from the most recent commit all the way back to the oldest one. Remember that our first commit did not have any information describing another commit, but our second commit does. Our second commit has a reference to its parent, our first commit.
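Here is a little Python sketch of how a history can be rebuilt by following parent references backward. The short IDs and commit messages are invented for illustration. Real IDs are much longer.

```python
# Each commit records only its parent's ID; None marks the very first commit.
# The short IDs and messages here are invented for illustration.
commits = {
    "a1b2c3": {"message": "Add grocery list", "parent": None},
    "d4e5f6": {"message": "Add cat supplies", "parent": "a1b2c3"},
}


def walk_history(commits, start_id):
    """Follow parent references from `start_id` back to the first commit."""
    history = []
    current = start_id
    while current is not None:
        history.append(commits[current]["message"])
        current = commits[current]["parent"]
    return history


print(walk_history(commits, "d4e5f6"))  # prints ['Add cat supplies', 'Add grocery list']
print(walk_history(commits, "a1b2c3"))  # prints ['Add grocery list']
```

Starting from the newest commit we can reach everything. Starting from the first commit we can't see the second one at all, because nothing in the first commit points forward.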
What this means is that technically our first commit is not aware of the second commit, but our second commit is aware of our first commit. That is why the graph is drawn backward. You might be wondering why this distinction is important. I'll dive deeper into that later, but for now you just need to know that each commit is only aware of the one commit that came before it. You also need to know that that reference to its parent is factored into the creation of that commit's own ID. Remember how we said that you can't modify any data about a commit without changing its ID? Well that includes the reference to its parent's ID. The implication of this fact is that Git commits essentially form a chain as they are created, with each commit having an immutable (unchangeable) reference to the commit that came before it.
This inherent chaining of commits makes your Git history very strong and reliable. If you somehow modified a commit's data you'd have to regenerate its ID which would cause any commits that came after it to lose reference to it. In order to update those references to point to the new commit, you'd have to regenerate those commits' IDs too. So on and so forth all the way down the chain. This is not a simple task but Git does provide a tool to do that called "rebase". Rebasing is an advanced topic that we will not be covering in this tutorial. I only bring it up here in order to tell you to avoid messing with it because you'll almost certainly end up getting yourself in trouble. Tons of developers use Git on a daily basis and rarely or never use rebase so it's not anything you need to worry about for the foreseeable future.
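To see that ripple effect in action, here is a toy Python sketch. The `toy_id` function is a simplified stand-in I made up (it hashes only a message and a parent ID, far less than a real commit contains), and the commit messages are invented too.

```python
import hashlib


def toy_id(message, parent_id):
    # Toy stand-in for a commit ID: hash the message together with the parent's ID.
    text = f"parent:{parent_id}\nmessage:{message}"
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def build_chain(messages):
    """Return the toy IDs for a chain of commits, oldest first."""
    ids = []
    parent = None
    for message in messages:
        parent = toy_id(message, parent)  # this commit becomes the next one's parent
        ids.append(parent)
    return ids


original = build_chain(["Add grocery list", "Add cat supplies", "Add snacks"])
edited_first = build_chain(["Add shopping list", "Add cat supplies", "Add snacks"])
edited_last = build_chain(["Add grocery list", "Add cat supplies", "Add treats"])

print([a != b for a, b in zip(original, edited_first)])  # prints [True, True, True]
print([a != b for a, b in zip(original, edited_last)])   # prints [False, False, True]
```

Changing the oldest commit changes every ID after it, while changing the newest commit leaves the earlier ones alone.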
I believe that about covers our introduction to Git. Thanks for reading! Please feel free to leave comments below with any constructive feedback or questions :)