(This is the 2nd post in this series. Read the introduction here.)
The first thing to understand about Git, and Distributed Version Control Systems (DVCSs) in general, is the difference between it and the classic Centralized Version Control Systems (CVCSs).
CVCSs have serious disadvantages. For example, if the repository server goes down, no one can access a single piece of code, and some systems make you lock a file to work on it, denying access to others while you're coding. The most dangerous disadvantage is that if something bad happens to the repository server, the code might be lost altogether if there is no backup.
In DVCSs, each client holds not only the latest snapshot of the code but the whole repository. Every machine that clones a remote repository contains the entire repository. This means two people can work on the same file simultaneously. That may lead to a collision or conflict, but DVCSs let you and others resolve those conflicts (I'll talk about that later).
Let's install it.
$ sudo apt-get install git
How does Git work?
This is the classic approach, where each modification to the code at a certain point in time is recorded as a set of changes:
However, Git's approach is to think about your set of files as a single snapshot for a certain point in time:
So, you have a set of files for version 1 and another for version 2, instead of the original version 1 plus a set of deltas for the subsequent versions. This doesn't mean that Git makes a physical copy of each file for every new version. That would be inefficient. If a certain file wasn't modified, Git just stores a link to the previous, identical version of that file.
Everything in Git is checksummed before it is stored, so you simply cannot modify something in a Git repository without Git being aware of it. This is very important because, as we will see later, Git won't let you do illegal operations.
Git does the checksumming with SHA-1 hashes. Each hash is a string of 40 hexadecimal characters, calculated from the content being stored, and Git uses it to address each version the repository holds.
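To make the checksumming concrete, here is a minimal sketch using git hash-object, which prints the ID Git would assign to a piece of content. The sample strings are made up for illustration, and the 40-character length assumes Git's default SHA-1 object format.

```shell
# Hash the same content twice and a slightly different content once.
# Nothing is written to any repository; we only ask Git for the IDs.
first=$(printf 'hello\n' | git hash-object --stdin)
second=$(printf 'hello\n' | git hash-object --stdin)
other=$(printf 'hello!\n' | git hash-object --stdin)

echo "$first"    # 40 hexadecimal characters (default SHA-1 format)
echo "$second"   # same content, so the same ID as above
echo "$other"    # one extra character, so a completely different ID
```

Because the ID depends only on the content, any change to a stored file necessarily shows up as a different ID.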
What follows might be considered Git's "backbone," so pay attention! Your files can be in one of three states:
- Committed: Data is now safely stored in your local database.
- Modified: Data has been modified but isn't in your local database yet.
- Staged: Modified data that has been marked in its current version to go into the next commit.
Things to keep in mind:
- The Git directory is where Git stores all the metadata and the object database that holds your project. This is where your data goes when you clone a (remote) repository.
- The working directory is where data is decompressed out of Git's database, placed on disk and available to work with.
- The staging area is a simple file within the Git directory that stores information about what is going to be included in your next commit.
So, the main scenario is:
- You start to modify your code in the working directory.
- You send what you consider as "ready" to the staging area (as many times as needed).
- When you are done with all the changes, you encapsulate everything in a single commit. This action takes everything that has been staged and adds it as a single snapshot to the repository.
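The scenario above can be sketched end to end in a throwaway repository. The file name, commit messages, and identity below are placeholders chosen for this example; git status -s is used to show which state the file is in at each step.

```shell
# Throwaway repository so the example touches nothing real.
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email email@example.com

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "first version"           # committed

echo "<p>Welcome</p>" >> index.html        # modified in the working directory
modified=$(git status -s)
echo "$modified"                           # M in the second column

git add index.html                         # staged
staged=$(git status -s)
echo "$staged"                             # M in the first column

git commit -q -m "add welcome paragraph"   # committed again
clean=$(git status -s)                     # empty: nothing left to commit
```

You can repeat the git add step as many times as needed before committing; only what is staged ends up in the snapshot.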
Basic Git configuration
There are several things you can configure globally in Git. What follows are, perhaps, the most important ones; you can take a deeper look at the other variables later.
$ git config --global user.name "John Doe"
$ git config --global user.email email@example.com
$ git config --global core.editor vim
$ git config --global merge.tool vimdiff
$ git config --list
If you need immediate help there's enough information right there in the terminal:
$ git help <verb>
$ git <verb> --help
$ man git-<verb>
For further information, take a look at http://git-scm.com/book/en/Getting-Started-First-Time-Git-Setup
Branching might be Git's killer feature. Imagine your Git repository as a huge tree: the main line of code forms the trunk, and the code derived from it forms the branches growing out of that trunk.
Let's assume we have three files and we want to stage them. As I said before, this action checksums each one, stores them in the Git repository, and refers to each of them as a blob (represented here by the pink squares on the right).
$ git commit
Then, via git commit, we take a snapshot of that version (all the directories and files as they were staged). When we perform this action, Git creates a tree object, which is also checksummed and represents the snapshot of that version, as well as a commit object (also checksummed): a metadata object that points to the snapshot (the tree) so Git can recreate that specific version.
So now we have our first commit, but what about the subsequent commits? The following commits point to the previous commit.
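If you want to see these objects for yourself, git cat-file can report the type of anything in the object database. This is a small sketch in a temporary repository; the file name, messages, and identity are placeholders for the example.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "John Doe"
git config user.email email@example.com

echo "<h1>Home</h1>" > index.html
git add index.html                              # stores the file content as a blob
git commit -q -m "first commit"
first=$(git rev-parse HEAD)

commit_type=$(git cat-file -t HEAD)             # the commit object
tree_type=$(git cat-file -t 'HEAD^{tree}')      # the snapshot the commit points to
blob_type=$(git cat-file -t HEAD:index.html)    # the file content inside the snapshot
echo "$commit_type $tree_type $blob_type"

echo "<p>More</p>" >> index.html
git commit -q -a -m "second commit"
parent=$(git rev-parse 'HEAD^')                 # the second commit points back to the first

git cat-file -p HEAD                            # prints tree, parent, author, committer, message
```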
Up to this point, we know that every version in Git is a commit and that history is a series of linked snapshots. Keeping that in mind, let's talk about branching.
In this graphic, we have a series of commits and two branches: master, the default and main branch (trunk) of our repository, and another branch called testing. A branch is nothing but a pointer to a specific commit. Here, both branches point to the same commit; however, if a modification is committed on each of them (even to the same file), a bifurcation appears in the path. The graphic also shows another pointer called HEAD, which tells you which branch you are currently working on.
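The "branch is just a pointer" idea can be checked with a few commands. This is a sketch in a temporary repository; the explicit git symbolic-ref call only forces the initial branch to be named master, as in the article, regardless of your local default.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # assumption: name the first branch master
git config user.name "John Doe"
git config user.email email@example.com

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "first commit"

git branch testing                              # a new pointer, no files copied
master_id=$(git rev-parse master)
testing_id=$(git rev-parse testing)             # same commit ID as master
head_branch=$(git symbolic-ref --short HEAD)    # HEAD still names master

echo "$master_id"
echo "$testing_id"
echo "$head_branch"
```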
$ git checkout <branch>
Checkout is the way you choose a specific branch. Keep the last graphic in mind and follow the changes per line to reach the next graphic.
$ git checkout testing
$ vim index.html
$ git commit -a -m 'adding an index'
$ git checkout master
$ vim index.html
$ git commit -a -m 'adding an index on main branch'
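Here is a non-interactive sketch of the same sequence, with echo standing in for the vim edits, so you can watch the bifurcation appear. The repository is temporary and the file content is made up; git merge-base reports the commit where the two branches forked.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # assumption: name the first branch master
git config user.name "John Doe"
git config user.email email@example.com

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "first commit"
base=$(git rev-parse HEAD)
git branch testing

git checkout -q testing
echo "<p>from testing</p>" >> index.html
git commit -q -a -m "adding an index"

git checkout -q master
echo "<p>from master</p>" >> index.html
git commit -q -a -m "adding an index on main branch"

git --no-pager log --oneline --graph --all       # draws the fork
fork_point=$(git merge-base master testing)      # the commit both branches share
```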
Git's Branching and Merging
$ git merge
So far we know how to make branches, but how about integrating those changes into the main branch? Here's where Git's merging principle appears.
Assume the following scenario:
We have two branches, hotfix and iss53. Each of them was branched off master, and we want master to have the changes made on hotfix. Check out the following flow of commands and how our repository map changes:
$ git checkout master
$ git merge hotfix
Updating f42c576..3a0874c
Fast-forward
 index.html | 1 -
 1 file changed, 1 deletion(-)
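A fast-forward like this one is easy to reproduce. In the sketch below (temporary repository, made-up content), master has no new commits of its own, so merging hotfix only moves the master pointer forward and no merge commit is created.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # assumption: name the first branch master
git config user.name "John Doe"
git config user.email email@example.com

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "first commit"

git checkout -q -b hotfix                 # create hotfix and switch to it
echo "<p>fixed</p>" >> index.html
git commit -q -a -m "hotfix"

git checkout -q master
git merge hotfix                          # master simply catches up with hotfix

master_id=$(git rev-parse master)
hotfix_id=$(git rev-parse hotfix)         # both names now point to the same commit
```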
Now, let's remove hotfix and modify iss53.
$ git branch -d hotfix
Deleted branch hotfix (was 3a0874c).
$ git checkout iss53
Switched to branch 'iss53'
$ vim index.html
$ git commit -a -m 'finished the new footer [issue 53]'
[iss53 ad82d7a] finished the new footer [issue 53]
 1 file changed, 1 insertion(+)
Let's say your work on the iss53 branch is done and you want to merge those changes into master. You have two options: you can either merge directly or keep a cleaner history. What does that mean? Let's check out both of them.
Three way merging
$ git checkout master
$ git merge iss53
What happens is that, instead of just moving the pointer forward, Git creates a new snapshot from a three-way merge, as the following image illustrates:
Git determines the best common ancestor of the two branches; in this case it's C2, because it is the parent of both C4 and C3. Git then combines that ancestor with the two branch tips into a new merge commit. It's a fast way to get branches merged into master.
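The three-way merge can be reproduced with a small sketch. The repository is temporary, and the file names and commit messages are placeholders rather than the C2/C3/C4 commits from the image. After the merge, the new commit on master has two parents, and git merge-base reports their common ancestor.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # assumption: name the first branch master
git config user.name "John Doe"
git config user.email email@example.com

echo "<h1>Home</h1>" > index.html
git add index.html
git commit -q -m "common ancestor"
base=$(git rev-parse HEAD)

git checkout -q -b iss53
echo "<footer>new footer</footer>" > footer.html
git add footer.html
git commit -q -m "work on iss53"

git checkout -q master
echo "urgent fix" > hotfix.txt
git add hotfix.txt
git commit -q -m "work on master"

git merge -q --no-edit iss53              # both sides moved, so Git creates a merge commit

ancestor=$(git merge-base 'master^1' 'master^2')               # ancestor of the two parents
parent_count=$(git cat-file -p master | grep -c '^parent ')    # a merge commit has 2 parents
```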
Significant commits into master
Sometimes, you want your main trunk to hold only the significant changes, i.e. one commit (snapshot) for each product change. This works well with Trello, as we're going to see later.
First, you need to make your branch aware of all the changes to master.
$ git rebase master
Which leads to the following map scenario:
Git will rebase the whole iss53 branch so that it includes all the changes made on master. This might take you to a point where you have to resolve some conflicts, because there is always a chance that a teammate, or even you, has changed the same file in both branches, but I will talk about conflicts later on.
At this point, we have only told the iss53 branch that its parent is the latest commit on master. Nothing significant has happened yet, since the branch may still hold two, three, four, or more commits. To make it significant, we meld all of those commits, however many there are, into a single commit that can be merged into master once everything looks good.
$ git rebase -i master
pick 5d5366a c3
pick d03a833 c5

# Rebase c57580a..d03a833 onto c57580a
. . .
In the interactive rebase, Git asks what to do with each commit. Only the first commit should be picked; the subsequent ones can be squashed, meaning that Git keeps their changes but melds them into the first one. So, the list must look like this:
$ git rebase -i master
pick 5d5366a c3
s d03a833 c5
# s or squash have the same effect

# Rebase c57580a..d03a833 onto c57580a
. . .
Then, we must alter the proposed commit message...
# This is a combination of 2 commits.
# The first commit's message is:

c3

# This is the 2nd commit message:

c5

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# rebase in progress; onto c57580a
# You are currently editing a commit while rebasing branch 'iss53' on 'c57580a'.
#
# Changes to be committed:
#	new file:   c3
#	new file:   c5
... into this...
# This is a combination of 2 commits.
# The first commit's message is:

c3

# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
# rebase in progress; onto c57580a
# You are currently editing a commit while rebasing branch 'iss53' on 'c57580a'.
#
# Changes to be committed:
#	new file:   c3
#	new file:   c5
Here we kept only the first message, so the squashed commit carries the message of the first commit.
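If you'd like to try the squash without typing into an editor, the following sketch scripts the same edit. It relies on two assumptions that are not part of the workflow above: GNU sed (for sed -i) rewrites the todo list through the GIT_SEQUENCE_EDITOR variable, and GIT_EDITOR=true accepts the proposed combined message unchanged instead of trimming it by hand. The repository and the c1/c3/c5 files are throwaway placeholders.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # assumption: name the first branch master
git config user.name "John Doe"
git config user.email email@example.com

echo c1 > c1 && git add c1 && git commit -q -m "c1"

git checkout -q -b iss53
echo c3 > c3 && git add c3 && git commit -q -m "c3"
echo c5 > c5 && git add c5 && git commit -q -m "c5"
before=$(git rev-list --count master..iss53)    # commits on iss53 that master lacks

# Stand-in for editing the todo list by hand: keep "pick" on line 1
# and turn every later "pick" into "squash".
GIT_SEQUENCE_EDITOR='sed -i -e "2,\$s/^pick/squash/"' GIT_EDITOR=true \
  git rebase -q -i master

after=$(git rev-list --count master..iss53)     # the branch now holds a single commit
git --no-pager log --oneline master..iss53
```

Both files are still present after the rebase; only the history was condensed.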
Then, we can do this:
$ git checkout master
$ git merge iss53
$ git branch -d iss53
And we will have:
It's quite possible to get a conflict message right after a rebase or a merge. That happens when the same part of the same file was modified in both branches. First, run a status to find out where your conflicts are, because they might be in more than one file.
$ git status -sb
## master
UU index.html
You have to open your index.html file, fix the conflict, stage the file, and then either commit the changes (if the conflict came from a merge) or continue the rebase (if it came from a rebase).
<<<<<<< HEAD
<div id="contact">contact : firstname.lastname@example.org</div>
=======
<div id="contact">
 please contact us at email@example.com
</div>
>>>>>>> iss53
Here's where you must talk to your colleagues or check for an updated version of the documentation. What's enclosed between HEAD and the equal signs is what comes from the target branch (in our example, master), and what's enclosed between the equal signs and iss53 is the content on your branch.
Once the conflict is resolved, saved, and staged with git add index.html, the repository status will look like this:
$ git status -sb
## master
M  index.html
Tip: The line starting with ## names the current branch. Each file line starts with a two-character code: the first character describes the staging area and the second describes the working directory. M, A, R, and D stand for Modified, Added, Renamed, and Deleted respectively. U means updated but unmerged, so UU means both branches modified the file and it is still unmerged.
If you did this via merge, it's enough to do:
$ git commit
If you did this via rebase, you must type:
$ git rebase --continue
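The whole conflict cycle for the merge case can be sketched as follows. The repository is temporary and the contact lines are shortened placeholders; resolving the conflict by overwriting the file with echo stands in for editing it by hand.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # assumption: name the first branch master
git config user.name "John Doe"
git config user.email email@example.com

echo '<div id="contact">contact</div>' > index.html
git add index.html
git commit -q -m "first commit"

git checkout -q -b iss53
echo '<div id="contact">please contact us</div>' > index.html
git commit -q -a -m "reword contact on iss53"

git checkout -q master
echo '<div id="contact">contact : support</div>' > index.html
git commit -q -a -m "reword contact on master"

git merge iss53 > /dev/null || true       # stops with a conflict (non-zero exit status)
conflicted=$(git status -s)               # UU: both sides modified, still unmerged
echo "$conflicted"

# Resolve: write the final content, then stage the file to mark it as resolved.
echo '<div id="contact">please contact us</div>' > index.html
git add index.html
resolved=$(git status -s)                 # M in the first column: staged
echo "$resolved"

git commit -q --no-edit                   # concludes the merge
```

For a rebase, the last step would be git rebase --continue instead of git commit.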
Working with the remote server
The idea of a remote server is to have a central repository where everybody can push their work and pull what others have done. Don't think of your local repository as the final word on your code, since it might be volatile. That role fits the remote server much better.
Think about all the changes you have committed to the master branch in your local repository. Say there are four since you last updated your repository. This means that the remote master is 4 commits behind your local master.
In this case, the idea is to update the remote branch with the local (approved) content. It can be done by typing:
$ git push origin master
And then origin/master and master will be synchronized.
The opposite scenario is when your local branch(es) are outdated. So simply run:
$ git pull
It synchronizes only branches that are configured to track a remote branch. Otherwise you need to be more specific:
$ git pull origin iss53
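You can rehearse push and pull without a real server. In this sketch a bare repository in a temporary directory plays the role of the remote, and the directory names (server.git, local, teammate) are invented for the example.

```shell
work=$(mktemp -d)
git init -q --bare "$work/server.git"                                 # stands in for the remote server
git --git-dir="$work/server.git" symbolic-ref HEAD refs/heads/master  # assumption: default branch master

git clone -q "$work/server.git" "$work/local" 2>/dev/null             # cloning an empty repo prints a warning
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "John Doe"
git config user.email email@example.com

echo 1 > notes.txt && git add notes.txt && git commit -q -m "change 1"
git push -q origin master

git clone -q "$work/server.git" "$work/teammate"                      # a colleague clones the project

for n in 2 3 4 5; do
  echo "$n" >> notes.txt
  git commit -q -a -m "change $n"
done
ahead=$(git rev-list --count origin/master..master)     # local commits the remote lacks

git push -q origin master
in_sync=$(git rev-list --count origin/master..master)   # nothing left to push

cd "$work/teammate"
git pull -q                                              # the colleague catches up
teammate_head=$(git rev-parse master)
local_head=$(git -C "$work/local" rev-parse master)
```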
Of course, there are more advanced things you can do with remote repositories, but these are the most common ones.
As we have seen previously, the way to delete local branches is:
$ git branch -d <branch> [<branch2> [<branch3> [<branch4> [...]]]]
To delete branches remotely you can:
$ git push origin :<branch>
However, people often forget to clean up the local references to branches that have been deleted remotely. To do that, run:
$ git remote prune origin
This removes the stale remote-tracking references (such as origin/<branch>) and lists each one it prunes. Local branches you created yourself are not touched; delete those with git branch -d.
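To see pruning in action, here is a sketch where one clone deletes a branch on the remote and another clone is left with a stale reference. As before, a bare repository in a temporary directory plays the remote, and the directory names are invented for the example.

```shell
work=$(mktemp -d)
git init -q --bare "$work/server.git"                                 # stands in for the remote server
git --git-dir="$work/server.git" symbolic-ref HEAD refs/heads/master  # assumption: default branch master

git clone -q "$work/server.git" "$work/local" 2>/dev/null
cd "$work/local"
git symbolic-ref HEAD refs/heads/master
git config user.name "John Doe"
git config user.email email@example.com

echo 1 > notes.txt && git add notes.txt && git commit -q -m "first commit"
git branch iss53
git push -q origin master iss53

git clone -q "$work/server.git" "$work/teammate"        # this clone learns about origin/iss53

git push -q origin :iss53                               # delete iss53 on the remote
git branch -q -d iss53                                  # delete it locally too

cd "$work/teammate"
stale=$(git branch -r | grep -c 'origin/iss53')             # still listed here
git remote prune origin                                     # reports what it prunes
pruned=$(git branch -r | grep -c 'origin/iss53' || true)    # gone after pruning
```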
To see a list of all your local branches simply use:
$ git branch
and it will show that list.
So far, I've covered the most important things to know about working with Git. The next article in the series is intended to give a glimpse of how to use other tools to synchronize teams' work and, finally, a video with a simple example.
For further info: http://git-scm.com/book/en/Getting-Started-Installing-Git
This is a series of posts I originally wrote on my personal blog: Using git and other tools to synchronize teams’ work: Working with git
Disclaimer: Some pictures are not hosted in this site, but at http://git-scm.com. They are the intellectual authors of those images, and such graphical assets are licensed under the Creative Commons Attribution 3.0 Unported License. Since those images are taken directly from their site, no modification has been made to them.