Git rebase, squash...oh my!
Happy little Trees and bumpy Roads
Your first contribution to an open source project can be a very rewarding experience. Once your feature, fix or enhancement as part of a Github pull request (PR) is merged, your brain is likely going to start looking to solve the next problem in open source land with your newly acquired git foo.
If you’ve already gone through this exercise though, you also know that mastering the git Swiss Army knife, and related code collaboration platforms (Github, etc.), might feel like a black art and overwhelming. This is especially true when things go south, e.g. the reviewer requests changes to your code.
But even in the case where your code contribution is fine, the reviewer(s) might ask you to perform additional steps on your commits before accepting (“merging”) them into the upstream (“target”) repository.
Here’s a short list of typical reviewer (or bot) comments you might see during a review:
- “You must sign all of your commits!”
- “Please squash your commits before merging.”
- “Can you please rebase your commits onto the latest changes?”
- “Please follow our contribution guidelines and add XYZ to the message title/body”
Let me tell you that I’ve been in the same situation several times, both as a contributor and reviewer, e.g. on the VMware Event Broker Appliance (VEBA) project.
I have witnessed many times how quickly an enthusiastic (first-time) contributor, often with little to no knowledge about git and software development, can desperately fail due to tooling, terminology or cryptic error messages.
Over time I established a set of patterns for my daily work with git. They help me to stay organized and protect me from common mistakes, e.g. pushing to the wrong branch/repository. This is even more important when I work on multiple PRs in parallel. Don’t ask how I know.
Of course, I adjust these patterns here and there, depending on the contribution guidelines/workflows of a project or team I’m working with. I hope you find them useful, especially when you are blocked or frustrated.
This post aims to address some common challenges a git novice might face during a contribution, e.g. via Github pull request. The advice and best practices given here are definitely opinionated, based on my own observations and the way I git things done, so take them with the typical grain of salt.
Basics or internals of git and related tooling and platforms, such as Github, won’t be covered. See the end of this post for some useful links with details on git(hub) concepts, workflows and internals.
Before you’re git-ting started
Always make sure you read and understand the project’s contribution guidelines and follow the provided issue/pull request templates before you start coding and open a pull request.
If the project does not provide any of these, first search for related issues. If your idea/fix has not already been discussed, open an issue to avoid lengthy discussions during a review on why you filed a PR. Upfront, clear and friendly communication is key in an open source project.
How I git Things done
The nice thing with git when dealing with repositories, aka remotes, is that it does not distinguish between a remote URL and a local folder. We can use this to our advantage here to show some of the concepts in action without creating a repository on Github.
Note that git and Github are separate tools: the git CLI lacks Github related primitives to work with issues, releases, pull requests, etc. I highly recommend installing the Github CLI gh as a productivity booster.
Let’s create a local demo repository awesome-project to understand the patterns explained in the subsequent sections.
We can verify that a commit was created with the powerful git log command.
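A minimal sketch of these two steps: creating the demo repository with a first commit and inspecting it with git log. The identity, the README content and the commit message are made up for illustration, and the scratch directory comes from mktemp so nothing touches your real projects. The git log flags are the ones the ohmyzsh alias gloga expands to.

```shell
set -eu
# Throwaway identity, so the sketch does not depend on your global git config.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"                                   # scratch directory

git init -q awesome-project && cd awesome-project
git symbolic-ref HEAD refs/heads/main               # name the initial branch "main"

echo "# Awesome Project" > README.md                # hypothetical file content
git add README.md
git commit -q -m "docs: add README"

# One line per commit, with branch/tag names and an ASCII graph of all refs.
git log --oneline --decorate --graph --all
```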
To avoid having to remember all the different command line flags, I heavily rely on $SHELL aliases, e.g. gloga which is a shortcut to the command above.
$SHELL and plugin managers, e.g. ohmyzsh, provide useful pre-defined git aliases. For clarity, I will use the full commands in this post though.
Let’s create a couple more commits to make this post a bit more realistic.
Ignore the text that is added to each commit with the -m flags for now. I’ll come back to them in the Commits section.
Forks in the Road
By default, you can’t make direct changes to a repository (unless you are an owner or maintainer). That’s why Github established a fork/pull request workflow for contributions.
A fork creates a point-in-time clone of the original ("origin" or "upstream") repository in your own account. Your changes are always made in that forked repository. Then you can create (open) a pull request on the origin.
The git CLI does not have a concept of forks. But we can mimic the workflow locally with a plain ol’ clone. Even though we won’t be able to cover the full origin/fork/local_fork_clone lifecycle with the following examples, they help to keep the complexity at a minimum, while focusing on useful patterns.
A nice thing is that git will remember where the clone was created from and automatically configures the clone’s main branch to track origin/main.
We can also quickly verify that the clone is identical to the source, i.e. origin.
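The local “fork” and the two checks described above could look roughly like this. The folder name awesome-fork and the commit message are placeholders chosen for this sketch, not taken from the original example.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"

# Stand-in for the original project with a single commit on main.
git init -q awesome-project
git -C awesome-project symbolic-ref HEAD refs/heads/main
git -C awesome-project commit -q --allow-empty -m "feat: initial commit"

# "Fork" it with a plain local clone.
git clone -q awesome-project awesome-fork && cd awesome-fork

git remote -v                                   # where the clone was created from
git rev-parse --abbrev-ref 'main@{upstream}'    # the branch that main tracks
git log --oneline --decorate --all              # HEAD, main, origin/main and origin/HEAD
```

All refs point at the same commit here, which is what “identical to the source” means in practice.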
Here, HEAD -> main tells you your current position (commit) in the current repository (well, folder). But also note the additional origin/main and origin/HEAD references. They show you the position of the HEAD and the main branch in the origin repository.
And there’s the first catch: in a real-world scenario, more commits might have already been added to the origin repository. Thus, it’s important to keep up with the remote’s changes to avoid issues at a later stage, e.g. merge conflicts due to concurrent changes on the same code path.
HEAD, branches and tags, like main or v1.0.2, are nothing but human-readable references, aka refs, in a git repository pointing to a specific commit SHA.
The topic of synchronizing with a remote will be covered in Keeping your Fork in Sync. But since one can easily get lost with all the different repositories, I’ll cover some important naming patterns first.
Naming Conventions
Remotes
The first step I perform after creating a clone is renaming the origin remote to something more meaningful, e.g. “upstream”.
To me, upstream/main reads more naturally. But feel free to pick whatever name you prefer to not get lost.
git clone provides a -o (--origin) flag to provide a custom remote name during a clone operation.
In addition, and to prevent us from accidentally pushing to the wrong remote, we can configure a dummy URL for git push. With this little trick, git will prevent us from pushing to the upstream repository.
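Both steps, the rename and the push guard, can be sketched as follows. The value DISABLED is an arbitrary dummy string picked for this example (any invalid URL does the job), and the repository names are placeholders.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-project
git -C awesome-project symbolic-ref HEAD refs/heads/main
git -C awesome-project commit -q --allow-empty -m "feat: initial commit"
git clone -q awesome-project awesome-fork && cd awesome-fork

git remote rename origin upstream               # main now tracks upstream/main
git remote set-url --push upstream DISABLED     # fetch URL stays, push URL becomes a dummy
git remote -v

# A push to upstream now fails early instead of reaching the real repository.
if git push upstream main 2>/dev/null; then
  echo "unexpected: push succeeded"; exit 1
else
  echo "push to upstream is blocked"
fi
```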
You might be wondering why this is needed since typically you would not have push permissions to the upstream anyways. Well, over time you might actually become a collaborator or even maintainer of a project. That’s great, but if you’re not careful your commit(s) end up in the wrong repository…especially when you don’t follow an intuitive remote naming strategy.
Here’s another tip: If we’d used a real remote repository, e.g. Github, as an example here, you would have cloned a fork to your local machine. Depending on the platform and tooling, the fork’s remote might show up as origin, whereas the “real” source repository (“upstream”) might not show up at all when you perform a git remote show. As usual, there’s an easy fix.
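One way to apply that fix is to register the source repository as a second remote. In this sketch local folders stand in for the platform: my-fork plays the role of your fork and awesome-project the role of the source repository. With a real project you would pass its URL to git remote add instead of a relative path.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-project                          # the "real" source repository
git -C awesome-project symbolic-ref HEAD refs/heads/main
git -C awesome-project commit -q --allow-empty -m "feat: initial commit"
git clone -q awesome-project my-fork                 # your fork on the platform
git clone -q my-fork local-clone && cd local-clone   # here origin points to the fork

git remote add upstream ../awesome-project           # use the project's real URL here
git fetch -q upstream                                # creates upstream/main locally
git remote -v
```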
Branches
For branch names I settled on using Github issue numbers, which has a couple of benefits:
- I’m forced to create an issue upfront which is a good thing anyways
- I can directly tell which branch belongs to which issue(s)
- I can easily clean up merged branches by pattern matching
When you work on multiple issues/branches in parallel, this approach pays off: it has saved me several times from accidentally pushing the wrong code. Whatever naming pattern you pick, as long as it’s consistent (and can be automated) you’re fine.
Here is an example using the gh CLI to create an issue and respective branch.
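A hedged sketch of that flow. The gh call needs an authenticated Github CLI and a Github remote, so it is only shown as a comment; the issue title, body and the number 31 are hypothetical values used to name the branch.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-fork && cd awesome-fork
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "feat: initial commit"

# Against a real Github repository you would create the issue first:
# gh issue create --title "Add feature X" --body "Short description of feature X"
issue=31                                  # hypothetical number reported for the new issue

git checkout -q -b "issue-${issue}"       # create and switch to the issue branch
git checkout -q -                         # jump back to the previous branch
git branch --list
```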
git checkout - brings you back to your previous branch.
Once my branches are merged, I can easily clean them up too.
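The pattern-based cleanup might look like the sketch below. It assumes branches are named issue-<number> and that merged PRs show up as merge commits on main; the merge is simulated locally, and the exact command in your setup may differ.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-fork && cd awesome-fork
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "feat: initial commit"

git checkout -q -b issue-31
git commit -q --allow-empty -m "feat: add feature X"
git checkout -q main
git merge -q --no-ff -m "Merge branch 'issue-31'" issue-31   # simulate the merged PR
git checkout -q -b issue-32                                  # work that is not merged yet
git commit -q --allow-empty -m "feat: add feature Y"
git checkout -q main

# Delete every fully merged branch that matches the naming pattern.
git branch --merged main --list 'issue-*' --format='%(refname:short)' \
  | xargs git branch -d
git branch --list                                            # issue-32 is still there
```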
The -d flag tells git to only delete branches which have been fully merged. Note that this command might not work for you if the repository does not create merge commits for PRs (e.g. when PRs are squashed or rebased on merge). In that case (or if you really want to get rid of your branch for whatever reason), use the brute-force way git branch -D instead.
Keeping your Fork in Sync
Over time, especially in very active projects, the upstream repository will be ahead of your fork in terms of commits. Thus, you need to keep up with these upstream changes and synchronize them into your fork.
Let’s create a couple more commits in our example awesome-project to make this concrete.
Now we need to switch back to the fork and bring it in sync with upstream.
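A sketch of the situation, assuming the remote was named upstream at clone time via -o. The commit messages are invented; the point is only that git fetch updates upstream/main while the local main stays where it was.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-project
git -C awesome-project symbolic-ref HEAD refs/heads/main
git -C awesome-project commit -q --allow-empty -m "feat: initial commit"
git clone -q -o upstream awesome-project awesome-fork

# Upstream moves on after the fork was created.
git -C awesome-project commit -q --allow-empty -m "fix: upstream bug fix"

cd awesome-fork
git fetch -q upstream                          # updates upstream/main, leaves main alone
git log --oneline --decorate --graph --all     # HEAD -> main is behind upstream/main
git rev-list --count main..upstream/main       # number of commits main is missing
```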
In the above commands, I only fetched the upstream changes (commits) into the remote-tracking branch upstream/main, but did not integrate them into my local branch yet. You can see this because our current position, i.e. HEAD, is still pointing to main at commit SHA aa08aaf.
My intention here is to train your muscle memory with an alternative to the usual git pull to get the job done.
If you’re wondering about the HEAD thing, think of it as the “you are here” marker on a map to indicate the current position. As you traverse through the map (commits and refs), HEAD will move accordingly.
Enter Rebase
Rebasing is an amazing, but often misunderstood, concept in git. It allows you to move refs and even commits onto another commit, branch, or any other ref.
Concerning our example above, we want to move the current position of our forked main branch (HEAD) to match its upstream/main counterpart. Since we have a linear history without any conflicting changes between the two repositories, bringing main in sync with upstream/main is as simple as:
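As a self-contained sketch, with the same assumptions as before (a remote called upstream, invented commit messages): since the local main has no commits of its own, the rebase just moves main forward to upstream/main.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-project
git -C awesome-project symbolic-ref HEAD refs/heads/main
git -C awesome-project commit -q --allow-empty -m "feat: initial commit"
git clone -q -o upstream awesome-project awesome-fork
git -C awesome-project commit -q --allow-empty -m "fix: upstream bug fix"

cd awesome-fork
git fetch -q upstream
git rebase -q upstream/main                    # main catches up with upstream/main
git log --oneline --decorate --all             # main and upstream/main on the same commit
```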
Because you will perform code changes in dedicated branches, i.e. not main (or the corresponding “primary” branch in a project), merge or rebase conflicts should never arise when syncing main. Due to the linear commit history, git rebase can simply move the HEAD to the desired ref.
My recommendation: rebasing instead of git merge should become your default way of synchronizing (integrating) changes from remote into local branches.
You will see the real strength of git rebase in the Oh my git! section below…
Commits
What goes in a Message?
A commit represents an atomic change in the append-only history in git, like in a log. Thus, it’s important that a well crafted commit includes a meaningful but brief description of the included changes.
Even though the tooling might not enforce a standard, e.g. character limit, etc., there are certain commit best practices you should know and follow. I usually point newcomers to this great post: How to write a commit message.
Read it? Great, let’s move on…
You might have also wondered about the -m flags and text like "feat:" used in the earlier examples when executing git commit.
Let’s take this example: git commit -s -m "feat: add feature X" -m "Closes: #34"
The -m flag can be used multiple times and replaces the interactive editor which otherwise comes up to create a commit title and message. Even multi-line strings are supported (just keep typing after the first ").
Depending on the project you’re contributing to, it might use certain patterns in a commit title or body to craft a CHANGELOG. Here "feat:" is recognized and the commit will be highlighted in a “Feature” section if the project uses this. Take a look at this example from the VEBA project.
If you include a keyword like "Closes: #34" in the commit message body and open a pull request, Github will try to automatically link the specified issue (here #34) to the PR so it gets automatically closed after the PR is merged. Just like with the aforementioned prefixes, these keywords might be used in a CHANGELOG or release note.
Lastly, the -s flag will add a Signed-off-by footer which is a good practice to trace back the author of a patch and is often required in projects. Don’t confuse this with digitally signing your commits, though.
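To see what the example command actually records, here is a small sketch that runs it in a scratch repository and prints the resulting message. The identity and the file feature.txt are invented for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-fork && cd awesome-fork
git symbolic-ref HEAD refs/heads/main

echo "feature X" > feature.txt                 # hypothetical change
git add feature.txt
git commit -q -s -m "feat: add feature X" -m "Closes: #34"

git log -1 --format=%s                         # the title (first -m)
git log -1 --format=%B                         # full message incl. body and sign-off footer
```

The Signed-off-by line is built from the committer name and email, so make sure both are configured correctly before using -s in a real project.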
When to commit?
Perhaps the biggest question is when to create a commit and how to break them up into individual chunks.
The latter really depends on the type of work and project guidelines (if any). In my opinion, everything that belongs together (“atomic”) goes into the same commit, including documentation updates. This makes it simpler to revert or cherry-pick changes.
If you work on a larger pull request, likely this can be broken down into sub-tasks which nicely map to individual commits. Alternatively, the PR itself might have to be broken up into multiple PRs with smaller (single) commits.
Creating commits sounds easy, right? Well, often you don’t know upfront the scope of your work and how (when) to break them up. It might also be that the maintainers ask you to do so after the fact (see the following section).
If you’re paranoid like me, you might actually want to commit often and push to a remote (fork) as a cheap backup in case you badly messed up on your local machine…or your disk dies suddenly and the backup (you do backups, don’t you?) is from two days ago.
Long story short, in most cases you simply don’t know in advance what the final commit history will be. Your (temporary) commit history might actually look as messy as mine.
Let me tell you that this is the norm and not bad at all! As usual with git, there’s a couple of ways out of this mess. My preferred one is using git reset once my work is ready for a PR.
Here’s how to fix it based on the commit history above in our fork and the branch issue-31:
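Since the original history is not reproduced here, the sketch below builds a hypothetical messy branch with three "wip" commits and then collapses them with a plain (mixed) git reset. Resetting to main assumes the branch was created from main; the final commit message is made up.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-fork && cd awesome-fork
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "feat: initial commit"

git checkout -q -b issue-31
for n in 1 2 3; do                             # three throwaway work-in-progress commits
  echo "step $n" >> awesome_script.sh
  git add awesome_script.sh
  git commit -q -m "wip $n"
done

git reset -q main                              # commits are gone, the file content stays
git status --short                             # awesome_script.sh is no longer committed
git add awesome_script.sh
git commit -q -s -m "feat: add awesome script" -m "Closes: #31"
git log --oneline main..issue-31               # one clean commit instead of three
```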
The reset command is super convenient to revert committed changes as if they’d never been committed - but without losing the actual changes. They simply get marked as unstaged in your working directory again. Think of reset as the reverse action of commit.
If you want to throw the changes away instead, use git checkout . (note the trailing dot). This will discard all remaining changes to tracked files in your local git repository. checkout also accepts a path/file name if you want to be more specific in what gets discarded.
The reset command also comes in handy if you want to rearrange what goes into a commit. Say you want awesome_script.sh not in the first but second commit. Simply (soft) reset your commits to a previous state (commit) as shown above and create individual commits again with the desired contents.
git reset has a --hard option which will discard all changes. If this is not what you wanted, git reflog is there to help you get back to earlier commits (changes that were never committed are gone, though).
Oh my git!
Finally, I am going to cover some of the typical challenges you might run into during your first pull request…or in case you keep forgetting this stuff.
The following sections are broken up into typical conversations you might experience during a PR review.
“How do I commit during reviews?”
Depending on the source code management (SCM) system you’re using, e.g. Github.com, addressing review comments can quickly become an issue - at least for the reviewer.
Most contributors would make the change and then force-push to the pull request instead of adding and pushing review commit(s). This way of addressing review comments means more work for reviewers, especially on large pull (merge) requests.
Depending on the SCM you use, commits from force-pushes might not be rendered nicely so that the reviewer often sees all instead of the incremental changes.
Ideally the contributor creates additional commits during a review because many SCMs have support for incremental changes. Even if your SCM does not support such a functionality, the reviewer can easily inspect the individual commits if needed.
Once the review is done, either all commits are automatically squashed and merged into one commit (depending on the SCM/repository setup) or one manually performs a rebase operation (see further below for ways to do this).
Of course, in that case you would still need to eventually force-push your changes. So are we back to the beginning?
Well, yes and no. On Github the reviewer can inspect force-push changes and would then see an empty diff, signaling that nothing has changed since the last review commit (see tip above). Furthermore, I expect the SCM vendors and platform providers to improve the user experience for this common scenario.
Scenario with multiple Review Commits
Problem: You are going through a lengthy review and want to easily merge your review commits eventually.
Solution: During the review you create additional commits using the --fixup flag to easily squash them once the review is done.
Here’s our new history.
Notice how the git CLI automatically added the fixup! prefix to the review commits to indicate that these commits shall be squashed into the corresponding reference commit (96c6889).
Once the review is done and you are asked to squash your commits, git will do most of the work for you.
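The whole fixup workflow, from review commits to the squashed result, could be sketched like this. File names and messages are invented, and setting GIT_SEQUENCE_EDITOR=true is just a trick to accept the prepared todo list unattended; interactively you would run git rebase -i --autosquash main and simply save the editor.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-fork && cd awesome-fork
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "feat: initial commit"

git checkout -q -b issue-31
echo "v1" > feature.txt && git add feature.txt
git commit -q -s -m "feat: add feature X"
target="$(git rev-parse --short HEAD)"         # the commit under review

# Two review rounds, each recorded as a fixup of the reviewed commit.
echo "v2" > feature.txt && git commit -q -a --fixup "$target"
echo "v3" > feature.txt && git commit -q -a --fixup "$target"
git log --oneline main..HEAD                   # target plus two "fixup! ..." commits

# Review done: fold the fixups into their target.
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash main
git log --oneline main..HEAD                   # a single commit again
```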
“You forgot to sign your Commit(s)”
Often you will be greeted by a friendly bot stating that you forgot to sign one or more commits. This could also happen after a rebase/squash (see below), where you did not sign the resulting commit.
Scenario with one Commit
Problem: Commit 96c6889 in branch issue-31 has not been signed-off and is part of a PR in the upstream repository.
Solution: Amend the commit and force-push to the fork (so it gets reflected in the associated PR).
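A sketch of this fix, with a local bare repository (fork.git, a made-up name) standing in for your fork on Github and origin as the name of its remote. The commit SHAs will of course differ from the ones mentioned in the text.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q --bare fork.git                    # stand-in for your fork on the platform
git init -q work && cd work
git symbolic-ref HEAD refs/heads/main
git remote add origin ../fork.git
git commit -q --allow-empty -m "feat: initial commit"
git push -q origin main

git checkout -q -b issue-31
echo "feature X" > feature.txt && git add feature.txt
git commit -q -m "feat: add feature X"         # oops, no sign-off
git push -q origin issue-31                    # the PR would be opened from this branch

git commit -q --amend --no-edit -s             # new commit: same content plus sign-off
git push -q --force-with-lease origin issue-31 # a plain push would be rejected here
git log -1 --format=%B
```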
Remember: you cannot change existing commits in git. Thus under the hood, amending creates a new commit (a2817f0) with the exact same content/message and replaces the previous one. This rewrites the history and thus would be rejected during a regular git push to a remote. The flag --force-with-lease instructs git to overwrite the remote’s history anyway, but only if the remote branch is still in the state you last fetched.
Options with --force in the name can be dangerous as git won’t get in your way to protect you. Here we use the preferred option --force-with-lease, which I highly recommend using instead for various reasons.
Scenario with multiple Commits
Problem: Multiple (or all) commits in a branch are not signed off.
Solution: Perform a rebase against the parent branch (main) to change all commits in the current branch (issue-31). Use --exec to execute arbitrary commands, such as signing off by amending.
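A possible form of that command, shown on a hypothetical branch with three unsigned commits. The exec string is an assumption about what "signing off by amending" looks like in practice; it runs once after every replayed commit.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-fork && cd awesome-fork
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "feat: initial commit"

git checkout -q -b issue-31
for n in 1 2 3; do
  echo "change $n" >> feature.txt && git add feature.txt
  git commit -q -m "feat: change $n"           # none of these is signed off
done

# Replay every commit since main and amend each one with a sign-off.
git rebase -q --exec 'git commit -q --amend --no-edit -s' main
git log --format=%B main..HEAD                 # every message now ends with Signed-off-by
```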
Add the -i option to the rebase command above and an interactive dialog will open where you can skip certain commits (and do other fancy rebase stuff).
“Please squash your Commits”
Problem: Some repositories might have been set up in a way that they can’t (or won’t) squash (“collapse”) multiple commits of a PR into a single one during merging. The contributor must then squash these commits and (force) push again before merging.
Continuing with our signed-off commits from the example above, there are two ways you can achieve this.
Option 1: git rebase
With git rebase you can interactively perform a squash operation. In the editor that opens, replace pick with s (short for squash) for all but the first commit and exit the editor (e.g. :wq in vim).
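The sketch below replays that interactive session without a human at the keyboard. Normally you would just run git rebase -i main and edit the todo list by hand; here a sed one-liner (my own stand-in for the manual edit) rewrites every pick after the first into squash, and GIT_EDITOR=true accepts the combined commit message as is. Branch content and messages are invented.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-fork && cd awesome-fork
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "feat: initial commit"

git checkout -q -b issue-31
for n in 1 2 3; do
  echo "change $n" >> feature.txt && git add feature.txt
  git commit -q -s -m "feat: change $n"
done

# The todo list starts as three "pick <sha> <title>" lines; the sed call
# turns lines 2 and following into "squash", i.e. pick / squash / squash.
GIT_SEQUENCE_EDITOR="sed -i.bak -e '2,\$s/^pick /squash /'" GIT_EDITOR=true \
  git rebase -q -i main
git log --oneline main..HEAD                   # one commit left
```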
Upon exit, a new editor window will pop up allowing you to specify the title and message for the resulting (single) commit. Squashing preserves commit details which you can use to craft a proper message. If you don’t want to reuse the previous commit details, use f (short for fixup) instead of s in the step above.
Once you’re done don’t forget to (force) push to your fork/PR.
Did you know that vim supports multi-line editing with CTRL-V so you don’t have to replace every single pick line?
Option 2: git reset
This is my preferred option in this case, as it’s usually the faster approach, but your mileage may vary.
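A minimal sketch of the reset variant, assuming the branch was created from main. It uses --soft so the changes stay staged and can be committed right away; a plain git reset followed by git add works just as well. The final message is a placeholder.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-fork && cd awesome-fork
git symbolic-ref HEAD refs/heads/main
git commit -q --allow-empty -m "feat: initial commit"

git checkout -q -b issue-31
for n in 1 2 3; do
  echo "change $n" >> feature.txt && git add feature.txt
  git commit -q -s -m "feat: change $n"
done

git reset -q --soft main                       # drop the three commits, keep changes staged
git commit -q -s -m "feat: add feature X" -m "Closes: #31"
git log --oneline main..HEAD                   # one commit with a freshly typed message
# On a real PR branch you would now force-push, e.g.:
# git push --force-with-lease origin issue-31
```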
The only drawback I can see here is that you have to retype your commit title/message, unlike with squash which preserves these details.
“Your PR needs a Rebase”
Problem: The repository owners configured the target (“base”) branch to only allow merges from up-to-date branches. In most cases though, this message will come up when conflicting changes were introduced to the base branch (e.g. main) while you were working on your PR.
Solution: The actual fix to this problem is similar to what we did in the squash scenario above. We continue our example with branch issue-31 and execute the following steps within that branch.
The above history tells you that your HEAD is still originating from main (which is stale, i.e. behind upstream/main). We could first sync the recent changes from upstream/main into main, but this is not required for this exercise.
Instead, we move (rebase) our HEAD (commit) onto upstream/main so our branch is up to date again.
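Here is how that could play out in a scratch setup. The upstream change (a README) and the feature commit are invented and deliberately touch different files, so the rebase goes through without conflicts.

```shell
set -eu
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME" GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
cd "$(mktemp -d)"
git init -q awesome-project
git -C awesome-project symbolic-ref HEAD refs/heads/main
git -C awesome-project commit -q --allow-empty -m "feat: initial commit"
git clone -q -o upstream awesome-project awesome-fork
cd awesome-fork

# Work on the issue branch in the fork ...
git checkout -q -b issue-31
echo "feature X" > feature.txt && git add feature.txt
git commit -q -s -m "feat: add feature X"

# ... while upstream/main moves on with an unrelated change.
echo "docs" > ../awesome-project/README.md
git -C ../awesome-project add README.md
git -C ../awesome-project commit -q -m "docs: add README"

git fetch -q upstream
git rebase -q upstream/main                    # replay issue-31 on top of upstream/main
git log --oneline --decorate --graph --all
```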
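If the rebase runs into conflicts you cannot resolve, you can always bail out with git rebase --abort.
After we verified that everything worked, we can push again. Keep in mind that the rebase created new commits on top of upstream/main: if the branch was never pushed before, a plain git push is enough, but if it already backs a PR, you need --force-with-lease again.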
“Please adjust your Commit Message”
Problem: You might have forgotten to include some details, like a prefix or issue reference in one or more commit messages.
Scenario with one Commit
Solution: Amend your commit.
Scenario with multiple Commits
Solution: Perform an interactive rebase. Let’s continue with our previous issue-31 example branch which for this exercise contains two commits that need to be changed.
An editor shows up again where you replace pick with r (short for reword) on the commits whose message you want to adjust.
After you exit this window, for each selected commit a new editor window opens where you can perform your changes. Then perform a force-push as in the other examples.
Conclusion
I have seen so many enthusiastic newcomers getting stuck or frustrated in an open source project due to tooling or technical terms, not too seldom ending up dropping the ball on a contribution. This has nothing to do with how smart you are, nor are these tools and platforms only for “real developers”.
I hope the real-world examples, my very opinionated best practices and references below help you to easily navigate git and its huge ecosystem, e.g. the Github platform.
I want YOU to shine as a leading example and inspire even more people to contribute their ideas, documentation fixes or other forms of improvements to the world of open source. No matter how “big” your contribution is, every useful commit counts!
And now, git your feet wet.
References
Further Reading
- How to Write a Git Commit Message
- Interactive Git Cheatsheet
- Official Github Guides
- The Pro Git Book (free)
- Mastering Git Tutorials
- The legendary Oh Shit, Git!?!
- On undoing, fixing, or removing commits in git
- Step by step Tutorial Contributing to the VEBA Project
Credits
Thanks to Robert Guske for your thorough review! Title photo by Roman Synkevych on Unsplash