We recently started preparing to open source trackr, our Java-based application that we’ve written about before. It wasn’t built with that in mind, so the commit history contains some files that shouldn’t be exposed to the public.

Until we introduced OAuth, the frontend static files were delivered from within the WAR file. We changed that with OAuth, but of course the files are still in the old commits. Since we are using a commercial template we cannot keep these files. So one objective was to remove the whole frontend from the commit history.

Some files contained passwords or URLs to our testing system which we don’t want to be published, either. That means editing these files in the commit history.

Important: What I am about to describe will alter your history and should only be used in very specific circumstances. After altering history you will not be able to push to the existing remote without --force, and you really should not do that. The Git book describes git filter-branch as the nuclear option.

In our case I created a fresh clone of our repository, rewrote its history, and then pushed the result to a new repository.
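A minimal sketch of that workflow, using throwaway local repositories so it can be run safely. The directory names (private, rewrite, public.git) and the remote name public are made up for illustration; in practice the last step would point at the URL of your new, empty repository.

```shell
set -eu
work=$(mktemp -d)

# Stand-in for the existing private repository with two branches.
git init -q "$work/private"
cd "$work/private"
git config user.email "dev@example.com"
git config user.name "Dev"
echo "trackr" > README
git add README
git commit -q -m "Initial commit"
git branch development

# 1. Work on a fresh clone so the original repository stays untouched.
git clone -q "$work/private" "$work/rewrite"
cd "$work/rewrite"

# A clone only creates a local branch for the default branch, so create
# local branches for the others before rewriting and pushing.
git branch -q development origin/development

# ... the history rewriting would happen here ...

# 2. Detach the clone from the old repository.
git remote remove origin

# 3. Push all local branches and tags to the new, empty repository
#    (a local bare repository stands in for it here).
git init -q --bare "$work/public.git"
git remote add public "$work/public.git"
git push -q public --all
git push -q public --tags

published=$(git --git-dir="$work/public.git" for-each-ref \
  --format='%(refname:short)' refs/heads | wc -l | tr -d ' ')
echo "branches published: $published"
```

Because the rewritten history goes to a new repository, no force push is involved and the old repository keeps its original history.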

I want to edit a file or remove a folder in my entire git history in all branches.

How?

Since the only two tools I knew for altering history in git were git commit --amend, which certainly would not help here, and git rebase --interactive, I started with rebasing. I searched for the commits I needed to edit with git log -Gword and marked them as edit in an interactive rebase. This worked, but the approach has two problems:

  1. It’s cumbersome and takes a long time. I had to resolve multiple conflicts when continuing the rebase.
  2. It does not work well when you have multiple branches that are merged at some point. We have a master and a development branch and I only rebased the development branch, which destroyed our merge history. Rebasing only master would not have worked either.
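For reference, this is roughly what the search step looks like. The sketch builds a tiny throwaway repository; the property values and commit messages are invented for the example. The -G option takes a regular expression and lists commits whose diff adds or removes a matching line, and --all extends the search to every ref instead of only the current branch.

```shell
set -eu
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"

# Two commits touching the same file; only the first one adds a
# line that contains the word "password".
mkdir -p src/main/resources
echo "db.password=hunter2" > src/main/resources/application.properties
git add src
git commit -q -m "Add config with password"

echo "db.url=jdbc:h2:mem:test" >> src/main/resources/application.properties
git commit -q -am "Add database URL"

# List the subjects of all commits whose diff adds or removes a line
# matching "password", limited to the one file we care about.
hits=$(git log --all --format=%s -Gpassword -- src/main/resources/application.properties)
echo "$hits"
```

The second commit is not listed because the matching line only appears there as unchanged context, not as an added or removed line.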

git filter-branch

So I started googling around. You will find git filter-branch fairly quickly, and there are a lot of good descriptions of how to use it. What git filter-branch did for me was the following: it executes a shell command in the working directory for every commit in every branch. If the command changes something, the commit is rewritten. There are other filters for changing the commit message, the author, or the files in the index.

How does it look?

git filter-branch --tree-filter 'some command' -- --all

Another great thing about git filter-branch is that you won’t get merge conflicts.
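To make that concrete, here is a small sketch on a throwaway repository with a merged development branch. The file name secret.txt and the commit layout are made up for the example. FILTER_BRANCH_SQUELCH_WARNING only silences the warning that newer Git versions print before running filter-branch.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"

echo "one" > a.txt
git add a.txt
git commit -q -m "First commit"
main=$(git symbolic-ref --short HEAD)   # master or main, depending on config

# A development branch that adds the file we want to get rid of.
git checkout -q -b development
echo "secret" > secret.txt
git add secret.txt
git commit -q -m "Development work"

# Diverge on the main branch, then merge development back in.
git checkout -q "$main"
echo "two" > b.txt
git add b.txt
git commit -q -m "Main work"
git merge -q --no-ff -m "Merge development" development

merges_before=$(git rev-list --merges --count HEAD)

# Run the command against every commit on every branch.
git filter-branch -f --tree-filter 'rm -f secret.txt' -- --all >/dev/null 2>&1

merges_after=$(git rev-list --merges --count HEAD)
# Count commits on the rewritten branches that still touch the file.
leftover=$(git rev-list --branches --count -- secret.txt)
echo "merges: $merges_before -> $merges_after, commits touching secret.txt: $leftover"
```

The check uses --branches rather than --all on purpose: filter-branch keeps backups of the old refs under refs/original, and those still point at the unrewritten history.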

Removing Passwords

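To remove passwords from a specific file, e.g. src/main/resources/application.properties, I tried to use sed. (The -i "" form used below is the BSD/macOS sed syntax; GNU sed expects -i without the empty argument.)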

git filter-branch --tree-filter 'sed -i "" s/password//g src/main/resources/application.properties' -- --all

Problem: what if the file does not exist in a commit (which happened in our case)? Then sed fails with sed: src/main/resources/application.properties: No such file or directory and filter-branch aborts.

But since we’re executing a shell script, why not include a check for the file?

git filter-branch --tree-filter 'if [ -f src/main/resources/application.properties ]; then sed -i "" s/password//g src/main/resources/application.properties; fi' -- --all

And that will replace password with nothing in application.properties in all commits. If the file is not present in a commit, the filter changes nothing in that commit.
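The following sketch runs the guarded filter on a throwaway repository and checks the result. The secret value hunter2 and the commit messages are invented for the example, and instead of sed -i it writes to a temporary file, which is an assumption on my part to keep the command working with both BSD and GNU sed.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"
file=src/main/resources/application.properties

# The first commit does not contain the properties file at all.
echo "trackr" > README
git add README
git commit -q -m "Initial commit"

mkdir -p src/main/resources
echo "db.password=hunter2" > "$file"
git add "$file"
git commit -q -m "Add config"

root_before=$(git rev-list --max-parents=0 HEAD)

# Only touch the file in commits where it exists; write via a temporary
# file so the same command works with BSD and GNU sed.
git filter-branch -f --tree-filter '
  f=src/main/resources/application.properties
  if [ -f "$f" ]; then
    sed "s/hunter2//g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  fi
' -- --all >/dev/null 2>&1

root_after=$(git rev-list --max-parents=0 HEAD)
# Commits on the rewritten branches whose diff still mentions the secret.
leaks=$(git log --branches --format=%h -Ghunter2 | wc -l | tr -d ' ')
content=$(git show "HEAD:$file")
echo "leaks: $leaks, file now reads: $content"
```

The root commit keeps its hash because neither its content nor its ancestry changed; every commit after the first rewritten one gets a new hash, even if its own files were not touched.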

Removing a Folder

Additionally I wanted to remove the folder that contained the frontend files. This again is very easy with git filter-branch; the -f flag keeps rm from failing in commits where the folder does not exist:

git filter-branch --tree-filter 'rm -rf src/main/webapp/WEB-INF/app' -- --all
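A commonly used alternative, not the approach taken above, is --index-filter. It rewrites the index directly instead of checking out every commit, which tends to be noticeably faster on larger repositories. The sketch below tries it on a throwaway repository; the file index.html and the commit messages are made up for the example. The --ignore-unmatch flag plays the same role as the file check earlier: git rm does not fail in commits where the folder is missing.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"

# The folder does not exist in the first commit.
echo "trackr" > README
git add README
git commit -q -m "Initial commit"

mkdir -p src/main/webapp/WEB-INF/app
echo "<html></html>" > src/main/webapp/WEB-INF/app/index.html
git add src
git commit -q -m "Add frontend"

# Remove the folder from the index of every commit on every branch.
git filter-branch -f --index-filter \
  'git rm -r -q --cached --ignore-unmatch src/main/webapp/WEB-INF/app' \
  -- --all >/dev/null 2>&1

# Commits on the rewritten branches that still touch the folder.
remaining=$(git log --branches --format=%h -- src/main/webapp/WEB-INF/app | wc -l | tr -d ' ')
files=$(git ls-tree -r --name-only HEAD)
echo "commits touching the folder: $remaining"
echo "files at HEAD: $files"
```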

Addendum

While I consider myself pretty well versed in git, this was the first time I really used git filter-branch. Found an error? Leave a comment. Did I do something stupid? Leave a comment! Not sure if you want to use it on your own repository? Copy the repository and try it out first!