P. COVES | Remove consecutive blank lines with sed before committing code

I work as an SRE, and part of my job is to write Terraform Infrastructure as Code to maintain and develop the company’s cloud offers.

My colleagues and I were planning some code refactoring, and we wanted to raise the code-quality level along the way. Part of this was to enforce some formatting rules. And, while terraform has the fmt command, which can recursively reformat code, it does not fix consecutive blank lines, a rule we wanted to enforce.

Now, we’re all grown-ups here and we know this is not a task to be achieved by hand. Not only is it tedious, it is also error-prone. It has to be automated so that no git commit ever contains consecutive blank lines.

The `sed` code

GNU sed, or more commonly just sed, is the tool of choice for substituting parts of files. And, yes, it can achieve this task. Here is how:

sed '/^\s*$/N;/^\s*\n$/D'

OK, this works. But to be honest, it’s the most involved sed command I have ever run and I needed to understand what’s under the hood.
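A quick check on throwaway input makes the effect visible. The sketch below assumes GNU sed and uses a made-up sample ("a", three blank lines, "b") rather than real terraform code:

```shell
# Made-up sample: "a", three blank lines, then "b".
# Assumes GNU sed (for the \s shorthand).
result=$(printf 'a\n\n\n\nb\n' | sed '/^\s*$/N;/^\s*\n$/D')

# Prints "a", a single blank line, then "b".
printf '%s\n' "$result"
```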

Split

Usually, I’m happy with sed 's/foo/bar/g'. It searches for foo, replaces every match with bar, simple, done.

But here, there are two chained sed commands in a single call: /^\s*$/N and /^\s*\n$/D.

The first command searches for ^\s*$ and uses N on matches. Here ^\s*$ matches every line containing only zero or more blank characters (spaces or tabs). And ^\s*\n$ is pretty much the same thing with an extra newline at the end, with D used on matches.
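To check which lines the first address selects, sed's = command can print the line number of every match. A minimal sketch on made-up input (line 2 is empty, line 3 holds a space and a tab), assuming GNU sed:

```shell
# -n silences the default output; = prints the number of each matching line.
blank_lines=$(printf 'a\n\n \t\nb\n' | sed -n '/^\s*$/=')

# Prints 2 and 3, one per line.
printf '%s\n' "$blank_lines"
```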

OK, I know what sed is looking for, but what are those capital letters N and D? Let’s have a look at the documentation!

Build the pattern space

The following code block comes from the sed man page:

Commands which accept address ranges
    [...]
    d      Delete pattern space.  Start next cycle.
    [...]
    D      If pattern space contains no newline, start a normal new cycle as if the d command was issued.  Otherwise, delete text
            in the pattern space up to the first newline, and restart cycle with the resultant pattern space,  without  reading  a
            new line of input.
    [...]
    n N    Read/append the next line of input into the pattern space.
    [...]

What is this pattern space both D and N speak about? After a bit of search-engine-fu, here is what I found:

Sed reads files line by line. And every line is inserted in the pattern space to be handled.
N appends the next line to the said pattern space.

Great! So, /^\s*$/N basically runs through the file and kind of zips every blank line with whatever comes next inside the pattern space before passing it to the next part of the command.
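The l command prints the pattern space in an unambiguous form (\n for an embedded newline, $ for the end), which makes the effect of N visible. A small sketch on made-up input, assuming GNU sed:

```shell
# On the empty line, N appends the next line ("b"),
# then l prints the resulting pattern space.
joined=$(printf 'a\n\nb\n' | sed -n '/^\s*$/{N;l;}')

# Prints the literal text \nb$ (empty line, newline, "b", end of pattern space).
printf '%s\n' "$joined"
```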

Edit the pattern space

And the pattern space is passed to /^\s*\n$/D.

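As \n is present, it only matches when a blank line has been joined with an empty line, that is, when the pattern space ends right after the newline. And D deletes content up to and including that \n. In other words, it deletes the first blank line and starts the cycle again with what is left.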
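The difference between d and D is easiest to see side by side. In this sketch (made-up input with exactly two blank lines, GNU sed assumed), d throws away the whole pattern space, so both blank lines disappear, while D only drops the first one:

```shell
sample='a\n\n\nb\n'   # "a", two blank lines, "b"

# Lowercase d deletes the whole pattern space: no blank line survives.
with_d=$(printf "$sample" | sed '/^\s*$/N;/^\s*\n$/d')

# Uppercase D deletes up to the first newline only: one blank line survives.
with_D=$(printf "$sample" | sed '/^\s*$/N;/^\s*\n$/D')

printf 'd:\n%s\nD:\n%s\n' "$with_d" "$with_D"
```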

I find it funny because, if I get this right, the whole command does not delete every empty line after an empty line. On the contrary, it deletes every empty line before an empty line. And it works just as well 🎉
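One way to check that claim is to make the two blank lines distinguishable. In the made-up input below, the first blank line holds two spaces and the second is truly empty; piping the result through sed -n l marks the end of each line with $ (GNU sed assumed):

```shell
# Line 2 is "  " (two spaces), line 3 is empty.
kept=$(printf 'a\n  \n\nb\n' | sed '/^\s*$/N;/^\s*\n$/D' | sed -n l)

# The surviving blank line shows as a bare "$":
# the earlier one, holding the spaces, is gone.
printf '%s\n' "$kept"
```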

The `git` code

The end goal was to automate this for everyone working on a given git-versioned code-base. And there is a tool for such a task, namely git hooks. You can find hook examples in the .git/hooks directory of every git repository.

Hooks are handy. They’re run by git at various steps of a command’s execution. For example, there is the pre-commit hook for exactly the current use case: run a command before committing. But there is a catch: they’re user-dependent, meaning they’re not themselves committed and shared. So every developer has to add this git hook every time they clone a repository.

We’re speaking about git here. The .git directory is not shared, but other directories can be. So, instead of writing the pre-commit script in .git/hooks, I suggest placing it in .githooks and adding the directory’s content to the repository.

#!/usr/bin/env sh

# Format terraform code
# (&> is a bashism; under a POSIX sh it backgrounds the command and the test always succeeds)
if command -v terraform > /dev/null 2>&1
then
    terraform fmt -recursive
fi

# Remove consecutive blank lines in terraform files
# (-z / -0 keep file names containing spaces intact)
git ls-files -z "*.tf" | xargs -0 sed -i '/^\s*$/N;/^\s*\n$/D'

Done, everybody now has access to the hooks. All it takes now is to ask git to look for them:

git config core.hooksPath .githooks

Be aware that git only runs executable hooks. So, on clone, you’ll have to tell git to look in the .githooks directory, and chmod +x the hooks you want to run if they were not committed as executable. Still much better than copy/pasting from outdated documentation or hand-crafting your own hooks.
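The per-clone setup can be wrapped in a tiny helper. This is a sketch only: enable_hooks is a hypothetical name, and it assumes the shared hooks sit in .githooks at the root of the clone:

```shell
# Hypothetical helper: enable the shared hooks in a fresh clone.
# $1 is the path to the clone (defaults to the current directory).
enable_hooks() {
    repo="${1:-.}"
    # Make every shared hook executable, in case the mode was not committed.
    chmod +x "$repo"/.githooks/*
    # Point git at the shared directory instead of .git/hooks.
    git -C "$repo" config core.hooksPath .githooks
}
```

Calling enable_hooks from the root of a clone (or enable_hooks path/to/clone) is then the only step a new contributor has to remember.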

Conclusion

I’d say job done here.

Sed is a complex beast but a bit of reading makes it crystal clear. And git has everything needed for the automation part.

The whole pre-commit hook may not be perfect but it works flawlessly for now. I’m unsure whether it works well on non-GNU sed (like on macOS) or not. But installing the correct tool for the job is another story.
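One clearly GNU-specific piece of the command is the \s shorthand. A possible starting point for other seds swaps it for the POSIX [[:space:]] class. The sketch below is untested on BSD or macOS sed, and two caveats remain: POSIX lets N quit without printing when there is no next line, so a trailing blank line could be lost there, and BSD sed expects an explicit backup suffix after -i:

```shell
# Same logic with the POSIX character class instead of the GNU-only \s.
# Untested on BSD/macOS sed; treat it as a starting point, not a guarantee.
portable=$(printf 'a\n\n\n\nb\n' | sed '/^[[:space:]]*$/N;/^[[:space:]]*\n$/D')

# With GNU sed this prints "a", a single blank line, then "b".
printf '%s\n' "$portable"
```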

As always, if you find room for improvement on this process or simply want to say hi, please do so on Mastodon 👋
