Remove consecutive blank lines with sed before committing code
I work as an SRE, and part of my job is to write terraform Infrastructure as Code to maintain and develop the company’s cloud offers.
My colleagues and I were planning some code refactoring, and we wanted to raise the code-quality level. Part of this was to enforce some formatting rules. While terraform has the fmt command, which can recursively reformat code, it does not fix violations of the "no consecutive blank lines" rule we wanted to enforce.
Now, we’re all grown-ups here and we know this is not a task to be done by hand. Not only is it tedious, it is also error-prone. It has to be automated so that no git commit ever contains consecutive blank lines.
The sed code
GNU sed, or more commonly sed, is the tool of choice for substituting parts of files. And, yes, it can achieve this task. Here is how:
sed '/^\s*$/N;/^\s*\n$/D'
OK, this works. But to be honest, it’s the most involved sed command I have ever run, and I needed to understand what’s under the hood.
Split
Usually, I’m happy with sed 's/foo/bar/g'. It searches for foo and replaces every match with bar. Simple, done.
But here, there are two chained sed commands in a single call: /^\s*$/N and /^\s*\n$/D.
The first command searches for ^\s*$ and runs N on matches. Here ^\s*$ matches every line containing nothing but zero or more blank characters (spaces or tabs). The second pattern, ^\s*\n$, is pretty much the same thing with an extra newline at the end, and D is run on matches.
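To check which lines the first address selects, here is a small sketch assuming GNU sed (\s is a GNU extension). The sample input is made up for illustration, and the = command prints the line number of every matching line.

```shell
# Hypothetical sample: a text line, an empty line,
# a line holding only a space and a tab, then another text line.
blank_line_numbers=$(printf 'a\n\n \t\nb\n' | sed -n '/^\s*$/=')

# Both the empty line and the whitespace-only line are selected.
printf '%s\n' "$blank_line_numbers"
# prints:
# 2
# 3
```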
OK, I know what sed is looking for, but what are those capital letters N and D? Let’s have a look at the documentation!
Build the pattern space
The following excerpt comes from the sed man page:
Commands which accept address ranges
[...]
d Delete pattern space. Start next cycle.
[...]
D If pattern space contains no newline, start a normal new cycle as if the d command was issued. Otherwise, delete text
in the pattern space up to the first newline, and restart cycle with the resultant pattern space, without reading a
new line of input.
[...]
n N Read/append the next line of input into the pattern space.
[...]
What is this pattern space that both D and N talk about? After a bit of search-engine-fu, here is what I found:
Sed reads files line by line, and every line is placed in the pattern space to be processed. N appends a newline followed by the next line of input to that pattern space.
Great! So, /^\s*$/N basically runs through the file and kind of zips every blank line with whatever comes next inside the pattern space before passing it to the next part of the command.
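To make the "zip" visible, here is a minimal sketch, again assuming GNU sed. It uses the l command, which prints the pattern space in an unambiguous form (an embedded newline shows up as \n and the end is marked with $). The three-line input is made up for illustration.

```shell
# -n silences the automatic printing, so only the output of l is shown.
# For the blank second line, N pulls the third line into the pattern space.
pattern_space=$(printf 'a\n\nb\n' | sed -n '/^\s*$/{N;l}')

# printf is used instead of echo so the backslash is printed literally.
printf '%s\n' "$pattern_space"
# prints: \nb$
```

The blank line and the following b now sit in the same pattern space, joined by a newline.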
Edit the pattern space
And the pattern space is passed to /^\s*\n$/D.
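As \n is present, it matches a blank line followed by a newline and nothing else, that is, a blank line followed by an empty line. And D deletes the content up to and including that \n. In other words, it deletes the first blank line and restarts the cycle with what is left.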
I find it funny because, if I get this right, the whole command does not delete every empty line after an empty line. On the contrary, it deletes every empty line before an empty line. And it works just as well 🎉
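Here is a short end-to-end check of the whole command, assuming GNU sed and a made-up input containing a run of three empty lines and a single empty line.

```shell
# Input lines: a, three empty lines, b, one empty line, c.
squeezed=$(printf 'a\n\n\n\nb\n\nc\n' | sed '/^\s*$/N;/^\s*\n$/D')

printf '%s\n' "$squeezed"
# prints:
# a
#
# b
#
# c
```

The run of three empty lines collapses to one, and the single empty line between b and c is left alone.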
The git code
The end goal was to automate this for everyone working on a given git-versioned code base. And there is a tool for such a task, namely git hooks. You can find hook examples in the .git/hooks directory of every git repository.
Hooks are handy. They’re run by git at various steps of a command execution. For example, there is the pre-commit hook for exactly the current use case: run a command before committing. But there is a catch: they’re user-dependent, meaning they’re not themselves committed and shared. So every developer has to add this git hook every time they clone a repository.
Share the hook
We’re speaking about git here. The .git directory is not shared, but others might be. So, instead of writing the pre-commit script in .git/hooks, I suggest placing it in .githooks and adding the directory’s content to the repository.
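#!/usr/bin/env sh
# Format terraform code (skipped when terraform is not installed).
# "> /dev/null 2>&1" is used because "&>" is a bash-only shortcut.
if command -v terraform > /dev/null 2>&1
then
    terraform fmt -recursive
fi
# Remove consecutive blank lines in tracked terraform files
git ls-files "*.tf" | xargs sed -i '/^\s*$/N;/^\s*\n$/D'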
Done, everybody now has access to the hooks. All it takes now is to ask git to look for them:
git config core.hooksPath .githooks
Be aware that git only runs executable hooks. So, on clone, you’ll have to chmod +x the hooks you want to run and tell git to look in the .githooks directory. Still much better than copy/pasting from outdated documentation or hand-crafting your own hooks.
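The two post-clone steps can be wrapped in a small helper. This is only a sketch: the function name install_hooks and its optional directory argument are my own invention, and it assumes the hooks live in .githooks at the root of the repository.

```shell
# install_hooks [repo_dir]
# Marks every file in .githooks as executable, then points git at that directory.
# repo_dir defaults to the current directory.
install_hooks() {
    repo_dir=${1:-.}
    chmod +x "$repo_dir"/.githooks/* &&
        git -C "$repo_dir" config core.hooksPath .githooks
}
```

Calling install_hooks without an argument from the repository root is equivalent to running chmod +x and git config by hand.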
Conclusion
I’d say job done here.
Sed is a complex beast, but a bit of reading makes it crystal clear. And git has everything needed for the automation part.
The whole pre-commit hook may not be perfect, but it works flawlessly for now. I’m unsure whether it works well with non-GNU sed (like on macOS). But installing the correct tool for the job is another story.
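For what it’s worth, both \s and sed -i without a backup suffix are GNU extensions, so a more portable route is possible. The sketch below swaps sed for awk; it is not what the hook above uses, and the function name squeeze_blank_lines is made up. It reads from standard input and writes to standard output, so editing files in place would still need a temporary file.

```shell
# Collapse every run of blank (empty or whitespace-only) lines into one line.
squeeze_blank_lines() {
    awk '
        NF == 0 { if (seen_blank) next; seen_blank = 1 }  # skip all but the first blank line of a run
        NF > 0  { seen_blank = 0 }                        # a non-blank line ends the run
        { print }
    '
}

portable_result=$(printf 'a\n\n\n\nb\n\nc\n' | squeeze_blank_lines)
printf '%s\n' "$portable_result"
```

One difference from the sed version: this keeps the first blank line of a run, whereas the sed command keeps the last one.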
As always, if you find room for improvement on this process or simply want to say hi, please do so on Mastodon 👋