Verbalex - Regex with the reader and writer in mind
TLDR; #
I created an Elixir library for verbally expressing and composing regular expressions. This is only a short post demonstrating what it can do, but if you’re not in the mood for reading you can find it here!
Why did I write this library? #
For a few years now I’ve maintained a chatbot project which, to my surprise, has actually gained a decent number of users. It was my first time writing Ruby, which is partly why the repo is private - it’s best for everybody’s wellbeing.
It’s a small project, so when coming back to it to address a bug or add a feature it doesn’t take long to regain my bearings. However, servicing the regular expressions it relies on has always been a headache.
Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.
Regular expressions are notoriously write once, read once. Especially the ones I hacked together and hotfixed a hundred times over early in my programming career.
This problem has been largely addressed by solutions like Simple Regex and Verbal Expressions. While exploring a re-write of Rosterbot (the chatbot mentioned above), I went searching for the Elixir implementation, only to find it hadn’t been maintained since Elixir v0.10.1. I saw an opportunity, so I took it - thanks to Max Szengel for laying down the groundwork!
How to use Verbalex #
Verbalex is essentially a port of Verbal Expressions, but I decided not to implement it function-for-function.
It focuses on composing the regular expressions themselves, and leaves Elixir’s Regex module to do the heavy lifting of actually using them.
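As a rough sketch of that division of labour, here is what the Regex side looks like once a pattern string exists. The pattern below is hand-written as a stand-in for whatever a builder pipeline would produce, so nothing here depends on Verbalex itself; only standard Regex functions are used.

```elixir
# Stand-in for a pattern string produced by a builder pipeline
# (hand-written here, purely for illustration).
pattern = "[[:digit:]]+"

# Regex.compile!/1 turns the string into a %Regex{} struct,
# raising if the pattern is invalid.
regex = Regex.compile!(pattern)

# From here on, everything is plain Regex.
has_digits = Regex.match?(regex, "order 42") # true
first = Regex.run(regex, "order 42")         # ["42"]
all = Regex.scan(regex, "7 of 9")            # [["7"], ["9"]]
```

The builder only ever deals in strings, and compiling, matching, and scanning stay with the standard library.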
Let’s see how it fares on a classic example.
A Regular Expression for Emails #
Matching email addresses is a pretty common regex task, and one you’ll find a number of implementations for… let’s go ahead and add to the pile. We’ll interpret this example from regular-expressions.info:
~r/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/
“This regular expression, I claim, matches any email address.”
Good enough for me. Let’s do it.
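One detail worth checking before porting it: the character classes in that pattern only list upper-case letters, so as written in Elixir it relies on the case-insensitive `i` modifier to match lower-case addresses. A quick sketch with made-up addresses shows the difference:

```elixir
# The pattern exactly as written above: upper-case letters only.
upper_only = ~r/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/

# The same pattern with the case-insensitive modifier added.
case_insensitive = ~r/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/i

upper_hit = Regex.match?(upper_only, "JANE@EXAMPLE.COM")       # true
lower_miss = Regex.match?(upper_only, "jane@example.com")      # false
lower_hit = Regex.match?(case_insensitive, "jane@example.com") # true
```

A character class such as `[[:alnum:]]` covers both cases, so a pattern built from named classes does not need the modifier.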
First up, we’ll put in the word boundaries bookending the expression:
alias Verbalex, as: Vlx

def email_regex do
  ""
  |> Vlx.word_boundary()
  # loading...
  |> Vlx.word_boundary()
end
Easy enough! Now, we could do the rest of our expression in this function, but to demonstrate Verbalex using composition let’s break it down into sections. An email consists of two main parts:
- Local-part
- Domain
We’ll define them both as private functions for email_regex/0:
defp local_part(before) do
  local =
    ""
    |> Vlx.anything_in(class: :alnum, string: "._%+-")
    |> Vlx.one_or_more()

  "#{before}#{local}"
end

defp domain(before) do
  domain =
    ""
    |> Vlx.anything_in(class: :alnum, string: ".-")
    |> Vlx.one_or_more()
    |> Vlx.then(".")
    |> Vlx.anything_in(class: :alpha)
    |> Vlx.occurs_at_least(2)

  "#{before}#{domain}"
end
In order to include these functions in our email address pipeline, each one accepts the regex string built so far (before) and concatenates its own section onto it. This is much the same way Verbalex is implemented under the hood. Also worth noting: as you can see in my calls to anything_in/2, I’ve included support for, and documented, all the named character classes that Elixir’s Regex module provides.
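To make the "accept what came before, append your own piece" idea concrete, here is a minimal sketch of the pattern using only the standard library. The ComposeSketch module and its function names are hypothetical, written for this illustration; this is not Verbalex’s actual source, only the general shape described above.

```elixir
defmodule ComposeSketch do
  # Hypothetical helpers for illustration; not Verbalex's actual source.

  # Appends a character class matching a single digit.
  def digit(before), do: before <> "[[:digit:]]"

  # Appends a "one or more" quantifier to whatever came before.
  def one_or_more(before), do: before <> "+"

  # Appends a literal string, escaped and wrapped in a non-capturing group.
  def literal(before, string) do
    before <> "(?:" <> Regex.escape(string) <> ")"
  end

  # A composed section, written the same way as local_part/1 and domain/1:
  # build the section from an empty string, then append it to `before`.
  def integer(before) do
    section =
      ""
      |> digit()
      |> one_or_more()

    before <> section
  end
end

pattern =
  "^"
  |> ComposeSketch.integer()
  |> ComposeSketch.literal(".")
  |> ComposeSketch.integer()

regex = Regex.compile!(pattern <> "$")

decimal_hit = Regex.match?(regex, "3.14")  # true
decimal_miss = Regex.match?(regex, "3x14") # false
```

Because every function takes the string so far as its first argument and returns a longer string, built-in pieces and your own composed sections slot into the same pipeline.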
Hopefully at this point, even without a background using regular expressions, the readability in writing them this way allows you to follow what’s going on with relative ease.
With those in place, we can finish off our main function:
def email_regex do
  ""
  |> Vlx.word_boundary()
  |> local_part()
  |> Vlx.then("@")
  |> domain()
  |> Vlx.word_boundary()
  |> Regex.compile!()
end

email_regex()
# ~r/\x08[[:alnum:]._%+-]+(?:@)[[:alnum:].-]+(?:\.)[[:alpha:]]{2,}\x08/
I find the implementation of email_regex/0 far easier to reason about than its output. If it’s not your cup of tea - that’s fine, too. When properly understood, standard regex can be read like any other syntax, with the added benefit of being incredibly terse. For myself at least, writing expressions this way is a breath of fresh air.
To wrap up, it’s worth pointing out that Verbalex is the first library I’ve ever written. I welcome all constructive feedback, issues, and pull requests from anybody who might like to contribute - it’s an exciting time to be in the Elixir community with so many tools still to be built. Thanks for reading!