TL;DR: Underscore is a matching character in LIKE queries to match a single character.
I’m using Postgresql for one of my projects to store some python package
metadata. Canonical python packages must be all lowercase and should not use
underscores (_
) but instead should use hyphens (-
) as delimiters. It is
just a recommendation and not enforced by PyPI or pip.
I wanted to check how many packages in my database were using underscores.
SELECT count(*) FROM packages WHERE filename LIKE "nflx_%"
Little did I know that an underscore in a LIKE query is treated as a
placeholder to match exactly one character in Postgresql. So this query matched
every package that had nflx_
or nflx-
as a prefix in the name. 🤦
It took me a good 15 minutes of pulling my hair before I searched for it online.
Of course, Postgresql being an exemplary project had this well-documented.
Update1: Turns out this is not just a Postgresql quirk but it is a feature of SQL, so SQLite, MySQL etc all behave this way.
Update2: You can escape the underscore using a back-slash.
SELECT count(*) FROM packages WHERE filename LIKE "nflx\_%"