TL;DR: Underscore is a matching character in LIKE queries to match a single character.
I’m using Postgresql for one of my projects to store some python package
metadata. Canonical python packages must be all lowercase and should not use
underscores (_) but instead should use hyphens (-) as delimiters. It is
just a recommendation and not enforced by PyPI or pip.
I wanted to check how many packages in my database were using underscores.
SELECT count(*) FROM packages WHERE filename LIKE "nflx_%"
Little did I know that an underscore in a LIKE query is treated as a
placeholder to match exactly one character in Postgresql. So this query matched
every package that had nflx_ or nflx- as a prefix in the name. 🤦
It took me a good 15 minutes of pulling my hair before I searched for it online.
Of course, Postgresql being an exemplary project had this well-documented.
Update1: Turns out this is not just a Postgresql quirk but it is a feature of SQL, so SQLite, MySQL etc all behave this way.
Update2: You can escape the underscore using a back-slash.
SELECT count(*) FROM packages WHERE filename LIKE "nflx\_%"