Lithuanian gives you good practice with regex
Published on 07 Apr 2017
by Alexander Garber
I'm parsing Lithuanian verbs, and in tidying up the raw HTML, I'm compelled to come up with little beauties like this:
<a href="\/[a-z]*\/[a-z]*\W*[a-z]*\W*[a-z]*">
...in order to clear away the HTML formatting.
The problem is that Lithuanian letters with diacritics (č, ž, ė, ę and so on) fall outside the range [a-z], so an ASCII-only regex engine matches them as non-word characters \W rather than as letters.
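To make that concrete, here is a minimal sketch in Python. The post does not say which regex engine was used, so the `re.ASCII` flag is my assumption for reproducing the ASCII-only behaviour described above; the sample link (`/verbs/būti`) and the helper name `strip_opening_tag` are hypothetical as well. It also shows one possible alternative: in Python 3's default Unicode mode, the class `[^\W\d_]` matches any letter, diacritics included.

```python
import re

# Hypothetical sample line; the real markup being parsed is not shown in the post.
raw = '<a href="/verbs/būti">būti</a>'

# The pattern from the post. re.ASCII restricts \w and \W to ASCII,
# so "ū" is matched by \W (an assumption about the original engine).
ascii_tag = re.compile(r'<a href="/[a-z]*/[a-z]*\W*[a-z]*\W*[a-z]*">', re.ASCII)

# Alternative sketch: without re.ASCII, \w is Unicode-aware in Python 3,
# so [^\W\d_] means "a word character that is neither a digit nor an underscore".
unicode_tag = re.compile(r'<a href="/[^\W\d_]*/[^\W\d_]*">')


def strip_opening_tag(pattern, text):
    """Remove every opening <a ...> tag that the pattern matches."""
    return pattern.sub("", text)


ascii_result = strip_opening_tag(ascii_tag, raw)
unicode_result = strip_opening_tag(unicode_tag, raw)
print(ascii_result)
print(unicode_result)
```

Both patterns strip the opening tag from this sample. The second one does so without needing an alternating run of `[a-z]*` and `\W*` for every possible position of a diacritic.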