A blog about software by Alexander Yaäkov Garber.

Lithuanian gives you good practice with regex

Published on 07 Apr 2017 by Alexander Garber

I'm parsing Lithuanian verbs, and in tidying up the raw HTML, I'm compelled to come up with little beauties like this:

<a href="\/[a-z]*\/[a-z]*\W*[a-z]*\W*[a-z]*">

...in order to clear away the HTML formatting.

The problem is that Lithuanian letters with diacritics (č, ž, ė, ę and so on) fall outside the ASCII range [a-z], so the regex treats them as non-word characters \W rather than as letters.
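To make the behaviour concrete, here is a minimal sketch in Python, which is my choice for illustration and not necessarily the tool used for the original parsing. The sample tags and the `strip_tags` helper are hypothetical. The `re.ASCII` flag is an assumption used to reproduce an engine in which `\w` and `\W` are ASCII-only; without it, Python 3 treats `\w` as Unicode-aware, which is what the alternative pattern relies on.

```python
import re

# Hypothetical anchor tags; the paths are invented for illustration.
one_diacritic = '<a href="/verbs/pradėti">'
three_diacritics = '<a href="/verbs/žiūrėti">'

# The pattern from the post. re.ASCII restricts \w and \W to ASCII,
# so a letter such as "ė" counts as a non-word character.
ASCII_TAG = re.compile(
    r'<a href="\/[a-z]*\/[a-z]*\W*[a-z]*\W*[a-z]*">', re.ASCII
)

# Alternative: [^\W\d_] means "a word character that is neither a digit
# nor an underscore", i.e. a letter. Without re.ASCII, \w is
# Unicode-aware, so this class also covers č, ž, ė, ę and so on.
UNICODE_TAG = re.compile(r'<a href="/[^\W\d_]*/[^\W\d_]*">')


def strip_tags(pattern, html):
    """Remove every match of `pattern` from `html` (hypothetical helper)."""
    return pattern.sub("", html)


# One diacritic fits the original pattern, so the tag is removed.
print(strip_tags(ASCII_TAG, one_diacritic + "pradėti"))

# Three separate diacritics need more \W*[a-z]* pairs than the
# pattern has, so the tag is left in place.
print(strip_tags(ASCII_TAG, three_diacritics + "žiūrėti"))

# The Unicode-aware class handles any number of diacritics.
print(strip_tags(UNICODE_TAG, three_diacritics + "žiūrėti"))
```

The trade-off is that the original pattern has to grow by one `\W*[a-z]*` pair for every extra accented letter it needs to tolerate, whereas a Unicode-aware letter class stays the same length.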
