Using the pattern '\b[_A-ZÄÖÜ][A-ZÄÖÜ][_a-zäöüßA-ZÄÖÜ\-]*\b' with Lexer::addSpecialPattern, I would expect to get matches for words like this:
ALBERT EINSTEIN
_NABU
JOHAN_de JONG
ÄUSSERUNG
The first three lines match as expected. For the fourth line, I only get a match for
USSERUNG: the Ä is treated as a separate word, with a word boundary after it. Maybe this is a problem with PHP 5.3.
What is really strange is that words like these match:
Bücher
Bütte
Häuser
while words like
Geäußert
Grasbüschel
don't match. The rule seems to be: one of [äöüÄÖÜß] at position 2 causes a false match.
I use PHP 5.3 and my file encoding is UTF-8.
What is the reason?
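A minimal sketch to reproduce this outside the Lexer, using plain preg_match() instead of Lexer::addSpecialPattern. It assumes (unverified) that the Lexer passes the pattern to the preg functions without the u modifier, so that every byte of a UTF-8 character counts as its own character. The helper firstMatch() and the variable names are hypothetical, and the script file itself must be saved as UTF-8.

```php
<?php
// Hypothetical helper, not part of the Lexer: returns the first match or null.
function firstMatch($pattern, $subject)
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// The pattern from the question, wrapped in delimiters for preg_match().
$body = '\b[_A-ZÄÖÜ][A-ZÄÖÜ][_a-zäöüßA-ZÄÖÜ\-]*\b';
$byteMode = '/' . $body . '/';   // no modifier: each UTF-8 byte is one "character"
$utf8Mode = '/' . $body . '/u';  // u modifier: pattern and subject are read as UTF-8

foreach (array('ÄUSSERUNG', 'Bücher', 'Geäußert') as $word) {
    echo $word, "\n";
    echo '  without u: ', var_export(firstMatch($byteMode, $word), true), "\n";
    echo '  with u:    ', var_export(firstMatch($utf8Mode, $word), true), "\n";
}
```

Without the u modifier, Ü is the two bytes C3 9C, so the class [A-ZÄÖÜ] also accepts a lone C3 byte, which is the first byte of ü. That would explain why Bücher matches, and since the bytes of Ä are not word characters in byte mode, it would also explain why \b only finds a boundary before the U in ÄUSSERUNG. With the u modifier, Bücher should no longer match.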