incorrect output with new language in geshi
theoveenker #1
Hi all,

I just updated my wiki to the latest version and noticed that the geshi files had moved, so it no longer found my language file for the Zep language (a program for doing psycholinguistic experiments, https://www.beexy.nl/zep2, of which I'm the author). I moved my language file to the new location and ran into some problems I had already seen with previous versions of dokuwiki (which I didn't report back then, sorry).

See my language file below. The problem I'm facing is that semicolons and vertical bars in code are shown as <SEMI> and <PIPE> respectively, and some fragments are preceded by REG3XP0. I managed to work around the problem in geshi.php by replacing all occurrences of REG3XP with @reg3xp, <SEMI> with <@semi> and <PIPE> with <@pipe>. I'm not sure whether I fixed it or just made the problem go away.

Here you can see the result before the 'fix':
https://www.beexy.nl/zep2/wiki-prob/doku.…?id=notation#s…

Here you can see the result after the 'fix':
https://www.beexy.nl/zep2/wiki/doku.…?id=notation#source…

The problem seems to lie in the REGEXPS section of the language file, but I don't see what could possibly be wrong there. I hope someone with knowledge about geshi can see what the problem might be.

My geshi Language file zep.php (without header block):
$language_data = array (
    'LANG_NAME' => 'Zep',
    'COMMENT_SINGLE' => array(1 => '//'),
    'COMMENT_MULTI' => array('/*' => '*/'),
    'COMMENT_REGEXP' => array(
        //Multiline-continued single-line comments
        1 => '/\/\/(?:\\\\\\\\|\\\\\\n|.)*$/m',
        ),
    'CASE_KEYWORDS' => GESHI_CAPS_NO_CHANGE,
    'QUOTEMARKS' => array("'", '"'),
    'ESCAPE_CHAR' => '',
    'ESCAPE_REGEXP' => array(
        //Simple Single Char Escapes
        1 => "#\\\\[abfnrtv\\\'\"?\n]#i",
        //Hexadecimal Char Specs
        2 => "#\\\\x[\da-fA-F]{2}#",
        //Unicode Char Specs (4 hex digits)
        3 => "#\\\\u[\da-fA-F]{4}#",
        //Unicode Char Specs (8 hex digits)
        4 => "#\\\\U[\da-fA-F]{8}#"
        ),
    'NUMBERS' =>
        GESHI_NUMBER_INT_BASIC | GESHI_NUMBER_HEX_PREFIX |
        GESHI_NUMBER_FLT_NONSCI | GESHI_NUMBER_FLT_SCI_SHORT |
        GESHI_NUMBER_FLT_SCI_ZERO,
    'KEYWORDS' => array(
        1 => array(
            'break', 'continue', 'else', 'for', 'foreach', 'if', 'return',
            'switch', 'terminate', 'while'
            ),
        2 => array(
            'alias', 'cast', 'castable', 'const', 'enum', 'import',
            'metadata', 'module', 'plugin', 'program', 'record', 'requires',
            'on_event', 'weak'
            ),
        3 => array(
            'false', 'null', 'this', 'true'
            ),
        4 => array(
            'bool', 'char', 'color', 'date', 'dur', 'int', 'real', 'string',
            'time', 'void'
            ),
        ),
    'SYMBOLS' => array(
        0 => array('(', ')', '{', '}', '[', ']'),
        1 => array('<', '>', '='),
        2 => array('+', '-', '*', '/', '%'),
        3 => array('!', '^', '&', '|'),
        4 => array('?', ':', ';')
        ),
    'CASE_SENSITIVE' => array(
        GESHI_COMMENTS => false,
        1 => true,
        2 => true,
        3 => true,
        4 => true,
        ),
    'URLS' => array(
        1 => '',
        2 => '',
        3 => '',
        4 => ''
        ),
    'OOLANG' => true,
    'OBJECT_SPLITTERS' => array(
        1 => '.',
        2 => '::'
        ),
    'REGEXPS' => array(
        0 => array(
            GESHI_SEARCH => "([\p{L}_][\p{L}\p{N}_]*)(\s*\()",
            GESHI_REPLACE => '\\1',
            GESHI_MODIFIERS => '',
            GESHI_BEFORE => '',
            GESHI_AFTER => '('
            ),
        1 => array(
            GESHI_SEARCH => '(\b)((\p{Lu}[\p{L}\p{N}_]*)+)',
            GESHI_REPLACE => '\\2',
            GESHI_MODIFIERS => '',
            GESHI_BEFORE => '',
            GESHI_AFTER => ''
            ),
        2 => array(
            GESHI_SEARCH => '(\b)((\p{Lu}[\p{Ll}\p{N}]+)+)',
            GESHI_REPLACE => '\\2',
            GESHI_MODIFIERS => '',
            GESHI_BEFORE => '',
            GESHI_AFTER => ''
            )
        ),
    'STRICT_MODE_APPLIES' => GESHI_NEVER,
    'SCRIPT_DELIMITERS' => array(
        ),
    'HIGHLIGHT_STRICT_BLOCK' => array(
        ),
    'TAB_WIDTH' => 4,
);

Theo
turnermm (Moderator) #2
I took a bit of time to look at some of the geshi code. The <SEMI> issue has been reported for two other languages, erlander and ceylon. As for REG3XP0, I wonder if that is the result of an issue in your regexes. Since you get REG3XP0, it is possibly something related to the entry at index zero of your array:
'REGEXPS' => array(
        0 => array(

Because in geshi.php the final character of REG3XP is the 'key', which is probably the array index. Anyway, just a thought.
Myron Turner
github: https://github.com/turnermm
plugins, templates: http://www.mturner.org/devel
theoveenker #3
Thanks for looking into it.

I noticed that for some tokens the 2nd and 3rd rules in my REGEXPS section would both match. I fixed that, but it didn't change anything.

But I think I found out what the problem is. The geshi process replaces ; and | with <SEMI> and <PIPE> respectively. Something similar happens with REG3XP. The result (including these replacements) apparently gets passed to the regular expression matcher, which then matches the placeholder text against a rule in REGEXPS (my second rule, the one for constants).

I think the regexp matcher should not see the SEMI, PIPE and REG3XP at all, or at least these three intermediate symbols should be given values that are unlikely to be matched by any REGEXPS rule (in my case the @ prefix and lowercasing it worked). I tested replacing SEMI, PIPE and REG3XP with new symbols composed entirely of private use characters and that works fine too. Renaming these symbols should likely also solve the problem with the other languages you mentioned.
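
To illustrate the collision, here is a minimal sketch. It is not geshi's actual code: the helper protect(), the sample line of code, and the delimiters around REG3XP0 are made up for the example, and the pattern is an ASCII-only stand-in for my second REGEXPS rule.

```php
<?php
// Hypothetical helper standing in for geshi's placeholder step; the real
// code in geshi.php is more involved.
function protect(string $code, array $placeholders): string
{
    return strtr($code, $placeholders);
}

// ASCII-only approximation of the second REGEXPS rule (the one for constants).
$constantRule = '/\b[A-Z][A-Za-z0-9_]*/';

// Made-up line of code containing a vertical bar and a semicolon.
$source = 'x = a | b;';

$upper = protect($source, [';' => '<SEMI>', '|' => '<PIPE>']);
$lower = protect($source, [';' => '<@semi>', '|' => '<@pipe>']);

preg_match_all($constantRule, $upper, $upperHits);
preg_match_all($constantRule, $lower, $lowerHits);

// The delimiters here are assumed; only the REG3XP0 token is taken from the
// output I saw on the wiki.
preg_match_all($constantRule, '<!REG3XP0!>', $markerHits);
preg_match_all($constantRule, '<!@reg3xp0!>', $renamedHits);

print_r($upperHits[0]);   // upper-case placeholders look like constants
print_r($lowerHits[0]);   // renamed placeholders are not matched
print_r($markerHits[0]);  // the REG3XP0 token is matched as well
print_r($renamedHits[0]); // the renamed token is not
```

With the upper-case names the rule picks up PIPE, SEMI and REG3XP0 as if they were identifiers in my code; with the lower-case, @-prefixed names it finds nothing. That would explain why the renaming made the symptoms disappear, although any REGEXPS rule that matches lower-case words could still collide with the renamed placeholders.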
turnermm (Moderator) #4
I guess you would have to get in touch with the geshi people with your suggestions, so that these placeholders aren't carried over into upcoming versions. There already seems to be an attempt in geshi.php to remove them.
theoveenker #5
Thanks. I'll see if I can find anything on that topic. Do you happen to have a reference about the possible removal of <SEMI> etc?
turnermm (Moderator) #6
At line 2156 of the current geshi.php there is a comment relating to this:
            //This fix is related to SF#1923020, but has to be applied regardless of
            //actually highlighting symbols.
            $result = str_replace(array('<SEMI>', '<PIPE>'), array(';', '|'), $result);

But that's as far as I got.
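
A rough sketch of why that restore step might miss a placeholder, assuming a REGEXPS rule has wrapped its match in markup before the restore runs. The span markup and the sample string are invented for illustration; only the str_replace call is taken from geshi.php.

```php
<?php
// ASCII-only stand-in for a rule that matches capitalised identifiers.
$constantRule = '/\b([A-Z][A-Za-z0-9_]*)/';

// Made-up intermediate text in which ';' has already become <SEMI>.
$intermediate = 'return x<SEMI>';

// Assumed effect of a REGEXPS rule: the match is wrapped in highlighting markup.
$wrapped = preg_replace($constantRule, '<span class="re1">$1</span>', $intermediate);

// The restore step from geshi.php, applied to the wrapped and the intact text.
$restored  = str_replace(array('<SEMI>', '<PIPE>'), array(';', '|'), $wrapped);
$untouched = str_replace(array('<SEMI>', '<PIPE>'), array(';', '|'), $intermediate);

echo $restored, "\n";
echo $untouched, "\n";
```

In this sketch the intact text gets its semicolon back, while the wrapped text no longer contains the literal string <SEMI>, so the restore leaves it alone and the placeholder name ends up in the output.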