How to preg_match a pattern having linebreaks?

Discussion in 'App Development' started by deanstreet, Aug 31, 2021.

  1. Can you please help with matching a pattern that may contain line breaks?

    On [PHPRegexLive](https://www.phpliveregex.com/), I use the regex pattern {{\s*IF(.+)}}(.+){{\s*ENDIF}} on the search string:

    before if....{{IF !empty('')}} <div class='h6 mt-4 mb-2 edit-btn-container'>About</div> {{ENDIF}} after if....

    The result is fine, array[0] = entire {{IF <condition>}}...{{ENDIF}} string, array[1] = <condition>, and array[2] = whatever between {{IF <con>}} and {{ENDIF}}.

    The problem is when the entire {{IF <con>}}...{{ENDIF}} spans more than one line, such as


    before if....{{IF !empty('')}}
    <div class='h6 mt-4 mb-2 edit-btn-container'>About</div>
    {{ENDIF}} after if....

    I tried different combinations of \n*, \n*\r*, etc., and the s and m modifiers, but cannot get it to work.
     
  2. DaveV

    Look into the preg_match pattern modifiers 's' (single line, so that . also matches newlines) and 'm' (multiline, which only changes how ^ and $ behave).

    Also, instead of: IF(.+)}}
    I would try: IF(.*?)}}
    The reason is that with the first form the closing }} could match the }} that follows ENDIF.
    .*? is a non-greedy match, i.e. it stops at the first }} it finds.
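
    As a minimal sketch of the suggestion above: the subject is the multi-line example from the first post, the variable names are only illustrative, and the `~` delimiter is an arbitrary choice. With the s modifier the dot matches newlines, and the lazy quantifiers stop at the first }} and the first {{ENDIF}}.

    ```php
    <?php
    // Subject string from the first post, with real line breaks.
    $subject = "before if....{{IF !empty('')}}\n"
             . "<div class='h6 mt-4 mb-2 edit-btn-container'>About</div>\n"
             . "{{ENDIF}} after if....";

    // 's' (PCRE_DOTALL) lets '.' match newlines; '.*?' and '.+?' are lazy.
    // The braces are escaped so they are never read as a {min,max} quantifier.
    $pattern = '~\{\{\s*IF(.*?)\}\}(.+?)\{\{\s*ENDIF\}\}~s';

    $found = preg_match($pattern, $subject, $m);

    // $m[1] is the condition, $m[2] is everything between the two tags.
    $condition = trim($m[1]);
    $body      = trim($m[2]);

    echo $condition, "\n", $body, "\n";
    ```

    Without the trailing s the same pattern returns 0 on this subject, because the dot cannot cross the line breaks.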
     
  3. trudnai

    By the way, try to avoid unbounded quantifiers like * and +. Use {1,n} instead of + and {0,n} instead of *, where n is the maximum number of characters you expect in a valid match. The reason is that an unbounded search can lead to very slow regex matches on large input where a match is not guaranteed.

    Also, instead of . you can use a definite set of characters or a stopper. It seems you never want to go past a { and/or } character, so you can write [^}] and [^{] instead of the dot. Unlike the dot, these negated classes also match line breaks.

    And finally, it is better to escape { and }, as they are normally used for ranges (see above).

    So the final regex might look a bit more obfuscated than yours but much safer and faster to use:

    Code:
    \{\{\s*IF([^}]{1,100})\}\}([^{]{1,100})\{\{\s{0,100}ENDIF\}\}
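
    For illustration, here is a rough sketch of that pattern used with preg_match on the multi-line example from the first post. The `~` delimiter and the variable names are assumptions, not part of the original pattern. No s modifier is needed, since a negated class such as [^{] already matches newline characters.

    ```php
    <?php
    $subject = "before if....{{IF !empty('')}}\n"
             . "<div class='h6 mt-4 mb-2 edit-btn-container'>About</div>\n"
             . "{{ENDIF}} after if....";

    // The pattern from this post, wrapped in '~' delimiters.
    $pattern = '~\{\{\s*IF([^}]{1,100})\}\}([^{]{1,100})\{\{\s{0,100}ENDIF\}\}~';

    $found     = preg_match($pattern, $subject, $m);
    $condition = trim($m[1]);
    $body      = trim($m[2]);

    // Trade-off of the bounded quantifier: a body longer than the limit
    // (100 characters here) is rejected instead of matched.
    $tooLong   = '{{IF x}}' . str_repeat('a', 101) . '{{ENDIF}}';
    $foundLong = preg_match($pattern, $tooLong);

    echo $found, ' ', $foundLong, "\n";
    ```

    The limit of 100 is only the example value from the post. A real template would need limits sized to its longest expected condition and body, and a body that itself contains a { would not match at all.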
     
  4. ph1l

    Since PHP has nested IF statements, a single regular expression that searches for an IF followed by an ENDIF would not be able to match arbitrary nested IFs properly.

    https://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns
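
    To make the nesting problem concrete, here is a small sketch with two made-up template strings (the conditions a and b are placeholders). It runs a plain IF...ENDIF pattern over them, once with a lazy body and once with a greedy one, and neither pairs the tags correctly in both cases.

    ```php
    <?php
    $nested   = '{{IF a}}A{{IF b}}B{{ENDIF}}C{{ENDIF}}';
    $siblings = '{{IF a}}A{{ENDIF}} mid {{IF b}}B{{ENDIF}}';

    $lazy   = '~\{\{\s*IF(.*?)\}\}(.+?)\{\{\s*ENDIF\}\}~s';
    $greedy = '~\{\{\s*IF(.*?)\}\}(.+)\{\{\s*ENDIF\}\}~s';

    // Lazy body: the outer IF is closed by the inner ENDIF.
    preg_match($lazy, $nested, $m);
    $lazyBody = $m[2];      // 'A{{IF b}}B'

    // Greedy body: the first IF swallows the second, unrelated block.
    preg_match($greedy, $siblings, $m);
    $greedyBody = $m[2];    // 'A{{ENDIF}} mid {{IF b}}B'

    echo $lazyBody, "\n", $greedyBody, "\n";
    ```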
     
  5. You might wanna look into compiler theory with lex and yacc. A lexer identifies the individual grammar tokens (in your case IF, ENDIF, "{", ...), and a parser then uses a syntax specification to turn them into something meaningful (an abstract syntax tree, in the compiler / interpreter case). Not sure what you're trying to achieve, but parsing grammars is not trivial, prepare for pain :p
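
    A very small sketch of that lexer-plus-parser idea, without lex or yacc, might look like the following. The function name parseNodes and the array shape of the tree are hypothetical. preg_split acts as the lexer, and a recursive function pairs each IF with its own ENDIF.

    ```php
    <?php
    // Parser: consumes tokens from position $i and returns a list of nodes.
    // $insideIf tells the function whether an ENDIF is expected at this level.
    function parseNodes(array $tokens, int &$i, bool $insideIf): array
    {
        $nodes = [];
        while ($i < count($tokens)) {
            $tok = $tokens[$i++];
            if (preg_match('~^\{\{\s*ENDIF\s*\}\}$~', $tok)) {
                if (!$insideIf) {
                    throw new RuntimeException('ENDIF without matching IF');
                }
                return $nodes; // closes the current IF block
            }
            if (preg_match('~^\{\{\s*IF\b(.*)\}\}$~s', $tok, $m)) {
                $nodes[] = [
                    'type'      => 'if',
                    'condition' => trim($m[1]),
                    'children'  => parseNodes($tokens, $i, true),
                ];
            } else {
                $nodes[] = ['type' => 'text', 'value' => $tok];
            }
        }
        if ($insideIf) {
            throw new RuntimeException('IF without matching ENDIF');
        }
        return $nodes;
    }

    // Lexer: split on IF / ENDIF tags; the capturing group keeps the tags
    // themselves in the token list, and empty pieces are dropped.
    $source = '{{IF a}}A{{IF b}}B{{ENDIF}}C{{ENDIF}}';
    $tokens = preg_split(
        '~(\{\{\s*(?:IF\b[^}]*|ENDIF)\s*\}\})~',
        $source,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $i    = 0;
    $tree = parseNodes($tokens, $i, false);
    print_r($tree);
    ```

    This only builds the tree. Evaluating the conditions and rendering the output would be separate steps, and the tag regex assumes a condition never contains a } character.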