PHP Classes

Pb with regexp

Subject: Pb with regexp
Summary: impossible to minimize a regexp array
Messages: 3
Author: Gilbert BENOIT
Date: 2016-08-15 21:17:23

  1. Pb with regexp
Gilbert BENOIT - 2016-08-15 21:17:23
I didn't investigate much, but it is impossible to minify the following variable:

var Regexp = {
'ALP' : /^[\wà-ÿ\s-]+$/i,
'ALN' : /^[\wà-ÿ0-9\s]+$/i,
'ALX' : /^[a-zà-ÿ0-9\s°\+²\^&~"#'{\(\[\-|`_\\ç@\)\]=\}\)\}$£¤%µ\*<>,?;\.:/!§]+$/i,
'TXT' : /^[a-zà-ÿ0-9\s°\+²\^&~"#'{\(\[\-|`_\\ç@\)\]=\}\)\}$£¤%µ\*<>,?;\.:/!§]+$/i,
'NUM' : /^[0-9\s-.]+$/,
'INT' : /^\-?[0-9]+$/,
'DEC' : /^\-?[0-9]*[\.,]?[0-9]+$/,
'NAT' : /^[0-9]+$/i,
'NNZ' : /^[1-9][0-9]*$/i,
'MON' : /^\-?[0-9]*[\.,]?[0-9]+$/,
'DAT' : /([0-9]{2,4})(\-|\/)([0-9]{2})(\-|\/)([0-9]{2,4})/,
'HER' : /^(([0-1]?[0-9])|([2][0-3]))\:[0-5][0-9](\:?[0-5][0-9])?$/,
'DTH' : /([0-9]{2,4})(\-|\/)([0-9]{2})(\-|\/)([0-9]{2,4})\s+(([0-1]?[0-9])|([2][0-3]))\:[0-5][0-9](\:?[0-5][0-9])?$/,
'DUR' : /^$/,
'REC' : /^$/,
'PRC' : /^\-?[0-9]*\.?[0-9]+$/,
'TEL' : /^[\d\s\+\.-]*$/,
'MEL' : /^[\wà-ÿ0-9._%+-]+@[\wà-ÿ0-9.-]+\.[a-z]{2,}$/i,
'URL' : /^(https{0,1}:\/\/){0,1}[\wà-ÿ0-9._%+-]+\.[a-z]{2,}$/i,
'LNK' : /^[a-zà-ÿ0-9\s°\+²\^&~"#'{\(\[\-|`_\\ç@\)\]=\}\)\}$£¤%µ\*<>,?;\.:/!§]+$/i,
'PWD' : /^(?=.*[a-z]+)(?=.*)(?=.*[0-9]+)([^\s]{8,})$/i,
'PAY' : /^[\wà-ÿ\s-]+$/i,
'CPO' : /^[0-9]+$/,
'GRV' : /^$/,
'GRH' : /^$/,
'GRL' : /^$/,
'GRP' : /^$/,
'IMG' : /^$/,
'AUD' : /^$/,
'VDO' : /^$/,
'BTM' : /^$/,
'ACT' : /^$/,
'ALT' : /^$/
};

  2. Re: Pb with regexp
Christian Vigh - 2016-08-15 21:55:38 - In reply to message 1 from Gilbert BENOIT
OK, no need to investigate any further: your sample revealed at least two bugs in my class:
- A minor one: the 'ALP', 'TXT', 'NUM', etc. entries should end up on the same line in the minified output. There is still a small inconsistency in the way I parse the input.
- A more troubling one: the 'LNK' entry, which seems to drive my class crazy.

I need to investigate the second issue a little further. It is related to raw UTF-8 characters in the code, something I did not foresee.

I will get back to you when I have something ready.

  3. Re: Pb with regexp
Christian Vigh - 2016-09-01 09:09:15 - In reply to message 1 from Gilbert BENOIT
I have made an update which should solve your issue now (the output looks much better).

The cause was the following: the JavascriptMinifier class does not implement a real parser, because too much time would otherwise be spent parsing.

As a compromise, it "recognizes" a regular expression based on the character token immediately before it.

For example, you can have constructs such as:

var re = /some regex/ ;

or:

if ( /some regex/ ) ...

But it did not correctly handle regexes used in an object definition, as in your example code:

{ 'ALP' : /some regex/ ... }

because the ':' character was not in the list of character tokens that can appear before a regex. I just added it to that list, and it solved the issue.
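
To illustrate the idea, here is a minimal JavaScript sketch of this kind of look-behind heuristic. It is not the actual code of the JavascriptMinifier class (which is written in PHP): the function name canStartRegex and both character lists are assumptions made up for the example.

```javascript
// Hypothetical sketch: decide whether a "/" starts a regex literal by
// looking at the last non-whitespace character before it.
// The function name and the character lists are illustrative assumptions,
// not taken from the JavascriptMinifier class.
function canStartRegex(source, slashIndex, allowedBefore) {
  let i = slashIndex - 1;
  // Skip whitespace between the previous token and the "/".
  while (i >= 0 && /\s/.test(source[i])) {
    i--;
  }
  // A "/" at the very start of the input cannot be a division operator.
  if (i < 0) {
    return true;
  }
  return allowedBefore.includes(source[i]);
}

const before = "=(,;!&|?{}[";  // assumed list, without ":"
const after = before + ":";    // same list once ":" has been added

const assignment = "var re = /abc/ ;";
const objectLiteral = "{ 'ALP' : /abc/ }";
const division = "a / b";

console.log(canStartRegex(assignment, assignment.indexOf("/"), before));       // true
console.log(canStartRegex(objectLiteral, objectLiteral.indexOf("/"), before)); // false
console.log(canStartRegex(objectLiteral, objectLiteral.indexOf("/"), after));  // true
console.log(canStartRegex(division, division.indexOf("/"), after));            // false
```

With the shorter list, the "/" after ':' is not treated as the start of a regex, so the rest of the line gets processed as ordinary code; adding ':' makes the object-literal case behave like the assignment case.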

Strangely, however, I still cannot understand why this problem caused some bytes of multibyte Unicode characters to be eaten up; for example, the "à" character (U+00E0), encoded in UTF-8 as the two bytes 0xC3 0xA0, was rendered as 0xC3 alone (the 0xA0 byte disappeared from the output).
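
For reference, the byte sequence in question can be checked with a few lines of JavaScript. This is only a sketch of the encoding itself, using the standard TextEncoder and TextDecoder APIs; it does not reproduce the bug in the class.

```javascript
// "\u00E0" is the "à" character, written as an escape so the example does
// not depend on the encoding of this file.
const bytes = Array.from(new TextEncoder().encode("\u00E0"));
const hex = bytes.map((b) => "0x" + b.toString(16).toUpperCase().padStart(2, "0"));
console.log(hex.join(" ")); // 0xC3 0xA0

// If the second byte is lost, 0xC3 alone is an incomplete UTF-8 sequence;
// a non-fatal decoder substitutes the replacement character U+FFFD.
const truncated = new TextDecoder("utf-8").decode(new Uint8Array([0xC3]));
console.log(truncated === "\uFFFD"); // true
```

This is why a dropped 0xA0 byte shows up as a broken character rather than as a different letter.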

Anyway, please feel free to let me know if you have further issues.