Pb with regexp

Subject:	Pb with regexp
Summary:	impossible to minimize a regexp array
Messages:	3
Author:	Gilbert BENOIT
Date:	2016-08-15 21:17:23

1. Pb with regexp

Gilbert BENOIT - 2016-08-15 21:17:23

I haven't investigated much, but the following variable cannot be minified:

var Regexp = {
'ALP' : /^[\w�-�\s-]+$/i,
'ALN' : /^[\w�-�0-9\s]+$/i,
'ALX' : /^[a-z�-�0-9\s�\+�\^&~"#'{$\[\-|`_\\�@$\]=\}\)\}$��%�\*<>,?;\.:/!�]+$/i,
'TXT' : /^[a-z�-�0-9\s�\+�\^&~"#'{$\[\-|`_\\�@$\]=\}\)\}$��%�\*<>,?;\.:/!�]+$/i,
'NUM' : /^[0-9\s-.]+$/,
'INT' : /^\-?[0-9]+$/,
'DEC' : /^\-?[0-9]*[\.,]?[0-9]+$/,
'NAT' : /^[0-9]+$/i,
'NNZ' : /^[1-9][0-9]*$/i,
'MON' : /^\-?[0-9]*[\.,]?[0-9]+$/,
'DAT' : /([0-9]{2,4})(\-|\/)([0-9]{2})(\-|\/)([0-9]{2,4})/,
'HER' : /^(([0-1]?[0-9])|([2][0-3]))\:[0-5][0-9](\:?[0-5][0-9])?$/,
'DTH' : /([0-9]{2,4})(\-|\/)([0-9]{2})(\-|\/)([0-9]{2,4})\s+(([0-1]?[0-9])|([2][0-3]))\:[0-5][0-9](\:?[0-5][0-9])?$/,
'DUR' : /^$/,
'REC' : /^$/,
'PRC' : /^\-?[0-9]*\.?[0-9]+$/,
'TEL' : /^[\d\s\+\.-]*$/,
'MEL' : /^[\w�-�0-9._%+-]+@[\w�-�0-9.-]+\.[a-z]{2,}$/i,
'URL' : /^(https{0,1}:\/\/){0,1}[\w�-�0-9._%+-]+\.[a-z]{2,}$/i,
'LNK' : /^[a-z�-�0-9\s�\+�\^&~"#'{$\[\-|`_\\�@$\]=\}\)\}$��%�\*<>,?;\.:/!�]+$/i,
'PWD' : /^(?=.*[a-z]+)(?=.*)(?=.*[0-9]+)([^\s]{8,})$/i,
'PAY' : /^[\w�-�\s-]+$/i,
'CPO' : /^[0-9]+$/,
'GRV' : /^$/,
'GRH' : /^$/,
'GRL' : /^$/,
'GRP' : /^$/,
'IMG' : /^$/,
'AUD' : /^$/,
'VDO' : /^$/,
'BTM' : /^$/,
'ACT' : /^$/,
'ALT' : /^$/
};

2. Re: Pb with regexp

Christian Vigh - 2016-08-15 21:55:38 - In reply to message 1 from Gilbert BENOIT

OK, no need to investigate further anyway: your sample revealed at least two bugs in my class:
- A minor one: the 'ALP', 'TXT', 'NUM', etc. entries should be on the same line. My parsing is still slightly inconsistent.
- A more troubling one: the 'LNK' entry, which seems to drive my class crazy.

I need to look further into the second issue, which is related to plain UTF-8 characters in the code, something I did not foresee.

I will get back to you when I have something ready.

3. Re: Pb with regexp

Christian Vigh - 2016-09-01 09:09:15 - In reply to message 1 from Gilbert BENOIT

I have made an update which should solve your issue (the output looks much better now).

The problem was this: the JavascriptMinifier class does not implement a real parser, because too much time would be spent in the parsing process.

As a compromise, it "recognizes" a regular expression based on the character token immediately before it.

For example, you can have constructs such as:

var re = /some regex/ ;

or:

if ( /some regex/ ) ...

But it did not correctly handle regexes used in object definitions, as in your example code:

{ 'ALP' : /some regex/ ... }

because the ':' character was not in the list of character tokens that can appear before a regex. I added it to that list, and this solved the issue.
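
To illustrate the kind of heuristic described above, here is a minimal sketch. It is not the code of the JavascriptMinifier class (which is written in PHP); the names slashStartsRegex and REGEX_PRECEDERS and the exact character list are hypothetical, chosen only for the example. It ignores comments and many other cases a real tokenizer has to deal with.

```javascript
// Hypothetical sketch, NOT the actual JavascriptMinifier implementation.
// Characters after which a '/' is assumed to open a regex literal.
// The list is an assumption for this example; note that it contains ':'.
const REGEX_PRECEDERS = "(,=:[!&|?{};";

// Decide whether the '/' at index `pos` starts a regex literal by looking
// at the last non-whitespace character before it.
function slashStartsRegex(source, pos, preceders = REGEX_PRECEDERS) {
  let i = pos - 1;
  while (i >= 0 && /\s/.test(source[i])) {
    i--;
  }
  // Nothing before the slash: it cannot be a division operator.
  if (i < 0) {
    return true;
  }
  return preceders.includes(source[i]);
}

const objectEntry = "{ 'ALP' : /^[a-z]+$/i }";
const slashPos = objectEntry.indexOf("/");

// Without ':' in the list, the regex in an object literal is missed.
console.log(slashStartsRegex(objectEntry, slashPos, "(,="));  // false
// With ':' in the list, it is recognized.
console.log(slashStartsRegex(objectEntry, slashPos));         // true

// A division is still not mistaken for a regex.
const division = "var half = total / 2;";
console.log(slashStartsRegex(division, division.indexOf("/"))); // false
```

A lookup on the previous character is cheap, which fits the stated goal of avoiding a full parser, but it depends entirely on the list being complete.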

Strangely, however, I still cannot understand why this problem caused some bytes of multibyte Unicode characters to be eaten up; for example, the "à" character, encoded in UTF-8 as the two bytes 0xC3 0xA0, was rendered as 0xC3 (the 0xA0 byte disappeared from the output).
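
For reference, the byte layout mentioned above can be checked with a few lines of JavaScript. This only shows what the two bytes are and what a lost second byte looks like once decoded; it does not explain why the minifier dropped the byte. It assumes an environment where TextEncoder and TextDecoder are available as globals (current browsers and recent Node.js versions).

```javascript
// "à" is code point U+00E0; UTF-8 stores it as two bytes.
const bytes = new TextEncoder().encode("\u00E0");
const hex = Array.from(bytes, (b) => b.toString(16)).join(" ");
console.log(hex); // c3 a0

// Keeping only the first byte leaves an incomplete UTF-8 sequence,
// which a lenient decoder turns into the replacement character U+FFFD.
const truncated = bytes.slice(0, 1);
const decoded = new TextDecoder("utf-8").decode(truncated);
console.log(decoded === "\uFFFD"); // true
```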

Anyway, please feel free to let me know if you have further issues.
