Automatic PHP Regular Expression Building Part 1: Introduction to the PHP Regex Advanced Package

Author: Christian Vigh

Package: PHP Regex Advanced

Regular expressions are very useful and powerful, but for many PHP developers they are also hard to understand and hard to build to match text that applications need to process.

The PHP Regex Advanced package was created to avoid the pain of dealing with regular expressions. It uses meta-matching for building regular expressions automatically for you from samples of text that your PHP applications need to match.

Read this article to learn how the PHP Advanced Regex package works, using a real problem as an example, so you can see how to start using it in your PHP projects.

Contents

Introduction

The Beginning Problem

The Concept of Meta Regular Expression Matching

The PHP Advanced Regex package

Conclusion

Image credit: Email Regex, from https://commons.wikimedia.org/wiki/File:Email-regex.svg (Creative Commons Attribution-Share Alike 2.5 Generic)

Introduction

Some time ago I needed to go further than what PHP offers for regular expressions. I wanted to analyze log file contents, with each line being matched by a specific regular expression. I also wanted a single regular expression to match a whole sequence of lines in a log file, using some kind of meta-matching.

This article explains the basic knowledge about regular expressions that was needed to achieve that goal. It describes the intermediate methods that I built as steps towards meta-matching.

Then it introduces the so-called "meta-matching" feature, which allows you to describe text flows as sequences of meta-regular expressions, each of which refers to its own set of regular expressions that match particular lines of text.

The Beginning Problem

Let's suppose you have to scan a sequence of lines, such as in a log file, and you want to recognize which sequence of lines follows which pattern. A typical sequence in a log file could be, for example:
  • A line containing the string "message start"
  • Any number of lines starting with "log:" and followed by any sequence of characters
  • A line containing "message end"
The following example shows what such a log file could look like:
message start
log: message 1
log: message 2
...
log: message n
message end
The idea behind that is to say:
  • OK, I want to match the string "message start", every string that starts with "log:" and ends with a number, and strings that contain "message end".
  • But wait: these sequences of strings can themselves be described by a regular expression! Suppose I could say:
    • I'm expecting the string "message start" as the first line
    • Then, any number of lines containing the word "log" followed by a number
    • And finally, a line containing "message end"
Wouldn't it be better if the whole sequence could be described using a regular expression?
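
For reference, the line-by-line approach that this question starts from can be sketched in plain PHP. The sketch uses only the built-in preg_match() function. The constant LINE_PATTERNS, the function classify_line(), the labels and the exact patterns are assumptions made for this illustration; they are not part of the PHP Advanced Regex package.

```php
<?php
// Hypothetical per-line patterns for the sample log above (illustration only).
const LINE_PATTERNS = [
    'start' => '/^message start$/',
    'log'   => '/^log:.*\d+$/',   // "log:" followed by any text ending with a number
    'end'   => '/^message end$/',
];

/**
 * Returns the label of the first pattern that matches $line,
 * or null when no pattern matches.
 */
function classify_line(string $line): ?string
{
    foreach (LINE_PATTERNS as $label => $pattern) {
        if (preg_match($pattern, $line) === 1) {
            return $label;
        }
    }
    return null;
}

$lines = ['message start', 'log: message 1', 'log: message 2', 'message end'];
foreach ($lines as $line) {
    echo $line, ' => ', classify_line($line) ?? 'unknown', PHP_EOL;
}
```

Each line is recognized on its own here, but nothing checks that the lines arrive in the expected order. That missing piece is what the meta-matching idea addresses.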

The Concept of Meta Regular Expression Matching

Let's say that the sequence \1 references a regular expression that matches the string "message start", \2 is a reference to an expression that matches the string "log:" followed by a number, and \3 a reference to a regular expression that matches the string "message end".

Matching our example input stream with a single meta regular expression could be written as:

\1 \2* \3

This means: a line containing "message start", followed by any number of lines containing the word "log" followed by an integer, and ending with a line containing "message end".

You're done! You just learned how to do meta regular expression matching. Although you do not yet know how to tell that "\1" is intended to match the string "message start", "\2" the string "log" followed by an integer, and so on, you have moved a step forward in the direction of meta-thinking (the term is humbly mine), which is an activity of choice for so many mathematicians and developers.
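
To make the notation concrete, here is a minimal sketch of one way to emulate the idea in plain PHP. Each line is translated into a one-letter token according to which line-level expression matches it, and the resulting token string is then checked against an ordinary regular expression that plays the role of the meta expression. This is not how the package is implemented or used: the function names, the token letters and the patterns are all assumptions chosen for this illustration.

```php
<?php
// Hypothetical line-level expressions, keyed by the token that stands for them.
// 'S' plays the role of \1, 'L' of \2 and 'E' of \3 in the text above.
$linePatterns = [
    'S' => '/^message start$/',
    'L' => '/^log:.*\d+$/',
    'E' => '/^message end$/',
];

/**
 * Translates each line into its token; lines matching nothing become '?'.
 */
function lines_to_tokens(array $lines, array $linePatterns): string
{
    $tokens = '';
    foreach ($lines as $line) {
        $token = '?';
        foreach ($linePatterns as $candidate => $pattern) {
            if (preg_match($pattern, $line) === 1) {
                $token = $candidate;
                break;
            }
        }
        $tokens .= $token;
    }
    return $tokens;
}

/**
 * Checks a sequence of lines against the meta expression "\1 \2* \3",
 * written here as an ordinary regex over the token string.
 */
function matches_meta_sequence(array $lines, array $linePatterns): bool
{
    return preg_match('/^SL*E$/', lines_to_tokens($lines, $linePatterns)) === 1;
}

$log = ['message start', 'log: message 1', 'log: message 2', 'message end'];
var_dump(matches_meta_sequence($log, $linePatterns)); // bool(true)
```

The sample log becomes the token string "SLLE", which the token-level expression accepts. A log whose "message end" line is missing would become "SLL" and be rejected.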

The PHP Advanced Regex package

The PHP Advanced Regex package is here to relieve you from the burden of this meta-something stuff.

If you feel uncomfortable with basic regular expressions, may I recommend that you read the next part of this article, which is basically a survival guide to regular expressions?

Although it is not intended to provide thorough coverage of how to become a regular expression superstar, it explains some basic concepts and introduces some habits I have adopted when writing regular expressions in PHP, habits that have saved me a lot of time.

It also explains how to get proper match results when applying a regular expression to input strings. The last two items belong more to common sense than to rigid coding rules.

Conclusion

In this part of the article you learned how to think about meta regular expression matching to match complex text sequences.

In the next part you will be guided through basic rules for using regular expressions. It will detail the intermediate utility methods that have been implemented in the PHP Advanced Regex package to assist you with meta-regular expression matching.

The third and last part will show you how to cope with meta-regular expression matching.

For now, if you liked this article or you have some questions, post a comment here.