Last updated: 2008-03-04
Rating: 36%
Unique user downloads: 1,336 total
Download ranking: 2,802 (all time), 1,016 (this week)
Version: robots_txt 1.1
License: GNU General Publi...
PHP version: 5.0
Categories: PHP 5, Searching
Innovation Award

This class can be used to check whether a page may be crawled by looking at the robots.txt file of its site.
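The core check can be illustrated with a small, self-contained PHP sketch. It is not the robots_txt class's own code: the function name robotsAllows() is hypothetical, it takes the robots.txt text directly instead of fetching it, and it assumes the longest-match rule of RFC 9309 (the longest matching Allow/Disallow prefix wins, Allow wins ties) rather than the first-match rule of the original standard. Wildcards are not handled, and it uses PHP 7.1+ syntax although the package itself targets PHP 5.

```php
<?php
// Hypothetical sketch (not the robots_txt class's code): may $userAgent fetch $path?
// Assumptions: longest matching Allow/Disallow prefix wins, Allow wins ties,
// no support for the * and $ wildcards.
function robotsAllows(string $robotsTxt, string $userAgent, string $path): bool
{
    $groups = [];          // each group: ['agents' => [...], 'rules' => [[isAllow, prefix], ...]]
    $current = null;       // index of the group currently being filled
    $lastWasAgent = false; // consecutive User-agent lines share one group

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if (!$lastWasAgent) {
                $groups[] = ['agents' => [], 'rules' => []];
                $current = count($groups) - 1;
            }
            $groups[$current]['agents'][] = strtolower($value);
            $lastWasAgent = true;
        } else {
            if (($field === 'allow' || $field === 'disallow') && $current !== null) {
                $groups[$current]['rules'][] = [$field === 'allow', $value];
            }
            $lastWasAgent = false;
        }
    }

    // Prefer a group whose token appears in our user agent, else fall back to "*".
    $ua = strtolower($userAgent);
    $chosen = null;
    foreach ($groups as $group) {
        foreach ($group['agents'] as $agent) {
            if ($agent !== '*' && $agent !== '' && strpos($ua, $agent) !== false) {
                $chosen = $group;
                break 2;
            }
        }
    }
    if ($chosen === null) {
        foreach ($groups as $group) {
            if (in_array('*', $group['agents'], true)) {
                $chosen = $group;
                break;
            }
        }
    }
    if ($chosen === null) {
        return true; // no applicable group: everything is allowed
    }

    // Apply the most specific (longest) matching prefix.
    $bestLen = -1;
    $allowed = true;
    foreach ($chosen['rules'] as [$isAllow, $prefix]) {
        if ($prefix === '') {
            continue; // an empty "Disallow:" disallows nothing
        }
        if (strpos($path, $prefix) === 0) {
            $len = strlen($prefix);
            if ($len > $bestLen || ($len === $bestLen && $isAllow)) {
                $bestLen = $len;
                $allowed = $isAllow;
            }
        }
    }
    return $allowed;
}
```

To check a full URL, pass only its path, e.g. parse_url($url, PHP_URL_PATH), falling back to '/' when the URL has no path. Fetching robots.txt from the site root and caching it per host is left out of the sketch.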
The Robots Exclusion Standard is considered proper netiquette, so any script that exhibits crawler-like behavior is expected to abide by it. The intended use of this class is to pass it a URL before you visit that URL. The class automatically attempts to read the site's robots.txt file and returns a boolean value indicating whether you are allowed to visit the URL.

Crawl-delay and Request-rate values are capped at 60 seconds. The class blocks until the detected crawl delay (or request rate) allows the URL to be visited. For instance, if Crawl-delay is set to 3, the Robots_txt::urlAllowed() method will block for 3 seconds when called a second time. An internal clock records the time of the last visit, so if the delay has already expired, the method does not block.

Example usage:

    foreach ($arrUrlsToVisit as $strUrlToVisit) {
        if (Robots_txt::urlAllowed($strUrlToVisit, $strUserAgent)) {
            # visit url, do processing...
        }
    }

The simple example above ensures that you abide by the wishes of the site owners.

Note: an unofficial, non-standard extension exists that limits the times of day at which crawlers are allowed to visit a site. I chose to ignore this extension because I feel it is unreasonable.

Note: you are only *required* to specify your user agent the first time you call the urlAllowed() method, and only the first value is ever used.

Example usage:

    var_dump(Robots_txt::urlAllowed('http://slashdot.org/', 'Slurp'));
    var_dump(Robots_txt::urlAllowed('http://slashdot.org/test', 'Slurp'));
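The blocking behavior described above can likewise be sketched. The CrawlThrottle class below is hypothetical and is not taken from the package: it assumes delays are tracked per host, converts a Request-rate value of the form n/m (n requests per m seconds) into a delay of m/n seconds, caps every delay at 60 seconds as the description states, and sleeps with usleep() only for whatever part of the delay has not already elapsed since the last recorded visit. It uses PHP 7.1+ syntax.

```php
<?php
// Hypothetical sketch of the politeness delay described above
// (not the robots_txt class's actual implementation).
final class CrawlThrottle
{
    const MAX_DELAY = 60.0; // seconds; larger Crawl-delay / Request-rate delays are capped

    /** @var array<string, float> time of the last visit per host (microtime(true)) */
    private $lastVisit = [];

    /** Convert a "Request-rate: n/m" value (assumed: n pages per m seconds) to a delay. */
    public static function delayFromRequestRate(string $rate): float
    {
        if (!preg_match('#^\s*(\d+)\s*/\s*(\d+)\s*$#', $rate, $m) || (int)$m[1] === 0) {
            return 0.0; // unparseable value: assume no delay
        }
        return min(self::MAX_DELAY, (int)$m[2] / (int)$m[1]);
    }

    /** Seconds still to wait before $host may be visited again, given the current time $now. */
    public function remaining(string $host, float $delay, float $now): float
    {
        if (!isset($this->lastVisit[$host])) {
            return 0.0; // first visit to this host: no wait
        }
        $delay = min(max($delay, 0.0), self::MAX_DELAY);
        return max(0.0, $this->lastVisit[$host] + $delay - $now);
    }

    /** Block until $delay has elapsed since the last visit, then record this visit. */
    public function wait(string $host, float $delay): void
    {
        $pause = $this->remaining($host, $delay, microtime(true));
        if ($pause > 0) {
            usleep((int) round($pause * 1e6));
        }
        $this->lastVisit[$host] = microtime(true);
    }
}
```

A crawler would call $throttle->wait(parse_url($url, PHP_URL_HOST), $crawlDelay) just before each request. Separating remaining(), which takes an explicit $now, from wait() keeps the timing arithmetic testable without actually sleeping.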