This class can be used to extract links and images from remote Web pages.
It can access Web pages, parse their HTML and extract the URLs of the links and images.
If necessary, the class can access a login page and emulate the submission of a login form, so that subsequent accesses are made on behalf of the logged-in user.
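The extraction step described above can be illustrated with a minimal sketch. It is written in Python using only the standard library and is not the package's own code or API: the names `LinkImageExtractor` and `extract_links_and_images` are hypothetical, and the sketch assumes that links come from `<a href>` attributes and images from `<img src>` attributes.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkImageExtractor(HTMLParser):
    """Collect absolute URLs of <a href> links and <img src> images."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative URLs
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract_links_and_images(html, base_url):
    """Return (links, images) found in an HTML string (hypothetical helper)."""
    parser = LinkImageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links, parser.images
```

Relative URLs are resolved against the address of the page they were found on, so the returned lists can be fetched directly.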
Name: | Crawler |
Base name: | crawler |
Description: | Extract links and images from remote Web pages |
Version: | 1.1 |
PHP version: | 4.0 |
License: | Freely Distributable |
March 2008
Number 7
Prize: One copy of Delphi for PHP |
Retrieving Web pages from remote sites is a relatively easy task in PHP.
If you want to crawl a site to search for something in its pages, you only need to retrieve the site pages, use regular expressions to extract the links, and retrieve the linked pages until every page has been followed.
However, if some pages can only be accessed by authenticated users, the problem is no longer so simple.
This package provides a more complete solution to the problem of crawling site pages: it authenticates automatically, so it can also access pages restricted to logged-in users.
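The approach described above can be sketched as follows. This is an illustration in Python using only the standard library, not the package's implementation: `make_logged_in_fetcher`, `crawl`, the `max_pages` limit and the restriction to a single host are all assumptions made for the example, and it assumes a login form whose session is tracked with cookies.

```python
import re
from collections import deque
from http.cookiejar import CookieJar
from urllib.parse import urldefrag, urlencode, urljoin, urlparse
from urllib.request import HTTPCookieProcessor, build_opener

# Simple pattern for quoted href values in <a> tags (a sketch, not a full HTML parser).
HREF_RE = re.compile(r'<a\s[^>]*?href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)


def make_logged_in_fetcher(login_url, form_fields):
    """POST the login form once, then reuse the session cookies (hypothetical helper)."""
    opener = build_opener(HTTPCookieProcessor(CookieJar()))
    body = urlencode(form_fields).encode("ascii")
    opener.open(login_url, data=body, timeout=30).close()

    def fetch(url):
        with opener.open(url, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")

    return fetch


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl of one site; returns the URLs fetched, in order."""
    site = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            continue
        visited.append(url)
        for href in HREF_RE.findall(html):
            link = urldefrag(urljoin(url, href)).url  # absolute URL, fragment removed
            if urlparse(link).netloc == site and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

For a site that requires authentication, the fetcher would be created first, for example `fetch = make_logged_in_fetcher("http://example.com/login.php", {"user": "name", "password": "secret"})`, where the URL and field names are placeholders that depend on the target site's login form, and then passed to `crawl`. Passing the fetch function in as a parameter also makes it possible to crawl without logging in, or to test the loop against canned pages.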
Manuel Lemos |
Files |
|