This class lets you extract data from any website.
The data are taken from defined locations in the page's DOM structure.
Data points are defined using phpQuery notation, which is similar to the selectors used in the jQuery library.
The class can fetch data in three modes:
* scanning a single page
* scanning a "from->to" range of pages whose URLs follow a defined schema
* scanning a list of URLs supplied in a PHP array
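The three modes can be thought of as three ways of building the list of pages to visit. The sketch below illustrates that idea only. It assumes the modes differ solely in how the URL list is produced, and that a range includes both of its ends. The helper names `singlePageUrls`, `rangeUrls` and `listUrls` and the `example.com` URL are hypothetical and are not part of the Scraper class.

```php
<?php
// Hypothetical sketch: how the three scan modes could each be reduced to a plain
// list of URLs. These helper names are illustrative; they are NOT the Scraper API.

// Single-page mode: the list contains just one URL.
function singlePageUrls(string $url): array
{
    return [$url];
}

// Range mode: replace the token with every id from $from to $to.
// Assumption: both ends of the range are inclusive.
function rangeUrls(string $baseUrl, int $from, int $to, string $token): array
{
    $urls = [];
    for ($id = $from; $id <= $to; $id++) {
        $urls[] = str_replace($token, (string) $id, $baseUrl);
    }
    return $urls;
}

// List mode: the URLs come straight from a PHP array (keys are discarded).
function listUrls(array $urls): array
{
    return array_values($urls);
}

// Expand a three-page range and print the resulting URLs, one per line.
$urls = rangeUrls('http://example.com/item.html?id=##TOKEN##', 1, 3, '##TOKEN##');
foreach ($urls as $url) {
    echo $url, PHP_EOL;
}
```

Once every mode yields a list of URLs, the same extraction step can be applied to each page in turn.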
EXAMPLE
$scrap = new Scraper();
// Set the base URL; ##TOKEN## marks the part of the URL that will be substituted.
$scrap->setBaseUrl('http://your.site.ccm/path/to/details.html?id=##TOKEN##');
// Set the scan range for the token:
// ##TOKEN## will be replaced by each id from 151598039 to 151598042.
$scrap->addRangeScanRule(151598039, 151598042, '##TOKEN##');
// Define where the data are located, using phpQuery selectors.
$scrap->addDataTarget('name', '.headline .margin h1');
$scrap->addDataTarget('price', '#buyerpricegross');
$scrap->addDataTarget('image', '#imageWrapper #thumbnailoverlay a');
$data = $scrap->process();
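// $data has the following structure: one entry per scanned page,
// keyed by the names passed to addDataTarget():
array(
    array('name' => ...., 'price' => ...., 'image' => ....),
    array('name' => ...., 'price' => ...., 'image' => ....),
    ....
)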
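The returned array can then be processed with ordinary PHP array functions. The sketch below assumes the structure described above (a list of rows keyed by the data target names); the values are placeholders standing in for scraped data, not real output of `process()`.

```php
<?php
// Illustrative sketch: walk a result array shaped like the one returned by process().
// Assumption: one row per scanned page, keyed by the addDataTarget() names.
// The values below are placeholders, not real scraped data.
$data = [
    ['name' => 'name-1', 'price' => 'price-1', 'image' => 'image-1'],
    ['name' => 'name-2', 'price' => 'price-2', 'image' => 'image-2'],
];

// Collect a single field from every scanned page.
$names = array_column($data, 'name');

// Print each page's fields on one tab-separated line.
foreach ($data as $i => $row) {
    echo $i, "\t", $row['name'], "\t", $row['price'], "\t", $row['image'], PHP_EOL;
}
```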