
Sorcerer: Scrape Web page content using regular expressions

Last updated: 2017-02-01
Ratings: Not yet rated by the users
Unique user downloads: Not yet counted
Download rankings: Not yet ranked
Version: sorcerer 1.0.0
License: MIT/X Consortium ...
PHP version: 5
Categories: PHP 5, Web services

This class can scrape Web page content using regular expressions.

It takes a given page URL and retrieves its contents.

The class can apply a given list of regular expressions to the page and save the matching content to a given file.
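The flow described above (retrieve a page, apply each regular expression, then return or save the matches) can be sketched with PHP's built-in functions. This is only an illustration and not the package's implementation: the function name `sketch_scrape`, its parameters, and the saved file format are all assumptions made for this example.

```php
<?php
// Hypothetical helper, NOT part of the Sorcerer package: it only illustrates
// the match / return-or-save flow using built-in PHP functions.
function sketch_scrape(string $html, array $regexes, ?string $savefile = null): ?array
{
    $results = [];
    foreach ($regexes as $regex) {
        // preg_match_all() returns the number of matches, or false on error.
        $count = preg_match_all($regex, $html, $matches);
        // Keep the full-pattern matches ($matches[0]) for each regex.
        $results[$regex] = ($count === false) ? [] : $matches[0];
    }

    if ($savefile === null) {
        // No file given: hand the matches back to the caller.
        return $results;
    }

    // File given: write a readable dump (format chosen for this sketch only).
    file_put_contents($savefile, print_r($results, true));
    return null;
}

// Real code would fetch the page first, for example:
// $html = file_get_contents('http://www.testurl.com/index.php');
$html = '<p><a href="/a">First</a></p>';
$data = sketch_scrape($html, ['/<a\s[^>]*>.*?<\/a>/i']);
print_r($data);
```

Passing a third argument to this sketch writes the dump to that path instead of returning the array, which mirrors the two modes shown under "Run".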

Name: Gavin Gordon Markowski
Classes: 10 packages
Country: Canada
Innovation award nominee: 4x
Details

Sorcerer


Description

An easy-to-use PHP class for scraping webpages' source code.

Usage

Installation

	$ composer require gavinggordon/sorcerer

Examples

Instantiation

	include( 'vendor/autoload.php' );

	use GGG\Http\Data\Collection\Sorcerer as Sorcerer;
	
	$scraper = new Sorcerer();

Configuration

	$url = 'http://www.testurl.com/index.php';
	
	$regexes = [
		'/\<a\s?[^\>]+?\>(.+)\<\/a\>/i',
		'/\<img\s?([^\>]+?)[\s\/]*?\>/i'
	];
	
	$savefile = __DIR__ . '/testurl-scrapedata.txt';
	
	$scraper->configure( $url, $regexes, $savefile );
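To see what the two example patterns capture, they can be run directly through `preg_match_all()` outside the class. The sample markup below is invented for this illustration, and how Sorcerer itself structures its results is not shown here.

```php
<?php
// Sample markup invented for this illustration; it is not from the package.
$html = <<<'HTML'
<a href="/home">Home</a>
<a href="/about">About</a>
<img src="logo.png" alt="Logo" />
HTML;

// The two patterns from the configuration example, unchanged.
$regexes = [
    '/\<a\s?[^\>]+?\>(.+)\<\/a\>/i',
    '/\<img\s?([^\>]+?)[\s\/]*?\>/i'
];

$captures = [];
foreach ($regexes as $regex) {
    preg_match_all($regex, $html, $matches);
    // $matches[1] holds what the first capture group matched.
    $captures[] = $matches[1];
}

print_r($captures);
// $captures[0] is ['Home', 'About'] (the link text)
// $captures[1] is ['src="logo.png" alt="Logo"'] (the img attributes)
```

Because `(.+)` in the first pattern is greedy, two links on the same line would be captured as a single match running from the first link's text to the last `</a>` on that line; a lazy `(.+?)` avoids this.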

Run

If no filepath was set for "$savefile",...

	$data = $scraper->scrape();
	
	print_r( $data );

...the scraped data will be returned.

If a filepath was set for "$savefile",...

	$scraper->scrape();

...the scraped data will be saved to the file which you specified.

Issues

If you have any issues at all, please post your findings in the issues page at https://github.com/gavinggordon/sorcerer/issues.

License

This package utilizes the MIT License.

Files

	.travis.yml (auxiliary data)
	composer.json (auxiliary data)
	LICENSE.txt (documentation)
	phpunit.xml (auxiliary data)
	README.md (documentation)
	src/Http/Data/Collection/Sorcerer.php (class source)
