PHP Classes

PHP Email Crawler: Crawl Web site pages to extract email addresses

Ratings: Not enough user ratings
Unique User Downloads: Total: 81
Download Rankings: All time: 10,082, This week: 140 (up)
Version: email-crawl 1.0.0
License: GNU General Publi...
PHP version: 5
Categories: Email, PHP 5, Searching, Web services, C...
Description 

Author

This package can crawl Web site pages to extract email addresses.

It can take the URL of a given site and retrieve the page contents.

The package can parse the page to extract any email addresses that it contains and links to other pages.
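The parsing step described above can be pictured with a minimal sketch. The function names extractEmails and extractLinks are hypothetical and are not part of the package; the regular expressions are deliberately simple, and a real crawler would likely use an HTML parser for the links.

```php
<?php
// Hypothetical helpers for illustration only; not the package's internal code.

/** Return the unique email addresses found in a block of text or HTML. */
function extractEmails(string $html): array
{
    // Simple pattern: it will miss obfuscated or unusual addresses.
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $html, $matches);
    return array_values(array_unique($matches[0]));
}

/** Return the unique href values of anchor tags, skipping mailto: links. */
function extractLinks(string $html): array
{
    // Capture the quoted href value of each <a ...> tag.
    preg_match_all('/<a\s[^>]*?href\s*=\s*(["\'])(.*?)\1/is', $html, $matches);
    $links = [];
    foreach ($matches[2] as $href) {
        $href = trim($href);
        if ($href !== '' && stripos($href, 'mailto:') !== 0) {
            $links[] = $href;
        }
    }
    return array_values(array_unique($links));
}

$html = '<p>Write to info@example.com or <a href="mailto:sales@example.com">sales</a>.</p>'
      . '<a href="/about">About</a> <a href="/contact">Contact</a> <a href="/about">About again</a>';

print_r(extractEmails($html)); // info@example.com, sales@example.com
print_r(extractLinks($html));  // /about, /contact
```

Addresses that appear inside mailto: links are still picked up by extractEmails, because it scans the raw markup rather than only the visible text.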

It can then crawl the linked pages recursively to extract further email addresses from them.

The count of crawled pages can be limited to a given number.
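A page limit of this kind could work roughly as in the sketch below, which stops a breadth-first crawl once a maximum number of pages has been visited. The function crawlEmails and the $fetch callable are hypothetical names chosen for illustration; the package's own traversal order and limit handling may differ. The pages are served from an in-memory array so the sketch runs without network access.

```php
<?php
/**
 * Breadth-first crawl that stops after $maxPages pages have been visited.
 * $fetch is a hypothetical callable: it takes a URL and returns the page
 * body as a string, or null when the page cannot be retrieved.
 */
function crawlEmails(string $startUrl, int $maxPages, callable $fetch): array
{
    $queue   = [$startUrl];
    $visited = [];
    $emails  = [];

    while ($queue !== [] && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        // Failed fetches also count toward the limit in this sketch.
        $visited[$url] = true;

        $body = $fetch($url);
        if ($body === null) {
            continue;
        }

        // Collect addresses, using them as array keys to drop duplicates.
        preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $body, $found);
        foreach ($found[0] as $email) {
            $emails[strtolower($email)] = true;
        }

        // Queue absolute http(s) links that have not been visited yet.
        preg_match_all('/href\s*=\s*"(https?:\/\/[^"]+)"/i', $body, $links);
        foreach ($links[1] as $link) {
            if (!isset($visited[$link])) {
                $queue[] = $link;
            }
        }
    }

    return array_keys($emails);
}

// In-memory "site" standing in for real HTTP requests.
$pages = [
    'https://example.com/'  => 'root@example.com <a href="https://example.com/a">A</a> <a href="https://example.com/b">B</a>',
    'https://example.com/a' => 'a@example.com <a href="https://example.com/">home</a>',
    'https://example.com/b' => 'b@example.com',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? null;
};

// With a limit of 2, only the start page and the first linked page are read.
print_r(crawlEmails('https://example.com/', 2, $fetch));
```

Passing the fetcher as a callable keeps the traversal logic separate from the HTTP layer, so a curl-based fetcher could be swapped in without changing the loop.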

The email addresses found using this package will be returned in an array.

A report of the crawl process may be printed to the console or saved to a file.

Name: Ujah Chigozie peter <contact>
Classes: 25 packages
Country: Nigeria
Innovation award nominee: 11x

Example

<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');
require __DIR__ . '/plugins/autoload.php';
use Peterujah\NanoBlock\EmailCrawl;

// Defaults used when no command-line arguments are given.
$target = "https://default.com/contact";
$limit = 50;
if(!empty($argv[1])){
    if(filter_var($argv[1], FILTER_VALIDATE_URL)){
        // Plain CLI call: php craw.php <url> [limit]
        $target = $argv[1];
        $limit = (int) ($argv[2] ?? 50);
    }else{
        // Otherwise expect a base64-encoded, serialized options array (as built by exec.php).
        $req = unserialize(base64_decode($argv[1]));
        $target = $req["target"];
        $limit = $req["max"]??50;
    }
}
$craw = new EmailCrawl($target, $limit);
$resInstance = $craw->craw()->getResponse();
$data = $resInstance->inLine();
// Print the result to the console, then save it under ./craw/.
$resInstance->printCommandResult($data)->saveAs(__DIR__ . "/craw/", $data);


Details

email-crawl

PHP Email Web Crawler is a simple, easy-to-use class that uses curl and the command-line interface to extract email addresses from websites. It can also crawl deeper, extracting email addresses from pages linked from the initial target website.

Installation

Installation is super-easy via Composer:

composer require peterujah/email-crawl

Basic Usage

Initialize an email crawl instance

$craw = new EmailCrawl("https://example.com", 200);

Start the email crawl

$craw->craw();

Get the scan response as a CrawlResponse instance

$response = $craw->getResponse();

Get the response emails separated by new lines

$data = $response->inLine();

Get the response emails separated by commas

$data = $response->withComma();

Get response emails as an array

$data = $response->asArray();
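To make the difference between the three formats concrete, here is a small sketch that builds comparable strings from a plain array. It does not call the package; the exact separators that inLine() and withComma() use are assumptions here.

```php
<?php
// Stand-ins for the three output formats. The separators are assumptions,
// not confirmed behaviour of CrawlResponse.
$emails = ['info@example.com', 'sales@example.com'];

$inLine    = implode("\n", $emails); // one address per line
$withComma = implode(",", $emails);  // comma-separated on a single line
$asArray   = $emails;                // the raw list, ready for further processing

echo $inLine, "\n";
echo $withComma, "\n";
echo count($asArray), " addresses\n";
```

The string forms suit printing or saving to a text file, while the array form is the natural choice when the addresses are processed further in PHP.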

Print the response emails

$response->printCommandResult($data);

Save the response emails to a file. This saves the result as a JSON string.

$response->save("/path/save/craw/");

Save the response emails to a file. If string data is passed, that data is saved; otherwise the result is saved as a JSON string.

$response->saveAs("/path/save/craw/", $data);
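The "string if given, otherwise JSON" rule can be sketched as follows. The helper saveResult and the file names emails.txt and emails.json are hypothetical; the package chooses its own file names, which are not documented here.

```php
<?php
/**
 * Hypothetical helper: write $data when it is a string, otherwise write
 * $emails as JSON. Returns the path of the file that was written.
 */
function saveResult(string $dir, array $emails, ?string $data = null): string
{
    // Create the target directory if it does not exist yet.
    if (!is_dir($dir) && !mkdir($dir, 0775, true) && !is_dir($dir)) {
        throw new RuntimeException("Cannot create directory: $dir");
    }

    $isText = $data !== null;
    // The file names are assumptions made for this sketch.
    $path = rtrim($dir, '/\\') . DIRECTORY_SEPARATOR . 'emails.' . ($isText ? 'txt' : 'json');
    $body = $isText ? $data : json_encode(array_values($emails), JSON_PRETTY_PRINT);

    if (file_put_contents($path, $body) === false) {
        throw new RuntimeException("Cannot write file: $path");
    }
    return $path;
}

$emails = ['info@example.com', 'sales@example.com'];
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'craw_demo';

$jsonPath = saveResult($dir, $emails);                          // JSON fallback
$textPath = saveResult($dir, $emails, implode("\n", $emails)); // explicit string data
echo $jsonPath, "\n", $textPath, "\n";
```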

Example

Create a file named craw.php and add this example code to it. With this example you can run the crawl directly from the command line, from a browser, or through PHP's shell_exec.

<?php
error_reporting(E_ALL);
ini_set('display_errors', '1');
require __DIR__ . '/plugins/autoload.php';
use Peterujah\NanoBlock\EmailCrawl;
$target = "https://example.com/contact";
$limit = 50;
if(!empty($argv[1])){
    if(filter_var($argv[1], FILTER_VALIDATE_URL)){
        $target = $argv[1];
        $limit = $argv[2]??50;
    }else{
        $req = unserialize(base64_decode($argv[1]));
        $target = $req["target"];
        $limit = $req["max"]??50;
    }
}
$craw = new EmailCrawl($target, $limit);
$response = $craw->craw()->getResponse();
$data = $response->inLine();
$response->printCommandResult($data)->saveAs(__DIR__ . "/craw/", $data);

To run the crawl from the command-line interface, use the command below:

php craw.php https://google.com 50

To run the crawl through PHP's shell_exec, create a file called exec.php and add the example script below. Note: change PHP_SHELL_EXECUTION_PATH to the path of your PHP executable. Once done, navigate to https://mycraw.example.com/exec.php

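<?php
define("PHP_SHELL_EXECUTION_PATH", "path/to/php");
$crawOptions = array(
    'target' => 'https://example.com',
    'max' => 50,
);
// Encode the options so they can be passed as a single command-line argument.
$crawRequest = base64_encode(serialize($crawOptions));
$crawScript =  __DIR__ . "/craw.php";
$crawLogs =  __DIR__ . "/craw_logs.log";
// Escape each argument, then append both stdout and stderr to the log file.
shell_exec(PHP_SHELL_EXECUTION_PATH . " " . escapeshellarg($crawScript) . " " . escapeshellarg($crawRequest) . " 'alert' >> " . escapeshellarg($crawLogs) . " 2>&1");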
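The hand-off between exec.php and craw.php relies on a base64-encoded, serialized array surviving the trip through the command line. The sketch below checks that round trip in isolation. The allowed_classes option (PHP 7 and later) and the JSON variant are additions for illustration and are not used by the package's example scripts; they are worth considering because unserialize on untrusted input can be unsafe.

```php
<?php
// Options as built in exec.php.
$options = ['target' => 'https://example.com', 'max' => 50];

// Encode as exec.php does: serialize, then base64 so the value is a single token.
$payload = base64_encode(serialize($options));

// Decode as craw.php does. The allowed_classes option is an addition in this
// sketch; it prevents objects from being instantiated during unserialize.
$decoded = unserialize(base64_decode($payload), ['allowed_classes' => false]);

// Alternative sketch: JSON carries the same data without unserialize().
$jsonPayload = base64_encode(json_encode($options));
$jsonDecoded = json_decode(base64_decode($jsonPayload), true);

var_dump($decoded === $options);     // bool(true)
var_dump($jsonDecoded === $options); // bool(true)
```

If the JSON variant were adopted, both scripts would need to change together, since craw.php currently expects serialized data.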

ATTENTION

It is advisable to run this code from the command-line interface for better performance.


Files
File            Role      Description
src (2 files)
composer.json   Data      Auxiliary data
craw.php        Example   Example script
exec.php        Aux.      Auxiliary script
README.md       Doc.      Documentation

Files / src
File                Role    Description
CrawlResponse.php   Class   Class source
EmailCrawl.php      Class   Class source
