File: README.md

File:	`README.md`
Role:	Documentation
Content type:	`text/markdown`
Description:	Documentation
Class:	Broken Links Scanner Scan web sites to identify broken links
Author:	By Ujah Chigozie peter
Last change:	Update README.md
Date:	9 months ago
Size:	`5,914 bytes`

PHP Broken Links Scanner

A PHP library for scanning websites to identify broken links and extract relevant information. Ensure that the required PHP extensions are installed, particularly cURL, for the scanner to function properly.

Install the library with Composer:

composer require peterujah/broken-links-scanner

CLI Usage

Use the CLI script to scan a website for broken links.

Options:

`--url` (required): The starting URL for the scan (e.g., `http://luminova.ng/docs/` or `http://luminova.ng/`).
`--host` (required): The scan URL hostname (e.g., `luminova.ng`).
`--path` (optional): Path to save the scan results.
`--output` (optional): Flag to control output of broken links. Use `1` to print, or `0` to suppress output (default: `0`).
`--timeout` (optional): Maximum time in seconds to wait for the scan to complete (default: `0`).
`--limit` (optional): Maximum number of scans to perform. Use `0` to scan all URLs (default: `0`).

Example Usage:

To start a scan, run the following command:

php broken --url="https://luminova.ng/" --host="luminova.ng" [--timeout=10] [--path="/scanner/logs"] [--output=0] [--limit=0]
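
When the scan is launched from another PHP script (a cron wrapper, for instance), the command line can be assembled from an options array. The sketch below is only an illustration: `buildScanCommand()` is a hypothetical helper, not part of the library, and it emits only the flags listed in the options above.

```php
<?php
/**
 * Build the `php broken` command line from an options array.
 * buildScanCommand() is a hypothetical helper, not part of the library.
 */
function buildScanCommand(array $options): string
{
    // --url and --host are the two required options.
    foreach (['url', 'host'] as $required) {
        if (!isset($options[$required]) || $options[$required] === '') {
            throw new InvalidArgumentException("Missing required option --{$required}");
        }
    }

    $allowed = ['url', 'host', 'path', 'output', 'timeout', 'limit'];
    $parts = ['php', 'broken'];

    foreach ($allowed as $name) {
        if (isset($options[$name])) {
            // escapeshellarg() quotes each value so spaces and shell
            // metacharacters cannot break the command.
            $parts[] = '--' . $name . '=' . escapeshellarg((string) $options[$name]);
        }
    }

    return implode(' ', $parts);
}

echo buildScanCommand([
    'url'     => 'https://luminova.ng/',
    'host'    => 'luminova.ng',
    'timeout' => 10,
    'limit'   => 0,
]), PHP_EOL;
```

The resulting string can be passed to `exec()` or `passthru()`. Options missing from the array are left out, so the script falls back to its documented defaults.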

Example: Using Scanner to Scan a Website for Broken Links

Initialize `Scanner` with the starting URL, the hostname, and an optional scan limit.

1. Basic Usage

require_once __DIR__ . '/vendor/autoload.php';

use \Peterujah\BrokenLinks\Scanner;

// Define the starting URL for the scan
$url = 'https://luminova.ng/';
$host = 'luminova.ng';
$maxScan = 10; // Set to 0 to scan all URLs.

// Initialize the Scanner class
$scanner = new Scanner($url, $host, $maxScan);

// Optionally set the path to save scanned URLs (example path; adjust to your setup)
$path = __DIR__ . '/scanner/logs';
$scanner->setPath($path);

2. Start the Scan and Retrieve Results

If the path is not set, you can get the output directly:

if ($scanner->start() && $scanner->isCompleted()) {
    // Get results from the scan
    $brokenLinks = $scanner->getBrokenLinks();
    $visitedUrls = $scanner->getVisitedUrls();
    $errors = $scanner->getErrors();
    $allUrls = $scanner->getUrls();

    // Output the scanned data
    echo "Broken Links:\n";
    print_r($brokenLinks);

    echo "\nVisited URLs:\n";
    print_r($visitedUrls);

    echo "\nErrors Encountered:\n";
    print_r($errors);

    echo "\nAll Extracted URLs:\n";
    print_r($allUrls);
} else {
    echo "Failed to complete the scan.\n";
}
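
The arrays returned by the getters can also be folded into a single report instead of being printed one by one. The following is a minimal sketch, assuming each getter returns a flat list of strings (the README does not spell out the array shape). `scanReportJson()` is a hypothetical helper, and the sample data stands in for real scan results.

```php
<?php
/**
 * Turn scan results into a JSON report.
 * Assumes flat lists of strings; scanReportJson() is hypothetical.
 */
function scanReportJson(array $brokenLinks, array $visitedUrls, array $errors): string
{
    // Deduplicate and sort so repeated runs produce comparable reports.
    $broken = array_values(array_unique($brokenLinks));
    sort($broken);

    $report = [
        'visited_count' => count($visitedUrls),
        'broken_count'  => count($broken),
        'error_count'   => count($errors),
        'broken_links'  => $broken,
    ];

    return json_encode(
        $report,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR
    );
}

// Sample data standing in for getBrokenLinks(), getVisitedUrls() and getErrors().
echo scanReportJson(
    ['https://example.com/b', 'https://example.com/a', 'https://example.com/b'],
    ['https://example.com/', 'https://example.com/a', 'https://example.com/b'],
    []
), PHP_EOL;
```

In a real script the three arguments would come from `$scanner->getBrokenLinks()`, `$scanner->getVisitedUrls()` and `$scanner->getErrors()`.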

3. Using the `wait` Method

To block until the scan completes, call the `wait` method with a timeout in seconds:

$timeout = 30;

try {
    $scanner->wait($timeout, function (Scanner $scanner) {
        $brokenLinks = $scanner->getBrokenLinks();
        echo "Broken Links:\n";
        print_r($brokenLinks);
    });
} catch (RuntimeException $e) {
    echo "Error: " . $e->getMessage() . "\n";
}

> Note: When you use the `wait` method, you do not need to call `start` separately.

Class Methods Documentation

__construct

Description: Initializes a new instance of the scanner with the specified URL and hostname.
Parameters:
- `string $url`: The starting URL for the scan (e.g., `https://luminova.ng/docs/`).
- `string $host`: The hostname for the URL to scan (e.g., `luminova.ng`).
- `int $maxScan`: The maximum number of scans to perform (default is `0`, which means no limit).
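
Because the constructor takes the URL and the hostname as separate arguments, the two can drift apart. One way to keep them in sync is to derive the hostname from the URL with PHP's built-in `parse_url()`. This is a sketch only: `hostFromUrl()` is a hypothetical helper, not a library method, and restricting input to `http` and `https` is an assumption made here.

```php
<?php
/**
 * Return the lowercase hostname of an absolute http(s) URL.
 * hostFromUrl() is a hypothetical helper, not a library method.
 */
function hostFromUrl(string $url): string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host = parse_url($url, PHP_URL_HOST);

    $isHttp = in_array(strtolower((string) $scheme), ['http', 'https'], true);

    if (!is_string($host) || $host === '' || !$isHttp) {
        throw new InvalidArgumentException("Not an absolute http(s) URL: {$url}");
    }

    return strtolower($host);
}

$url = 'https://luminova.ng/docs/';
$host = hostFromUrl($url);

echo $host, PHP_EOL; // luminova.ng

// The pair can then be passed to the constructor documented above:
// $scanner = new Scanner($url, $host, $maxScan);
```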

isCompleted(): bool

Description: Checks whether the scanning process has been completed.
Returns: - `bool`: Returns `true` if the scan is completed; otherwise, returns `false`.

getBrokenLinks(): array

Description: Retrieves the list of broken URLs identified during the scan.
Returns: - `array`: An array containing the broken URLs.

getVisitedUrls(): array

Description: Retrieves the list of URLs that have been visited during the scan.
Returns: - `array`: An array containing the visited URLs.

getErrors(): array

Description: Retrieves the error messages encountered during the scan process.
Returns: - `array`: An array containing the error messages.

getUrls(): array

Description: Retrieves the list of extracted URLs during the scan.
Returns: - `array`: An array containing the extracted URLs.

setPath(string $path): self

Description: Sets the file path where scanned URLs will be saved.
Parameters: - `string $path`: The file path to save scanned URLs.
Returns: - `self`: Returns the current instance of the class for method chaining.

cli(bool $cli): self

Description: Sets whether the scanning results should be shown in the command line interface (CLI).
Parameters: - `bool $cli`: `true` if running in CLI mode; otherwise, `false`.
Returns: - `self`: Returns the current instance of the class for method chaining.

start(): bool

Description: Initiates the link scanning process.
Returns: - `bool`: Returns `true` if the scan completes successfully; returns `false` otherwise.
Throws: - `RuntimeException`: Throws an exception if the provided URL is invalid.

wait(int $timeout, ?callable $onComplete = null): void

Description: Waits for the scanning process to complete or until a specified timeout is reached. If a callback function is provided, it will be executed upon completion.
Parameters:
- `int $timeout`: The maximum number of seconds to wait. If `0`, it waits indefinitely until the scan is completed.
- `callable|null $onComplete`: An optional callback function to be executed when the scan completes.
Throws: - `RuntimeException`: Throws an exception if the timeout is exceeded before completion.
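
The timeout contract described above (wait until done, treat `0` as no limit, throw `RuntimeException` on timeout, then run the callback) can be pictured as a simple polling loop. The sketch below is not the library's implementation. `waitUntil()` and its `$pollMicros` parameter are hypothetical, and unlike the documented callback it passes no arguments to `$onComplete`.

```php
<?php
/**
 * Poll $isDone until it returns true or $timeout seconds elapse.
 * A timeout of 0 waits indefinitely.
 * waitUntil() is a hypothetical stand-in, not the library's wait().
 */
function waitUntil(
    callable $isDone,
    int $timeout,
    ?callable $onComplete = null,
    int $pollMicros = 100000
): void {
    // No deadline when the timeout is 0.
    $deadline = $timeout > 0 ? microtime(true) + $timeout : null;

    while (!$isDone()) {
        if ($deadline !== null && microtime(true) >= $deadline) {
            throw new RuntimeException("Timed out after {$timeout} seconds");
        }
        usleep($pollMicros); // Pause between checks to avoid a busy loop.
    }

    if ($onComplete !== null) {
        $onComplete();
    }
}

// Usage: a fake task that reports completion on the third check.
$checks = 0;
waitUntil(
    function () use (&$checks): bool {
        return ++$checks >= 3;
    },
    5,
    function () use (&$checks): void {
        echo "Completed after {$checks} checks", PHP_EOL;
    }
);
```

With the real scanner, the same pattern applies: wrap the `wait` call in `try`/`catch` and handle the `RuntimeException` as a timeout.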