PHP Classes

Very simple page details: Parse and extract Web page information details

Ratings: 63%
Unique user downloads: Total 244
Download rankings: All time 7,996, This week 50 (up)
Version: php-vspd 1.4.0
License: Custom (specified...
PHP version: 5
Categories: HTML, PHP 5, Parsers
Description

This class can parse and extract Web page information details.

It can retrieve a Web page from a given URL and parse it to extract details like:

- Page title
- Page head and body
- Meta tags
- Character set
- Links expanded to full path
- Images
- Page headers from H1 through H6
- Internal and external links, with a check for whether they are broken
- Page elements by class or id value
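Several of the details in this list (title, charset, meta tags, H1 through H6 headings, elements by class) can be pulled out of an HTML document with PHP's bundled DOM extension. The sketch below is not taken from VSPD.class.php and does not use its API; it is a minimal illustration, assuming PHP 7 or later and an HTML string that has already been downloaded. The function names extract_page_details() and elements_by_class() are hypothetical. It also collects Open Graph tags, which the package details mention, because they use a property attribute instead of name.

```php
<?php
// Minimal sketch, NOT the VSPD implementation: extracting page details
// from an HTML string with PHP's bundled DOM extension (PHP 7+ assumed).

/** Hypothetical helper: title, charset, named meta tags, Open Graph tags, H1-H6. */
function extract_page_details(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // real-world HTML is rarely valid; suppress warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $details = ['title' => '', 'charset' => '', 'meta' => [], 'og' => [], 'headings' => []];

    $title = $doc->getElementsByTagName('title')->item(0);
    if ($title !== null) {
        $details['title'] = trim($title->textContent);
    }

    // HTML5 form: <meta charset="...">
    $charset = $xpath->query('//meta[@charset]')->item(0);
    if ($charset instanceof DOMElement) {
        $details['charset'] = $charset->getAttribute('charset');
    }

    // <meta name="..." content="..."> pairs, keyed by name
    foreach ($xpath->query('//meta[@name]') as $meta) {
        $details['meta'][$meta->getAttribute('name')] = $meta->getAttribute('content');
    }

    // Open Graph tags use property="og:..." rather than name="..."
    foreach ($xpath->query('//meta[starts-with(@property, "og:")]') as $meta) {
        $details['og'][$meta->getAttribute('property')] = $meta->getAttribute('content');
    }

    // Headings grouped by level; levels without headings are omitted
    for ($level = 1; $level <= 6; $level++) {
        foreach ($doc->getElementsByTagName('h' . $level) as $h) {
            $details['headings']['h' . $level][] = trim($h->textContent);
        }
    }
    return $details;
}

/** Hypothetical helper: text of every element whose class list contains $class. */
function elements_by_class(string $html, string $class): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);
    // XPath 1.0 idiom for matching one token in a space-separated class list.
    // Assumes $class contains no quote characters.
    $query = sprintf('//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]', $class);
    $texts = [];
    foreach ($xpath->query($query) as $el) {
        $texts[] = trim($el->textContent);
    }
    return $texts;
}

$html = '<!DOCTYPE html><html><head><meta charset="utf-8"><title> Demo page </title>'
      . '<meta name="description" content="A test page">'
      . '<meta property="og:title" content="Demo"></head>'
      . '<body><h1>Main</h1><h2>Sub A</h2><h2>Sub B</h2>'
      . '<div class="item big">One</div><div class="item">Two</div><p class="items">No</p></body></html>';

print_r(extract_page_details($html));
print_r(elements_by_class($html, 'item'));
```

Note that elements_by_class() matches whole class tokens, so class="items" is not returned when searching for "item".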

Author

Name: zinsou A.A.E.Moïse
Classes: 50 packages
Country: Benin
Innovation award
Nominee: 23x

Winner: 2x

Recommendations

Link Checker
Find broken links in a Web site

Extract div data or tags text from Web pages
I need to extract the values that are in divs of the same class

What is the best PHP web content crawler class?
Extracting content by passing the URL of a web site

Extract text or links from a web page
I need to parse and extract text

Retrieve a page content
I need a crawler to get data from a URL

Example

<?php session_start(); ?>
<!DOCTYPE HTML>
<html lang="en">
    <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" >
    <title>Test</title>
    </head>
    <body>

<?php
set_time_limit(0);
include_once "VSPD.class.php";

// $obj = new VSPD("https://www.phpclasses.org/");

// Fetch the page with a browser-like User-Agent, since some sites reject requests without one.
$opts = array(
    'http' => array(
        'method'     => "GET",
        'user_agent' => "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
    )
);
$obj = new VSPD("https://fr.investing.com/indices/major-indices", stream_context_create($opts));

// echo "Page title:";
// echo '<pre>'.$obj->getTitle().'</pre>';

// echo "All images:";
// echo '<pre>'.print_r($obj->getImages(), true).'</pre>';

// echo "Internal links:";
// echo '<pre>'.print_r($obj->getInternalinks(true), true).'</pre>';

// echo "External links:";
// echo '<pre>'.print_r($obj->getExternalinks(true), true).'</pre>';

// echo "Headers:";
// echo '<pre>'.print_r($obj->getHeaders(), true).'</pre>';
// echo "Header1:";
// echo '<pre>'.print_r($obj->getH1(), true).'</pre>';
// echo "Header2:";
// echo '<pre>'.print_r($obj->getH2(), true).'</pre>';
// echo "Header3:";
// echo '<pre>'.print_r($obj->getH3(), true).'</pre>';

echo "CHARSET:";
echo '<pre>'.print_r($obj->getCharset(), true).'</pre>';

echo "METAS:";
echo '<pre>'.print_r($obj->XplicitMeta(), true).'</pre>';

// echo "Specific tags:";
// echo '<pre>'.print_r($obj->getDTag('div'), true).'</pre>';
// echo '<pre>'.print_r($obj->getSTag('img'), true).'</pre>';
// var_dump() prints directly and returns nothing, so it cannot be concatenated into a string:
// echo '<pre>'; var_dump($obj->getElementsByTagName('div')); echo '</pre>';

echo "OPEN GRAPH TAGS:";
echo '<pre>'.print_r($obj->getOG(), true).'</pre>';
echo "TWITTER TAGS:";
echo '<pre>'.print_r($obj->getTwitterTags(), true).'</pre>';
echo "HTTP-EQUIV TAGS:";
echo '<pre>'.print_r($obj->getHttpEquiv(), true).'</pre>';

// echo "BROKEN LINKS:";
// echo '<pre>'; var_dump($obj->check_broken_externalLinks()); echo '</pre>';

// echo "Check fake broken links:";
// $ar = array('https://www.phpclasses.org/browse/mouton.html', 'https://www.phpclasses.org/voleur.html', 'https://www.stupidthieves.com', 'www.phpclasses.org/');
// $brokens = array(); // initialize so var_dump() works even when nothing is broken
// foreach ($ar as $v) {
//     if (VSPD::is_broken_link($v)) $brokens[] = $v;
// }
// echo '<pre>';
// var_dump($brokens);
// echo '</pre>';
?>
</body>
</html>
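The broken-link checks in the example (VSPD::is_broken_link() and check_broken_externalLinks()) rely on the class's own logic. As a sketch of one common approach, and not VSPD's actual implementation, a link can be treated as broken when the request fails outright or the final HTTP status is 4xx or 5xx. When PHP's HTTP wrapper follows redirects, get_headers() returns a status line for each response in the chain, so the last status line is the one that counts. The helper names final_status_code(), is_broken_status() and looks_broken() are hypothetical, and the code assumes PHP 7.1 or later.

```php
<?php
// Minimal sketch, NOT VSPD's is_broken_link(): classify a URL by its HTTP status.
// Assumption: "broken" means the request failed or the final status is >= 400.

/** Status code of the LAST response in $headers (after redirects), or null if none. */
function final_status_code(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Each response in a redirect chain starts with a line like "HTTP/1.1 301 Moved".
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

function is_broken_status(?int $code): bool
{
    return $code === null || $code >= 400;
}

/** Network check; get_headers() returns false when the request cannot be made. */
function looks_broken(string $url): bool
{
    $headers = @get_headers($url); // @ silences warnings for unreachable hosts
    if ($headers === false) {
        return true;
    }
    return is_broken_status(final_status_code($headers));
}

// Offline demo with canned header arrays, so no network access is needed.
$redirectThenOk = ['HTTP/1.1 301 Moved Permanently', 'Location: https://example.com/new', 'HTTP/1.1 200 OK', 'Content-Type: text/html'];
$notFound = ['HTTP/1.1 404 Not Found', 'Content-Type: text/html'];
var_dump(is_broken_status(final_status_code($redirectThenOk))); // bool(false)
var_dump(is_broken_status(final_status_code($notFound)));       // bool(true)
```

Splitting the status parsing from the network call keeps the decision logic testable without a live server; a real checker would also want a request timeout.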


Details

PHP VSPD is a small package for getting more details about a Web page's content. It currently provides methods to:

- get the title
- get the full head
- get the full body
- find any HTML tag
- get explicit meta tags
- get the charset
- get Open Graph tags
- get Twitter tags
- get App Links tags
- get http-equiv tags
- find and rebuild all links (to absolute paths)
- find and rebuild all images and their sources (to avoid broken image URLs)
- get all headers at once, or each header type individually (H1, H2, etc.)
- get an element by id
- get elements by class
- get elements by tag name
- get elements by name
- get all internal links
- get all external links
- check whether a link is broken
- check all internal links for broken ones
- check all external links for broken ones
- check all links for broken ones globally

The package only parses HTML or XHTML files, and only when the URL is valid. It throws an exception when the URL is not valid or when the file is not HTML or XHTML. For more details, read the class source (VSPD.class.php) and see the usage example in test.php. For feedback and bug reports, write to leizmo@gmail.com or use the dedicated support forum....
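Rebuilding links to absolute paths and sorting them into internal and external links could be approached roughly as follows. This is a sketch, not the package's own code: resolve_url(), remove_dot_segments() and is_internal_link() are hypothetical names, only common href forms are handled (it is not a complete RFC 3986 resolver), and "internal" is assumed to mean an exact, case-insensitive host match, so www.example.com and example.com would count as different sites. PHP 7 or later is assumed.

```php
<?php
// Minimal sketch, NOT VSPD's code. Handles common href forms only;
// it is not a complete RFC 3986 resolver.

/** Collapse "." and ".." segments in an absolute path such as "/a/b/../c". */
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $last = count($segments) - 1;
    $out = [];
    foreach ($segments as $i => $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out); // never pop the leading '' of "/..."
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
        if (($seg === '.' || $seg === '..') && $i === $last) {
            $out[] = ''; // "/a/b/.." refers to the directory "/a/"
        }
    }
    return implode('/', $out);
}

/** Hypothetical helper: resolve $href against the page URL $base. */
function resolve_url(string $base, string $href): string
{
    $href = trim($href);
    if ($href === '') {
        return $base;
    }
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href; // already absolute (http:, https:, mailto:, ...)
    }
    $b = parse_url($base);
    $scheme = $b['scheme'] ?? 'http';
    if (strpos($href, '//') === 0) {
        return $scheme . ':' . $href; // protocol-relative
    }
    $origin = $scheme . '://' . ($b['host'] ?? '') . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';
    if ($href[0] === '#') {
        return explode('#', $base, 2)[0] . $href; // same page, new fragment
    }
    if ($href[0] === '?') {
        return $origin . $basePath . $href; // same path, new query
    }
    // Root-relative paths replace the base path; other paths are joined
    // to the base "directory" (everything up to its last '/').
    $path = $href[0] === '/'
        ? $href
        : substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;
    // Only the path part gets dot-segment removal; ?query and #fragment are kept as-is.
    $cut = strcspn($path, '?#');
    return $origin . remove_dot_segments(substr($path, 0, $cut)) . substr($path, $cut);
}

/** Internal = same host as the page (exact, case-insensitive match). */
function is_internal_link(string $pageUrl, string $absoluteUrl): bool
{
    $a = parse_url($pageUrl, PHP_URL_HOST);
    $b = parse_url($absoluteUrl, PHP_URL_HOST);
    return is_string($a) && is_string($b) && strcasecmp($a, $b) === 0;
}

$page = 'https://www.example.com/blog/post.html';
foreach (['../img/logo.png', '/about', 'contact.html?x=1', '//cdn.example.net/app.js', 'https://other.org/'] as $href) {
    $abs = resolve_url($page, $href);
    echo $abs, ' => ', is_internal_link($page, $abs) ? 'internal' : 'external', PHP_EOL;
}
```

For the sample page URL, ../img/logo.png resolves to https://www.example.com/img/logo.png (internal), while //cdn.example.net/app.js becomes https://cdn.example.net/app.js (external).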

Files (4)

File            Role     Description
license.txt     Lic.     License file
readme.txt      Doc.     Readme
test.php        Example  Example script
VSPD.class.php  Class    Class source

The PHP Classes site has supported package installation using the Composer tool since 2013, as you may verify by reading this instructions page.
Version control: 0%
Unique user downloads: Total 244, This week 0
Download rankings: All time 7,996, This week 50 (up)
User Ratings (all time)

Utility: 66%
Consistency: 100%
Documentation: 100%
Examples: 100%
Tests: -
Videos: -
Overall: 63%
Rank: 831

User Comments (2)

I try your php code on PHPCLASSES : « Very simple page deta...
7 years ago (Dominique VARLET), rating: 42%

Very good.
7 years ago (Alekos Psimikakis), rating: 67%