PHP Classes

Best crawler for specific Web sites: How can I choose pertinent paragraphs for indexing a specific site?

by hadra momo - 8 months ago (2015-05-31)

How can I choose pertinent paragraphs for indexing a specific site?

+1

I created a class that uses curl (HTTP transport) to get content from certain URLs, but I want to keep just some of the paragraphs.

My objective is to index some web sites, but I don't want to end up with big databases. How can I process the retrieved content?

1 Recommendation

HTML Parser: Parse HTML using DOMDocument

+2

by Dave Smith (package author, reputation 5955) - 8 months ago (2015-06-01)

This class parses the document as a string, so you can fetch the whole web page with curl, or with file_get_contents if allow_url_fopen is enabled so that URLs can be opened like files. It can then return an array of the entire document or of every instance of a specific element, such as <p> paragraphs. What you do with the information after that, like saving it to a database, is up to you.
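
As a rough illustration of that fetch-then-filter idea, here is a minimal sketch that uses PHP's built-in curl and DOMDocument extensions directly. It is not the API of the recommended HTML Parser package. The function names fetchHtml and extractParagraphs, the 40-byte minimum length, and the timeout are all assumptions made up for this example; adjust them to your sites.

```php
<?php
// Hypothetical helper: download a page body with curl.
// Returns null when the request fails.
function fetchHtml(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_TIMEOUT        => 15,   // assumed timeout in seconds
    ]);
    $body = curl_exec($ch);
    return is_string($body) ? $body : null;
}

// Hypothetical helper: return the text of every <p> element whose text is
// at least $minLength bytes long, with runs of whitespace collapsed.
function extractParagraphs(string $html, int $minLength = 40): array
{
    if (trim($html) === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    // The XML declaration is a common hint to make loadHTML read UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $node) {
        $text = trim(preg_replace('/\s+/', ' ', $node->textContent));
        if (strlen($text) >= $minLength) {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}
```

You would call fetchHtml on each URL, pass the result to extractParagraphs, and store only the returned strings. Dropping short paragraphs is one simple way to skip menus and captions and keep the index small; what counts as "pertinent" for your sites may need a stricter rule.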