by hadra momo - 8 months ago (2015-05-31) crawler
I created a class using curl (HTTP transport) to get the content of certain URLs, but I want to extract just some paragraphs. My objective is to index some web sites, but I don't want to end up with big databases. How can I process the retrieved content?
by Dave Smith - 8 months ago (2015-06-01)

This class will parse the document as a string, so you can get the whole web page using curl or file_get_contents (if your PHP configuration allows fopen to open URLs). It can then return an array of the entire document, or of every instance of a specific element, such as <p> paragraphs. What you do with the information after that, like saving it to a database, is up to you.
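The class Dave Smith refers to is not shown in this thread, so its actual API is unknown here. As a minimal sketch of the same idea, assuming only PHP's built-in curl extension and DOMDocument as stand-ins, fetching a page and keeping just the text of its <p> elements might look like this:

<?php
// A minimal sketch, assuming PHP's built-in curl extension and
// DOMDocument stand in for the class mentioned above (whose actual
// API is not shown in this thread). It fetches a page and returns
// only the text of its <p> elements.

function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);
    curl_close($ch);
    return is_string($html) ? $html : '';
}

function extractParagraphs(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate real-world malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Usage: index only the paragraph text, not the whole document.
foreach (extractParagraphs(fetchPage('http://example.com/')) as $text) {
    echo $text, "\n"; // store $text in your index/database here instead
}

Indexing only the extracted paragraph text, rather than the raw HTML, is what keeps the database small, which addresses the original concern.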