by hadra momo - 8 months ago (2015-05-31) crawler
I created a class using curl (HTTP transport) to get the content of certain URLs, but I want to extract just some paragraphs. My objective is to index some web sites, but I don't want to end up with big databases. How can I process the retrieved content?
by Dave Smith - 8 months ago (2015-06-01)

This class will parse the document as a string, so you can get the whole web page using curl or file_get_contents (if your PHP configuration allows fopen to open URLs). It can then return an array of the entire document, or of every instance of a specific element, such as <p> paragraphs. What you do with the information after that, like saving it to a database, is up to you.
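The class Dave Smith refers to is not shown in this thread, so its actual API is unknown here. As a minimal sketch of the same idea, assuming only PHP's built-in curl extension and DOMDocument as stand-ins, fetching a page and keeping just the text of its <p> elements might look like this:

<?php
// A minimal sketch, assuming PHP's built-in curl extension and
// DOMDocument stand in for the class mentioned above (whose actual
// API is not shown in this thread). It fetches a page and returns
// only the text of its <p> elements.

function fetchPage(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    $html = curl_exec($ch);
    curl_close($ch);
    return is_string($html) ? $html : '';
}

function extractParagraphs(string $html): array
{
    if ($html === '') {
        return [];
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate real-world malformed HTML
    $doc->loadHTML($html);
    libxml_clear_errors();

    $paragraphs = [];
    foreach ($doc->getElementsByTagName('p') as $p) {
        $text = trim($p->textContent);
        if ($text !== '') {
            $paragraphs[] = $text;
        }
    }
    return $paragraphs;
}

// Usage: index only the paragraph text, not the whole document.
foreach (extractParagraphs(fetchPage('http://example.com/')) as $text) {
    echo $text, "\n"; // store $text in your index/database here instead
}

Indexing only the extracted paragraph text, rather than the raw HTML, is what keeps the database small, which addresses the original concern.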