
Title: Spidering Hacks


Category: Web development books
Authors: Kevin Hemenway, Tara Calishain
Publisher: O'Reilly & Associates
Release date: November 1, 2003
ISBN: 0596005776

Reviews

May 13, 2004 (rating: 5)
Manuel Lemos
manuellemos.net
Spidering is an invented word that refers to the activity of spiders. In the Internet world, spiders are programs meant to crawl the World Wide Web to extract information or perform some type of interaction with other Web sites.

"Spidering Hacks" is a book that covers all sorts of issues related to spidering activities. Despite what the title may suggest, this book is not about taking advantage of other Web sites in an illicit way. As a matter of fact, the book draws attention to the legal issues involved in making acceptable use of other sites' content or services.

The book is made up of 100 tips on various spidering issues, referred to as hacks. These hacks are presented in six chapters. The first chapter provides an introduction to spidering and the steps that should be taken to develop spider programs that work in a way acceptable to the sites that they crawl.
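One common way for a spider to behave acceptably is to honor a site's robots.txt rules before fetching anything. The sketch below is not taken from the book, which uses mostly Perl; it is a minimal Python illustration using the standard library's `urllib.robotparser`. The sample rules, the `example-spider` agent name, and the example.com URLs are all assumptions made for the demonstration.

```python
from urllib.robotparser import RobotFileParser


def make_policy(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt policy from text that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt content, standing in for a real site's file.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

AGENT = "example-spider"  # hypothetical user agent name

policy = make_policy(SAMPLE_ROBOTS)

# Ask the policy before requesting each URL.
allowed = policy.can_fetch(AGENT, "http://example.com/index.html")
blocked = policy.can_fetch(AGENT, "http://example.com/private/data.html")

# Seconds the site asks spiders to wait between requests (None if unset).
delay = policy.crawl_delay(AGENT)

print(allowed, blocked, delay)
```

A real spider would download the rules from the target site (for instance with the parser's `set_url` and `read` methods) and sleep for the requested delay between requests.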

The second chapter addresses the tools and components that can be used to develop spider programs. Although spiders can be developed in any language, this chapter mostly covers existing Perl modules that provide ready-to-use capabilities that simplify the development of new spider programs.

The third chapter focuses on retrieving media files from other sites, such as pictures, music, movies, and news headlines. It also includes information about extracting files sent as e-mail attachments by accessing POP3 e-mail servers.

The fourth chapter is the largest in the book. It contains numerous tips on accessing many types of database-driven sites, such as mailing list sites like Yahoo Groups, search engines like Google, directories like Yahoo, and e-commerce sites like Amazon.

The fifth chapter addresses aspects of managing spiders that are meant to run periodically. It discusses spider automation using cron or other operating-system-specific task scheduling tools. It also addresses techniques for mirroring Web sites and accumulating search engine results.

The final chapter tells developers what they can do to make it easy for others to benefit from their own site's content and services, and to give others simplified access to the site. It explains how to make content easier to syndicate by generating XML RSS feeds or by providing Web services using API standards like XML-RPC and REST. It also has an interesting example of an Instant Messaging robot that delivers the latest security vulnerabilities.
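To give an idea of what generating an RSS feed involves, here is a minimal sketch that builds an RSS 2.0 document with Python's standard `xml.etree.ElementTree` module. It is not an example from the book; the `build_rss` helper, the feed title, and the example.com links are hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET


def build_rss(title, link, description, items):
    """Return an RSS 2.0 document as a string.

    `items` is a list of dicts with "title" and "link" keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description

    # One <item> element per piece of syndicated content.
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]

    return ET.tostring(rss, encoding="unicode")


# Hypothetical site data used only to exercise the helper.
feed = build_rss(
    "Example site news",
    "http://example.com/",
    "Latest headlines from an example site",
    [
        {"title": "First headline", "link": "http://example.com/news/1"},
        {"title": "Second headline", "link": "http://example.com/news/2"},
    ],
)
print(feed)
```

Publishing a feed like this lets other programs read a site's headlines without having to scrape its HTML pages.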

This book addresses an uncommon but very interesting Web development subject: how to enhance the content or services provided by a site by developing spider programs that automatically retrieve information from, or interact with, other Web sites. If you are looking for ideas to enhance the value of your site to your users, you will certainly love this book.