Title: Spidering Hacks
Category: Web development books
Authors: Kevin Hemenway, Tara Calishain
Publisher: O'Reilly & Associates
Release date: November 1, 2003
ISBN: 0596005776
Reviews
Manuel Lemos (manuellemos.net)

Spidering is an invented word that refers to the activity of spiders. On the Internet, spiders are programs that crawl the World Wide Web to extract information or interact with other Web sites.

"Spidering Hacks" covers a wide range of issues related to spidering. Despite what the title may suggest, this book is not about taking advantage of other Web sites illicitly. In fact, the book draws attention to the legal issues involved in making acceptable use of other sites' content or services.

The book is made up of 100 tips on various spidering issues, referred to as hacks, presented in six chapters.

The first chapter introduces spidering and the steps that should be taken to develop spider programs that behave in a way acceptable to the sites they crawl.

The second chapter addresses the tools and components that can be used to develop spider programs. Although spiders can be developed in any language, this chapter mostly covers existing Perl modules that provide ready-to-use capabilities and simplify the development of new spider programs.

The third chapter focuses on retrieving media files from other sites, such as pictures, music, movies, and news headlines. It also includes information about extracting files sent as e-mail attachments by accessing POP3 e-mail servers.

The fourth chapter is the largest in the book. It contains numerous tips on accessing many types of database-driven sites, such as mailing list sites like Yahoo Groups, search engines like Google, directories like Yahoo, and e-commerce sites like Amazon.

The fifth chapter addresses the management of spiders that are meant to run periodically. It discusses spider automation using cron or other operating-system-specific task scheduling tools. It also addresses techniques for mirroring Web sites and accumulating search engine results.

The final chapter tells developers what they can do to make it easy for others to benefit from their site's content and services, and to give others simplified access to their site. It explains how to make content easier to syndicate by generating XML RSS feeds or by providing Web services using API standards like XML-RPC and REST. It also has an interesting example of an instant messaging robot that delivers the latest security vulnerabilities.

This book addresses an uncommon but very interesting Web development subject: how to enhance the content or services provided by a site by developing spider programs that automatically retrieve information from or interact with other Web sites. If you are looking for ideas to enhance the value of your site to your users, you will certainly love this book.