This class can be used to crawl a site and retrieve the URLs of all its links.
It retrieves a page of a site and follows all links recursively to collect every URL on the site.
The class can restrict crawling to URLs with a given extension, and it avoids accessing pages disallowed by the site's robots.txt file or marked with the noindex or nofollow meta tags.
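The class itself is not shown on this page, but the politeness rules it describes (honoring robots.txt and the robots meta tag) can be sketched in a minimal, self-contained way. The snippet below is an illustration, not the package's actual code: it parses a page for links and `<meta name="robots">` directives with Python's standard `html.parser`, and checks robots.txt rules with `urllib.robotparser`. All class and variable names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

class LinkCollector(HTMLParser):
    """Collects <a href> links and honors <meta name="robots"> directives.

    Hypothetical helper illustrating the checks described above; not the
    package's real implementation.
    """
    def __init__(self):
        super().__init__()
        self.links = []
        self.follow = True   # cleared when a "nofollow" meta directive is seen
        self.index = True    # cleared when a "noindex" meta directive is seen

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            content = attrs.get("content", "").lower()
            if "nofollow" in content:
                self.follow = False
            if "noindex" in content:
                self.index = False
        elif tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])

# robots.txt rules can be checked with the standard library parser;
# parse() accepts the file's lines directly, so no network access is needed.
robots = RobotFileParser()
robots.parse(["User-agent: *", "Disallow: /private/"])

page = '<meta name="robots" content="noindex"><a href="/a.html">A</a>'
parser = LinkCollector()
parser.feed(page)
```

A real crawler would apply these checks before queueing each link: skip URLs for which `robots.can_fetch()` is false, drop a page's links when `follow` is false, and omit the page's own URL from results when `index` is false.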
Files