web crawler

n. software that systematically browses and captures web content

Prom 2002, 269
It is important to provide live hyperlinks to each finding aid (so that Web crawlers find the finding aid) and to properly code the tag to supply information which can be indexed by search engines.

O’Sullivan 2005, 71
The Wayback Machine is an automated Web crawler that makes mirror images of all currently available Web sites.

Olston and Najork 2010, 176
A web crawler (also known as a robot or a spider) is a system for the bulk downloading of web pages.

Jones and Neubert 2017, 11–12
Many websites monitor IP addresses for traffic hitting their sites and when a bot is noticed that is perceived as behaving badly, webmasters will stop that traffic. As a result, Web crawlers are configured to be “polite,” to throttle back the rate of interactions with the target site.

Wickner 2019, 5
Web crawlers are one widely used web archiving method. A crawler begins to archive when a user specifies a starting “seed” URL. It creates and saves a facsimile of the seed, then identifies, follows, and copies links leading out from that page. The crawler repeats these steps until it reaches a user-specified limit defined in terms of host domain, number of documents, data, page quantity, or time.

Kelly 2019, 5
Users are generally assumed (or, at the very least, hoped) to be human users and not bots or web crawlers, though, as is discussed later, the choice to analyze use and reuse by machine-generated “users” may vary by institution.

Lohndorf 2022
There are some characteristics of sites that can prevent Heritrix from crawling. For example, a robots.txt file is a tool used to direct a web crawler (not just ours, but any crawler) not to crawl all or specified parts of a website.
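The crawl loop that Wickner describes (start from a seed, save a copy, follow outgoing links, stop at a user-specified limit), together with the politeness delay and robots.txt behavior mentioned by Jones and Neubert and by Lohndorf, can be sketched roughly as follows. This is a minimal illustration only, not how Heritrix or the Wayback Machine is implemented. The names `crawl`, `LinkExtractor`, `fetch`, `max_pages`, `delay`, and `example-bot` are hypothetical, and the `fetch` callable is assumed to return a page's HTML as a string, or `None` if the page is unavailable.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, robots_txt="", max_pages=10, delay=0.0, agent="example-bot"):
    """Breadth-first crawl from `seed`, limited to the seed's host and to `max_pages`.

    `fetch` is an assumed callable: URL -> HTML string, or None if unavailable.
    `robots_txt` is the text of the site's robots.txt, supplied by the caller.
    """
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    host = urlparse(seed).netloc

    queue, seen, archive = deque([seed]), {seed}, {}
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        if not rules.can_fetch(agent, url):
            continue  # respect robots.txt exclusions
        html = fetch(url)
        if html is None:
            continue
        archive[url] = html  # save a copy of the page

        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href).split("#")[0]  # resolve and drop fragments
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)

        time.sleep(delay)  # "polite" pause between requests
    return archive
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular HTTP library; for experimentation, a dictionary's `get` method over a handful of in-memory pages can stand in for a real fetcher.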


Copyright © 2005-2026 by SAA. All rights reserved.