n. a URL from which a web crawler will begin to harvest a website

Prom and Swain 2007, 360, fn. 27
The Internet Archive now provides a subscription service, Archive-It, that institutions can use to capture information by harvesting information from seed URLs. Content is stored on the Archive-It servers. The service was not available at the time this study was completed, but it may offer a good option for archivists seeking a relatively simple capture mechanism and off-campus storage options.

Olston and Najork 2010, 198
The implications for crawling are: (1) one cannot simply crawl to depth N, for a reasonable value of N like N = 20, and be assured of covering the entire web graph; (2) crawling “seeds” (the pages at which a crawler commences) should be selected carefully, and multiple seeds may be necessary to ensure good coverage.

Gelfand 2015, 9
Once the seed was crawled, it had to undergo quality control to make sure that the quality of the capture was the best one possible.

Dooley and Bowers 2018, 34
When describing a single archived website, the seed URL may be a key piece of information to assist in discovery. It may no longer function when a site’s address changes, and so may not be appropriate to include as access elements unless your system will resolve the change. An archived site must always have an access URL that will continue to be valid after the live site is taken down.

Abrams et al. 2019, 459
We also coded the following in our analysis: misleading or unsatisfactory metadata display, confusion regarding the meaning of collection (a group of websites) versus seed (a single website), and misuse of facets or expressing the view that the facets were unhelpful or confusing. It is worth noting that for the number of times each attribute or tool was unsuccessful in helping a participant complete a task, it sometimes helped different participants, or even the same participant in a different moment. ¶ Metadata (reading, understanding, and clicking on the descriptive metadata displayed with each seed on the collections page) was helpful 3 times as often (38) as it was unhelpful or confusing.

Wickner 2019, 5
Web crawlers are one widely used web archiving method. A crawler begins to archive when a user specifies a starting “seed” URL. It creates and saves a facsimile of the seed, then identifies, follows, and copies links leading out from that page.

Wiedeman 2019b, 6
Any web page captured in a web crawl—whether that page is a seed or was just captured along the way—could be described at any point in an archival hierarchy.

Lohndorf 2023
A seed is an item with a unique identifier in the Archive-It backend. A seed has associated data that does not change, like the dates on which it was added or updated and its crawl history. Seeds also have data that can be edited like Seed Level Metadata, notes, and even the seed URL. ¶ A seed URL is both a starting point for the crawlers, as well as an access point to archived pages. A seed URL can be, for example: ¶ an entire website . . . ¶ a specific part (directory) of a website . . . ¶ a specific document . . .
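The crawl process Wickner describes (start at a seed, save a copy, follow outbound links), together with Olston and Najork's points about depth limits and multiple seeds, can be sketched as a breadth-first traversal. This is a minimal illustration under stated assumptions, not how Archive-It or any particular crawler works: the "web" is a hypothetical in-memory link graph rather than live HTTP fetches, `crawl` and `in_scope` are hypothetical names, and the scope rule (same host, path beginning with the seed's path) is an assumed simplification of the whole-site, directory, and single-document seeds Lohndorf lists.

```python
from collections import deque
from urllib.parse import urlsplit, SplitResult

# Hypothetical in-memory "web": each URL maps to the links found on that page.
# A real crawler would fetch each page over HTTP and parse links from its HTML.
LinkGraph = dict[str, list[str]]


def _path(parts: SplitResult) -> str:
    # Treat an empty path ("https://example.org") as the site root.
    return parts.path or "/"


def in_scope(url: str, seed: str) -> bool:
    """Assumed scope rule: same host as the seed, and a path that starts with
    the seed's path. A seed for a whole site ("/"), a directory ("/reports/"),
    or a single document ("/reports/2019.pdf") gives progressively narrower crawls."""
    u, s = urlsplit(url), urlsplit(seed)
    return u.netloc == s.netloc and _path(u).startswith(_path(s))


def crawl(seeds: list[str], graph: LinkGraph, max_depth: int) -> dict[str, int]:
    """Breadth-first crawl from one or more seed URLs.

    Returns each captured URL mapped to the link depth at which it was reached.
    Pages deeper than max_depth, or outside the originating seed's scope, are skipped.
    """
    captured: dict[str, int] = {}
    # Each queue entry records the URL, the seed it descends from, and its depth.
    queue = deque((seed, seed, 0) for seed in seeds)
    while queue:
        url, seed, depth = queue.popleft()
        if url in captured or depth > max_depth or not in_scope(url, seed):
            continue
        captured[url] = depth  # stand-in for saving a facsimile of the page
        for link in graph.get(url, []):
            queue.append((link, seed, depth + 1))
    return captured


if __name__ == "__main__":
    graph: LinkGraph = {
        "https://example.org/": ["https://example.org/about", "https://example.org/reports/"],
        "https://example.org/reports/": ["https://example.org/reports/2019.pdf", "https://other.example/"],
        "https://example.org/about": ["https://example.org/"],
    }
    # Whole-site seed with a shallow depth limit: the PDF two links away is missed.
    print(crawl(["https://example.org/"], graph, max_depth=1))
    # Directory seed: the off-site link is out of scope and is not captured.
    print(crawl(["https://example.org/reports/"], graph, max_depth=5))
```

In the first call the depth limit leaves the PDF uncaptured even though it is on the same site; adding the directory as a second seed, as the second call suggests, is one way to reach it, which mirrors Olston and Najork's observation that multiple seeds may be needed for good coverage.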
Notes
Informally referred to as “seed.” A seed URL is the input that web crawlers use to initiate indexing and crawling processes. It is both a starting point for the crawlers and an access point to archived pages.