n. a portion of a body of records that is chosen for permanent retention through a process that ensures the retained set includes a statistical sample of each type or category of file within the series, at the same percentage as each existed within the whole; a stratified random sample

Hull 1981, 36
It points out that, in technical terms, it would be very easy to take a sample of machine-readable records but doubts the value of the exercise, since random sampling in this context ‘seldom reflect the real records/information world, since in most instances the distribution of the records, say, in a case file (sc. machine-readable) is arbitrary’. It suggests, therefore, that a stratified sample, i.e. one covering such matters as sex and marital status, would be required, but considers that the cost of the operation would be prohibitive and unjustified.

Kepley 1984, 241
Stratification is a species of statistical sampling in which certain parts of the universe to be sampled are weighted differently than others. This differs from a straight systematic or random sample in which all parts of the universe theoretically have an equal chance to be selected in the sample.

Cook 1991b, 38–39
Stratified random sampling is where the whole is broken down into logical “strata” (which may be defined as parts or subgroups or geographic areas or file blocks of the whole, like the categories in the United States Justice Department litigation case files mentioned earlier), and then each stratum is randomly or systematically sampled, thus ensuring that no part is overlooked.
Notes
A stratified sample keeps the sample statistically representative of each population of record types or categories within the entire set of records being sampled. Each type of case in a set of case files, for instance, may make up a different percentage of the entire series, so the stratified sample ensures each case type is represented at the same percentage in the retained sample as it was in the series as a whole.
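The proportional selection described in this note can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not a prescribed archival procedure: the function name `stratified_sample`, the `case_type` field, the three case types, and the 10 percent retention rate are all hypothetical. The rule that every stratum keeps at least one record is also an assumption, added so that no category is overlooked.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(records, key, fraction, seed=None):
    """Randomly retain the same fraction of records from every stratum.

    `key` maps a record to its stratum (hypothetical example: case type).
    """
    rng = random.Random(seed)

    # Group the records into strata.
    strata = defaultdict(list)
    for record in records:
        strata[key(record)].append(record)

    # Draw a simple random sample of the same fraction from each stratum.
    retained = []
    for members in strata.values():
        # Assumption: keep at least one record so no stratum is overlooked.
        k = max(1, round(len(members) * fraction))
        retained.extend(rng.sample(members, k))
    return retained


# Hypothetical series of 1,000 case files: 60% civil, 30% criminal, 10% bankruptcy.
case_types = ["civil"] * 600 + ["criminal"] * 300 + ["bankruptcy"] * 100
files = [{"id": i, "case_type": t} for i, t in enumerate(case_types)]

sample = stratified_sample(files, key=lambda f: f["case_type"], fraction=0.10, seed=1)
counts = Counter(f["case_type"] for f in sample)
```

With these inputs the retained set holds 60 civil, 30 criminal, and 10 bankruptcy files, so each case type occupies the same share of the sample as it does of the whole series. A simple random sample of 100 files would match those shares only approximately.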