With DISCO sampling, you can return a randomized set of documents that meet your search criteria. Here’s how it works:
- sample(0.1, "any query") – returns 10% of the documents matching your query
- sample(10%, "any query") – returns 10% of the documents matching your query
- sample(500, "any query") – returns a maximum of 500 documents matching your query
- sample(10, tag(by price@csdisco.com)) – returns a maximum of 10 documents tagged by price@csdisco.com
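To make the difference between the fraction, percentage, and count forms concrete, here is a minimal Python sketch of how the first argument could be interpreted. It is an illustration only, not DISCO's implementation: the function names parse_sample_size and sample_documents, the rounding behavior, and the treatment of a value of exactly 1 as a count are all assumptions.

```python
import random
from typing import List, Optional, Sequence, Union


def parse_sample_size(size: Union[str, int, float], total: int) -> int:
    """Convert a sample-size argument into a document count.

    Hypothetical helper. Assumed rules (not confirmed by DISCO):
    - "10%"            -> 10 percent of the matching documents
    - 0.1 (0 < x < 1)  -> the same 10 percent, written as a fraction
    - 500 (x >= 1)     -> at most 500 documents
    """
    if isinstance(size, str) and size.strip().endswith("%"):
        fraction = float(size.strip()[:-1]) / 100.0
        return min(round(total * fraction), total)
    value = float(size)
    if 0 < value < 1:
        return round(total * value)
    # A whole number is an upper bound, so never exceed the match count.
    return min(int(value), total)


def sample_documents(
    size: Union[str, int, float],
    matches: Sequence[str],
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Return a random subset of the documents that matched a query."""
    rng = rng or random.Random()  # unseeded: each run draws a new subset
    count = parse_sample_size(size, len(matches))
    return rng.sample(list(matches), count)


if __name__ == "__main__":
    # Stand-in for the documents matching "any query".
    matching_docs = [f"DOC-{i:03d}" for i in range(50)]
    print(sample_documents("10%", matching_docs))
    print(len(sample_documents(500, matching_docs)))
```

In this sketch, a count larger than the number of matching documents simply returns every match, which mirrors the "maximum of 500" wording above.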
Sampling also supports more complex searches. For example, to return a 10% sample of the documents that contain the keywords budget, fraud, or inquiry, have Jeff Skilling as the custodian, and fall within the date range 1/1/2000 to 1/1/2002, use sample(10%,("budget" "fraud" "inquiry") & custodian("Jeff Skilling") & date(1/1/2000 to 1/1/2002)).
Some important items to note:
- Re-running a search will return a new random subset of documents.
- Sorting of the search results is disabled when sampling is used.