Auto Review is a powerful tool for categorizing documents based on your written descriptions. But how should you write those descriptions? There are many good ways to arrive at strong results; there isn't a single, prescriptive "best" way to write a good tag description. However, keeping the following tips in mind can help you get to the results you need!
Writing tag descriptions
Grammar and Content
- Write descriptions in plain English.
- Auto Review can understand many nuances of written language, much like a human. But, also much like a human, it can be tripped up by confusing grammatical constructions, which make a description imprecise.
- Define key entities, roles, and acronyms that are specific to the matter.
- Acronyms that are highly common in the world probably do not need to be independently defined. Auto Review understands "CEO" to mean "Chief Executive Officer", and vice-versa. However, if your tag description includes looking for a term that is commonly abbreviated, but which is industry-specific, defining the acronym is advised. For example, if you are looking for “…any occurrences of an EM scheduling a meeting after…”, consider writing it as “…any occurrences of an engineering manager (EM) scheduling a meeting after…”
- Remember that the LLM being used to help power Auto Review has essentially been trained on the whole internet, but it is not trained on your dataset. If something is very specific to your industry or dataset, the LLM likely doesn’t know it, or isn’t able to narrow down to the correct meaning, unless it’s specifically defined in the exact document being reviewed. Auto Review does not have memory from document to document, because Auto Review does not train on your data.
- When defining a key entity, also include any aliases or nicknames commonly used for that entity.
- Avoid using too many pronouns, and especially avoid words such as “plaintiff” and “defendant”. Auto Review often won’t know who or what the pronoun refers to, unless you tell it in your description. For example, if Bob Smith is the CEO of Acme Corp, then a description of “… any time Acme’s CEO engaged in…” will often be too vague for Auto Review, unless you have also included the fact that “Bob Smith is the CEO of Acme Corp” elsewhere in the description. Ask yourself “In the context of only what I’ve written, are any of the terms poorly-defined?” and "Is it unclear what this pronoun's antecedent is?"
- Explanations of the law should not be included.
- Avoid using ambiguous language. For example, asking to “…locate strange or suspicious accounting behaviors” may perform well at locating documents in which there is commentary discussing that an accounting practice was odd or untested. However, you will likely see better performance by being less vague, such as if you can elaborate the description to “…locate strange or suspicious accounting behaviors. A strange or suspicious accounting behavior includes anything that is not in accordance with GAAP. Everything associated with Project Dolphin should be considered a strange or suspicious accounting behavior.”
- Avoid using double negatives. While Auto Review can often correctly interpret them, simple, clear writing provides the best results.
Mechanical Formatting
- While bolding, underlining, and italicizing are helpful for a human reviewer, they do not add emphasis for Auto Review. However, they may be helpful for you as you review and potentially revise/refine your tag descriptions.
- When choosing between writing paragraphs and using bulleted lists, choose whichever is more comfortable for you as a human reader. For example, when providing a list of key entities and aliases, a bulleted list might be easier for you to use, while in another section, you may have a list of examples that make sense to include as sentences in a paragraph.
- When multiple requirements need to be met in order for a document to be responsive to a tag, labeling each of those sections can be helpful. For example,
“Apply this tag only to documents that explicitly show both of the below parts:
- Part 1: The document is about X.
- Part 2: The document shows information about Y.”
- When you have the opportunity to choose between being specific or broad, err towards specificity. If you write something broad such as “documents related to X”, the large language model will sometimes interpret it even more broadly than you intend and sweep in documents that are only loosely related to X.
- DISCO currently limits the length of your tag descriptions to a total of 15,000 characters, across up to 10 tags. The interface has a built-in character counter. Write concisely. Instead of simply copy-pasting from a review protocol, it may be beneficial to summarize where possible and to include only key passages. (A quick length-check sketch follows this list.)
- If you copy-paste information from a spreadsheet, such as Google Sheets, Excel, etc., it’s good practice to select and copy the text, rather than to select and copy the cell. Copy-pasting between different programs can sometimes insert additional special characters, such as extra quotation marks. It’s always a good idea to double-check your descriptions before running the Auto Review job.
- Do not attempt to provide “mechanical tagging rules”, such as saying that two tags are mutually exclusive. This can be better achieved with searches after the tags have been suggested by Auto Review.
- Auto Review only reviews the TEXT of the document. Images, audio, and video are not reviewed by Auto Review. Asking Auto Review to locate pictures (or audio/video), or the content of pictures (or audio/video), will not be effective. You can see what the text of a specific document is in DISCO by clicking the "Text" tab above the document in the document viewer.
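If you draft your descriptions outside of DISCO, a quick local check can confirm you are within those limits before pasting them into the interface. Below is a minimal Python sketch, assuming your drafts are stored as plain strings; the 10-tag and 15,000-character limits come from the note above, and the tag names and text are purely hypothetical.

```python
# Minimal sketch (not a DISCO feature): check draft tag descriptions against
# the limits described above -- at most 10 tags and 15,000 characters in total.
# The tag names and description text below are purely hypothetical.

descriptions = {
    "Suspicious accounting": "Locate strange or suspicious accounting behaviors...",
    "Project Dolphin": "Any document discussing Project Dolphin, including...",
}

total_chars = sum(len(text) for text in descriptions.values())

if len(descriptions) > 10:
    print(f"Too many tags: {len(descriptions)} (limit is 10)")
if total_chars > 15_000:
    print(f"Total description length is {total_chars} characters (limit is 15,000)")
else:
    print(f"{total_chars} characters used across {len(descriptions)} tag(s)")
```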
Iterating on tag descriptions to improve them
- Providing specific examples can increase Auto Review’s ability to locate that information. This is particularly helpful when you’ve noticed a trend of a particular kind of document when reviewing false negatives from a sample (i.e., Cecilia says No, but the reviewer says Yes). This is a good way to increase Recall (a short sketch relating false negatives and false positives to Recall and precision follows this list). When providing a list of examples, indicate if it is complete or incomplete. Language such as “…including but not limited to…” is useful for indicating an incomplete list of examples. Limiting language, such as “…only including items from this exhaustive list: oranges, tangerines, or clementines”, is useful for indicating a complete list of examples. The list of examples should provide the actual context, instead of trying to point to another document. Do not say "as indicated in document ID 789" or "such as in document ENRON-004321"--Auto Review does not know the contents of any of your documents, except for the specific document that is being reviewed at the time. Auto Review does not train using your documents; it looks at each document within its own four corners, and cannot refer to other documents.
- Another way to increase Recall is to broaden the scope of the tag description. There are many linguistic ways to do this, such as using language such as “…or including related details even if those details are described differently.”
- When reviewing false positives from a sample (i.e., Cecilia says Yes, but the reviewer says No), review the Suggestion Reasons. This will guide you to why Cecilia suggested the document to be responsive for the tag. This may point you to a nuance that was not seen when the document was first reviewed by a person; this happens often! Or it may help you to understand a nuance that is causing the friction between the tag description and the suggestion on the document, providing you with important information to revise the tag description.
- Narrowing the scope of the tag description can also be useful if you are receiving a large number of false positives (i.e., Cecilia says Yes, but the reviewer says No). Including limiting language such as “exactly”, “exclusively”, “directly”, or “only” can aid with narrowing the scope.
- Tag suggestions in the UI are only representative of the most recent version of that tag’s description that was run on that document. However, the suggestions for every iteration are maintained in the database, and can be searched for with the search links provided in each job in the Auto Review right-side panel.
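The sampling terms used above map onto the standard review metrics. Below is a minimal Python sketch showing how false negatives and false positives from a sample translate into Recall and precision; the sample data is purely hypothetical, and the calculation is the standard one rather than anything specific to Auto Review.

```python
# Minimal sketch, assuming you have paired calls from a sample: Cecilia's
# suggestion and the human reviewer's decision for each sampled document.
# False negative: Cecilia says No, but the reviewer says Yes.
# False positive: Cecilia says Yes, but the reviewer says No.
# The sample data below is purely hypothetical.

sample = [
    # (cecilia_says_yes, reviewer_says_yes)
    (True, True),
    (True, False),   # false positive
    (False, True),   # false negative
    (True, True),
    (False, False),
]

tp = sum(1 for cecilia, reviewer in sample if cecilia and reviewer)
fp = sum(1 for cecilia, reviewer in sample if cecilia and not reviewer)
fn = sum(1 for cecilia, reviewer in sample if not cecilia and reviewer)

# Recall: of the documents the reviewer marked Yes, how many Cecilia also suggested.
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Precision: of the documents Cecilia suggested, how many the reviewer agreed with.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"Recall: {recall:.0%}   Precision: {precision:.0%}")
```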
Running Auto Review
- Auto Review only reviews the TEXT of the document. Documents without text are not good candidates for Auto Review, and neither are documents with only a tiny amount of text.
- Documents with absolutely no text will automatically be skipped by Auto Review, and will appear in the results as “skipped”. Documents in DISCO with no characters can also be found with the search syntax textlength(0).
- We advise manually reviewing “low text” documents. As long as a document has some text, Auto Review will review it. However, documents below some small character threshold are potentially worth manually reviewing, given the incredibly low amount of text actually involved. We often advise manually reviewing documents with fewer than 50 characters, which can be found with the search syntax textlength(<50). Of course, you have the discretion to choose to manually review a larger swath as well.
After Auto Review
- Tag suggestion reasons can be exported to a spreadsheet (a parsing sketch follows this list). This is performed via the Document List Export (DLE) facility on the main Search and Review document list page. Include the columns “suggested as likely” and “suggested as unlikely” in a Custom View, along with any other details you might need such as document IDs, Bates numbers, applied tags, etc., and create a DLE for your desired documents. Each row of the spreadsheet corresponds to a document, and the “suggested” columns will contain the tag name, the suggestion, and the suggestion reason, semicolon-delimited, for every tag that was Auto Reviewed for that document.
- Data from Auto Review will persist for as long as your database persists. “Vaulting” a database will maintain Auto Review details, just like it maintains all other details about the database. “Archiving and deleting” a database will not maintain the Auto Review information--an Archive does not necessarily contain the full information from Auto Review. And if you delete a specific document from the database, that will delete all information about the document, including its Auto Review information.
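Once you have the export, the semicolon-delimited “suggested” columns can be split apart for downstream analysis. Below is a minimal Python sketch, assuming the DLE was saved as a CSV; the filename, the exact layout inside each semicolon-delimited entry, and the “Bates number” column name are assumptions to verify against a real export.

```python
# Minimal sketch, assuming the DLE was saved as a CSV and that the
# "suggested as likely" column holds semicolon-delimited entries combining the
# tag name, the suggestion, and the suggestion reason. The filename, the exact
# layout inside each entry, and the Bates column name are assumptions --
# inspect a real export and adjust the parsing accordingly.
import csv

with open("auto_review_export.csv", newline="", encoding="utf-8") as f:  # hypothetical filename
    for row in csv.DictReader(f):
        likely = row.get("suggested as likely", "")
        if not likely.strip():
            continue
        for entry in likely.split(";"):
            entry = entry.strip()
            if entry:
                # Print the Bates number (if that column was included) alongside the entry.
                print(row.get("Bates number", "<no Bates column>"), "->", entry)
```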