When language identification is enabled, the primary language of each document is identified at ingestion. DISCO databases created prior to 1/16/2025 can identify eight languages: English, German, French, Portuguese, Spanish, Chinese, Japanese, and Korean.
For DISCO databases created after 1/15/2025, 82 languages can be identified as a primary language.
If a document has too few characters or is primarily in an unsupported language, the language will be identified as "undetermined".
To display a Primary Language column, create a custom column and select the Primary Language option under Metadata.
You can also search for documents based on language, using the primaryLanguage(Language) search string. See Searching in DISCO (best practices) for more information.
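As a rough illustration of the search string and the date cutoff described above, the following Python sketch builds a primaryLanguage(Language) string and checks which language set applies to a database. The helper names (uses_expanded_set, primary_language_query) are hypothetical and not part of DISCO; the sketch also assumes the language name is typed exactly as it appears in the supported languages list (for example, Haitian_Creole), which this article does not confirm.

```python
from datetime import date

# Cutoff taken from the article: databases created after 1/15/2025
# can identify the expanded set of 82 languages.
EXPANDED_SET_START = date(2025, 1, 16)

# The eight languages identified in databases created before the cutoff.
LEGACY_LANGUAGES = {
    "English", "German", "French", "Portuguese",
    "Spanish", "Chinese", "Japanese", "Korean",
}


def uses_expanded_set(created: date) -> bool:
    """Return True if a database created on this date identifies 82 languages."""
    return created >= EXPANDED_SET_START


def primary_language_query(language: str) -> str:
    """Format a primaryLanguage(Language) search string (hypothetical helper)."""
    name = language.strip()
    if not name:
        raise ValueError("language must not be empty")
    # Assumption: the name is passed through unchanged, spelled as in the list.
    return f"primaryLanguage({name})"


if __name__ == "__main__":
    print(primary_language_query("French"))
    print(uses_expanded_set(date(2025, 1, 15)))
```

A language outside LEGACY_LANGUAGES, such as Tagalog, would only be expected to return results in a database for which uses_expanded_set is true.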
Supported languages for databases created after 1/15/2025
Afrikaans
Albanian
Arabic
Armenian
Azerbaijani
Basque
Belarusian
Bengali
Bihari
Bulgarian
Catalan
Cebuano
Cherokee
Chinese
Croatian
Czech
Danish
Dhivehi
Dutch
English
Estonian
Finnish
French
Galician
Ganda
Georgian
German
Greek
Gujarati
Haitian_Creole
Hebrew
Hindi
Hmong
Hungarian
Icelandic
Indonesian
Inuktitut
Irish
Italian
Japanese
Javanese
Kannada
Khmer
Kinyarwanda
Korean
Laothian
Latvian
Limbu
Lithuanian
Macedonian
Malay
Malayalam
Maltese
Marathi
Nepali
Norwegian
Oriya
Persian
Polish
Portuguese
Punjabi
Romanian
Russian
Scots_Gaelic
Serbian
Sinhalese
Slovak
Slovenian
Spanish
Swahili
Swedish
Syriac
Tagalog
Tamil
Telugu
Thai
Turkish
Ukrainian
Urdu
Vietnamese
Welsh
Yiddish