This feature was added to all databases on April 16, 2024, and was available to all new databases beginning September 26, 2023.
DISCO has released a new suite of features, Chat Streams. Slack is the first type of data that is supported within Chat Streams. The new features allow for data exported from Slack's built-in export functionality to be directly ingested into DISCO. Alongside the simple user-level ingestion capability, we've also included powerful deduplication logic, standardized metadata fields for use in search, review, and production, and simplified navigation tools for moving within a Slack channel or a set of direct messages (DMs).
Ingesting an export from Slack into DISCO
DISCO's Chat Streams feature first presents itself on the ingest page. Select "New ingest", then "Slack" from the "Cloud service" category.
Select "Exported ZIP file", and you'll then be taken to your familiar DISCO ingest options, where you'll name your ingest session, and choose whether to send your data directly to active review or into ECA. On the File selection page, select the ZIP file which you've downloaded from Slack. DISCO's Chat Streams feature supports the direct export from Slack, unmodified. It will only process channel and direct message data.
Note: Update for October 2, 2023: The Enterprise Grid version of Slack is supported by Chat Streams. The latest version of DISCO's High Speed Uploader is required for support.
Next, on the Ingest options page, you'll again find the familiar options, plus one new option: "Download Slack attachments". This option is selected by default. The ZIP file downloaded from Slack contains the message content and metadata from Slack, but it does NOT contain attachment files that were added to the messages. This includes items such as pictures, PDFs, Office files, etc. All of those files are stored in a separate location by Slack. Keeping this option selected allows DISCO's software to retrieve those attachments and properly include them as family member documents with your chat documents.
Note: DISCO strongly advises keeping the "Download Slack attachments" option selected. If it is deselected, the contents of attached files will not be in your DISCO database, and therefore cannot be searched, reviewed, or produced.
Note: The billed size of the ingest depends on two items: (i) the size of the ZIP file contents from Slack, and (ii) the size of the attachments downloaded. The first item is something you'll easily know ahead of time, because it is simply the file size from the ZIP file. However, the second item is not something you'll know ahead of time, because DISCO's processing of the files in the ZIP initially obtained from Slack is what determines which attachment files will be further downloaded from Slack. The size of those attachment files is not known until they are obtained.
For more details on collecting your data from Slack and ingesting it into DISCO, please also refer to this article.
How is information broken up into documents? And what relationships are applied between documents?
The processing of your documents will appear to occur much the same as any other ingest into DISCO. However, in the background, DISCO is also collecting the needed attachment files from your Slack server. Additionally, DISCO is taking a small number of files that you downloaded from Slack and turning those into many chat documents in DISCO, with a new document for each day of messages from each channel or each set of direct/group messages.
Each chat document in DISCO represents a single calendar day of messages from a single channel or set of direct (or group) messages. That calendar day begins at midnight in the time zone to which your DISCO database is set. This means that the starting and ending times of each chat document align with the time zone to which the emails in your database were normalized, providing consistency within your database.
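The calendar-day rule can be illustrated with a short sketch. This is not DISCO's implementation: the function name `day_bucket` and the fixed UTC-6 offset standing in for a database time zone are assumptions made for illustration. It relies only on the fact that Slack records each message's timestamp (`ts`) as a string of epoch seconds.

```python
from datetime import datetime, timedelta, timezone, tzinfo

def day_bucket(slack_ts: str, database_tz: tzinfo) -> str:
    """Return the calendar day (YYYY-MM-DD) on which a message falls
    in the given time zone. Hypothetical helper, for illustration only."""
    utc_moment = datetime.fromtimestamp(float(slack_ts), tz=timezone.utc)
    return utc_moment.astimezone(database_tz).date().isoformat()

# A message sent at 03:30 UTC on March 10, 2023.
ts = "1678419000.000000"

# Assumed database time zone: a fixed UTC-6 offset (a real zone could be
# supplied with zoneinfo.ZoneInfo instead).
utc_minus_6 = timezone(timedelta(hours=-6))

print(day_bucket(ts, timezone.utc))   # 2023-03-10
print(day_bucket(ts, utc_minus_6))    # 2023-03-09
```

The same message lands in a different day's chat document depending on the time zone, which is why the database setting matters.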
Additionally, each Slack thread (i.e., an inline reply to a Slack message plus any further messages in that same thread, as opposed to another message in the main channel) is broken into its own documents in DISCO, and each document represents a single calendar day of messages in that thread.
Finally, attachment files in a channel/set of DMs are added as family member children documents to the chat document containing that day's messages. This is akin to files attached to an email message.
In the above example, there is a chat document containing messages from a channel on March 9. The channel's main messages on March 9 began at 4:27pm and ended at 4:31pm. However, people also made threaded replies to some of those messages. On April 6, a threaded reply was posted at 7:49am to a message from March 9, creating the first Slack thread (that document also has a file attached to it, and has one DISCO tag applied to it, as shown with the paperclip and the tag icons). That same thread then continued on April 14, and again on June 21.
Next, a different message from March 9 was replied to on June 13, creating a second Slack thread.
Finally, on March 16, a new message was sent in the main channel area at 2:34pm.
Slack threads can be understood with the help of the indentation levels displayed in DISCO's conversation browser. The messages in the primary channel level are all at one indentation level. Any threads created by someone replying to those messages will receive an additional indent for the first document in that thread. And any additional documents in that thread will be at a third indentation level. This is as deep as the indentation levels can go, because Slack does not allow someone to create thread replies within another thread reply.
Tip: Each entire channel, or set of DMs, or set of group DMs is associated using DISCO's conversation grouping mechanism (the same as with a chain of emails). This means that if you locate a document of interest, and want to include its entire channel of messages in a folder, review stage, etc., then you can use the "apply changes to related documents" functionality throughout DISCO for the "conversation" of documents. You can also sort the chat documents by their conversation ID or by their conversation date.
Deduplication
DISCO's Chat Streams for Slack includes a set of advanced deduplication features. These features are designed to help prevent conversations from appearing in your database more than once.
Of course, within a single export from Slack, there won't be any duplicative data. Each channel or set of direct messages only appears once, and attachments appear in the context of their messages. What has historically been challenging is the situation in which data is collected from Slack on multiple occasions. For example, a second collection of data from Slack might include overlapping time frames with the first collection. Or a second collection might include additional messages in a Slack thread. What should happen if the name of a channel is changed between the first and second collections? Or what if a message is edited in that intervening time?
The "simple" way to address questions like these is to create new documents for everything without considering if that information already had been ingested into your database. But that could force a large amount of duplicative review and production of documents.
The better way is to deduplicate data when it is exactly the same. But that's often easier said than done, especially when "exactly the same" needs to be defined, because not every change is necessarily important enough to necessitate reviewing the same data multiple times. For Slack data, DISCO's Chat Streams uses the below guidance when determining if data should deduplicate or not.
Of course, you can always opt to disable deduplication for your ingest. But most people, after having collected and reviewed 700 days of chat messages from a channel, and then recollecting the channel with another 30 days' worth of messages, are probably hoping to read only those additional 30 days of messages, even if a new user joined the channel in those 30 days. Rereading the first 700 days simply because one more person had access to the channel is usually not an efficient use of time. And on the occasion where it might be important to comb through the channel again in that new context, those messages are still available for review; they just don't have duplicative copies.
Metadata Fields
With Chat Streams, DISCO has introduced a cadre of new fields for metadata. These new metadata fields mostly follow the same pattern as email metadata fields. These new fields are found in all of the expected places: they're in the search builder, in the filters panel, available for custom columns in the document list, available for redaction in the document viewer's metadata redaction panel, and available to include as metadata in a production.
Before examining each of the new fields, it's worth first noting that "chat" is a new document type in DISCO. This document type includes all of the documents containing chat messages themselves. It is NOT applied to their attached files, which are classified according to their respective file types (e.g., PDF, Word, Video, etc.).
1. Chat Message Type: This classifies Slack chat messages into exactly one of four categories: direct message, group direct message, private channel, and public channel.
2. Chat Channel Name: For private channel and public channel chat documents, it is the name of the channel as it existed in Slack. For group direct message chat documents, it is the abbreviation "mpdm" (multi-person group message), followed by a list of the Slack users included. For direct messages, it is the phrase "direct message" followed by a unique value used by Slack.
3. Chat Participant: For a chat document, this includes all Slack users who were members of the channel, the sender of a chat message, and anyone to whom a message was sent.
4. Chat Participant Count: The count of unique participants of a chat message.
5. Chat Sender: The sender(s) of messages in a chat document. This is specific to each chat review document. This is similar to an email sender, but can have more than one value, because multiple Slack users will often send messages in a channel during a single day.
6. Chat Recipient: For a chat document, this includes all Slack users who were members of the channel, and anyone to whom a chat message was sent.
7. Chat Recipient Count: The count of unique recipients of a chat message.
8. Chat Send Date: The date a chat message was sent. For documents containing multiple messages, this is the date of the first message. Note: the date of the last message (or edit/delete activity) on a chat document is stored as the last modified date in DISCO.
Note: If Chat Participant or Chat Recipient values exceed 2000 values, additional values will be truncated. This truncation will be noted in your ingest reports.
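The relationship between the sender, recipient, and participant fields can be sketched as follows. This is only an illustration of the definitions above, not DISCO's implementation. The function name `chat_fields` is hypothetical, and two details are assumptions: that recipients are simply the channel members, and that counts are taken before truncation and truncation keeps the first values in sorted order.

```python
VALUE_LIMIT = 2000  # limit stated in the note above

def chat_fields(channel_members, message_senders):
    """Derive chat metadata for one chat document (illustrative only)."""
    senders = sorted(set(message_senders))
    # Assumption: everyone in the channel is a recipient.
    recipients = sorted(set(channel_members))
    # Participants combine channel members with anyone who sent a message.
    participants = sorted(set(channel_members) | set(message_senders))
    return {
        "Chat Sender": senders,
        "Chat Recipient": recipients[:VALUE_LIMIT],
        "Chat Recipient Count": len(recipients),
        "Chat Participant": participants[:VALUE_LIMIT],
        "Chat Participant Count": len(participants),
    }

fields = chat_fields(
    channel_members=["alice", "bob", "charlie"],
    message_senders=["bob", "alice", "bob"],
)
print(fields["Chat Sender"])             # ['alice', 'bob']
print(fields["Chat Participant Count"])  # 3
```

Note that Chat Sender can hold several values for one document, while a member who sent nothing that day (here, charlie) still counts as a recipient and participant.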
Chat document splitting and nesting
Each chat document in DISCO's Chat Streams represents all of the messages from a 24-hour period from within a single channel, direct message set, or group message set. However, Slack is not a linear messaging application. A message can be replied to in a Slack thread, and that reply could happen many days later than the original message was sent.
Each thread in Slack will also create its own set of chat documents, with one document per calendar day. Consider the following channel messages:
1 Alice (2:45pm, Jan 22): Knock knock.
2 Bob (2:48pm, Jan 22): Who's there?
- 3 Charlie (thread reply to Bob's 2:48 message, 2:51pm, Jan 22): Bob, you know better than to play this game.
- 4 Bob (thread reply to Bob's 2:48 message, 2:54pm, Jan 22): Sigh, yeah, I know.
- 5 Bob (thread reply to Bob's 2:48 message, 1:15pm, Jan 24): Funny story. They were actually outside of my house, but I'm away on a trip. And yes, it actually was very cold outside.
6 Alice (2:50pm, Jan 22): Lettuce.
7 Bob (2:51pm, Jan 22): Lettuce who?
8 Alice (2:53pm, Jan 22): Lettuce in, it's cold out here.
9 Bob (2:55pm, Jan 22): You can stay outside. That joke was terrible. Leaf me alone.
10 Bob (1:01pm, Jan 24): Alice, I'm soooooo sorry!
- 11 Alice (thread reply to Bob's 1:01 message, 1:14pm, Jan 24): #coldshoulder
How many documents will there be? Note that there's a message thread, messages 3, 4, and 5, which would appear above message 6 when viewed in Slack itself, but which are temporally interwoven with messages 6-11.
Chat Streams will create a total of five documents for the above example. Listed purely chronologically, using the date/time of the first message in the document, they are the following:
A The first document will consist of everything from the main channel on Jan 22 (1, 2, 6-9).
B The second document will consist of everything from the first day (Jan 22) of the first reply thread (3, 4).
C The third document will consist of everything from the main channel on Jan 24 (10).
D The fourth document will consist of everything from the first day (Jan 24) of the second reply thread (11).
E Finally, the fifth document will consist of everything from the next day with activity (Jan 24) in the first reply thread (5).
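The split described above can be reproduced with a small sketch. It is an illustration of the rule only, not DISCO's implementation; the names `MESSAGES` and `split_into_documents` are hypothetical, times are written in 24-hour form, and the year is omitted as in the example. Each message is grouped by its thread (or the main channel) and its calendar day, and the resulting documents are ordered by the date/time of their first message.

```python
from collections import defaultdict

# (message number, day, time, number of the message it replies to in a thread)
MESSAGES = [
    (1, "01-22", "14:45", None),
    (2, "01-22", "14:48", None),
    (3, "01-22", "14:51", 2),
    (4, "01-22", "14:54", 2),
    (5, "01-24", "13:15", 2),
    (6, "01-22", "14:50", None),
    (7, "01-22", "14:51", None),
    (8, "01-22", "14:53", None),
    (9, "01-22", "14:55", None),
    (10, "01-24", "13:01", None),
    (11, "01-24", "13:14", 10),
]

def split_into_documents(messages):
    """Group messages by (thread, calendar day), then order the groups
    by the date/time of their first message."""
    groups = defaultdict(list)
    for number, day, time, thread_parent in messages:
        groups[(thread_parent, day)].append((day, time, number))
    documents = []
    for (thread_parent, day), members in groups.items():
        members.sort()  # chronological order within the document
        documents.append({
            "thread_parent": thread_parent,
            "day": day,
            "first": members[0][:2],
            "messages": [number for _, _, number in members],
        })
    documents.sort(key=lambda doc: doc["first"])
    return documents

for label, doc in zip("ABCDE", split_into_documents(MESSAGES)):
    print(label, doc["day"], doc["messages"])
# A 01-22 [1, 2, 6, 7, 8, 9]
# B 01-22 [3, 4]
# C 01-24 [10]
# D 01-24 [11]
# E 01-24 [5]
```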
Of course, those documents can be ordered in several different ways in DISCO. If you order them by their Chat Send Date, they'll appear in the order provided above. But you can also use DISCO's conversation browser to see them in a nested relationship, which some people prefer to the semi-chronological ordering above. That nested view places them into the following structure:
A (indent level 1)
B (indent level 2)
E (indent level 3)
C (indent level 1)
D (indent level 2)
Documents A and C are both filled with messages from the main channel, so they are placed at the first indent level. Documents B and D both contain the first reply messages of their respective threads, so they are nested at the second level. And Document E is a continuation of the thread started in B, so it lives in the third level of nesting. If the B thread were to continue onto additional calendar days, then each day's worth of messages would receive its own document, and those would also be nested into the third level.
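The nesting rule in the paragraph above can be expressed as a short sketch. It is an illustration under stated assumptions, not DISCO's implementation: the names `DOCUMENTS` and `indent_levels` are hypothetical, and each document is described only by its label, the message its thread replies to (or `None` for the main channel), and its calendar day.

```python
# (document label, thread parent message or None for main channel, day)
DOCUMENTS = [
    ("A", None, "01-22"),
    ("B", 2, "01-22"),
    ("C", None, "01-24"),
    ("D", 10, "01-24"),
    ("E", 2, "01-24"),
]

def indent_levels(documents):
    """Level 1: main channel. Level 2: first day of a thread.
    Level 3: any later day of the same thread."""
    # Find the earliest day on which each thread has a document.
    first_day = {}
    for _, thread, day in documents:
        if thread is not None:
            first_day[thread] = min(day, first_day.get(thread, day))
    levels = {}
    for label, thread, day in documents:
        if thread is None:
            levels[label] = 1
        elif day == first_day[thread]:
            levels[label] = 2
        else:
            levels[label] = 3
    return levels

print(indent_levels(DOCUMENTS))
# {'A': 1, 'B': 2, 'C': 1, 'D': 2, 'E': 3}
```

Because Slack does not allow thread replies within a thread reply, three levels are enough to describe any channel.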
Visual cues in the document's near-native PDF are important for telling a reviewer that there's a thread branching from a message. In the below example, observe the "9 replies from 3 people" indicator, along with the nested documents in the conversation browser in the right panel's related documents section.
Exporting and Producing
Natives
For a chat document, there's not truly such a thing as a "native original" document. Each chat document in DISCO is derived from a series of files downloaded from Slack, which are interpreted, combined, and sliced to create each channel's day's document of messages.
As such, if you select to download or produce the native version for a chat document in DISCO, the native version is simply a copy of the near-native PDF.
Metadata
The new chat metadata fields are immediately ready to be added into your metadata DAT file in your Production. These fields will properly contain any metadata redaction work you performed during your document review, and they will also conform to any date formatting standards you've applied in the production settings.