Top 10 things you need to know about e-discovery

In January 2011, I wrote “Taming e-discovery anxiety.” In the five years since, I have written 29 articles to de-mystify e-discovery and provide a language and common framework to improve general understanding of the basic process and mechanics of e-discovery. Generally, the source of my inspiration was questions from readers and others.

This is my final column focusing solely on e-discovery. So, here are “10 Things about e-discovery” that I have covered in my articles and that I believe every lawyer who exchanges data with another party should know.

1. Encryption

Any time you exchange client data with another person, protect it. At a minimum, protect the data with a password that you send separately from the media. Better yet, encrypt data in transit. TrueCrypt may no longer be viable, but there are other cost-effective ways to transfer data securely, including secure file transfer protocol (SFTP). Learn what is available, and use these tools.

2. Legal holds

Legal holds consist of two parts: notification and preservation. Sending people a letter telling them not to delete data does not mean the data is preserved. Often, more proactive steps need to be taken, usually by IT or a “records custodian,” to ensure the data will be preserved (e.g., by turning off auto-delete features). But understand the implications of turning off deletion features: It can be costly to clients, or result in poor IT system performance. Legal holds and preservation should be routinely revisited.

3. Custodian interviews

Custodian interviews are critical, so do them early and often. Interview IT personnel to learn what information exists and where, and interview the individual custodians as well. Often, individuals do not know where their system data, such as e-mail, is stored, and IT does not know how people store information (or whether they have rogue applications in use).

4. DeNISTing during processing

The National Institute of Standards and Technology (NIST) is a U.S.-based institution that publishes a quarterly list of known computer applications (e.g., the Microsoft Office suite). If you collect data, you want to remove these items from your data pool, because you do not want to review a computer application but rather the records it generates. DeNISTing is particularly important when you collect data forensically, because that technique captures the full machine, including computer applications. It is less important when you collect only files (although there’s no harm or extra cost in doing this for files, too).
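For readers who want to see the mechanics, here is a minimal sketch of the filtering step. It assumes the list of known applications is available as a set of hash values and that SHA-1 is the hash in use. The file names, the contents and the `denist` function are hypothetical, chosen only for illustration; a real processing tool will do this for you.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Return the SHA-1 hash of a file's contents as a hex string."""
    return hashlib.sha1(data).hexdigest()


def denist(files: dict[str, bytes], known_hashes: set[str]) -> dict[str, bytes]:
    """Keep only the files whose hash is NOT on the known-application list."""
    return {
        name: data
        for name, data in files.items()
        if sha1_of(data) not in known_hashes
    }


# Hypothetical collection: one application file and one user-created record.
collected = {
    "winword.exe": b"pretend application binary",
    "letter.docx": b"Dear Ms. Smith, please find the report attached.",
}

# Hypothetical reference list (stand-in for the published list of known applications).
known_hashes = {sha1_of(b"pretend application binary")}

remaining = denist(collected, known_hashes)
print(sorted(remaining))  # only the user-created record is left for review
```

The point of the sketch is that the match is made on the hash, not the file name, so a renamed application file is still removed.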

5. Deduplication

This means removing duplicates, but “duplicate” is a technical term of art: it does not mean a duplicate as recognized by the human eye but as recognized by a computer. Computers “see” duplicates by a calculation known as a hash value, which is based on the properties of a record; duplicates generate the exact same hash value because they are exact mathematical duplicates. Items humans would consider duplicates, such as a Word document of a letter and a PDF of the same letter, are not mathematical duplicates but “near duplicates.” Near duplicates are identified in e-discovery through a textual comparison (the computer comparing every word of text). Make sure you are asking for the right process when you want to thin out records. Also, ask for either “vertical” deduplication (removing duplicates from within a single custodian) or “horizontal” (or global) deduplication (removing duplicates from across two or more custodians). Most modern e-discovery processing applications can deduplicate horizontally and keep track of which people had the same records.
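The difference between vertical and horizontal deduplication, and between exact and near duplicates, can be sketched in a few lines. This is an illustration under stated assumptions, not how any particular product works: the hash is taken over the raw file content using SHA-256, the near-duplicate test is a simple word-by-word comparison with an arbitrary 0.8 threshold, and the custodians, file names and function names are all hypothetical.

```python
import hashlib
from collections import defaultdict
from difflib import SequenceMatcher


def file_hash(content: bytes) -> str:
    """Exact duplicates produce the same hash value."""
    return hashlib.sha256(content).hexdigest()


def dedupe_vertical(records):
    """Remove duplicates within each custodian only."""
    seen = set()
    kept = []
    for custodian, name, content in records:
        key = (custodian, file_hash(content))
        if key not in seen:
            seen.add(key)
            kept.append((custodian, name))
    return kept


def dedupe_horizontal(records):
    """Remove duplicates across all custodians, tracking who held each record."""
    first_copy = {}              # hash -> (custodian, name) of the copy we keep
    holders = defaultdict(set)   # hash -> every custodian who had the record
    for custodian, name, content in records:
        h = file_hash(content)
        holders[h].add(custodian)
        first_copy.setdefault(h, (custodian, name))
    return [(kept, sorted(holders[h])) for h, kept in first_copy.items()]


def near_duplicate(text_a: str, text_b: str, threshold: float = 0.8) -> bool:
    """Word-by-word textual comparison; the threshold is an arbitrary assumption."""
    ratio = SequenceMatcher(None, text_a.split(), text_b.split()).ratio()
    return ratio >= threshold


# Hypothetical collection: (custodian, file name, content).
records = [
    ("alice", "memo.txt", b"Q3 forecast attached."),
    ("alice", "memo (copy).txt", b"Q3 forecast attached."),
    ("bob", "fwd-memo.txt", b"Q3 forecast attached."),
    ("bob", "notes.txt", b"Call counsel Monday."),
]

vertical = dedupe_vertical(records)      # bob keeps his own copy of the memo
horizontal = dedupe_horizontal(records)  # one memo survives, held by alice and bob
print(vertical)
print(horizontal)
```

Note that the horizontal pass keeps a single copy of the memo but still records that both custodians held it, which is the tracking a review team typically wants.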

6. Indexation

Indexation is a precursor to search, and you must know and understand your index to achieve an effective search. A computer does not search text; it searches an index (which is like a table of contents of all the words in the corpus, with links pointing to where each word is in the text). A word that is not in the index will not be available in search, even if you can see it on the page. Make sure all words are available for search (for example, that the documents have been OCRed if they are scanned or unsearchable). And know your “stop words”: these are words that are not indexed because they are common or not known to the computer application doing the indexing. Some indexers cannot recognize foreign characters, and some omit punctuation or numbers.
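A toy index makes the point concrete. The sketch below is deliberately simplistic and rests on assumptions of my own: the stop-word list is invented, the tokenizer keeps only the letters a to z (so numbers and punctuation never reach the index), and the document IDs are hypothetical. Real indexers are configurable, which is exactly why you should ask how yours is set up.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; every indexer ships with its own.
STOP_WORDS = {"the", "a", "of", "and", "to", "was"}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each indexed word to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # This tokenizer keeps only runs of letters, so digits and
        # punctuation are silently dropped before indexing.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in STOP_WORDS:
                index[token].add(doc_id)
    return index


def search(index: dict[str, set[str]], term: str) -> list[str]:
    """Search looks up the index, never the underlying text."""
    return sorted(index.get(term.lower(), set()))


docs = {
    "DOC-001": "The invoice of March 2015 was sent to the client.",
    "DOC-002": "Client disputed invoice #4471.",
    "DOC-003": "",  # a scanned page with no OCR: nothing to index
}

index = build_index(docs)
print(search(index, "invoice"))  # found in both text documents
print(search(index, "the"))      # stop word: no hits
print(search(index, "4471"))     # number was never indexed: no hits
```

The invoice number is plainly visible in DOC-002, yet a search for it returns nothing, because this particular index never recorded it.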

7. Metadata

Metadata, or “data about data,” is “extracted” during processing (and text is also extracted into the index). Metadata is used by e-discovery systems to organize data and enable a lot of advanced functionality. While metadata can tell you a lot about the associated data, it is not always conclusive. Generally, metadata can be handed over during production, which means affidavit entries can be prepared using this information.
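To show what “extraction” looks like in practice, here is a minimal sketch that pulls a few metadata fields out of a single e-mail using Python's standard `email` module. The message itself, the addresses and the choice of fields are hypothetical; commercial processing tools extract far more fields and handle many more file types.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical raw e-mail: headers, a blank line, then the body text.
raw_email = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Q3 forecast\n"
    "Date: Mon, 05 Jan 2015 09:30:00 -0500\n"
    "\n"
    "Please see the attached forecast.\n"
)

msg = message_from_string(raw_email)

# Metadata fields are read from the headers; the body text is kept
# separately (it is the part that would go into the search index).
metadata = {
    "from": msg["From"],
    "to": msg["To"],
    "subject": msg["Subject"],
    "sent": parsedate_to_datetime(msg["Date"]).isoformat(),
}
extracted_text = msg.get_payload()

print(metadata)
print(extracted_text)
```

The sent date here is whatever the header says it is, which is one reason metadata is informative but not always conclusive.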

8. Threading

Threading relates to e-mails only and (using metadata) will pull all e-mails in the same conversation together. Many applications now present this in a schematic, so you can “see” the thread and, with a simple click, expose the individual e-mails within the conversation. Threading will show if people are added to or dropped from the chain, if there is a change to the subject line or text, or if the thread is sent to a new person. It brings e-mail conversations alive.
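One simple way to thread is to follow each reply back to the message it answers. The sketch below assumes every e-mail carries a unique message ID and, for replies, the ID of the message it replies to (standard e-mail headers record both). The field names, the sample e-mails and the functions are hypothetical, and commercial tools may rely on additional signals.

```python
from collections import defaultdict


def thread_root(msg_id, parent_of):
    """Walk up the reply chain until reaching a message with no known parent."""
    seen = set()
    while parent_of.get(msg_id) is not None and msg_id not in seen:
        seen.add(msg_id)  # guards against a malformed, circular chain
        msg_id = parent_of[msg_id]
    return msg_id


def build_threads(emails):
    """Group message IDs by the root message of their conversation."""
    parent_of = {e["id"]: e["in_reply_to"] for e in emails}
    threads = defaultdict(list)
    for e in emails:
        threads[thread_root(e["id"], parent_of)].append(e["id"])
    return dict(threads)


def participant_changes(emails):
    """Report who was added to or dropped from the chain at each reply."""
    by_id = {e["id"]: e for e in emails}
    changes = {}
    for e in emails:
        parent = by_id.get(e["in_reply_to"])
        if parent is None:
            continue
        added = e["participants"] - parent["participants"]
        dropped = parent["participants"] - e["participants"]
        if added or dropped:
            changes[e["id"]] = (sorted(added), sorted(dropped))
    return changes


# Hypothetical e-mails; "participants" is the sender plus all recipients.
emails = [
    {"id": "<1@x>", "in_reply_to": None, "participants": {"alice", "bob"}},
    {"id": "<2@x>", "in_reply_to": "<1@x>", "participants": {"alice", "bob"}},
    {"id": "<3@x>", "in_reply_to": "<2@x>", "participants": {"alice", "bob", "carol"}},
    {"id": "<4@x>", "in_reply_to": None, "participants": {"alice", "bob"}},
]

threads = build_threads(emails)
changes = participant_changes(emails)
print(threads)   # two conversations: one with three e-mails, one with a single e-mail
print(changes)   # carol joins the chain at the third e-mail
```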

9. Native production

This is a term that always refers to post-processed data. It means that during production, instead of converting a document into a TIFF image or PDF, you exchange the native document with a load file, accompanied by the extracted text and metadata associated with that record as well as (most importantly) the system-generated control number and the hash value. It does not mean you turn over Word or Excel documents that have no control number. Native productions keep the richness of the original and the control (unique numbering) of processed records.
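As a rough illustration of what accompanies a native file, the sketch below builds a simple load file in which each record gets a control number, a pointer to the native file, a pointer to the extracted text, a hash value and one metadata field. Everything here is an assumption made for illustration: the plain CSV layout, the column names, the “ABC” prefix, the folder names and the sample records. Actual load file formats vary by review platform and should be agreed with the other side.

```python
import csv
import hashlib
import io


def build_load_file(records, prefix="ABC"):
    """Return load-file text with one row per produced native document."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["ControlNumber", "NativeFile", "TextFile", "SHA256", "Author"])
    for n, rec in enumerate(records, start=1):
        control = f"{prefix}{n:07d}"            # system-generated control number
        ext = rec["name"].rsplit(".", 1)[-1]    # keep the native file's extension
        writer.writerow([
            control,
            f"NATIVES/{control}.{ext}",         # native file, renamed to its control number
            f"TEXT/{control}.txt",              # extracted text for the same record
            hashlib.sha256(rec["content"]).hexdigest(),
            rec["author"],
        ])
    return out.getvalue()


# Hypothetical processed records.
records = [
    {"name": "budget.xlsx", "content": b"spreadsheet bytes", "author": "alice"},
    {"name": "letter.docx", "content": b"letter bytes", "author": "bob"},
]

load_file = build_load_file(records)
print(load_file)
```

Each native document travels under its control number, so the receiving party can cite it precisely and verify it against the hash value.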

10. Technology-assisted review

TAR or predictive coding is a spectrum of technologies that use machine learning together with human inputs to speed up the classification of documents. Read my January 2012 column on the topic. This technology is getting better every day (except on Excel spreadsheets) and is now accepted practice according to U.S. decisions. Take the time to schedule a demo. It will change your life, if used properly.

To my readers: Thank you for your questions and suggestions, which were the source of the subjects I tackled. And next month, this column begins a new chapter: technology and the law.

Dera J. Nevin is director of e-discovery at Proskauer Rose LLP. The opinions expressed in this article are her own.