Natural Language Processing: Artificial Intelligence Applications in eDiscovery

The next time you start to type a Google search, text message, or email, pay attention to the suggestions that your computer or phone starts throwing at you. Whether you knew it or not, that’s artificial intelligence (AI) at work. Specifically, it’s a type of AI known as natural language processing (NLP).

The truth is that you’re using or benefiting from NLP all the time, in subtle, streamlined applications that blend effortlessly into your workflow. (If you doubt this, count how many times you ask Alexa or Siri to do something for you in a typical day: these and other voice assistants rely on NLP to interpret your requests and decide how to act on them.)

NLP is also improving eDiscovery through applications like concept clustering. Let’s take a closer look at how NLP works and how you might be using it without even realizing it.

Natural Language Processing: What It Is and How It Works

Modern NLP relies largely on machine learning, a type of AI, to process and analyze human language. As it works through text, its algorithms pick up contextual patterns and semantic relationships so that they can determine what words and sentences mean, which part of speech each word is, and which other concepts those words relate to. Like most machine-learning systems, NLP needs a huge volume of data to crunch through so that it can learn which words appear together and in what contexts they appear; both signals allow it to assign categories of meaning.
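To make the idea of learning "which words appear together" concrete, here is a minimal Python sketch that counts how often word pairs occur near each other in a tiny, made-up set of documents. The function names, the two-word window, and the small stopword list are illustrative assumptions; real NLP systems learn from far larger corpora and use more sophisticated statistics than raw counts.

```python
from collections import Counter
import re

# Hypothetical, deliberately tiny stopword list: common function words that
# would otherwise dominate the counts without saying much about meaning.
STOPWORDS = {"the", "a", "an", "on", "was", "before", "please", "of", "to", "and"}

def tokenize(text: str) -> list[str]:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return [w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS]

def cooccurrence_counts(docs: list[str], window: int = 2) -> Counter:
    """Count each unordered word pair that appears within `window` tokens of each other."""
    counts: Counter = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        for i, word in enumerate(tokens):
            # Look only ahead of the current word so each nearby pair is counted once.
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    counts[tuple(sorted((word, other)))] += 1
    return counts

docs = [
    "The merger agreement was signed on Friday.",
    "Please review the merger agreement before the board meeting.",
    "The board approved the merger agreement.",
]
counts = cooccurrence_counts(docs)
print(counts.most_common(1))  # [(('agreement', 'merger'), 3)]
```

Pairs that keep showing up together, like "merger" and "agreement" here, are the raw material for the contextual associations described above.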

This is not an overnight process, and NLP isn’t perfect. After all, language is tremendously complex. While some legal documents are fairly straightforward (chances are that your contracts don’t contain sarcastic clauses that mean the opposite of what they say), eDiscovery software must deal with every form of human communication, in all its casual, shorthand, and even contradictory glory. (Does “shut up” mean “stop talking,” or does it mean “no way!”? And then what, exactly, does “no way” mean?)

NLP Applications in eDiscovery

eDiscovery uses NLP for a variety of information retrieval and information extraction tasks. It can help sort through reams of documents, recognizing relevant terms and suggesting additional keywords that co-occur with your search terms. In this sense, it can act as a digital “highlighter,” drawing an attorney’s attention to specific documents and passages that may be relevant. NLP applications can also extract dates, party or custodian names, and other specific details from large volumes of data, again focusing attorney attention where it will be most useful.
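The extraction side can be illustrated with a deliberately simple sketch that pulls dates out of an email body. Note that it swaps in hand-written regular expressions rather than a trained model, and it recognizes only two assumed formats (ISO dates such as 2021-03-15 and dates written like March 3, 2021); the function and pattern names are hypothetical. Production extractors handle far more formats and ambiguities than this.

```python
import re
from datetime import date

# Month names mapped to month numbers, used by the long-form pattern below.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["January", "February", "March", "April", "May", "June", "July",
         "August", "September", "October", "November", "December"],
        start=1,
    )
}

# Two illustrative formats only: 2021-03-15 and March 3, 2021.
ISO_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
LONG_PATTERN = re.compile(r"\b(" + "|".join(MONTHS) + r") (\d{1,2}), (\d{4})\b")

def extract_dates(text: str) -> list[date]:
    """Return every valid date matched by either pattern, in chronological order."""
    found: list[date] = []
    candidates = [(int(y), int(m), int(d)) for y, m, d in ISO_PATTERN.findall(text)]
    candidates += [(int(y), MONTHS[mon], int(d)) for mon, d, y in LONG_PATTERN.findall(text)]
    for year, month, day in candidates:
        try:
            found.append(date(year, month, day))
        except ValueError:
            pass  # matches a pattern but is not a real date, e.g. 2021-13-45
    return sorted(found)

email = (
    "Per our call on March 3, 2021, the draft is due 2021-03-15. "
    "Closing is still planned for April 30, 2021."
)
print(extract_dates(email))
# [datetime.date(2021, 3, 3), datetime.date(2021, 3, 15), datetime.date(2021, 4, 30)]
```

Once dates are pulled out as structured values, they can be sorted, filtered by a relevant time window, or laid out on a timeline for review.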

Many common eDiscovery applications rely on NLP, at least in part. For example, concept clustering uses language analysis to determine which words go together, grouping similar documents so that a single reviewer can consider them at once. Email threading recognizes continuing conversations and pulls them together into one coordinated view, eliminating redundant or inconsistent review. Similarly, near-deduplication goes beyond standard deduplication, which catches only exact copies, by recognizing when documents are substantially similar (such as successive drafts of a single final document) and grouping them together. Predictive coding in review also uses NLP to spot word associations and key phrases.
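Near-deduplication is often explained in terms of overlapping word sequences. The sketch below uses one common textbook approach (word "shingles" compared with Jaccard similarity), not necessarily the method any particular eDiscovery product uses; the three-word shingle size, the 0.5 threshold, the function names, and the sample documents are assumptions chosen for illustration.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Break a document into the set of overlapping k-word sequences ("shingles")."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i : i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def near_duplicate_pairs(docs: dict[str, str], threshold: float = 0.5) -> list[tuple[str, str, float]]:
    """Compare every pair of documents and keep those at or above the threshold."""
    sigs = {doc_id: shingles(text) for doc_id, text in docs.items()}
    ids = sorted(sigs)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1 :]:
            score = jaccard(sigs[a], sigs[b])
            if score >= threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs

docs = {
    "draft_v1": "The supplier shall deliver the goods within thirty days of the order date.",
    "draft_v2": "The supplier shall deliver the goods within sixty days of the order date.",
    "memo": "Lunch is moved to noon on Thursday.",
}
print(near_duplicate_pairs(docs))  # [('draft_v1', 'draft_v2', 0.57)]
```

The two contract drafts differ by a single word, so they share most of their shingles and are flagged together, while the unrelated memo is not. Comparing every pair grows quadratically with collection size, so at scale this comparison is commonly approximated with hashing techniques such as MinHash.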

All of these applications make eDiscovery faster and less painful. Note, though, that they rely on “shallow” or “statistical” processing techniques: they don’t genuinely understand the underlying words or concepts; they have simply learned which words tend to appear together or relate to each other.

Try This at Home

Pay attention to some of the NLP applications you’ve probably been using every day, such as word suggestions in text and email messages and autocomplete in Google searches. Think about what contextual cues you’re providing that help your technology anticipate what you want (or what cues it missed when it guessed wrong!).
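If you want to see why your phone's suggestions depend on the words you've already typed, a toy next-word predictor makes the idea concrete. This minimal sketch assumes a simple bigram model (count which word most often follows the previous one), which is far cruder than the models behind real keyboards and search boxes; the function names and the sample message history are hypothetical.

```python
from collections import Counter, defaultdict
import re

def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """For each word, count which words immediately follow it in the training text."""
    following: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = re.findall(r"[a-z0-9']+", sentence.lower())
        for current, nxt in zip(words, words[1:]):
            following[current][nxt] += 1
    return following

def suggest(model: dict[str, Counter], last_word: str, n: int = 3) -> list[str]:
    """Return up to n most frequent followers of last_word (empty if never seen)."""
    return [word for word, _ in model.get(last_word.lower(), Counter()).most_common(n)]

history = [
    "see you at the meeting",
    "see you tomorrow",
    "see you at the office",
    "thanks for the update",
]
model = train_bigrams(history)
print(suggest(model, "you"))  # ['at', 'tomorrow']
```

Here "you" is most often followed by "at" in the sample history, so that is the first suggestion; a word the model has never seen produces no suggestion at all, which is one simple way such a system can miss your intent.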

If you’re ready to incorporate more artificial intelligence into your eDiscovery processes, we can help. Our best-in-class software solutions include the industry-leading Invariant software, now incorporated within Relativity, and Lumix, the only processing technology on the market that has the speed, accuracy, and flexibility to crunch terabytes of complex data in a matter of days. Contact us to learn more.

