Why the archaic process of eDiscovery is vulnerable to hacking and data breach

And what you can do about it.

Earlier this year, Inside Counsel published an article speculating as to whether e-discovery repositories pose an attractive target to cybercriminals and, given the often chaotic nature of discovery, whether the process itself presents a vulnerable point of data breach.

It would be hard, on either question, to reach an answer other than "yes." Unequivocally, yes.

To the first question -- are e-discovery repositories sources of high-value, sensitive data? -- consider the nature of the material that is gathered for the purposes of discovery. At the outset, when data is collected on the client side, it is often the case that this information has only been "cleansed" rudimentarily, perhaps through high-level ECA techniques, broad keyword searches, or some other quick-and-dirty method of stack ranking.

More often than not, though, broad collections sweep up chaff to wheat at a ratio of 10 to 1. And in either scenario, it is always the case that the initial body of material collected in the discovery repository/litigation database, before it is reviewed, includes highly sensitive information that has not been flagged (such as documents subject to attorney-client privilege). As steps are taken to parse the dataset, eventually those confidential materials are removed, such that the recipient of the data -- in theory -- only receives information that is a) responsive to its requests and b) non-privileged. The takeaway here is that discovery document repositories -- unlike, for instance, document productions -- house a particularly sensitive subset of data, because it has been hastily identified as potentially relevant to litigation but not yet purged of confidential material.

The graphic below addresses the second, and perhaps more worrisome question: is the process of discovery ripe for breach?

The red locks represent risk points as data is passed from the client to its external providers (law firms, service vendors) and then, ultimately, to requesting parties (e.g. adversarial litigants). All of these exchanges share a common source of risk: data is moving and, consequently, is at its most vulnerable. While some organizations have taken steps to address these risks, such as providing access to secure FTP portals or shipping data only on encrypted media, those measures only go so far. An equally problematic issue is the copying and sharing of data that has been removed from those secure channels. In the course of discovery, this happens all the time -- think about how attorneys may email documents back and forth as a means to share notes and collaborate. And, finally, when a production is shared with requesting parties, that information becomes subject to the other side's security protocols, whatever they happen to be (or not be).

With all this as context, presented below are the full, unedited responses to the questions that spawned the Inside Counsel article. Lael Andara is a highly regarded litigation partner at Ropers Majeski in Silicon Valley, where he focuses on technology and IP cases. Andy Wilson is the CEO and co-founder of Logikcull.com.

Why haven't hackers begun hacking into e-discovery document repositories?

Andy Wilson: While it may not be the case that cybercriminals have focused specifically on e-discovery databases, it is certainly likely that the materials they've accessed in recent high-profile data breaches are being handled for the purposes of discovery. Law firms and legal services providers, by their nature, handle very sensitive data belonging to their clients. In a sense, those firms act as clearinghouses for all their customers' most valuable information, because it is often the case that this data is relevant to litigation, investigations or other disputes.

So getting back to the idea of "e-discovery document repositories": that's the problem. In the course of discovery, data goes everywhere. The client has to collect data and send it to its outside counsel, who then sends that data to vendors and other colleagues throughout the firm. Then, ultimately, discovery materials are produced to opposing parties, and the whole process starts up again. It's an incredibly risky process because, often, that information is sent through insecure channels, such as unencrypted email, file sharing services and physical media, like DVDs or hard drives. All of those channels expose information to breach.

Lael Andara: The reality is this is already happening. We just haven't necessarily identified the hacks. The other issue is that a lot of these hacks gather information [the hacker] may not directly know the benefit of. This creates a separate effect where data is gathered and then taken to the black market where people who can actually capitalize on the information obtain it.

"THE REALITY IS THAT HACKERS ARE ALREADY [ACCESSING] E-DISCOVERY DATABASES."

Why are e-discovery document repositories so valuable?

AW: The entire purpose of e-discovery is to gather information potentially relevant to the dispute or investigation at hand, identify the documents that are relevant to the requests of opposing parties, and remove the information that is confidential or otherwise protected. It is usually the case that parties use e-discovery or legal intelligence platforms to analyze and review that information, so they can determine which materials are sensitive and which must be produced. So, those tools act as repositories for all of that valuable information.

LA: The very nature of litigation requires us to get to the most valuable assets of the companies that are in dispute. Think of it as mining for gold. Business data represents piles of paydirt that are yet to be processed, and law firms are the sluice boxes that sift through the business data and pull out the gold nuggets. The irony is that those piles of paydirt (business data) typically are subject to better security than the law firm's sluice box (i.e. clients have better security measures in place than their outside law firms). In recent years, we have seen a trend in Silicon Valley for the data to be housed within the firewalls of the client, requiring the attorneys to modify their workflow to maintain security.

The other issue that is often overlooked is the reality that this is not a typical hack where there are "invaders at the gate." Oftentimes, [sharing documents with your adversary in litigation] represents an enemy "behind the gate," and taking appropriate safeguards is imperative. Structuring a protective order that includes encryption and other safeguards to maintain proprietary business data will be the norm in the next few years. Encryption should be directly addressed in the protective order along with the logging of who has access to the data.

Of course, the reality is that this is little comfort given the incidence of adversaries violating protective orders by misusing the proprietary data. Oftentimes the sanctions are minimal at best, and do little to reassure businesses trying to maintain security over their data.

Is e-discovery the next frontier for cybercrime?

AW: It very well may be. Security experts are quick to point out that data is most vulnerable when it is in motion. Well, discovery is a process of motion, where sensitive materials are gathered together quickly, often hastily, from all kinds of different repositories and locations, and ultimately shared with requesting parties with few safeguards. It's not unusual for parties to share and exchange this information through the insecure means that I mentioned (DVDs, email, etc.). And, often, the e-discovery tools themselves lack appropriate security safeguards and, by and large, do not encrypt data that is stored at rest. All of this leaves valuable information exposed to breach, whether it's from loss or theft.

"SECURITY EXPERTS ARE QUICK TO POINT OUT THAT DATA IS MOST VULNERABLE WHEN IT IS IN MOTION. WELL, DISCOVERY IS A PROCESS OF MOTION."

LA: As long as law firms are on networks, we can anticipate these types of cyber crimes. Also, the value of this data creates two potential risks: the first, that proprietary data will be stolen to obtain an unfair business advantage, and the second, that the data will be held hostage by ransomware. The use of ransomware does not necessarily indicate that the data has left the organization or will be used or sold in the open market, but rather that [possessors of the data] have been locked out of it and must pay a ransom in order to re-access it. It is not uncommon for entities to maintain a balance of Bitcoin in the event that they are hit with a ransomware attack.

What steps can be taken to safeguard data in the context of discovery?

AW: We've discussed how discovery is a process where data is shared widely, with many parties, often in insecure fashion. It's not unusual, for instance, for corporate counsel to mail hard drives across the country to get data to their law firm counsel. That's absurd, and risky. It's imperative both to limit the number of times that data is shared or "touched" in the course of discovery, and to make sure that data is encrypted at all times. The most secure e-discovery or legal intelligence platforms, then, are the ones that eliminate the risk inherent to discovery by providing one central hub where all data is securely hosted and all channels in and out of the database are secure. When data is in the platform, it must be encrypted at rest. And when it is shared with opposing parties, it should be shared through encrypted channels -- ideally a secure, permissions-based link whereby requesting parties can access that data remotely and instantly.
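
To make the idea of a "secure, permissions-based link" a little more concrete, here is a minimal Python sketch of one common way such links can be built: a token that binds a document to a named recipient and an expiry time, signed with HMAC so the server can detect tampering. This is an illustration only, not a description of how any particular e-discovery platform works. The names `SECRET_KEY`, `make_token` and `verify_token` are hypothetical, the fields are assumed not to contain the "|" character, and the link itself would still need to travel over an encrypted channel such as HTTPS.

```python
import base64
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical server-side secret; a real system would load this from a key store.
SECRET_KEY = b"example-secret-not-for-production"


def make_token(doc_id: str, recipient: str, expires_at: int,
               key: bytes = SECRET_KEY) -> str:
    """Return a URL-safe token binding a document, a recipient and an expiry time."""
    payload = f"{doc_id}|{recipient}|{expires_at}".encode()
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "."
            + base64.urlsafe_b64encode(signature).decode())


def verify_token(token: str, recipient: str, now: Optional[int] = None,
                 key: bytes = SECRET_KEY) -> Optional[str]:
    """Return the document ID if the token is authentic, unexpired and
    was issued to this recipient; otherwise return None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:
        return None  # malformed token

    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # payload or signature was altered

    doc_id, issued_to, expires_at = payload.decode().split("|")
    current_time = int(time.time()) if now is None else now
    if issued_to != recipient or current_time > int(expires_at):
        return None  # wrong recipient or link has expired
    return doc_id
```

A producing party could issue a token per recipient with a short expiry, then check it on every request, so that a forwarded or stale link simply stops working. Signing does not hide the payload; it only prevents undetected changes to it.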

LA: Sometimes you need to admit that your clients' security measures are superior to the law firm's, and maintaining data within the context of their security [framework] is often the most efficient approach. You also need to make sure you meet the same security standards as your client, or identify potential cloud service providers or vendors that maintain the same if not better security measures. To be sure, not all business data requires this level of protection, nor should all data be treated equally, given the significant costs of maintaining higher security levels.

In my experience, there have been situations where it was just not cost-effective to create a security infrastructure to support the volume of data. In the alternative, we created a "clean room" to allow opposing parties to come and review the data under very controlled circumstances that monitored and inventoried what was reviewed and what was requested to be copied. In the several circumstances where we used this approach, the amount of data that was sought to be used in the litigation was a small fraction of the overall data at issue. This approach also tends to expedite the process, as an official review of data in a clean room requires an organized and strategic approach to get through the data in a set amount of time.
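
The monitoring and inventory described above, like the access logging recommended for protective orders, can be sketched in a few lines of Python. The example below is an assumption-laden illustration rather than anyone's actual clean-room system: it keeps a log of who viewed or asked to copy which document, and chains each entry to the previous one with a SHA-256 hash so that later edits to the log can be detected. The class `AccessLog`, its methods and the action labels are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Set

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _entry_hash(prev_hash: str, reviewer: str, doc_id: str,
                action: str, timestamp: int) -> str:
    """Hash an entry together with the hash of the entry before it."""
    record = json.dumps([prev_hash, reviewer, doc_id, action, timestamp])
    return hashlib.sha256(record.encode()).hexdigest()


@dataclass
class AccessLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, reviewer: str, doc_id: str, action: str, timestamp: int) -> None:
        """Append an entry such as ("reviewer-a", "DOC-001", "viewed", 1000)."""
        prev_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({
            "reviewer": reviewer,
            "doc_id": doc_id,
            "action": action,
            "timestamp": timestamp,
            "hash": _entry_hash(prev_hash, reviewer, doc_id, action, timestamp),
        })

    def verify(self) -> bool:
        """Recompute the chain; return False if any entry no longer matches."""
        prev_hash = GENESIS
        for entry in self.entries:
            expected = _entry_hash(prev_hash, entry["reviewer"], entry["doc_id"],
                                   entry["action"], entry["timestamp"])
            if entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True

    def documents_touched(self, reviewer: str) -> Set[str]:
        """Inventory of the documents a given reviewer has accessed."""
        return {e["doc_id"] for e in self.entries if e["reviewer"] == reviewer}
```

A hash chain of this kind only shows that the log is internally consistent. To catch someone who rewrites the whole log, the most recent hash would also need to be recorded somewhere the editor cannot reach, for example with the other side or the court.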

How has hacking changed over the last five years?

LA: It's on the rise and approaching the norm. The fact of the matter is most of the public has been desensitized to cyberattacks because they have become a daily occurrence. The question is not whether you've been hacked; the question is, do you know when and to what extent you've been hacked? It's imperative that we acknowledge that there is no such thing as perfect security, and much like in discovery, the standard is reasonableness.

"HACKING [IN LITIGATION] IS ON THE RISE AND APPROACHING THE NORM."

To be confident in the practice of law, you must take minimum measures to secure your clients' confidences, which are often included in the business data implicated in litigation.
