The world is witnessing an influx of AI disruptors like ChatGPT and DALL-E. These tools can be fun to mess around with – ChatGPT writes some pretty solid knock-knock jokes, and DALL-E’s images can be as entertaining as they are nightmare-inducing. But they will also have very real impacts on the world at large, and eDiscovery in particular.
Machine learning (ML) has impacted a wide array of industries, from entertainment to surgical care, and its adoption has more than doubled in the last five years. The industries that stand to benefit the most are ones with large data banks, like financial services, healthcare, and – surprise, surprise – the legal sector.
A significant advantage of ML is that it lets organizations automate time-consuming tasks, like sifting through enormous amounts of data, so employees can focus on their real business instead. eDiscovery is well-suited to harness this power. As AI technologies continue to evolve, eDiscovery tools will start incorporating these advancements into their software. Some eDiscovery platforms, such as Logikcull, are ahead of the game in integrating AI technology into their systems.
What is Generative AI?
Generative AI is a form of artificial intelligence that’s capable of creating new outputs based on the data used to train it.
You’ve probably already seen instances of generative AI in ad campaigns. Squarespace's Super Bowl ad featured Adam Driver being cloned in a "journey of truth" to illustrate how the company will leverage generative AI to create new websites, while Hellmann's ad seemed to have shrunk people and placed them inside a refrigerator. Seems a little unsanitary, but you do you, Hellmann’s.
How will generative AI challenge the eDiscovery industry?
The use of AI in eDiscovery definitely has advantages: Its potential to automate manual work is seemingly infinite. But there are some concerns that shouldn’t be overlooked, like managing bot-generated content, deepfakes, and the growing deluge of data being produced.
What you talkin’ bot, Willis?
ChatGPT is an excellent example of the popularity of bot-generated content applications. OpenAI released a trial version of ChatGPT in November 2022 for public testing. Within five days, over a million people had signed up. By the end of January 2023, it had 100 million active users. This pretty much guarantees that this technology is here to stay.
The eDiscovery challenge around bot-generated content is precisely what you're thinking – is it real or machine-generated?
Will witnesses be able to say, "That wasn't me; that was ChatGPT"? This challenge will only become harder as technology gets better and better at imitating human speech. ChatGPT hasn’t passed the Turing Test just yet, but it may not be long before it does. eDiscovery tools will need to be adapted to find and tag bot-generated content.
Deepfaking it
Deepfakes are another challenge for eDiscovery. You've probably seen a deepfake video without even realizing it (remember "THIS IS NOT MORGAN FREEMAN"?). But the question is, would you recognize one during your eDiscovery review? More importantly – could your eDiscovery software recognize it?
Deepfakes are manipulated media files, easily made and shared. First appearing in 2017, they were initially used to create lookalike videos, but now they can generate highly convincing fictitious photos and audio recordings too.
Authenticating these clones is tricky, and this uncertainty can have harmful consequences. It poses a challenge not just for legal professionals who may encounter deepfakes during their discovery procedures, but also for eDiscovery tools that must adjust to detect them.
Exploding data volumes
Let's chat a minute about the elephant in the room – the global datasphere. The term "datasphere" may be new to some, as may be the word "zettabyte," but we'll likely hear both terms a lot soon. Simply put, the global datasphere is the total amount of digital data generated worldwide. This includes newly created, captured, and duplicated data, all combined into one entity.
Data volumes are exploding, and experts estimate data creation will exceed 180 zettabytes globally by 2025. In case you’re wondering, a zettabyte is one sextillion bytes, or 1,000,000,000,000,000,000,000 bytes. Try saying that without taking a pause to breathe.
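To put that number in perspective, here is the arithmetic as a few lines of Python. It is only a back-of-the-envelope sketch: the 180-zettabyte figure is the estimate quoted above, and the one-terabyte drive is an assumption picked purely for comparison.

```python
# Decimal (SI) units: 1 zettabyte = 10**21 bytes, 1 terabyte = 10**12 bytes.
ZETTABYTE = 10**21
TERABYTE = 10**12


def zettabytes_to_bytes(zb: int) -> int:
    """Convert a whole number of zettabytes to bytes."""
    return zb * ZETTABYTE


# The estimate quoted in the text.
estimated_bytes = zettabytes_to_bytes(180)

# Hypothetical comparison: how many 1 TB drives would that much data fill?
drives_needed = estimated_bytes // TERABYTE

print(f"{estimated_bytes:,} bytes")
print(f"{drives_needed:,} one-terabyte drives")
```

That works out to 180 billion one-terabyte drives, which is a lot of shelf space.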
More data means more time to sort and sift it.
According to a Splunk survey, a whopping 47% of respondents said their organization would fall behind with the rapid growth in data volumes. To handle the explosion of data, it's crucial to have access to modern eDiscovery tools equipped with features such as automated data sorting, filters, and customizable tags.
Storage wars
Data storage is another critical, and ridiculously expensive, issue. With data volumes skyrocketing, on-premises storage solutions can't manage them efficiently or securely. Cloud storage is the more viable solution, but some hosting providers charge based on the amount of stored data. So the more data you have, the higher your costs. Which is just great when you know you’re accumulating unprecedented amounts of data.
If we may be so bold as to offer a solution, Logikcull provides cloud-based storage for all your eDiscovery documents but only charges you for the documents you process. You can upload as much data as you'd like into a secure environment, and return later to process it.
How will generative AI benefit the eDiscovery industry?
Document review costs comprise the bulk of eDiscovery costs, accounting for an average of 70% of total discovery spend. Generative AI can help slash eDiscovery costs by automating tasks that normally require human review. This, in turn, mitigates the risk of errors and inconsistencies in the review process.
AI eDiscovery features, like auto-tagging documents that contain PII, can instantly reduce the number of documents requiring review. Further culling by date ranges, custodians, and keywords can make your review set much smaller, with document production sizes dropping by up to 90% before being sent to outside counsel.
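For the curious, the culling steps described above might look something like this in Python. This is a minimal sketch, not how Logikcull or any other platform actually works: the `Document` class, the `auto_tag_pii` and `cull` functions, and the sample documents are all hypothetical, and the two regex patterns are a crude stand-in for real PII detection.

```python
import re
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    """Hypothetical document record, invented for this illustration."""
    doc_id: str
    custodian: str
    sent: date
    text: str
    tags: set = field(default_factory=set)


# Deliberately simple patterns; real PII detection needs far more than this.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")


def auto_tag_pii(doc: Document) -> None:
    """Add a 'PII' tag when the text matches one of the patterns above."""
    if SSN_PATTERN.search(doc.text) or EMAIL_PATTERN.search(doc.text):
        doc.tags.add("PII")


def cull(docs, start, end, custodians, keywords):
    """Keep documents in the date range, from the listed custodians,
    that mention at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        d for d in docs
        if start <= d.sent <= end
        and d.custodian in custodians
        and any(k in d.text.lower() for k in lowered)
    ]


# Made-up sample data.
docs = [
    Document("D1", "mary", date(2022, 3, 1), "Merger terms attached. My SSN is 123-45-6789."),
    Document("D2", "mary", date(2019, 6, 15), "Old merger notes."),
    Document("D3", "bob", date(2022, 4, 2), "Lunch on Friday?"),
    Document("D4", "bob", date(2022, 5, 9), "Revised merger schedule."),
]

for d in docs:
    auto_tag_pii(d)

review_set = cull(docs, date(2022, 1, 1), date(2022, 12, 31), {"mary", "bob"}, ["merger"])
print([d.doc_id for d in review_set])
```

Each filter is cheap on its own, but stacking them is what shrinks the review set: here four documents become two before anyone has to read a word.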
The latest use of eDiscovery AI leverages it for improved document analysis, enabling the identification of patterns and relevant keywords. This makes it easier to search and identify relevant documents.
ChatGPT is also revolutionizing the way people enter prompts. As users get used to being able to enter conversational prompts and searches, discovery tools will have to adapt to these natural language search queries and prompt-based searches. For example, a user might type in, "Find all emails from Mary pertaining to topic Y from company X and tag them all with tag Z.” In other words, tools need to let people search the way they talk.
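To make the idea concrete, here is a toy sketch of turning the example prompt above into a structured search. It assumes a single fixed phrasing and uses a plain regular expression; the `parse_query` function and its field names are hypothetical, and a real tool would likely rely on a language model rather than a hand-written pattern.

```python
import re
from typing import Optional

# Matches only one phrasing: "Find all <type> from <sender> pertaining to
# <topic> from <company> and tag them all with <tag>."
QUERY_PATTERN = re.compile(
    r"find all (?P<doc_type>\w+) from (?P<sender>.+?) "
    r"pertaining to (?P<topic>.+?) from (?P<company>.+?) "
    r"and tag them(?: all)? with (?P<tag>.+?)\.?$",
    re.IGNORECASE,
)


def parse_query(prompt: str) -> Optional[dict]:
    """Return the structured fields, or None if the phrasing is not recognized."""
    match = QUERY_PATTERN.match(prompt.strip())
    if match is None:
        return None
    return {name: value.strip() for name, value in match.groupdict().items()}


query = parse_query(
    "Find all emails from Mary pertaining to topic Y from company X "
    "and tag them all with tag Z."
)
print(query)
```

The brittleness is the point: rephrase the request even slightly and `parse_query` returns `None`, which is exactly the gap that natural language search is meant to close.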
While it's important to understand its challenges, the applications of generative AI in eDiscovery can turn the industry upside down, opening the door for a new discovery landscape no one could have ever imagined.
-
If you’d like to learn how Logikcull harnesses AI to handle your eDiscovery and data preservation needs, book a quick demo to learn more today.