How can you depend on machines to do the heavy lifting when you also need a lightweight approach?

The challenge

The Court Administration of Latvia processes over 30,000 case documents each year. Legal documents regarding individual court disputes are anonymized and then published for reference by the public. This was a time-consuming manual process. Court secretaries would draft documents and upload the files. Then someone else had to review each one to anonymize key information before publication.

Data showed it took about eight seconds to anonymize a single sentence, so the average document took between 15 and 20 minutes to process manually. This equated to approximately 1,000 work days per year. The Court Administration of Latvia wanted not only to make this anonymization process faster and free up people’s time, but also to maintain the reliability of the information given to the public.
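As a rough sanity check on these figures, the arithmetic can be sketched in a few lines of Python. The eight-hour working day is an assumption made here for illustration; the length of a work day is not stated above.

```python
# Back-of-the-envelope check of the manual workload.
DOCS_PER_YEAR = 30_000        # "over 30,000 case documents each year"
MINUTES_PER_DOC = (15, 20)    # reported range per document
HOURS_PER_WORK_DAY = 8        # ASSUMPTION: not stated in the case study


def work_days(docs: int, minutes_per_doc: float, hours_per_day: float) -> float:
    """Convert a per-document effort into work days per year."""
    return docs * minutes_per_doc / 60 / hours_per_day


low, high = (
    work_days(DOCS_PER_YEAR, minutes, HOURS_PER_WORK_DAY)
    for minutes in MINUTES_PER_DOC
)
print(f"{low:.1f} to {high:.1f} work days per year")
```

Under that assumption the range works out to roughly 940 to 1,250 work days, which is in line with the approximate figure of 1,000 work days per year.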

However, the anonymization of sensitive data is more complex than simply replacing text. In a typical court dispute, there are multiple elements involved. Each of these, such as names, license plate numbers, addresses and account numbers, requires a consistent and contextual identifier. This means someone identified as Person A or an account labelled Account 1 must remain Person A or Account 1 every time they are referenced in the published court document. The Court Administration of Latvia wanted a lighter-touch, streamlined approach that would improve productivity.
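The consistency requirement can be illustrated with a minimal Python sketch. It is not the application's actual code: the `LabelRegistry` class, the `anonymize` function, the names and the account number are all invented for illustration, and the entities are assumed to have been identified already.

```python
from string import ascii_uppercase


class LabelRegistry:
    """Hands out one stable placeholder per distinct entity in a document."""

    def __init__(self):
        self._labels = {}   # (kind, value) -> placeholder label
        self._counts = {}   # kind -> number of labels issued so far

    def label_for(self, kind, value):
        key = (kind, value)
        if key not in self._labels:
            index = self._counts.get(kind, 0)
            self._counts[kind] = index + 1
            # People get letters (Person A, Person B); other kinds get numbers.
            if kind == "Person" and index < len(ascii_uppercase):
                suffix = ascii_uppercase[index]
            else:
                suffix = str(index + 1)
            self._labels[key] = f"{kind} {suffix}"
        return self._labels[key]


def anonymize(text, entities):
    """Replace every occurrence of each (kind, value) entity with its label."""
    registry = LabelRegistry()
    # Assign labels in the order the entities are listed, so numbering is predictable.
    labelled = [(value, registry.label_for(kind, value)) for kind, value in entities]
    # Replace longer values first so a short value cannot clobber part of a longer one.
    for value, label in sorted(labelled, key=lambda pair: len(pair[0]), reverse=True):
        text = text.replace(value, label)
    return text


# Invented example data, not taken from any real court document.
document = (
    "Janis Berzins transferred funds from LV00TEST0000000000001 to Anna Kalnina. "
    "Anna Kalnina later sued Janis Berzins."
)
entities = [
    ("Person", "Janis Berzins"),
    ("Person", "Anna Kalnina"),
    ("Account", "LV00TEST0000000000001"),
]
print(anonymize(document, entities))
```

Because the registry is keyed on the entity itself, every later mention of the same person or account resolves to the same placeholder, which is the property described above. A real system would also need to handle inflected or abbreviated forms of a name, which this sketch does not attempt.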

One potential solution would have been to use standard machine learning to label the individual identifiers, establish correlations between information within the documents and apply different rules to sets of information. Yet machine learning on its own is typically used to analyze huge amounts of general data. In this case, the Court Administration of Latvia needed to apply anonymization principles based on legal rules to small amounts of high-quality data that would change from case to case.

That’s why we proposed incorporating machine teaching as well.

Approach

Machine teaching is like training a new colleague. You show the algorithm what to look for and then it can go off and replace real names and numbers with anonymized labels. After doing this several times, the system starts to recognize the information that needs to be anonymized, label it and then find other instances that correlate with that label in new text. However, it still needs support to spot new types of information because the data is different each time.
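The teach-then-apply loop described above can be sketched with a deliberately simplified stand-in. The hypothetical `TaughtMatcher` below only recalls exact strings it has been shown, whereas the real system uses trained models; the names in the example are invented.

```python
import re


class TaughtMatcher:
    """Remembers human-labelled examples and finds them again in new text.

    A toy stand-in for a trained model: it only recalls exact strings.
    """

    def __init__(self):
        self.examples = {}  # example text -> entity kind

    def teach(self, kind, example):
        """Record one label provided by a human reviewer."""
        self.examples[example] = kind

    def suggest(self, text):
        """Return (start, end, kind) for every taught example found in text."""
        hits = []
        for example, kind in self.examples.items():
            for match in re.finditer(re.escape(example), text):
                hits.append((match.start(), match.end(), kind))
        return sorted(hits)


matcher = TaughtMatcher()
matcher.teach("Person", "Anna Kalnina")

new_text = "Anna Kalnina met Peteris Ozols."
first_pass = matcher.suggest(new_text)    # finds only the name already taught

# A reviewer spots the name the system missed and teaches it.
matcher.teach("Person", "Peteris Ozols")
second_pass = matcher.suggest(new_text)   # now finds both names
```

The second pass picks up the name that the first pass missed, which mirrors the point that the system still needs human support whenever new information appears.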

Our engagement involved several crucial steps:

  • Technology assessment – Understanding existing systems to determine the most appropriate solution and integration.
  • User interface development – Creating an intuitive way for court admin staff to label data for machine teaching.
  • Machine teaching – Starting with Emergn subject matter experts training the system to recognize and replace specific entities according to court rules.
  • Model development – Introducing machine learning models for named entity recognition and labeling.
  • System integration – Designing the host application to work alongside the existing platform for document publication.
  • Change facilitation – Bringing in change agents who could train internal users on how to adopt machine teaching techniques and start using them by themselves.

At a technical level, we used Python to build the machine teaching model and JavaScript for the application. These are common languages with vast libraries of existing code. This sped up the development process and meant we could launch quickly. And by using containers to host the operating system, software and databases, we were able to roll out a lightweight application that could work with any existing IT infrastructure and be transferred to the cloud easily.

To ensure the machine learning and machine teaching parts of the process continued to produce high-quality anonymization, our team spent time ‘training the trainers’. That way, the Court Administration of Latvia can own their transformation by knowing how to prompt the technology to produce the right anonymized output regardless of what data is entered.

Our impact

The Court Administration of Latvia needed a faster and future-proofed approach to anonymizing court documents for publication. By optimizing the entire anonymization process, we improved productivity among employees. We also created an intuitive system that the courts could continue to use long after our engagement was over.

Machine teaching provided the highly efficient approach the Court Administration of Latvia required. Through containerization, we created a lightweight and scalable application that could be used on any IT infrastructure. And while much of the heavy lifting is now performed by the customized machine learning and machine teaching application, the new process still involves human intervention so there is no loss of control or quality assurance.

This engagement involved collaboration between our team and the court’s administration, with individuals from both organizations focused on labeling and teaching the system to recognize entities.

The final application was custom-made for the Latvian legal system’s rules. The machine learning element recognizes and replaces specific information while preserving contextual meaning. The machine teaching element helps the courts label and anonymize new information.

As part of our commitment to leaving lasting capabilities, we also educated the court administration about language models and the methodology used for machine teaching. The court administration’s role changed from standard users to trainers with a deeper understanding of the technology and how to control the system. Change agents, including those from our team, played a pivotal role by facilitating this knowledge transfer and helping the court administration build the necessary skills to apply machine learning and machine teaching to this critical workflow.

20x faster

with the average time required per document down from 20 minutes to less than one minute

80-90%

of documents require no manual changes as the introduction of machine learning models has improved the accuracy of anonymized documents

1000 days

saved per year, allowing resources to be deployed more effectively and the courts to do more with less

 
