ML pipeline classifying 14 contract types with 99.1% accuracy for a legal tech company — replacing a manual review process.
A legal tech company was manually triaging incoming contracts — categorizing NDAs, MSAs, SOWs, employment agreements, etc. before routing them to the right review team. Two paralegals spent 3 hours per day doing nothing but reading the first page of a document and deciding which bucket it went in.
14 categories, 500+ documents per day, nearly zero ambiguity. A classic classification task.
A fine-tuned text classification pipeline using legal-bert-base-uncased (a BERT variant pre-trained on legal text, available on Hugging Face):
legal-bert-base-uncased from Hugging Face
FastAPI service wrapping the model
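The production code is not reproduced here, so the following is only a minimal sketch of how the two pieces could fit together. The Hub id `nlpaueb/legal-bert-base-uncased`, the placeholder label names, the `/classify` route, and the `create_app`, `load_classifier`, and `Document` names are all assumptions made for illustration. In practice `MODEL_ID` would point at the fine-tuned checkpoint rather than the base model, whose classification head is untrained.

```python
from functools import lru_cache

# Assumed Hugging Face Hub id for the base model. A real deployment would
# point this at the fine-tuned checkpoint directory instead.
MODEL_ID = "nlpaueb/legal-bert-base-uncased"

# Hypothetical label set: the write-up names only four of the 14 contract
# types, so the remaining ten are placeholders.
LABELS = ["NDA", "MSA", "SOW", "EMPLOYMENT_AGREEMENT"] + [
    f"TYPE_{i}" for i in range(5, 15)
]


@lru_cache(maxsize=1)
def load_classifier():
    """Load tokenizer and model once and cache them for later requests."""
    # Imported lazily so this module can be imported without loading torch.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_ID,
        num_labels=len(LABELS),
        id2label=dict(enumerate(LABELS)),
        label2id={name: i for i, name in enumerate(LABELS)},
    )
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def classify(first_page: str) -> dict:
    """Return the most likely contract type for the first page of a document."""
    import torch

    tokenizer, model = load_classifier()
    # BERT accepts at most 512 tokens, so longer pages are truncated.
    inputs = tokenizer(
        first_page, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    best = int(probs.argmax())
    return {"label": LABELS[best], "confidence": float(probs[best])}


def create_app():
    """Build the FastAPI app that wraps the classifier."""
    from fastapi import FastAPI
    from pydantic import BaseModel

    class Document(BaseModel):
        text: str  # first page of the incoming contract

    app = FastAPI()

    @app.post("/classify")
    def classify_endpoint(doc: Document):
        return classify(doc.text)

    return app
```

Assuming the file is saved as `service.py` (a hypothetical name), it could be served with `uvicorn service:create_app --factory`. Returning a confidence alongside the label is a design choice in this sketch, not something the original describes; it would let low-confidence documents fall back to manual triage.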
For classification, a discriminative model (BERT) fine-tuned on the specific task consistently outperforms a generative model (GPT) prompted for the same task, especially when you have labeled training data. GPT prompting got 94% accuracy on this task; fine-tuned BERT got 99.1%. The extra 5 percentage points are worth it when errors mean misrouted legal documents.
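To put the accuracy gap in operational terms, here is a rough back-of-the-envelope calculation. It assumes the reported accuracies carry over to production traffic, that every misclassification results in a misrouted document, and it uses 500 documents per day, which is the lower bound quoted above.

```python
DOCS_PER_DAY = 500  # lower bound from the "500+ documents per day" figure


def expected_misroutes(accuracy: float, docs_per_day: int = DOCS_PER_DAY) -> float:
    """Expected number of misclassified documents per day at a given accuracy."""
    return docs_per_day * (1.0 - accuracy)


gpt_errors = expected_misroutes(0.94)    # prompted GPT
bert_errors = expected_misroutes(0.991)  # fine-tuned BERT

print(f"Prompted GPT:    ~{gpt_errors:.1f} misrouted documents per day")
print(f"Fine-tuned BERT: ~{bert_errors:.1f} misrouted documents per day")
```

Under these assumptions the gap is about 30 misrouted documents per day versus about 4.5, which is roughly a sixfold reduction in errors.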
30 minutes, free, no deck. We'll figure out if I'm the right fit for your project.