Legal document classification

Law firm


Document classification by legal case type, built for a law firm's internal use. Under a heavy workload, manually separating case files by content into physical storage areas had become cumbersome. Many law firms have switched to online storage, which is convenient but still requires some manual sorting. To avoid having staff read every document in detail and bill many hours for it, we proposed an alternative and implemented an ML model that classifies each document based on its content.


  • NLP
  • Machine Learning


  • Python

Key Technical Challenges:

  • Text preprocessing that retains all meaningful information in the document data.
  • Far less tagged data than untagged data available for predicting labels.
  • Selecting a suitable set of words for topic modeling of the documents.
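
As an illustration of the first and third challenges, here is a minimal sketch of text preprocessing followed by TF-IDF term ranking to pick candidate words per document. It is not the project's actual pipeline, which is not described here. The stopword list, the sample documents, and the function names `preprocess` and `top_terms` are all assumptions made for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (assumption for this sketch)
STOPWORDS = {"the", "and", "for", "that", "this", "with", "from"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens of 3+ letters, drop stopwords."""
    tokens = re.findall(r"[a-z]{3,}", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def top_terms(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term
    df = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        }
        # Sort by descending score, breaking ties alphabetically
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


# Made-up sample documents, for illustration only
SAMPLE_DOCS = [
    "The tenant breached the lease agreement.",
    "The defendant was charged with theft.",
    "The lease was renewed by the tenant.",
]
CANDIDATE_WORDS = top_terms(SAMPLE_DOCS)
```

Terms that appear in every document score zero under this weighting, so the ranking favors words that distinguish one document from the rest. That is one simple way to narrow the vocabulary before topic modeling.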

Business + Technical Points:

  • Build a classification system from the tagged documents (5,488) and derive suitable topics for the untagged documents (25,000).
  • Apply proper security measures to the data in line with GDPR.
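
The classification part of the first point can be sketched with a small multinomial Naive Bayes classifier trained on tagged documents. The source does not say which model was used, so this technique is an assumption chosen for brevity. The class name `NaiveBayesClassifier`, the labels, and the training texts are likewise hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic tokens of 3+ letters."""
    return re.findall(r"[a-z]{3,}", text.lower())


class NaiveBayesClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per label
        self.word_counts = defaultdict(Counter)  # word counts per label
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up tagged examples, for illustration only
TRAIN_TEXTS = [
    "tenant lease rent landlord eviction",
    "lease agreement tenant property",
    "defendant charged theft court sentence",
    "criminal court defendant assault charge",
]
TRAIN_LABELS = ["property", "property", "criminal", "criminal"]

model = NaiveBayesClassifier().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

Calling `model.predict(...)` on a new document returns the label with the highest smoothed log-probability. The untagged documents would need a separate, unsupervised topic-modeling step, which this sketch does not cover.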


  • We were able to build a classification system for the documents.