Legalis

python
university
outcome prediction
random forest
bert
  • University project focused on court case prediction
  • Utilizes heavily processed German court case data
  • Prediction models include Random Forest and BERT

About the Project

Legalis is a university project for a machine learning and data science course. I wrote a paper at the University of Oslo about court case outcome prediction, and this is the continuation of that project.

I'm using bulk data from openlegaldata, which includes about 250,000 cases, of which approximately 38,000 are usable for my purposes. Based on this data, I trained a random forest classifier to predict the outcome of court cases.
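A minimal sketch of this kind of pipeline with scikit-learn is shown below. The toy texts, labels, and hyperparameters are invented for illustration; the real project trains on roughly 38,000 preprocessed cases from openlegaldata.

```python
# Sketch: TF-IDF features + Random Forest for court case outcome
# prediction. The tiny toy dataset is a placeholder, not project data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-ins for case texts and binary outcomes (1 = claim granted).
texts = [
    "Die Klage wird abgewiesen.",
    "Der Klage wird stattgegeben.",
    "Die Berufung wird zurueckgewiesen.",
    "Dem Antrag wird stattgegeben.",
] * 10
labels = [0, 1, 0, 1] * 10

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42
)

model = make_pipeline(
    TfidfVectorizer(),  # bag-of-words features from the case text
    RandomForestClassifier(n_estimators=100, random_state=42),
)
model.fit(X_train, y_train)
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {accuracy:.2f}")
```

On real, noisy case texts the accuracy is of course far lower than on a toy dataset like this.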

In the end, I achieved about 60% accuracy, which is modest but reasonable for this kind of problem.

Features

Legalis offers several key features aimed at improving the prediction of court case outcomes:

  • Data Processing: Utilizes extensive preprocessing of German court case data.
  • Prediction Models: Employs Random Forest and BERT models for outcome prediction.
  • Model Training: Trained on roughly 38,000 preprocessed cases to enhance prediction accuracy.
  • Outcome Extraction: Uses ChatGPT to extract binary labels from case texts.

Technology & Tools

Python · scikit-learn · Transformers · ChatGPT

I rely heavily on 🤗 Hugging Face for my machine learning projects, especially for hosting datasets, models, and apps/spaces.

For prediction, I have trained and optimized a Random Forest classifier, a Naive Bayes classifier, and a BERT model for text classification.
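The two classical models can be compared on the same features via cross-validation, roughly as sketched below (toy data and hyperparameters are placeholders, not the project's actual setup; the BERT model is fine-tuned separately with the Transformers library and is omitted here):

```python
# Sketch: comparing Random Forest vs. Naive Bayes on identical
# TF-IDF features using 5-fold cross-validation.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Placeholder texts/labels; the real input is preprocessed case text.
texts = [
    "Die Klage wird abgewiesen.",
    "Der Klage wird stattgegeben.",
] * 20
labels = [0, 1] * 20

models = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "naive_bayes": MultinomialNB(),
}
scores = {}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores[name] = cross_val_score(pipe, texts, labels, cv=5).mean()
    print(name, round(scores[name], 2))
```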

I used ChatGPT to extract the outcome as a binary label for 2,800 cases and trained the models on those labels. It works great for extracting specific information from longer texts (provided you're willing to pay, or your texts are short).
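The extraction step looks roughly like the sketch below. The prompt wording, label vocabulary, and model name are my assumptions for illustration, not the exact setup used for the 2,800 labeled cases.

```python
# Sketch: extracting a binary outcome label from a case text via
# ChatGPT. Prompt and model name are hypothetical placeholders.
def build_prompt(case_text: str) -> str:
    """Build an extraction prompt asking for a single one-word answer."""
    return (
        "Read the following German court decision and answer with "
        "exactly one word: 'granted' if the claim was at least partly "
        "successful, otherwise 'dismissed'.\n\n" + case_text
    )

def parse_label(answer: str) -> int:
    """Map the model's one-word answer to a binary label."""
    return 1 if "granted" in answer.lower() else 0

if __name__ == "__main__":
    # Hypothetical API call (requires the openai package and an API key):
    # from openai import OpenAI
    # client = OpenAI()
    # resp = client.chat.completions.create(
    #     model="gpt-3.5-turbo",
    #     messages=[{"role": "user", "content": build_prompt(case_text)}],
    # )
    # label = parse_label(resp.choices[0].message.content)
    pass
```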

Future Plans

I plan to update this project to use a German language model based on LLaMA 2 or Mistral for classification, since the BERT results were already promising.