Title:
Impact of LLaMA Fine-Tuning on Hallucinations for Named Entity Extraction in Legal Documents
Author(s):
VARGAS, FRANCISCO; GONZÁLEZ COENE, ALEJANDRO; ESCALANTE, GASTON; LOBÓN, EXEQUIEL; PULIDO, MANUEL
Journal:
SADIO Electronic Journal of Informatics and Operations Research
Publisher:
SOCIEDAD ARGENTINA DE INFORMÁTICA E INVESTIGACIÓN OPERATIVA
References:
Year: 2025, vol. 24
ISSN:
1514-6774
Abstract:
The extraction of information about traffic accidents from legal documents is crucial for quantifying insurance company costs. Extracting entities such as percentages of physical and/or psychological disability and the involved compensation amounts is a challenging process, even for experts, due to the subtle arguments and reasoning in the court decision. A two-step procedure is proposed: first, segmenting the document to identify the most relevant segments, and then extracting the entities. For text segmentation, two methodologies are compared: a classic method based on regular expressions and a second approach that divides the document into blocks of n tokens, which are then vectorized using multilingual models for semantic search (text-embedding-ada-002 / MiniLM-L12-v2). Subsequently, large language models (LLaMA-2 7b, 70b, LLaMA-3 8b, and GPT-4 Turbo) are applied with prompting to the selected segments for entity extraction. For the LLaMA models, fine-tuning is performed using LoRA. LLaMA-2 7b, even with zero temperature, shows a significant number of hallucinations in extractions, which are an important point of contention for named entity extraction. This work shows that these hallucinations are substantially reduced after fine-tuning the model. The performance of the methodology based on segment vectorization and subsequent use of LLMs significantly surpasses the classic method, which achieves an accuracy of 39.5%. Among open-source models, LLaMA-2 70B with fine-tuning achieves the highest accuracy, 79.4%, surpassing its base version (61.7%). Notably, the base LLaMA-3 8B model already performs comparably to the fine-tuned LLaMA-2 70B model, achieving 76.6%, highlighting the rapid progress in model development. Meanwhile, GPT-4 Turbo achieves the highest accuracy at 86.1%.
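The segmentation step of the abstract's pipeline (split the document into blocks of n tokens, vectorize them, and retrieve the blocks most similar to a query) can be sketched roughly as follows. This is a minimal illustration only: the bag-of-words vectorizer is a toy stand-in for the multilingual embedding models named in the paper (text-embedding-ada-002 / MiniLM-L12-v2), and the sample document, query string, and block size are invented for the example, not taken from the paper.

```python
# Sketch of the "blocks of n tokens + vectorization + semantic search"
# segmentation step. A simple bag-of-words vector over a shared vocabulary
# stands in for a real multilingual embedding model.
import math
from collections import Counter


def chunk_tokens(text, n):
    """Split text into consecutive blocks of n whitespace tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(0, len(tokens), n)]


def bow_vector(text, vocab):
    """Toy vectorizer: word counts over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_segments(document, query, n_tokens=8, k=1):
    """Return the k blocks of the document most similar to the query."""
    blocks = chunk_tokens(document, n_tokens)
    vocab = sorted(set((document + " " + query).lower().split()))
    q = bow_vector(query, vocab)
    ranked = sorted(blocks, key=lambda b: cosine(bow_vector(b, vocab), q),
                    reverse=True)
    return ranked[:k]


# Invented mini-document standing in for a court decision.
doc = ("the plaintiff suffered injuries in a traffic accident "
       "the court assessed a physical disability of 25 percent "
       "compensation was set at 1200000 pesos by the judge")
print(top_segments(doc, "physical disability percentage", n_tokens=8, k=1))
```

In the paper's actual setup, the retrieved segments would then be passed, via a prompt, to the LLMs for entity extraction.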