

Certificate of Completion

THIS ACKNOWLEDGES THAT Helmut Wahanik HAS COMPLETED THE SPRING 2026 DATA SCIENCE BOOT CAMP

DIRECTOR: Roman Holowinsky, PhD
DATE: MARCH 25, 2026


TEAM

LLM Hallucinations Detector

Helmut Wahanik, Guoqin Liu, Santanil Jana, AJ Vargas, Debanjan Sarkar


In this project, we develop methods for detecting hallucinations in Large Language Models (LLMs) to flag risky outputs prior to expensive downstream validation. We propose two complementary detection strategies evaluated on 2,500 questions across five benchmark datasets using Llama-3.2-3B. The first approach is a white-box method that extracts spectral features from attention-head Laplacians. This method demonstrates that the hallucination signal is low-dimensional and largely linearly separable. The second approach is a black-box method that computes semantic and geometric statistics from a cloud of sampled responses. We find that an ElasticNet logistic model trained on six baseline features achieves an AUROC of approximately 0.91.
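The two feature families described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the project's actual implementation: the function names, the choice of the symmetric normalized Laplacian, the number of eigenvalues kept, and mean pairwise cosine distance standing in for the semantic/geometric statistics on the sampled-response cloud.

```python
import numpy as np

def laplacian_spectrum(attn, k=8):
    """Smallest k eigenvalues of the symmetric normalized Laplacian of one
    attention head, treating the attention matrix as a token-affinity graph.
    (Illustrative white-box features; the project's exact spectral features
    may differ.)"""
    W = 0.5 * (attn + attn.T)                  # symmetrize directed attention
    d = W.sum(axis=1)                          # weighted degrees
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(W.shape[0]) - d_is[:, None] * W * d_is[None, :]
    return np.linalg.eigvalsh(L)[:k]           # eigvalsh returns ascending order

def response_dispersion(emb):
    """Mean pairwise cosine distance over embeddings of sampled responses;
    a highly scattered cloud suggests an unstable, riskier answer.
    (Illustrative black-box statistic.)"""
    X = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    S = X @ X.T                                # cosine similarities
    n = X.shape[0]
    off_diag = S[~np.eye(n, dtype=bool)]
    return float(1.0 - off_diag.mean())

# Toy inputs standing in for a real attention head and response embeddings.
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 16))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
spec = laplacian_spectrum(attn)                # white-box feature vector
disp = response_dispersion(rng.normal(size=(10, 32)))  # black-box scalar
```

In this sketch, the smallest eigenvalue is (numerically) zero because softmax attention yields a connected affinity graph; the informative signal would live in the remaining spectrum, with features like these fed to a linear classifier such as the ElasticNet logistic model mentioned above.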

Ultimately, we demonstrate that hallucinations leave measurable signatures in both internal transformer activations and the geometry of sampled outputs. Our approach serves as a cost-effective filter for organizations deploying LLMs at scale.

GitHub URL

©2017-2026 by The Erdős Institute.
