

Certificate of Completion

THIS ACKNOWLEDGES THAT Helmut Wahanik HAS COMPLETED THE SPRING 2026 DATA SCIENCE BOOT CAMP

DIRECTOR: Roman Holowinsky, PhD
DATE: MARCH 25, 2026


TEAM

LLM Hallucinations Detector

Helmut Wahanik, Guoqin Liu, Santanil Jana, AJ Vargas, Debanjan Sarkar


In this project, we develop methods for detecting hallucinations in Large Language Models (LLMs) to flag risky outputs prior to expensive downstream validation. We propose two complementary detection strategies evaluated on 2,500 questions across five benchmark datasets using Llama-3.2-3B. The first approach is a white-box method that extracts spectral features from attention-head Laplacians. This method demonstrates that the hallucination signal is low-dimensional and largely linearly separable. The second approach is a black-box method that computes semantic and geometric statistics from a cloud of sampled responses. We find that an ElasticNet logistic model trained on six baseline features achieves an AUROC of approximately 0.91.
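The two feature families described above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the project's actual implementation: the function names, the choice of the symmetric normalized Laplacian, the number of eigenvalues kept, and mean pairwise cosine distance standing in for the semantic/geometric statistics on the sampled-response cloud.

```python
import numpy as np

def laplacian_spectrum(attn, k=8):
    """Smallest k eigenvalues of the symmetric normalized Laplacian of one
    attention head, treating the attention matrix as a token-affinity graph.
    (Illustrative white-box features; the project's exact spectral features
    may differ.)"""
    W = 0.5 * (attn + attn.T)                  # symmetrize directed attention
    d = W.sum(axis=1)                          # weighted degrees
    d_is = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(W.shape[0]) - d_is[:, None] * W * d_is[None, :]
    return np.linalg.eigvalsh(L)[:k]           # eigvalsh returns ascending order

def response_dispersion(emb):
    """Mean pairwise cosine distance over embeddings of sampled responses;
    a highly scattered cloud suggests an unstable, riskier answer.
    (Illustrative black-box statistic.)"""
    X = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    S = X @ X.T                                # cosine similarities
    n = X.shape[0]
    off_diag = S[~np.eye(n, dtype=bool)]
    return float(1.0 - off_diag.mean())

# Toy inputs standing in for a real attention head and response embeddings.
rng = np.random.default_rng(0)
logits = rng.normal(size=(16, 16))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
spec = laplacian_spectrum(attn)                # white-box feature vector
disp = response_dispersion(rng.normal(size=(10, 32)))  # black-box scalar
```

In this sketch, the smallest eigenvalue is (numerically) zero because softmax attention yields a connected affinity graph; the informative signal would live in the remaining spectrum, with features like these fed to a linear classifier such as the ElasticNet logistic model mentioned above.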

Ultimately, we demonstrate that hallucinations leave measurable signatures in both internal transformer activations and the geometry of sampled outputs. Our approach serves as a cost-effective filter for organizations deploying LLMs at scale.

GitHub URL

©2017-2026 by The Erdős Institute.
