Length Normalization in LLM Evaluation: Solving Length Penalty Bias

13 min

May 15, 2026

Learn how length normalization solves length penalty bias in LLM evaluation. Discover how to use log-probabilities for fair benchmarking in the EleutherAI harness.

Top quote from Length Normalization in LLM Evaluation: Solving Length Penalty Bias

In raw log-probability sums, every additional token acts like a tax. Understanding how to neutralize this bias through length normalization is the difference between a fair evaluation and a broken one.

This audio lesson was created by a BeFreed community member

Input question

This lesson is part of the learning plan: 'AI Evaluation Pipeline Deep Dive'.

Lesson topic: Length Normalization in LLM Evaluation

Overview: Longer answers are often unfairly penalized in model scoring. Learn how normalized accuracy ensures fair comparisons by accounting for token counts.

Key insights to cover in order:
1. Raw log-probability sums inherently penalize longer answers because each additional token adds a negative value.
2. Normalized accuracy (acc_norm) divides the total log-probability by token count to ensure fair comparison across choices.
3. Multiple choice tasks score candidates by comparing the likelihood of each option as a continuation of the prompt.

Listener profile:
- Learning goal: Build evaluation pipeline
- Background knowledge: I have worked with performance metrics collection in AI harness.
- Guidance: Focus on pipeline architecture and metrics integration. Cover evaluation frameworks and performance measurement systems.

Tailor examples, pacing, and depth to this listener. Avoid analogies or references that assume knowledge outside this listener's profile.

Host voices

Lena

Learning style

Fun

Knowledge sources

https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/api/task.py

https://mljourney.com/how-to-evaluate-llms-with-lm-evaluation-harness/

https://huggingface.co/blog/Neo111x/integrating-benchmarks-into-lm-evaluation-harness

https://github.com/EleutherAI/lm-evaluation-harness/blob/main/docs/task_guide.md

https://slyracoon23.github.io/lm-evaluation-harness/new_task_guide/

https://slyracoon23.github.io/lm-evaluation-harness/task_guide/

Frequently asked questions

Length penalty is a structural bias that arises when language models are scored with raw log-probability sums. Every token probability lies between zero and one, so its logarithm is negative (or zero at best), and each additional token pushes the total further down. This acts like a tax on longer responses: a model can rank a wordier correct answer below a shorter distractor simply because the correct answer has more tokens, not because the model finds it less plausible.
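The effect described above can be seen with a few lines of arithmetic. This is a minimal sketch, not harness code: the per-token probabilities are made up for illustration, and `sum_logprob` is a hypothetical helper name.

```python
import math


def sum_logprob(token_probs):
    """Total log-probability of a continuation: the sum of per-token logs."""
    return sum(math.log(p) for p in token_probs)


# Hypothetical per-token probabilities (assumed values, not real model output).
short_distractor = [0.30]                       # 1 token, fairly unlikely
long_correct = [0.60, 0.60, 0.60, 0.60, 0.60]   # 5 tokens, each more likely

short_score = sum_logprob(short_distractor)
long_score = sum_logprob(long_correct)

# Every token of the long answer is more probable than the distractor's
# single token, yet the raw sum still ranks the distractor higher.
print(f"short distractor: {short_score:.3f}")
print(f"long correct:     {long_score:.3f}")
print("raw sum prefers the distractor:", short_score > long_score)
```

Each extra token adds another negative term, so the five-token answer loses even though the model assigns higher probability to every one of its tokens.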

Length normalization counteracts the bias against longer sequences by dividing each candidate's total log-probability by its length, typically its token count. Without this adjustment, a short answer like 'Paris' will usually have a higher total log-probability than a longer, more descriptive correct answer like 'The capital city of France.' Normalizing makes the comparison fairer and keeps leaderboard scores from misrepresenting the model's actual capabilities.
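A small sketch of the adjustment, assuming normalization by token count: each candidate's summed log-probability is divided by its number of tokens before the candidates are compared. The candidate names and probabilities are invented for illustration, and `raw_score` and `normalized_score` are hypothetical helpers rather than functions from any library.

```python
import math


def raw_score(token_probs):
    """Summed log-probability of all tokens in the candidate."""
    return sum(math.log(p) for p in token_probs)


def normalized_score(token_probs):
    """Summed log-probability divided by the number of tokens."""
    return raw_score(token_probs) / len(token_probs)


# Hypothetical per-token probabilities for two answer candidates.
candidates = {
    "short": [0.30],
    "long": [0.60, 0.60, 0.60, 0.60, 0.60],
}

best_raw = max(candidates, key=lambda name: raw_score(candidates[name]))
best_norm = max(candidates, key=lambda name: normalized_score(candidates[name]))

print("winner by raw sum:        ", best_raw)
print("winner by per-token mean: ", best_norm)
```

With these assumed numbers the raw sum picks the short candidate, while the per-token average picks the long one, because averaging removes the cost of simply having more tokens.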

The EleutherAI LM Evaluation Harness is a standard tool for benchmarking models against suites like MMLU, HellaSwag, and ARC. If you are integrating performance metrics into this harness, understanding length normalization is critical. Multiple-choice tasks score each option by its log-likelihood as a continuation of the prompt, and normalized accuracy (acc_norm) keeps that math from unfairly penalizing longer answer choices. That is the difference between a broken evaluation and a fair, accurate assessment of model capability.
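For a pipeline view, the sketch below shows how per-example `acc` and `acc_norm` values could be computed and then averaged into task-level metrics. It is a standalone illustration, not the harness's code: `score_example` and `aggregate` are hypothetical names, and the log-probability sums are invented. One assumption worth checking against the harness source: to my understanding, its `acc_norm` divides by the length of each choice string rather than by token count, so this sketch normalizes by character length.

```python
def score_example(choices, logprob_sums, gold_index):
    """Return per-example acc and acc_norm for one multiple-choice item."""
    indices = range(len(choices))
    # Raw accuracy: pick the choice with the highest summed log-probability.
    pred_raw = max(indices, key=lambda i: logprob_sums[i])
    # Normalized accuracy: divide each sum by the choice's character length
    # first (assumed normalizer; swap in token counts if that is what you need).
    pred_norm = max(indices, key=lambda i: logprob_sums[i] / len(choices[i]))
    return {
        "acc": float(pred_raw == gold_index),
        "acc_norm": float(pred_norm == gold_index),
    }


def aggregate(results):
    """Average each metric over all examples."""
    return {key: sum(r[key] for r in results) / len(results) for key in results[0]}


# Hypothetical examples: choices, summed log-probs per choice, gold index.
examples = [
    (["Paris", "The capital city of France"], [-1.2, -2.6], 1),
    (["Berlin", "Rome"], [-0.4, -3.0], 0),
]

per_example = [score_example(c, lp, gold) for c, lp, gold in examples]
metrics = aggregate(per_example)
print(metrics)
```

Reporting both metrics side by side is a cheap diagnostic: a large gap between `acc` and `acc_norm` suggests that answer length, rather than model preference, is driving the raw ranking.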

