Language Modeling Overview

This site outlines useful language modeling resources. Our current focus is to better understand Kneser-Ney smoothing. See the Papers section in the menu.

MIT LM Kit

There's a new LM toolkit from MIT, by Paul Hsu. It allows iterative parameter tuning that is not available in SRILM, and claims a smaller memory footprint and faster execution times. See the Papers section; the code is at http://code.google.com/p/mitlm/.

CT focus: KN understanding


Our current focus should be:
* clear understanding of the KN algorithm
* parameters which make the LM large or small (gtXmin)
* Russian-specific tuning we can effect
* Syntax-motivated tuning we can add
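
As a starting point for the first item, here is a minimal sketch of interpolated Kneser-Ney smoothing for bigrams only. It is not the SRILM or MITLM implementation: the function name train_kn_bigram, the single fixed discount of 0.75, and the toy sentence are all assumptions made for illustration. Real toolkits estimate the discounts from the data and handle higher orders and unseen words.

```python
from collections import Counter, defaultdict


def train_kn_bigram(tokens, discount=0.75):
    """Return prob(word, prev): an interpolated Kneser-Ney bigram estimate.

    Hypothetical helper for illustration; the fixed discount is an assumption.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_totals = Counter()        # c(prev, *): how often prev starts a bigram
    followers = defaultdict(set)      # distinct words seen after prev
    predecessors = defaultdict(set)   # distinct words seen before word
    for (prev, word), count in bigrams.items():
        context_totals[prev] += count
        followers[prev].add(word)
        predecessors[word].add(prev)
    n_bigram_types = len(bigrams)

    def prob(word, prev):
        # Continuation probability: in how many distinct contexts does
        # `word` appear, relative to the number of distinct bigram types.
        continuation = len(predecessors.get(word, ())) / n_bigram_types
        total = context_totals[prev]
        if total == 0:
            # Unseen context: fall back to the continuation distribution.
            return continuation
        # Subtract the discount from every seen bigram count ...
        discounted = max(bigrams[(prev, word)] - discount, 0.0) / total
        # ... and hand the freed mass to the continuation distribution.
        backoff_weight = discount * len(followers.get(prev, ())) / total
        return discounted + backoff_weight * continuation

    return prob


tokens = "the cat sat on the mat the cat ate".split()
prob = train_kn_bigram(tokens)
vocab = set(tokens)
# The estimates for one context should sum to 1 over the vocabulary.
print(sum(prob(w, "the") for w in vocab))
```

The point to notice is that the lower-order distribution counts distinct left contexts rather than raw frequency, which is what separates KN from plain absolute discounting.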