Snowball

Snowball is a small string processing language for creating stemming algorithms for use in Information Retrieval, plus a collection of stemming algorithms implemented using it.

The Snowball compiler translates a Snowball program into source code in another language. Currently Ada, C, C++, C#, Dart, Go, Java, JavaScript, Object Pascal, PHP, Python, Rust and Zig are supported.

The Snowball language and compiler were originally designed and built by Martin Porter. Martin retired from development in 2014 and Snowball is now maintained as a community project. Martin originally named the project as a tribute to SNOBOL, the excellent string handling language from the 1960s. The name Snowball now also serves as a metaphor for how the project grows by gathering contributions over time.

What is Stemming?

Stemming maps different forms of the same word to a common "stem" - for example, the English stemmer maps connection, connections, connective, connected, and connecting to connect. So a search for connected would also find documents which only contain the other forms.

The stem is often a word itself, but not always: text search systems, which are the intended field of use, do not require it. We also aim to conflate words with the same meaning, rather than all words with a common linguistic root (so awe and awful don't have the same stem). Over-stemming is more problematic than under-stemming, so we tend not to stem in cases that are hard to resolve. If you always want to reduce words to a root form, and/or need a root form which is itself a word, then Snowball's stemming algorithms likely aren't the right answer.
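
As a rough illustration of why conflating word forms helps search, here is a minimal Python sketch. It is not the Snowball English stemmer: toy_stem, TOY_SUFFIXES, build_index and search are hypothetical names invented for this example, and the suffix list is a deliberately crude assumption that happens to handle the "connect" family. It only shows the general idea of indexing and querying by stem rather than by surface form.

```python
from collections import defaultdict

# Hypothetical suffix list, for illustration only. The real Snowball English
# stemmer uses a far more careful set of rules than plain suffix stripping.
TOY_SUFFIXES = ("ions", "ion", "ive", "ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each stem to the ids of documents containing a word with that stem."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[toy_stem(word)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Look up the query by its stem rather than by its exact form."""
    return index.get(toy_stem(query), set())


docs = {
    "a": "the connection was lost",
    "b": "connecting two networks",
    "c": "an unrelated document",
}
index = build_index(docs)
hits = search(index, "connected")  # matches documents "a" and "b"
```

Neither document contains the literal word "connected", yet both are found because the query and the indexed words share a stem. A crude stripper like this one also over-stems freely (for example, "unrelated" becomes "unrelat"), and avoiding harmful conflations of that kind is what the real algorithms spend their effort on.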

Please address all Snowball-related mail to the snowball-discuss mailing list.

Any such mail sent directly to individual developers may be answered less speedily, and in any case they reserve the right to post their answers on snowball-discuss.

Major events

  • Jun 2026 - Early Modern English stemmer enhanced and added to the distribution.
  • Jun 2026 - Snowball 3.1.1 released.
  • May 2026 - Snowball 3.1.0 released.
  • May 2026 - Sesotho stemming algorithm contributed by Kamohelo Lebjane.
  • Apr 2026 - Czech stemming algorithm contributed by Jim O’Regan and Olly Betts.
  • Apr 2026 - Persian stemming algorithm contributed by Saeid Darvish.
  • Apr 2026 - Zig backend contributed by AJ Roetker.
  • Oct 2025 - Polish stemming algorithm contributed by Dmitry Shachnev.
  • Oct 2025 - Dart backend contributed by Ryan Heise.
  • Oct 2025 - PHP backend contributed by Tim Whitlock and Olly Betts.
  • May 2025 - Snowball 3.0.1 released.
  • May 2025 - Snowball 3.0.0 released.
  • Mar 2025 - Esperanto stemming algorithm contributed by David Corbett.
  • Sep 2023 - Estonian stemming algorithm contributed by Linda Freienthal.
  • Nov 2021 - Snowball 2.2.0 released.
  • Jan 2021 - Snowball 2.1.0 released.
  • Jan 2021 - Armenian stemmer from Astghik Mkrtchyan merged into the distribution.
  • Jan 2021 - Ada backend contributed by Stephane Carrez.
  • Nov 2020 - Yiddish stemming algorithm contributed by Assaf Urieli.
  • Oct 2019 - Serbian stemming algorithm contributed by Stefan Petkovic and Dragan Ivanovic.
  • Oct 2019 - Snowball 2.0.0 released.
  • Aug 2019 - Hindi stemming algorithm contributed by Olly Betts.
  • Aug 2019 - Basque and Catalan merged into the distribution.
  • Oct 2018 - Greek stemming algorithm contributed by Oleg Smirnov.
  • Jun 2018 - Object Pascal backend from Wout van Wezel merged.
  • May 2018 - Lithuanian stemming algorithm contributed by Dainius Jocas.
  • May 2018 - Indonesian stemming algorithm contributed by Olly Betts.
  • Apr 2018 - Nepali stemming algorithm contributed by Ingroj Shrestha, Oleg Bartunov and Shreeya Singh Dhakal.
  • Mar 2018 - C# backend contributed by Cesar Souza.
  • Mar 2018 - JavaScript backend merged.
  • Jun 2017 - Go backend contributed by Marty Schoch.
  • Jun 2017 - Irish stemming algorithm from Jim O’Regan merged into the distribution.
  • Mar 2017 - Rust backend contributed by Jakob Demler.
  • Jan 2016 - Arabic stemming algorithm contributed by Assem Chelli.
  • Oct 2015 - Tamil stemming algorithm contributed by Damodharan Rajalingam.
  • Sep 2015 - New home for snowball on snowballstem.org.
  • Sep 2014 - Martin Porter retires from snowball development.
  • May 2012 - Contributed stemmers for Irish and Czech.
  • Jul 2010 - Contributed stemmers for Armenian, Basque, Catalan.
  • Mar 2007 - Romanian stemmer.
  • Jan 2007 - Turkish stemmer. Contributed by Evren (Kapusuz) Cilden.
  • Sep 2006 - Hungarian stemmer. Contributed by Anna Tordai.
  • Jun 2006 - Supported and updated Python bindings.
  • May 2005 - UTF-8 Unicode support.
  • Sep 2002 - Finnish stemmer.
  • Jul 2002 - ISO Latin I as default. The use of MS DOS Latin I is now history, but the old versions of the Snowball stemmers are still accessible on the site.
  • May 2002 - Unicode support.
  • Feb 2002 - Java support. Richard has modified the snowball code generator to produce Java output as well as ANSI C output. This means that pure Java systems can now use the snowball stemmers.