Snowball is a small string processing language for creating
stemming algorithms for use in Information Retrieval, plus a collection of
stemming algorithms implemented using it.
It was originally designed and built by Martin Porter.
Martin retired from development in 2014 and Snowball is now
maintained as a community project. Martin originally chose the name Snowball as
a tribute to SNOBOL, the
excellent string handling language from the 1960s.  It now also serves as a
metaphor for how the project grows by gathering contributions over time.
The Snowball compiler translates a Snowball program into source code in another
language - currently Ada, ISO C, C#, Dart, Go, Java, JavaScript, Object Pascal,
PHP, Python and Rust are supported.
What is Stemming?
Stemming maps different forms of the same word to a common "stem" - for
example, the English stemmer maps connection, connections,
connective, connected, and connecting to connect.
So searching for connected would also find documents which only
contain the other forms.
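To make the effect on search concrete, here is a minimal sketch in Python. It does not use Snowball's English algorithm: `toy_stem`, `TOY_SUFFIXES`, `build_index` and `search` are hypothetical names invented for this illustration, and the suffix list is only just enough to handle the connect example above. A real system would call a stemmer generated by the Snowball compiler instead.

```python
from collections import defaultdict

# Hypothetical suffix list for illustration only (NOT Snowball's English
# rules). Longer suffixes come first so "ions" is tried before "ion" and "s".
TOY_SUFFIXES = ("ions", "ion", "ive", "ing", "ed", "s")


def toy_stem(word: str) -> str:
    """Strip at most one known suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each stem to the ids of the documents containing a form of it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[toy_stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Look up a single-word query by its stem rather than its surface form."""
    return index.get(toy_stem(query), set())


docs = {
    "a": "the connection was lost",
    "b": "connecting two networks",
    "c": "an unrelated document",
}
index = build_index(docs)
hits = search(index, "connected")  # matches "a" and "b" via the stem "connect"
```

Neither document contains the word connected itself; both are found because the query and the indexed words are reduced to the same stem before comparison.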
The stem is often a word itself, but not always: text search systems, which
are the intended field of use, do not require it.  We also aim to conflate
words with the same meaning, rather than all words with a common linguistic
root (so awe and awful don't have the same stem).  Over-stemming is more
problematic than under-stemming, so we tend not to stem in cases that are hard
to resolve.  If you want to always reduce words to a root form and/or get a
root form which is itself a word then Snowball's stemming algorithms likely
aren't the right answer.
Please address all Snowball-related mail to the snowball-discuss mailing list.
Any such mail sent directly to individual developers may be answered less
speedily, and in any case they reserve the right to post their answers on snowball-discuss.
Major events
  - Oct 2025 - Polish stemming algorithm contributed by Dmitry Shachnev.
  - Oct 2025 - Dart backend contributed by Ryan Heise.
  - Oct 2025 - PHP backend contributed by Tim Whitlock and Olly Betts.
  - May 2025 - Snowball 3.0.1 released.
  - May 2025 - Snowball 3.0.0 released.
  - Mar 2025 - Esperanto stemming algorithm contributed by David Corbett.
  - Sep 2023 - Estonian stemming algorithm contributed by Linda Freienthal.
  - Nov 2021 - Snowball 2.2.0 released.
  - Jan 2021 - Snowball 2.1.0 released.
  - Jan 2021 - Armenian stemmer from Astghik Mkrtchyan merged into the distribution.
  - Jan 2021 - Ada backend contributed by Stephane Carrez.
  - Nov 2020 - Yiddish stemming algorithm contributed by Assaf Urieli.
  - Oct 2019 - Serbian stemming algorithm contributed by Stefan Petkovic and Dragan Ivanovic.
  - Oct 2019 - Snowball 2.0.0 released.
  - Aug 2019 - Hindi stemming algorithm contributed by Olly Betts.
  - Aug 2019 - Basque and Catalan merged into the distribution.
  - Oct 2018 - Greek stemming algorithm contributed by Oleg Smirnov.
  - Jun 2018 - Object Pascal backend from Wout van Wezel merged.
  - May 2018 - Lithuanian stemming algorithm contributed by Dainius Jocas.
  - May 2018 - Indonesian stemming algorithm contributed by Olly Betts.
  - Apr 2018 - Nepali stemming algorithm contributed by Ingroj Shrestha, Oleg Bartunov and Shreeya Singh Dhakal.
  - Mar 2018 - C# backend contributed by Cesar Souza.
  - Mar 2018 - JavaScript backend merged.
  - Jun 2017 - Go backend contributed by Marty Schoch.
  - Mar 2017 - Rust backend contributed by Jakob Demler.
  - Jan 2016 - Arabic stemming algorithm contributed by Assem Chelli.
  - Oct 2015 - Tamil stemming algorithm contributed by Damodharan Rajalingam.
  - Sep 2015 - New home for Snowball on snowballstem.org.
  - Sep 2014 - Martin Porter retires from Snowball development.
  - May 2012 - Contributed stemmers for Irish and Czech.
  - Jul 2010 - Contributed stemmers for Armenian, Basque, Catalan.
  - Mar 2007 - Romanian stemmer.
  - Jan 2007 - Turkish stemmer. Contributed by Evren (Kapusuz) Cilden.
  - Sep 2006 - Hungarian stemmer. Contributed by Anna Tordai.
  - Jun 2006 - Supported and updated Python bindings.
  - May 2005 - UTF-8 Unicode support.
  - Sep 2002 - Finnish stemmer.
  - Jul 2002 - ISO Latin I as default.
    The use of MS DOS Latin I is now history, but the old versions of the
    Snowball stemmers are still accessible on the site.
  - May 2002 - Unicode support.
  - Feb 2002 - Java support.
    Richard has modified the Snowball code generator to produce Java output as
    well as ANSI C output.  This means that pure Java systems can now use the
    Snowball stemmers.