Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. This site describes Snowball, and presents several useful stemmers which have been implemented using it.

The Snowball compiler translates a Snowball script into another language; currently ISO C, C#, Go, Java, JavaScript, Object Pascal, Python and Rust are supported.

Since it effectively provides a ‘suffix STRIPPER GRAMmar’, I had toyed with the idea of calling it ‘strippergram’, but good sense has prevailed, and so it is ‘Snowball’, named as a tribute to SNOBOL, the excellent string handling language of Messrs Farber, Griswold, Poage and Polonsky from the 1960s.

Martin Porter

Please address all Snowball-related mail to the snowball-discuss mailing list.

Any such mail sent directly to individual developers may be answered less speedily, and in any case they reserve the right to post their answers on snowball-discuss.

Major events

  • Oct 2019 - Serbian stemming algorithm contributed by Stefan Petkovic and Dragan Ivanovic.
  • Oct 2019 - Snowball 2.0.0 released!
  • Aug 2019 - Hindi stemming algorithm contributed by Olly Betts.
  • Aug 2019 - Basque and Catalan merged into the distribution.
  • Oct 2018 - Greek stemming algorithm contributed by Oleg Smirnov.
  • Jun 2018 - Object Pascal backend from Wout van Wezel merged.
  • May 2018 - Lithuanian stemming algorithm contributed by Dainius Jocas.
  • May 2018 - Indonesian stemming algorithm contributed by Olly Betts.
  • Mar 2018 - C# backend contributed by Cesar Souza.
  • Mar 2018 - JavaScript backend merged.
  • Jun 2017 - Go backend contributed by Marty Schoch.
  • Mar 2017 - Rust backend contributed by Jakob Demler.
  • Jan 2016 - Arabic stemming algorithm contributed by Assem Chelli.
  • Oct 2015 - Tamil stemming algorithm contributed by Damodharan Rajalingam.
  • Sep 2015 - New home for snowball on
  • Sep 2014 - Martin Porter retires from snowball development.
  • May 2012 - Contributed stemmers for Irish and Czech.
  • Jul 2010 - Contributed stemmers for Armenian, Basque, Catalan.
  • Mar 2007 - Romanian stemmer.
  • Jan 2007 - Turkish stemmer. Contributed by Evren (Kapusuz) Cilden.
  • Sep 2006 - Hungarian stemmer. Contributed by Anna Tordai.
  • Jun 2006 - Supported and updated Python bindings.
  • May 2005 - UTF-8 Unicode support.
  • Sep 2002 - Finnish stemmer.
  • Jul 2002 - ISO Latin I as default. The use of MS DOS Latin I is now history, but the old versions of the Snowball stemmers are still accessible on the site.
  • May 2002 - Unicode support
  • Feb 2002 - Java support. Richard has modified the Snowball code generator to produce Java output as well as ANSI C output. This means that pure Java systems can now use the Snowball stemmers.