Snowball

Snowball is a small string processing language designed for creating stemming algorithms for use in Information Retrieval. This site describes Snowball and presents several useful stemmers that have been implemented using it.
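A stemmer conflates different forms of a word to a common stem, so that a query for one form also matches documents containing the others. The Python sketch below is a deliberately naive suffix stripper that illustrates the idea. It is not written in Snowball and is not one of the stemmers on this site. The suffix list, the fixed minimum stem length and the name `naive_stem` are illustrative assumptions.

```python
# Toy suffix stripper: an illustration of stemming, NOT a Snowball stemmer.
# The suffix list and minimum stem length below are illustrative assumptions.
SUFFIXES = ("ness", "ing", "ed", "ly", "s")
MIN_STEM = 3  # never leave fewer than this many characters


def naive_stem(word: str) -> str:
    """Remove the longest matching suffix, provided enough of the word remains."""
    word = word.lower()
    # Try longer suffixes first so that, e.g., "ness" wins over "s".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    # Three surface forms conflate to one stem.
    print([naive_stem(w) for w in ["connecting", "connected", "connects"]])
```

Run as a script, this prints `['connect', 'connect', 'connect']`. A fixed minimum length is the crudest possible guard against over-stemming; the stemmers presented on this site use much more careful conditions on the part of the word that remains.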

The Snowball compiler translates a Snowball script into another language. Currently ISO C, Java and Python are supported.

Since it effectively provides a ‘suffix STRIPPER GRAMmar’, I had toyed with the idea of calling it ‘strippergram’, but good sense has prevailed, and so it is ‘Snowball’, named as a tribute to SNOBOL, the excellent string handling language of Messrs Farber, Griswold, Poage and Polonsky from the 1960s.

Martin Porter

Please address all Snowball-related mail to the snowball-discuss mailing list.

Any such mail sent directly to individual developers may be answered less speedily, and in any case they reserve the right to post their answers on snowball-discuss.

Major events

  • Sep 2015 - New home for Snowball on snowballstem.org.
  • Sep 2014 - Martin Porter retires from Snowball development.
  • May 2012 - Contributed stemmers for Irish and Czech.
  • Jul 2010 - Contributed stemmers for Armenian, Basque and Catalan.
  • Mar 2007 - Romanian stemmer.
  • Jan 2007 - Turkish stemmer. Contributed by Evren (Kapusuz) Cilden.
  • Sep 2006 - Hungarian stemmer. Contributed by Anna Tordai.
  • Jun 2006 - Supported and updated Python bindings.
  • May 2005 - UTF-8 Unicode support.
  • Sep 2002 - Finnish stemmer.
  • Jul 2002 - ISO Latin I as default. The use of MS-DOS Latin I is now history, but the old versions of the Snowball stemmers are still accessible on the site.
  • May 2002 - Unicode support.
  • Feb 2002 - Java support. Richard has modified the Snowball code generator to produce Java output as well as ANSI C output, so pure Java systems can now use the Snowball stemmers.