Note: The original zip file available here contained compiled .exe files (and "Readme.doc" inside the zip mentions these). We had a report that a virus scanner claimed these contained a virus - it seems likely this is a false positive, but anyone interested would be best advised to compile the code for themselves anyway so we've stripped out all .exe files from the zip.
Here is the original correspondence,
From: Wout van Wezel <email@example.com> To: firstname.lastname@example.org Date: Mon Jan 24 21:44:23 2005 Subject: Snowball extension Dear Mr. Porter, Somebody extended Snowball for me to create Pascal stemmers. I wanted the stemming algorithms in Pascal so they can be compiled in my information retrieval system (http://www.collectionconnection.nl) which is created in Delphi. I would not mind sharing the extensions, especially given the nature of the Snowball project. If you are interested, please let me know, and I will send you the sources. Everything seems to be working fine, but unfortunately I won't be able to maintain the Pascal-extension code when the core Snowball program changes since my knowledge of C is approximately zero. Here are the changes the developer made: "In order to support Delphi files generation I've added file generator_delphi.c (in compiler directory). Also modified: 1) header.h: added output_delphi and make_delphi fields to struct options. Also added forward declarations to Delphi's generator functions; 2) driver.c: modified in order to support new command line option "-d" and call Delphi generator if its specified; Modified GNUmakefile at the root of the snowball tree in order to add generator_delphi.c into the compilation process. Folder Delphi added. It contains 3 files: 1) SnowballProgram.pas - base class for all generated stemmers; 2) Test.bpr - template for the sample projects; 3) Generate.pl - Perl script that generate all sample stemmers. File algorithms/finnish/stem.sbl modified. There was two grouping: v and V. I rename V to V2 because Delphi Language is case-insensitive." I apologize for emailing you directly, but offering the source code on the mailing list could result in parallel versions instead of a single CVS version and I didn't think that would be good either. Best regards, Wout van Wezel
From: email@example.com (Martin Porter) To: Wout van Wezel <firstname.lastname@example.org> Cc: email@example.com Date: Mon Jan 24 21:54:15 2005 Subject: Re: Snowball extension Wout, Thank you for this email. I am very busy with a number of other things at the moment, but hope to send you a sensible reply in a few days. Meanwhile I'm copying your email to Richard Boulton, who is equally involved with the Snowball work. Martin
From: firstname.lastname@example.org (Martin Porter) To: Wout van Wezel <email@example.com> Cc: firstname.lastname@example.org Date: Tue Jan 25 08:49:07 2005 Subject: Re: Snowball extension Wout, I think it would be a shame to lose the work you have done, but at the moment I'm not sure about the best way to make it publicly available from the Snowball site. I will talk the matter over with Richard Boulton. Could I suggest that, despite your misgivings, it should be announced on Snowball discuss? We then get a record on the site that Pascal versions of the stemmers do exist. Maintenance is of course the main issue here. On http://tartarus.org/~martin/PorterStemmer/ I have about 14 versions of the Porter stemmer in various languages, only four of which I wrote myself. Inevitably I get queries about versions of the stemmer written in programming languages I am not familiar with, and they are difficult to answer, especially when contact with the author has been lost. We would need to assess the code. For example, >File algorithms/finnish/stem.sbl modified. There was two grouping: v and V. I >rename V to V2 because Delphi Language is case-insensitive." This is not a good solution, since it constrains the writing of Snowball scripts. The name translation should reflect case. There are various ways of doing this, for example, stemmer -> stemmer_lllllll Stemmer -> stemmer_ullllll STEMMER -> stemmer_uuuuuuu where the pattern of u's an l's shows the upper/lower case usage of the original name. Martin
From: Wout van Wezel <email@example.com> To: Martin Porter <firstname.lastname@example.org> Cc: email@example.com Date: Tue Jan 25 10:44:36 2005 Subject: Re: Snowball extension Dear Martin/Richard, I've attached the files I got from the developer. The changed sources are in the Snowball tree. The developer has explained in 'readme.doc' what he did. Be aware that I don't have an understanding of the Snowball or C language myself. If you think it would be useful for a more general audience, I can ask the developer to work on the 'case' problem. ps, I don't mind a message on the discussion list. Also, I would be happy to send the Pascal stemmers or the adapted Snowball program to people from the list that think they could use it. Best regards, Wout Attachment: stemming.zip
From: firstname.lastname@example.org (Martin Porter) To: Wout van Wezel <email@example.com> Cc: firstname.lastname@example.org Date: Wed Apr 20 21:37:42 2005 Subject: Pascal codegenerator for Snowball Wout, I have only recently looked through the large tgz file you sent me in January. Here are some initial thoughts, The work of your student was done to high standard, and the delivered system, with software extensions and documentation, is very nice. The only real slip is that the upper/lower case distinction of Snowball names was not preserved in the generated Pascal code. But I have altered the Finnish stemmer so that V and v are called V2 and V1 now, so you won't have to create a separate version. I have not yet talked to Richard Boulton about this, but I think we should keep the tgz file on the Snowball site with a note about its purpose, but I don't think it is worth incorporating the Pascal codegenerator into the main Snowball system, since it is unlikely there would be a large demand for Pascal versions of the stemmers. There is of course a danger that as Snowball extends, use of the Pascal codegenerator will become increasingly difficult, but I am beginning to think that Snowball is now fairly fixed, so that should not be a problem. Martin