In swift succession, we received in 2006 two stemmers for Romanian written in Snowball. Here is the original correspondence,
From: Erwin Glockner <firstname.lastname@example.org> To: snowball-discuss Date: Wed, 07 Jun 2006 00:06:30 +0200 Subject: [Snowball-discuss] romanian stemmer Hello everyone, my name is Erwín Glockner, I'm a student of computational linguistics in Heidelberg, Germany. Together with my fellow students Doina Gliga and Marina Stegarescu we started to write a romanian stemmer in Snowball. We planned to finish the stemmer until end of this month. We would be happy if the stemmer would be accepted as part of the Snowball-distribution. There is still some work to do, e.g. evaluating the stemmer, making a stopwords-list, unicode support, etc. After finishing this we will send you our stemmer with the corresponding files, but I couldn't find any email address to whom the stemmer should be sent to. Could please someone tell me the address(es)? With kind regards, E. Glockner, D. Gliga, M. Stegarescu.
From: Erwin Glockner <email@example.com> To: firstname.lastname@example.org, email@example.com Date: Tue Jul 18 19:43:39 2006 Subject: romanian stemmer Dear Mr. Porter, dear Mr. Boulton, we finally finished the Romanian stemmer. Unfortunately evaluation took more time than expected. However, it was an interesting experience creating the stemmer, and we are happy to send you the result of our work. The attachment-file is a Tarball-zipped file with (hopefully) all files needed. The files and the stemmer as well are encoded in UTF-8. Please inform us if something is missing. We would be happy if the Romanian stemmer would be accepted and integrated into the official Snowball distribution. We agree of course to license the stemmer under the same terms as the existing snowball software. We're looking forward to hear from you soon. With kind regards, Marina, Doina and Erwin. Attachment: [romanian1.tgz]
From: Irina Tirdea <firstname.lastname@example.org> To: email@example.com, firstname.lastname@example.org Date: Mon Jul 31 10:19:51 2006 Subject: Romanian stemmer Hello, My name is Irina Tirdea and I have developed a Romanian stemmer in Snowball as part of my bachelor thesis, in Bucharest, Romania. I am sending you the code attached (with vocabulary and stop word list files) and I hope you will accept and integrate it as a part of the Snowball project. I am ready to release the stemmer under the BSD license, just as the Snowball software. The files have been written in UTF-8 encoding (on a Linux system). Looking forward to hear from you. Kind regards, Irina Tirdea Attachment: [romanian2.tgz]
From: email@example.com (Martin Porter) To: snowball-discuss Cc: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org Date: Mon Jul 31 10:43:05 BST 2006 Subject: Tardy response to submissions to Snowball I am sending this general email as a kind of apology, for having done nothing so far on the following generously sent Snowball submissions: 7 June, from E. Glockner: a Romanian stemmer 8 June, from A. Tordai: a Hungarian stemmer and this morning another Romanian stemmer arrived, 31 July, from I. Tirdea, a Romanian stemmer After the first submission I promised to look at it "next week", so Mr Glockner has probably been wondering what has happened. [. . .] I will make a point of looking at these submissions this week, More soon, Martin
From: email@example.com (Martin Porter) To: snowball-discuss Cc: firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org Date: Wed Sep 06 12:39:16 BST 2006 Subject: Romanian stemmer To the originators of the Romanian stemmers, I have now found time to do some preliminary work on the Romanian stemmer. I should explain that part of the complication has been the receipt, no more than ten days apart, of two Romanian stemmers in Snowball, the first (romanian1) from [Glockner, Gliga, and Stegarescu] in Heidelberg, the second (romanian2) from Tirdea in Bucharest. [. . . .] I have put together a vocabulary by combining the vocabularies provided with romanian1 and romanian2. This appears in column 1. Column 2 is the stemmed form produced by romanian1, and column 3 the stemmed form produced by romanian2. If the entry in column 3 is blank, both stemmers are producing the same result. You might care to compare the two approaches. My own feeling is that romanian1 does a more thorough job of ending removal, but unlike romanian2 has a habit of discarding too much from short words. aberant->ab, abatere->ab, aburi->ab are examples of this. In romanian1 the R2 test is rarely used (it seems to me that 'R1 or R2' is equivalent to 'R1', since p2 is never to the left of p1.) I might have a go at making some modifications here. Needless to say, I am not familiar with Romanian, but the similarity to the other Romance languages, especially Italian, enables one to grasp the essential features of the morphology. What we would like to do is to have a single stemmer for release from the snowball site, if that is possible, and giving all necessary credits, along the lines of the recent addition, http://snowballstem.org/algorithms/hungarian/stemmer.html Hope to hear from you, Martin Porter
Finally we decided to produce our own Romanian stemmer as described on the Romanian stemmer page. The submitted stemmers both contain stop word lists, available inside the tarballs.