Marking vowels as consonants

Some of the algorithms begin with a step that puts letters normally classed as vowels into upper case, to indicate that they are to be treated as consonants. (The assumption is that words are presented to the stemmers in lower case.) Upper case therefore acts as a flag indicating a consonant.

For example, the English stemmer begins with the step

Set initial y, or y after a vowel, to Y,

giving rise to the following changes,

youth	→	Youth
boy	→	boY
boyish	→	boYish
fly	→	fly
flying	→	flying
syzygy	→	syzygy

This process works from left to right: if a word contains Vyy, where V is a vowel, the first y is put into upper case, but the second y is left alone, since it is preceded by upper case Y, which is a consonant. A sequence Vyyyyy... would be changed to VYyYyY....
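The left-to-right behaviour can be illustrated with a minimal Python sketch. It is not the stemmer's own code: the function name mark_y and the constant ENGLISH_VOWELS are hypothetical, and the vowel set (a, e, i, o, u, y) is an assumption about the English stemmer's definition. The key point is that each test looks at the already-rewritten previous letter, so an upper case Y no longer counts as a vowel.

```python
# Assumed vowel set for the English stemmer; upper case Y is deliberately
# absent, so a marked Y behaves as a consonant in later tests.
ENGLISH_VOWELS = set("aeiouy")


def mark_y(word: str) -> str:
    """Set initial y, or y after a vowel, to Y, scanning left to right."""
    out = []
    for i, ch in enumerate(word):
        # Compare against the rewritten previous letter (out[-1]),
        # not the original one, so earlier changes affect later ones.
        if ch == "y" and (i == 0 or out[-1] in ENGLISH_VOWELS):
            out.append("Y")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for w in ["youth", "boy", "boyish", "fly", "flying", "syzygy", "sayyid"]:
        print(w, "→", mark_y(w))
```

Because the decision for each y depends on the output built so far, alternate letters in a run of y's are marked, which is the VYyYyY... pattern described above.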

The combination yy never occurs in English, although it might appear in foreign words:

sayyid	→	saYyid

(A sayyid, my dictionary tells me, is a descendant of Mohammed's daughter Fatima.) But the left-to-right process is significant in other languages, for example French. In French the rule for marking vowels as consonants is,

Put into upper case u or i preceded and followed by a vowel, and y preceded or followed by a vowel. Put u after q into upper case.

which gives rise to,

ennuie	→	ennuIe
inquiétude	→	inqUiétude

In the first word, i is put into upper case since it has a vowel on both sides of it (the preceding u stays in lower case, since it follows the consonant n). In the second word, u after q is put into upper case, and, as with yy in English, the following i is left alone, since it is preceded by upper case U, which is a consonant.
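The French rule can be sketched in Python in the same way. This is an illustration under stated assumptions rather than the stemmer's actual implementation: the name mark_french is hypothetical, the vowel set FRENCH_VOWELS is assumed to include the accented vowels, and the order in which the three conditions are tried is a choice made for this sketch. "Preceded by" is tested against the rewritten output, and "followed by" against the next, not yet processed, letter.

```python
# Assumed French vowel set, including accented forms. Upper case U, I, Y
# are absent, so marked letters count as consonants afterwards.
FRENCH_VOWELS = set("aeiouyâàëéêèïîôûù")


def mark_french(word: str) -> str:
    """Mark u, i, y as consonants (upper case), scanning left to right."""
    out = []
    n = len(word)
    for i, ch in enumerate(word):
        # Previous letter is taken from the rewritten output.
        prev_vowel = i > 0 and out[-1] in FRENCH_VOWELS
        # Next letter has not been processed yet, so use the input.
        next_vowel = i + 1 < n and word[i + 1] in FRENCH_VOWELS
        if ch in ("u", "i") and prev_vowel and next_vowel:
            out.append(ch.upper())   # u or i between two vowels
        elif ch == "y" and (prev_vowel or next_vowel):
            out.append("Y")          # y next to a vowel on either side
        elif ch == "u" and i > 0 and out[-1] == "q":
            out.append("U")          # u after q
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for w in ["ennuie", "inquiétude"]:
        print(w, "→", mark_french(w))
```

In inquiétude the i would qualify if the original u were consulted, since it sits between u and é; it is left alone only because the test sees the already-marked U.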