Some of the algorithms begin with a step which puts letters which are normally classed as vowels into upper case to indicate that they are are to be treated as consonants (the assumption being that the words are presented to the stemmers in lower case). Upper case therefore acts as a flag indicating a consonant.
For example, the English stemmer begins with the step
youth | → | Youth | ||
boy | → | boY | ||
boyish | → | boYish | ||
fly | → | fly | ||
flying | → | flying | ||
syzygy | → | syzygy |
This process works from left to right, and if a word contains Vyy, where V is a vowel, the first y is put into upper case, but the second y is left alone, since it is preceded by upper case Y which is a consonant. A sequence Vyyyyy... would be changed to VYyYyY....
The combination yy never occurs in English, although it might appear in foreign words:
sayyid | → | saYyid |
(A sayyid, my dictionary tells me, is a descendant of Mohammed's daughter Fatima.) But the left-to-right process is significant in other languages, for example French. In French the rule for marking vowels as consonants is,
which gives rise to,
ennuie | → | ennuIe | ||
inquiétude | → | inqUiétude |
In the first word, i is put into upper case since it has a vowel on both sides of it. In the second word, u after q is put into upper case, and again the following i is left alone, since it is preceded by upper case U which is a consonant.