Most of the stemmers make use of at least one of the region definitions R1 and R2. They are defined as follows:
R1 is the region after the first non-vowel following a vowel, or is the null region at the end of the word if there is no such non-vowel.
R2 is the region after the first non-vowel following a vowel in R1, or is the null region at the end of the word if there is no such non-vowel.
The definition of vowel varies from language to language. In French, for example, é is a vowel, and in Italian i between two other vowels is not a vowel. The class of letters that constitute vowels is made clear in each stemmer.
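The two definitions can be sketched in a few lines of Python. This is only a minimal illustration, not code from any particular stemmer: the function name `r1_r2`, its `vowels` parameter, and the default vowel set `"aeiouy"` are assumptions made for the example. Individual stemmers define their own vowel classes and may adjust R1 further.

```python
def r1_r2(word, vowels="aeiouy"):
    """Return the regions (R1, R2) of `word` as strings.

    `vowels` is a hypothetical parameter: the set of letters treated as
    vowels. The default is a simplification chosen for this sketch.
    """

    def region_start(start):
        # Look, from `start` onwards, for the first non-vowel that
        # immediately follows a vowel; the region begins just after it.
        for i in range(start + 1, len(word)):
            if word[i] not in vowels and word[i - 1] in vowels:
                return i + 1
        # No such non-vowel: the region is null, at the end of the word.
        return len(word)

    r1 = region_start(0)   # R1 is found by scanning the whole word
    r2 = region_start(r1)  # R2 is found by applying the same rule inside R1
    return word[r1:], word[r2:]


print(r1_r2("beautiful"))  # ('iful', 'ul')
print(r1_r2("beauty"))     # ('y', '')
print(r1_r2("beau"))       # ('', '')
```

A null region is represented here by the empty string. Passing a different `vowels` string is one way the same sketch might be adapted to another language's vowel class.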
Below, R1 and R2 are shown for a number of English words:

    b   e   a   u   t   i   f   u   l
                      |<------------->|    R1
                              |<----->|    R2

Letter t is the first non-vowel following a vowel in beautiful, so R1 is iful. In iful, the letter f is the first non-vowel following a vowel, so R2 is ul.

    b   e   a   u   t   y
                      |<->|    R1
                        ->|<-  R2

In beauty, the last letter y is classed as a vowel. Again, letter t is the first non-vowel following a vowel, so R1 is just the last letter, y. R1 contains no non-vowel, so R2 is the null region at the end of the word.

    b   e   a   u
                ->|<-    R1
                ->|<-    R2

In beau, R1 and R2 are both null.
Other examples:

    a   n   i   m   a   d   v   e   r   s   i   o   n
          |<----------------------------------------->|    R1
                  |<--------------------------------->|    R2

    s   p   r   i   n   k   l   e   d
                      |<------------->|    R1
                                    ->|<-  R2

    e   u   c   h   a   r   i   s   t
              |<--------------------->|    R1
                          |<--------->|    R2