Here is a sample of Danish vocabulary, with the stemmed forms that will be generated by this algorithm:
| word | | stem |
|---|---|---|
| indtage indtagelse indtager indtages indtaget indtil indtog indtraf indtryk indtræde indtræder indtræffe indtræffer indtrængende indtægt indtægter indvandrede indvandret indvender indvendig indvendige indvendigt indvending indvendingerne indvie indviede indvielse indvielsen indvielsesløfte indvielsestid indvier indvies indviet indvikle indvikler indvolde indvoldene indvortes indånde indåndede | ⇒ | indtag indtag indtag indtag indtag indtil indtog indtraf indtryk indtræd indtræd indtræf indtræf indtræng indtæg indtæg indvandred indvandr indvend indvend indvend indvend indvending indvending indvi indvied indvi indvi indvielsesløft indvielsestid indvi indvi indvi indvikl indvikl indvold indvold indvort indånd indånded |
| underste undersåtter undersåtters undersøg undersøge undersøgelse undersøgelsen undersøger undersøgt undersøgte undertryk undertrykke undertrykkelse undertrykker undertrykkere undertrykkeren undertrykkerens undertrykkeres undertrykkes undertrykt undertrykte undertryktes undertvang undertvunget undertvungne undervejs underverdenen undervise underviser undervises undervisning undervisningen undervist underviste underværk underværker undevise undeviste undfange undfanged | ⇒ | underst undersåt undersåt undersøg undersøg undersøg undersøg undersøg undersøg undersøg undertryk undertryk undertryk undertryk undertryk undertryk undertryk undertryk undertryk undertryk undertryk undertryk undertvang undertvung undertvungn undervej underverden undervis undervis undervis undervisning undervisning undervist undervist underværk underværk undevis undevist undfang undfanged |
The Danish alphabet includes the following additional letters:
    æ å ø
The following letters are vowels:
    a e i o u y æ å ø
R2 is not used. R1 is set up by the following steps:
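As a rough illustration, the `mark_regions` routine in the Snowball source further down can be read as the following Python sketch. The function name `r1_start` is an assumption made for this example, and the apostrophe special case mentioned later in the text is not modelled, because the Snowball source shown here does not include it.

```python
VOWELS = set("aeiouyæåø")


def r1_start(word: str) -> int:
    """Return the index at which R1 begins (hypothetical helper name)."""
    end = len(word)
    # `test ( hop 3 setmark x )` fails for words shorter than 3 characters,
    # in which case p1 keeps its initial value (the end of the word).
    if end < 3:
        return end
    i = 0
    # `gopast v`: move just past the first vowel.
    while i < end and word[i] not in VOWELS:
        i += 1
    if i == end:
        return end
    i += 1
    # `gopast non-v`: move just past the next non-vowel.
    while i < end and word[i] in VOWELS:
        i += 1
    if i == end:
        return end
    i += 1
    # `try ( $p1 < x $p1 = x )`: keep at least 3 characters before R1.
    return max(i, 3)
```

For example, `r1_start("strand")` is 5, so R1 is `d`, while for `indtage` the adjustment to three characters applies and R1 is `tage`.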
Define a valid s-ending as one of
    a b c d f g h j k l m n o p r t v y z å
Do each of steps 1, 2, 3, 4 and 5.
Step 1:
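Search for the longest among the following suffixes in R1, and perform the action indicated.
(a) hed ethed ered e erede ende erende ene erne ere en heden eren er heder erer heds es endes erendes enes ernes eres ens hedens erens ers ets erets et eret
    delete
(b) s
    delete if preceded by a valid s-ending
(Note that only the suffix needs to be in R1; the letter of the valid s-ending is not required to be.)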
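A minimal Python sketch of step 1, following the `main_suffix` routine in the Snowball source further down. The function name `step1` and the parameter `p1` (the index where R1 starts, assumed to be computed beforehand) are choices made for this example only.

```python
S_ENDING = set("abcdfghjklmnoprtvyzå")

# Suffixes that are simply deleted, copied from `main_suffix`.
DELETE_SUFFIXES = (
    "hed ethed ered e erede ende erende ene erne ere "
    "en heden eren er heder erer heds es endes "
    "erendes enes ernes eres ens hedens erens ers ets "
    "erets et eret"
).split()


def step1(word: str, p1: int) -> str:
    """Remove the longest step 1 suffix found in R1 (sketch)."""
    r1 = word[p1:]
    matches = [s for s in DELETE_SUFFIXES + ["s"] if r1.endswith(s)]
    if not matches:
        return word
    longest = max(matches, key=len)
    if longest != "s":
        return word[: -len(longest)]
    # A bare "s" is removed only after a valid s-ending; that letter
    # does not itself have to lie inside R1.
    if len(word) >= 2 and word[-2] in S_ENDING:
        return word[:-1]
    return word
```

With `p1 = 3`, `step1("indtages", 3)` gives `indtag` (the longer `es` wins over `s`), and `step1("undervejs", 3)` gives `undervej` because `j` is a valid s-ending.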
Step 2:
Search for one of the following suffixes in R1, and if found delete the last letter.
    gd dt gt kt
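Step 2 can be sketched in Python as below, mirroring the `consonant_pair` routine in the Snowball source further down. The name `step2` and the parameter `p1` (start index of R1) are assumptions for illustration.

```python
CONSONANT_PAIRS = ("gd", "dt", "gt", "kt")


def step2(word: str, p1: int) -> str:
    """Drop the last letter if R1 ends with one of the consonant pairs (sketch)."""
    # The whole pair must lie inside R1, so the check is made on word[p1:].
    if word[p1:].endswith(CONSONANT_PAIRS):
        return word[:-1]
    return word
```

For instance, `step2("undersøgt", 3)` yields `undersøg`, whereas a short word such as `akt` is left alone because its R1 is empty.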
Step 3:
If the word ends with igst, remove the final st.
Search for the longest among the following suffixes in R1, and perform the action indicated.
(a) ig lig elig els
    delete, and then repeat step 2
(b) løst
    replace with løs
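The following Python sketch illustrates this step as it appears in the `other_suffix` routine of the Snowball source further down: the `igst` check is not restricted to R1, and after deleting `ig`, `lig`, `elig` or `els` the consonant-pair rule of step 2 is applied once more. The names `step2`, `step3` and `p1` are assumptions for this example.

```python
def step2(word: str, p1: int) -> str:
    """Consonant-pair rule, repeated here so the sketch is self-contained."""
    if word[p1:].endswith(("gd", "dt", "gt", "kt")):
        return word[:-1]
    return word


def step3(word: str, p1: int) -> str:
    """Apply the igst rule, then the longest of ig/lig/elig/els/løst in R1 (sketch)."""
    # The igst check looks at the whole word, not only R1.
    if word.endswith("igst"):
        word = word[:-2]
    r1 = word[p1:]
    # Listed longest first so the longest match is taken.
    for suffix in ("elig", "lig", "ig", "els"):
        if r1.endswith(suffix):
            return step2(word[: -len(suffix)], p1)
    if r1.endswith("løst"):
        return word[:-1]  # løst -> løs
    return word
```

Applied to the intermediate form from the worked example in the text, `step3("bestemmels", 3)` returns `bestemm`.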
Step 4:
If the word ends with one of bb dd ff gg kk ll mm nn pp rr ss tt in R1, then remove the last letter.
(For example, bestemmelse → bestemmels (step 1) → bestemm (step 3a) → bestem in this step.)
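A small Python sketch of the undoubling rule as worded above, with the doubled pair required to lie in R1. Note that the `undouble` routine in the Snowball source further down is slightly broader (it uses the full consonant grouping `c` and only requires the final letter to be in R1), so this is an illustration of the prose rule rather than a port of that routine. The names `step4` and `p1` are assumptions.

```python
DOUBLES = ("bb", "dd", "ff", "gg", "kk", "ll", "mm", "nn", "pp", "rr", "ss", "tt")


def step4(word: str, p1: int) -> str:
    """Remove one letter of a trailing double consonant found in R1 (sketch)."""
    if word[p1:].endswith(DOUBLES):
        return word[:-1]
    return word
```

This reproduces the last arrow of the example: `step4("bestemm", 3)` returns `bestem`.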
Step 5:
If the word ends with an apostrophe (') then remove it.
(For example, cd'en → cd' (step 1) → cd in this step.)
Apostrophe is used in Danish to separate Danish suffixes from words of foreign origin. This is commonly seen with acronyms/initialisms, some of which are only two characters and don't contain vowels (e.g. cd'en) so the definition of R1 includes a special case for apostrophe.
routines (
    mark_regions
    main_suffix
    consonant_pair
    other_suffix
    undouble
)

externals ( stem )

strings ( ch )

integers ( p1 x )

groupings ( c v s_ending )

stringescapes {}

/* special characters */
stringdef ae '{U+00E6}'
stringdef ao '{U+00E5}'
stringdef o/ '{U+00F8}'

define c 'bcdfghjklmnpqrstvwxz'
define v 'aeiouy{ae}{ao}{o/}'
define s_ending 'abcdfghjklmnoprtvyz{ao}'

define mark_regions as (
    $p1 = limit
    test ( hop 3 setmark x )
    gopast v gopast non-v setmark p1
    try ( $p1 < x $p1 = x )
)

backwardmode (
    define main_suffix as (
        setlimit tomark p1 for ([substring])
        among(
            'hed' 'ethed' 'ered' 'e' 'erede' 'ende' 'erende' 'ene' 'erne' 'ere'
            'en' 'heden' 'eren' 'er' 'heder' 'erer' 'heds' 'es' 'endes'
            'erendes' 'enes' 'ernes' 'eres' 'ens' 'hedens' 'erens' 'ers' 'ets'
            'erets' 'et' 'eret'
                (delete)
            's'
                (s_ending delete)
        )
    )

    define consonant_pair as (
        test (
            setlimit tomark p1 for ([substring])
            among(
                'gd' // significant in the call from other_suffix
                'dt' 'gt' 'kt'
            )
        )
        next] delete
    )

    define other_suffix as (
        do ( ['st'] 'ig' delete )
        setlimit tomark p1 for ([substring])
        among(
            'ig' 'lig' 'elig' 'els'
                (delete do consonant_pair)
            'l{o/}st'
                (<-'l{o/}s')
        )
    )

    define undouble as (
        setlimit tomark p1 for ([c] ->ch)
        ch
        delete
    )
)

define stem as (
    do mark_regions
    backwards (
        do main_suffix
        do consonant_pair
        do other_suffix
        do undouble
    )
)