The representation of apostrophe in Unicode is frankly a bit of a mess.
There are two Unicode characters for apostrophe, U+0027 and U+2019. The former is also in both ASCII and ISO-8859-1 (Latin1) whereas the latter is not. Compare,
Hamlet's father's ghost (U+0027)
Hamlet’s father’s ghost (U+2019)
Although conceptually different from an apostrophe, a single closing quote is also represented by character U+2019.
As well as being used for apostrophe, U+0027 is used to represent both the single opening quote and the single closing quote (in ASCII and ISO-8859-1 it is the only way to represent them). Unicode also has a shaped single opening quote (U+2018) to complement U+2019.
A fourth character, U+201B, like U+2018 but with the tail ‘rising’ instead of ‘descending’, is also sometimes used as apostrophe (in the house style of certain publishers, for surnames like M’Coy and so on.)
Since U+0027 can be both opening and closing single quotes, you can't easily tell if U+0027 before or after a word is an apostrophe or a single quote. Similarly, you can't easily tell if U+2019 after a word is an apostrophe or a single closing quote.
There are some common features as to how apostrophe is used which apply to multiple languages.
To mark omission of letters, such as contractions in English (has not becomes hasn't) and elisions in French (le alphabet becomes l'alphabet).
To separate a suffix from the word it is added to in some situations. The most common is when suffixing a "foreign" word, for example Bordeaux'ssa in Finnish; Turkish uses an apostrophe when suffixing any proper noun, e.g. Türkiye'dir.
Some Catalan pronouns can attach before or after a verb which starts/ends in a vowel. The pronoun drops its vowel and an apostrophe is added between the pronoun and verb. These are handled by our Catalan stemmer.
An apostrophe is sometimes used to join the enclitic definite article to words of foreign origin, or to other words that would otherwise look awkward. For example, IP'en ("the IP address"). Since 3.1.0, our stemmer includes handling for this.
The Kraaij-Pohlmann stemmer for Dutch (Kraaij, 1994, 1995) removes hyphen and treats apostrophe as part of the alphabet (so ’s, ’tje and ’je are three of their endings). Kraaij-Pohlmann is the default Dutch stemmer since Snowball 3.0.0.
The previous default Dutch stemmer was Martin Porter's which assumes hyphen and apostrophe have already been removed from the word to be stemmed. We still provide this for compatibility with users who have data processed using it - given its aim is compatibility with existing data, we've not updated its handling of apostrophes.
In the English stemming algorithm, it is assumed that apostrophe is represented by U+0027. This makes it ASCII compatible. Clearly other codes for apostrophe can be mapped to this code prior to stemming.
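The mapping mentioned above could be done as a preprocessing step along the following lines. This is only a minimal sketch: the name `normalize_apostrophes` is hypothetical, and including U+2018 in the mapping is an assumption (it is an opening quote rather than an apostrophe, but mapping it lets a stemmer treat it like a leading U+0027).

```python
# Map the shaped characters discussed above onto ASCII apostrophe (U+0027).
# Which characters to include is a choice for the application to make.
APOSTROPHE_MAP = str.maketrans({
    "\u2019": "'",  # right single quotation mark, also used as apostrophe
    "\u201b": "'",  # single high-reversed-9 quotation mark
    "\u2018": "'",  # left single quotation mark (assumption: map it too)
})


def normalize_apostrophes(text: str) -> str:
    """Return text with shaped apostrophes replaced by U+0027."""
    return text.translate(APOSTROPHE_MAP)
```

For example, `normalize_apostrophes("Hamlet\u2019s father\u2019s ghost")` gives the U+0027 form of the phrase shown earlier.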
In English orthography, apostrophe has one of three functions.
1. It indicates a contraction in what is now accepted as a single word: o’clock, O’Reilly, M’Coy. Except in proper names such forms are rare: the apostrophe in Hallowe’en is disappearing, and in ’bus has disappeared.
2. It indicates a standard contraction with auxiliary or modal verbs: you’re, isn’t, we’d. There are about forty of these forms in contemporary English, and their use is increasing as they displace the full forms that were at one time used in formal documents. Although they can be reduced to word pairs, it is more convenient to treat them as single items (usually stopwords) in IR work. And then preserving the apostrophe is important, so that he’ll, she’ll, we’ll, we’d are not equated with hell, shell, well, wed etc.
3. It is used to form the ‘English genitive’, John's book, the horses’ hooves etc. This is a development of (1), where historically the apostrophe stood for an elided e. (Similarly the printed form ’d for ed was very common before the nineteenth century.) Although in decline (witness pigs trotters, Girls School Trust), its use continues in contemporary English, where it is fiercely promoted as correct grammar, despite (or it might be closer to the truth to say because of) its complete semantic redundancy.
For these reasons, the English stemmer treats apostrophe as if it were a letter, removing it from the beginning of a word, where it might have stood for an opening quote, from the end of the word, where it might have stood for a closing quote, or been an apostrophe following s. The form ’s is also treated as an ending.
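The treatment just described can be illustrated with a small sketch. This is not the Snowball implementation itself, only an approximation of the apostrophe handling under the assumption that the word is lowercase and already uses U+0027; the function name `strip_english_apostrophes` is hypothetical.

```python
def strip_english_apostrophes(word: str) -> str:
    """Remove a leading apostrophe and a trailing ', 's or 's' from word."""
    # A leading apostrophe may have been an opening quote.
    if word.startswith("'"):
        word = word[1:]
    # Try the longest ending first so that 's' is not left as 's.
    for suffix in ("'s'", "'s", "'"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word
```

An apostrophe inside the word is left alone, so he'll stays distinct from hell, while john's and horses' lose their endings.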
We provide a reference implementation of the original Porter stemmer as described by Martin Porter's 1980 paper. The paper does not include any special handling of apostrophes, so since this is intended as a reference implementation, our implementation does not either.
We provide an implementation of Beth Lovins' very early stemming algorithm. This handles -'s and -s' suffixes.
Inflections of 'sti are expanded into forms of esti. The words l' and un' become la and unu. A final apostrophe becomes aŭ after certain known stems, or else o.
Apostrophe is used to separate Estonian suffixes added to foreign words. Since 3.1.0, our stemmer includes handling for this.
Apostrophe is used to separate Finnish suffixes added to foreign words. Since 3.1.0, our stemmer includes handling for this.
French elisions (e.g. d'-, l'-, m'-, qu'-) are removed since Snowball 3.0.0.
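As a rough illustration of what removing an elision means, here is a sketch that strips only the four prefixes given as examples above. The real stemmer's prefix list and conditions may well differ; `ELISION_PREFIXES` and `strip_elision` are hypothetical names, and the word is assumed to be lowercase with U+0027 as the apostrophe.

```python
# Only the prefixes named as examples in the text; not an exhaustive list.
ELISION_PREFIXES = ("qu'", "d'", "l'", "m'")


def strip_elision(word: str) -> str:
    """Remove one leading elided prefix, if something remains after it."""
    for prefix in ELISION_PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix):
            return word[len(prefix):]
    return word
```

With this sketch, l'alphabet becomes alphabet, and a word without an elided prefix is returned unchanged.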
An apostrophe can be added before a possessive s suffix to prevent ambiguity (e.g. Andrea's Blumenladen where Andreas is a male forename). It is also used with adjectives derived from proper nouns (e.g. Einstein'sche Relativitätstheorie). Since 3.1.0, our stemmer includes handling for these.
Irish has some contractions which appear as prefixes. Our Irish stemmer handles d'-, m'- and b'-.
Italian elisions (e.g. d'-, l'-, m'-) are removed since Snowball 3.1.0.
An apostrophe is sometimes used to separate a Lithuanian ending on an international word (for example, george's, parking'as). To handle this case, a trailing apostrophe is removed as a final step since Snowball 3.1.0.
Apostrophe is sometimes used in Norwegian to separate Norwegian suffixes from words of foreign origin. This is actually incorrect usage - the correct alternative is to use a hyphen instead of an apostrophe - but it's widespread enough that we should handle it, and we do since Snowball 3.1.0.
Polish uses an apostrophe to separate loanwords from native suffixes, for example: olly'ego, george'a. The correct use is to mark the elision of the final sound of a loanword before a Polish inflectional ending, but apparently it's also often used with any loanword.
Since Snowball 3.1.0, the Polish stemmer removes an apostrophe if the stem ends with one after it has removed a noun, adjective or verb suffix.
Our Russian stemmer handles suffixes which include an apostrophe.
In modern Turkish orthography, an apostrophe is used to separate proper names from any suffixes - for example Türkiye'dir ("it is Turkey"). Since Snowball 3.0.0, our Turkish stemmer removes such suffixes.
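The effect on a proper name can be approximated by discarding everything from the apostrophe onwards. This is a deliberately simplistic sketch, not a description of how the Turkish stemmer is implemented; the name `strip_after_apostrophe` is hypothetical, and U+0027 is assumed as the apostrophe.

```python
def strip_after_apostrophe(word: str) -> str:
    """Drop the first apostrophe and everything after it, if any."""
    head, sep, _tail = word.partition("'")
    # Keep the word unchanged if there is no apostrophe or nothing before it.
    return head if sep and head else word
```

For example, Türkiye'dir is reduced to Türkiye.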