I have been learning Esperanto lately (see here). One of the really cool features of the language is affixes. Basically, you can create new words using some simple morphological rules, e.g.:

bona (good) → bonulo (good person)
juna (young) → junulo (young person)

vorto (word) → vortaro (group of words = dictionary/vocabulary)
arbo (tree) → arbaro (group of trees = forest)
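
The pattern in these examples is mechanical enough to sketch in a few lines of Python. This is only an illustration, not the code from soup.py: the name attach_suffix is made up here, and it assumes the word ends in a one-letter grammatical ending (-o, -a, -e, -i) and that the result should be a noun unless told otherwise.

```python
# Illustrative sketch; attach_suffix is a hypothetical helper, not part of soup.py.
ENDINGS = (u'o', u'a', u'e', u'i')

def attach_suffix(word, suffix, ending=u'o'):
    """Strip the grammatical ending, add the suffix, then add a new ending."""
    stem = word[:-1] if word.endswith(ENDINGS) else word
    return stem + suffix + ending

print(attach_suffix(u'bona', u'ul'))   # bonulo
print(attach_suffix(u'vorto', u'ar'))  # vortaro
```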

There are a lot of affixes (at the time of writing I have 48 suffixes and 18 prefixes), so I thought it might be useful to write a small program that creates new words by randomly attaching these affixes and then quizzes me on them.

Here it is. (see soup.py)

Usage is like this, for example:

> soup(root=u'hundo', n_p=1, n_s=4, cheat=True)
hundo   + pseŭdo : false
        + uj : container for objects described by root
        + esk : similar to/in the manner of whatever is described by root
        + eg : augments or strengthens idea shown by root affix(opposite of -et)
        + ec : quality/characteristic defined by root

n_p is the number of prefixes and n_s the number of suffixes. The cheat flag toggles printing the explanation.
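
For a rough idea of what a call like that involves, here is a sketch that picks affixes at random and glues them onto the root. It is not the contents of soup.py: the tiny PREFIXES and SUFFIXES tables, the names assemble and random_word, the rng parameter, and the choice to sample without repeats are all assumptions for illustration, and it always produces a noun.

```python
import random

# Tiny stand-in tables for illustration; the real lists are much longer.
PREFIXES = [u'pseŭdo', u'mal', u're']
SUFFIXES = [u'uj', u'esk', u'eg', u'ec', u'ar', u'ul']

def assemble(root, prefixes, suffixes, ending=u'o'):
    """Join prefixes + stem + suffixes + ending into one word."""
    stem = root[:-1]  # assumes the root carries a one-letter ending such as -o
    return u''.join(prefixes) + stem + u''.join(suffixes) + ending

def random_word(root, n_p, n_s, rng=random):
    """Pick n_p prefixes and n_s suffixes (no repeats) and assemble them."""
    return assemble(root, rng.sample(PREFIXES, n_p), rng.sample(SUFFIXES, n_s))

# Rebuilding the example word from its listed parts:
word = assemble(u'hundo', [u'pseŭdo'], [u'uj', u'esk', u'eg', u'ec'])
```

Here word comes out as pseŭdohundujeskegeco, the same string the example output spells out piece by piece.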

So let's interpret pseŭdohundujeskegeco... this is an abstract noun, the quality/characteristic of being a large thing similar to a container for false dogs. Or a false quality of being a large thing similar to a container for dogs. The order of interpretation is clear when there are only suffixes or only prefixes, but I'm not sure how to resolve it when both are present.

This is obviously a ridiculous word which no normal person would use, but I find generating and interpreting these very entertaining. Another example...

baledejarinegestro: boss of an enormous, somehow female collection of ballet theatres

I could go on all day. To save myself the effort of doing this, I automated it. So now there's a...

Twitter bot: vortidplenigilo

tool to make [something] full of word derivatives, from vorto + ido + plena + igi + ilo.

Every hour (or so), it tweets a random root (grabbed from a dictionary) with a random number of suffixes and prefixes. Code is in the same repo as before; see vortidplenigilo.py. It chooses how many affixes to use based on two draws from Poisson distributions, preferring fewer prefixes. Since it's limited by Twitter's 140-character limit, words with n_s or n_p above 1 tend not to make it, unfortunately. Future work will shorten the descriptions so I can squeeze more in. Which affixes get picked is not entirely random, however...
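
As a sketch of the counting step: draw the two counts from Poisson distributions with a smaller mean for prefixes, then check the finished text against the length limit. The means (0.5 and 1.5), the function names, and the use of Knuth's multiplication method for sampling are assumptions for illustration, not what vortidplenigilo.py necessarily does; with NumPy installed, numpy.random.poisson would do the same job.

```python
import math
import random

TWEET_LIMIT = 140  # the character limit mentioned above

def poisson_draw(lam, rng=random):
    """Draw from Poisson(lam) using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k

def choose_counts(rng=random, mean_prefixes=0.5, mean_suffixes=1.5):
    """Return (n_p, n_s); the lower prefix mean makes prefixes rarer."""
    return poisson_draw(mean_prefixes, rng), poisson_draw(mean_suffixes, rng)

def fits_in_tweet(text):
    """True if the text is within the character limit."""
    return len(text) <= TWEET_LIMIT
```

A bot built this way could simply redraw whenever fits_in_tweet returns False, which would also explain why longer words rarely appear.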

Making affixes make sense

Not every affix can go on every type of word. Some take nouns and output nouns, others take nouns and output adjectives, etc. The page I grabbed the affixes from thankfully lists which transformations are valid, so I encoded that. See affixes.py for what is essentially a rendering of the aforementioned page into Python. The sort of information I recorded is explicit in this class definition:

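class affix(object):
    def __init__(self, name=u'undefined',
                 transformations=None,
                 # NOTE: the defaults for the next three parameters are assumed.
                 explanation=u'', conflicts=None, category=None):
        self.name = name
        # Maps input word type to output word type; {'x': 'x'} is the default.
        self.transformations = {'x': 'x'} if transformations is None else transformations
        self.explanation = explanation
        self.conflicts = [] if conflicts is None else conflicts
        self.category = category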
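
To show how a transformations mapping can steer the choice of affix, here is a small self-contained sketch. It does not import affixes.py: the Affix tuple, the one-letter type codes ('n' for noun, 'a' for adjective), and the three table entries are simplified assumptions rather than the real data.

```python
from collections import namedtuple

# Hypothetical stand-in for the affix class; only two fields are needed here.
Affix = namedtuple('Affix', ['name', 'transformations'])

TABLE = [
    Affix(u'ul', {'a': 'n'}),            # bona -> bonulo
    Affix(u'ar', {'n': 'n'}),            # arbo -> arbaro
    Affix(u'ec', {'a': 'n', 'n': 'n'}),  # quality of ...
]

def valid_affixes(word_type, table=TABLE):
    """Affixes that accept a word of the given type."""
    return [a for a in table if word_type in a.transformations]

def apply_chain(word_type, names, table=TABLE):
    """Follow a chain of suffix names and return the final word type.

    Raises ValueError if a suffix cannot attach to the current type.
    """
    by_name = {a.name: a for a in table}
    for name in names:
        transformations = by_name[name].transformations
        if word_type not in transformations:
            raise ValueError(u'-%s cannot attach to type %r' % (name, word_type))
        word_type = transformations[word_type]
    return word_type
```

Choosing each affix from valid_affixes(current_type) and then updating current_type keeps every step of a chain grammatical, which is roughly what "not entirely random" amounts to.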

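The conflicts entry is there to keep affixes that cancel each other out from being combined, which would otherwise give words like:

arbarero: one of a collection of trees... a tree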
dormigiĝi: to become made to be asleep... to sleep

I'm not entirely convinced I want this, though. For example,

hundetego: huge small dog

sort of makes sense. Jury is out on this decision.
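
One way to rule out combinations like these is a lookup of affix pairs that should not appear together. The sketch below is an assumption about how that might look, not the bot's code: the CONFLICTS table just encodes the three pairs from the examples above (-ar/-er, -ig/-iĝ, -et/-eg), and has_conflict is a made-up name.

```python
from itertools import combinations

# Pairs taken from the examples: arbarero, dormigiĝi, hundetego.
CONFLICTS = {
    frozenset([u'ar', u'er']),
    frozenset([u'ig', u'iĝ']),
    frozenset([u'et', u'eg']),
}

def has_conflict(affixes, conflicts=CONFLICTS):
    """True if any two of the chosen affixes form a conflicting pair."""
    return any(frozenset(pair) in conflicts
               for pair in combinations(affixes, 2))
```

Dropping the -et/-eg entry from the table would let hundetego through while still blocking the other two.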

One of its first tweets was beautifully meta:

morfologiido: offspring of morphology


I would gladly welcome comments/ideas on the GitHub repository, be it language suggestions or corrections (since I am still a komencisto), code fixes, ideas for automatically producing 'interpretations' of the generated words, or anything else. The contents of affixes.py might also be useful for other people doing things with Esperanto.