As we have previously discussed on this blog, Ethersource continuously learns new terminology by reading what is written on the Internet. As an example of how Ethersource picks up even weak linguistic signals, we recently noticed that it suggested the word “tutilurfräs” as a very positive Swedish term. None of us had ever encountered the term before. We looked up the source of this linguistic invention and found that it originates from a tweet by Swedish punk icon Kajsa Grytt, in which she writes:
Å så Pelle!! Å så Hives! Vilket tutilurfräs!! Jag tycker de är genialiska. Blir helt jävla lycklig av det bandet.
— Kajsa Grytt (@KajsaGrytt) March 30, 2012
A (somewhat creative) translation into English would be something like: “Oh Pelle! Oh Hives! What tutilurfräs!! I think they are genius. That band makes me absolutely happy.”
Quite obviously, Ethersource is correct in its understanding that “tutilurfräs” is a very positive word.
There are two lessons to be drawn from this example:
- If you do sentiment analysis in Swedish on Twitter and your model does not automatically learn new terminology, you should re-train or update your model to include the word “tutilurfräs”.
- If you invent a completely new word and start blogging or tweeting about it, Ethersource will learn it. It is true that in space, no one can hear you scream, but on the Internet, even if you whisper, Ethersource will understand you.
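
For readers who maintain their own sentiment model, here is a minimal sketch of one way an unseen word could inherit a polarity from the known words around it. This is not how Ethersource works internally; the seed lexicon, its weights, and the function names `tokenize` and `infer_polarity` are all assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical seed lexicon: words and weights are illustrative assumptions.
SEED_LEXICON = {"genialiska": 1.0, "lycklig": 1.0, "usel": -1.0, "tråkig": -1.0}


def tokenize(text):
    # \w is Unicode-aware for str in Python 3, so "tutilurfräs" stays one token.
    return re.findall(r"\w+", text.lower())


def infer_polarity(texts, lexicon):
    """Give each unknown word the mean polarity of the known words it co-occurs with."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for text in texts:
        tokens = tokenize(text)
        known = [lexicon[t] for t in tokens if t in lexicon]
        if not known:
            continue  # no known words, so this text carries no signal
        context_score = sum(known) / len(known)
        for token in set(tokens):
            if token not in lexicon:
                totals[token] += context_score
                counts[token] += 1
    return {token: totals[token] / counts[token] for token in totals}


tweet = (
    "Å så Pelle!! Å så Hives! Vilket tutilurfräs!! Jag tycker de är genialiska. "
    "Blir helt jävla lycklig av det bandet."
)
scores = infer_polarity([tweet], SEED_LEXICON)
print(scores["tutilurfräs"])  # 1.0
```

With a single tweet, every unknown word in it receives the same score, so the result only becomes meaningful when averaged over many texts: common words such as “det” drift toward neutral, while a word that only ever appears in enthusiastic contexts stays positive.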