When I was a kid, I went through an 80s music phase…well, some things never change. “People just love to play with words…” Know that song? Anyway…
One of the biggest pains of text mining and NLP is colloquialism: language that is appropriate only in casual conversation and not in formal speech or writing. Informal contractions (“gonna”, “wanna”, “whatcha”, “ain’t”, “y’all”) are colloquialisms and are everywhere on the Web. There is also a great deal of slang common on the Web, including acronyms (“LOL”, “WTF”) and emoticons (smilies) that add sentiment to text. There is also a less-used slang called leetspeak that replaces letters with numbers or deliberate misspellings (“n00b” rather than “noob”, “pwned” instead of “owned” and “pr0n” instead of “porn”).
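To make that concrete, here is a minimal sketch of one common way to cope: normalize the text with a lookup table before doing any analysis. Everything in it is my own assumption for illustration. The `CONTRACTIONS` and `LEET` tables, the `TOKEN` pattern and the `normalize` function are hypothetical names, the tables only cover the examples above, and some expansions are approximations (“ain’t” can also mean “am not” or “has not”).

```python
import re

# Hypothetical lookup tables built from the examples in this post.
# A real system would need far larger tables, and some expansions
# are only approximate ("ain't" can also mean "am not" or "has not").
CONTRACTIONS = {
    "gonna": "going to",
    "wanna": "want to",
    "whatcha": "what are you",
    "ain't": "is not",
    "y'all": "you all",
}
LEET = {"n00b": "noob", "pwned": "owned", "pr0n": "porn"}
LOOKUP = {**CONTRACTIONS, **LEET}

# A token is a run of letters/digits, optionally followed by an
# apostrophe and more letters (so "y'all" stays in one piece).
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def normalize(text):
    """Lowercase the text, drop punctuation, and expand known colloquialisms."""
    # Treat the curly apostrophe like a straight one before tokenizing.
    text = text.lower().replace("\u2019", "'")
    return " ".join(LOOKUP.get(tok, tok) for tok in TOKEN.findall(text))


print(normalize("Y'all gonna get pwned, n00b"))
```

A dictionary lookup like this is crude (it ignores context and throws away punctuation), but it shows why these words hurt: until they are mapped back to standard forms, “gonna” and “going to” look like unrelated tokens.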
There are also regionalisms, which are a pain for semantic analysis but not so much for probabilistic analysis. Some examples are pancakes (“flapjacks”, “griddlecakes”) or carbonated beverages (“soda”, “pop”, “Coke”). Or, little did I know, “maple bars” vs. “Long Johns”. Now I am hungry. There are also words that have a formal and informal meaning, such as “kid” (a young goat, or a child…same thing).
Source: http://popvssoda.com/
Linguists consider colloquialisms different from slang. Slang is informal language used by a specific [...]