Principle 3: Sentiment
"We use an action-based scale to avoid bias"
You may be forgiven for thinking that measuring online sentiment is reasonably easy. After all, there are plenty of automated social media monitoring tools out there which are pretty good at measuring sentiment. You can even get fairly reasonable results by just creating standard lists of positive and negative words to go with your keywords. Think “brand” + “good” or “nice” or “valuable”, for example.
Unfortunately, you would be wrong. On closer inspection it turns out that measuring sentiment is quite tricky, for two reasons: first, it is hard to work out what exactly you should be measuring; second, measuring this consistently across languages and styles is inherently problematic.
What to measure? That may seem obvious: 'online discussion!' But it’s not as straightforward as all that - you have choices. Would you, for example, measure the sentiment of posts or of sentences? Or perhaps of paragraphs or pages?
The most common approach is to measure posts as units. But what happens when someone says, “I just love this car - it looks great. But I would never buy one with that engine”? Would you classify this as a neutral post, on balance? If yes, we think you’re missing out. The post actually contains two separate points, one positive (about the design of the car) and one negative (about the engine).
In fact, the vast majority of posts contain a mix of positive and negative opinions. Automated tools that score whole posts, or a chunk of words near the search term, lump these opinions together into a bland average that usually comes out as neutral or “mixed”. You lose valuable insights (in the above example, the combination of great style and poor engine) and end up with a figure of no real significance.
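To make the averaging problem concrete, here is a minimal sketch in Python. The word lists, the function names and the crude rule of treating "never" as a negative word are all assumptions made for illustration; this is not how any particular monitoring tool works. It scores the car post once as a whole and once sentence by sentence.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE = {"love", "great", "good", "nice", "valuable"}
NEGATIVE = {"never", "poor", "bad"}

def polarity(text):
    """Return +1, -1 or 0 depending on which word list dominates the text."""
    words = re.findall(r"[a-z']+", text.lower())
    balance = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (balance > 0) - (balance < 0)

def label(value):
    """Turn a numeric polarity (or an average of polarities) into a label."""
    return "positive" if value > 0 else "negative" if value < 0 else "neutral"

post = "I just love this car - it looks great. But I would never buy one with that engine"

# Split the post into sentences and score each one separately.
sentences = [s.strip() for s in re.split(r"[.!?]+", post) if s.strip()]
sentence_polarities = [polarity(s) for s in sentences]
sentence_labels = [label(p) for p in sentence_polarities]

# Post-level view: average the sentence polarities into a single figure.
post_label = label(sum(sentence_polarities) / len(sentence_polarities))

print(sentence_labels)  # one label per sentence
print(post_label)       # one label for the whole post
```

With these assumed word lists, the two sentences come out as positive and negative respectively, while the post as a whole averages to neutral. The design praise and the engine criticism are both lost in the post-level figure.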

Indeed, what is the point of having a super-sophisticated categorisation scheme if you can’t measure sentiment accurately by category?
To get real insight you must be able to measure sentiment for each category and sub-category. You can only do that if you can accurately identify which individual comment relates to which category. That means you cannot get by with shortcuts like measuring the average sentiment of posts. You need a great deal of granularity in the aspects of discussion for which you measure sentiment.
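As a rough sketch of what measuring by category means in practice, the Python below tallies sentiment per category and sub-category. The comments, the category names and the polarity values are invented for illustration, and the sketch assumes each individual comment has already been assigned to a category.

```python
from collections import Counter, defaultdict

# Hypothetical comments, already split out of their posts and labelled:
# (category, sub_category, polarity), where polarity is +1, 0 or -1.
comments = [
    ("design", "exterior", +1),
    ("design", "exterior", +1),
    ("performance", "engine", -1),
    ("performance", "engine", -1),
    ("performance", "handling", +1),
]

LABELS = {1: "positive", 0: "neutral", -1: "negative"}

def sentiment_by_category(rows):
    """Count positive, neutral and negative comments per (category, sub_category)."""
    tallies = defaultdict(Counter)
    for category, sub_category, polarity in rows:
        tallies[(category, sub_category)][LABELS[polarity]] += 1
    return {key: dict(counts) for key, counts in tallies.items()}

by_category = sentiment_by_category(comments)

# The single overall average hides what the per-category tallies show.
overall_average = sum(polarity for _, _, polarity in comments) / len(comments)

for key, counts in by_category.items():
    print(key, counts)
print("overall average:", overall_average)
```

In this made-up data the overall average is only slightly positive, yet the tallies show uniformly positive comments on exterior design and uniformly negative comments on the engine. That is the granularity a post-level average cannot give you.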
Then there is the second problem: consistency. How do you know that the same point, made in different languages with different cultural emphasis and using very different writing styles, is measured the same in all cases? It may be fine to have some sort of rough sentiment tool that can sort positive from negative in English, but how do you do it consistently in Spanish, Portuguese, Cantonese, Hindi...? How do you consistently measure the university professor and the teen, when their language styles may be miles apart?
Measuring sentiment with an algorithm that assigns positive or negative strengths to keywords is not a good approach, because the same strength of feeling is worded very differently from one language or writer to the next. Some languages are much more flowery than others when voicing both praise and criticism. Similarly, the university professor and the teen may have very different vocabularies, syntactical habits and cultural references.
At WaveMetrix we believe that sentiment classification should be based on actions taken, rather than words used. “I bought the car because of the looks” expresses clearly positive sentiment which can be classified identically across all languages and communication styles.
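The idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the scale levels, the scores, the field names and the Spanish translation are invented, and this is not the actual WaveMetrix scale. The sketch also assumes that an analyst has already coded each comment with the action it describes.

```python
# Hypothetical action-based scale. The levels and scores are assumptions
# for illustration, not the real WaveMetrix scale.
ACTION_SCORES = {
    "purchased": 2,
    "intends_to_purchase": 1,
    "no_action": 0,
    "decided_against": -1,
}

# Hypothetical comments, each already coded with the action it describes.
coded_comments = [
    {"language": "en", "text": "I bought the car because of the looks",
     "category": "design", "action": "purchased"},
    {"language": "es", "text": "Compré el coche por su aspecto",
     "category": "design", "action": "purchased"},
    {"language": "en", "text": "I would never buy one with that engine",
     "category": "engine", "action": "decided_against"},
]

def score(comment):
    """Score a comment from its coded action only, never from its wording."""
    return ACTION_SCORES[comment["action"]]

scores = [score(c) for c in coded_comments]
print(scores)
```

Because the score depends only on the coded action, the English and Spanish comments about the car's looks receive the same score however plainly or floridly they are worded.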