Challenge 2: Topics
"It's understanding what people say that is valuable"
What consumers say online is an invaluable resource for anyone who wants to understand what ordinary people think.
Online discussion is unprompted and unbiased. It is perhaps the most valuable repository of people’s thoughts you could ever gain access to. It is certainly the largest: the social media landscape is made up of sites and discussions on every conceivable topic. There is often impressive depth, too, with long debates and detailed reviews frequently driving buzz.
However, there is a major hurdle to any meaningful understanding of buzz: how do you extract the actual value from online discussion? Even if you can locate all the conversations which mention you, you still need to work out what those conversations are about. To do this you must categorise discussion into topics.
The challenge of understanding topics is neatly illustrated by the example of Apple iPhone applications. Ever since the iPhone was announced it has generated substantial volumes of discussion online. Just knowing that much, however, yields very little insight: anyone using Google Trends, or a similar tool, could establish that there was a lot of buzz about the iPhone.
When the same information is analysed by topic it becomes much more valuable. For instance, we can capture everything relating to the iPhone but focus our analysis only on its apps. This way we can see which key topics are discussed (quality, range and price) and therefore start to understand which aspects of the iPhone’s applications people engage with. Repeating the same process for other aspects of the iPhone enables us to build a balanced and granular understanding of the make-up of discussion.

Once you realise that you can categorise buzz, the real challenge starts: deciding what the categories should be.
To understand why this is important, consider the example of a pet food manufacturer who is trying to establish why consumers purchase its products rather than those of competitors. There are many different ways the data can be categorised: by breed of pet, by specific product name, by product type or by comparison with rival pet food brands, to name but a few. It is essential that the categorisation scheme is logical, or else it will not help meet the research objectives.
It would clearly be impossible to work with a categorisation scheme where pet type and product type were jumbled up together! Equally, if there were no sub-categories, things would easily get out of hand. Just imagine all discussion about different pet food brands falling within one category, for instance “cat food”. If you now wanted to examine and compare this discussion by other factors, such as “healthiness” or “value for money”, the data would be nearly impossible to interpret, as the brands would all be lumped into one.
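The point about keeping categories separate can be made concrete with a small sketch. It is illustrative only: the brand names, keywords and sample posts are invented, and simple keyword matching stands in for whatever classification process is actually used. What it shows is a scheme in which brand, product type and theme are independent dimensions, so that discussion of “cat food” can still be broken down by brand and by theme.

```python
from collections import Counter

# Hypothetical scheme: three independent dimensions, each with its own
# non-overlapping categories. Names and keywords are invented for illustration.
SCHEME = {
    "brand": {"BrandA": ["branda"], "BrandB": ["brandb"]},
    "product_type": {
        "cat food": ["cat food", "kitten"],
        "dog food": ["dog food", "puppy"],
    },
    "theme": {
        "healthiness": ["healthy", "nutrition"],
        "value for money": ["cheap", "expensive", "price"],
    },
}

# Invented sample posts.
POSTS = [
    "BrandA cat food is really healthy, my kitten loves it",
    "BrandB cat food is too expensive for what you get",
    "Switched to BrandA dog food, great nutrition for my puppy",
    "BrandA cat food is cheap but good",
]


def tag(post):
    """Assign at most one category per dimension using naive keyword matching."""
    text = post.lower()
    tags = {}
    for dimension, categories in SCHEME.items():
        for category, keywords in categories.items():
            if any(keyword in text for keyword in keywords):
                tags[dimension] = category
                break
    return tags


def cross_tab(posts, row_dim, col_dim, where=None):
    """Count posts by two dimensions, optionally filtered on other dimensions."""
    where = where or {}
    counts = Counter()
    for post in posts:
        tags = tag(post)
        if any(tags.get(dim) != value for dim, value in where.items()):
            continue
        if row_dim in tags and col_dim in tags:
            counts[(tags[row_dim], tags[col_dim])] += 1
    return counts


# Compare brands by theme within cat food discussion only.
cat_food_view = cross_tab(POSTS, "brand", "theme", where={"product_type": "cat food"})
for (brand, theme), n in sorted(cat_food_view.items()):
    print(brand, "|", theme, "|", n)
```

Had the scheme held a single “cat food” category with no separate brand or theme dimensions, the comparison in the last step would not be possible.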
It may seem far-fetched that anyone would attempt to work with categorisation schemes that are not logical, but such schemes are very common. Automated schemes generated by NLP (Natural Language Processing) technologies are frequently used and generally produce categories that are illogical, inconsistent and overlapping. They also work differently across languages, so comparing results from different countries is problematic, to say the least.
At WaveMetrix we think there are six basic requirements that categorisation schemes for social media monitoring must fulfil. If you cannot classify what people say in this way, chances are you won’t be able to make sense of any buzz you collect:



