It’s understanding what people say that is valuable

What consumers say online is an invaluable resource for anyone who wants to understand what ordinary people think.

Online discussion is unprompted and unbiased. It is perhaps the most valuable repository of people’s thoughts you could ever gain access to. It is certainly the largest – the social media landscape is made up of sites and discussions on any conceivable topic. There is often impressive depth, too, with long debates and detailed reviews frequently driving buzz.

However, there is a major hurdle to any meaningful understanding of buzz: how do you extract the actual value from online discussion? Even if you can locate all the conversations which mention you, you still need to work out what those conversations are about. To do this you must categorise discussion into topics.

Example: iPhone Launch

The challenge of understanding topics is neatly illustrated using the example of Apple iPhone applications. Ever since the iPhone was announced it has generated substantial volumes of discussion online. Just knowing that much, however, yields very little insight. Using Google Trends, or a similar tool, anyone could establish that there was a lot of buzz about the iPhone.

When the same information is analysed by topic it becomes much more valuable. For instance, we can capture everything relating to the iPhone, but focus our analysis only on its apps. This way we can see what the key topics discussed are (quality, range and price) and therefore start to understand what elements concerning the iPhone’s applications people engage with. The same process repeated for other aspects of the iPhone enables us to construct a balanced and granular understanding of the make-up of discussion.

iPhone applications: buzz by topic

Note: buzz reflects online discussion taking place in July and August 2008
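
As a rough illustration of the topic breakdown described above, the following Python sketch works out each topic's share of the discussion within one product area. The posts and their codes are invented purely for illustration, and `topic_shares` is a hypothetical helper rather than a description of how any particular monitoring tool works.

```python
from collections import Counter

# Hypothetical hand-coded posts as (product area, topic) pairs. Not real data.
coded_posts = [
    ("apps", "quality"),
    ("apps", "range"),
    ("apps", "price"),
    ("apps", "quality"),
    ("battery", "life"),
]

def topic_shares(posts, area):
    """Return each topic's share of the posts coded to one product area."""
    counts = Counter(topic for post_area, topic in posts if post_area == area)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}

shares = topic_shares(coded_posts, "apps")
```

Filtering to a single area before counting is what turns a raw volume figure into a picture of what people are actually talking about.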

The key to success: how to categorise buzz

Once you realise that you can categorise buzz, the real challenge starts: deciding what the categories should be.

To understand why this is important, consider the example of a pet food manufacturer who is trying to establish why consumers purchase its products rather than those of competitors. There are many different ways the data can be categorised: by breed of pet, by specific product name or product type or by comparisons with rival pet food brands, to name but a few. It is essential that the categorisation scheme is logical, or else it will not help meet the research objectives.

It would clearly be impossible to work with a categorisation scheme where pet type and product type were jumbled up together! Equally, if there were no sub-categories, things would easily get out of hand. Just imagine all discussion about different pet food brands falling within one category – for instance “cat food”. If you then wanted to examine and compare this discussion by other factors – for instance “healthiness” or “value for money” – the data would be nearly impossible to interpret, as the brands would all be lumped into one.

It may seem far-fetched that anyone would attempt to work with categorisation schemes that are not logical, but such schemes are very common. Automated schemes generated by NLP (Natural Language Processing) technologies are frequently used and generally produce categories that are illogical, inconsistent and overlapping. They also work differently across languages, so comparing results from different countries is problematic to say the least.

Requirements for social media monitoring categorisation schemes

At WaveMetrix we think there are six basic requirements that categorisation schemes for social media monitoring must fulfil. If you cannot classify what people say in this way, chances are you won’t be able to make sense of any buzz you collect:

The six categorisation scheme requirements

  • 1. Logical structure. Your categorisation scheme must be logical. If you are looking at “animals”, you must be able to split them into “cats”, “dogs” and “birds”. Neither stopping at the high-level “animals” category, nor using an arbitrary scheme in which things that are not animals (say “fast food” or “cars”) sit alongside them, is usable
  • 2. Hierarchical sub-categories. Once you have a category, say “dogs”, you must be able to break it down into sub-categories such as “gun dogs”, “toy dogs” and so on. You must be able to create as many layers of categories as you want. “Gun dogs” must also be capable of being broken down into sub-categories: “pointers” and “retrievers”. It is essential that you control which of these sub-categories relate to which main category. If not, you will find Persian cats amongst your gun dogs - and Labradors all over the place!
  • 3. Choice. The categorisation scheme you need is dictated by the problem you want to solve. Very often it is also dependent on the way your organisation thinks about the market it is in and the products or services it offers. You want a structure which fits with your existing trackers and traditional research. For all these reasons it is essential that you can choose your own categorisation scheme (and not just live with what is generated mathematically by an automated tool)
  • 4. Multiple dimensions. There are many ways to categorise buzz. Of course, you can code product or service attributes. But online discussion can also provide a lot of insight into use cases, purchase drivers or brand values. You may also want to look at advocacy separately as it is often a big sales driver. Most importantly, you should be able to have your cake and eat it too! Just because you code features in one dimension, for instance, does not mean you should not be able to code purchase drivers in another

    Amstrad Sky+ box, buzz and sentiment by type of consumer

    This graph shows buzz around an Amstrad Sky+ box by type of consumer. Coding buzz in multiple dimensions means we can gain insight into areas like consumer type on the one hand...

    Amstrad Sky+ box, buzz by emotion

    ...and also customer emotions, as shown in this chart - also around the Amstrad Sky+ box.
  • 5. Language consistency. If you want to compare performance in one country against that in another and get a clear picture of geographic differences you need to have the same categorisation scheme in all countries. The same point made by two posters in different countries - one in China and one in the US, say - must be coded the same way, otherwise all international comparisons will be meaningless. Sadly, automated categorisation tools such as clustering often fail to create cross-language consistency
  • 6. Benchmarkability. Imagine you are a large multinational brand which sells all sorts of products worldwide. Some of the most valuable insights online discussion can provide are benchmarks which give you a real measure of how well you are performing against the competition. For instance, you may want to know how much buzz to expect after a new product launch, or what proportion of discussion tends to involve reliability issues or poor customer service. Or you may be a film studio wanting to understand how your movie trailer compares to rivals’ in its success at persuading viewers that they want to see your film. Or you may simply want to know whether buzz about your product in a particular country is unusually low, or whether that market tends towards low levels of online engagement generally
    Powerful benchmarks can immediately tell you where you are successful and where you are not. But to create them you have to design smart categorisation schemes. The scheme for each product needs to maximise insight into that product while staying consistent with the schemes for every other product you have, so that benchmarking is possible.

    Harry Potter: buzz and sentiment by country

    This graph shows percentage of buzz and sentiment specifically around the mood of the trailer for Harry Potter and the Half-Blood Prince generated in two weeks during March 2009. Covering multiple countries adds a new dimension to analysis, as everything is categorised according to a universally consistent scheme. Benchmarking is also enhanced by data from a range of markets.
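
To make the structural requirements above more concrete, here is a minimal Python sketch of a scheme that is hierarchical, chosen by the analyst and coded in more than one dimension. It is an illustration under stated assumptions, not WaveMetrix's implementation: the dimension names, the categories and the helpers `category_paths`, `code_post` and `rolls_up_to` are all hypothetical.

```python
# Hypothetical scheme: one tree of categories per dimension.
SCHEME = {
    "animal": {
        "dogs": {
            "gun dogs": {"pointers": {}, "retrievers": {}},
            "toy dogs": {},
        },
        "cats": {},
        "birds": {},
    },
    "purchase driver": {
        "healthiness": {},
        "value for money": {},
    },
}

def category_paths(tree, prefix=()):
    """Yield every category path in one dimension's tree, parents first."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from category_paths(children, path)

# Every valid path per dimension, e.g. ("dogs", "gun dogs", "retrievers").
VALID_PATHS = {dim: set(category_paths(tree)) for dim, tree in SCHEME.items()}

def code_post(codes):
    """Check a post's codes (dimension -> category path) against the scheme."""
    checked = {}
    for dimension, path in codes.items():
        path = tuple(path)
        if path not in VALID_PATHS.get(dimension, set()):
            raise ValueError(f"{path} is not a category in dimension {dimension!r}")
        checked[dimension] = path
    return checked

def rolls_up_to(path, ancestor):
    """Return True if a category path equals or sits under the ancestor path."""
    return tuple(path[:len(ancestor)]) == tuple(ancestor)

# One post coded in two dimensions at once.
post = code_post({
    "animal": ("dogs", "gun dogs", "retrievers"),
    "purchase driver": ("healthiness",),
})
```

Because every code is a full path through a tree the analyst defined, a post can never end up under the wrong parent, and the same post can be counted under “dogs” in one dimension and “healthiness” in another. Applying one scheme like this to posts from every country and every product is what makes the cross-market comparisons and benchmarks described above possible.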

     

    Example case: Harry Potter trailer