(This is the second part of the post – How to analyse your customer social profile in 24 hours – Data and Collection)
Community Level
After collecting the data as described in the previous post, we can explore it and visualize some of its aspects. There are many questions we can ask of this data, but an obvious one is: what are the people engaged in the given topic talking about? Understanding this helps identify how the organisation can approach these users and market itself. We can get a quick overview by producing a word cloud of the most used words in the tweets.
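A word cloud is essentially a picture of word frequencies, so the underlying counts can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the figures in this post: the stop-word list, the function name `top_words` and the sample tweets are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis needs a fuller one.
STOP_WORDS = {"the", "and", "for", "our", "are", "with", "this", "that"}

def top_words(tweets, n=10):
    """Return the n most frequent words across a list of tweet texts."""
    counts = Counter()
    for text in tweets:
        # Remove URLs first so that their fragments are not counted as words.
        text = re.sub(r"https?://\S+", " ", text.lower())
        for token in re.findall(r"[#@]?\w+", text):
            # Treat "#overfishing" and "overfishing" as the same keyword.
            word = token.lstrip("#@")
            if len(word) > 2 and word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)

# Made-up tweets, for illustration only.
sample_tweets = [
    "Overfishing is a threat to seabirds #overfishing",
    "RT @GreenpeaceUK: stop overfishing now http://example.com/x",
    "Lovely day for birdwatching at the reserve",
]
print(top_words(sample_tweets, n=3))
```

The resulting (word, count) pairs are the kind of input a word cloud renderer needs; the size of each word in the cloud is proportional to its count.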
This gives a high-level overview of the topics that might be worth looking into. Note that this goes beyond the initially specified tags (see the previous post), as it discovers a number of associations between the given topics and what people actually talk about. This high-level extraction can be replaced by more sophisticated methods that assign importance to the topics based on the influence of the user and how close they are to the target organisations. Essentially, this is a trade-off between exploring new topics and exploiting known topics.
We can also produce a word cloud of the topics that followers of the organisation were talking about.
As these tweets were not restricted to the topics of the target organisation, we get a wide range of keywords here. Apart from the mainstream topics that interest most people, these users clearly diverge slightly from the average, in the direction of the target organisation. This divergence is what we are interested in identifying. You can spot some keywords that would not turn up in a randomly sampled population of Twitter users, for example cfpreform, GreenpeaceUK, overfishing and BristolZooGdns. This suggests that the followers of this account are more interested in nature-related topics.
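One way to make this divergence measurable is to compare how often a word appears in the followers' tweets with how often it appears in a baseline sample. The sketch below is an assumption about how this could be done, not the method behind the figures above: the function name `distinctive_words`, the add-one smoothing and all the counts are invented for illustration.

```python
from collections import Counter

def distinctive_words(group_counts, baseline_counts, n=5, smoothing=1.0):
    """Rank words by how over-represented they are in the group versus the baseline."""
    vocab = set(group_counts) | set(baseline_counts)
    group_total = sum(group_counts.values()) + smoothing * len(vocab)
    base_total = sum(baseline_counts.values()) + smoothing * len(vocab)
    scores = {}
    for word in group_counts:
        # Smoothed relative frequencies, so words unseen in the baseline
        # do not cause a division by zero.
        p_group = (group_counts[word] + smoothing) / group_total
        p_base = (baseline_counts.get(word, 0) + smoothing) / base_total
        scores[word] = p_group / p_base
    return sorted(scores, key=scores.get, reverse=True)[:n]

# Made-up counts: followers mostly talk about the same things as everyone else,
# but one keyword is far more common than in the baseline.
followers = Counter({"football": 50, "weather": 40, "overfishing": 10})
baseline = Counter({"football": 500, "weather": 400, "overfishing": 1})
print(distinctive_words(followers, baseline, n=1))
```

Ranking by the ratio rather than by the raw count pushes the mainstream topics down and surfaces the keywords that are specific to the community.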
Individual Level
Now we can look at what can be derived about the people who form these two communities. We would like to understand who the influential people are, that is, those who form opinions and spread information. In other words, we aim to identify the influencers and therefore the people the organisation should start to engage with more closely. This information can also be fed back into the previous analysis by weighting topics and keywords according to how influential the originator of the tweet is. The simplest way to identify these people is to look at how active they are and how many followers they have (note that these two factors are not independent). In the literature, other factors have also been incorporated into this score, including the number of retweets and mentions. The figure below shows the top-20 followers of the RSPB account.
These users produced around 7% of the tweets we collected. It is important to concentrate on these people, as the information they find interesting is very likely to spread. Users in the long tail should not be abandoned either; however, it is harder to define a strategy for reaching them. The same approach can also be applied to the users of the target topic: the top influencers can be identified and targeted. However, the long-tail distribution is even more apparent in that case, as the top-20 users produced only 5% of the tweets.
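The ranking and the share of tweets discussed above can be sketched as follows. The scoring formula (tweet count multiplied by the logarithm of the follower count) is a hypothetical choice made for this example, as are the `User` class, the function names and the sample numbers; it is only one of many ways to combine activity and audience size.

```python
import math
from dataclasses import dataclass

@dataclass
class User:
    name: str
    tweet_count: int  # tweets by this user in the collected sample
    followers: int

def influence_score(user):
    # Hypothetical score: activity scaled by the log of the audience size,
    # so that a huge follower count alone does not dominate the ranking.
    return user.tweet_count * math.log10(1 + user.followers)

def top_influencers(users, k=20):
    """Return the k users with the highest influence score."""
    return sorted(users, key=influence_score, reverse=True)[:k]

def tweet_share(top_users, all_users):
    """Fraction of all collected tweets that were produced by top_users."""
    total = sum(u.tweet_count for u in all_users)
    if total == 0:
        return 0.0
    return sum(u.tweet_count for u in top_users) / total

# Made-up users, for illustration only.
users = [
    User("alice", 120, 5000),
    User("bob", 200, 40),
    User("carol", 15, 90000),
    User("dave", 65, 300),
]
top = top_influencers(users, k=2)
print([u.name for u in top])
print(round(tweet_share(top, users), 2))
```

With real data the share for the top-20 users would be much smaller than in this toy example, which is exactly the long-tail effect described above.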
User Profiling
Apart from detecting whether a user is influential, a number of additional characteristics can be inferred. For example, using various data-mining and prediction techniques, my Twitter profile analysis says that:
- I am 25-34 years old
- I live near London, England, United Kingdom
- I can potentially reach 1378 users
- My network is composed of 25-34 year olds followed by 18-20 year olds
- I frequently talk about android, apple and big data
- My personality type is inquisitive, cautious (source: )
- My style is academic