Re: There’s No Comparing Male and Female Harassment Online
Posted: Sun Jun 28, 2015 11:01 pm
5 Digital observation
The chapters above have described and showcased a new kind of event-specific research method to understand attitudes on Twitter: digital observation.
It is essential to know whether, how far, and in what ways this method of analysis can actually tell us something about people’s attitudes – their values, concerns, dispositions, fears and convictions. Finally, what is its future?
Our study found that these data are extremely valuable. We found millions of digital voices talking about EU-related themes, in real time. Many tweets expressed political attitudes about pressing events as they were happening. These tweets were surrounded by a cloud of metadata – everything from when the tweet was made, to how many followers the tweeter has, and sometimes where they are. Some of these metadata were leveraged in this project to aid analysis – but much more could be done (and is being done elsewhere). Overall, Twitter is a new venue for politics, and there exists an extremely valuable opportunity to understand it.
We found that such data sets are ‘social big data’. They are often far larger than comparable data sets gathered through conventional polling, interviewing and surveying techniques. Social media data are also noisy, messy and chaotic. Twitter is prone to viral surges in topic, kinds of language used, theme and meme. Twitter data sets are also subject to ‘power laws’: the most prolific tweeters tend to be much more prolific than others, those with the most followers tend to have many more followers than anyone else, and the most shared links tend to be much more shared than any other. Taken together, any given data set will be profoundly influenced by a number of factors that are very difficult to anticipate beforehand.
Conventional polls, surveys and interviews are not designed to handle the speed and scale at which data are created on Twitter. We found that in order to understand Twitter data, we needed to deploy new technologies that are unfamiliar to sociologists and sociological methods.
Our solution – digital observation – attempted to reconcile and integrate new technologies with conventional techniques, and the long-standing values of social science, but as with any new method of analysis there is a pervasive concern for its quality and credibility.
Generalisability
A key challenge to digital observation is generalisability. When a smaller, representative group is studied, it allows us to extend the findings onto the wider group from which it is drawn. Digital observation does not study representative groups for various reasons:
The data gathered from Twitter may not represent Twitter
Strategies to gather data from Twitter, including our own, often return large bodies of data that are non-representative and shaped by systemic, non-random bias. [72] As we described above, we used APIs to deliver tweets that match a series of search terms. The search terms that we used attempted (imperfectly) to gather as many tweets about a given topic as possible, and as few tweets about any other topic as possible. This is difficult to achieve: language use on Twitter is constantly changing, and subject to viral, short-term changes in the way that language is mobilised to describe any particular topic. Trending topics, #tags and memes change the landscape of language in ways that cannot be anticipated, but can crucially undermine the ability of any body of search terms to return a reasonably comprehensive and precise sample. It is therefore probable that tweets about the relevant issue were missed and that these tweets, by virtue of using different words and expressions, may differ systematically in attitude from the ones we did collect.
Tweets may not represent Twitter users
In general, tweets are produced by a small number of high-volume tweeters. Some research suggests that a small number, around 5 per cent, of ‘power-users’ on Twitter are responsible for 75 per cent of Twitter activity. [73] These include a small number of dedicated commentators or campaigners on a related issue.
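As an illustration of how this concentration could be checked in a collected sample, the following is a minimal sketch, not the method used in this study. It assumes each tweet carries an author identifier; the function name `top_user_share` and the default 5 per cent cut-off are hypothetical choices made for the example.

```python
import math
from collections import Counter


def top_user_share(author_ids, top_fraction=0.05):
    """Share of all tweets produced by the most prolific `top_fraction` of accounts.

    `author_ids` holds one entry per tweet: the (hypothetical) ID of its author.
    """
    counts = Counter(author_ids)
    total_tweets = sum(counts.values())
    if total_tweets == 0:
        return 0.0
    # Always include at least one account, rounding the cut-off upwards.
    n_top = max(1, math.ceil(len(counts) * top_fraction))
    top_tweets = sum(count for _, count in counts.most_common(n_top))
    return top_tweets / total_tweets


# Toy sample: one account posts 15 of the 20 tweets.
sample = ["a"] * 15 + ["b", "c", "d", "e", "f"]
print(top_user_share(sample))
```

A share far above the chosen fraction suggests that findings describe a handful of prolific accounts rather than the wider body of users.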
Twitter users may not represent actual people
We found a number of prolific accounts in the data sets that we gathered that not only accounted for a large number of tweets, but were also not EU citizens – our target demographic. These included:
• ‘Twitterbots’ or ‘fake’ accounts programmed to produce automated posts on Twitter
• Official accounts, especially from the EU itself, including the accounts of EU politicians, communications and external affairs agencies and EU offices. [74]
Twitter users may not be representative of EU citizens
Take-up and use of Twitter has not been consistent across EU member states or within them:
• Geographically: Around 16 per cent of Europeans use Twitter, and a higher proportion of the population use Twitter in Britain than in France or Germany. Most tweets cannot be accurately located to a particular area – and this study differentiated only on the basis of the language, not specific location, of the tweet.
• Demographically: The background of people who use Twitter continues to change, and is linked to the complex phenomenon of how people adopt technology and new habits of using technology. The demographic of the EU’s Twitter users is unlikely to reflect the overall demographic of the EU. The most detailed demographic studies of Twitter use, from the USA, have identified that Twitter users there tend to be young, affluent, educated and non-white. [75]
Digital observation
Truly getting hold of attitudes is a fraught process. Attitudes are complex constructs, labels for those myriad ‘inclinations and feelings, prejudice and bias, preconceived notions, ideas, fears, threats, and convictions’, which we can only infer from what people say. [76] Does digital observation really uncover attitudes? Can it reliably measure what people say, and does what people say relate to the attitudes that they have? [77]
We have drawn the following conclusions:
Attitudes on Twitter are mixed with a lot of ‘noise’
A significant proportion of our data did not appear to include any discernible attitude at all: the general broadcasting of information, in tweets and through the sharing of links. [78] Practically, therefore, attitudinal and non-attitudinal data drawn from Twitter are not always readily distinguishable. Why precisely people decide to share certain stories is not well understood – and has, to our knowledge, not been studied in detail.
The use of natural language processing is necessary
Faced with far too much data of differing quality and relevance to read and sort manually, the use of new, automated technologies was necessary. The ability of digital observation to measure accurately what millions of people are saying depends on the success or failure of a vital new technology – NLP. Assessing whether and when it can work is vital to understanding when digital observation can add insight, and when it cannot.
To be successful, natural language processing must be used on events, not generically
We showed in chapter 2 that the success of NLP technology overwhelmingly depends on the context in which it is used. Natural language processing tends to succeed when built bespoke to understand a specific event, at a specific time. It tends to fail when it is used in attempts to understand nonspecific data over a long period of time.
When used correctly, natural language processing is highly accurate
Where NLP was used appropriately, it was very accurate. As it continues to improve, it is clear that NLP has great potential as part of a reliable and valid way of researching a large number of conversations.
Digital observation will always misinterpret some data
The meaning of language – its intent, motivation, social signification, denotation and connotation – is wrapped up in the context where it was used. When tweets are aggregated as large data sets, they lose this context. Because of this, neither the manual nor automated analysis of tweets will ever be perfect. Automated analysis especially will struggle with non-literal language uses, such as sarcasm, pastiche, slang and spoofs.
Even if we can accurately measure tweets, what do they mean? We make the following observations:
Attitudinal indicators on Twitter may not represent underlying attitudes
There is no straightforward or easy relationship between even attitudinal expressions on Twitter, and the underlying inclinations of the tweeter. Twitter is a new medium: digital social platforms, including Twitter, are new social spaces, and are allowing the explosion and growth of any number of digital cultures and sub-cultures with distinct norms, ways of transacting and speaking. This exerts ‘medium effects’ on the message – social and cognitive influences on what is said. ‘Online disinhibition effect’ is one such influence – where statements made in online spaces, often because of the immediacy and anonymity of the platform, are more critical and rude, and less subject to offline social norms and etiquettes than statements made offline.
It is unclear how Twitter fits into people’s lives
To understand how attitudes on Twitter relate to people, it is important to understand how Twitter fits into people’s broader lives, how they experience it, and when they use it. Social media, including Twitter, as a widespread habit as well as a technology, is constantly evolving. Our event-specific research was an attempt to fit attitudes on Twitter into how Twitter fits into people’s lives. By providing context to situate attitudinal data from Twitter into a narrative of events, it also could then touch on causes, consequences and explanations of attitudes – the ‘why’ as well as the ‘what’.
Current methods struggle to move from ‘what?’ to ‘why?’
The generation of raw, descriptive enumeration of attitudes is not enough. Beyond this, researchers must engage with and contribute towards more general explanatory theories – abstract propositions and inferences about the social world in general, causes and explanations, even predictions – ‘why?’ and ‘where next?’, as well as ‘what?’. Sociologists understanding meaning in this way often draw on different theories – from positivism to interpretivism and constructionism – each with their own ideas on how to expose the representational, symbolic or performative significance implied or contained in what is said.
Conclusion: a new type of attitudinal research
Digital observation cannot be considered in the same light as a representative poll. Our digital observation of the EU did not attempt to intervene within the EU – by convening a panel, mailing out interviews – to attempt to understand what the whole of the EU thinks. Rather, it lets a researcher observe a new, evolving digital forum of political expression, the conversations of the EU’s energised, arguing digital-citizens as they otherwise and anyway talk about events.
This new technique to conduct attitudinal research has considerable strengths and weaknesses compared with conventional approaches to research. It is able to leverage more data about people than ever before, with hardly any delay and at very little cost. On the other hand, it uses new, unfamiliar technologies to measure new digital worlds, all of which are not well understood, producing event-specific, ungeneralisable insights that are very different from what has until now been produced by attitudinal research in the social sciences.
We believe digital observation is a viable new way of beginning to realise the considerable research potential that Twitter has. It will continue to improve as the technology gets better, and as our understanding of how to use it, and our sense of how digital observation fits in with other ways of researching attitudes, become more sophisticated.
Overall
An interaction of qualitative and quantitative methods
Automated techniques are only able to classify social media data into one of a small number of preset categories at a certain (limited) level of accuracy for each message. Manual analysis is therefore almost always a useful and important component; in this report it is used to look more closely at a small number of randomly selected pieces of data drawn from a number of these categories. In scenarios when a deeper and subtler view of the social media data is required, the random selection of social media information can be drawn from a data pool, and sorted manually by an analyst into different categories of meaning.
Subject matter experts at every step
It is vital that attempts to collect and analyse ‘big data’ attitudes are guided by an understanding of what is to be studied: how people express themselves, the languages that are used, the social and political contexts that attitudes are expressed in, and the issues that they are expressed about. Analysts who understand the issues and controversies that surround the EU are therefore vital: to contextualise and explain the attitudes that are found on Twitter, and to help build the methods used to find and collect these attitudes.
For acquiring data
New roving, changeable sampling techniques
The collection of systemically unbiased data from Twitter is far from easy. The search terms that are used are vulnerable to the fact that Twitter is chaotically subject to viral, short-term surging variations in the way that language is mobilised to describe any particular topic. During this study, a new data acquisition technique was piloted that attempted to reflect the changing and unstable way people discuss subjects on Twitter. The ‘information gain cascade’ was developed. It is a method intended to ‘discover’ words and phrases that coincide with, and therefore indicate, topics of interest. To do this, a sample of tweets on a topic is collected using high-recall ‘originator terms’. A relevancy classifier is built for this stream in the usual way and applied to a large sample of tweets.
The terms (either words or phrases) that this classifier uses as the basis for classification are ranked based on their information gain: a measure of the extent to which the term aligns with the relevant or irrelevant classes. Terms that are randomly distributed between the relevant and irrelevant classes have low information gain, and terms that are much more likely to be in one class than another have high information gain. The terms that have high information gain in the relevant class are designated ‘candidate search terms’. Each candidate search term is then independently streamed to create its own tweet sample, analysed on its own merits and then, on the decision of an analyst, either graduated to become a full search term or discarded. This process iteratively ‘cascades’ to continuously construct a growing cloud of terms discovered to be coincident with the originator terms.
This approach allows the search queries used to arise from a statistical appreciation of the data themselves, rather than the preconceptions of the analyst. This method is designed to produce samples containing a large proportion of all conversations that might be of interest – high recall.
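The ranking step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes each tweet has already been reduced to a set of terms and given a relevant/irrelevant label by the relevancy classifier, it uses the standard entropy-based definition of information gain, and the function names and toy data are hypothetical.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of True/False relevance labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return sum(-q * math.log2(q) for q in (p, 1 - p) if q > 0)


def information_gain(labelled_tweets, term):
    """Reduction in label entropy from knowing whether `term` is present.

    `labelled_tweets` is a list of (set_of_terms, is_relevant) pairs.
    """
    labels = [rel for _, rel in labelled_tweets]
    if not labels:
        return 0.0
    with_term = [rel for terms, rel in labelled_tweets if term in terms]
    without_term = [rel for terms, rel in labelled_tweets if term not in terms]
    n = len(labels)
    remainder = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - remainder


def candidate_search_terms(labelled_tweets, top_k=5):
    """Highest-gain terms that lean towards the relevant class."""
    labels = [rel for _, rel in labelled_tweets]
    if not labels:
        return []
    base_rate = sum(labels) / len(labels)
    vocabulary = set().union(*(terms for terms, _ in labelled_tweets))
    scored = []
    for term in vocabulary:
        with_term = [rel for terms, rel in labelled_tweets if term in terms]
        # Keep only terms more common in relevant tweets than the base rate.
        if sum(with_term) / len(with_term) > base_rate:
            scored.append((information_gain(labelled_tweets, term), term))
    # Sort by gain (descending), breaking ties alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:top_k]]


# Hypothetical labelled sample (True = relevant to the topic).
tweets = [
    ({"eu", "parliament", "vote"}, True),
    ({"eu", "brexit", "vote"}, True),
    ({"brexit", "referendum"}, True),
    ({"eu", "football"}, False),
    ({"football", "goal"}, False),
    ({"goal", "vote"}, False),
]
print(candidate_search_terms(tweets, top_k=3))
```

In the cascade, each returned term would then be streamed on its own and reviewed by an analyst before being promoted to a full search term; that review step is deliberately left outside the code.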
Automatic identification of twitcidents
An important but separate area of study is to detect the emergence of twitcidents automatically through statistically finding the ripples that they cast into the tweet stream. [79] This technology can be used to identify twitcidents as they occur, allowing for the research to be real time, and used reactively.
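One simple way to look for such ripples is to flag time periods whose tweet volume sits far above a trailing baseline. The sketch below is an assumption-laden illustration rather than the technique cited above: it uses a rolling z-score over per-period tweet counts, and the window length, threshold and function name `detect_surges` are hypothetical.

```python
from statistics import mean, stdev


def detect_surges(counts, window=6, threshold=3.0):
    """Return indices of periods whose count exceeds the trailing mean by
    more than `threshold` standard deviations.

    `counts` is a list of tweet counts per period (for example, per hour).
    """
    surges = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        # Floor the deviation so a perfectly flat baseline does not
        # turn every tiny rise into a surge (the floor is an assumption).
        sigma = max(stdev(baseline), 1.0)
        if (counts[i] - mu) / sigma > threshold:
            surges.append(i)
    return surges


hourly_counts = [10, 12, 11, 9, 10, 11, 10, 60, 55, 12]
print(detect_surges(hourly_counts))
```

Because the baseline rolls forward, it absorbs a surge quickly: only the onset of the spike is flagged here, which suits real-time alerting but not measuring how long an event lasts.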
For analysis
Natural language processing classifiers should:
• be bespoke and event-driven rather than generic
• work with each other: classifiers, each making a relatively simple decision, can be collected into larger architectures of classifiers that can conduct more sophisticated analyses and make more complex overall decisions
• reflect the data: when categories to sort and organise data are applied a priori, there is a danger that they reflect the preconceptions of the analyst rather than the evidence. It is important that classifiers should be constructed to organise data along lines that reflect the data rather than the researcher’s expectations; this is consistent with a well-known sociological method called grounded theory [80]
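The idea of classifiers working with each other can be sketched as a chain in which each stage makes one simple decision and passes on only the tweets that qualify. The example below is purely illustrative: the three keyword-matching functions are hypothetical stand-ins for trained, event-specific classifiers, and the category names are assumptions.

```python
def tokens(text):
    """Lower-cased whitespace tokens; a stand-in for real feature extraction."""
    return set(text.lower().split())


# Each stage is a placeholder for a trained classifier making one decision.
def is_relevant(text):
    return bool(tokens(text) & {"eu", "euro", "brussels"})


def is_attitudinal(text):
    return bool(tokens(text) & {"love", "hate", "great", "awful"})


def is_positive(text):
    return bool(tokens(text) & {"love", "great"})


def classify(text):
    """Chain the simple decisions into one more complex overall decision."""
    if not is_relevant(text):
        return "irrelevant"
    if not is_attitudinal(text):
        return "relevant, no attitude"
    return "positive" if is_positive(text) else "negative"


for tweet in ["Great result for the EU today",
              "EU summit starts at nine",
              "Great goal last night"]:
    print(tweet, "->", classify(tweet))
```

Keeping each stage separate means any one of them can be rebuilt for a new event without retraining the others.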
For interpretation
• Accepting uncertainty: Many of the technologies that can now be used for Twitter produce probabilistic rather than definite outcomes. Uncertainty is therefore an inherent property of the new research methods in this area, and of the insights they produce. Using them therefore requires greater comfort with confidence scores and systematically attached caveats.
• From metrics to meaning: Of all aspects of attitudinal research on Twitter, the generation of meaningful insight that can be acted on requires the most development, and can add the most value. Attitudinal measurements must be contextualised within broader bodies of work in order to draw out causalities and more general insights.
For use: the creation of digital observatories
Organisations, especially representative institutions, now have the opportunity to listen cheaply to attitudes expressed on Twitter that matter to them. They should consider establishing digital observatories that are able to identify, collect and listen to digital voices, and establish ways for them to be appropriately reflected in how the organisation behaves, the decisions it makes and the priorities it has. Digital observatories, constantly producing real-time information on how people are receiving and talking about events that are happening, could be transformative in how organisations relate to wider societies.
There must be clear understanding of how they can be used. In the face of the challenges that have just been outlined, the validation of attitudinal research on Twitter is especially important in two senses. Digital observation must:
• validate social media research by the source itself, such as through a common reporting framework that rates the ‘confidence’ in any freestanding piece of research and points out potential vulnerabilities
• address biases in the acquisition and analysis of the information and caveat outcomes accordingly
Social media outputs must be cross-referenced and compared with more methodologically mature forms of offline research, such as ‘gold standard’ administered and curated data sets (such as Census data, and other sets held by the Office for National Statistics), [81] and the increasing body of ‘open data’ that now exists on a number of different issues, from crime and health to public attitudes, finances and transport, or bespoke research conducted in parallel to research projects. [82] The comparisons – whether as overlays, correlations, or simply reporting that can be read side by side – can be used to contextualise the safety of findings from social media research.
Digital observations must be weighed against other forms of insight. All attitudinal research methods have strengths and weaknesses – some are better at reaching the groups that are needed, some produce more accurate or detailed results, some are quicker and some are cheaper. It is important to recognise the strengths and weaknesses of attitudinal research on Twitter, relative to the other methods of conducting this sort of research that exist, to be clear about where it fits into the methodological armoury of attitudinal researchers.
The chapters above have described and showcased a new kind of event-specific research method to understand attitudes on Twitter: digital observation.
It is essential to know whether, how far, and in what ways this method of analysis can actually tell us something about people’s attitudes – their values, concerns, dispositions, fears and convictions. Finally, what is its future?
Our study found that these data are extremely valuable. We found millions of digital voices talking about EU-related themes, in real time. Many tweets expressed political attitudes about pressing events as they were happening. These tweets were surrounded by a cloud of metadata – everything from when the tweet was made, to how many followers the tweeter has, and sometimes where they are. Some of these metadata were leveraged in this project to aid analysis – but much more could be done (and is being done elsewhere). Overall, Twitter is a new venue for politics, and there exists an extremely valuable opportunity to understand it.
We found that such data sets are ‘social big data’. They are often far larger than comparative data sets gathered through conventional polling, interviewing and surveying techniques. Social media data are also noisy, messy and chaotic. Twitter is prone to viral surges in topic, kinds of language used, theme and meme. Twitter data sets are also subject to ‘powerlaws’: the most prolific tweeters tend to be much more prolific than others, those with the most followers tend to have many more followers than anyone else, the most shared links tend to be much more shared than any other. Taken together, any given data set will be profoundly influenced by a number of factors that are very difficult to anticipate beforehand.
Conventional polls, surveys and interviews are not designed to handle the speed and scale at which data are created on Twitter. We found that in order to understand Twitter data, we needed to deploy new technologies that are unfamiliar to sociologists and sociological methods.
Our solution – digital observation – attempted to reconcile and integrate new technologies with conventional techniques, and the long-standing values of social science, but as with any new method of analysis there is a pervasive concern for its quality and credibility.
Generalisability
A key challenge to digital observation is generalisability. When a smaller, representative group is studied, it allows us to extend the findings onto the wider group from which it is drawn. Digital observation does not study representative groups for various reasons:
The data gathered from Twitter may not represent Twitter
Strategies to gather data from Twitter, including our own, often return large bodies of data that are non-representative expressions of systemic non-random bias. [72] As we described above, we used APIs to deliver tweets that match a series of search terms. The search terms that we used attempted (imperfectly) to gather as many tweets about a given topic as possible, and as few tweets about any other topic as possible. This is difficult to achieve: language use on Twitter is constantly changing, and subject to viral, short-term changes in the way that language is mobilised to describe any particular topic. Trending topics, #tags and memes change the landscape of language in ways that cannot be anticipated, but can crucially undermine the ability of any body of search terms to return a reasonably comprehensive and precise sample. It is therefore probable that tweets about the relevant issue were missed and these tweets, through virtue of using different words and expressions, may be systematically different in attitudes to the ones we did collect.
Tweets may not represent Twitter users
In general, tweets are produced by a small number of high-volume tweeters. Some research suggests that a small number, around 5 per cent, of ‘power-users’ on Twitter are responsible for 75 per cent of Twitter activity. [73] These include a small number of dedicated commentators or campaigners on a related issue.
Twitter users may not represent actual people
We found a number of prolific accounts in the data sets that we gathered that not only accounted for a large number of tweets, but were also not EU citizens – our target demographic. These included:
• ‘Twitterbots’ or ‘fake’ accounts programmed to produce automated posts on Twitter
• Official accounts, especially from the EU itself, including the accounts of EU politicians, communications and external affairs agencies and EU offices. [74]
Twitter users may not be representative of EU citizens Take-up and use of Twitter has not been consistent across EU member states or within them:
• Geographically: Around 16 per cent of Europeans use Twitter, and a higher proportion of the population use Twitter in Britain than in France or Germany. Most tweets cannot be accurately located to a particular area – and this study differentiated only on the basis of the language, not specific location, of the tweet.
• Demographically: The background of people who use Twitter continues to change, and is linked to the complex phenomenon of how people adopt technology and new habits of using technology. The demographic of the EU’s Twitter users is unlikely to reflect the overall demographic of the EU. The most detailed demographic studies of Twitter use, from the USA, have identified that Twitter users there tend to be young, affluent, educated and non-white. [75]
Digital observation
Truly getting hold of attitudes is a fraught process. Attitudes are complex constructs, labels for those myriad ‘inclinations and feelings, prejudice and bias, preconceived notions, ideas, fears, threats, and convictions’, which we can only infer from what people say. [76] Does digital observation really uncover attitudes? Can it reliably measure what people say, and does what people say relate to the attitudes that they have? [77]
We have drawn the following conclusions:
Attitudes on Twitter are mixed with a lot of ‘noise’
A significant proportion of our data did not appear to include any discernible attitude at all: the general broadcasting of information, in tweets and through the sharing of links. [78] Practically, therefore, the mixture of attitudinal and nonattitudinal data drawn from Twitter are not always readily distinguishable. Why precisely people decide to share certain stories is not well understood – and has, to our knowledge, not been studied in detail.
The use of natural language processing is necessary
Faced with far too much data of differing quality and relevance to read and sort manually, the use of new, automated technologies was necessary. The ability of digital observation to measure accurately what millions of people are saying depends on the success or failure of a vital new technology – NLP. Assessing whether and when it can work is vital to understanding when digital observation can add insight, and when it cannot.
To be successful, natural language processing must be used on events, not generically
We showed in chapter 2 that the success of NLP technology overwhelmingly depends on the context in which it is used. Natural language processing tends to succeed when built bespoke to understand a specific event, at a specific time. It tends to fail when it is used in attempts to understand nonspecific data over a long period of time.
When used correctly, natural language processing is highly accurate
Where NLP was used appropriately, it was very accurate. As it continues to improve, it is clear that NLP has great potential as part of a reliable and valid way of researching a large number of conversations.
Digital observation will always misinterpret some data
The meaning of language – its intent, motivation, social signification, denotation and connotation – is wrapped up in the context where it was used. When tweets are aggregated as large data sets, they lose this context. Because of this, neither the manual nor automated analysis of tweets will ever be perfect. Automated analysis especially will struggle with non-literal language uses, such as sarcasm, pastiche, slang and spoofs.
Even if we can accurately measure tweets, what do they mean? We make the following observations:
Attitudinal indicators on Twitter may not represent underlying attitudes
There is no straightforward or easy relationship between even attitudinal expressions on Twitter, and the underlying inclinations of the tweeter. Twitter is a new medium: digital social platforms, including Twitter, are new social spaces, and are allowing the explosion and growth of any number of digital cultures and sub-cultures with distinct norms, ways of transacting and speaking. This exerts ‘medium effects’ on the message – social and cognitive influences on what is said. ‘Online disinhibition effect’ is one such influence – where statements made in online spaces, often because of the immediacy and anonymity of the platform, are more critical and rude, and less subject to offline social norms and etiquettes than statements made offline.
It is unclear how Twitter fits into people’s lives
To understand how attitudes on Twitter relate to people, it is important to understand how Twitter fits into people’s broader lives, how they experience it, and when they use it. Social media, including Twitter, as a widespread habit as well as a technology, is constantly evolving. Our event-specific research was an attempt to fit attitudes on Twitter into how Twitter fits into people’s lives. By providing context to situate attitudinal data from Twitter into a narrative of events, it also could then touch on causes, consequences and explanations of attitudes – the ‘why’ as well as the ‘what’.
Current methods struggle to move from ‘what?’ to ‘why?’
The generation of raw, descriptive enumeration of attitudes is not enough. Beyond this, researchers must engage with and contribute towards more general explanatory theories – abstract propositions and inferences about the social world in general, causes and explanations, even predictions – ‘why?’ and ‘where next?’, as well as ‘what?’. Sociologists understanding meaning in this way often draw on different theories – from positivism to interpretivism and constructionism – each with their own ideas on how to expose the representational, symbolic or performative significance implied or contained in what is said.
Conclusion: a new type of attitudinal research
Digital observation cannot be considered in the same light as a representative poll. Our digital observation of the EU did not attempt to intervene within the EU – by convening a panel, mailing out interviews – to attempt to understand what the whole of the EU thinks. Rather, it lets a researcher observe a new, evolving digital forum of political expression, the conversations of the EU’s energised, arguing digital-citizens as they otherwise and anyway talk about events.
This new technique to conduct attitudinal research has considerable strengths and weaknesses compared with conventional approaches to research. It is able to leverage more data about people than ever before, with hardly any delay and at very little cost. On the other hand, it uses new, unfamiliar technologies to measure new digital worlds, all of which are not well understood, producing event-specific, ungeneralisable insights that are very different from what has until now been produced by attitudinal research in the social sciences.
We believe digital observation is a viable new way of beginning to realise the considerable research potential that Twitter has. It will continue to improve as the technology gets better, and our understanding of how to use and our sense of how digital observation fits in with other ways of researching attitudes become more sophisticated.
Overall
An interaction of qualitative and quantitative methods
Automated techniques are only able to classify social media data into one of a small number of preset categories at a certain (limited) level of accuracy for each message. Manual analysis is therefore almost always a useful and important component; in this report it is used to look more closely at a small number of randomly selected pieces of data drawn from a number of these categories. In scenarios when a deeper and subtler view of the social media data is required, the random selection of social media information can be drawn from a data pool, and sorted manually by an analyst into different categories of meaning.
Subject matter experts at every step
It is vital that attempts to collect and analyse ‘big data’ attitudes are guided by an understanding of what is to be studied: how people express themselves, the languages that are used, the social and political contexts that attitudes are expressed in, and the issues that they are expressed about. Analysts who understand the issues and controversies that surround the EU are therefore vital: to contextualise and explain the attitudes that are found on Twitter, and to help build the methods used to find and collect these attitudes.
For acquiring data
New roving, changeable sampling techniques
The collection of systemically unbiased data from Twitter is far from easy. The search terms that are used are vulnerable to the fact that Twitter is chaotically subject to viral, short-term surging variations in the way that language is mobilised to describe any particular topic. During this study, a new data acquisition technique was piloted that attempted to reflect the changing and unstable way people discuss subjects on Twitter: the ‘information gain cascade’. It is a method intended to ‘discover’ words and phrases that coincide with, and therefore indicate, topics of interest. To do this, a sample of tweets on a topic is collected using high recall ‘originator terms’. A relevancy classifier is built for this stream in the usual way and applied to a large sample of tweets.
The terms (either words or phrases) that this classifier uses as the basis for classification are ranked based on their information gain: a measure of the extent to which the term aligns with the relevant or irrelevant classes. Terms that are randomly distributed between the relevant and irrelevant classes have low information gain, and terms that are much more likely to be in one class than another have high information gain. The terms that have high information gain in the relevant class are designated ‘candidate search terms’. Each candidate search term is then independently streamed to create its own tweet sample, analysed on its own merits and then, on the decision of an analyst, either graduated to become a full search term or discarded. This process iteratively ‘cascades’ to continuously construct a growing cloud of terms discovered to be coincident with the originator terms.
This approach allows the search queries used to arise from a statistical appreciation of the data themselves, rather than the preconceptions of the analyst. This method is designed to produce samples containing a large proportion of all conversations that might be of interest – high recall.
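The ranking step of the cascade can be illustrated with a minimal sketch. It assumes the standard entropy-based definition of information gain, since the report does not give a formula, and it assumes each tweet has already been reduced to a set of terms with a relevant/irrelevant label from the relevancy classifier. The function names (`entropy`, `information_gain`, `candidate_search_terms`) and the data layout are hypothetical, chosen for illustration only; the streaming of each candidate and the analyst's decision to graduate or discard it are not modelled.

```python
import math


def entropy(pos, neg):
    """Binary entropy (in bits) of a relevant/irrelevant split."""
    total = pos + neg
    h = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h


def _split_entropy(subset):
    """Entropy of the relevance labels in a list of (terms, relevant) pairs."""
    pos = sum(1 for _, relevant in subset if relevant)
    return entropy(pos, len(subset) - pos)


def information_gain(term, labelled):
    """Reduction in label entropy from knowing whether `term` is present.

    `labelled` is a list of (set_of_terms, is_relevant) pairs; this layout
    is an assumption made for the sketch.
    """
    with_term = [row for row in labelled if term in row[0]]
    without = [row for row in labelled if term not in row[0]]
    n = len(labelled)
    conditional = (
        (len(with_term) / n) * _split_entropy(with_term)
        + (len(without) / n) * _split_entropy(without)
    )
    return _split_entropy(labelled) - conditional


def candidate_search_terms(labelled, known_terms, top_k=5):
    """Rank unseen terms that lean towards the relevant class by information gain."""
    base_rate = sum(1 for _, relevant in labelled if relevant) / len(labelled)
    vocabulary = set().union(*(terms for terms, _ in labelled)) - set(known_terms)
    scored = []
    for term in vocabulary:
        hits = [relevant for terms, relevant in labelled if term in terms]
        # Keep only terms that are more often relevant than the sample as a whole.
        if sum(hits) / len(hits) > base_rate:
            scored.append((information_gain(term, labelled), term))
    # Highest gain first; ties broken alphabetically for a stable ordering.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:top_k]]
```

In this sketch an originator term that appears in every tweet scores zero, because its presence says nothing about relevance, while a term that appears only in relevant tweets scores highest. Each returned term would then be streamed and reviewed by an analyst before the next round of the cascade.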
Automatic identification of twitcidents
An important but separate area of study is to detect the emergence of twitcidents automatically through statistically finding the ripples that they cast into the tweet stream. [79] This technology can be used to identify twitcidents as they occur, allowing for the research to be real time, and used reactively.
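One simple way to picture such detection is to flag any time interval whose tweet count sits far above a recent baseline. The sketch below uses a rolling z-score; this is an assumed stand-in for illustration, not the technique cited in the report, and the function name `detect_surges`, the window size and the threshold are all hypothetical.

```python
from statistics import mean, stdev


def detect_surges(counts, window=6, threshold=3.0):
    """Return the indices of intervals whose tweet count is a statistical outlier.

    `counts` is a list of tweets per interval (for example, per hour).
    Each interval is compared with the `window` intervals before it.
    """
    surges = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Assumption: treat a perfectly flat baseline as having unit spread
            # so that the z-score stays defined.
            sigma = 1.0
        z = (counts[i] - mu) / sigma
        if z >= threshold:
            surges.append(i)
    return surges
```

A limitation of this simple design is that once a surge enters the baseline window it inflates both the mean and the spread, so the later hours of the same surge may not be flagged. A production system would need a more robust baseline.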
For analysis
Natural language processing classifiers should:
• be bespoke and event-driven rather than generic
• work with each other: classifiers, each making a relatively simple decision, can be collected into larger architectures of classifiers that can conduct more sophisticated analyses and make more complex overall decisions
• reflect the data: when categories to sort and organise data are applied a priori, there is a danger that they reflect the preconceptions of the analyst rather than the evidence. It is important that classifiers should be constructed to organise data along lines that reflect the data rather than the researcher’s expectations; this is consistent with a well-known sociological method called grounded theory [80]
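The idea of collecting simple classifiers into a larger architecture can be sketched as a cascade in which each stage makes one narrow decision and passes a tweet on only if it is worth analysing further. The keyword rules below are toy stand-ins for trained, bespoke classifiers, and the stage names, labels and the `run_cascade` helper are hypothetical.

```python
def relevance(text):
    """Toy stand-in: is the tweet about the topic at all?"""
    return "relevant" if "eu" in text.lower().split() else "irrelevant"


def attitude(text):
    """Toy stand-in: does the tweet express an attitude?"""
    words = set(text.lower().split())
    return "attitudinal" if words & {"should", "must", "love", "hate"} else "non-attitudinal"


def polarity(text):
    """Toy stand-in: is the attitude negative?"""
    words = set(text.lower().split())
    return "negative" if words & {"hate", "leave"} else "other"


# Each stage: (name, classifier, label required to continue to the next stage).
STAGES = [
    ("relevance", relevance, "relevant"),
    ("attitude", attitude, "attitudinal"),
    ("polarity", polarity, None),  # final stage: nothing follows it
]


def run_cascade(text, stages=STAGES):
    """Apply each classifier in turn, stopping when a stage filters the tweet out."""
    decisions = {}
    for name, classifier, continue_on in stages:
        label = classifier(text)
        decisions[name] = label
        if label != continue_on:
            break
    return decisions
```

For example, `run_cascade("We should leave the EU")` records a decision at all three stages, whereas a tweet judged irrelevant at the first stage is never passed to the later classifiers.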
For interpretation
• Accepting uncertainty: Many of the technologies that can now be used for Twitter produce probabilistic rather than definite outcomes. Uncertainty is therefore an inherent property of the new research methods in this area, and of the insights they produce. Using them therefore requires greater comfort with confidence scores and systematically attached caveats.
• From metrics to meaning: Of all aspects of attitudinal research on Twitter, the generation of meaningful insight that can be acted on requires the most development, and can add the most value. Attitudinal measurements must be contextualised within broader bodies of work in order to draw out causalities and more general insights.
For use: the creation of digital observatories
Organisations, especially representative institutions, now have the opportunity to listen cheaply to attitudes expressed on Twitter that matter to them. They should consider establishing digital observatories that are able to identify, collect and listen to digital voices, and establish ways for them to be appropriately reflected in how the organisation behaves, the decisions it makes and the priorities it has. Digital observatories, constantly producing real-time information on how people are receiving and talking about events that are happening, could be transformative in how organisations relate to wider societies.
There must be a clear understanding of how they can be used. In the face of the challenges that have just been outlined, the validation of attitudinal research on Twitter is especially important in two senses. Digital observation must:
• validate social media research by the source itself, such as through a common reporting framework that rates the ‘confidence’ in any freestanding piece of research and points out potential vulnerabilities
• address biases in the acquisition and analysis of the information and caveat outcomes accordingly
Social media outputs must be cross-referenced and compared with more methodologically mature forms of offline research, such as ‘gold standard’ administered and curated data sets (such as Census data, and other sets held by the Office for National Statistics), [81] and the increasing body of ‘open data’ that now exists on a number of different issues, from crime and health to public attitudes, finances and transport, or bespoke research conducted in parallel to research projects. [82] The comparisons – whether as overlays, correlations, or simply reporting that can be read side by side – can be used to contextualise the safety of findings from social media research.
Digital observations must be weighed against other forms of insight. All attitudinal research methods have strengths and weaknesses – some are better at reaching the groups that are needed, some produce more accurate or detailed results, some are quicker and some are cheaper. It is important to recognise the strengths and weaknesses of attitudinal research on Twitter, relative to the other methods of conducting this sort of research that exist, to be clear about where it fits into the methodological armoury of attitudinal researchers.