Altruistic World Online Library

Posted: **Sun Jun 28, 2015 11:01 pm**

5 Digital observation

The chapters above have described and showcased a new kind of event-specific research method to understand attitudes on Twitter: digital observation.

It is essential to know whether, how far, and in what ways this method of analysis can actually tell us something about people’s attitudes – their values, concerns, dispositions, fears and convictions. Finally, what is its future?

Our study found that these data are extremely valuable. We found millions of digital voices talking about EU-related themes, in real time. Many tweets expressed political attitudes about pressing events as they were happening. These tweets were surrounded by a cloud of metadata – everything from when the tweet was made, to how many followers the tweeter has, and sometimes where they are. Some of these metadata were leveraged in this project to aid analysis – but much more could be done (and is being done elsewhere). Overall, Twitter is a new venue for politics, and there exists an extremely valuable opportunity to understand it.

We found that such data sets are ‘social big data’. They are often far larger than comparative data sets gathered through conventional polling, interviewing and surveying techniques. Social media data are also noisy, messy and chaotic. Twitter is prone to viral surges in topic, kinds of language used, theme and meme. Twitter data sets are also subject to ‘power laws’: the most prolific tweeters tend to be much more prolific than others, those with the most followers tend to have many more followers than anyone else, the most shared links tend to be much more shared than any other. Taken together, any given data set will be profoundly influenced by a number of factors that are very difficult to anticipate beforehand.

Conventional polls, surveys and interviews are not designed to handle the speed and scale at which data are created on Twitter. We found that in order to understand Twitter data, we needed to deploy new technologies that are unfamiliar to sociologists and sociological methods.

Our solution – digital observation – attempted to reconcile and integrate new technologies with conventional techniques, and the long-standing values of social science, but as with any new method of analysis there is a pervasive concern for its quality and credibility.

Generalisability

A key challenge to digital observation is generalisability. When a smaller, representative group is studied, it allows us to extend the findings onto the wider group from which it is drawn. Digital observation does not study representative groups for various reasons:

The data gathered from Twitter may not represent Twitter

Strategies to gather data from Twitter, including our own, often return large bodies of data that are non-representative and carry systemic, non-random bias. [72] As we described above, we used APIs to deliver tweets that match a series of search terms. The search terms that we used attempted (imperfectly) to gather as many tweets about a given topic as possible, and as few tweets about any other topic as possible. This is difficult to achieve: language use on Twitter is constantly changing, and subject to viral, short-term changes in the way that language is mobilised to describe any particular topic. Trending topics, #tags and memes change the landscape of language in ways that cannot be anticipated, but can crucially undermine the ability of any body of search terms to return a reasonably comprehensive and precise sample. It is therefore probable that tweets about the relevant issue were missed and these tweets, by virtue of using different words and expressions, may be systematically different in attitudes to the ones we did collect.

Tweets may not represent Twitter users

In general, tweets are produced by a small number of high-volume tweeters. Some research suggests that a small number, around 5 per cent, of ‘power-users’ on Twitter are responsible for 75 per cent of Twitter activity. [73] These include a small number of dedicated commentators or campaigners on a related issue.

Twitter users may not represent actual people

We found a number of prolific accounts in the data sets that we gathered that not only accounted for a large number of tweets, but were also not EU citizens – our target demographic. These included:

• ‘Twitterbots’ or ‘fake’ accounts programmed to produce automated posts on Twitter
• Official accounts, especially from the EU itself, including the accounts of EU politicians, communications and external affairs agencies and EU offices. [74]

Twitter users may not be representative of EU citizens

Take-up and use of Twitter has not been consistent across EU member states or within them:

• Geographically: Around 16 per cent of Europeans use Twitter, and a higher proportion of the population use Twitter in Britain than in France or Germany. Most tweets cannot be accurately located to a particular area – and this study differentiated only on the basis of the language, not specific location, of the tweet.
• Demographically: The background of people who use Twitter continues to change, and is linked to the complex phenomenon of how people adopt technology and new habits of using technology. The demographic of the EU’s Twitter users is unlikely to reflect the overall demographic of the EU. The most detailed demographic studies of Twitter use, from the USA, have identified that Twitter users there tend to be young, affluent, educated and non-white. [75]

Digital observation

Truly getting hold of attitudes is a fraught process. Attitudes are complex constructs, labels for those myriad ‘inclinations and feelings, prejudice and bias, preconceived notions, ideas, fears, threats, and convictions’, which we can only infer from what people say. [76] Does digital observation really uncover attitudes? Can it reliably measure what people say, and does what people say relate to the attitudes that they have? [77]

We have drawn the following conclusions:

Attitudes on Twitter are mixed with a lot of ‘noise’

A significant proportion of our data did not appear to include any discernible attitude at all: the general broadcasting of information, in tweets and through the sharing of links. [78] Practically, therefore, attitudinal and non-attitudinal data drawn from Twitter are not always readily distinguishable. Why precisely people decide to share certain stories is not well understood – and has, to our knowledge, not been studied in detail.

The use of natural language processing is necessary

Faced with far too much data of differing quality and relevance to read and sort manually, the use of new, automated technologies was necessary. The ability of digital observation to measure accurately what millions of people are saying depends on the success or failure of a vital new technology – NLP. Assessing whether and when it can work is vital to understanding when digital observation can add insight, and when it cannot.

To be successful, natural language processing must be used on events, not generically

We showed in chapter 2 that the success of NLP technology overwhelmingly depends on the context in which it is used. Natural language processing tends to succeed when built bespoke to understand a specific event, at a specific time. It tends to fail when it is used in attempts to understand nonspecific data over a long period of time.

When used correctly, natural language processing is highly accurate

Where NLP was used appropriately, it was very accurate. As it continues to improve, it is clear that NLP has great potential as part of a reliable and valid way of researching a large number of conversations.

Digital observation will always misinterpret some data

The meaning of language – its intent, motivation, social signification, denotation and connotation – is wrapped up in the context where it was used. When tweets are aggregated as large data sets, they lose this context. Because of this, neither the manual nor automated analysis of tweets will ever be perfect. Automated analysis especially will struggle with non-literal language uses, such as sarcasm, pastiche, slang and spoofs.

Even if we can accurately measure tweets, what do they mean? We make the following observations:

Attitudinal indicators on Twitter may not represent underlying attitudes

There is no straightforward or easy relationship between even attitudinal expressions on Twitter, and the underlying inclinations of the tweeter. Twitter is a new medium: digital social platforms, including Twitter, are new social spaces, and are allowing the explosion and growth of any number of digital cultures and sub-cultures with distinct norms, ways of transacting and speaking. This exerts ‘medium effects’ on the message – social and cognitive influences on what is said. ‘Online disinhibition effect’ is one such influence – where statements made in online spaces, often because of the immediacy and anonymity of the platform, are more critical and rude, and less subject to offline social norms and etiquettes than statements made offline.

It is unclear how Twitter fits into people’s lives

To understand how attitudes on Twitter relate to people, it is important to understand how Twitter fits into people’s broader lives, how they experience it, and when they use it. Social media, including Twitter, as a widespread habit as well as a technology, is constantly evolving. Our event-specific research was an attempt to fit attitudes on Twitter into how Twitter fits into people’s lives. By providing context to situate attitudinal data from Twitter into a narrative of events, it also could then touch on causes, consequences and explanations of attitudes – the ‘why’ as well as the ‘what’.

Current methods struggle to move from ‘what?’ to ‘why?’

The generation of raw, descriptive enumeration of attitudes is not enough. Beyond this, researchers must engage with and contribute towards more general explanatory theories – abstract propositions and inferences about the social world in general, causes and explanations, even predictions – ‘why?’ and ‘where next?’, as well as ‘what?’. Sociologists understanding meaning in this way often draw on different theories – from positivism to interpretivism and constructionism – each with their own ideas on how to expose the representational, symbolic or performative significance implied or contained in what is said.

Conclusion: a new type of attitudinal research

Digital observation cannot be considered in the same light as a representative poll. Our digital observation of the EU did not attempt to intervene within the EU – by convening a panel, mailing out interviews – to attempt to understand what the whole of the EU thinks. Rather, it lets a researcher observe a new, evolving digital forum of political expression, the conversations of the EU’s energised, arguing digital-citizens as they otherwise and anyway talk about events.

This new technique to conduct attitudinal research has considerable strengths and weaknesses compared with conventional approaches to research. It is able to leverage more data about people than ever before, with hardly any delay and at very little cost. On the other hand, it uses new, unfamiliar technologies to measure new digital worlds, all of which are not well understood, producing event-specific, ungeneralisable insights that are very different from what has until now been produced by attitudinal research in the social sciences.

We believe digital observation is a viable new way of beginning to realise the considerable research potential that Twitter has. It will continue to improve as the technology gets better, and as our understanding of how to use it and our sense of how digital observation fits in with other ways of researching attitudes become more sophisticated.

Overall

An interaction of qualitative and quantitative methods

Automated techniques are only able to classify social media data into one of a small number of preset categories at a certain (limited) level of accuracy for each message. Manual analysis is therefore almost always a useful and important component; in this report it is used to look more closely at a small number of randomly selected pieces of data drawn from a number of these categories. In scenarios when a deeper and subtler view of the social media data is required, the random selection of social media information can be drawn from a data pool, and sorted manually by an analyst into different categories of meaning.

Subject matter experts at every step

It is vital that attempts to collect and analyse ‘big data’ attitudes are guided by an understanding of what is to be studied: how people express themselves, the languages that are used, the social and political contexts that attitudes are expressed in, and the issues that they are expressed about. Analysts who understand the issues and controversies that surround the EU are therefore vital: to contextualise and explain the attitudes that are found on Twitter, and to help build the methods used to find and collect these attitudes.

For acquiring data

New roving, changeable sampling techniques

The collection of systemically unbiased data from Twitter is far from easy. The search terms that are used are vulnerable to the fact that Twitter is chaotically subject to viral, short-term surging variations in the way that language is mobilised to describe any particular topic. During this study, a new data acquisition technique was piloted that attempted to reflect the changing and unstable way people discuss subjects on Twitter. The ‘information gain cascade’ was developed. It is a method intended to ‘discover’ words and phrases that coincide with, and therefore indicate, topics of interest. To do this, a sample of tweets on a topic is collected using high recall ‘originator terms’. A relevancy classifier is built for this stream in the usual way and applied to a large sample of tweets.

The terms (either words or phrases) that this classifier uses as the basis for classification are ranked based on their information gain: a measure of the extent to which the term aligns with the relevant or irrelevant classes. Terms that are randomly distributed between the relevant and irrelevant classes have low information gain, and terms that are much more likely to be in one class than another have high information gain. The terms that have high information gain in the relevant class are designated ‘candidate search terms’. Each candidate search term is then independently streamed, to create its own tweet sample, analysed on its own merits and then, on the decision of an analyst, either graduated to become a full search term, or discarded. This process iteratively ‘cascades’ to continuously construct a growing cloud of terms discovered to be coincident with the originator terms.
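The ranking step can be illustrated with a minimal sketch. It assumes whitespace tokenisation, single-word terms and a two-class labelling; the function names, the `min_gain` threshold and the six example tweets are invented for illustration and are not taken from the study's software.

```python
import math
from collections import Counter


def tokens(text):
    """Lower-cased whitespace tokens (a simplifying assumption)."""
    return set(text.lower().split())


def entropy(counts):
    """Shannon entropy, in bits, of a collection of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(tweets, labels, term):
    """Fall in class entropy once we know whether `term` occurs in a tweet."""
    with_term, without_term = Counter(), Counter()
    for text, label in zip(tweets, labels):
        (with_term if term in tokens(text) else without_term)[label] += 1
    n = len(tweets)
    prior = entropy([with_term[c] + without_term[c] for c in set(labels)])
    conditional = (
        sum(with_term.values()) / n * entropy(list(with_term.values()))
        + sum(without_term.values()) / n * entropy(list(without_term.values()))
    )
    return prior - conditional


def candidate_terms(tweets, labels, target="relevant", min_gain=0.5):
    """Terms with high information gain that lean towards the target class."""
    n_target = sum(1 for l in labels if l == target)
    n_other = len(labels) - n_target
    if n_target == 0 or n_other == 0:
        return []
    ranked = []
    for term in {w for text in tweets for w in tokens(text)}:
        in_target = sum(1 for t, l in zip(tweets, labels)
                        if l == target and term in tokens(t))
        in_other = sum(1 for t, l in zip(tweets, labels)
                       if l != target and term in tokens(t))
        gain = information_gain(tweets, labels, term)
        if in_target / n_target > in_other / n_other and gain >= min_gain:
            ranked.append((term, gain))
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# Invented, hand-labelled example data.
tweets = [
    "barroso speech on austerity today",
    "austerity protests as barroso visits",
    "barroso defends austerity budget",
    "euro final tonight great match",
    "watching the euro match tonight",
    "cheap flights paid in euro",
]
labels = ["relevant"] * 3 + ["irrelevant"] * 3

print(candidate_terms(tweets, labels))
```

In this toy sample ‘austerity’ and ‘barroso’ separate the classes perfectly and are proposed as candidates, while ‘euro’ is just as informative but points to the irrelevant class, so it is not proposed. In the cascade described above, each candidate would then be streamed and reviewed by an analyst before being adopted.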

This approach allows the search queries used to arise from a statistical appreciation of the data themselves, rather than the preconceptions of the analyst. This method is designed to produce samples containing a large proportion of all conversations that might be of interest – high recall.

Automatic identification of twitcidents

An important but separate area of study is to detect the emergence of twitcidents automatically through statistically finding the ripples that they cast into the tweet stream. [79] This technology can be used to identify twitcidents as they occur, allowing for the research to be real time, and used reactively.
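As a rough illustration of what ‘statistically finding the ripples’ could mean, the sketch below flags an hour as a surge when its tweet count sits far above the recent average. This is a generic trailing-window z-score test, not the technique referenced in [79]; the window length, the threshold and the hourly counts are assumptions chosen for the example.

```python
from statistics import mean, pstdev


def detect_bursts(counts, window=6, threshold=3.0):
    """Indices whose count exceeds the trailing mean by `threshold` deviations."""
    bursts = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mu = mean(history)
        # Floor the deviation so a perfectly flat history cannot divide by zero.
        sigma = max(pstdev(history), 1.0)
        if (counts[i] - mu) / sigma > threshold:
            bursts.append(i)
    return bursts


# Invented hourly tweet volumes for one stream, with a surge at index 8.
hourly_counts = [100, 104, 98, 101, 97, 103, 99, 102, 640, 580, 120, 101]

print(detect_bursts(hourly_counts))
```

Only the first hour of the surge is flagged here: once the surge enters the trailing window it inflates the baseline, so the following hour no longer looks unusual. A production detector would need to handle this, for example by excluding flagged hours from the baseline.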

For analysis

Natural language processing classifiers should:

• be bespoke and event-driven rather than generic
• work with each other: classifiers, each making a relatively simple decision, can be collected into larger architectures of classifiers that can conduct more sophisticated analyses and make more complex overall decisions
• reflect the data: when categories to sort and organise data are applied a priori, there is a danger that they reflect the preconceptions of the analyst rather than the evidence. It is important that classifiers should be constructed to organise data along lines that reflect the data rather than the researcher’s expectations; this is consistent with a well-known sociological method called grounded theory [80]

For interpretation

• Accepting uncertainty: Many of the technologies that can now be used for Twitter produce probabilistic rather than definite outcomes. Uncertainty is therefore an inherent property of the new research methods in this area, and the insights they produce. Using them therefore requires greater comfort with confidence scores and systematically attached caveats.
• From metrics to meaning: Of all aspects of attitudinal research on Twitter, the generation of meaningful insight that can be acted on requires the most development, and can add the most value. Attitudinal measurements must be contextualised within broader bodies of work in order to draw out causalities and more general insights.

For use: the creation of digital observatories

Organisations, especially representative institutions, now have the opportunity to listen cheaply to attitudes expressed on Twitter that matter to them. They should consider establishing digital observatories that are able to identify, collect and listen to digital voices, and establish ways for them to be appropriately reflected in how the organisation behaves, the decisions it makes and the priorities it has. Digital observatories, constantly producing real-time information on how people are receiving and talking about events that are happening, could be transformative in how organisations relate to wider societies.

There must be clear understanding of how they can be used. In the face of the challenges that have just been outlined, the validation of attitudinal research on Twitter is especially important in two senses. Digital observation must:

• validate social media research by the source itself, such as through a common reporting framework that rates the ‘confidence’ in any freestanding piece of research and points out potential vulnerabilities
• address biases in the acquisition and analysis of the information and caveat outcomes accordingly

Social media outputs must be cross-referenced and compared with more methodologically mature forms of offline research, such as ‘gold standard’ administered and curated data sets (such as Census data, and other sets held by the Office for National Statistics), [81] and the increasing body of ‘open data’ that now exists on a number of different issues, from crime and health to public attitudes, finances and transport, or bespoke research conducted in parallel to research projects. [82] The comparisons – whether as overlays, correlations, or simply reporting that can be read side by side – can be used to contextualise the safety of findings from social media research.

Digital observations must be weighed against other forms of insight. All attitudinal research methods have strengths and weaknesses – some are better at reaching the groups that are needed, some produce more accurate or detailed results, some are quicker and some are cheaper. It is important to recognise the strengths and weaknesses of attitudinal research on Twitter, relative to the other methods of conducting this sort of research that exist, to be clear about where it fits into the methodological armoury of attitudinal researchers.

Posted: **Sun Jun 28, 2015 11:07 pm**

Annex: methodology

The methodology annex sets out a more detailed explanation and description of the methods used in this study, and how they performed.

Data collection

APIs

All data from Twitter were collected from its APIs. Twitter has three different APIs that are available to researchers. The ‘search’ API returns a collection of relevant tweets matching a specified query (word match) from an index that extends up to roughly a week in the past. Its ‘filter’ API continually produces tweets that contain one of a number of keywords to the researcher, in real time as they are made. Its ‘sample’ API returns a random sample of a fixed percentage of all public tweets in real time. Each of these APIs (consistent with the vast majority of all social media platform APIs) is constrained by the amount of data they will return. A public, free ‘spritzer’ account caps the search API at 180 calls every 15 minutes with up to 100 tweets returned per call; the filter API caps the number of matching tweets returned by the filter to no more than 1 per cent of the total stream in any given second, and the sample API returns a random 1 per cent of the tweet stream. Others use white-listed research accounts (known informally as ‘the garden hose’), which have 10 per cent rather than 1 per cent caps on the filter and sample APIs, while still others use the commercially available ‘firehose’ of 100 per cent of daily tweets. With daily tweet volumes averaging roughly 400 million, many researchers do not find the spritzer account restrictions to be limiting to the number of tweets they collect (or need) on any particular topic.
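The caps quoted above can be turned into rough daily ceilings with some simple arithmetic. The sketch below uses only the figures given in this section; the function names are invented, and the streaming figures are daily averages, whereas the real cap is applied second by second.

```python
SEARCH_CALLS_PER_WINDOW = 180      # search API calls allowed per window
TWEETS_PER_CALL = 100              # maximum tweets returned per call
WINDOW_MINUTES = 15                # length of one rate-limit window
DAILY_TWEETS = 400_000_000         # approximate daily tweet volume


def search_api_daily_ceiling():
    """Most tweets the search API could return in a day at the stated caps."""
    windows_per_day = 24 * 60 // WINDOW_MINUTES
    return SEARCH_CALLS_PER_WINDOW * TWEETS_PER_CALL * windows_per_day


def stream_daily_ceiling(cap_percent):
    """Average daily ceiling for a filter or sample stream capped at cap_percent."""
    return DAILY_TWEETS * cap_percent // 100


print(search_api_daily_ceiling())   # 1728000
print(stream_daily_ceiling(1))      # 4000000  ('spritzer')
print(stream_daily_ceiling(10))     # 40000000 ('garden hose')
```

On these figures a free account could in principle receive around 1.7 million tweets a day through search and around 4 million a day through a stream, which helps explain why many researchers do not find the restrictions limiting for a single topic.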

Keywords

To gather data for this report, we accessed the search API that delivers already posted tweets that match a certain keyword, and a filter API that does the same in real time, as tweets are posted. Both of these APIs collect all public instances of a designated keyword being used in either the tweet or the user name of the tweeter. Both these APIs restrict the total number of tweets they will produce as a given total proportion of the total number of tweets that are sent. These ‘rate limits’ were never exceeded during the course of the project.

Acquiring data from Twitter on a particular topic through the use of keywords is a trade-off between precision and comprehensiveness. A very precise data collection strategy generally only returns tweets that are on-topic, but will likely miss some. A more comprehensive data collection strategy collects more of the tweets that are on-topic, but will likely include some which are off-topic. Individual words themselves, reflecting how and when they are used, can be inherently either precise or comprehensive. ‘Euro’ cuts across many different types of issues that are often discussed in high volumes, from the football competition to foreign exchange speculation. Others, like ‘Barroso’, are more often used specifically in the context of discussing José Manuel Barroso.
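The trade-off can be seen on a small hand-labelled sample. The sketch below is illustrative only: it assumes case-insensitive substring matching against the tweet text and user name, which is a simplification of how Twitter's APIs match keywords, and the five tweets and both term lists are invented.

```python
def matches(tweet, terms):
    """True if any term occurs in the tweet text or user name (assumed matching rule)."""
    haystack = (tweet["text"] + " " + tweet["user"]).lower()
    return any(term.lower() in haystack for term in terms)


def evaluate(sample, terms):
    """Count what a term list collects from a hand-labelled sample."""
    collected = [t for t in sample if matches(t, terms)]
    on_topic = sum(1 for t in collected if t["on_topic"])
    total_on_topic = sum(1 for t in sample if t["on_topic"])
    return {
        "collected": len(collected),
        "on_topic": on_topic,
        "off_topic": len(collected) - on_topic,
        "missed": total_on_topic - on_topic,
    }


# Invented sample; on_topic marks tweets about José Manuel Barroso.
sample = [
    {"user": "anna", "text": "Barroso says austerity has reached its limit", "on_topic": True},
    {"user": "ben", "text": "The euro crisis is not over, says the Commission president", "on_topic": True},
    {"user": "carl", "text": "Euro 2016 qualifiers start soon", "on_topic": False},
    {"user": "euro_fx_tips", "text": "Dollar up again this morning", "on_topic": False},
    {"user": "dora", "text": "Commission president meets MEPs in Strasbourg", "on_topic": True},
]

precise = evaluate(sample, ["barroso"])
comprehensive = evaluate(sample, ["barroso", "euro", "commission president"])
print(precise)
print(comprehensive)
```

The single precise term collects nothing off-topic but misses two of the three on-topic tweets; the longer list misses none but also pulls in the football tweet and an account matched only through its user name.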

As noted above, precision and comprehensiveness are inherently conflicting properties of a sample, and a balance must be struck between them. To do this, the search strategy and exact search terms used for each stream were evolved over the early part of the project, before the final phase of data collection began. The search terms for each stream were incrementally crafted by analysts, who monitored how the addition of each term or specific, often topical, annotation of tweets (hashtags) influenced the tweets that were subsequently collected. Both strategies were tried out before final data collection started; in the first week, a high precision search strategy using only a single core term for each stream was used, in the second week a long list of related terms was used to achieve a high recall, and in the third, a balance was struck between both, where enough relevant tweets were collected without flooding the stream with irrelevant ones. From the third week onwards, a final, balanced approach was taken in which only a short list of directly relevant scraper terms and hashtags was used per stream. [83]

Each stream struck this balance differently. Some returned larger and generally less precise bodies of data, others smaller, more precise returns. The finalised search terms and the numbers that each produced are shown in tables 5 to 7. Between 5 March and 6 June 2013, we collected approximately 1.91 million tweets in English across the data streams, 1.04 million in French, and 328,800 in German.

Sampling on Twitter is an important example of the lack of clear methodological best practice in social media research. Current conventional sampling strategies on social media construct ‘hand-crafted’ or ‘incidental’ samples using inclusion criteria that are arbitrarily derived. [84] There are many reasons why a small body of keywords should not be expected to return a sociologically robust, systemically unbiased sample: they are likely to return data sets with ‘systemic bias’, wherein data have been included or excluded in a systematic way; some words or hashtags may be most used by people who hold a particular political position, while other words or hashtags may be used by people who hold another; and unless both sets of words are equally identified and used to acquire a sample, the sample will be biased.

Table 5 shows the data volumes collected for search terms in English on the six themes studied.

Table 5 The exact search terms used in English and total number of tweets per theme

Table 6 shows the data volumes collected for search terms in French on the six stream topics studied.

Table 6 The exact search terms used in French and total number of tweets per theme

Table 7 shows the data volumes collected for search terms in German on the six stream topics studied.

Table 7 The exact search terms used in German and total number of tweets per theme

Data analysis

For our study we used a web-hosted software platform, developed by the project team, called Method51, which uses NLP technology to allow the researcher to construct bespoke classifiers rapidly to sort defined bodies of tweets into categories (defined by the analyst). [85] To create each classifier we went through the following phases using this technology:

Phase 1 — Definition of categories

The formal criteria explaining how tweets should be annotated were developed. This, importantly, continued throughout the early interaction with the data: categories and definitions of meaning were not arrived at a priori, but through relating the direct observation of the contours of the data with the overall research aims. These guidelines were provided to all the annotators working on the task.

Phase 2 — Creation of a gold-standard baseline

On the basis of these formal criteria, analysts manually annotated a set of around 100–200 ‘gold-standard’ tweets using Method51. This phase has two important functions. First, it measures the inter-annotator agreement: the extent to which two human beings agreed with each other on the correct categories for each of the tweets. A low (typically, below 80 per cent) inter-annotator agreement implies that the categories are incorrect: they either are not distinct enough to allow human beings to tell the difference between them dependably, or they do not fit the data, forcing the analyst to make imperfect, awkward and inconsistent categorisations. Second, gold-standard tweets provide a baseline of truth against which the classifier performance was tested.
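A minimal sketch of the agreement check follows. The first function computes raw percentage agreement, which is the measure the 80 per cent threshold above refers to. The second, Cohen's kappa, is not mentioned in this report; it is included only as a common chance-corrected companion measure. The two annotation lists are invented.

```python
def percent_agreement(a, b):
    """Share of items on which two annotators chose the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("annotation lists must be non-empty and equal in length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b):
    """Agreement corrected for the level expected by chance (not used in the report)."""
    n = len(a)
    observed = percent_agreement(a, b)
    expected = sum((a.count(c) / n) * (b.count(c) / n) for c in set(a) | set(b))
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented annotations of the same ten tweets by two analysts.
annotator_1 = ["rel"] * 7 + ["irr"] * 3
annotator_2 = ["rel"] * 6 + ["irr"] * 3 + ["rel"]

agreement = percent_agreement(annotator_1, annotator_2)
print(agreement)                               # 0.8
print(cohen_kappa(annotator_1, annotator_2))
if agreement < 0.8:
    print("Agreement is low: revisit the category definitions")
```

Here the annotators agree on eight of ten tweets, which just meets the 80 per cent threshold, although the chance-corrected figure is noticeably lower because both annotators use the ‘rel’ category most of the time.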

Phase 3 — Training

The analyst manually annotated a set of tweets to train the machine learning classifier, through web access to the digital observation software interface. The number of tweets that were annotated depended on the performance of the classifier, which itself depended on the scenario. For some streams and for some classifiers, the decision the classifier was required to make, and the data it was required to make the decision on, was relatively straightforward. In others, the analytical challenge was more difficult, and required the creation of larger bodies of training data. Between 200 and 2,000 tweets were analysed for each stream.

Phase 4 — Performance review and modification

The performance of the classifier was reviewed, and examples of its outputs were read. Where feasible and necessary, the algorithm was modified to improve its performance.

Architecture of classifiers

The process above was followed, throughout the lifetime of the project, by 15 human annotators to create a specific ‘architecture’, or system of cooperating classifiers, for each stream. Each stream’s architecture was in the form of a cascade: a number of classifiers that were connected first to the tweets that were being automatically collected, and then with each other to create a coherent cascade of data.

Each architecture comprised at least six levels:

Level 1 – Collection of raw data:

All the tweets were collected through Twitter’s filter APIs, which returned every tweet matching the body of search terms for each stream.

Level 2 – Language filter:

Raw data were first passed through a language filter to ensure that each tweet was in the correct language for the stream.

Level 3 – Relevancy filter:

All data in the correct language were passed through a ‘relevancy classifier’, an NLP algorithm trained to decide whether a tweet was relevant to the particular theme under which it was collected. The relevancy classifiers were meant to filter out any tweet that did not refer to the topic. For instance, if it was collected under the ‘Barroso’ theme, was the tweet about José Manuel Barroso, the President of the European Commission? The classifier was trained to categorise all tweets as either relevant or irrelevant. Tweets judged to be irrelevant by the classifier were discarded. [86]

Level 4 – Attitudinal filter:

All tweets judged to be relevant were passed through an ‘attitudinal classifier’, an algorithm trained to categorise whether data were attitudinally relevant expressions by an EU citizen, or not. ‘Attitudinally relevant’ tweets were those that expressed, implied or included a non-neutral comment on the topic of the stream as defined for the relevancy classifier. [87]

We only considered tweets that expressed the attitude of the poster as attitudinal; many of the tweets we found contained attitudinal statements from people other than the tweeter, which were quoted or paraphrased as such, but where it could not be assumed that this implied endorsement. All tweets judged to be the former were collected and stored. All tweets judged to be the latter were discarded.

Level 5 – Polarity:

All attitudinal data were passed through an algorithm to categorise tweets as ‘positive’, ‘negative’ or ‘neutral’ in the nature of the sentiment expressed towards the theme of the stream. Double negative tweets that rejected criticism of the person or institution of interest were considered positive, while obviously sarcastic positive tweets as well as back-handed compliments were considered negative (eg ‘After ruining the European economy, Barroso finally realises austerity has reached its limit. Better late than never I guess’). [88] For lack of an appropriate category, tweets that simultaneously expressed a positive opinion about one aspect of the stream topic, but a negative one about another (for example, tweets attacking one but defending another MEP for the parliament stream) were marked as neutral.

Level 6+ – Event-specific analysis:

In some cases, additional classifiers were built to make highly bespoke categorisations of the data collected by specific streams in specific time-windows (see below). In these circumstances, a classifier was trained to classify relevant tweets into very context-specific categories of meaning.
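The way levels 2 to 5 hand tweets to one another can be sketched as follows. This shows only the control flow of the cascade: the four keyword-based functions are crude, invented stand-ins for the trained NLP classifiers, the toy polarity function omits the ‘neutral’ class, and none of the names come from Method51.

```python
def run_cascade(tweet, language_ok, relevant, attitudinal, polarity):
    """Pass one tweet down levels 2 to 5; return (level_reached, outcome)."""
    if not language_ok(tweet):
        return (2, "discarded: wrong language")
    if not relevant(tweet):
        return (3, "discarded: irrelevant")
    if not attitudinal(tweet):
        return (4, "discarded: non-attitudinal")
    return (5, polarity(tweet))


# Placeholder classifiers for an English 'Barroso' stream (assumptions).
def is_english(tweet):
    return tweet["lang"] == "en"


def about_barroso(tweet):
    return "barroso" in tweet["text"].lower()


def has_attitude(tweet):
    return any(w in tweet["text"].lower() for w in ("ruining", "disgrace", "great"))


def toy_polarity(tweet):
    negative = any(w in tweet["text"].lower() for w in ("ruining", "disgrace"))
    return "negative" if negative else "positive"


stream = [
    {"lang": "en", "text": "After ruining the European economy, Barroso finally realises austerity has reached its limit"},
    {"lang": "fr", "text": "Barroso à Strasbourg"},
    {"lang": "en", "text": "Euro final tonight"},
    {"lang": "en", "text": "Barroso to address MEPs at 3pm"},
]

results = [run_cascade(t, is_english, about_barroso, has_attitude, toy_polarity)
           for t in stream]
for outcome in results:
    print(outcome)
```

Because each level only sees what the previous level let through, an error early in the cascade (for example, a relevant tweet wrongly discarded at level 3) cannot be recovered later, which is one reason each classifier's performance is reported separately.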

Classifier performance

We tested the performance of all the classifiers used in the project by comparing the decisions they made against a human analyst making the same decisions about the same tweets. As stated above, phase 2 of classifier training involved the creation of a ‘gold-standard’ data set containing around 100–200 tweets for each classifier, annotated by a human annotator into the same categories of meaning as the algorithm was designed to do.

The performance of each classifier could then be assessed by comparing the decisions that it made on those tweets against the decisions made by the human analyst. There are three outcomes of this test, and each measures the ability of the classifier to make the same decisions as a human – and thus its overall performance – in a different way:

• Recall: The number of correct selections that the classifier makes as a proportion of the total correct selections it could have made. If there were ten relevant tweets in a data set, and a relevancy classifier successfully picks eight of them, it has a recall score of 80 per cent.
• Precision: The number of correct selections the classifier makes as a proportion of all the selections it has made. If a relevancy classifier selects ten tweets as relevant, and eight of them actually are indeed relevant, it has a precision score of 80 per cent.
• Overall, or ‘F1’: All classifiers are a trade-off between recall and precision. Classifiers with a high recall score tend to be less precise, and vice versa. ‘F1’ weights precision and recall equally to create one overall measurement of performance for the classifier. The F1 score is the harmonic mean of precision and recall. [89]
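As a worked illustration of how the three measures relate, suppose a relevancy classifier selects 10 tweets, 9 of which are actually relevant, from a data set that contains 18 relevant tweets. The counts are hypothetical and the symbols are notation introduced here: TP (correct selections), FP (incorrect selections), FN (relevant tweets that were missed), P (precision) and R (recall).

```latex
\begin{align*}
P   &= \frac{TP}{TP + FP} = \frac{9}{9 + 1} = 0.90 \\
R   &= \frac{TP}{TP + FN} = \frac{9}{9 + 9} = 0.50 \\
F_1 &= \frac{2PR}{P + R} = \frac{2 \times 0.90 \times 0.50}{0.90 + 0.50} = \frac{0.90}{1.40} \approx 0.64
\end{align*}
```

The simple average of the two scores would be 0.70; the harmonic mean is pulled towards the weaker of the two, so a classifier cannot earn a high F1 by doing well on only one measure.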

Note that precision and recall must be understood with reference to a particular target class: typically the ‘relevant’ class for the relevancy classifier, and the ‘attitudinal’ class for the attitudinal classifier. This is particularly important when there are more than two classes, as in such cases there are distinct ‘F1’ scores for each of the possible target classes. In tables 8–10 we show F1 scores for each language, with two scores shown for the sentiment classifiers, the first in cases where the target class is the ‘positive’ class, and the second where it is the ‘negative’ class. The performance of a classifier can differ drastically between the decisions it makes: it may, for example, select ‘relevant’ tweets much more reliably than ‘irrelevant’ ones, or ‘negative’ tweets much more reliably than ‘positive’ ones.
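A minimal sketch of scoring a classifier against a gold-standard set, one target class at a time, might look like the following. The function name `per_class_scores` and the ten labelled tweets are hypothetical and are not the project's data or software; the example only shows why one classifier yields a separate F1 score for each target class.

```python
def per_class_scores(gold, predicted, target):
    """Return (precision, recall, F1) for one target class.

    gold      -- labels assigned by the human annotator
    predicted -- labels assigned by the classifier to the same tweets
    target    -- the class being scored, e.g. 'positive'
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # Correct selections of the target class.
    tp = sum(1 for g, p in pairs if g == target and p == target)
    # Tweets selected as the target class that the human labelled otherwise.
    fp = sum(1 for g, p in pairs if g != target and p == target)
    # Target-class tweets that the classifier failed to select.
    fn = sum(1 for g, p in pairs if g == target and p != target)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard and classifier labels for ten tweets.
gold = ["positive", "negative", "negative", "neutral", "negative",
        "positive", "neutral", "negative", "positive", "negative"]
predicted = ["positive", "negative", "neutral", "negative", "negative",
             "negative", "neutral", "negative", "positive", "negative"]

for target in ("positive", "negative"):
    p, r, f1 = per_class_scores(gold, predicted, target)
    print(f"{target}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

On this toy data the same classifier scores an F1 of 0.80 for ‘positive’ and about 0.73 for ‘negative’, which is why the sentiment classifiers are reported with two scores.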

Table 8 Classifier scores for tweets in English

Table 9 Classifier scores for tweets in French

Table 10 Classifier scores for tweets in German

Classifier performance: event-specific data sets

We also produced a small number of event-specific classifiers for chapter 4 (case studies of real world events). These classifiers were trained on smaller data sets, and each was specific to one offline event that caused a large surge in traffic:

• Classifier 1: European Commission opening on 22 May – whether the tweeter was ‘broadly optimistic’ or ‘broadly pessimistic’ about the ability of the European Commission to enact positive influence on the tweeter’s life; this had an F1 score of 0.63
• Classifier 2: whether, in the context of the proposal to suspend Britain’s membership of the European Convention on Human Rights temporarily in order to deport Abu Qatada, the tweeter was ‘broadly positive about the European Court of Human Rights’, or ‘broadly negative’; this had an F1 score of 0.68
• Classifier 3: whether tweets were supportive of José Manuel Barroso’s criticism of France’s failure to enact meaningful budgetary reform on 15 May 2013; this had an F1 score of 1.0 for ‘supportive’ and 0.9 for ‘unsupportive’

Ethics

We consider that the two most important principles for this work are whether informed consent is necessary to collect, store, analyse and interpret public tweets, and whether there are any possible harms to participants from including, and possibly republishing, their tweets as part of a research project, which must be measured, managed and minimised.

Informed consent is widely understood to be required on any occasion of ‘personal data’ use where research subjects have an expectation of privacy. Determining the reasonable expectation of privacy someone might have is important in both offline and online research contexts. How to do this is not simple. The individual must expect the action to be private and this expectation must be accepted in society as objectively reasonable.

Within this frame, an important determination of an individual’s expectation of privacy on social media is by reference to whether the individual has made any explicit effort or decision in order to ensure that third parties cannot access this information. In the UK, there are a number of polls and surveys that have gauged public attitudes on this subject, including a small number of representative, national level surveys. Some research suggests that some users have become increasingly aware of the privacy risks and have reacted by placing more of their social media content onto higher privacy settings with more restricted possible readerships. [90] Users are taking more care to manage their online accounts actively; figures for deleting comments, friends and tags from photos are all increasing, reported a Pew internet survey. [91] Taken together, the surveys find that citizens are increasingly worried about losing control over what happens to their personal information, and the potential for misuse by governments and commercial companies. [92] However, these surveys also show that it is less clear what people actually understand online privacy to entail. They found that there is no clear agreement about what constitutes personal or public data on the internet. [93]

Applying these two principles to Twitter for our work, we believe that those who tweet publicly available messages in general expect a low level of privacy. (This is not true of all social networks.) Twitter’s terms of service and privacy policy both state: ‘What you say on Twitter may be viewed all around the world instantly’, [94] and the terms of service also state: ‘We encourage and permit broad re-use of Content. The Twitter API exists to enable this.’ [95] We believe that people have a relatively low expectation of privacy on Twitter, given recent court cases that have determined tweets are closely analogous to acts of publishing, and can thus also be prosecuted under laws governing public communications, including libel.

That does not remove the burden on researchers to make sure they are not causing any likely harm to users, given that users have not given clear, informed, express consent. Harm is difficult to measure in social media research. We drew a distinction in our research between key word searches and named account searches. We built no detailed profiles about any online user, or offline person. This was partly a technological challenge: extraction tools need to be designed to avoid accidental extraction from non-public accounts, and new forms of collection – such as extracting profile information – might in some instances require explicit consent.

Posted: **Sun Jun 28, 2015 11:10 pm**

Notes:

1 There are a number of new and emerging academic disciplines developing in this area, most notably computational sociology and digital anthropology.

2 It may also partly be a reflection of the network effect of social networks. For example, given the high proportion of English on Twitter, non-English users may also feel compelled to use English as well, to take part in conversations on the network.

3 Attitudinal research itself can often change the context of what is said, and in doing so introduce ‘observation’ or ‘measurement’ effects. This is ‘reactivity’ – the phenomenon that occurs when individuals alter their behaviour when they are aware that they are being observed. People involved in a poll are often seen to change their behaviour in consistent ways: to be more acceptable in general, more acceptable to the researcher specifically, or in ways that they believe meet the expectations of the observers. See PP Heppner, BE Wampold and DM Kivlighan, Research Design in Counseling, Thompson, 2008, p 331.

4 See BG Glaser and AL Strauss, The Discovery of Grounded Theory, New Brunswick: AldineTransaction, 1967.

5 These are the six principles: research should be designed, reviewed and undertaken to ensure integrity, quality and transparency; research staff and participants must normally be informed fully about the purpose, methods and intended possible uses of the research, what their participation in the research entails and what risks, if any, are involved; the confidentiality of information supplied by research participants and the anonymity of respondents must be respected; research participants must take part voluntarily, free from any coercion; harm to research participants and researchers must be avoided in all instances; and the independence of research must be clear, and any conflicts of interest or partiality must be explicit. See ESRC, ‘Framework for Research Ethics’, latest version, Economic and Social Research Council Sep 2012, www.esrc.ac.uk/about-esrc/information/research-ethics.aspx (accessed 13 Apr 2014).

6 However, a growing group of internet researchers has issued various types of guidance themselves. See AoIR, Ethical Decision-Making and Internet Research: Recommendations from the AoIR Ethics Working Committee (Version 2.0), Association of Internet Researchers, 2012, p 2.

7 European Commission, Eurobarometer survey on trust in institutions, Nov 2013, http://ec.europa.eu/public_opinion/cf/showchart_column.cfm?keyID=2189&nationID=6,3,15,&startdate=2012.05&enddate=2013.11 (accessed 24 Apr 2014); I van Biezen, P Mair and T Poguntke, ‘Going, going… gone? The decline of party membership in contemporary Europe’, European Journal of Political Research 51, no 1, 2012, pp 24–56.

8 J Birdwell, F Farook and S Jones, Trust in Practice, London: Demos, 2009.

9 European Commission, ‘Public opinion in the European Union: first results’, Standard Eurobarometer 78, Dec 2012, http://ec.europa.eu/public_opinion/archives/eb/eb78/eb78_first_en.pdf (accessed 10 Apr 2014).

10 Pew Research Center, ‘The sick man of Europe: the European Union’, 13 May 2013, www.pewglobal.org/2013/05/13/the-new-sick-man-of-europe-the-european-union/ (accessed 10 Apr 2014).

11 European Commission, ‘Two years to go to the 2014 European elections’, Eurobarometer 77, no 4, 2012, www.europarl.europa.eu/pdf/eurobarometre/2012/election_2012/eb77_4_ee2014_synthese_analytique_en.pdf (accessed 11 Apr 2014).

12 P Huyst, ‘The Europeans of tomorrow: researching European identity among young Europeans’, Centre for EU Studies, Ghent University, nd, http://aei.pitt.edu/33069/1/huyst._petra.pdf (accessed 11 Apr 2014).

13 M Henn and N Foard, ‘Young people, political participation and trust in Britain’, Parliamentary Affairs 65, no 1, 2012.

14 Eg J Sloam, ‘Rebooting democracy: youth participation in politics in the UK’, Parliamentary Affairs, 60, 2007.

15 D Zeng et al, ‘Social media analytics and intelligence: guest editors’ introduction’, in Proceedings of the IEEE Computer Society, Nov–Dec 2010, p 13.

16 Emarketer, ‘Where in the world are the hottest social networking countries?’, 29 Feb 2012, www.emarketer.com/Article/Where-World-Hottest-Social-Networking-Countries/1008870 (accessed 11 Apr 2014).

17 Social-media-prism, ‘The conversation’, nd, www.google. co.uk/imgres?imgurl=http://spirdesign.no/wp-content/ uploads/2010/11/social-media-prism.jpg&imgrefurl=http:// spirdesign.no/blog/webdesignidentitet-og-trender/ attachment/social-media-prism/&h=958&w=1024&sz=3 01&tbnid=EFQcS2D-zhOj8M:&tbnh=90&tbnw=96&z oom=1&usg=__VXussUcXEMznT42YLhgk6kOsPIk= &docid=ho9_RAXkIYvcpM&sa=X&ei=9QBXUdeYOiJ0AXdyIHYAg& ved=0CEoQ9QEwAg&dur=47 (accessed 11 Apr 2014).

18 F Ginn, ‘Global social network stats confirm Facebook as largest in US & Europe (with 3 times the usage of 2nd place)’, Search Engine Land, 17 Oct 2011, http:// searchengineland.com/global-social-network-stats-confirmfacebook- as-largest-in-u-s-europe-with-3-times-the-usageof- 2nd-place-97337 (accessed 11 Apr 2014).

19 Emarketer, ‘Twitter is widely known in France, but garners few regular users’, 30 Apr 2013, www.emarketer.com/Article/Twitter-Widely-Known-France-Garners-Few-Regular-Users/1009851 (accessed 11 Apr 2014).

20 For a map of current Twitter languages and demographic data, see E Fischer, ‘Language communities of Twitter’, 24 Oct 2011, www.flickr.com/photos/walkingsf/6277163176/ in/photostream/lightbox/ (accessed 10 Apr 2014); DMR, ‘(March 2014) by the numbers: 138 amazing Twitter statistics’, Digital Market Ramblings, 23 Mar 2014, http:// expandedramblings.com/index.php/march-2013-by-thenumbers- a-few-amazing-twitter-stats/ (accessed 10 Apr 2014).

21 Slideshare, ‘Media measurement: social media trends by age and country’, 2011, www.slideshare.net/MML_Annabel/media-measurement-social-media-trends-by-country-and-age (accessed 11 Apr 2014).

22 Emarketer, ‘Twitter grows stronger in Mexico’, 24 Sep 2012, www.emarketer.com/Article/Twitter-Grows-Stronger- Mexico/1009370 (accessed 10 Apr 2014); Inforrm’s Blog, ‘Social media: how many people use Twitter and what do we think about it?’, International Forum for Responsible Media Blog, 16 Jun 2013, http://inforrm.wordpress.com/2013/06/16/ social-media-how-many-people-use-twitter-and-what-dowe- think-about-it/ (accessed 11 Apr 2014).

23 Eg M Bamburic, ‘Twitter: 500 million accounts, billions of tweets, and less than one per cent use their location’, 2012, http://betanews.com/2012/07/31/twitter- ... naccounts- billions-of-tweets-and-less-than-one-per cent-usetheir- location/ (accessed 11 Apr 2014).

24 Beevolve, ‘Global heatmap of Twitter users’, 2012, www.beevolve.com/twitter-statistics/#a3 (accessed 11 Apr 2014).

25 European Commission, ‘Political participation and EU citizenship: perceptions and behaviours of young people’, nd, http://eacea.ec.europa.eu/youth/tools/documents/perception-behaviours.pdf (accessed 11 Apr 2014).

26 S Creasey, ‘Perpetual engagement: the potential and pitfalls of using social media for political campaigning’, London School of Economics, 2011, http://blogs.lse.ac.uk/ polis/files/2011/06/PERPETUAL-ENGAGEMENT-THEPOTENTIAL- AND-PITFALLS-OF-USING-SOCIALMEDIA- FOR-POLITICAL-CAMPAIGNING.pdf (accessed 29 Apr 2014).

27 WH Dutton and G Blank, Next Generation Users: The internet in Britain, Oxford Internet Survey 2011 report, 2011, www.oii.ox.ac.uk/publications/oxis2011_report.pdf (accessed 3 Apr 2013).

28 Ibid.

29 J Bartlett et al, Virtually Members: The Facebook and Twitter followers of UK political parties, London: Demos 2013.

30 J Bartlett et al, New Political Actors in Europe: Beppe Grillo and the M5S, London: Demos, 2012; J Birdwell and J Bartlett, Populism in Europe: CasaPound, London: Demos, 2012; J Bartlett, J Birdwell and M Littler, The New Face of Digital Populism, London: Demos, 2011.

31 C McPhedran, ‘Pirate Party makes noise in German politics’, Washington Times, 10 May 2012, www.washingtontimes.com/news/2012/may/10/upstartparty- making-noise-in-german-politics/?page=all (accessed 11 Apr 2014).

32 T Postmes and S Brunsting, ‘Collective action in the age of the internet: mass communication and online mobilization’, Social Science Computer Review 20, issue 3, 2002; M Castells, ‘The mobile civil society: social movements, political power and communication networks’ in M Castells et al, Mobile Communication and Society: A global perspective, Cambridge MA: MIT Press, 2007.

33 G Blakeley, ‘Los Indignados: a movement that is here to stay’, Open Democracy, 5 Oct 2012, www.opendemocracy.net/georgina-blakeley ... smovement- that-is-here-to-stay (accessed 11 Apr 2014).

34 N Vallina-Rodriguez et al, ‘Los Twindignados: the rise of the Indignados Movement on Twitter’, in Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on Social Computing (SocialCom), www.cl.cam.ac.uk/~nv240/papers/twindignados.pdf (accessed 11 Apr 2014).

35 GT Madonna and M Young, ‘The first political poll’, Politically Uncorrected, 18 Jun 2002, www.fandm.edu/ politics/politically-uncorrected-column/2002-politicallyuncorrected/ the-first-political-poll (accessed 11 Apr 2014).

36 For example, are federal expenditures for relief and recovery too great, too little, or about right? Responses were as follows: 60 per cent too great; 9 per cent too little; 31 per cent about right. See ‘75 years ago, the first Gallup Poll’, Polling Matters, 20 Oct 2010, http://pollingmatters.gallup.com/2010/10/75-years-ago-first-gallup-poll.html (accessed 11 Apr 2014).

37 Thereby avoiding a number of measurement biases often present during direct solicitation of social information, including memory bias, questioner bias and social acceptability bias. Social media, by contrast, is often a completely unmediated spectacle.

38 VM Schonberger and K Cukier, Big Data, London: John Murray, 2013.

39 Early and emerging examples of Twitterology were presented at the International Conference on Web Search and Data Mining 2008. It is important to note that there is a large difference between what are current capabilities, and what are published capabilities. We do not have access to a great deal of use-cases – including novel techniques, novel applications of techniques or substantive findings – that are either under development or extant but unpublished. Academic peer-reviewed publishing can take anywhere from six months to two years, while many commercial capabilities are proprietary. Furthermore, much social media research is conducted either by or on behalf of the social media platforms themselves, and never made public. The growing distance between development and publishing, and the increasing role of proprietary methodologies and private sector ownership and exploitation of focal data sets, are important characteristics of the social media research environment. Good examples include P Carvalho et al, ‘Liars and saviors in a sentiment annotated corpus of comments to political debates’ in Proceedings of the Association for Computational Linguistics, 2011, pp 564–68; N Diakopoulos and D Shammar, ‘Characterising debate performance via aggregated Twitter sentiment’ in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2010, pp 1195–8; S Gonzalez-Bailon, R Banchs and A Kaltenbrunner, ‘Emotional reactions and the pulse of public opinion: measuring the impact of political events on the sentiment of online discussions’, ArXiv e-prints, 2010, arXiv 1009.4019; G Huwang et al, ‘Conversational tagging in Twitter’ in Proceedings of the 21st ACM conference on Hypertext and Hypermedia, 2010, pp 173–8; M Marchetti-Bowick and N Chambers, ‘Learning for microblogs with distant supervision: political forecasting with Twitter’ in Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp 603–12; B O’Connor et al, ‘From tweets to polls: linking text sentiment to public opinion time series’ in Proceedings of the AAAI Conference on Weblogs and Social Media, 2010, pp 122–9; A Pak and P Paroubak, ‘Twitter as a corpus for sentiment analysis and opinion mining’ in Proceedings of the Seventh International Conference on Language Resources and Evaluation, 2010; C Tan et al, ‘User-level sentiment analysis incorporating social networks’ in Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011; A Tumasjan et al, ‘Election forecasts with Twitter: how 140 characters reflect the political landscape’, Social Science Computer Review, 2010. See also RE Wilson, SD Gosling and LT Graham, ‘A review of Facebook research in the social sciences’, Perspectives on Psychological Science, 7, no 3, 2012 pp 203–20.

40 Early and emerging examples of Twitterology were presented at the International Conference on Web Search and Data Mining, 2008.

41 European Commission, ‘Europeans and their languages’, Special Eurobarometer 243, Feb 2006, http://ec.europa.eu/public_opinion/archives/ebs/ebs_243_sum_en.pdf (accessed 11 Apr 2014).

42 It is also possible to acquire a large amount of social media data via licensed data providers. These are often third party resellers.

43 Some APIs can deliver historical data, stretching back months or years, while others only deliver very recent content. Some deliver a random selection of social media data taken from the platform; others deliver all data that match the queries – usually keywords selected by the analyst to be present in the post or tweet – provided by the researcher. In general, all APIs produce data in a consistent, ‘structured’ format, in large quantities.

44 Twitter has three different APIs available to researchers. The search API returns a collection of relevant tweets matching a specified query (word match) from an index that extends up to roughly a week in the past. Its filter API continually produces tweets that contain one of a number of keywords to the researcher, in real time as they are made. Its sample API returns a random sample of a fixed percentage of all public tweets in real time. Each of these APIs (consistent with the vast majority of all social media platform APIs) is constrained by the amount of data they will return. A public, free ‘spritzer’ account caps the search API at 180 calls every 15 minutes with up to 100 tweets returned per call; the filter API caps the number of matching tweets returned by the filter to no more than 1 per cent of the total stream in any given second; and the sample API returns a random 1 per cent of the tweet stream. Others use white-listed research accounts (known informally as ‘the garden hose’), which have 10 per cent rather than 1 per cent caps on the filter and sample APIs, while still others use the commercially available ‘firehose’ of 100 per cent of daily tweets. With daily tweet volumes averaging roughly 400 million, many researchers do not find the spritzer account restrictions to be limiting to the number of tweets they collect (or need) on any particular topic.

45 S Fodden, ‘Anatomy of a tweet: metadata on Twitter’, Slaw, 17 Nov 2011, www.slaw.ca/2011/11/17/the-anatomy-of-a-tweetmetadata- on-twitter/ (accessed 11 Apr 2014); R Krikorian, ‘Map of a Twitter status object’, 18 Apr 2010, www.slaw. ca/wp-content/uploads/2011/11/map-of-a-tweet-copy.pdf (accessed 11 Apr 2014).

46 Acquiring data from Twitter on a particular topic is a trade-off between precision and comprehensiveness. A precise data collection strategy only returns tweets that are on-topic, but is likely to miss some. A comprehensive data collection strategy collects all the tweets that are on-topic, but is likely to include some which are off-topic. Individual words themselves can be inherently either precise or comprehensive, depending on how and when they are used.

47 Ibid.

48 The choice of these keywords and hashtags for each topic in each language was made in a quick manual review of the data collected in the early stages of the project. The inclusion of these terms was meant to bring in conversations that were relevant to the stream but did not explicitly reference the topic by its full name, without overwhelming the streams with irrelevant data. For a full list of scraper terms used per stream, see the annex.

49 AoIR, Ethical Decision-Making and Internet Research; J Bartlett and C Miller, ‘How to measure and manage harms to privacy when accessing and using communications data’, submission by the Centre for the Analysis of Social Media, as requested by the Joint Parliamentary Select Committee on the Draft Communications Data Bill, Oct 2012, www.demos.co.uk/files/Demos%20CASM%20submission%20on%20Draft%20Communications%20Data%20bill.pdf (accessed 11 Apr 2014).

50 It may also partly be a reflection of the network effect of social networks. For example, given the high proportion of tweets in English on Twitter, non-English users may also feel compelled to use English as well to take part in conversations on the network.

51 Emarketer, ‘Twitter grows stronger in Mexico’; Inforrm’s Blog, ‘Social media’.

52 Given the historical nature of our data set, each twitcident was identified from a single data stream, rather than across Twitter as a whole (which would be a far better way of collecting data relating to an event). See discussion in chapter 4.

53 ‘Dix milliards d’euros pour sauver Chypre’, Libération, 16 Mar 2013, www.liberation.fr/economie/2013/03/16/ chypre-cinquieme-pays-de-la-zone-euro-a-beneficier-del- aide-internationale_889016 (accessed 11 Apr 2014); I de Foucaud, ‘Chypre: un sauvetage inédit à 10 milliards d’euros’, Le Figaro, 16 Mar 2013, www.lefigaro.fr/ conjoncture/2013/03/16/20002-20130316ARTFIG00293- chypre-un-sauvetage-inedit-a-10-milliards-d-euros.php (accessed 11 Apr 2014); ‘A Chypre, la population sous le choc, le président justifie les sacrifices’, Le Monde, 17 Mar 2014, www.lemonde.fr/europe/article/2013/03/16/a-chyprela- population-dans-l-incertitude-apres-l-annonce-du-plande- sauvetage_1849491_3214.html (accessed 11 Apr 2014).

54 ‘Hitting the savers: Eurozone reaches deal on Cyprus bailout’, Spiegel International, 16 Mar 2013, www.spiegel.de/ international/europe/savers-will-be-hit-as-part-of-deal-tobail- out-cyprus-a-889252.html (accessed 11 Apr 2014)

55 This echoed much of the early press coverage, especially in Germany, with the Frankfurter Allgemeine stating ‘Zyperns Rettung Diesmal bluten die Sparer’ (‘Cyprus rescue: this time the savers bleed’).

56 ‘Does the bailout deal mean the worst is over for Cyprus? – poll’, Guardian, 25 Mar 2013, www.theguardian.com/ business/poll/2013/mar/25/bailout-deal-worst-over-cypruspoll (accessed 11 Apr 2014).

57 Pew Research Center, The New Sick Man of Europe: The European Union, 2013, www.pewglobal.org/files/2013/05/Pew-Research-Center-Global-Attitudes-Project-European-Union-Report-FINAL-FOR-PRINT-May-13-2013.pdf (accessed 11 Apr 2014).

58 YouGov survey results, fieldwork 21–27 Mar 2013, http://d25d2506sfb94s.cloudfront.net/cumulus_uploads/document/eh65gpse1v/YG-Archive_Eurotrack-March-Cyprus-EU-representatives-Easter.pdf (accessed 11 Apr 2014).

59 From a background level of 117 tweets on 9 March 2013, 141 on 10 March and 288 on 11 March, there is an increase to 391 on 12 March, 844 on 13 March and a peak of 1,786 on 14 March.

60 See the section ‘classifier performance’ in the annex for a discussion of its accuracy.

61 ‘Affichette “casse-toi pov’ con”: la France condamnée par la CEDH’, Le Monde, 14 Mar 2013, www.lemonde.fr/societe/ article/2013/03/14/affichette-casse-toi-pov-con-la-francecondamnee- par-la-cedh_1847686_3224.html (accessed 11 Apr 2014).

62 European Court of Human Rights, ‘Affaire Eon c. France’, requête 26118/10, 14 Mar 2013, http://hudoc.echr.coe.int/ sites/fra/pages/search.aspx?i=001-117137#{‘itemid’:[‘001-117137’]} (accessed 11 Apr 2014).

63 W Jordan, ‘Public: ignore courts and deport Qatada’, YouGov, 26 Apr 2013, http://yougov.co.uk/news/2013/04/26/brits-ignore-courts-and-deport-qatada/ (accessed 24 Apr 2014).

64 Ipsos MORI, ‘Public blamed ECHR over the Home Secretary for Qatada delays’, 26 Apr 2013, www.ipsos-mori.com/researchpublications ... charchive/ 2964/Public-blamed-ECHR-over-the-Home-Secretary-for- Abu-Qatada-delays.aspx (accessed 24 Apr 2014).

65 Average calculated across March, April and May.

66 ‘Récession: “La situation est grave”, juge Hollande’.

67 ‘Barroso: “la France doit présenter des reformes crédibles”’.

68 ‘José Manuel Barroso: “Être contre la mondialisation, c’est cracher contre le vent”’.

69 ‘Hollande ne va pas passer un “examen” à Bruxelles, souligne Barroso’.

70 ‘François Hollande au révélateur de la Commission européenne: le président de la République a rencontré les 27 commissaires européens à Bruxelles pour évoquer les réformes structurelles réclamées à la France’.

71 This was the highest performing classifier trained during the project – with a far higher accuracy than the generic attitudinal classifiers that attempted to make more generic decisions over a longer term.

72 See, for instance, O’Connor et al, ‘From tweets to polls’. The authors collected their sample using just a few keyword searches. Some more promisingly methodical approaches also exist: see J Leskovec, J Kleinberg and C Faloutsos, ‘Graph evolution: densification and shrinking diameters’, Data 1, no 1, Mar 2007, www.cs.cmu.edu/~jure/pubs/powergrowth-tkdd.pdf (accessed 16 Apr 2012); J Leskovec and C Faloutsos, ‘Sampling from large graphs’ in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, www.stat.cmu.edu/~fienberg/Stat36-835/L ... plingkdd06. pdf (accessed 17 Apr 2012); P Rusmevichientong et al, ‘Methods for sampling pages uniformly from the world wide web’ in Proceedings of the AAAI Fall Symposium on Using Uncertainty Within Computation, 2001, pp 121–8.

73 D Singer, ‘Forget the 80/20 principle, with Twitter it is 79/7’, Social Media Today, 25 Feb 2010, http://socialmediatoday.com/index.php?q=SMC/177538 (accessed 11 Apr 2014).

74 European Union, ‘Twitter accounts’, nd, http://europa.eu/contact/take-part/twitter/index_en.htm (accessed 11 Apr 2014).

75 S Bennett, ‘Who uses Twitter? Young, affluent, educated non-white males, suggests data [study]’, All Twitter, 6 Aug 2013, www.mediabistro.com/alltwitter/twitterusers- 2013_b47437 (accessed 11 Apr 2014).

76 M Bulmer, ‘Facts, concepts, theories and problems’ in M Bulmer (ed.), Sociological Research Methods: An introduction, London: Macmillan, 1984.

77 Surveys often tap attitudes by using a sophisticated barrage of different indicators and different ways of measuring them. The Likert scale measures intensity of feelings (usually measured on a scale from 1 to 5) on a number of different specific questions to gauge an underlying attitude. A body of work around question design has produced settled dos and don’ts aimed at avoiding the unreliable measurement of attitudinal indicators. Questions are avoided if they are too long, ambiguous, leading, general, technical or unbalanced, and many surveys use specific wordings of questions drawn from ‘question banks’ designed to best practice standards for use by major surveys.

78 S Jeffares, ‘Coding policy tweets’, paper presented to the social text analysis workshop, University of Birmingham, 28 Mar 2012.

79 S Wibberley and C Miller, ‘Detecting events from Twitter: situational awareness in the age of social media’ in C Hobbs, M Matthew and D Salisbury, Open Source Intelligence in the Twenty-first Century: New approaches and opportunities, Palgrave MacMillan, forthcoming 2014.

80 Glaser and Strauss, The Discovery of Grounded Theory.

81 COSMOS platform.

82 Open Knowledge Foundation, ‘Open data – an introduction’, nd, http://okfn.org/opendata/ (accessed 11 Apr 2014).

83 The choice of these keywords and hashtags for each topic in each language was made on the basis of a quick manual review of the data that were collected in the early stages of the project. The inclusion of these terms was meant to bring in conversations that were relevant to the stream but did not explicitly reference the topic by its full name, without overwhelming the streams with irrelevant data. For a full list of scraper terms used per stream see annex.

84 Marchetti-Bowick and Chambers, ‘Learning for microblogs with distant supervision’; O’Connor et al, ‘From tweets to polls’.

85 Method51 is a software suite developed by the project team over the last 18 months. It is based on an open source project called DUALIST. See B Settles, ‘Closing the loop: fast, interactive semi-supervised annotation with queries on features and instances’, Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011, pp 1467–78. Method51 enables non-technical analysts to build machine-learning classifiers. Its most important feature is the speed with which accurate classifiers can be built. Classically, an NLP algorithm would require many thousands of examples of ‘marked-up’ tweets to achieve reasonable accuracy. This is expensive and takes days to complete. However, DUALIST innovatively uses ‘active learning’ (an application of information theory that can identify pieces of text that the NLP algorithm would learn most from) and semi-supervised learning (an approach to learning that not only learns from manually labelled data, but also exploits patterns in large unlabelled data sets). This radically reduces the number of marked-up examples needed from many thousands to a few hundred. Overall, in allowing social scientists to build and evaluate classifiers quickly, and therefore to engage directly with big social media data sets, the AAF makes possible the methodology used in this project.

86 On the one hand, we have been fairly inclusive on the relevancy level, in that all discussions on something directly related to the topic were usually included as relevant. For example, for the European Parliament stream, all tweets about individual MEPs were considered relevant, as were tweets about individual commissioners for the European Commission stream. Similarly, for the European Court of Human Rights stream, tweets about the European Convention of Human Rights, on which the Court’s jurisdiction is based, were included. Anything about the management of the euro by the Eurozone countries and the European Central Bank, as well as euro-induced austerity, was considered relevant for the euro stream. On the other hand, some tweets that directly referred to the stream topic were considered irrelevant, because they did not match our criteria of interest in the six streams as they relate to the European project. For example, tweets that referred to the European Union purely as a geographical area, as a shorthand for a group of countries, without referring in any sense to this group of countries as belonging to a political union, were marked as irrelevant (eg ‘Car sales in the EU have gone down 20 per cent’). Similarly, tweets referring to the euro from a purely financial perspective, quoting solely the price of things in euros or exchange rates, were irrelevant.

87 For example, for the European Parliament stream, tweets that expressed an opinion about its decisions, discussions taking place in the Parliament, individual MEPs and ‘lobbying’ directed at it (eg ‘@EP: please outlaw pesticides and save the bees!’) were considered attitudinal.

88 For the Parliament and Commission streams, positive or negative comments on individual MEPs and commissioners and specific decisions taken by each institution were marked as such.

89 The harmonic mean of p and r is equal to $\frac{2pr}{p + r}$.
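
As a worked illustration of this note, the harmonic mean of precision p and recall r (commonly called the F1 score) is 2pr / (p + r). The short sketch below computes it; the function name `f_score` and the example values are assumptions made for illustration and are not taken from the report.

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: 2pr / (p + r)."""
    if precision + recall == 0:
        # Both scores are zero, so the harmonic mean is defined here as zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical classifier scores, for illustration only.
print(round(f_score(0.8, 0.6), 3))
```

Because the harmonic mean is pulled towards the smaller of the two values, a classifier cannot obtain a high score by doing well on precision alone or on recall alone.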

90 Wilson et al, ‘A review of Facebook research in the social sciences’.

91 M Madden, ‘Privacy management on social media sites’, Pew Research Center, 2012, www.pewinternet.org/~/media//Files/Reports/2012/PIP_Privacy_management_on_social_media_sites_022412.pdf (accessed 11 Apr 2014).

92 J Bartlett, The Data Dialogue, London: Demos, 2010. Also see J Bartlett and C Miller, Demos CASM Submission to the Joint Committee on the Draft Communications Data Bill, Demos, 2012.

93 Bartlett, The Data Dialogue. This is based on a representative population level poll of circa 5,000 people. See also Bartlett and Miller, ‘How to measure and manage harms to privacy when accessing and using communications data’.

94 Twitter, ‘Terms of Service’, 2012, www.twitter.com/tos (accessed 11 Apr 2014); Twitter, ‘Twitter Privacy Policy’, 2013, www.twitter.com/privacy (accessed 11 Apr 2014).

95 Twitter, ‘Terms of Service’.

Posted: **Sun Jun 28, 2015 11:13 pm**

References

‘75 years ago, the first Gallup Poll’, Polling Matters, 20 Oct 2010, http://pollingmatters.gallup.com/2010/1 ... -agofirst- gallup-poll.html (accessed 11 Apr 2014).

‘A Chypre, la population sous le choc, le président justifie les sacrifices’, Le Monde, 17 Mar 2014, www.lemonde.fr/europe/ article/2013/03/16/a-chypre-la-population-dans-l-incertitudeapres- l-annonce-du-plan-de-sauvetage_1849491_3214.html (accessed 11 Apr 2014).

‘Affichette “casse-toi pov’ con”: la France condamnée par la CEDH’, Le Monde, 14 Mar 2013, www.lemonde.fr/societe/ article/2013/03/14/affichette-casse-toi-pov-con-la-francecondamnee- par-la-cedh_1847686_3224.html (accessed 11 Apr 2014).

‘Dix milliards d’euros pour sauver Chypre’, Libération, 16 Mar 2013, www.liberation.fr/economie/2013/03/16/chyprecinquieme- pays-de-la-zone-euro-a-beneficier-de-l-aideinternationale_ 889016 (accessed 11 Apr 2014).

‘Does the bailout deal mean the worst is over for Cyprus? – poll’, Guardian, 25 Mar 2013, www.theguardian.com/ business/poll/2013/mar/25/bailout-deal-worst-over-cyprus-poll (accessed 11 Apr 2014).

‘Hitting the savers: Eurozone reaches deal on Cyprus bailout’, Spiegel International, 16 Mar 2013, www.spiegel.de/ international/europe/savers-will-be-hit-as-part-of-deal-to-bailout- cyprus-a-889252.html (accessed 39 Apr 2014)

AoIR, Ethical Decision-Making and Internet Research: Recommendations from the AoIR Ethics Working Committee (Version 2.0), Association of Internet Researchers, 2012, p 2.

Bamburic M, ‘Twitter: 500 million accounts, billions of tweets, and less than one per cent use their location’, 2012, http:// betanews.com/2012/07/31/twitter-500-million-accountsbillions- of-tweets-and-less-than-one-per cent-use-their-location/ (accessed 11 Apr 2014).

Bartlett J, The Data Dialogue, London: Demos, 2010.

Bartlett J, Birdwell J and Littler M, The New Face of Digital Populism, London: Demos, 2011.

Bartlett J and Miller C, ‘How to measure and manage harms to privacy when accessing and using communications data’, submission by the Centre for the Analysis of Social Media, as requested by the Joint Parliamentary Select Committee on the Draft Communications Data Bill, Oct 2012, www.demos.co.uk/ files/Demos%20CASM%20submission%20on%20Draft%20 Communications%20Data%20bill.pdf (accessed 11 Apr 2014).

Bartlett J et al, New Political Actors in Europe: Beppe Grillo and the M5S, London: Demos, 2012.

Bartlett J et al, Virtually Members: The Facebook and Twitter followers of UK political parties, London: Demos 2013.

Beevolve, ‘Global heatmap of Twitter users’, 2012, www.beevolve.com/twitter-statistics/#a3 (accessed 11 Apr 2014).

Bennett S, ‘Who uses Twitter? Young, affluent, educated non-white males, suggests data [study]’, All Twitter, 6 Aug 2013, www.mediabistro.com/alltwitter/twitter- ... 013_b47437 (accessed 11 Apr 2014).

Birdwell J and Bartlett J, Populism in Europe: CasaPound, London: Demos, 2012.

Birdwell J, Farook F and Jones S, Trust in Practice, London: Demos, 2009.

Blakeley G, ‘Los Indignados: a movement that is here to stay’, Open Democracy, 5 Oct 2012, www.opendemocracy.net/ georgina-blakeley/los-indignados-movement-that-is-here-to-stay (accessed 11 Apr 2014).

Bulmer M, ‘Facts, concepts, theories and problems’ in M Bulmer (ed.), Sociological Research Methods: An introduction, London: Macmillan, 1984.

Carvalho P et al, ‘Liars and saviors in a sentiment annotated corpus of comments to political debates’ in Proceedings of the Association for Computational Linguistics, 2011, pp 564–68.

Castells M, ‘The mobile civil society: social movements, political power and communication networks’ in M Castells et al, Mobile Communication and Society: A global perspective, Cambridge MA: MIT Press, 2007.

Creasey S, ‘Perpetual engagement: the potential and pitfalls of using social media for political campaigning’, London School of Economics, 2011, http://blogs.lse.ac.uk/ polis/files/2011/06/PERPETUAL-ENGAGEMENT-THEPOTENTIAL- AND-PITFALLS-OF-USING-SOCIALMEDIA- FOR-POLITICAL-CAMPAIGNING.pdf (accessed 29 Apr 2014).

de Foucaud I, ‘Chypre: un sauvetage inédit à 10 milliards d’euros’, Le Figaro, 16 Mar 2013, www.lefigaro.fr/conjoncture/ 2013/03/16/20002-20130316ARTFIG00293-chypre-un-sauvetageinedit- a-10-milliards-d-euros.php (accessed 11 Apr 2014).

Diakopoulos N and Shammar D, ‘Characterising debate performance via aggregated Twitter sentiment’ in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2010, pp 1195–8.

DMR, ‘(March 2014) by the numbers: 138 amazing Twitter statistics’, Digital Market Ramblings, 23 Mar 2014, http:// expandedramblings.com/index.php/march-2013-by-thenumbers- a-few-amazing-twitter-stats/ (accessed 10 Apr 2014).

Dutton WH and Blank G, Next Generation Users: The internet in Britain, Oxford Internet Survey 2011 report, 2011, www.oii.ox.ac.uk/publications/oxis2011_report.pdf (accessed 3 Apr 2013).

Emarketer, ‘Twitter grows stronger in Mexico’, 24 Sep 2012, www.emarketer.com/Article/Twitter-Grows-Stronger- Mexico/1009370 (accessed 10 Apr 2014).

Emarketer, ‘Twitter is widely known in France, but garners few regular users’, 30 Apr 2013, www.emarketer.com/Article/ Twitter-Widely-Known-France-Garners-Few-Regular- Users/1009851 (accessed 11 Apr 2014).

Emarketer, ‘Where in the world are the hottest social networking countries?’, 29 Feb 2012, www.emarketer.com/ Article/Where-World-Hottest-Social-Networking- Countries/1008870 (accessed 11 Apr 2014).

ESRC, ‘Framework for Research Ethics’, latest version, Economic and Social Research Council, Sep 2012, www.esrc.ac.uk/about-esrc/information/r ... thics.aspx (accessed 13 Apr 2014).

European Commission, Eurobarometer survey on trust in institutions, Nov 2013, http://ec.europa.eu/public_opinion/cf/ showchart_column.cfm?keyID=2189&nationID=6,3,15,&startd ate=2012.05&enddate=2013.11 (accessed 24 Apr 2014).

European Commission, ‘Europeans and their languages’, Special Eurobarometer 243, Feb 2006, http://ec.europa.eu/ public_opinion/archives/ebs/ebs_243_sum_en.pdf (accessed 11 Apr 2014).

European Commission, ‘Political participation and EU citizenship: perceptions and behaviours of young people’, nd, http://eacea.ec.europa.eu/youth/tools/documents/ perception-behaviours.pdf (accessed 11 Apr 2014).

European Commission, ‘Public opinion in the European Union: first results’, Standard Eurobarometer 78, Dec 2012, http://ec.europa.eu/public_opinion/arch ... eb78/eb78_ first_en.pdf (accessed 10 Apr 2014).

European Commission, ‘Two years to go to the 2014 European elections’, Eurobarometer 77, no 4, 2012, www.europarl.europa. eu/pdf/eurobarometre/2012/election_2012/eb77_4_ee2014_ synthese_analytique_en.pdf (accessed 11 Apr 2014).

European Court of Human Rights, ‘Affaire Eon c. France’, requête 26118/10, 14 Mar 2013, http://hudoc.echr.coe.int/sites/ fra/pages/search.aspx?i=001-117137#{‘itemid’:[‘001-117137’]} (accessed 11 Apr 2014).

European Union, ‘Twitter accounts’, nd, http://europa.eu/ contact/take-part/twitter/index_en.htm (accessed 11 Apr 2014).

Fischer E, ‘Language communities of Twitter’, 24 Oct 2011, www.flickr.com/photos/walkingsf/6277163 ... otostream/ lightbox/ (accessed 10 Apr 2014).

Fodden S, ‘Anatomy of a tweet: metadata on Twitter’, Slaw, 17 Nov 2011, www.slaw.ca/2011/11/17/the-anatomy-of-a-tweetmetadata- on-twitter/ (accessed 11 Apr 2014).

Ginn F, ‘Global social network stats confirm Facebook as largest in US & Europe (with 3 times the usage of 2nd place)’, Search Engine Land, 17 Oct 2011, http://searchengineland.com/ global-social-network-stats-confirm-facebook-as-largest-inu- s-europe-with-3-times-the-usage-of-2nd-place-97337 (accessed 11 Apr 2014).

Glaser BG and Strauss AL, The Discovery of Grounded Theory, New Brunswick: AldineTransaction, 1967.

Gonzalez-Bailon S, Banchs R and Kaltenbrunner A, ‘Emotional reactions and the pulse of public opinion: measuring the impact of political events on the sentiment of online discussions’, ArXiv e-prints, 2010, arXiv 1009.4019.

Henn M and Foard N, ‘Young people, political participation and trust in Britain’, Parliamentary Affairs 65, no 1, 2012.

Heppner PP, Wampold BE and Kivlighan DM, Research Design in Counseling, Thompson, 2008, p 331.

Huwang G et al, ‘Conversational tagging in Twitter’ in Proceedings of the 21st ACM conference on Hypertext and Hypermedia, 2010, pp 173–8.

Huyst P, ‘The Europeans of tomorrow: researching European identity among young Europeans’, Centre for EU Studies, Ghent University, nd, http://aei.pitt.edu/33069/ 1/huyst._petra.pdf (accessed 11 Apr 2014).

Inforrm’s Blog, ‘Social media: how many people use Twitter and what do we think about it?’, International Forum for Responsible Media Blog, 16 Jun 2013, http://inforrm. wordpress.com/2013/06/16/social-media-how-many-people-use-twitter- and-what-do-we-think-about-it/ (accessed 11 Apr 2014).

Ipsos MORI, ‘Public blamed ECHR over the Home Secretary for Qatada delays’, 26 Apr 2013, www.ipsos-mori.com/ researchpublications/researcharchive/2964/Public-blamed- ECHR-over-the-Home-Secretary-for-Abu-Qatada-delays.aspx (accessed 24 Apr 2014).

Jeffares S, ‘Coding policy tweets’, paper presented to the social text analysis workshop, University of Birmingham, 28 Mar 2012.

Jordan W, ‘Public: ignore courts and deport Qatada’, YouGov, 26 Apr 2013, http://yougov.co.uk/news/2013/04/26/ brits-ignore-courts-and-deport-qatada/ (accessed 24 Apr 2014).

Krikorian R, ‘Map of a Twitter status object’, 18 Apr 2010, www.slaw.ca/wp-content/uploads/2011/11/mapof- a-tweet-copy.pdf (accessed 11 Apr 2014).

Leskovec J and Faloutsos C, ‘Sampling from large graphs’ in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, www.stat.cmu. edu/~fienberg/Stat36-835/Leskovec-sampling-kdd06.pdf (accessed 17 Apr 2012).

Leskovec J, Kleinberg J and Faloutsos C, ‘Graph evolution: densification and shrinking diameters’, Data 1, no 1, Mar 2007, www.cs.cmu.edu/~jure/pubs/powergrowth-tkdd.pdf (accessed 16 Apr 2012).

Madden M, ‘Privacy management on social media sites’, Pew Research Center, 2012, www.pewinternet.org/~/media//Files/Reports/2012/PIP_Privacy_management_on_social_media_sites_022412.pdf (accessed 11 Apr 2014).

Madonna GT and Young M, ‘The first political poll’, Politically Uncorrected, 18 Jun 2002, www.fandm.edu/politics/ politically-uncorrected-column/2002-politically-uncorrected/ the-first-political-poll (accessed 11 Apr 2014).

Marchetti-Bowick M and Chambers N, ‘Learning for microblogs with distant supervision: political forecasting with Twitter’ in Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, 2012, pp 603–12.

McPhedran C, ‘Pirate Party makes noise in German politics’, Washington Times, 10 May 2012, www.washingtontimes.com/ news/2012/may/10/upstart-party-making-noise-in-germanpolitics/? page=all (accessed 11 Apr 2014).

O’Connor B et al, ‘From tweets to polls: linking text sentiment to public opinion time series’ in Proceedings of the AAAI Conference on Weblogs and Social Media, 2010, pp 122–9.

Open Knowledge Foundation, ‘Open data – an introduction’, nd, http://okfn.org/opendata/ (accessed 11 Apr 2014).

Pak A and Paroubak P, ‘Twitter as a corpus for sentiment analysis and opinion mining’ in Proceedings of the Seventh International Conference on Language Resources and Evaluation, 2010.

Pew Research Center, The New Sick Man of Europe: The European Union, 2013, www.pewglobal.org/files/2013/05/Pew-Research- Center-Global-Attitudes-Project-European-Union-Report- FINAL-FOR-PRINT-May-13-2013.pdf (accessed 11 Apr 2014).

Pew Research Center, ‘The sick man of Europe: the European Union’, 13 May 2013, www.pewglobal.org/2013/05/13/the-newsick- man-of-europe-the-european-union/ (accessed 10 Apr 2014).

Postmes T and Brunsting S, ‘Collective action in the age of the internet: mass communication and online mobilization’, Social Science Computer Review 20, issue 3, 2002.

Rusmevichientong P et al, ‘Methods for sampling pages uniformly from the world wide web’ in Proceedings of the AAAI Fall Symposium on Using Uncertainty Within Computation, 2001, pp 121–8.

Schonberger VM and Cukier K, Big Data, London: John Murray, 2013.

Settles B, ‘Closing the loop: fast, interactive semi-supervised annotation with queries on features and instances’, Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2011, pp 1467–78.

Singer D, ‘Forget the 80/20 principle, with Twitter it is 79/7’, Social Media Today, 25 Feb 2010, http://socialmediatoday.com/ index.php?q=SMC/177538 (accessed 11 Apr 2014).

Slideshare, ‘Media measurement: social media trends by age and country’, 2011, www.slideshare.net/MML_Annabel/ media-measurement-social-media-trends-by-country-and-age (accessed 11 Apr 2014).

Sloam J, ‘Rebooting democracy: youth participation in politics in the UK’, Parliamentary Affairs, 60, 2007.

Social-media-prism, ‘The conversation’, nd, www.google.co.uk/ imgres?imgurl=http://spirdesign.no/wp-content/uploads/2010/ 11/social-media-prism.jpg&imgrefurl=http://spirdesign.no/ blog/webdesignidentitet-og-trender/attachment/social-mediaprism/& h=958&w=1024&sz=301&tbnid=EFQcS2D-zhOj8M:&tb nh=90&tbnw=96&zoom=1&usg=__VXussUcXEMznT42YLhgk 6kOsPIk=&docid=ho9_RAXkIYvcpM&sa=X&ei=9QBXUdeYOiJ0AXdyIHYAg& ved=0CEoQ9QEwAg&dur=47 (accessed 11 Apr 2014).

Tan C et al, ‘User-level sentiment analysis incorporating social networks’ in Proceedings of the 17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2011.

Tumasjan A et al, ‘Election forecasts with Twitter: how 140 characters reflect the political landscape’, Social Science Computer Review, 2010.

Twitter, ‘Terms of Service’, 2012, www.twitter.com/tos (accessed 11 Apr 2014).

Twitter, ‘Twitter Privacy Policy’, 2013, www.twitter.com/privacy (accessed 11 Apr 2014).

Vallina-Rodriguez N et al, ‘Los Twindignados: the rise of the Indignados Movement on Twitter’, in Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on Social Computing (SocialCom), www.cl.cam.ac.uk/~nv240/papers/ twindignados.pdf (accessed 11 Apr 2014).

van Biezen I, Mair P and Poguntke T, ‘Going, going… gone? The decline of party membership in contemporary Europe’, European Journal of Political Research 51, no 1, 2012, pp 24–56.

Wibberley S and Miller C, ‘Detecting events from Twitter: situational awareness in the age of social media’ in Hobbs C, Matthew M and Salisbury D, Open Source Intelligence in the Twenty-first Century: New approaches and opportunities, Palgrave Macmillan, 2014.

Wilson RE, Gosling SD and Graham LT, ‘A review of Facebook research in the social sciences’, Perspectives on Psychological Science 7, no 3, pp 203–20.

Zeng D et al, ‘Social media analytics and intelligence: guest editors’ introduction’, in Proceedings of the IEEE Computer Society, Nov–Dec 2010, p 13.

Posted: **Sun Jun 28, 2015 11:14 pm**

Demos -- Licence to Publish

The work (as defined below) is provided under the terms of this licence ('licence'). The work is protected by copyright and/or other applicable law. Any use of the work other than as authorized under this licence is prohibited. By exercising any rights to the work provided here, you accept and agree to be bound by the terms of this licence. Demos grants you the rights contained here in consideration of your acceptance of such terms and conditions.

1 Definitions

a 'Collective Work' means a work, such as a periodical issue, anthology or encyclopedia, in which the Work in its entirety in unmodified form, along with a number of other contributions, constituting separate and independent works in themselves, are assembled into a collective whole. A work that constitutes a Collective Work will not be considered a Derivative Work (as defined below) for the purposes of this Licence.

b 'Derivative Work' means a work based upon the Work or upon the Work and other preexisting works, such as a musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which the Work may be recast, transformed, or adapted, except that a work that constitutes a Collective Work or a translation from English into another language will not be considered a Derivative Work for the purpose of this Licence.

c 'Licensor' means the individual or entity that offers the Work under the terms of this Licence.

d 'Original Author' means the individual or entity who created the Work.

e 'Work' means the copyrightable work of authorship offered under the terms of this Licence.

f 'You' means an individual or entity exercising rights under this Licence who has not previously violated the terms of this Licence with respect to the Work, or who has received express permission from Demos to exercise rights under this Licence despite a previous violation.

2 Fair Use Rights

Nothing in this licence is intended to reduce, limit, or restrict any rights arising from fair use, first sale or other limitations on the exclusive rights of the copyright owner under copyright law or other applicable laws.

3 Licence Grant

Subject to the terms and conditions of this Licence, Licensor hereby grants You a worldwide, royalty-free, non-exclusive, perpetual (for the duration of the applicable copyright) licence to exercise the rights in the Work as stated below:

a to reproduce the Work, to incorporate the Work into one or more Collective Works, and to reproduce the Work as incorporated in the Collective Works;

b to distribute copies or phonorecords of, display publicly, perform publicly, and perform publicly by means of a digital audio transmission the Work including as incorporated in Collective Works;

The above rights may be exercised in all media and formats whether now known or hereafter devised. The above rights include the right to make such modifications as are technically necessary to exercise the rights in other media and formats. All rights not expressly granted by Licensor are hereby reserved.

4 Restrictions

The licence granted in Section 3 above is expressly made subject to and limited by the following restrictions:

a You may distribute, publicly display, publicly perform, or publicly digitally perform the Work only under the terms of this Licence, and You must include a copy of, or the Uniform Resource Identifier for, this Licence with every copy or phonorecord of the Work You distribute, publicly display, publicly perform, or publicly digitally perform. You may not offer or impose any terms on the Work that alter or restrict the terms of this Licence or the recipients’ exercise of the rights granted hereunder. You may not sublicence the Work. You must keep intact all notices that refer to this Licence and to the disclaimer of warranties. You may not distribute, publicly display, publicly perform, or publicly digitally perform the Work with any technological measures that control access or use of the Work in a manner inconsistent with the terms of this Licence Agreement. The above applies to the Work as incorporated in a Collective Work, but this does not require the Collective Work apart from the Work itself to be made subject to the terms of this Licence. If You create a Collective Work, upon notice from any Licensor You must, to the extent practicable, remove from the Collective Work any reference to such Licensor or the Original Author, as requested.

b You may not exercise any of the rights granted to You in Section 3 above in any manner that is primarily intended for or directed toward commercial advantage or private monetary compensation. The exchange of the Work for other copyrighted works by means of digital filesharing or otherwise shall not be considered to be intended for or directed toward commercial advantage or private monetary compensation, provided there is no payment of any monetary compensation in connection with the exchange of copyrighted works.

c If you distribute, publicly display, publicly perform, or publicly digitally perform the Work or any Collective Works, You must keep intact all copyright notices for the Work and give the Original Author credit reasonable to the medium or means You are utilizing by conveying the name (or pseudonym if applicable) of the Original Author if supplied; the title of the Work if supplied. Such credit may be implemented in any reasonable manner; provided, however, that in the case of a Collective Work, at a minimum such credit will appear where any other comparable authorship credit appears and in a manner at least as prominent as such other comparable authorship credit.

5 Representations, Warranties and Disclaimer

a By offering the Work for public release under this Licence, Licensor represents and warrants that, to the best of Licensor’s knowledge after reasonable inquiry:

i Licensor has secured all rights in the Work necessary to grant the licence rights hereunder and to permit the lawful exercise of the rights granted hereunder without You having any obligation to pay any royalties, compulsory licence fees, residuals or any other payments;

ii The Work does not infringe the copyright, trademark, publicity rights, common law rights or any other right of any third party or constitute defamation, invasion of privacy or other tortious injury to any third party.

b except as expressly stated in this licence or otherwise agreed in writing or required by applicable law, the work is licensed on an 'as is' basis, without warranties of any kind, either express or implied including, without limitation, any warranties regarding the contents or accuracy of the work.

6 Limitation on Liability

Except to the extent required by applicable law, and except for damages arising from liability to a third party resulting from breach of the warranties in section 5, in no event will licensor be liable to you on any legal theory for any special, incidental, consequential, punitive or exemplary damages arising out of this licence or the use of the work, even if licensor has been advised of the possibility of such damages.

7 Termination

a This Licence and the rights granted hereunder will terminate automatically upon any breach by You of the terms of this Licence. Individuals or entities who have received Collective Works from You under this Licence, however, will not have their licences terminated provided such individuals or entities remain in full compliance with those licences. Sections 1, 2, 5, 6, 7, and 8 will survive any termination of this Licence.

b Subject to the above terms and conditions, the licence granted here is perpetual (for the duration of the applicable copyright in the Work). Notwithstanding the above, Licensor reserves the right to release the Work under different licence terms or to stop distributing the Work at any time; provided, however that any such election will not serve to withdraw this Licence (or any other licence that has been, or is required to be, granted under the terms of this Licence), and this Licence will continue in full force and effect unless terminated as stated above.

8 Miscellaneous

a Each time You distribute or publicly digitally perform the Work or a Collective Work, Demos offers to the recipient a licence to the Work on the same terms and conditions as the licence granted to You under this Licence.

b If any provision of this Licence is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this Licence, and without further action by the parties to this agreement, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable.

c No term or provision of this Licence shall be deemed waived and no breach consented to unless such waiver or consent shall be in writing and signed by the party to be charged with such waiver or consent.

d This Licence constitutes the entire agreement between the parties with respect to the Work licensed here. There are no understandings, agreements or representations with respect to the Work not specified here. Licensor shall not be bound by any additional provisions that may appear in any communication from You. This Licence may not be modified without the mutual written agreement of Demos and You.

© Demos 2014

Over the last decade European citizens have gained a digital voice. Close to 350 million people in Europe currently use social networking sites, with more of us signing into a social media platform at least once a day than voted in the last European elections. EU citizens have transferred many aspects of their lives onto these social media platforms, including politics and activism. Taken together, social media represent a new digital commons where people join their social and political lives to those around them.

This paper examines the potential of listening to these digital voices on Twitter, and the consequences for how EU leaders apprehend, respond to and thereby represent their citizens. It looks at how European citizens use Twitter to discuss issues related to the EU and how their digital attitudes and views evolve in response to political and economic crises. It also addresses the many formidable challenges that this new method faces: how far it can be trusted, when it can be used, the value such use could bring and how its use can be publicly acceptable and ethical.

We have never before had access to the millions of voices that together form society’s constant political debate, nor the possibility of understanding them. This report demonstrates how capturing and understanding these citizen voices potentially offers a new way of listening to people, a transformative opportunity to understand what they think, and a crucial opportunity to close the democratic deficit.

Jamie Bartlett is Director of the Centre for the Analysis of Social Media (CASM) at Demos. Carl Miller is Research Director at CASM. David Weir is a Professor of Computer Science at the University of Sussex. Jeremy Reffin and Simon Wibberley are Research Fellows in the Department of Informatics at the University of Sussex.

ISBN 978-1-909037-63-2 £10
