In this tutorial, we conducted an exploratory toxicity analysis of a Twitter dataset consisting of 298,172 replies to Donald Trump's tweet announcing his positive COVID-19 diagnosis. We found that most of the toxicity scores produced by Perspective API are highly correlated. We also found that tweets that are blocked by Twitter have, on average, higher toxicity scores than (1) tweets that are still publicly available on the platform or (2) tweets posted by suspended or protected accounts. Interestingly, there was no clear connection between posting toxic tweets and an account being suspended by Twitter.

The aim of this tutorial is to introduce users to a few analyses that can be performed with Communalytic and a Python library called twarc. This post will also show users how to explore the potential relationship between the toxicity scores of individual tweets and the likelihood that a tweet will still be publicly available days after the original data collection. As this is a tutorial, please note that any findings noted in this post are for illustrative purposes only.

President Donald Trump tweeted that he and the First Lady of the United States (FLOTUS), Melania Trump, had both tested positive for COVID-19 (see Figure 1). Within seconds, his tweet received thousands of replies on Twitter.

Figure 1: Donald Trump's tweet announcing his COVID-19 diagnosis

To collect and analyze this Twitter thread, we used Communalytic, a research tool for studying online communities and online discourse. Communalytic can collect and analyze public data from various social media platforms including Reddit, Twitter, and Facebook/Instagram (via CrowdTangle). It uses advanced text and social network analysis techniques to automatically pinpoint toxic and anti-social interactions, identify influencers, map shared interests and the spread of misinformation, and detect signs of possible coordination among seemingly disparate actors.

In this case, our dataset consists of 298,172 replies to Donald Trump's tweet announcing his COVID-19 diagnosis, posted on October 2nd between 6:00am and 12:30pm (ET). Each tweet in the dataset includes a number of metadata attributes provided by the Twitter API, such as who posted it and when, in what language, and how many other users engaged with it (by retweeting or favoriting it). (For information about how to collect Twitter data with Communalytic, see our online tutorial; for the full list of available metadata elements for a Tweet, see the Twitter API documentation.)

Toxicity Analysis with Perspective API

Communalytic uses Google's Perspective API to calculate and assign toxicity scores. This API relies on machine learning models to score the perceived impact a post might have on a conversation and assigns toxicity scores to each post.

Currently, Perspective API can analyze posts in one of 7 different languages (English, Spanish, French, German, Portuguese, Italian, Russian). When you run this analysis in Communalytic, you will need to specify the primary language of the majority of posts in your dataset. Since the majority of posts in our dataset (83%, 247,975 of 298,172) were written in English (as detected by Twitter), we will select English in the drop-down menu in Communalytic. (If the bulk of the posts in your dataset are not in one of these 7 languages, you should not run a toxicity analysis on your data, as the toxicity scores it generates will be highly unreliable. See our online tutorial on how to run a Perspective API toxicity analysis in Communalytic.)

Once the tweets in the thread were collected, Communalytic assigned the following toxicity scores/attributes (with values ranging from 0 to 1) to each tweet in the dataset, as calculated by Perspective:

Toxicity score represents the degree to which the comment is rude, disrespectful, or unreasonable.
Severe Toxicity score represents how hateful, aggressive, and disrespectful the comment is.
Profanity score indicates whether swear words or other profane language is used.
Identity Attack score indicates whether a post contains hateful language targeting someone because of their identity.
Insult score helps to identify insulting or inflammatory posts.
Threat score represents the degree to which a post displays an intention to inflict pain or violence against an individual or group.
Sexually Explicit score indicates whether a post contains references to sexual acts, body parts, or other lewd content.
Flirtation score indicates whether a post contains language commonly used in pickup lines, compliments regarding appearance, or subtle sexual innuendo.

For more information about how these scores are calculated and evaluated, see the Perspective API documentation.
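For readers curious about what a Perspective API toxicity request might look like outside Communalytic, here is a minimal Python sketch. It is not how Communalytic itself is implemented. It only builds a request body for the attributes discussed in this post and reads the summary scores out of a response. The helper names (build_request, extract_scores) and the sample response values are made up for illustration, and no network call is made. You should check the Perspective API documentation for the current endpoint and for which attributes are supported in each language.

```python
import json

# Attribute names in the form Perspective API requests use.
# Assumption: availability of each attribute varies by language,
# so confirm against the Perspective API documentation.
ATTRIBUTES = [
    "TOXICITY",
    "SEVERE_TOXICITY",
    "PROFANITY",
    "IDENTITY_ATTACK",
    "INSULT",
    "THREAT",
    "SEXUALLY_EXPLICIT",
    "FLIRTATION",
]


def build_request(text, language="en"):
    """Build the JSON body for a comments:analyze request (hypothetical helper)."""
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {name: {} for name in ATTRIBUTES},
    }


def extract_scores(response):
    """Map each returned attribute to its summary score (a value from 0 to 1)."""
    scores = {}
    for name, detail in response.get("attributeScores", {}).items():
        scores[name] = detail["summaryScore"]["value"]
    return scores


# Made-up response in the shape the API returns; the values are not real results.
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.91, "type": "PROBABILITY"}},
        "INSULT": {"summaryScore": {"value": 0.75, "type": "PROBABILITY"}},
    },
    "languages": ["en"],
}

# The body below would be sent as JSON in a POST request to the API.
print(json.dumps(build_request("example reply text"), indent=2))
print(extract_scores(sample_response))
```

Applied to every tweet in a dataset, the dictionary returned by extract_scores gives one row of scores per tweet, which is the kind of table needed to compare the attributes with one another.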