
Understanding Covid-19 Pandemic through Social Media Discussion

Submission Number: 130
Submission ID: 228
Submission UUID: b482ba13-6da9-4325-a459-2fffd9c0c277
Submission URI: /form/project

Created: Sun, 12/05/2021 - 13:17
Completed: Sun, 12/05/2021 - 13:17
Changed: Tue, 08/09/2022 - 15:16

Remote IP address: 74.103.220.121
Submitted by: Gaurav Khanna
Language: English

Is draft: No
Webform: Project
Understanding Covid-19 Pandemic through Social Media Discussion
CAREERS
covid.jpeg
ai (271), data-analysis (422), natural-language-processing (274), programming (5), programming-best-practices (49), python (69)
Complete

Project Leader

Suhong Li
{Empty}
{Empty}

Project Personnel

Suhong Li
Brenna Rojek
{Empty}

Project Information

Dr. Li has been collecting covid-19 tweets since March 2020 and currently has about 1.2 billion tweets. Collection is ongoing, so the dataset is expected to grow. This project focuses on understanding the impact of the covid-19 pandemic through social media discussion on Twitter. The following questions will be explored: 1) What are the top topics discussed regarding covid-19, and how has the discussion of these topics changed over time? 2) What is the sentiment/emotion associated with each topic by time, location, and gender? 3) How can misinformation/fake news about covid-19 be identified?

The student will work on this project from start to finish using a range of data analytics methods, including data exploration, topic modelling, natural language processing, and machine learning.

Project Information Subsection

Bryant University
{Empty}
CR-University of Rhode Island
{Empty}
No
Already behind
3
Start date is flexible
{Empty}
03/09/2022
07/20/2022
  • Milestone Title: Milestone #1
    Milestone Description: Student learns about NLP and the other required libraries/packages; gives the launch presentation; sets up the GitHub repo.
    Completion Date Goal: 2022-03-01
    Actual Completion Date: 2022-03-01
  • Milestone Title: Milestone #2
    Milestone Description: Student reviews the Twitter data set and formats the data for use by the data mining, NLP, and other analysis software.
    Completion Date Goal: 2022-04-01
    Actual Completion Date: 2022-04-01
  • Milestone Title: Milestone #3
    Milestone Description: Student performs extensive analysis of the formatted data using ML techniques. The following trends will be studied:
    What are the top topics/themes discussed regarding covid-19? How have top topics changed by time, location, and gender?
    What is the sentiment/emotion of the topic by time, location, and gender?
    Completion Date Goal: 2022-06-01
    Actual Completion Date: 2022-06-01
  • Milestone Title: Milestone #4
    Milestone Description: Student works with faculty to interpret the results and writes a short report.
    Completion Date Goal: 2022-07-01
    Actual Completion Date: 2022-07-20
  • Milestone Title: Milestone #5
    Milestone Description: The student presents the results in a Zoom "wrap" presentation and contributes the developed code, scripts, and documentation to the GitHub repo.
    Completion Date Goal: 2022-07-31
    Actual Completion Date: 2022-07-20

Final Report

This project focuses on understanding the impact of the covid-19 pandemic through social media discussion on Twitter. It explores a dataset of over 13 million tweets containing keywords related to covid-19 and either ‘vaccine’ or ‘vax’, spanning March 2020 to February 2022. Due to the size of the data, the analysis was done on the Unity cluster. Various analyses, including topic modelling and emotion analysis, were conducted to understand how the topic of the vaccine was discussed on Twitter, how the discussion changed over time, and what people’s emotions regarding this topic are and how they differ by time and location.
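
The keyword selection described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes the tweets are already covid-19 related, that each tweet is a dict with a `text` field, and that matching is a case-insensitive substring search. The names `VACCINE_PATTERN`, `mentions_vaccine`, and `filter_vaccine_tweets` are hypothetical.

```python
import re

# Assumed matching rule: the report names the keywords 'vaccine' and 'vax'
# but does not say how they were matched; substring search is a guess.
VACCINE_PATTERN = re.compile(r"vaccine|vax", re.IGNORECASE)


def mentions_vaccine(text: str) -> bool:
    """Return True if the text contains 'vaccine' or 'vax' (any case)."""
    return VACCINE_PATTERN.search(text) is not None


def filter_vaccine_tweets(tweets):
    """Lazily yield matching tweets so a large corpus is never fully in memory."""
    for tweet in tweets:
        if mentions_vaccine(tweet.get("text", "")):
            yield tweet


# Made-up sample records, for illustration only.
sample = [
    {"id": 1, "text": "Got my COVID vaccine today"},
    {"id": 2, "text": "Lockdown extended again"},
    {"id": 3, "text": "Anti-vax rally downtown"},
]
kept = [t["id"] for t in filter_vaccine_tweets(sample)]
print(kept)
```

Writing the filter as a generator means it can be chained directly onto a file reader, which matters when the input is far larger than memory.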

The project explores the possibilities and challenges of running state-of-the-art natural language processing algorithms on a large data set using HPC.
This project contributes to our knowledge in the fields of psychology and health care. Its results will provide insight into people’s attitudes and emotions toward covid-19 vaccination and how those emotions differ by time and location. These findings help in understanding the psychological impact of the pandemic and may facilitate the adoption of covid-19 vaccination.
None
{Empty}
None
None
None
As mentioned previously, the project is timely and will deepen our understanding of the impact of the covid-19 pandemic by identifying the dominant topics discussed and the emotions people associate with them.
The student (Brenna Rojek) working on this project learned state-of-the-art natural language processing algorithms and how to use a GPU cluster. Due to the large data size, processing all the data takes a very long time (more than one week). A better approach needs to be developed in the future so that the processing scales to data of this size.
Four emotions (joy, optimism, sadness, and anger) were extracted from each tweet using the Cardiff NLP emotion model from Hugging Face. The results show that the dominant emotions regarding covid-19 are anger and sadness. In addition, people’s emotions toward covid-19 vaccination change over time: anger in the discussion of covid-19 vaccination has increased substantially since August 2021. Some states (Arizona, Wyoming, and Florida) also show a higher level of anger than other states.
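
Once each tweet carries an emotion label, comparing emotions by time or location comes down to computing the share of each emotion within a group. The sketch below shows one way this could be done. It assumes the input is a sequence of `(group, emotion)` pairs, where the group is a month string or a state name. The function name `emotion_shares` and the sample records are hypothetical and are not project data or results.

```python
from collections import Counter, defaultdict

# The four labels named in the report.
EMOTIONS = ("anger", "joy", "optimism", "sadness")


def emotion_shares(records):
    """Map each group (e.g. 'YYYY-MM' or a state) to the fraction of tweets per emotion."""
    counts = defaultdict(Counter)
    for group, emotion in records:
        counts[group][emotion] += 1

    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        # Counter returns 0 for emotions that never occurred in this group.
        shares[group] = {e: counter[e] / total for e in EMOTIONS}
    return shares


# Made-up labels for illustration only; these are not the project's findings.
labeled = [
    ("2021-07", "joy"), ("2021-07", "anger"),
    ("2021-08", "anger"), ("2021-08", "anger"),
    ("2021-08", "sadness"), ("2021-08", "optimism"),
]
shares = emotion_shares(labeled)
print(shares["2021-08"])
```

Using shares rather than raw counts keeps months or states with very different tweet volumes comparable.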

https://public.tableau.com/app/profile/brenna.rojek/viz/shared/KYCRFGDWT