240117 Elda Laïson

Identification of potential Lyme disease cases using self-reported worldwide tweets: A deep learning modeling approach enhanced with sentimental words via emojis.

Speaker: Elda Laïson, University of Montreal

Date and Time: Wednesday, January 17, 2024 - 2:00pm to 3:00pm

Abstract

Background: Effective surveillance for Lyme disease, a disease commonly transmitted by ticks worldwide, necessitates prompt medical diagnosis and precise laboratory testing. Web-based data sources could be used to enhance surveillance.

Objective: To better understand Twitter's potential and limitations as a tool for Lyme disease surveillance, we evaluate data from Twitter users worldwide. We also propose a transformer-based classification method that uses self-reported tweets to identify potential Lyme disease cases.

Methods: From a database containing over 1.3 million tweets about Lyme disease, we selected an initial sample of 20,000 tweets from around the world. After preprocessing and geolocating the tweets, we manually classified a portion of the sample as either potential Lyme disease cases or not, using carefully chosen terms. To account for emoji use, we converted the emojis in these tweets to sentiment words and substituted them into the tweets where needed. The DistilBERT, ALBERT, and BERTweet classifiers were then trained, validated, and tested on this set of labeled tweets.
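The emoji-to-sentiment-word step can be illustrated with a minimal sketch. The abstract does not give the study's actual mapping or tooling, so the EMOJI_SENTIMENT table and the replace_emojis function below are hypothetical and only show the general idea of swapping each emoji for a word before the tweet reaches a classifier.

```python
import re

# Hypothetical emoji-to-sentiment-word lookup (an assumption for illustration;
# the mapping used in the study is not given in the abstract).
EMOJI_SENTIMENT = {
    "😢": "sad",
    "🙏": "hopeful",
    "💪": "encouraging",
    "😊": "happy",
}


def replace_emojis(text, mapping=EMOJI_SENTIMENT):
    """Replace each known emoji in `text` with its sentiment word."""
    # Try longer keys first so multi-character emojis win over their prefixes.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Pad each replacement with spaces so it does not fuse with nearby words.
    replaced = pattern.sub(lambda m: f" {mapping[m.group(0)]} ", text)
    # Collapse the extra whitespace introduced by the padding.
    return " ".join(replaced.split())


tweet = "Diagnosed with Lyme last week 😢 staying strong 💪"
print(replace_emojis(tweet))
```

Emojis missing from the table are left untouched in this sketch; a real pipeline would need a policy for them (drop, keep, or map to a neutral token).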

Results: BERTweet was the best of the examined models, with the highest average F1-score (89.3%), classification accuracy (90.0%), and precision (97.1%). Recall, however, was better for TF-IDF and k-NN, at 93.2% and 82.6%, respectively. Emoji enrichment of the tweet embeddings increased recall for BERTweet by 8%; DistilBERT reached an F1-score of 93.8% (+4%) and a classification accuracy of 94.1% (+4%), whereas ALBERT reached an F1-score of 93.1% (+5%) and a classification accuracy of 93.9% (+5%).
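For readers less familiar with the reported metrics, the sketch below shows how precision, recall, F1-score, and accuracy follow from the counts of a binary confusion matrix. The function name and the example counts are illustrative assumptions, not the study's data, and the abstract does not say how its "average" F1-score was averaged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute binary classification metrics from confusion-matrix counts.

    tp: potential cases correctly flagged    fp: non-cases wrongly flagged
    fn: potential cases missed               tn: non-cases correctly rejected
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because recall depends on false negatives, a model can combine high precision with lower recall, which is the pattern reported for BERTweet before emoji enrichment.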

Conclusions: The study highlights the robustness of BERTweet and DistilBERT as classifiers for potential cases of Lyme disease from self-reported data. The results show that emojis are effective for enriching features, thereby improving the accuracy of tweet embeddings and the performance of the classifiers. Specifically, emojis that reflect sadness, empathy, and encouragement can reduce false negatives.

Series: 2023-2024 MfPH Next Generation Seminar Series