Kidney Week

Abstract: PO0910

Machine Learning Classification of Tweets for Patient Dialysis Experience

Session Information

Category: Dialysis

  • 701 Dialysis: Hemodialysis and Frequent Dialysis


  • Leidner, Alexander S., Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States
  • Gay, Hawkins, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States
  • Ho, Bing, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States

Popular microblogging services (e.g., Twitter, Facebook) provide a continuous stream of public health information. These data have been used to monitor viral spread, medication adherence, and false health information. Thousands of posts appear on Twitter daily regarding personal dialysis experience, access, and side effects. While these posts contain valuable public health information, using them to meaningfully assist dialysis patients is difficult because even more tweets mention dialysis in a professional context. We aimed to modify a state-of-the-art natural language model to classify posts about dialysis as personal or professional.


We filtered Twitter for posts containing the word "dialysis". Posts were manually labeled as personal or professional by a nephrologist, depending on the context in which dialysis was mentioned. The data were randomized and split into 60% training, 20% validation, and 20% testing sets. The text was preprocessed to remove extraneous characters and then used as input to two classifiers: a Bidirectional Encoder Representations from Transformers (BERT) model, which was fine-tuned, and a Multinomial Naive Bayes classifier operating on term frequency-inverse document frequency (TF-IDF) vectors.
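The preprocessing, 60/20/20 split, and TF-IDF Naive Bayes baseline described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the cleaning rules, the smoothed IDF formula, the smoothing constant `alpha`, and the toy tweets are all invented for the example, and the BERT fine-tuning step is omitted.

```python
import math
import random
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase, drop URLs, and replace non-alphanumeric characters with spaces."""
    text = re.sub(r"https?://\S+", " ", text.lower())
    return re.sub(r"[^a-z0-9\s]", " ", text).split()


def split_60_20_20(items, seed=0):
    """Shuffle, then split into 60% training, 20% validation, 20% testing."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10
    n_val = len(items) * 2 // 10
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


class TfidfNaiveBayes:
    """Multinomial Naive Bayes over TF-IDF weighted token counts."""

    def fit(self, docs, labels, alpha=1.0):
        n = len(docs)
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        # Smoothed IDF: an assumed variant, the abstract does not specify one.
        self.idf = {t: math.log((1 + n) / (1 + df)) + 1 for t, df in doc_freq.items()}
        self.classes = sorted(set(labels))
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        # Accumulate the TF-IDF mass of every token within each class.
        weights = {c: defaultdict(float) for c in self.classes}
        for doc, label in zip(docs, labels):
            for tok, tf in Counter(doc).items():
                weights[label][tok] += tf * self.idf[tok]
        # Additive (Laplace-style) smoothing over the training vocabulary.
        self.log_like = {}
        for c in self.classes:
            total = sum(weights[c].values()) + alpha * len(self.idf)
            self.log_like[c] = {
                t: math.log((weights[c].get(t, 0.0) + alpha) / total) for t in self.idf
            }
        return self

    def predict(self, doc):
        # Tokens never seen in training are ignored.
        feats = {t: tf * self.idf[t] for t, tf in Counter(doc).items() if t in self.idf}

        def score(c):
            return self.log_prior[c] + sum(w * self.log_like[c][t] for t, w in feats.items())

        return max(self.classes, key=score)


# Toy, invented examples; real input would be the nephrologist-labeled tweets.
toy_tweets = [
    ("My dialysis session today left me so tired", "personal"),
    ("I hate my fistula pain after dialysis", "personal"),
    ("New study on dialysis outcomes published", "professional"),
    ("Conference abstract: dialysis trial results", "professional"),
]
docs = [clean(text) for text, _ in toy_tweets]
labels = [label for _, label in toy_tweets]
model = TfidfNaiveBayes().fit(docs, labels)
print(model.predict(clean("So tired after my #dialysis!! https://t.co/x")))
```

In practice, the model would be fit on the training portion returned by `split_60_20_20`, tuned on the validation portion, and scored once on the held-out testing portion.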


We collected 6011 tweets from May 3, 2021 to May 14, 2021. Of these, 1000 tweets were randomized and labeled; 57% were categorized as professional. The BERT and Naive Bayes models attained 88% and 82% accuracy, respectively, on the testing data. The BERT model produced far fewer false negatives, with a small increase in false positives (Figure 1).
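The accuracy, false negative, and false positive figures above all derive from a confusion matrix. The sketch below shows how they could be computed from true and predicted labels. It assumes "personal" is the positive class, which the abstract does not state, and the label lists are invented for illustration.

```python
def confusion_counts(y_true, y_pred, positive="personal"):
    """Count true/false positives and negatives for a binary labeling."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts


def accuracy(counts):
    """Fraction of all predictions that match the true label."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


# Invented labels, not the study's data.
y_true = ["personal", "personal", "professional", "professional", "personal"]
y_pred = ["personal", "professional", "professional", "personal", "personal"]
counts = confusion_counts(y_true, y_pred)
print(counts, accuracy(counts))
```

For scale, if the 20% testing split held 200 of the 1000 labeled tweets, 88% and 82% accuracy would correspond to roughly 176 and 164 correctly classified tweets, respectively.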


BERT's semantically rich word embeddings can enhance social media mining algorithms applied to dialysis content. We show the superiority of a BERT model over a traditional count-based language model. This method can be easily applied as a preprocessing step that removes noisy posts, allowing better study of dialysis and other health trends in social media. This novel processing task and pipeline have broad clinical and public health implications for reducing the amount of data and time required for accurate, real-time monitoring of patient-level posts.