Kidney Week

Abstract: FR-PO0024

Accuracy of Large Language Model Chatbots for Hemodialysis Meal Planning

Session Information

Category: Artificial Intelligence, Digital Health, and Data Science

  • 300 Artificial Intelligence, Digital Health, and Data Science

Authors

  • Shi, Kevin Xin, University of California San Francisco, San Francisco, California, United States
  • Hamdan, Hiba, University of California Davis, Davis, California, United States
  • Cheng, Elizabeth, University of California Berkeley, Berkeley, California, United States
  • Tuot, Delphine S., University of California San Francisco, San Francisco, California, United States

Background

For hemodialysis patients, nutritional counseling is key. Personalized and feasible nutrition counseling is challenging, as dietary habits are shaped by factors such as cultural background and budget. Large language model (LLM) chatbots can potentially improve nutritional counseling, but the accuracy of these tools in this context is unknown.

Methods

Four LLMs, ChatGPT-o3-mini (OpenAI), Claude Sonnet 3.7 (Anthropic), Gemini 2.5 Flash Thinking Experimental (Google), and Llama 3.1 (Meta), were asked to make a culturally concordant one-day meal plan with specified portions and nutrients (calories, protein, fiber, calcium, phosphorus, potassium, and sodium) for 50 simulated hemodialysis patients generated from national US demographic, biometric, and socio-economic data. The nutrient content of LLM meal plans was compared with values from validated nutrition databases (e.g., USDA, AUSNUT).

Results

The simulated population had a mean age of 63 years; 58% had diabetes, 82% had fixed incomes, and 60% had ethnic cuisine preferences. The stated nutritional content of chatbot meal plans was generally inaccurate. The nutrient components (e.g., calories, protein) of most LLM meal plans fell outside a 10% error margin relative to reference values more than 50% of the time (Table). Accuracy was worse for micronutrients than for macronutrients. All LLMs underestimated phosphorus and potassium content in meal plans (Figure).
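
The 10% criterion can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes accuracy means the absolute relative error against the database reference is at most 10%, and that underestimation shows up as a negative mean signed error. The function names and the potassium values below are hypothetical and are not study data.

```python
from statistics import mean


def relative_error(stated: float, reference: float) -> float:
    """Signed relative error; negative means the chatbot understated the nutrient."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (stated - reference) / reference


def within_margin(stated: float, reference: float, margin: float = 0.10) -> bool:
    """True if the stated value is within the assumed relative margin of the reference."""
    return abs(relative_error(stated, reference)) <= margin


def accuracy(pairs, margin: float = 0.10) -> float:
    """Share of (stated, reference) pairs that fall within the margin."""
    return mean(1.0 if within_margin(s, r, margin) else 0.0 for s, r in pairs)


def mean_signed_error(pairs) -> float:
    """Average signed relative error; below zero suggests systematic underestimation."""
    return mean(relative_error(s, r) for s, r in pairs)


# Hypothetical (chatbot-stated, database-reference) potassium values in mg.
potassium_pairs = [(1800, 2400), (2000, 2100), (1500, 2250), (2300, 2350)]

print(f"accuracy within 10%: {accuracy(potassium_pairs):.0%}")
print(f"mean signed error: {mean_signed_error(potassium_pairs):+.1%}")
```

Reporting the signed error alongside the within-margin rate separates two questions: how often a stated value is close enough, and in which direction the misses tend to fall.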

Conclusion

LLMs failed to generate nutritionally accurate meal plans for hemodialysis patients. Future work should focus model searches on validated data sources.

Accuracy (within 10%) of LLM Outputs

          Calories  Protein  Fiber  Calcium  Phosphorus  Potassium  Sodium
ChatGPT   42%       52%      28%    9%       24%         41%        18%
Claude    38%       34%      24%    8%       12%         10%        16%
Gemini    28%       50%      32%    10%      12%         8%         24%
Llama     12%       28%      26%    36%      6%          20%        8%

Funding

  • Other NIH Support
