Abstract: FR-PO0024
Accuracy of Large Language Model Chatbots for Hemodialysis Meal Planning
Session Information
- Artificial Intelligence and Digital Health at the Bedside
November 07, 2025 | Location: Exhibit Hall, Convention Center
Abstract Time: 10:00 AM - 12:00 PM
Category: Artificial Intelligence, Digital Health, and Data Science
- 300 Artificial Intelligence, Digital Health, and Data Science
Authors
- Shi, Kevin Xin, University of California San Francisco, San Francisco, California, United States
- Hamdan, Hiba, University of California Davis, Davis, California, United States
- Cheng, Elizabeth, University of California Berkeley, Berkeley, California, United States
- Tuot, Delphine S., University of California San Francisco, San Francisco, California, United States
Background
Nutritional counseling is a key part of care for patients on hemodialysis. Personalized and feasible counseling is challenging, however, because dietary habits are shaped by factors such as cultural background and budget. Large language model (LLM) chatbots could improve nutritional counseling, but their accuracy in this context is unknown.
Methods
Four LLMs, ChatGPT-o3-mini (OpenAI), Claude Sonnet 3.7 (Anthropic), Gemini 2.5 Flash Thinking Experimental (Google), and Llama 3.1 (Meta), were each asked to create a culturally concordant one-day meal plan with specified portions and nutrients (calories, protein, fiber, calcium, phosphorus, potassium, and sodium) for 50 simulated hemodialysis patients generated from national US demographic, biometric, and socioeconomic data. The nutrient content stated in the LLM meal plans was compared with values from validated nutrition databases (e.g. USDA, AUSNUT).
Results
The simulated population had a mean age of 63 years; 58% had diabetes, 82% had fixed incomes, and 60% had ethnic cuisine preferences. The stated nutritional content of chatbot meal plans was generally inaccurate. The nutrient components (e.g. calories, protein) of most LLM meal plans fell outside a 10% error margin relative to reference values more than 50% of the time (Table). Accuracy was worse for micronutrients than for macronutrients. All LLMs underestimated the phosphorus and potassium content of meal plans (Figure).
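The accuracy criterion reported here (a stated value counts as accurate when it falls within 10% of the reference value) can be illustrated with a minimal sketch. The abstract does not describe the authors' actual implementation; the function names, the handling of a zero reference value, and the sample potassium figures below are all hypothetical and chosen only for illustration.

```python
def within_margin(stated, reference, margin=0.10):
    """Return True if `stated` lies within +/- `margin` (relative) of `reference`."""
    if reference == 0:
        # Assumption: with a zero reference, only an exact zero counts as accurate.
        return stated == 0
    return abs(stated - reference) / abs(reference) <= margin


def accuracy(pairs, margin=0.10):
    """Fraction of (stated, reference) pairs that fall within the margin."""
    hits = sum(within_margin(s, r, margin) for s, r in pairs)
    return hits / len(pairs)


# Hypothetical (LLM-stated, database-reference) potassium values in mg.
# These numbers are illustrative only and are not taken from the study.
potassium_mg = [(1900, 2400), (2050, 2100), (1500, 2300), (2500, 2450)]

print(f"Within 10%: {accuracy(potassium_mg):.0%}")
```

In this sketch, each percentage in the table would correspond to `accuracy` computed over one LLM's meal plans for one nutrient.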
Conclusion
LLMs failed to generate nutritionally accurate meal plans for hemodialysis patients. Future work should focus on directing model searches toward validated data sources.
Accuracy (within 10%) of LLM Outputs
| LLM | Calories | Protein | Fiber | Calcium | Phosphorus | Potassium | Sodium |
| ChatGPT | 42% | 52% | 28% | 9% | 24% | 41% | 18% |
| Claude | 38% | 34% | 24% | 8% | 12% | 10% | 16% |
| Gemini | 28% | 50% | 32% | 10% | 12% | 8% | 24% |
| Llama | 12% | 28% | 26% | 36% | 6% | 20% | 8% |
Funding
- Other NIH Support