
Evaluating ChatGPT responses on thyroid nodules for patient education.


Basic Information

DOI: 10.1089/thy.2023.0491
Publication year: 2023
Journal: Thyroid : official journal of the American Thyroid Association
Corresponding author: Elizabeth Cottrill
Authors: Daniel J Campbell; Leonard E Estephan; Elliott Sina; Eric V Mastrolonardo; Rahul Alapati; Dev R Amin; Elizabeth Cottrill
Source link: PubMed detail page

Abstract

BACKGROUND: ChatGPT, an artificial intelligence (AI) chatbot, is the fastest-growing consumer application in history. Given recent trends identifying increasing patient use of Internet sources for self-education, we sought to evaluate the quality of ChatGPT-generated responses for patient education on thyroid nodules.

METHODS: ChatGPT was queried 4 times with 30 identical questions. Queries differed by initial chatbot prompting: no prompting, patient-friendly prompting, 8th-grade-level prompting, and prompting for references. Answers were scored on a hierarchical scale: incorrect, partially correct, correct, or correct with references. Proportions of responses at incremental score thresholds were compared by prompt type using chi-squared analysis. The Flesch-Kincaid grade level was calculated for each answer, and the relationship between prompt type and grade level was assessed using analysis of variance. References provided within ChatGPT answers were totaled and analyzed for veracity.

RESULTS: Across all prompts (n=120 questions), 83 answers (69.2%) were at least correct. Proportions of responses that were at least partially correct (p=0.795) and correct (p=0.402) did not differ by prompt; responses that were correct with references did (p<0.0001). Responses from 8th-grade-level prompting had the lowest mean grade level (13.43 ± 2.86), significantly lower than no prompting (14.97 ± 2.01, p=0.01) and prompting for references (16.43 ± 2.05, p<0.0001). Prompting for references generated referenced publications within 80/80 (100%) of answers. Seventy references (87.5%) were legitimate citations, and 58/80 (72.5%) accurately reported information from the referenced publications.

CONCLUSION: ChatGPT overall provides appropriate answers to most questions on thyroid nodules regardless of prompting. Despite targeted prompting strategies, ChatGPT reliably generates responses corresponding to grade levels well above accepted recommendations for presenting medical information to patients. Significant rates of AI hallucination may preclude clinicians from recommending the current version of ChatGPT as an educational tool for patients at this time.
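The readability scores reported above come from the standard Flesch-Kincaid grade-level formula: 0.39 × (words/sentence) + 11.8 × (syllables/word) − 15.59. As a minimal sketch of how such a score is computed (the syllable counter below is a crude vowel-group heuristic for illustration only, not the tool the authors used):

```python
def count_syllables(word: str) -> int:
    """Rough syllable estimate via vowel groups (illustrative heuristic)."""
    word = word.lower().strip(".,;:!?()\"'")
    vowels = "aeiouy"
    count, prev_vowel = 0, False
    for ch in word:
        is_vowel = ch in vowels
        if is_vowel and not prev_vowel:
            count += 1
        prev_vowel = is_vowel
    # Drop a trailing silent 'e' (e.g., "nodule" -> 2, not 3).
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade = 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    sentences = max(text.count(".") + text.count("!") + text.count("?"), 1)
    words = text.split()
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59
```

A grade level of 13-16, as observed across the prompting conditions, corresponds to college-level text, well above the 6th- to 8th-grade level typically recommended for patient-facing material.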
