Trial Outcomes & Findings for Artificial Intelligent Clinical Decision Support System Simulation Center Study for Technology Acceptance (NCT05816473)

NCT ID: NCT05816473

Last Updated: 2026-05-22

Results Overview

The study used a common set of dependent variables to assess baseline and post-intervention attitudes towards machine learning algorithms in clinical care. Attitudes were measured with an adapted Unified Theory of Acceptance and Use of Technology (UTAUT) survey assessing perceived usefulness of the system, perceived ease of use, attitudes towards using it, behavioral intentions, and trust, each rated on a 5-point Likert scale. The measure is the percent change in UTAUT survey response, compared between the Large Language Model-based Interaction and Machine Learning Dashboard arms, from recruitment (before the scenarios were administered) to immediately after the scenarios were completed. The two time points are approximately 60 minutes apart. A higher change indicates greater acceptance of, or intention to use, the GutGPT+Dashboard.

Recruitment status

COMPLETED

Study phase

Not applicable

Target enrollment

108 participants

Primary outcome timeframe

Approximately 60 minutes

Results posted on

2026-05-22

Participant Flow

Participant milestones

Large Language Model-based Interaction: An LLM-powered chatbot paired with the machine learning dashboard. The chatbot provides the risk assessment and a rationale based on the interpretability metrics from the dashboard, and study participants can interact with it directly in natural language. Participants were provided the Generative Pre-trained Transformer (GPT) chatbot-powered machine learning model dashboard.
Machine Learning Dashboard: Machine learning algorithm output with an interactive dashboard that can be used to explain or interpret the input factors that contribute most to the generated risk score. Participants had access to the machine learning dashboard only.

Overall Study     Large Language Model-based Interaction    Machine Learning Dashboard
STARTED           52                                        56
COMPLETED         52                                        54
NOT COMPLETED     0                                         2
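
The milestone counts can be reconciled with a few lines of arithmetic. The sketch below is only an illustration: the counts are copied from the participant milestones, while the dictionary layout and the `check_flow` helper are assumptions introduced here, not part of the registry record.

```python
# Counts copied from the participant milestones; the structure is illustrative.
flow = {
    "Large Language Model-based Interaction": {"started": 52, "completed": 52, "not_completed": 0},
    "Machine Learning Dashboard": {"started": 56, "completed": 54, "not_completed": 2},
}

def check_flow(flow):
    """Check that counts reconcile and return per-arm completion rates in percent."""
    rates = {}
    for arm, n in flow.items():
        # Every participant who started is either completed or not completed.
        assert n["started"] == n["completed"] + n["not_completed"], arm
        rates[arm] = round(100 * n["completed"] / n["started"], 1)
    return rates

rates = check_flow(flow)
total_started = sum(n["started"] for n in flow.values())
total_completed = sum(n["completed"] for n in flow.values())
print(rates, total_started, total_completed)
```

The started total matches the target enrollment of 108, and the completed total matches the 106 participants analyzed in the baseline and outcome tables.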

Reasons for withdrawal

Overall Study        Large Language Model-based Interaction    Machine Learning Dashboard
Lost to Follow-up    0                                         2

Baseline Characteristics

Ethnicity (NIH/OMB) was not collected.

Baseline characteristics by cohort

Columns: LLM = Large Language Model-based Interaction (n=52 participants); Dashboard = Machine Learning Dashboard (n=54 participants); Total = total of all reporting groups (n=106 participants).

Measure                                       LLM         Dashboard   Total
Age, Customized (participants)
  18-24 years                                 2           0           2
  25-29 years                                 28          32          60
  30-34 years                                 15          16          31
  35-39 years                                 6           4           10
  40-44 years                                 1           1           2
  45-49 years                                 0           1           1
Sex: Female, Male (participants)
  Female                                      9           6           15
  Male                                        43          48          91
Race (NIH/OMB) (participants)
  American Indian or Alaska Native            0           0           0
  Asian                                       7           17          24
  Native Hawaiian or Other Pacific Islander   0           0           0
  Black or African American                   6           7           13
  White                                       34          26          60
  More than one race                          0           0           0
  Unknown or Not Reported                     5           4           9
Ethnicity (NIH/OMB) (participants)
  Hispanic or Latino                          0 (not collected)
  Not Hispanic or Latino                      0 (not collected)
  Unknown or Not Reported                     0 (not collected)
Training level (participants)
  Residency                                   40          41          81
  Medical Student                             12          13          25
Familiarity with Artificial Intelligence (AI) (participants)
  Some AI Coursework                          3           4           7
  Not at all or slightly                      42          41          83
  Unknown/Did not answer                      7           9           16
Mean baseline Unified Theory of Acceptance and Use of Technology (UTAUT) survey score (score on a scale: mean, with standard deviation in parentheses)
  Behavioral Intention                        3.3 (0.1)   3.5 (0.1)   3.4 (0.1)
  Performance Expectancy                      3.4 (0.1)   3.6 (0.1)   3.5 (0.1)
  Effort Expectancy                           2.8 (0.1)   3.0 (0.1)   2.9 (0.1)
  Social Influence                            3.4 (0.1)   3.7 (0.1)   3.6 (0.1)
  Facilitating Conditions                     2.8 (0.1)   2.7 (0.1)   2.8 (0.1)

PRIMARY outcome

Timeframe: Approximately 60 minutes

The study used a common set of dependent variables to assess baseline and post-intervention attitudes towards machine learning algorithms in clinical care. Attitudes were measured with an adapted Unified Theory of Acceptance and Use of Technology (UTAUT) survey assessing perceived usefulness of the system, perceived ease of use, attitudes towards using it, behavioral intentions, and trust, each rated on a 5-point Likert scale. The measure is the percent change in UTAUT survey response, compared between the Large Language Model-based Interaction and Machine Learning Dashboard arms, from recruitment (before the scenarios were administered) to immediately after the scenarios were completed. The two time points are approximately 60 minutes apart. A higher change indicates greater acceptance of, or intention to use, the GutGPT+Dashboard.

Outcome measures

Columns: LLM = Large Language Model-based Interaction (GutGPT+ Dashboard), n=52 participants; Dashboard = Machine Learning Dashboard, n=54 participants.

Median Change in Attitudes Towards Machine Learning Algorithms in Clinical Care Using UTAUT (units on a scale: median, with the reported interval in parentheses)

Measure                     LLM                     Dashboard
  Behavioral intentions     0.0 (0.0 to 0.3)        0.0 (0.0 to 0.3)
  Performance Expectancy    0.0 (0.0 to 0.3)        0.3 (0.0 to 0.5)
  Effort Expectancy         0.6 (0.3 to 1.0)        0.3 (0.0 to 0.5)
  Social influence          0.0 (0.0 to 0.3)        0.0 (0.0 to 0.3)
  Facilitating conditions   0.1 (0.0 to 0.3)        0.0 (0.0 to 0.3)
  Trust                     0.2 (0.1 to 0.6)        0.4 (0.2 to 0.8)
  Benefit                   0.2 (0.0 to 0.5)        0.2 (0.2 to 0.5)
  Risk                      -0.1 (-0.3 to 0.0)      -0.1 (-0.4 to 0.0)
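
A median change with an accompanying interval can be computed from paired baseline and post-scenario scores. The sketch below is a minimal illustration under stated assumptions: the record does not say how the reported intervals were derived, so treating them as interquartile ranges is an assumption; the scores are hypothetical and are not study data; and `change_summary` is a helper name introduced here.

```python
from statistics import median, quantiles

def change_summary(baseline, post):
    """Median and quartiles of per-participant change (post minus baseline)."""
    if len(baseline) != len(post):
        raise ValueError("baseline and post must be paired per participant")
    # Round to avoid floating-point noise in differences of Likert-scale means.
    changes = [round(after - before, 2) for before, after in zip(baseline, post)]
    # Assumption: the interval is an interquartile range (inclusive method).
    q1, _, q3 = quantiles(changes, n=4, method="inclusive")
    return {"median": median(changes), "q1": q1, "q3": q3}

# Hypothetical scores for five participants on one UTAUT construct.
baseline = [3.0, 2.5, 3.5, 4.0, 3.0]
post = [3.5, 2.5, 4.0, 4.0, 4.0]
summary = change_summary(baseline, post)
print(summary)
```

Other quartile conventions (for example the `exclusive` method) can give slightly different bounds on small samples, so the method used should be reported alongside the interval.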

SECONDARY outcome

Timeframe: Approximately 60 minutes

Mean percentage of decision accuracy per participant. Accuracy is defined as the percentage of simulation scenarios of acute upper GI bleeding in which the participant chose the correct clinical decision, reported for each treatment condition. Accuracy was assessed immediately after completion of the scenarios (60 minutes from initiation of the study for each participant), with no further follow-up afterwards.

Outcome measures

Columns: LLM = Large Language Model-based Interaction (GutGPT+ Dashboard), n=52 participants; Dashboard = Machine Learning Dashboard, n=54 participants.

Clinician Decision Making of Triage of GI Bleeding (percent accuracy per participant: mean, with standard deviation in parentheses)

Measure                       LLM            Dashboard
  Percent accuracy            91.7 (27.9)    92.1 (27.2)
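
Per-participant accuracy, as defined above, is the share of scenarios in which the chosen decision matches the correct one, and the arm-level figure is the mean of those percentages. The sketch below illustrates that calculation with hypothetical data: the number of scenarios, the decision labels, and the participant identifiers are all assumptions, and the sample standard deviation is used because the record does not state which form was reported.

```python
from statistics import mean, stdev

def accuracy_percent(decisions, correct):
    """Percentage of scenarios where the decision matches the reference decision."""
    if len(decisions) != len(correct):
        raise ValueError("one decision is required per scenario")
    hits = sum(d == c for d, c in zip(decisions, correct))
    return 100 * hits / len(correct)

# Hypothetical reference decisions for four simulation scenarios.
correct = ["admit", "discharge", "admit", "discharge"]

# Hypothetical decisions made by three participants in one arm.
participants = {
    "P1": ["admit", "discharge", "admit", "discharge"],
    "P2": ["admit", "admit", "admit", "discharge"],
    "P3": ["admit", "discharge", "admit", "discharge"],
}

per_participant = {pid: accuracy_percent(d, correct) for pid, d in participants.items()}
scores = list(per_participant.values())
arm_mean = mean(scores)
arm_sd = stdev(scores)  # sample standard deviation (assumption)
print(per_participant, round(arm_mean, 1), round(arm_sd, 1))
```

With only a handful of scenarios per participant, each accuracy value can take just a few discrete levels, which is one way a high mean can sit alongside a large standard deviation.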

Adverse Events

Large Language Model-based Interaction

Serious events: 0 serious events
Other events: 0 other events
Deaths: 0 deaths

Machine Learning Dashboard

Serious events: 0 serious events
Other events: 0 other events
Deaths: 0 deaths

Serious adverse events

Adverse event data not reported

Other adverse events

Adverse event data not reported

Additional Information

Sunny Chung

Yale School of Medicine

Phone: 203-824-1459

Results disclosure agreements

  • Principal investigator is a sponsor employee
  • Publication restrictions are in place