Trial Outcomes & Findings for Project 3 Example: Human-AI Collaboration Tester (HAICT) Exp. 7 (NCT05272189)

NCT ID: NCT05272189

Last Updated: 2026-01-20

Results Overview

D' (d-prime) is the signal detection theory measure of the level of performance on a task. It is computed from two proportions: the proportion of true positive responses, p(TP) = (true positive trials)/(true positive + false negative trials), and the proportion of false positive responses, p(FP) = (false positive trials)/(false positive + true negative trials). These values are transformed into z-scores (for example, using NORMSINV in Excel to calculate the inverse of the standard normal distribution). D' is defined as z(TP) - z(FP). It ranges from 0, for cases where no signal can be discriminated from the noise, to about 4.0. The upper limit is not defined, but 4 would mean that an observer is essentially perfect at discriminating signal from noise.

Recruitment status

COMPLETED

Study phase

NA

Target enrollment

12 participants

Primary outcome timeframe

Data are collected within a session of about an hour.

Results posted on

2026-01-20

Participant Flow

Participant milestones

Arm: Experiment
Arm description: All participants are tested in all conditions of this experiment. Simulated Second Reader AI: In this experiment, in some conditions, the participant makes their decision in the presence of information about a simulated artificial intelligence decision. Target Prevalence: The frequency with which targets are presented varies from 10% to 90%.
Overall Study, STARTED: 12
Overall Study, COMPLETED: 12
Overall Study, NOT COMPLETED: 0

Reasons for withdrawal

Withdrawal data not reported

Baseline Characteristics

Project 3 Example: Human-AI Collaboration Tester (HAICT) Exp. 7

Baseline characteristics by cohort

Arm: Experiment (n=12 Participants)
Arm description: All participants are tested in all conditions of this experiment. Simulated Second Reader AI: In this experiment, in some conditions, the participant makes their decision in the presence of information about a simulated artificial intelligence decision. Target Prevalence: The frequency with which targets are presented varies from 10% to 90%.
Age, Continuous: 31 years, STANDARD_DEVIATION 9 (n=37 Participants)
Sex: Female, Male
  • Female: 7 Participants (n=37 Participants)
  • Male: 5 Participants (n=37 Participants)
Ethnicity (NIH/OMB)
  • Hispanic or Latino: 0 Participants (n=37 Participants)
  • Not Hispanic or Latino: 12 Participants (n=37 Participants)
  • Unknown or Not Reported: 0 Participants (n=37 Participants)
Race (NIH/OMB)
  • American Indian or Alaska Native: 0 Participants (n=37 Participants)
  • Asian: 1 Participants (n=37 Participants)
  • Native Hawaiian or Other Pacific Islander: 0 Participants (n=37 Participants)
  • Black or African American: 4 Participants (n=37 Participants)
  • White: 7 Participants (n=37 Participants)
  • More than one race: 0 Participants (n=37 Participants)
  • Unknown or Not Reported: 0 Participants (n=37 Participants)

PRIMARY outcome

Timeframe: Data are collected within a session of about an hour.

D' (d-prime) is the signal detection theory measure of the level of performance on a task. It is computed from two proportions: the proportion of true positive responses, p(TP) = (true positive trials)/(true positive + false negative trials), and the proportion of false positive responses, p(FP) = (false positive trials)/(false positive + true negative trials). These values are transformed into z-scores (for example, using NORMSINV in Excel to calculate the inverse of the standard normal distribution). D' is defined as z(TP) - z(FP). It ranges from 0, for cases where no signal can be discriminated from the noise, to about 4.0. The upper limit is not defined, but 4 would mean that an observer is essentially perfect at discriminating signal from noise.
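The d' calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's analysis code: the function name `d_prime`, its argument names, and the example trial counts are assumptions made for this sketch. Python's standard-library `NormalDist().inv_cdf` plays the role of Excel's NORMSINV.

```python
from statistics import NormalDist


def d_prime(tp, fn, fp, tn):
    """Return d' = z(p(TP)) - z(p(FP)) from raw trial counts.

    tp, fn, fp, tn are hypothetical argument names for the numbers of
    true positive, false negative, false positive and true negative trials.
    """
    z = NormalDist().inv_cdf       # inverse of the standard normal CDF
    p_tp = tp / (tp + fn)          # proportion of true positive responses
    p_fp = fp / (fp + tn)          # proportion of false positive responses
    return z(p_tp) - z(p_fp)


# Illustrative counts only (not data from this study):
# 80 hits, 20 misses, 20 false alarms, 80 correct rejections.
print(round(d_prime(80, 20, 20, 80), 2))  # 1.68
```

Note that `inv_cdf` raises an error when a proportion is exactly 0 or 1, so perfect hit rates or zero false alarm rates need some correction before the transform. The record does not state how such cases were handled in this study.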

Outcome measures

Measure: D'
Arm: Experiment (n=12 Participants). All participants are tested in all conditions of this experiment.
  • Baseline, Prevalence = .1 ('Prevalence' = proportion of trials that are 'positive'; here 10%): 1.69 d' (Standard Error .21)
  • Baseline, Prevalence = .33 (33% target-present trials): 1.93 d' (Standard Error .13)
  • Baseline, Prevalence = .67: 1.86 d' (Standard Error .12)
  • Baseline, Prevalence = .9: 1.92 d' (Standard Error .15)
  • Second Reader, Prevalence = .1: 2.44 d' (Standard Error .09)
  • Second Reader, Prevalence = .3: 2.28 d' (Standard Error .07)
  • Second Reader, Prevalence = .67: .230 d' (Standard Error .10)
  • Second Reader, Prevalence = .9: 2.34 d' (Standard Error .15)

PRIMARY outcome

Timeframe: Data are collected within a session of about an hour.

Criterion, like D' (see above), is calculated from z(TP) and z(FP): criterion (c) = -(z(TP) + z(FP))/2. A value of zero means that the observer is equally likely to make a positive (e.g., 'target present') response as a negative ('absent') response. Positive values mean that the observer is more likely to say "absent" (a "conservative" criterion). Negative values mean that the observer is more likely to say "present" (a "liberal" criterion). Liberal and conservative have no political connotations in this case. Criterion values almost always fall between -2 and 2.
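As a rough sketch of the criterion formula above, the following Python function computes c from the two response proportions. The function name `criterion`, its argument names, and the example proportions are assumptions made for illustration and are not taken from the study.

```python
from statistics import NormalDist


def criterion(p_tp, p_fp):
    """Return c = -(z(p(TP)) + z(p(FP))) / 2.

    p_tp and p_fp are hypothetical argument names for the proportions of
    true positive and false positive responses.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return -(z(p_tp) + z(p_fp)) / 2


# Illustrative proportions only (not data from this study).
neutral = criterion(0.8, 0.2)        # symmetric rates give c of about 0
conservative = criterion(0.5, 0.1)   # few "present" responses give c > 0
liberal = criterion(0.9, 0.5)        # many "present" responses give c < 0
print(round(conservative, 2), round(liberal, 2))  # 0.64 -0.64
```

The sign convention matches the description: a positive c corresponds to a conservative observer and a negative c to a liberal one.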

Outcome measures

Measure: Criterion (c)
Arm: Experiment (n=12 Participants). All participants are tested in all conditions of this experiment.
  • Second Reader, Prevalence = 0.33: .23 (Standard Error .06)
  • Baseline, Prevalence = 0.1: .7 (Standard Error .14)
  • Baseline, Prevalence = 0.33: .24 (Standard Error .09)
  • Baseline, Prevalence = 0.67: -.26 (Standard Error .08)
  • Baseline, Prevalence = 0.9: -.54 (Standard Error .14)
  • Second Reader, Prevalence = 0.1: .47 (Standard Error .13)
  • Second Reader, Prevalence = 0.67: -.16 (Standard Error .05)
  • Second Reader, Prevalence = 0.9: -.35 (Standard Error .10)

SECONDARY outcome

Timeframe: Data are collected within a session of about an hour.

This is the measure of how long it takes to make a response.

Outcome measures

Measure: Reaction Time (unit: Response Time, msec)
Arm: Experiment (n=12 Participants). All participants are tested in all conditions of this experiment.
  • Baseline, Prevalence = 0.1: 543 msec (Standard Error 56)
  • Baseline, Prevalence = 0.3: 735 msec (Standard Error 108)
  • Baseline, Prevalence = 0.67: 620 msec (Standard Error 70)
  • Baseline, Prevalence = 0.9: 592 msec (Standard Error 74)
  • Second Reader, Prevalence = 0.1: 510 msec (Standard Error 49)
  • Second Reader, Prevalence = 0.33: 772 msec (Standard Error 77)
  • Second Reader, Prevalence = 0.67: 963 msec (Standard Error 9)
  • Second Reader, Prevalence = 0.9: 557 msec (Standard Error 66)

Adverse Events

Experiment

Serious events: 0 serious events
Other events: 0 other events
Deaths: 0 deaths

Serious adverse events

Adverse event data not reported

Other adverse events

Adverse event data not reported

Additional Information

Jeremy M Wolfe, Professor and Principal Investigator

Brigham & Women's Hospital

Phone: 6178511166

Results disclosure agreements

  • Principal investigator is a sponsor employee
  • Publication restrictions are in place