Observational Study on AI Accuracy in Diagnosing and Treating Failed or Painful Hip Arthroplasty

Summary

Primary Goal:

This study aims to evaluate the diagnostic and therapeutic accuracy of GPT-4 (an advanced AI language model) compared to three orthopedic surgeons with varying experience levels in cases of failed or painful total hip arthroplasty.

Key Research Questions:

Diagnostic Accuracy:

Does GPT-4 provide correct, partially correct, or incorrect diagnoses compared to human orthopedic surgeons?

Diagnostic Completeness:

Are GPT-4's diagnostic suggestions complete, partially complete, or incomplete compared to those of orthopedic surgeons?

Treatment Accuracy:

Does GPT-4 recommend correct, partially correct, or incorrect treatments for failed hip arthroplasty?

Treatment Completeness:

Are GPT-4's treatment recommendations complete, partially complete, or incomplete compared to those of orthopedic surgeons?

Study Design:

Participants:

20 anonymized patient cases (ages 18-80) with failed or painful hip arthroplasties, treated at IRCCS Istituto Ortopedico Rizzoli (Bologna, Italy) between 2004 and 2024.

Cases were selected based on clear diagnostic and treatment records (no ambiguous or incomplete data).

Comparison Groups:

GPT-4 (via ChatGPT interface)

Three orthopedic doctors (with different experience levels: resident, specialist, senior surgeon)

Method:

Each case (clinical summary + X-ray image) is presented to GPT-4 and the three doctors.

They must provide a diagnosis and treatment recommendations.

Two independent evaluators (principal investigator + department head) blindly assess responses for correctness and completeness using a 3-point scale (0 = wrong/incomplete, 1 = partially correct/partially complete, 2 = correct/complete).

Statistical analysis compares GPT-4 vs. human performance.
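
The registry entry does not say which statistics are used. As a minimal sketch only, assuming each response receives one 0-2 rating from each of the two evaluators, the Python below shows one way to average the ratings for a respondent and to check agreement between the evaluators with unweighted Cohen's kappa. The function names and the choice of kappa are illustrative assumptions, not the investigators' method, and the code contains no study data.

```python
from collections import Counter

# Categories of the 3-point scale described above (0, 1, 2).
SCALE = (0, 1, 2)


def mean_score(ratings):
    """Average of a non-empty list of 0-2 ratings."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    return sum(ratings) / len(ratings)


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    This is an assumed agreement measure; the source does not name one.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    # Proportion of items on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[k] * count_b[k] for k in SCALE) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)
```

Under these assumptions, `mean_score` would be called once per respondent (GPT-4 and each surgeon) and per outcome, while `cohen_kappa` would be called once per outcome on the two evaluators' rating lists.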

Expected Outcomes:

Determine if AI can match or outperform doctors in diagnosing and treating hip arthroplasty failures.

Assess whether GPT-4 could serve as a supplementary tool in orthopedic decision-making.

Ethical & Privacy Considerations:

No real-time patient data is used; only anonymized past cases are included.

No personal/sensitive data is shared with OpenAI (GPT-4 is used via a standard web interface).

Study complies with GDPR, HIPAA, and ethical AI guidelines.

Timeline:

Study duration: approximately 8 months (from ethics approval to final analysis).

Results will be published regardless of outcome.

Why This Study Matters:

First study evaluating GPT-4's role in complex orthopedic diagnostics.

Could influence future AI-assisted clinical decision-making in joint replacement surgeries.

Conditions

Total Hip Arthroplasty (THA)

Interventions

OTHER

GPT-4 Assessment

Diagnostic/Prognostic evaluation of any single case provided by AI (GPT-4). GPT-4 provides diagnosis and treatment recommendations via standardized prompts.

OTHER

Arthroplasty Fellow Assessment

Diagnostic/Prognostic evaluation of any single case provided by a human expert

OTHER

Specializing Resident (4th year) Assessment

Diagnostic/Prognostic evaluation of any single case provided by a human expert

OTHER

Junior Resident (3rd year) Assessment

Diagnostic/Prognostic evaluation of any single case provided by a human expert

Sponsors & Collaborators

Istituto Ortopedico Rizzoli
lead OTHER

Principal Investigators

Francesco Castagnini, MD · IRCCS Istituto Ortopedico Rizzoli

Eligibility

Min Age: 18 Years
Max Age: 80 Years
Sex: ALL
Healthy Volunteers: No

Timeline & Regulatory

Start: 2025-05-31
Primary Completion: 2025-06-30
Completion: 2025-07-01

Countries

Italy

