SPEC_14: Add Outcome Measures to ClinicalTrials.gov Fields
- Status: Draft (Validated via API Documentation Review)
- Priority: P1
- GitHub Issue: #95
- Estimated Effort: Small (~40 lines of code)
- Last Updated: 2025-11-30
Problem Statement
The ClinicalTrialsTool retrieves trial metadata but misses critical efficacy data:
Current Fields Retrieved
```python
# src/tools/clinicaltrials.py:24-33
FIELDS: ClassVar[list[str]] = [
    "NCTId",
    "BriefTitle",
    "Phase",
    "OverallStatus",
    "Condition",
    "InterventionName",
    "StartDate",
    "BriefSummary",
]
```
Missing Data (Critical for Research)
| Data | Location in Response | Purpose |
|---|---|---|
| Primary Outcomes | `protocolSection.outcomesModule.primaryOutcomes[].measure` | Main efficacy endpoint |
| Secondary Outcomes | `protocolSection.outcomesModule.secondaryOutcomes[].measure` | Additional endpoints |
| Has Results | `study.hasResults` (top-level) | Whether results are posted |
| Results Date | `protocolSection.statusModule.resultsFirstPostDateStruct.date` | When results were posted |
Impact
Current Output:
Trial Phase: PHASE3. Status: COMPLETED. Conditions: Erectile Dysfunction.
Interventions: Sildenafil.
Desired Output:
Trial Phase: PHASE3. Status: COMPLETED. Conditions: Erectile Dysfunction.
Interventions: Sildenafil.
Primary Outcome: Change from baseline in IIEF-EF domain score at Week 12.
Results Available: Yes (posted 2024-01-15).
API Documentation Review (2025-11-30)
ClinicalTrials.gov API v2 Response Structure
Source: Stack Overflow - ClinicalTrials.gov API v2
The API returns nested JSON. Key findings:
- `hasResults` is a top-level field on each study object (NOT inside `protocolSection`).
- Outcomes are in `protocolSection.outcomesModule`, as lists under `primaryOutcomes` and `secondaryOutcomes` (i.e. `study["protocolSection"]["outcomesModule"]["primaryOutcomes"]` and `study["protocolSection"]["outcomesModule"]["secondaryOutcomes"]`).
- The results date is in `protocolSection.statusModule.resultsFirstPostDateStruct.date`.
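Because `hasResults` and the outcome lists live at different depths, it is easy to look in the wrong place. The helper below is a hypothetical sketch (`get_path` is not part of `src/tools/clinicaltrials.py`) that follows a dotted path through nested dicts and falls back to a default on any miss, mirroring the chained `.get(..., {})` calls the tool already uses.

```python
from typing import Any


def get_path(obj: dict[str, Any], path: str, default: Any = None) -> Any:
    """Follow a dotted path through nested dicts; return `default` on any miss.

    Hypothetical helper for illustration only.
    """
    current: Any = obj
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


study = {
    "protocolSection": {
        "outcomesModule": {"primaryOutcomes": [{"measure": "IIEF-EF"}]},
        "statusModule": {"resultsFirstPostDateStruct": {"date": "2024-01-15"}},
    },
    "hasResults": True,  # top-level sibling of protocolSection
}

print(get_path(study, "hasResults"))                         # True
print(get_path(study, "protocolSection.hasResults", False))  # False (wrong location)
print(get_path(study, "protocolSection.outcomesModule.primaryOutcomes", []))
print(get_path(study, "protocolSection.statusModule.resultsFirstPostDateStruct.date", ""))
```

Looking up `protocolSection.hasResults` silently yields the default, which is exactly the bug this spec warns about.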
fields Parameter Behavior (VERIFIED VIA LIVE API TESTING)
The fields query parameter filters what the API returns. If you don't request a field, you don't get it.
Live API Test Results (2025-11-30):
```bash
# Test 1: With limited fields - NO outcomesModule returned
curl "...&fields=NCTId,BriefTitle"
# Returns ONLY: protocolSection.identificationModule.{nctId, briefTitle}

# Test 2: Without fields param - outcomesModule IS present
curl "...&pageSize=1"
# Returns: hasResults: false, outcomesModule: {primaryOutcomes, secondaryOutcomes, otherOutcomes}

# Test 3: Valid field names for outcomes
curl "...&fields=NCTId,OutcomesModule"  # Works - returns full outcomesModule
curl "...&fields=NCTId,PrimaryOutcome"  # Works - returns only primaryOutcomes
curl "...&fields=NCTId,HasResults"      # Works - returns hasResults at top level
```
Valid Field Names (Tested):
- `OutcomesModule`: returns the full `protocolSection.outcomesModule` with all outcomes
- `PrimaryOutcome`: returns only the `primaryOutcomes` array
- `SecondaryOutcome`: returns only the `secondaryOutcomes` array
- `HasResults`: returns `hasResults` at the study top level
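Since omitted fields are simply not returned, the request must list every field the parser reads. Below is a minimal sketch of building such a request URL with the standard library. `BASE_URL`, the `query.term` parameter, and `build_search_url` are assumptions about what the tool sends, not code taken from it; the `fields` value mirrors the `",".join(self.FIELDS)` call described in this spec.

```python
from urllib.parse import urlencode

# Assumed v2 studies endpoint; the tool's real constant may differ.
BASE_URL = "https://clinicaltrials.gov/api/v2/studies"

FIELDS = [
    "NCTId", "BriefTitle", "Phase", "OverallStatus", "Condition",
    "InterventionName", "StartDate", "BriefSummary",
    "OutcomesModule", "HasResults",  # new outcome fields from this spec
]


def build_search_url(query: str, page_size: int = 10) -> str:
    """Build a search URL that explicitly requests every parsed field (hypothetical helper)."""
    params = {
        "query.term": query,         # assumed free-text search parameter
        "pageSize": page_size,
        "fields": ",".join(FIELDS),  # fields not listed here are NOT returned
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("sildenafil erectile dysfunction", page_size=1)
print(url)
```

`urlencode` percent-encodes the commas in `fields` as `%2C`, which the server decodes back to a comma-separated list.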
Proposed Solution
UPDATE FIELDS Constant (REQUIRED)
The current implementation explicitly passes `fields=",".join(self.FIELDS)` at line 67.
Because the API returns ONLY the requested fields, the new field names MUST be added to `FIELDS`.
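```python
# src/tools/clinicaltrials.py - UPDATE FIELDS
FIELDS: ClassVar[list[str]] = [
    "NCTId",
    "BriefTitle",
    "Phase",
    "OverallStatus",
    "Condition",
    "InterventionName",
    "StartDate",
    "BriefSummary",
    # NEW: Outcome measures (verified via live API testing 2025-11-30)
    "OutcomesModule",  # Returns protocolSection.outcomesModule.{primaryOutcomes, secondaryOutcomes}
    "HasResults",  # Returns study.hasResults (top-level boolean)
    # NOTE (untested): neither entry above requests statusModule.resultsFirstPostDateStruct.
    # Given the filtering behavior tested above, results_date will be empty unless that
    # field is requested too (likely "ResultsFirstPostDate"; verify against the live API).
]
```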
Update _study_to_evidence() Method
```python
# Method of ClinicalTrialsTool; assumes Any, Evidence, and Citation are already imported.
def _study_to_evidence(self, study: dict[str, Any]) -> Evidence:
    """Convert a clinical trial study to Evidence."""
    # Navigate nested structure
    protocol = study.get("protocolSection", {})
    id_module = protocol.get("identificationModule", {})
    status_module = protocol.get("statusModule", {})
    desc_module = protocol.get("descriptionModule", {})
    design_module = protocol.get("designModule", {})
    conditions_module = protocol.get("conditionsModule", {})
    arms_module = protocol.get("armsInterventionsModule", {})
    outcomes_module = protocol.get("outcomesModule", {})  # NEW

    # ... existing field extraction (nct_id, title, status, phase, etc.) ...

    # NEW: Extract outcome measures
    primary_outcomes = outcomes_module.get("primaryOutcomes", [])
    primary_outcome_str = ""
    if primary_outcomes:
        # Use the first primary outcome's measure and time frame
        first = primary_outcomes[0]
        measure = first.get("measure", "")
        timeframe = first.get("timeFrame", "")
        # Truncate long outcome descriptions
        primary_outcome_str = measure[:200]
        if timeframe:
            primary_outcome_str += f" (measured at {timeframe})"

    secondary_outcomes = outcomes_module.get("secondaryOutcomes", [])
    secondary_count = len(secondary_outcomes)

    # NEW: Check whether results are available (hasResults is TOP-LEVEL, not in protocolSection!)
    has_results = study.get("hasResults", False)
    # The results date is in statusModule, nested inside a date struct
    results_date_struct = status_module.get("resultsFirstPostDateStruct", {})
    results_date = results_date_struct.get("date", "")

    # Build content with key trial info (UPDATED)
    content_parts = [
        f"{summary[:400]}...",
        f"Trial Phase: {phase}.",
        f"Status: {status}.",
        f"Conditions: {conditions_str}.",
        f"Interventions: {interventions_str}.",
    ]

    if primary_outcome_str:
        content_parts.append(f"Primary Outcome: {primary_outcome_str}.")
    if secondary_count > 0:
        content_parts.append(f"Secondary Outcomes: {secondary_count} additional endpoints.")

    if has_results:
        results_info = "Results Available: Yes"
        if results_date:
            results_info += f" (posted {results_date})"
        content_parts.append(results_info + ".")
    else:
        content_parts.append("Results Available: Not yet posted.")

    content = " ".join(content_parts)

    return Evidence(
        content=content[:2000],
        citation=Citation(
            source="clinicaltrials",
            title=title[:500],
            url=f"https://clinicaltrials.gov/study/{nct_id}",
            date=start_date,
            authors=[],
        ),
        relevance=0.90 if has_results else 0.85,  # Boost relevance for trials with results
    )
```
API Reference
The ClinicalTrials.gov API v2 returns nested JSON:
```json
{
  "protocolSection": {
    "outcomesModule": {
      "primaryOutcomes": [
        {
          "measure": "Change from Baseline in IIEF-EF Domain Score",
          "description": "...",
          "timeFrame": "Baseline to Week 12"
        }
      ],
      "secondaryOutcomes": [
        {
          "measure": "Subject Global Assessment Question",
          "timeFrame": "Week 12"
        }
      ]
    }
  },
  "hasResults": true
}
```
See: https://clinicaltrials.gov/data-api/api
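For type checking, the response shape shown above could be described with `TypedDict`s. This is a sketch covering only the keys this spec reads; the class names are hypothetical, and the real API returns many more fields than these.

```python
from typing import TypedDict


class Outcome(TypedDict, total=False):
    measure: str
    description: str
    timeFrame: str


class OutcomesModule(TypedDict, total=False):
    primaryOutcomes: list[Outcome]
    secondaryOutcomes: list[Outcome]


class DateStruct(TypedDict, total=False):
    date: str


class StatusModule(TypedDict, total=False):
    overallStatus: str
    resultsFirstPostDateStruct: DateStruct


class ProtocolSection(TypedDict, total=False):
    outcomesModule: OutcomesModule
    statusModule: StatusModule


class Study(TypedDict, total=False):
    protocolSection: ProtocolSection
    hasResults: bool  # top level, NOT inside protocolSection


# The API reference example, expressed against these types
sample: Study = {
    "protocolSection": {
        "outcomesModule": {
            "primaryOutcomes": [
                {
                    "measure": "Change from Baseline in IIEF-EF Domain Score",
                    "timeFrame": "Baseline to Week 12",
                }
            ],
            "secondaryOutcomes": [
                {"measure": "Subject Global Assessment Question", "timeFrame": "Week 12"}
            ],
        }
    },
    "hasResults": True,
}
```

`total=False` marks every key as optional, matching the edge cases where modules or date structs are absent.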
Test Plan
Unit Tests (tests/unit/tools/test_clinicaltrials.py)
```python
from unittest.mock import MagicMock, patch

import pytest

from src.tools.clinicaltrials import ClinicalTrialsTool  # import path assumed from the file location

# NOTE: these tests rely on a `tool` fixture (a ClinicalTrialsTool instance)
# assumed to be defined elsewhere in this module or in conftest.py.


@pytest.mark.unit
class TestClinicalTrialsOutcomes:
    """Tests for outcome measure extraction."""

    @pytest.mark.asyncio
    async def test_extracts_primary_outcome(self, tool: ClinicalTrialsTool) -> None:
        """Test that primary outcome is extracted from response."""
        mock_study = {
            "protocolSection": {
                "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
                "statusModule": {"overallStatus": "COMPLETED", "startDateStruct": {"date": "2023"}},
                "descriptionModule": {"briefSummary": "Summary"},
                "designModule": {"phases": ["PHASE3"]},
                "conditionsModule": {"conditions": ["ED"]},
                "armsInterventionsModule": {"interventions": []},
                "outcomesModule": {
                    "primaryOutcomes": [
                        {"measure": "Change in IIEF-EF score", "timeFrame": "Week 12"}
                    ]
                },
            },
            "hasResults": True,
        }
        mock_response = MagicMock()
        mock_response.json.return_value = {"studies": [mock_study]}
        mock_response.raise_for_status = MagicMock()

        with patch("requests.get", return_value=mock_response):
            results = await tool.search("test", max_results=1)

        assert len(results) == 1
        assert "Primary Outcome" in results[0].content
        assert "IIEF-EF" in results[0].content
        assert "Week 12" in results[0].content

    @pytest.mark.asyncio
    async def test_includes_results_status(self, tool: ClinicalTrialsTool) -> None:
        """Test that results availability is shown."""
        mock_study = {
            "protocolSection": {
                "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
                "statusModule": {
                    "overallStatus": "COMPLETED",
                    "startDateStruct": {"date": "2023"},
                    # Note: resultsFirstPostDateStruct, not resultsFirstSubmitDate
                    "resultsFirstPostDateStruct": {"date": "2024-06-15"},
                },
                "descriptionModule": {"briefSummary": "Summary"},
                "designModule": {"phases": ["PHASE3"]},
                "conditionsModule": {"conditions": ["ED"]},
                "armsInterventionsModule": {"interventions": []},
                "outcomesModule": {},
            },
            "hasResults": True,  # Note: hasResults is TOP-LEVEL
        }
        mock_response = MagicMock()
        mock_response.json.return_value = {"studies": [mock_study]}
        mock_response.raise_for_status = MagicMock()

        with patch("requests.get", return_value=mock_response):
            results = await tool.search("test", max_results=1)

        assert "Results Available: Yes" in results[0].content
        assert "2024-06-15" in results[0].content

    @pytest.mark.asyncio
    async def test_shows_no_results_when_missing(self, tool: ClinicalTrialsTool) -> None:
        """Test that missing results are indicated."""
        mock_study = {
            "protocolSection": {
                "identificationModule": {"nctId": "NCT12345678", "briefTitle": "Test"},
                "statusModule": {"overallStatus": "RECRUITING", "startDateStruct": {"date": "2024"}},
                "descriptionModule": {"briefSummary": "Summary"},
                "designModule": {"phases": ["PHASE2"]},
                "conditionsModule": {"conditions": ["ED"]},
                "armsInterventionsModule": {"interventions": []},
                "outcomesModule": {},
            },
            "hasResults": False,
        }
        mock_response = MagicMock()
        mock_response.json.return_value = {"studies": [mock_study]}
        mock_response.raise_for_status = MagicMock()

        with patch("requests.get", return_value=mock_response):
            results = await tool.search("test", max_results=1)

        assert "Results Available: Not yet posted" in results[0].content

    @pytest.mark.asyncio
    async def test_boosts_relevance_for_results(self, tool: ClinicalTrialsTool) -> None:
        """Trials with results should have higher relevance score."""
        with_results = {
            "protocolSection": {
                "identificationModule": {"nctId": "NCT11111111", "briefTitle": "With Results"},
                "statusModule": {"overallStatus": "COMPLETED", "startDateStruct": {"date": "2023"}},
                "descriptionModule": {"briefSummary": "Summary"},
                "designModule": {"phases": []},
                "conditionsModule": {"conditions": []},
                "armsInterventionsModule": {"interventions": []},
                "outcomesModule": {},
            },
            "hasResults": True,
        }
        without_results = {
            "protocolSection": {
                "identificationModule": {"nctId": "NCT22222222", "briefTitle": "No Results"},
                "statusModule": {"overallStatus": "RECRUITING", "startDateStruct": {"date": "2024"}},
                "descriptionModule": {"briefSummary": "Summary"},
                "designModule": {"phases": []},
                "conditionsModule": {"conditions": []},
                "armsInterventionsModule": {"interventions": []},
                "outcomesModule": {},
            },
            "hasResults": False,
        }
        mock_response = MagicMock()
        mock_response.json.return_value = {"studies": [with_results, without_results]}
        mock_response.raise_for_status = MagicMock()

        with patch("requests.get", return_value=mock_response):
            results = await tool.search("test", max_results=2)

        assert results[0].relevance == 0.90  # With results
        assert results[1].relevance == 0.85  # Without results
```
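The unit tests above repeat nearly the same study dict four times. A small factory could cut that duplication. The sketch below is hypothetical (`make_study` is not in the existing test module) and produces only the keys those tests already use.

```python
from __future__ import annotations

from typing import Any


def make_study(
    nct_id: str = "NCT12345678",
    *,
    status: str = "COMPLETED",
    has_results: bool = False,
    primary_outcomes: list[dict[str, str]] | None = None,
    results_date: str | None = None,
) -> dict[str, Any]:
    """Build a minimal API v2-shaped study dict for tests (hypothetical helper)."""
    status_module: dict[str, Any] = {
        "overallStatus": status,
        "startDateStruct": {"date": "2023"},
    }
    if results_date is not None:
        status_module["resultsFirstPostDateStruct"] = {"date": results_date}

    outcomes_module: dict[str, Any] = {}
    if primary_outcomes:
        outcomes_module["primaryOutcomes"] = primary_outcomes

    return {
        "protocolSection": {
            "identificationModule": {"nctId": nct_id, "briefTitle": "Test"},
            "statusModule": status_module,
            "descriptionModule": {"briefSummary": "Summary"},
            "designModule": {"phases": ["PHASE3"]},
            "conditionsModule": {"conditions": ["ED"]},
            "armsInterventionsModule": {"interventions": []},
            "outcomesModule": outcomes_module,
        },
        "hasResults": has_results,  # top level, as in the real API
    }
```

For example, `make_study(has_results=True, results_date="2024-06-15")` would stand in for the hand-written dict in `test_includes_results_status`.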
Integration Test
```python
import pytest

from src.tools.clinicaltrials import ClinicalTrialsTool  # import path assumed from the file location


@pytest.mark.integration
class TestClinicalTrialsOutcomesIntegration:
    """Integration tests with real API."""

    @pytest.mark.asyncio
    async def test_real_completed_trial_has_outcome(self) -> None:
        """Real completed Phase 3 trials should have outcome measures."""
        tool = ClinicalTrialsTool()
        # Search for completed Phase 3 ED trials (likely to have outcomes)
        results = await tool.search(
            "sildenafil erectile dysfunction Phase 3 COMPLETED",
            max_results=3,
        )
        # At least one result should include a primary outcome
        has_outcome = any("Primary Outcome" in r.content for r in results)
        assert has_outcome, "No completed trials with outcome measures found"
```
Files to Modify
| File | Change |
|---|---|
| `src/tools/clinicaltrials.py` | ADD `OutcomesModule` and `HasResults` to `FIELDS`; update `_study_to_evidence()` |
| `tests/unit/tools/test_clinicaltrials.py` | Add outcome parsing tests |
Acceptance Criteria
FIELDS Constant (REQUIRED CHANGE)
- `FIELDS` includes `"OutcomesModule"` (returns the full outcomesModule)
- `FIELDS` includes `"HasResults"` (returns the top-level boolean)
_study_to_evidence() Method
- Extracts `protocolSection.outcomesModule.primaryOutcomes`
- Accesses `study.hasResults` at TOP LEVEL (not inside `protocolSection`)
- Results date extracted from `statusModule.resultsFirstPostDateStruct.date`
- Evidence content includes primary outcome measure when available
- Evidence content shows results availability status
- Outcome measure text truncated to 200 chars
- Trials with results have boosted relevance (0.90 vs 0.85)
Testing
- All unit tests pass
- Integration test confirms real trials return outcome data
- Live API test confirms the `OutcomesModule` and `HasResults` fields work
Edge Cases
No outcomes defined: Some early-phase trials don't have outcomes yet.
- Solution: Gracefully skip the outcome section if `outcomesModule` is empty or missing.

Multiple primary outcomes: Some trials have 2-3 primary outcomes.
- Solution: Show the first outcome only and mention the count of the others.

Long outcome descriptions: Some measures are very verbose (500+ chars).
- Solution: Truncate the measure to 200 chars with `[:200]`.

`hasResults` without `resultsFirstPostDateStruct`: Some completed trials may have results without a posted date.
- Solution: Show "Results Available: Yes" without a date.

`outcomesModule` missing entirely: Not all API responses include this module.
- Solution: Use `.get("outcomesModule", {})` for safe access.
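The `_study_to_evidence()` code earlier shows only the first primary outcome and does not yet mention how many others exist. The sketch below shows helpers that would cover the edge cases listed above. The function names, the `MAX_MEASURE_CHARS` constant, and the "(+N more ...)" wording are hypothetical choices, not part of the spec.

```python
from typing import Any

MAX_MEASURE_CHARS = 200  # truncation limit from the "Long outcome descriptions" case


def format_primary_outcome(primary_outcomes: list[dict[str, Any]]) -> str:
    """Describe the first primary outcome and count the rest; '' when none exist."""
    if not primary_outcomes:
        return ""  # early-phase trials may not define outcomes yet
    first = primary_outcomes[0]
    text = first.get("measure", "")[:MAX_MEASURE_CHARS]
    timeframe = first.get("timeFrame", "")
    if timeframe:
        text += f" (measured at {timeframe})"
    others = len(primary_outcomes) - 1
    if others:
        plural = "s" if others > 1 else ""
        text += f" (+{others} more primary outcome{plural})"
    return text


def format_results_status(has_results: bool, results_date: str) -> str:
    """Render the results line, omitting the date when none was posted."""
    if not has_results:
        return "Results Available: Not yet posted."
    if results_date:
        return f"Results Available: Yes (posted {results_date})."
    return "Results Available: Yes."  # hasResults without resultsFirstPostDateStruct
```

For instance, three primary outcomes whose first measure is "A" (no time frame) would render as `A (+2 more primary outcomes)`.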
Rollback Plan
If outcome extraction causes issues:
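- Remove the outcome extraction code from `_study_to_evidence()`.
- Optionally revert the `FIELDS` additions; keeping `OutcomesModule` and `HasResults` only adds data to the response, which the remaining parsing code ignores.
- Existing tests should still pass.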
References
- GitHub Issue #95
- ClinicalTrials.gov API v2 Studies Endpoint
- Stack Overflow - ClinicalTrials.gov API v2 Response Structure
- `TOOL_ANALYSIS_CRITICAL.md` - "Tool 2: ClinicalTrials.gov > Current Implementation Gaps"