Spaces: MCP-1st-Birthday/DeepBoner

VibecoderMcSwaggins committed 13 days ago

Commit c99c9c2

1 Parent(s): fd28242

docs: add SPEC_04 (Magentic UX) and SPEC_05 (Orchestrator Cleanup)

SPEC_04: Magentic Mode UX Improvements (#68)
- P0: Chat history cleared on timeout (1-line fix)
- P1: Timeout too short (300s → 600s)
- P1: Timeout not configurable (add env var)
- P2: No graceful degradation (future)

SPEC_05: Orchestrator Module Cleanup (#67)
- Empty src/orchestrator/ folder should be deleted
- orchestrator_hierarchical.py is dead code (0% coverage)
- Option A (minimal cleanup) recommended for now

Files changed (2)

docs/specs/SPEC_04_MAGENTIC_UX.md +235 -0
docs/specs/SPEC_05_ORCHESTRATOR_CLEANUP.md +160 -0

docs/specs/SPEC_04_MAGENTIC_UX.md ADDED

	@@ -0,0 +1,235 @@

+# SPEC 04: Magentic Mode UX Improvements
+## Priority: P1 (Demo Quality)
+## Problem Statement
+Magentic (advanced) mode has four UX issues:
+1. **P0: Chat history cleared on timeout** - When a timeout occurs, all progress events are erased from the chat
+2. **P1: Timeout too short** - The 300s default is insufficient for complex multi-agent workflows
+3. **P1: Timeout not configurable** - Users can't adjust the timeout to their needs
+4. **P2: No graceful degradation** - The system doesn't synthesize early when the timeout approaches
+## Related Issues
+- GitHub Issue #68: Magentic mode times out at 300s without completing
+- GitHub Issue #65: Demo timing (predecessor, now closed)
+- SPEC_01: Demo Termination (implemented the basic timeout)
+## Bug Analysis
+### Bug 1: Chat History Cleared on Timeout (P0)
+**Location**: `src/app.py:205-206`
+**Current Code**:
+```python
+if event.type == "complete":
+    yield event.message  # BUG: Discards all accumulated progress!
+else:
+    event_md = event.to_markdown()
+    response_parts.append(event_md)
+    yield "\n\n".join(response_parts)
+```
+**Problem**: The `complete` event (including timeout) yields ONLY the completion message, discarding all the `response_parts` that show what the system actually did.
+**User Sees**:
+```
+Research timed out. Synthesizing available evidence...
+```
+**User Should See**:
+```
+🚀 STARTED: Starting research (Magentic mode)...
+⏳ THINKING: Multi-agent reasoning in progress...
+🧠 JUDGING: Manager (user_task): Research drug repurposing...
+🧠 JUDGING: Manager (task_ledger): We are working to address...
+🧠 JUDGING: Manager (instruction): Task: Retrieve human clinical...
+⏱️ Research timed out. Synthesizing available evidence...
+```
+**Fix**:
+```python
+if event.type == "complete":
+    response_parts.append(event.message)
+    yield "\n\n".join(response_parts)  # Preserves all progress
+```
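The difference between the two branches can be reproduced in isolation. The following is a minimal sketch, assuming a stand-in `Event` dataclass and a fake event stream; neither is the project's real `AgentEvent` or orchestrator, and `render_buggy` / `render_fixed` are hypothetical names that mirror the current and proposed loops.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical stand-in for the app's event type."""
    type: str
    message: str

    def to_markdown(self) -> str:
        return f"{self.type.upper()}: {self.message}"


async def fake_events():
    """Emit two progress events, then a timeout-style completion."""
    yield Event("started", "Starting research (Magentic mode)...")
    yield Event("judging", "Manager (user_task): Research drug repurposing...")
    yield Event("complete", "Research timed out. Synthesizing available evidence...")


async def render_buggy(events):
    response_parts = []
    async for event in events:
        if event.type == "complete":
            yield event.message  # replaces everything shown so far
        else:
            response_parts.append(event.to_markdown())
            yield "\n\n".join(response_parts)


async def render_fixed(events):
    response_parts = []
    async for event in events:
        # Append in both branches, then always yield the full history.
        if event.type == "complete":
            response_parts.append(event.message)
        else:
            response_parts.append(event.to_markdown())
        yield "\n\n".join(response_parts)


async def last_output(renderer):
    """Return the final string a renderer yields."""
    final = ""
    async for chunk in renderer(fake_events()):
        final = chunk
    return final


buggy_final = asyncio.run(last_output(render_buggy))
fixed_final = asyncio.run(last_output(render_fixed))
```

In this sketch `buggy_final` holds only the timeout message, while `fixed_final` holds both progress lines followed by the timeout message.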
+### Bug 2: Timeout Too Short (P1)
+**Location**: `src/orchestrator_magentic.py:48`
+**Current**: `timeout_seconds: float = 300.0` (5 minutes)
+**Problem**: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes.
+**Analysis of Per-Agent Latency**:
+| Agent | Typical Latency | Worst Case |
+|-------|-----------------|------------|
+| SearchAgent | 30-60s | 120s (network issues) |
+| HypothesisAgent | 60-90s | 180s (complex reasoning) |
+| JudgeAgent | 30-60s | 120s |
+| ReportAgent | 60-120s | 240s (long synthesis) |
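+With `max_rounds=10` and an average of 90s per agent call: 10 × 4 × 90s = 3,600s (60 minutes). If every call hits the worst-case column, one round takes 120 + 180 + 120 + 240 = 660s, so 10 rounds take 6,600s (110 minutes).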
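The table's columns can also be totalled directly. This is a small sketch, assuming each of the four agents runs once per round and rounds run sequentially (the same model as the 10 × 4 estimate); the dictionary simply copies the table values.

```python
# Latency in seconds, copied from the table: (typical_low, typical_high, worst_case).
LATENCY = {
    "SearchAgent": (30, 60, 120),
    "HypothesisAgent": (60, 90, 180),
    "JudgeAgent": (30, 60, 120),
    "ReportAgent": (60, 120, 240),
}
MAX_ROUNDS = 10


def total_seconds(column: int, rounds: int = MAX_ROUNDS) -> int:
    """Sum one column of the table across all agents, times the round count."""
    return rounds * sum(row[column] for row in LATENCY.values())


typical_low = total_seconds(0)   # every agent at its fastest typical latency
typical_high = total_seconds(1)  # every agent at its slowest typical latency
worst_case = total_seconds(2)    # every agent at its worst-case latency

for label, secs in [
    ("typical low", typical_low),
    ("typical high", typical_high),
    ("worst case", worst_case),
]:
    print(f"{label}: {secs}s = {secs / 60:.0f} min")
```

These totals assume all 10 rounds are used; a run that converges in fewer rounds scales down proportionally.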
+### Bug 3: Timeout Not Configurable (P1)
+**Problem**: The factory doesn't pass timeout config to MagenticOrchestrator.
+**Location**: `src/orchestrator_factory.py:52-55`
+```python
+return orchestrator_cls(
+    max_rounds=config.max_iterations if config else 10,
+    api_key=api_key,
+    # Missing: timeout_seconds
+)
+```
+## Proposed Solutions
+### Fix 1: Preserve Chat History (P0)
+```python
+# src/app.py - Replace lines 205-212
+if event.type == "complete":
+    # Preserve accumulated progress + add completion message
+    response_parts.append(event.message)
+    yield "\n\n".join(response_parts)
+else:
+    event_md = event.to_markdown()
+    response_parts.append(event_md)
+    yield "\n\n".join(response_parts)
+```
+**Test**:
+```python
+@pytest.mark.asyncio
+async def test_timeout_preserves_chat_history(mock_magentic_workflow):
+    """Verify timeout doesn't erase progress events."""
+    # Mock workflow that yields events then times out
+    events = []
+    async for event in research_agent("test", [], "advanced", "sk-test"):
+        events.append(event)
+    # Should contain both progress AND timeout message
+    output = events[-1]  # Final yield
+    assert "STARTED" in output
+    assert "timed out" in output.lower()
+```
+### Fix 2: Increase Default Timeout (P1)
+```python
+# src/orchestrator_magentic.py
+def __init__(
+    self,
+    max_rounds: int = 10,
+    chat_client: OpenAIChatClient | None = None,
+    api_key: str | None = None,
+    timeout_seconds: float = 600.0,  # Changed: 10 minutes (was 5)
+) -> None:
+```
+### Fix 3: Make Timeout Configurable via Environment (P1)
+```python
+# src/utils/config.py
+class Settings(BaseSettings):
+    # ... existing fields ...
+    magentic_timeout: int = Field(
+        default=600,
+        description="Timeout for Magentic mode in seconds",
+    )
+```
+```python
+# src/orchestrator_factory.py
+return orchestrator_cls(
+    max_rounds=config.max_iterations if config else 10,
+    api_key=api_key,
+    timeout_seconds=settings.magentic_timeout,  # NEW
+)
+```
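Since the timeout becomes user-supplied, some validation of the value seems worthwhile. Below is a stdlib-only sketch of such a check, assuming a hypothetical helper `read_magentic_timeout` that is not part of the project (the spec itself uses a pydantic `Settings` field).

```python
import math
import os
from collections.abc import Mapping

DEFAULT_TIMEOUT_SECONDS = 600  # default proposed in this spec


def read_magentic_timeout(env: Mapping[str, str] = os.environ) -> float:
    """Return the timeout from MAGENTIC_TIMEOUT, or the default if unset.

    Hypothetical helper for illustration: rejects non-numeric, non-finite
    and non-positive values instead of silently accepting them.
    """
    raw = env.get("MAGENTIC_TIMEOUT")
    if raw is None or raw.strip() == "":
        return float(DEFAULT_TIMEOUT_SECONDS)
    try:
        value = float(raw)
    except ValueError as exc:
        raise ValueError(f"MAGENTIC_TIMEOUT must be a number, got {raw!r}") from exc
    if not math.isfinite(value) or value <= 0:
        raise ValueError(f"MAGENTIC_TIMEOUT must be a positive number, got {raw!r}")
    return value
```

With pydantic, a constraint on the field such as `Field(default=600, gt=0)` would likely serve the same purpose without a separate helper.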
+### Fix 4: Graceful Degradation (P2 - Future)
+```python
+# src/orchestrator_magentic.py - Inside run() loop
+elapsed = time.time() - start_time
+time_remaining = self._timeout_seconds - elapsed
+# If 80% of time elapsed, force synthesis
+if time_remaining < self._timeout_seconds * 0.2:
+    yield AgentEvent(
+        type="synthesizing",
+        message="Time limit approaching, synthesizing available evidence...",
+        iteration=iteration,
+    )
+    # TODO: Inject signal to trigger ReportAgent
+    break
+```
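The 80% check can be exercised without a real workflow. This sketch assumes a hypothetical `run_rounds` generator in place of the orchestrator loop and takes the clock as a parameter so it can be simulated. It uses `time.monotonic` by default, which is not affected by system clock adjustments.

```python
import time
from collections.abc import Callable, Iterator


def run_rounds(
    max_rounds: int,
    timeout_seconds: float,
    clock: Callable[[], float] = time.monotonic,
    threshold: float = 0.8,
) -> Iterator[str]:
    """Yield one message per round; stop early once `threshold` of the budget is spent.

    Hypothetical stand-in for the orchestrator loop: real agent work would
    happen where the "round" message is yielded.
    """
    start = clock()
    for iteration in range(1, max_rounds + 1):
        elapsed = clock() - start
        # Same condition as the spec: less than 20% of the budget remains.
        if elapsed > timeout_seconds * threshold:
            yield f"synthesizing (round {iteration}): time limit approaching"
            return
        yield f"round {iteration}"
    yield "complete"


# Simulated clock: each call advances 100 "seconds", so no real waiting occurs.
ticks = iter(range(0, 10_000, 100))
messages = list(
    run_rounds(max_rounds=10, timeout_seconds=600.0, clock=lambda: float(next(ticks)))
)
```

The check only runs between rounds, so a round already in progress is not interrupted. Signalling the workflow manager mid-round remains the open question noted in the TODO.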
+## Implementation Order
+1. **Fix 1 (P0)**: Chat history preservation - 5 minutes, 1 line change
+2. **Fix 2 (P1)**: Increase default timeout - 5 minutes, 1 line change
+3. **Fix 3 (P1)**: Environment config - 15 minutes, 3 files
+4. **Fix 4 (P2)**: Graceful degradation - 1 hour, research agent-framework signals
+## Acceptance Criteria
+- [ ] Timeout shows ALL progress events, not just timeout message
+- [ ] Default timeout increased to 600s (10 minutes)
+- [ ] Timeout configurable via `MAGENTIC_TIMEOUT` env var
+- [ ] Tests verify chat history preserved on timeout
+- [ ] (P2) System synthesizes early when timeout approaches
+## Files to Modify
+1. `src/app.py` - Fix chat history clearing (lines 205-212)
+2. `src/orchestrator_magentic.py` - Increase default timeout
+3. `src/utils/config.py` - Add `magentic_timeout` setting
+4. `src/orchestrator_factory.py` - Pass timeout to MagenticOrchestrator
+5. `tests/unit/test_app_timeout.py` - NEW: Test chat history preservation
+## Test Plan
+```python
+# tests/unit/test_app_timeout.py
+import pytest
+@pytest.mark.asyncio
+async def test_complete_event_preserves_history():
+    """Complete events should append to history, not replace it."""
+    from src.app import research_agent
+    # This requires mocking the orchestrator to emit events then complete
+    # Verify final output contains ALL events, not just completion message
+    pytest.skip("TODO: needs a mocked orchestrator")  # skip rather than pass vacuously
+def test_timeout_configurable(monkeypatch):
+    """Verify MAGENTIC_TIMEOUT env var is respected."""
+    # monkeypatch restores the environment after the test, so the value does not leak
+    monkeypatch.setenv("MAGENTIC_TIMEOUT", "120")
+    from src.utils.config import Settings
+    settings = Settings()
+    assert settings.magentic_timeout == 120
+```
+## Risk Assessment
+| Fix | Risk | Mitigation |
+|-----|------|------------|
+| Fix 1 | Low | Simple change, well-understood |
+| Fix 2 | Low | Just a default value change |
+| Fix 3 | Medium | New config, needs validation |
+| Fix 4 | High | Requires understanding agent-framework internals |
+## Dependencies
+- Fix 4 requires investigation of `agent-framework-core` to understand how to signal early termination to the workflow manager.

docs/specs/SPEC_05_ORCHESTRATOR_CLEANUP.md ADDED

	@@ -0,0 +1,160 @@

+# SPEC 05: Orchestrator Module Cleanup
+## Priority: P3 (Code Hygiene)
+## Problem Statement
+The codebase has an inconsistent orchestrator organization:
+```
+src/
+├── orchestrator/              # EMPTY folder (just . and ..)
+├── orchestrator.py            # Simple mode (15KB, 67% coverage)
+├── orchestrator_factory.py    # Factory pattern (2.5KB, 87% coverage)
+├── orchestrator_hierarchical.py  # Unused (3KB, 0% coverage)
+└── orchestrator_magentic.py   # Advanced mode (11KB, 68% coverage)
+```
+## Related Issues
+- GitHub Issue #67: Clean up empty src/orchestrator/ folder
+## Analysis
+### Empty Folder
+The `src/orchestrator/` folder was created but never populated. All orchestrator implementations remain flat in `src/`.
+### Dead Code
+`orchestrator_hierarchical.py` has **0% test coverage** and appears to be an early prototype that was never integrated:
+- Not imported anywhere in production code
+- Not referenced in any tests
+- Pattern doesn't match current architecture
+### Import Pattern
+All 30+ imports use the flat structure:
+```python
+from src.orchestrator import Orchestrator
+from src.orchestrator_factory import create_orchestrator
+from src.orchestrator_magentic import MagenticOrchestrator
+```
+## Options
+### Option A: Minimal Cleanup (Recommended)
+Delete the empty folder and dead code:
+```bash
+rm -rf src/orchestrator/
+rm src/orchestrator_hierarchical.py
+```
+**Pros**: Zero import changes, minimal risk, quick
+**Cons**: Flat structure remains
+### Option B: Full Consolidation (Future)
+Move everything into a proper module:
+```
+src/orchestrator/
+├── __init__.py        # Re-export for backwards compat
+├── base.py            # Shared protocols/types
+├── simple.py          # From orchestrator.py
+├── magentic.py        # From orchestrator_magentic.py
+└── factory.py         # From orchestrator_factory.py
+```
+**Pros**: Cleaner organization, better separation
+**Cons**: 30+ import changes, risk of breakage, time investment
+### Option C: Hybrid (Pragmatic)
+Delete empty folder + dead code now. Create `src/orchestrator/__init__.py` that re-exports from flat files:
+```python
+# src/orchestrator/__init__.py
+from src.orchestrator import Orchestrator
+from src.orchestrator_factory import create_orchestrator
+from src.orchestrator_magentic import MagenticOrchestrator
+__all__ = ["Orchestrator", "create_orchestrator", "MagenticOrchestrator"]
+```
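+**Problem**: This creates confusing import semantics: `src.orchestrator` would exist as both a package (`src/orchestrator/__init__.py`) and a module file (`src/orchestrator.py`). Python resolves the name to the package, so the first re-export above would import from the package itself rather than from `orchestrator.py`.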
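The name clash can be demonstrated in a throwaway directory. This sketch uses a hypothetical name `demo_orch` in place of `orchestrator` and creates both a module file and a package with that name, then checks which one the import system loads.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Flat module, standing in for src/orchestrator.py
    (root / "demo_orch.py").write_text("KIND = 'module file'\n")
    # Package of the same name, standing in for src/orchestrator/__init__.py
    (root / "demo_orch").mkdir()
    (root / "demo_orch" / "__init__.py").write_text("KIND = 'package'\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module("demo_orch")
        loaded_kind = demo.KIND
        loaded_from = Path(demo.__file__).name
    finally:
        # Leave the interpreter state as it was found.
        sys.path.remove(tmp)
        sys.modules.pop("demo_orch", None)

print(loaded_kind, loaded_from)
```

On CPython the package's `__init__.py` is the file that gets loaded, which is why a re-exporting `__init__.py` next to a same-named module file would end up importing from itself.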
+## Recommendation
+**Option A** for now. The flat structure works fine and changing it provides no functional benefit. The empty folder and dead code should be removed.
+Option B can be revisited post-hackathon when there's time for a proper refactor.
+## Implementation
+### Step 1: Remove Empty Folder
+```bash
+rm -rf src/orchestrator/
+```
+### Step 2: Remove Dead Code (Optional)
+```bash
+rm src/orchestrator_hierarchical.py
+```
+If keeping for reference, add a deprecation notice:
+```python
+# src/orchestrator_hierarchical.py
+"""
+DEPRECATED: Unused hierarchical orchestrator prototype.
+Kept for reference only. See orchestrator.py (simple) or
+orchestrator_magentic.py (advanced) for active implementations.
+"""
+```
+### Step 3: Verify
+```bash
+make check  # All 142 tests should pass
+```
+## Acceptance Criteria
+- [ ] Empty `src/orchestrator/` folder deleted
+- [ ] No broken imports (grep for `from src.orchestrator.` submodule imports returns nothing)
+- [ ] Tests pass
+- [ ] (Optional) `orchestrator_hierarchical.py` removed or deprecated
+## Files to Modify
+1. `src/orchestrator/` - DELETE (empty folder)
+2. `src/orchestrator_hierarchical.py` - DELETE or add deprecation notice
+## Test Plan
+```bash
+# Verify nothing imports a submodule from the folder path
+# (Python imports use dots, so a pattern ending in "/" can never match)
+grep -rn "from src\.orchestrator\." src tests
+# Should return nothing
+# Verify nothing imports hierarchical
+grep -rn "orchestrator_hierarchical" src tests
+# Should return nothing once the file is deleted (if kept, only the file itself may match)
+# Run full test suite
+make check
+```
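A plain-text grep also matches comments and docstrings. Where that noise matters, an AST-based scan reports only real import statements. The sketch below assumes a hypothetical helper `find_imports` and runs on throwaway files; the class name `HierarchicalOrchestrator` in the demo is invented for illustration.

```python
import ast
import tempfile
from pathlib import Path


def find_imports(root: Path, needle: str) -> list[tuple[str, int, str]]:
    """Return (file name, line number, module) for imports whose module path contains `needle`."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                modules = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom):
                modules = [node.module or ""]  # module is None for "from . import x"
            else:
                continue
            hits.extend((path.name, node.lineno, m) for m in modules if needle in m)
    return hits


# Demo on throwaway files; in the repo the roots would be `src` and `tests`.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("from src.orchestrator import Orchestrator\n")
    (root / "b.py").write_text("# orchestrator_hierarchical appears in a comment only\nimport os\n")
    (root / "c.py").write_text(
        "import os\nfrom src.orchestrator_hierarchical import HierarchicalOrchestrator\n"
    )
    hits = find_imports(root, "orchestrator_hierarchical")
```

Only `c.py` is reported here; the comment in `b.py` is ignored because it never reaches the syntax tree.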
+## Risk Assessment
+| Action | Risk | Mitigation |
+|--------|------|------------|
+| Delete empty folder | None | It's empty, nothing uses it |
+| Delete `orchestrator_hierarchical.py` | Low | 0% coverage, no imports |
+| Full consolidation | Medium | Many import changes |
+## Time Estimate
+- Option A: 5 minutes
+- Option B: 1-2 hours (plus testing)