VibecoderMcSwaggins commited on
Commit
c99c9c2
Β·
1 Parent(s): fd28242

docs: add SPEC_04 (Magentic UX) and SPEC_05 (Orchestrator Cleanup)

Browse files

SPEC_04: Magentic Mode UX Improvements (#68)
- P0: Chat history cleared on timeout (1-line fix)
- P1: Timeout too short (300s β†’ 600s)
- P1: Timeout not configurable (add env var)
- P2: No graceful degradation (future)

SPEC_05: Orchestrator Module Cleanup (#67)
- Empty src/orchestrator/ folder should be deleted
- orchestrator_hierarchical.py is dead code (0% coverage)
- Option A (minimal cleanup) recommended for now

docs/specs/SPEC_04_MAGENTIC_UX.md ADDED
@@ -0,0 +1,235 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # SPEC 04: Magentic Mode UX Improvements
2
+
3
+ ## Priority: P1 (Demo Quality)
4
+
5
+ ## Problem Statement
6
+
7
+ Magentic (advanced) mode has several UX issues that degrade the user experience:
8
+
9
+ 1. **P0: Chat history cleared on timeout** - When timeout occurs, all progress events are erased
10
+ 2. **P1: Timeout too short** - 300s default insufficient for complex multi-agent workflows
11
+ 3. **P1: Timeout not configurable** - Users can't adjust based on their needs
12
+ 4. **P2: No graceful degradation** - System doesn't synthesize early when timeout approaches
13
+
14
+ ## Related Issues
15
+
16
+ - GitHub Issue #68: Magentic mode times out at 300s without completing
17
+ - GitHub Issue #65: Demo timing (predecessor, now closed)
18
+ - SPEC_01: Demo Termination (implemented the basic timeout)
19
+
20
+ ## Bug Analysis
21
+
22
+ ### Bug 1: Chat History Cleared on Timeout (P0)
23
+
24
+ **Location**: `src/app.py:205-206`
25
+
26
+ **Current Code**:
27
+ ```python
28
+ if event.type == "complete":
29
+ yield event.message # BUG: Discards all accumulated progress!
30
+ else:
31
+ event_md = event.to_markdown()
32
+ response_parts.append(event_md)
33
+ yield "\n\n".join(response_parts)
34
+ ```
35
+
36
+ **Problem**: The `complete` event (including timeout) yields ONLY the completion message, discarding all the `response_parts` that show what the system actually did.
37
+
38
+ **User Sees**:
39
+ ```
40
+ Research timed out. Synthesizing available evidence...
41
+ ```
42
+
43
+ **User Should See**:
44
+ ```
45
+ πŸš€ STARTED: Starting research (Magentic mode)...
46
+ ⏳ THINKING: Multi-agent reasoning in progress...
47
+ 🧠 JUDGING: Manager (user_task): Research drug repurposing...
48
+ 🧠 JUDGING: Manager (task_ledger): We are working to address...
49
+ 🧠 JUDGING: Manager (instruction): Task: Retrieve human clinical...
50
+ ⏱️ Research timed out. Synthesizing available evidence...
51
+ ```
52
+
53
+ **Fix**:
54
+ ```python
55
+ if event.type == "complete":
56
+ response_parts.append(event.message)
57
+ yield "\n\n".join(response_parts) # Preserves all progress
58
+ ```
59
+
60
+ ### Bug 2: Timeout Too Short (P1)
61
+
62
+ **Location**: `src/orchestrator_magentic.py:48`
63
+
64
+ **Current**: `timeout_seconds: float = 300.0` (5 minutes)
65
+
66
+ **Problem**: Multi-agent workflows with 4 agents (Search, Hypothesis, Judge, Report) and up to 10 rounds can theoretically take 60+ minutes. Even typical runs take 5-10 minutes.
67
+
68
+ **Analysis of Per-Agent Latency**:
69
+ | Agent | Typical Latency | Worst Case |
70
+ |-------|-----------------|------------|
71
+ | SearchAgent | 30-60s | 120s (network issues) |
72
+ | HypothesisAgent | 60-90s | 180s (complex reasoning) |
73
+ | JudgeAgent | 30-60s | 120s |
74
+ | ReportAgent | 60-120s | 240s (long synthesis) |
75
+
76
+ With `max_rounds=10`: 10 Γ— 4 Γ— 90s = 60 minutes worst case.
77
+
78
+ ### Bug 3: Timeout Not Configurable (P1)
79
+
80
+ **Problem**: The factory doesn't pass timeout config to MagenticOrchestrator.
81
+
82
+ **Location**: `src/orchestrator_factory.py:52-55`
83
+ ```python
84
+ return orchestrator_cls(
85
+ max_rounds=config.max_iterations if config else 10,
86
+ api_key=api_key,
87
+ # Missing: timeout_seconds
88
+ )
89
+ ```
90
+
91
+ ## Proposed Solutions
92
+
93
+ ### Fix 1: Preserve Chat History (P0)
94
+
95
+ ```python
96
+ # src/app.py - Replace lines 205-212
97
+ if event.type == "complete":
98
+ # Preserve accumulated progress + add completion message
99
+ response_parts.append(event.message)
100
+ yield "\n\n".join(response_parts)
101
+ else:
102
+ event_md = event.to_markdown()
103
+ response_parts.append(event_md)
104
+ yield "\n\n".join(response_parts)
105
+ ```
106
+
107
+ **Test**:
108
+ ```python
109
+ @pytest.mark.asyncio
110
+ async def test_timeout_preserves_chat_history(mock_magentic_workflow):
111
+ """Verify timeout doesn't erase progress events."""
112
+ # Mock workflow that yields events then times out
113
+ events = []
114
+ async for event in research_agent("test", [], "advanced", "sk-test"):
115
+ events.append(event)
116
+
117
+ # Should contain both progress AND timeout message
118
+ output = events[-1] # Final yield
119
+ assert "STARTED" in output
120
+ assert "timed out" in output.lower()
121
+ ```
122
+
123
+ ### Fix 2: Increase Default Timeout (P1)
124
+
125
+ ```python
126
+ # src/orchestrator_magentic.py
127
+ def __init__(
128
+ self,
129
+ max_rounds: int = 10,
130
+ chat_client: OpenAIChatClient | None = None,
131
+ api_key: str | None = None,
132
+ timeout_seconds: float = 600.0, # Changed: 10 minutes (was 5)
133
+ ) -> None:
134
+ ```
135
+
136
+ ### Fix 3: Make Timeout Configurable via Environment (P1)
137
+
138
+ ```python
139
+ # src/utils/config.py
140
+ class Settings(BaseSettings):
141
+ # ... existing fields ...
142
+ magentic_timeout: int = Field(
143
+ default=600,
144
+ description="Timeout for Magentic mode in seconds",
145
+ )
146
+ ```
147
+
148
+ ```python
149
+ # src/orchestrator_factory.py
150
+ return orchestrator_cls(
151
+ max_rounds=config.max_iterations if config else 10,
152
+ api_key=api_key,
153
+ timeout_seconds=settings.magentic_timeout, # NEW
154
+ )
155
+ ```
156
+
157
+ ### Fix 4: Graceful Degradation (P2 - Future)
158
+
159
+ ```python
160
+ # src/orchestrator_magentic.py - Inside run() loop
161
+ elapsed = time.time() - start_time
162
+ time_remaining = self._timeout_seconds - elapsed
163
+
164
+ # If 80% of time elapsed, force synthesis
165
+ if time_remaining < self._timeout_seconds * 0.2:
166
+ yield AgentEvent(
167
+ type="synthesizing",
168
+ message="Time limit approaching, synthesizing available evidence...",
169
+ iteration=iteration,
170
+ )
171
+ # TODO: Inject signal to trigger ReportAgent
172
+ break
173
+ ```
174
+
175
+ ## Implementation Order
176
+
177
+ 1. **Fix 1 (P0)**: Chat history preservation - 5 minutes, 1 line change
178
+ 2. **Fix 2 (P1)**: Increase default timeout - 5 minutes, 1 line change
179
+ 3. **Fix 3 (P1)**: Environment config - 15 minutes, 3 files
180
+ 4. **Fix 4 (P2)**: Graceful degradation - 1 hour, research agent-framework signals
181
+
182
+ ## Acceptance Criteria
183
+
184
+ - [ ] Timeout shows ALL progress events, not just timeout message
185
+ - [ ] Default timeout increased to 600s (10 minutes)
186
+ - [ ] Timeout configurable via `MAGENTIC_TIMEOUT` env var
187
+ - [ ] Tests verify chat history preserved on timeout
188
+ - [ ] (P2) System synthesizes early when timeout approaches
189
+
190
+ ## Files to Modify
191
+
192
+ 1. `src/app.py` - Fix chat history clearing (lines 205-212)
193
+ 2. `src/orchestrator_magentic.py` - Increase default timeout
194
+ 3. `src/utils/config.py` - Add `magentic_timeout` setting
195
+ 4. `src/orchestrator_factory.py` - Pass timeout to MagenticOrchestrator
196
+ 5. `tests/unit/test_app_timeout.py` - NEW: Test chat history preservation
197
+
198
+ ## Test Plan
199
+
200
+ ```python
201
+ # tests/unit/test_app_timeout.py
202
+
203
+ @pytest.mark.asyncio
204
+ async def test_complete_event_preserves_history():
205
+ """Complete events should append to history, not replace it."""
206
+ from src.app import research_agent
207
+
208
+ # This requires mocking the orchestrator to emit events then complete
209
+ # Verify final output contains ALL events, not just completion message
210
+ pass
211
+
212
+
213
+ @pytest.mark.asyncio
214
+ async def test_timeout_configurable():
215
+ """Verify MAGENTIC_TIMEOUT env var is respected."""
216
+ import os
217
+ os.environ["MAGENTIC_TIMEOUT"] = "120"
218
+
219
+ from src.utils.config import Settings
220
+ settings = Settings()
221
+ assert settings.magentic_timeout == 120
222
+ ```
223
+
224
+ ## Risk Assessment
225
+
226
+ | Fix | Risk | Mitigation |
227
+ |-----|------|------------|
228
+ | Fix 1 | Low | Simple change, well-understood |
229
+ | Fix 2 | Low | Just a default value change |
230
+ | Fix 3 | Medium | New config, needs validation |
231
+ | Fix 4 | High | Requires understanding agent-framework internals |
232
+
233
+ ## Dependencies
234
+
235
+ - Fix 4 requires investigation of `agent-framework-core` to understand how to signal early termination to the workflow manager.
docs/specs/SPEC_05_ORCHESTRATOR_CLEANUP.md ADDED
@@ -0,0 +1,160 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # SPEC 05: Orchestrator Module Cleanup
2
+
3
+ ## Priority: P3 (Code Hygiene)
4
+
5
+ ## Problem Statement
6
+
7
+ The codebase has an inconsistent orchestrator organization:
8
+
9
+ ```
10
+ src/
11
+ β”œβ”€β”€ orchestrator/ # EMPTY folder (just . and ..)
12
+ β”œβ”€β”€ orchestrator.py # Simple mode (15KB, 67% coverage)
13
+ β”œβ”€β”€ orchestrator_factory.py # Factory pattern (2.5KB, 87% coverage)
14
+ β”œβ”€β”€ orchestrator_hierarchical.py # Unused (3KB, 0% coverage)
15
+ └── orchestrator_magentic.py # Advanced mode (11KB, 68% coverage)
16
+ ```
17
+
18
+ ## Related Issues
19
+
20
+ - GitHub Issue #67: Clean up empty src/orchestrator/ folder
21
+
22
+ ## Analysis
23
+
24
+ ### Empty Folder
25
+ The `src/orchestrator/` folder was created but never populated. All orchestrator implementations remain flat in `src/`.
26
+
27
+ ### Dead Code
28
+ `orchestrator_hierarchical.py` has **0% test coverage** and appears to be an early prototype that was never integrated:
29
+ - Not imported anywhere in production code
30
+ - Not referenced in any tests
31
+ - Pattern doesn't match current architecture
32
+
33
+ ### Import Pattern
34
+ All 30+ imports use the flat structure:
35
+ ```python
36
+ from src.orchestrator import Orchestrator
37
+ from src.orchestrator_factory import create_orchestrator
38
+ from src.orchestrator_magentic import MagenticOrchestrator
39
+ ```
40
+
41
+ ## Options
42
+
43
+ ### Option A: Minimal Cleanup (Recommended)
44
+
45
+ Delete the empty folder and dead code:
46
+
47
+ ```bash
48
+ rm -rf src/orchestrator/
49
+ rm src/orchestrator_hierarchical.py
50
+ ```
51
+
52
+ **Pros**: Zero import changes, minimal risk, quick
53
+ **Cons**: Flat structure remains
54
+
55
+ ### Option B: Full Consolidation (Future)
56
+
57
+ Move everything into a proper module:
58
+
59
+ ```
60
+ src/orchestrator/
61
+ β”œβ”€β”€ __init__.py # Re-export for backwards compat
62
+ β”œβ”€β”€ base.py # Shared protocols/types
63
+ β”œβ”€β”€ simple.py # From orchestrator.py
64
+ β”œβ”€β”€ magentic.py # From orchestrator_magentic.py
65
+ └── factory.py # From orchestrator_factory.py
66
+ ```
67
+
68
+ **Pros**: Cleaner organization, better separation
69
+ **Cons**: 30+ import changes, risk of breakage, time investment
70
+
71
+ ### Option C: Hybrid (Pragmatic)
72
+
73
+ Delete empty folder + dead code now. Create `src/orchestrator/__init__.py` that re-exports from flat files:
74
+
75
+ ```python
76
+ # src/orchestrator/__init__.py
77
+ from src.orchestrator import Orchestrator
78
+ from src.orchestrator_factory import create_orchestrator
79
+ from src.orchestrator_magentic import MagenticOrchestrator
80
+
81
+ __all__ = ["Orchestrator", "create_orchestrator", "MagenticOrchestrator"]
82
+ ```
83
+
84
+ **Problem**: This creates confusing import semantics (`src.orchestrator` would be both a module and a file).
85
+
86
+ ## Recommendation
87
+
88
+ **Option A** for now. The flat structure works fine and changing it provides no functional benefit. The empty folder and dead code should be removed.
89
+
90
+ Option B can be revisited post-hackathon when there's time for a proper refactor.
91
+
92
+ ## Implementation
93
+
94
+ ### Step 1: Remove Empty Folder
95
+
96
+ ```bash
97
+ rm -rf src/orchestrator/
98
+ ```
99
+
100
+ ### Step 2: Remove Dead Code (Optional)
101
+
102
+ ```bash
103
+ rm src/orchestrator_hierarchical.py
104
+ ```
105
+
106
+ If keeping for reference, add a deprecation notice:
107
+ ```python
108
+ # src/orchestrator_hierarchical.py
109
+ """
110
+ DEPRECATED: Unused hierarchical orchestrator prototype.
111
+ Kept for reference only. See orchestrator.py (simple) or
112
+ orchestrator_magentic.py (advanced) for active implementations.
113
+ """
114
+ ```
115
+
116
+ ### Step 3: Verify
117
+
118
+ ```bash
119
+ make check # All 142 tests should pass
120
+ ```
121
+
122
+ ## Acceptance Criteria
123
+
124
+ - [ ] Empty `src/orchestrator/` folder deleted
125
+ - [ ] No broken imports (grep for `from src.orchestrator/`)
126
+ - [ ] Tests pass
127
+ - [ ] (Optional) `orchestrator_hierarchical.py` removed or deprecated
128
+
129
+ ## Files to Modify
130
+
131
+ 1. `src/orchestrator/` - DELETE (empty folder)
132
+ 2. `src/orchestrator_hierarchical.py` - DELETE or add deprecation notice
133
+
134
+ ## Test Plan
135
+
136
+ ```bash
137
+ # Verify nothing imports from the folder path
138
+ grep -r "from src.orchestrator/" src tests
139
+ # Should return nothing
140
+
141
+ # Verify nothing imports hierarchical
142
+ grep -r "orchestrator_hierarchical" src tests
143
+ # Should return nothing (except possibly this spec)
144
+
145
+ # Run full test suite
146
+ make check
147
+ ```
148
+
149
+ ## Risk Assessment
150
+
151
+ | Action | Risk | Mitigation |
152
+ |--------|------|------------|
153
+ | Delete empty folder | None | It's empty, nothing uses it |
154
+ | Delete hierarchical.py | Low | 0% coverage, no imports |
155
+ | Full consolidation | Medium | Many import changes |
156
+
157
+ ## Time Estimate
158
+
159
+ - Option A: 5 minutes
160
+ - Option B: 1-2 hours (plus testing)