[01]
Agent Evaluation Harness
Python / Evals / Agents / ML / CI/CD
Built an eval framework at Alix for scoring agent trajectories — tool-use correctness, task completion, regression gates on golden runs, and drift detection across model and prompt changes.
Catches behavioral regressions before deploy with automated trajectory scoring and pass/fail gates.
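A minimal sketch of what a golden-run regression gate could look like, assuming a trajectory is simply an ordered list of tool calls. The names `ToolCall`, `score_trajectory`, and `regression_gate`, and the 0.9 threshold, are hypothetical illustrations and not the framework's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    """One step of a trajectory (hypothetical shape)."""
    name: str
    args: tuple = ()  # (key, value) pairs, so calls compare by value


def score_trajectory(golden: list, candidate: list) -> float:
    """Fraction of golden steps the candidate reproduces at the same position.

    Missing candidate steps count as misses; extra trailing steps are ignored.
    """
    if not golden:
        return 1.0
    matches = sum(1 for g, c in zip(golden, candidate) if g == c)
    return matches / len(golden)


def regression_gate(golden: list, candidate: list, threshold: float = 0.9) -> bool:
    """Pass only if the candidate stays close enough to the golden run."""
    return score_trajectory(golden, candidate) >= threshold


golden_run = [
    ToolCall("search", (("q", "rates"),)),
    ToolCall("fetch", (("id", 1),)),
    ToolCall("summarize"),
    ToolCall("submit"),
]
# The candidate drifts on the second step only.
candidate_run = [
    ToolCall("search", (("q", "rates"),)),
    ToolCall("fetch", (("id", 2),)),
    ToolCall("summarize"),
    ToolCall("submit"),
]
print(score_trajectory(golden_run, candidate_run))
print(regression_gate(golden_run, candidate_run))
```

Position-wise exact matching is the strictest possible scorer; a real gate would likely tolerate reordered or equivalent calls.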
[02]
Self-Guiding Agent Runtime
Agents / LLM / Python / ML / Reflection
Designed self-guiding agents that plan, execute, reflect, and re-route without hardcoded step sequences. Agents observe intermediate state, critique their own output, and adjust course mid-run.
Handles open-ended tasks where the path isn't known upfront — no fixed DAG required.
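The plan, execute, reflect loop described above might be sketched as follows. This is a toy under stated assumptions: `run_agent`, `make_planner`, and the number-guessing task are hypothetical stand-ins, with plain functions where a real runtime would call a model.

```python
def run_agent(plan, execute, reflect, max_steps=20):
    """Loop until the reflection step is satisfied or the step budget runs out."""
    history = []  # (action, result, critique) triples
    for _ in range(max_steps):
        action = plan(history)               # decide the next step from what happened so far
        result = execute(action)             # act and observe intermediate state
        critique = reflect(action, result)   # critique the agent's own output
        history.append((action, result, critique))
        if critique == "ok":
            break
    return history


TARGET = 7  # hidden from the planner; only execute() sees it


def make_planner(lo, hi):
    bounds = [lo, hi]

    def plan(history):
        # Re-route based on the latest critique instead of following a fixed sequence.
        if history:
            guess, _, critique = history[-1]
            if critique == "too low":
                bounds[0] = guess + 1
            elif critique == "too high":
                bounds[1] = guess - 1
        return (bounds[0] + bounds[1]) // 2

    return plan


def execute(guess):
    return guess - TARGET


def reflect(action, result):
    if result == 0:
        return "ok"
    return "too low" if result < 0 else "too high"


history = run_agent(make_planner(0, 20), execute, reflect)
print([action for action, _, _ in history])
```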
[03]
Agent Harness & Tool Orchestration
Harnesses / Agents / Python / TypeScript / MCP
Built a harness layer that wraps agents with structured tool access, retry policies, timeout budgets, and observability hooks. Standardizes how agents call APIs, query data, and hand off between sub-agents.
Single interface for spinning up, monitoring, and tearing down agent sessions across workflows.
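As an illustration of the retry policy, timeout budget, and observability hook mentioned above, here is a small sketch. `call_tool`, `BudgetExceeded`, and the `on_event` callback are hypothetical names, and the budget is only checked between attempts, so it does not interrupt a call that is already running.

```python
import time


class BudgetExceeded(Exception):
    """Raised when the time budget is spent before an attempt can start."""


def call_tool(tool, *args, retries=3, budget_s=5.0, backoff_s=0.0, on_event=None, **kwargs):
    """Call `tool` with bounded retries inside an overall time budget."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    deadline = time.monotonic() + budget_s
    last_error = None
    for attempt in range(1, retries + 1):
        if time.monotonic() >= deadline:
            raise BudgetExceeded(f"budget of {budget_s}s spent before attempt {attempt}")
        try:
            result = tool(*args, **kwargs)
        except Exception as exc:  # a sketch: retry on any error
            last_error = exc
            if on_event:
                on_event("error", attempt)
            if attempt < retries:
                time.sleep(backoff_s * attempt)  # linear backoff between attempts
        else:
            if on_event:
                on_event("success", attempt)
            return result
    raise last_error


# Usage: a tool that fails twice, then succeeds.
calls = {"n": 0}


def flaky_lookup(key):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return f"value-for-{key}"


events = []
print(call_tool(flaky_lookup, "acct", on_event=lambda kind, n: events.append((kind, n))))
print(events)
```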
[04]
Non-Deterministic Workflow Engine
Workflows / Agents / Python / Orchestration / ML
Built an engine that orchestrates workflows where each run can branch differently: stochastic agent decisions, parallel exploration paths, replay for debugging, and checkpointing for long-running jobs.
Supports branching, replay, and partial reruns without restarting the entire workflow.
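Checkpointing with partial reruns could be sketched like this, assuming a linear list of named steps and an in-memory checkpoint dict. `run_workflow` and `invalidate_from` are hypothetical helpers; a real engine would persist checkpoints and handle branching.

```python
def run_workflow(steps, checkpoint, start_input):
    """Run (name, fn) steps in order, reusing any output already checkpointed.

    Returns the final value and the names of the steps that actually executed.
    Replaying from the checkpoint means stochastic steps are not re-sampled.
    """
    value = start_input
    executed = []
    for name, fn in steps:
        if name in checkpoint:
            value = checkpoint[name]   # replay the recorded output
        else:
            value = fn(value)
            checkpoint[name] = value   # record it for later replays
            executed.append(name)
    return value, executed


def invalidate_from(steps, checkpoint, step_name):
    """Drop checkpoints for `step_name` and every later step, forcing a partial rerun."""
    names = [name for name, _ in steps]
    for name in names[names.index(step_name):]:
        checkpoint.pop(name, None)


steps = [
    ("add_one", lambda x: x + 1),
    ("double", lambda x: x * 2),
    ("square", lambda x: x * x),
]
checkpoint = {}
print(run_workflow(steps, checkpoint, 2))   # first run executes every step
print(run_workflow(steps, checkpoint, 2))   # second run is a pure replay
invalidate_from(steps, checkpoint, "double")
print(run_workflow(steps, checkpoint, 2))   # reruns only "double" and "square"
```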
[05]
Multi-Agent Financial Reporting Pipeline
Python / LangChain / Agents / ML / Evals
Designed and deployed a multi-agent financial reporting pipeline at EY using Python, LangChain, and Azure OpenAI — orchestrator, analyst, and reviewer roles with eval checks on output quality.
Cut analyst drafting time by approximately 40% across 3 engagement teams.