feat: implement Evolution Engine for self-improving strategies (Pillar 4) #26

Merged
jihoson merged 1 commit from feature/issue-19-evolution-engine into main 2026-02-04 16:37:22 +09:00

Summary

Implements Pillar 4: relentless, goal-driven evolution

Complete self-improvement system that learns from mistakes and evolves trading strategies automatically.

  • Failure analysis from decision logs
  • AI-powered strategy generation
  • A/B testing framework with statistical significance
  • Auto-deployment of winning strategies
  • Performance tracking dashboard
  • 24 tests, 90% coverage

Components

1. EvolutionOptimizer (src/evolution/optimizer.py)

  • Queries high-confidence losing trades (≥80% confidence)
  • Identifies failure patterns by time, market, action type
  • Uses Gemini to generate improved strategies
  • Validates strategies with pytest
  • Simulates PR creation for successful strategies

Key methods:

  • analyze_failures() - Pattern identification
  • generate_strategy() - AI strategy generation
  • validate_strategy() - Pytest validation
  • create_strategy_pr() - PR simulation
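The pattern-identification step can be pictured as a grouping pass over losing decisions. The following is a minimal sketch, not the PR's implementation: the record fields (`timestamp`, `market`, `action`, `confidence`) and the helper name `summarize_failures` are assumptions made for illustration, and the real rows returned by DecisionLogger may be shaped differently.

```python
from collections import Counter
from datetime import datetime

# Hypothetical record shape; the real DecisionLogger rows may differ.
losing_decisions = [
    {"timestamp": "2026-02-03T09:05:00", "market": "KR", "action": "BUY", "confidence": 85},
    {"timestamp": "2026-02-03T09:40:00", "market": "KR", "action": "BUY", "confidence": 92},
    {"timestamp": "2026-02-03T14:10:00", "market": "US", "action": "SELL", "confidence": 81},
    {"timestamp": "2026-02-03T15:20:00", "market": "KR", "action": "BUY", "confidence": 70},
]


def summarize_failures(decisions, min_confidence=80):
    """Count high-confidence losing decisions by hour, market, and action."""
    # Keep only losses the model was confident about (>= 80% by default).
    selected = [d for d in decisions if d["confidence"] >= min_confidence]
    return {
        "total": len(selected),
        "by_hour": Counter(datetime.fromisoformat(d["timestamp"]).hour for d in selected),
        "by_market": Counter(d["market"] for d in selected),
        "by_action": Counter(d["action"] for d in selected),
    }


patterns = summarize_failures(losing_decisions)
print(patterns["by_hour"].most_common(1))
```

A summary like this could then be serialized into the prompt sent to the strategy generator, so the model sees where confident decisions fail most often.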

2. ABTester (src/evolution/ab_test.py)

  • Tracks performance metrics (wins, losses, P&L, win rate)
  • Statistical significance testing (Welch's t-test, p < 0.05)
  • Sharpe ratio calculation
  • Auto-deployment criteria: >60% win rate, >20 trades, positive P&L
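To make the significance test concrete, here is a standard-library sketch of the Welch t statistic and its Welch-Satterthwaite degrees of freedom. The function name `welch_t` and the sample P&L values are illustrative assumptions. The commit message cites `scipy.stats.ttest_ind`; calling that function with `equal_var=False` performs Welch's variant and also returns the p-value, which this sketch does not compute.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    # Squared standard error of each sample mean (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Illustrative per-trade P&L samples, not data from the PR.
baseline_pnl = [1.0, 2.0, 3.0, 4.0, 5.0]
candidate_pnl = [2.0, 4.0, 6.0, 8.0, 10.0]

t_stat, dof = welch_t(candidate_pnl, baseline_pnl)
print(f"t = {t_stat:.4f}, df = {dof:.2f}")
```

Unlike Student's t-test, Welch's version does not assume the two strategies have equal P&L variance, which is rarely a safe assumption when comparing trading strategies.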

Key methods:

  • add_result() - Record trade outcome
  • calculate_metrics() - Performance calculation
  • compare() - Statistical comparison
  • should_deploy() - Deployment decision
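The deployment rule listed above (more than 20 trades, win rate above 60%, positive P&L) can be sketched as a small class. `ABTesterSketch` is a hypothetical stand-in, not the `ABTester` class from this PR; treating a win as `pnl > 0` and using a zero risk-free rate with no annualization for the Sharpe ratio are both assumptions.

```python
from statistics import mean, stdev


class ABTesterSketch:
    """Illustrative stand-in, not the ABTester class from this PR."""

    def __init__(self, name):
        self.name = name
        self.pnls = []

    def add_result(self, pnl):
        """Record the P&L of one closed trade."""
        self.pnls.append(pnl)

    def win_rate(self):
        # A trade counts as a win when its P&L is strictly positive (assumption).
        return sum(p > 0 for p in self.pnls) / len(self.pnls) if self.pnls else 0.0

    def sharpe(self):
        # Per-trade Sharpe ratio with a zero risk-free rate (assumption).
        if len(self.pnls) < 2:
            return 0.0
        sd = stdev(self.pnls)
        return mean(self.pnls) / sd if sd > 0 else 0.0

    def should_deploy(self):
        # Criteria from the description: >20 trades, >60% win rate, positive P&L.
        return len(self.pnls) > 20 and self.win_rate() > 0.60 and sum(self.pnls) > 0


tester = ABTesterSketch("new_strategy")
for i in range(25):
    # Every fifth trade loses 50; the rest win 100.
    tester.add_result(100.0 if i % 5 else -50.0)

print(tester.win_rate(), tester.should_deploy())
```

Requiring a minimum trade count before checking the win rate keeps a short lucky streak from triggering deployment.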

3. PerformanceTracker (src/evolution/performance_tracker.py)

  • Comprehensive metrics tracking
  • Daily/weekly performance aggregation
  • Trend detection (improving/declining/stable)
  • Dashboard generation with JSON export

Key methods:

  • record_trade() - Store trade result
  • get_daily_metrics() - Daily aggregation
  • detect_trend() - Performance trend analysis
  • generate_dashboard() - Comprehensive reporting
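One simple way to classify a trend as improving, declining, or stable is to compare the mean of the recent half of a daily series against the older half. The sketch below assumes that approach; the PR does not state which method `detect_trend()` uses, and the 5% tolerance, the function names, and the sample trades are all illustrative.

```python
import json
from collections import defaultdict
from statistics import mean


def daily_pnl(trades):
    """Sum P&L per ISO date string and return the totals in date order."""
    totals = defaultdict(float)
    for day, pnl in trades:
        totals[day] += pnl
    return dict(sorted(totals.items()))


def detect_trend(values, tolerance=0.05):
    """Compare the recent half of a series with the older half."""
    if len(values) < 2:
        return "stable"
    half = len(values) // 2
    older, recent = mean(values[:half]), mean(values[half:])
    # Normalize by the larger magnitude so the tolerance is relative.
    scale = max(abs(older), abs(recent), 1e-9)
    change = (recent - older) / scale
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "declining"
    return "stable"


# Illustrative (date, pnl) pairs, not data from the PR.
trades = [
    ("2026-02-01", 100.0), ("2026-02-01", -40.0),
    ("2026-02-02", 80.0), ("2026-02-03", 150.0), ("2026-02-04", 200.0),
]
daily = daily_pnl(trades)
dashboard = json.dumps({"daily_pnl": daily, "trend": detect_trend(list(daily.values()))}, indent=2)
print(dashboard)
```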

Integration

Uses existing infrastructure:

  • DecisionLogger.get_losing_decisions() for failure analysis
  • GeminiClient for strategy generation
  • BaseStrategy interface for new strategies
  • Pytest for validation

Tests (tests/test_evolution.py)

24 comprehensive tests covering:

  • Failure analysis with DecisionLogger
  • Pattern identification (time, market, action)
  • Strategy generation and validation
  • A/B testing with statistical significance
  • Performance tracking and dashboard
  • Full evolution pipeline

Coverage: 90% of the evolution module (304 lines, 31 missed)
All 105 project tests passing (72% overall coverage)

Usage Example

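# Import paths assume the src/ layout listed under Components.
import asyncio

from src.evolution.ab_test import ABTester
from src.evolution.optimizer import EvolutionOptimizer
from src.evolution.performance_tracker import PerformanceTracker


async def run_evolution(settings, db_conn, trades):
    # Analyze failures and generate improvements
    optimizer = EvolutionOptimizer(settings, db_conn)
    failures = optimizer.analyze_failures()
    # generate_strategy() is awaited, so it must run inside a coroutine
    new_strategy = await optimizer.generate_strategy(failures)

    # A/B test strategies: record each trade's P&L, then check the deployment criteria
    tester = ABTester("new_strategy")
    for trade in trades:
        tester.add_result(trade.pnl)
    if tester.should_deploy():
        print("Deploy new strategy!")

    # Track performance
    tracker = PerformanceTracker(db_conn)
    tracker.record_trade("strategy_v1", pnl=1500.0, win=True)
    dashboard = tracker.generate_dashboard()
    return new_strategy, dashboard


# settings, db_conn, and trades are supplied by the application:
# asyncio.run(run_evolution(settings, db_conn, trades))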

Dependencies

Added scipy>=1.11,<2 for statistical testing

Related

Closes #19
Part of The 4 Pillars: Pillar 4 (Evolution)
Depends on #17 (Decision Logging)

agentson added 1 commit 2026-02-04 16:35:36 +09:00
feat: implement evolution engine for self-improving strategies
Some checks failed
CI / test (pull_request) Has been cancelled
ae7195c829
Complete Pillar 4 implementation with comprehensive testing and analysis.

Components:
- EvolutionOptimizer: Analyzes losing decisions from DecisionLogger,
  identifies failure patterns (time, market, action), and uses Gemini
  to generate improved strategies with auto-deployment capability
- ABTester: A/B testing framework with statistical significance testing
  (two-sample t-test), performance comparison, and deployment criteria
  (>60% win rate, >20 trades minimum)
- PerformanceTracker: Tracks strategy win rates, monitors improvement
  trends over time, generates comprehensive dashboards with daily/weekly
  metrics and trend analysis

Key Features:
- Uses DecisionLogger.get_losing_decisions() for failure identification
- Pattern analysis: market distribution, action types, time-of-day patterns
- Gemini integration for AI-powered strategy generation
- Statistical validation using scipy.stats.ttest_ind
- Sharpe ratio calculation for risk-adjusted returns
- Auto-deploy strategies meeting 60% win rate threshold
- Performance dashboard with JSON export capability

Testing:
- 24 comprehensive tests covering all evolution components
- 90% coverage of evolution module (304 lines, 31 missed)
- Integration tests for full evolution pipeline
- All 105 project tests passing with 72% overall coverage

Dependencies:
- Added scipy>=1.11,<2 for statistical analysis

Closes #19

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
jihoson merged commit 53d3637b3e into main 2026-02-04 16:37:22 +09:00
Reference: jihoson/The-Ouroboros#26