feat: implement evolution engine for self-improving strategies

Complete Pillar 4 implementation with comprehensive testing and analysis.

Components:
- EvolutionOptimizer: Analyzes losing decisions from DecisionLogger, identifies failure patterns (time, market, action), and uses Gemini to generate improved strategies with auto-deployment capability
- ABTester: A/B testing framework with statistical significance testing (two-sample t-test), performance comparison, and deployment criteria (>60% win rate, >20 trades minimum)
- PerformanceTracker: Tracks strategy win rates, monitors improvement trends over time, generates comprehensive dashboards with daily/weekly metrics and trend analysis

Key Features:
- Uses DecisionLogger.get_losing_decisions() for failure identification
- Pattern analysis: market distribution, action types, time-of-day patterns
- Gemini integration for AI-powered strategy generation
- Statistical validation using scipy.stats.ttest_ind
- Sharpe ratio calculation for risk-adjusted returns
- Auto-deploy strategies meeting 60% win rate threshold
- Performance dashboard with JSON export capability

Testing:
- 24 comprehensive tests covering all evolution components
- 90% coverage of evolution module (304 lines, 31 missed)
- Integration tests for full evolution pipeline
- All 105 project tests passing with 72% overall coverage

Dependencies:
- Added scipy>=1.11,<2 for statistical analysis

Closes #19

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
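The deployment gate and Sharpe ratio named in this message can be sketched with the standard library alone. This is an illustration, not the code from the commit: the function names are hypothetical, a trade is assumed to count as a win when its P&L is strictly positive, and the Sharpe ratio is computed per trade without annualization.

```python
from statistics import mean, stdev


def win_rate(pnls: list[float]) -> float:
    """Fraction of trades with strictly positive P&L (0.0 for an empty list)."""
    if not pnls:
        return 0.0
    return sum(1 for p in pnls if p > 0) / len(pnls)


def sharpe_ratio(returns: list[float], risk_free: float = 0.0) -> float:
    """Mean excess return divided by the sample standard deviation."""
    if len(returns) < 2:
        return 0.0  # assumption: too few samples to estimate spread
    spread = stdev(returns)
    if spread == 0:
        return 0.0  # assumption: report 0 instead of dividing by zero
    return (mean(returns) - risk_free) / spread


def meets_deploy_criteria(pnls: list[float]) -> bool:
    """Gate from the commit message: more than 20 trades and a win rate above 60%."""
    return len(pnls) > 20 and win_rate(pnls) > 0.60
```

A strategy with 15 wins out of 21 trades would pass this gate, while one with 20 straight wins would not, because it has not reached the minimum trade count.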
feat: integrate decision logger with main trading loop
2026-02-04 16:34:10 +09:00 · 2026-02-04 15:47:53 +09:00 · 2026-02-04 15:47:53 +09:00 · 2026-02-04 15:40:00 +09:00 · 2026-02-04 15:25:13 +09:00 · 2026-02-04 10:15:09 +09:00
17 changed files with 2662 additions and 261 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -174,3 +174,4 @@ cython_debug/
 # PyPI configuration file
 .pypirc
 data/
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -1,258 +1,98 @@
-# CLAUDE.md
+# The Ouroboros
-This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+AI-powered trading agent for global stock markets with self-evolution capabilities.
-## Git Workflow Policy
+## Quick Start
 **CRITICAL: All code changes MUST follow this workflow. Direct pushes to `main` are ABSOLUTELY PROHIBITED.**
 1. **Create Gitea Issue First** — All features, bug fixes, and policy changes require a Gitea issue before any code is written
 2. **Create Feature Branch** — Branch from `main` using format `feature/issue-{N}-{short-description}`
 3. **Implement Changes** — Write code, tests, and documentation on the feature branch
 4. **Create Pull Request** — Submit PR to `main` branch referencing the issue number
 5. **Review & Merge** — After approval, merge via PR (squash or merge commit)
 **Never commit directly to `main`.** This policy applies to all changes, no exceptions.
 ## Agent Workflow
 **Modern AI development leverages specialized agents for concurrent, efficient task execution.**
 ### Parallel Execution Strategy
 Use **git worktree** or **subagents** (via the Task tool) to handle multiple requirements simultaneously:
 - Each task runs in independent context
 - Parallel branches for concurrent features
 - Isolated test environments prevent interference
 - Faster iteration with distributed workload
 ### Specialized Agent Roles
 Deploy task-specific agents as needed instead of handling everything in the main conversation:
 - **Conversational Agent** (main) — Interface with user, coordinate other agents
 - **Ticket Management Agent** — Create/update Gitea issues, track task status
 - **Design Agent** — Architectural planning, RFC documents, API design
 - **Code Writing Agent** — Implementation following specs
 - **Testing Agent** — Write tests, verify coverage, run test suites
 - **Documentation Agent** — Update docs, docstrings, CLAUDE.md, README
 - **Review Agent** — Code review, lint checks, security audits
 - **Custom Agents** — Created dynamically for specialized tasks (performance analysis, migration scripts, etc.)
 ### When to Use Agents
 **Prefer spawning specialized agents for:**
 1. Complex multi-file changes requiring exploration
 2. Tasks with clear, isolated scope (e.g., "write tests for module X")
 3. Parallel work streams (feature A + bugfix B simultaneously)
 4. Long-running analysis (codebase search, dependency audit)
 5. Tasks requiring different contexts (multiple git worktrees)
 **Use the main conversation for:**
 1. User interaction and clarification
 2. Quick single-file edits
 3. Coordinating agent work
 4. High-level decision making
 ### Implementation
 ```python
 # Example: Spawn parallel test and documentation agents
 task_tool(
    subagent_type="general-purpose",
    prompt="Write comprehensive tests for src/markets/schedule.py",
    description="Write schedule tests"
 )
 task_tool(
    subagent_type="general-purpose",
    prompt="Update README.md with global market feature documentation",
    description="Update README"
 )
 ```
 Use `run_in_background=True` for independent tasks that don't block subsequent work.
 ## Common Command Failures
 **Critical: Learn from failures. Never repeat the same failed command without modification.**
 ### tea CLI (Gitea Command Line Tool)
 #### ❌ TTY Error - Interactive Confirmation Fails
 ```bash
 ~/bin/tea issues create --repo X --title "Y" --description "Z"
 # Error: huh: could not open a new TTY: open /dev/tty: no such device or address
 ```
 **💡 Reason:** tea tries to open `/dev/tty` for interactive confirmation prompts, which is unavailable in non-interactive environments.
 **✅ Solution:** Use `YES=""` environment variable to bypass confirmation
 ```bash
 YES="" ~/bin/tea issues create --repo jihoson/The-Ouroboros --title "Title" --description "Body"
 YES="" ~/bin/tea issues edit <number> --repo jihoson/The-Ouroboros --description "Updated body"
 YES="" ~/bin/tea pulls create --repo jihoson/The-Ouroboros --head feature-branch --base main --title "Title" --description "Body"
 ```
 **📝 Notes:**
 - Always set default login: `~/bin/tea login default local`
 - Use `--repo jihoson/The-Ouroboros` when outside repo directory
 - tea is preferred over direct Gitea API calls for consistency
 #### ❌ Wrong Parameter Name
 ```bash
 tea issues create --body "text"
 # Error: flag provided but not defined: -body
 ```
 **💡 Reason:** Parameter is `--description`, not `--body`.
 **✅ Solution:** Use correct parameter name
 ```bash
 YES="" ~/bin/tea issues create --description "text"
 ```
 ### Gitea API (Direct HTTP Calls)
 #### ❌ Wrong Hostname
 ```bash
 curl http://gitea.local:3000/api/v1/...
 # Error: Could not resolve host: gitea.local
 ```
 **💡 Reason:** Gitea instance runs on `localhost:3000`, not `gitea.local`.
 **✅ Solution:** Use correct hostname (but prefer tea CLI)
 ```bash
 curl http://localhost:3000/api/v1/repos/jihoson/The-Ouroboros/issues \
  -H "Authorization: token $GITEA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"title":"...", "body":"..."}'
 ```
 **📝 Notes:**
 - Prefer `tea` CLI over direct API calls
 - Only use curl for operations tea doesn't support
 ### Git Commands
 #### ❌ User Not Configured
 ```bash
 git commit -m "message"
 # Error: Author identity unknown
 ```
 **💡 Reason:** Git user.name and user.email not set.
 **✅ Solution:** Configure git user
 ```bash
 git config user.name "agentson"
 git config user.email "agentson@localhost"
 ```
 #### ❌ Permission Denied on Push
 ```bash
 git push origin branch
 # Error: User permission denied for writing
 ```
 **💡 Reason:** Repository access token lacks write permissions or user lacks repo write access.
 **✅ Solution:**
 1. Verify user has write access to repository (admin grants this)
 2. Ensure git credential has correct token with `write:repository` scope
 3. Check remote URL uses correct authentication
 ### Python/Pytest
 #### ❌ Module Import Error
 ```bash
 pytest tests/test_foo.py
 # ModuleNotFoundError: No module named 'src'
 ```
 **💡 Reason:** Package not installed in development mode.
 **✅ Solution:** Install package with dev dependencies
 ```bash
 # Setup
 pip install -e ".[dev]"
 cp .env.example .env
 # Edit .env with your KIS and Gemini API credentials
 # Test
 pytest -v --cov=src
 # Run (paper trading)
 python -m src.main --mode=paper
 ```
-#### ❌ Async Test Hangs
+## Documentation
 ```python
 async def test_something():  # Hangs forever
    result = await async_function()
 ```
 **💡 Reason:** Missing pytest-asyncio or wrong configuration.
-**✅ Solution:** Already configured in pyproject.toml
+- **[Workflow Guide](docs/workflow.md)** — Git workflow policy and agent-based development
-```toml
+- **[Command Reference](docs/commands.md)** — Common failures, build commands, troubleshooting
-[tool.pytest.ini_options]
+- **[Architecture](docs/architecture.md)** — System design, components, data flow
-asyncio_mode = "auto"
+- **[Context Tree](docs/context-tree.md)** — L1-L7 hierarchical memory system
-```
+- **[Testing](docs/testing.md)** — Test structure, coverage requirements, writing tests
-No decorator needed for async tests.
+- **[Agent Policies](docs/agents.md)** — Prime directives, constraints, prohibited actions
-## Build & Test Commands
+## Core Principles
 1. **Safety First** — Risk manager is READ-ONLY and enforces circuit breakers
 2. **Test Everything** — 80% coverage minimum, all changes require tests
 3. **Issue-Driven Development** — All work goes through Gitea issues → feature branches → PRs
 4. **Agent Specialization** — Use dedicated agents for design, coding, testing, docs, review
 ## Project Structure
 ```
 src/
 ├── broker/          # KIS API client (domestic + overseas)
 ├── brain/           # Gemini AI decision engine
 ├── core/            # Risk manager (READ-ONLY)
 ├── evolution/       # Self-improvement optimizer
 ├── markets/         # Market schedules and timezone handling
 ├── db.py            # SQLite trade logging
 ├── main.py          # Trading loop orchestrator
 └── config.py        # Settings (from .env)
 tests/               # 54 tests across 4 files
 docs/                # Extended documentation
 ```
 ## Key Commands
 ```bash
-# Install all dependencies (production + dev)
+pytest -v --cov=src              # Run tests with coverage
-pip install ".[dev]"
+ruff check src/ tests/           # Lint
 mypy src/ --strict               # Type check
-# Run full test suite with coverage
+python -m src.main --mode=paper  # Paper trading
-pytest -v --cov=src --cov-report=term-missing
+python -m src.main --mode=live   # Live trading (⚠️ real money)
-# Run a single test file
+# Gitea workflow (requires tea CLI)
-pytest tests/test_risk.py -v
+YES="" ~/bin/tea issues create --repo jihoson/The-Ouroboros --title "..." --description "..."
-
+YES="" ~/bin/tea pulls create --head feature-branch --base main --title "..." --description "..."
 # Run a single test by name
 pytest tests/test_brain.py -k "test_parse_valid_json" -v
 # Lint
 ruff check src/ tests/
 # Type check (strict mode, non-blocking in CI)
 mypy src/ --strict
 # Run the trading agent
 python -m src.main --mode=paper
 # Docker
 docker compose up -d ouroboros          # Run agent
 docker compose --profile test up test   # Run tests in container
 ```
-## Architecture
+## Markets Supported
-Self-evolving AI trading agent for Korean stock markets (KIS API). The main loop in `src/main.py` orchestrates five components in a 60-second cycle per stock:
+- 🇰🇷 Korea (KRX)
 - 🇺🇸 United States (NASDAQ, NYSE, AMEX)
 - 🇯🇵 Japan (TSE)
 - 🇭🇰 Hong Kong (SEHK)
 - 🇨🇳 China (Shanghai, Shenzhen)
 - 🇻🇳 Vietnam (Hanoi, HCM)
-1. **Broker** (`src/broker/kis_api.py`) — Async KIS API client with automatic OAuth token refresh, leaky-bucket rate limiter (10 RPS), and POST body hash-key signing. Uses a custom SSL context with disabled hostname verification for the VTS (virtual trading) endpoint due to a known certificate mismatch.
+Markets auto-detected based on timezone and enabled in `ENABLED_MARKETS` env variable.
-2. **Brain** (`src/brain/gemini_client.py`) — Sends structured prompts to Google Gemini, parses JSON responses into `TradeDecision` objects. Forces HOLD when confidence < threshold (default 80). Falls back to safe HOLD on any parse/API error.
+## Critical Constraints
-3. **Risk Manager** (`src/core/risk_manager.py`) — **READ-ONLY by policy** (see `docs/agents.md`). Circuit breaker halts all trading via `SystemExit` when daily P&L drops below -3.0%. Fat-finger check rejects orders exceeding 30% of available cash.
+⚠️ **Non-Negotiable Rules** (see [docs/agents.md](docs/agents.md)):
-4. **Context Tree** (`src/context/`) — **NEW: Pillar 2 implementation.** 7-tier hierarchical memory (L1-L7) from real-time quotes to generational wisdom. Auto-aggregates daily → weekly → monthly → quarterly → annual → legacy. See [`docs/context-tree.md`](docs/context-tree.md) for details.
+- `src/core/risk_manager.py` is **READ-ONLY** — changes require human approval
 - Circuit breaker at -3.0% P&L — may only be made **stricter**
 - Fat-finger protection: max 30% of cash per order — always enforced
 - Confidence < 80 → force HOLD — cannot be weakened
 - All code changes → corresponding tests → coverage ≥ 80%
-5. **Evolution** (`src/evolution/optimizer.py`) — Analyzes high-confidence losing trades from SQLite, asks Gemini to generate new `BaseStrategy` subclasses, validates them by running the full pytest suite, and simulates PR creation.
+## Contributing
-**Data flow per cycle:** Fetch orderbook + balance → calculate P&L → query context tree → get Gemini decision → validate with risk manager → execute order → log to SQLite + context layers (`src/db.py`).
+See [docs/workflow.md](docs/workflow.md) for the complete development process.
-## Key Constraints (from `docs/agents.md`)
+**TL;DR:**
-
+1. Create issue in Gitea
- `core/risk_manager.py` is **READ-ONLY**. Changes require human approval.
+2. Create feature branch: `feature/issue-N-description`
- Circuit breaker threshold (-3.0%) may only be made stricter, never relaxed.
+3. Implement with tests
- Fat-finger protection (30% max order size) must always be enforced.
+4. Open PR
- Confidence < 80 **must** force HOLD — this rule cannot be weakened.
+5. Merge after review
 - All code changes require corresponding tests. Coverage must stay >= 80%.
 - Generated strategies must pass the full test suite before activation.
 ## Configuration
 Pydantic Settings loaded from `.env` (see `.env.example`). Required vars: `KIS_APP_KEY`, `KIS_APP_SECRET`, `KIS_ACCOUNT_NO` (format `XXXXXXXX-XX`), `GEMINI_API_KEY`. Tests use in-memory SQLite (`DB_PATH=":memory:"`) and dummy credentials via `tests/conftest.py`.
 ## Test Structure
 72 tests across five files. `asyncio_mode = "auto"` in pyproject.toml — async tests need no special decorator. The `settings` fixture in `conftest.py` provides safe defaults with test credentials and in-memory DB.
 - `test_risk.py` (11) — Circuit breaker boundaries, fat-finger edge cases
 - `test_broker.py` (6) — Token lifecycle, rate limiting, hash keys, network errors
 - `test_brain.py` (18) — JSON parsing, confidence threshold, malformed responses, prompt construction
 - `test_market_schedule.py` (19) — Market open/close logic, timezone handling, DST, lunch breaks
 - `test_context.py` (18) — **NEW:** Context tree CRUD, aggregation logic, retention policies, layer metadata
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,191 @@
 # System Architecture
 ## Overview
 Self-evolving AI trading agent for global stock markets via KIS (Korea Investment & Securities) API. The main loop in `src/main.py` orchestrates four components in a 60-second cycle per stock across multiple markets.
 ## Core Components
 ### 1. Broker (`src/broker/`)
 **KISBroker** (`kis_api.py`) — Async KIS API client for domestic Korean market
 - Automatic OAuth token refresh (valid for 24 hours)
 - Leaky-bucket rate limiter (10 requests per second)
 - POST body hash-key signing for order authentication
 - Custom SSL context with disabled hostname verification for VTS (virtual trading) endpoint due to known certificate mismatch
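The rate limiting described above can be sketched as a small asyncio helper. This is a minimal sketch under assumptions, not the actual `kis_api.py` code: the class name `LeakyBucketLimiter` is hypothetical and the limiter spaces calls evenly rather than allowing bursts.

```python
import asyncio
import time


class LeakyBucketLimiter:
    """Spaces out acquire() calls so that at most `rate` proceed per second."""

    def __init__(self, rate: float = 10.0) -> None:
        self._interval = 1.0 / rate
        self._next_slot = 0.0  # monotonic time of the next free slot

    async def acquire(self) -> None:
        now = time.monotonic()
        # Reserve a slot before awaiting, so concurrent callers never share one.
        slot = max(now, self._next_slot)
        self._next_slot = slot + self._interval
        if slot > now:
            await asyncio.sleep(slot - now)
```

Because the slot is reserved before the first `await`, no lock is needed: asyncio only switches tasks at await points.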
 **OverseasBroker** (`overseas.py`) — KIS overseas stock API wrapper
 - Reuses KISBroker infrastructure (session, token, rate limiter) via composition
 - Supports 9 global markets: US (NASDAQ/NYSE/AMEX), Japan, Hong Kong, China (Shanghai/Shenzhen), Vietnam (Hanoi/HCM)
 - Different API endpoints for overseas price/balance/order operations
 **Market Schedule** (`src/markets/schedule.py`) — Timezone-aware market management
 - `MarketInfo` dataclass with timezone, trading hours, lunch breaks
 - Automatic DST handling via `zoneinfo.ZoneInfo`
 - `is_market_open()` checks weekends, trading hours, lunch breaks
 - `get_open_markets()` returns currently active markets
 - `get_next_market_open()` finds next market to open and when
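A minimal sketch of the open-market check described above, assuming a simplified `MarketInfo` layout. The field names are hypothetical and holidays are not modeled; the KRX hours (9:00 to 15:30 KST, no lunch break) are the ones quoted in the testing guide.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, time
from zoneinfo import ZoneInfo


@dataclass(frozen=True)
class MarketInfo:
    code: str
    timezone: ZoneInfo
    open_time: time
    close_time: time
    lunch_break: tuple[time, time] | None = None


def is_market_open(market: MarketInfo, now: datetime) -> bool:
    """True on a weekday, inside trading hours, and outside any lunch break."""
    # `now` should be timezone-aware; ZoneInfo applies DST rules on conversion.
    local = now.astimezone(market.timezone)
    if local.weekday() >= 5:  # Saturday=5, Sunday=6
        return False
    current = local.time()
    if not (market.open_time <= current < market.close_time):
        return False
    if market.lunch_break is not None:
        lunch_start, lunch_end = market.lunch_break
        if lunch_start <= current < lunch_end:
            return False
    return True


KR = MarketInfo("KR", ZoneInfo("Asia/Seoul"), time(9, 0), time(15, 30))
```

Converting to the market's own timezone first means callers can pass UTC timestamps without doing offset arithmetic themselves.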
 ### 2. Brain (`src/brain/gemini_client.py`)
 **GeminiClient** — AI decision engine powered by Google Gemini
 - Constructs structured prompts from market data
 - Parses JSON responses into `TradeDecision` objects (`action`, `confidence`, `rationale`)
 - Forces HOLD when confidence < threshold (default 80)
 - Falls back to safe HOLD on any parse/API error
 - Handles markdown-wrapped JSON, malformed responses, invalid actions
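The parsing rules above can be sketched as one function. This is an assumption-laden illustration rather than the real `GeminiClient`: `parse_response` is shown as a free function, the fallback rationale strings are invented, and the confidence of 0 on failure follows the error-handling section of this document.

```python
import json
import re
from dataclasses import dataclass

VALID_ACTIONS = {"BUY", "SELL", "HOLD"}
FENCE = "`" * 3  # the Markdown code-fence marker
WRAPPED = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)


@dataclass
class TradeDecision:
    action: str
    confidence: int
    rationale: str


def parse_response(raw: str, threshold: int = 80) -> TradeDecision:
    """Turn a model reply into a TradeDecision, degrading to HOLD on any problem."""
    match = WRAPPED.search(raw)  # unwrap a Markdown-fenced payload if present
    payload = match.group(1) if match else raw.strip()
    try:
        data = json.loads(payload)
        action = str(data["action"]).upper()
        confidence = int(data["confidence"])
        rationale = str(data.get("rationale", ""))
    except (ValueError, KeyError, TypeError, AttributeError):
        return TradeDecision("HOLD", 0, "unparseable response")
    if action not in VALID_ACTIONS:
        return TradeDecision("HOLD", 0, f"invalid action: {action}")
    if confidence < threshold:
        action = "HOLD"  # keep the reported confidence, override the action
    return TradeDecision(action, confidence, rationale)
```

Every failure path returns a HOLD, so a malformed reply can never turn into an order.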
 ### 3. Risk Manager (`src/core/risk_manager.py`)
 **RiskManager** — Safety circuit breaker and order validation
 ⚠️ **READ-ONLY by policy** (see [`docs/agents.md`](./agents.md))
 - **Circuit Breaker**: Halts all trading via `SystemExit` when daily P&L drops below -3.0%
  - Threshold may only be made stricter, never relaxed
  - Calculated as `(total_eval - purchase_total) / purchase_total * 100`
 - **Fat-Finger Protection**: Rejects orders exceeding 30% of available cash
  - Must always be enforced, cannot be disabled
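A minimal sketch of the two checks, assuming hypothetical exception types and a free-standing `validate_order` function. The breaker is assumed to trip at exactly -3.0% as well as below it, which matches the test example in the testing guide; the constants mirror `MAX_LOSS_PCT` and `MAX_ORDER_PCT` from the configuration.

```python
MAX_LOSS_PCT = 3.0    # circuit breaker threshold, in percent
MAX_ORDER_PCT = 30.0  # fat-finger limit, as a percentage of available cash


class CircuitBreakerTripped(SystemExit):
    """Hypothetical type: as a SystemExit subclass it ends the process if uncaught."""


class FatFingerRejected(Exception):
    """Hypothetical type for orders that exceed the cash limit."""


def pnl_pct(total_eval: float, purchase_total: float) -> float:
    """Daily P&L in percent, using the formula given above."""
    if purchase_total == 0:
        return 0.0  # assumption: no positions means no loss
    return (total_eval - purchase_total) / purchase_total * 100


def validate_order(current_pnl_pct: float, order_amount: float, total_cash: float) -> None:
    """Raise if the order must not be sent; return None when it may proceed."""
    if current_pnl_pct <= -MAX_LOSS_PCT:
        raise CircuitBreakerTripped(f"P&L {current_pnl_pct:.2f}% breached -{MAX_LOSS_PCT}%")
    if order_amount > total_cash * MAX_ORDER_PCT / 100:
        raise FatFingerRejected(f"order {order_amount} exceeds {MAX_ORDER_PCT}% of cash")
```

The circuit breaker is checked first, so a tripped breaker halts trading even when the order size itself would have been acceptable.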
 ### 4. Evolution (`src/evolution/optimizer.py`)
 **StrategyOptimizer** — Self-improvement loop
 - Analyzes high-confidence losing trades from SQLite
 - Asks Gemini to generate new `BaseStrategy` subclasses
 - Validates generated strategies by running full pytest suite
 - Simulates PR creation for human review
 - Only activates strategies that pass all tests
 ## Data Flow
 ```
 ┌─────────────────────────────────────────────────────────────┐
 │ Main Loop (60s cycle per stock, per market)                │
 └─────────────────────────────────────────────────────────────┘
                           │
                           ▼
        ┌──────────────────────────────────┐
        │ Market Schedule Check             │
        │ - Get open markets                │
        │ - Filter by enabled markets       │
        │ - Wait if all closed              │
        └──────────────────┬────────────────┘
                           │
                           ▼
        ┌──────────────────────────────────┐
        │ Broker: Fetch Market Data        │
        │ - Domestic: orderbook + balance  │
        │ - Overseas: price + balance      │
        └──────────────────┬────────────────┘
                           │
                           ▼
        ┌──────────────────────────────────┐
        │ Calculate P&L                     │
        │ pnl_pct = (eval - cost) / cost   │
        └──────────────────┬────────────────┘
                           │
                           ▼
        ┌──────────────────────────────────┐
        │ Brain: Get Decision               │
        │ - Build prompt with market data   │
        │ - Call Gemini API                 │
        │ - Parse JSON response             │
        │ - Return TradeDecision            │
        └──────────────────┬────────────────┘
                           │
                           ▼
        ┌──────────────────────────────────┐
        │ Risk Manager: Validate Order      │
        │ - Check circuit breaker           │
        │ - Check fat-finger limit          │
        │ - Raise if validation fails       │
        └──────────────────┬────────────────┘
                           │
                           ▼
        ┌──────────────────────────────────┐
        │ Broker: Execute Order             │
        │ - Domestic: send_order()          │
        │ - Overseas: send_overseas_order() │
        └──────────────────┬────────────────┘
                           │
                           ▼
        ┌──────────────────────────────────┐
        │ Database: Log Trade               │
        │ - SQLite (data/trades.db)         │
        │ - Track: action, confidence,      │
        │   rationale, market, exchange     │
        └───────────────────────────────────┘
 ```
 ## Database Schema
 **SQLite** (`src/db.py`)
 ```sql
 CREATE TABLE trades (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp TEXT NOT NULL,
    stock_code TEXT NOT NULL,
    action TEXT NOT NULL,          -- BUY | SELL | HOLD
    confidence INTEGER NOT NULL,   -- 0-100
    rationale TEXT,
    quantity INTEGER,
    price REAL,
    pnl REAL DEFAULT 0.0,
    market TEXT DEFAULT 'KR',       -- KR | US_NASDAQ | JP | etc.
    exchange_code TEXT DEFAULT 'KRX' -- KRX | NASD | NYSE | etc.
 );
 ```
 Auto-migration: Adds `market` and `exchange_code` columns if missing for backward compatibility.
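The auto-migration step could look roughly like the sketch below. The function name `migrate_trades` is hypothetical and the real `src/db.py` may differ; the column definitions are taken from the schema above.

```python
import sqlite3

# Columns added for multi-market support, with the defaults from the schema.
NEW_COLUMNS = {
    "market": "TEXT DEFAULT 'KR'",
    "exchange_code": "TEXT DEFAULT 'KRX'",
}


def migrate_trades(conn: sqlite3.Connection) -> list[str]:
    """Add any missing columns to `trades` and return the names that were added."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    existing = {row[1] for row in conn.execute("PRAGMA table_info(trades)")}
    added = []
    for name, ddl in NEW_COLUMNS.items():
        if name not in existing:
            conn.execute(f"ALTER TABLE trades ADD COLUMN {name} {ddl}")
            added.append(name)
    conn.commit()
    return added
```

Checking `PRAGMA table_info` first keeps the migration idempotent, so it is safe to run on every startup. Rows written before the migration read back with the default values.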
 ## Configuration
 **Pydantic Settings** (`src/config.py`)
 Loaded from `.env` file:
 ```bash
 # Required
 KIS_APP_KEY=your_app_key
 KIS_APP_SECRET=your_app_secret
 KIS_ACCOUNT_NO=XXXXXXXX-XX
 GEMINI_API_KEY=your_gemini_key
 # Optional
 MODE=paper                    # paper | live
 DB_PATH=data/trades.db
 CONFIDENCE_THRESHOLD=80
 MAX_LOSS_PCT=3.0
 MAX_ORDER_PCT=30.0
 ENABLED_MARKETS=KR,US_NASDAQ  # Comma-separated market codes
 ```
 Tests use in-memory SQLite (`DB_PATH=":memory:"`) and dummy credentials via `tests/conftest.py`.
 ## Error Handling
 ### Connection Errors (Broker API)
 - Retry with exponential backoff (2^attempt seconds)
 - Max 3 retries per stock
 - After exhaustion, skip stock and continue with next
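The retry policy above can be sketched as follows. Two points are assumptions, since the text leaves them open: "max 3 retries" is read as three attempts in total, and `attempt` starts at 0, giving waits of 1 and 2 seconds. The helper name `fetch_with_retry` and the injectable `sleep` parameter are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def fetch_with_retry(
    fetch: Callable[[], Awaitable[T]],
    max_attempts: int = 3,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Optional[T]:
    """Call `fetch`, backing off 2**attempt seconds after each ConnectionError.

    Returns None once all attempts fail, so the caller can skip this stock.
    """
    for attempt in range(max_attempts):
        try:
            return await fetch()
        except ConnectionError:
            if attempt < max_attempts - 1:  # no point waiting after the last try
                await sleep(2 ** attempt)
    return None
```

Passing `sleep` as a parameter lets tests substitute a fake that records the delays instead of actually waiting.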
 ### API Quota Errors (Gemini)
 - Return safe HOLD decision with confidence=0
 - Log error but don't crash
 - Agent continues trading on next cycle
 ### Circuit Breaker Tripped
 - Immediately halt via `SystemExit`
 - Log critical message
 - Requires manual intervention to restart
 ### Market Closed
 - Wait until next market opens
 - Use `get_next_market_open()` to calculate wait time
 - Sleep until market open time
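The wait-time calculation might be sketched like this for a single market. It is a simplification under stated assumptions: both function names are hypothetical, exchange holidays are ignored, and the real `get_next_market_open()` also chooses among several markets.

```python
from datetime import datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo


def next_weekday_open(now: datetime, tz: ZoneInfo, open_time: time) -> datetime:
    """Next weekday occurrence of `open_time` in `tz` that is strictly after `now`."""
    local = now.astimezone(tz)
    candidate = local.replace(
        hour=open_time.hour, minute=open_time.minute, second=0, microsecond=0
    )
    if candidate <= local:
        candidate += timedelta(days=1)
    while candidate.weekday() >= 5:  # skip Saturday and Sunday
        candidate += timedelta(days=1)
    return candidate


def seconds_until(target: datetime, now: datetime) -> float:
    """Non-negative wait in seconds, compared in UTC so DST shifts are counted."""
    delta = target.astimezone(timezone.utc) - now.astimezone(timezone.utc)
    return max(0.0, delta.total_seconds())
```

The main loop could then await `asyncio.sleep(seconds_until(next_open, now))` before resuming. Subtracting in UTC matters because two datetimes that share a `ZoneInfo` are otherwise compared by wall-clock time.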
--- a/docs/commands.md
+++ b/docs/commands.md
@@ -0,0 +1,156 @@
 # Command Reference
 ## Common Command Failures
 **Critical: Learn from failures. Never repeat the same failed command without modification.**
 ### tea CLI (Gitea Command Line Tool)
 #### ❌ TTY Error - Interactive Confirmation Fails
 ```bash
 ~/bin/tea issues create --repo X --title "Y" --description "Z"
 # Error: huh: could not open a new TTY: open /dev/tty: no such device or address
 ```
 **💡 Reason:** tea tries to open `/dev/tty` for its interactive confirmation prompt, and `/dev/tty` is unavailable in non-interactive environments.
 **✅ Solution:** Set the `YES=""` environment variable to bypass the confirmation prompt
 ```bash
 YES="" ~/bin/tea issues create --repo jihoson/The-Ouroboros --title "Title" --description "Body"
 YES="" ~/bin/tea issues edit <number> --repo jihoson/The-Ouroboros --description "Updated body"
 YES="" ~/bin/tea pulls create --repo jihoson/The-Ouroboros --head feature-branch --base main --title "Title" --description "Body"
 ```
 **📝 Notes:**
 - Always set default login: `~/bin/tea login default local`
 - Use `--repo jihoson/The-Ouroboros` when outside repo directory
 - tea is preferred over direct Gitea API calls for consistency
 #### ❌ Wrong Parameter Name
 ```bash
 tea issues create --body "text"
 # Error: flag provided but not defined: -body
 ```
 **💡 Reason:** The flag is `--description`, not `--body`.
 **✅ Solution:** Use the correct flag name
 ```bash
 YES="" ~/bin/tea issues create --description "text"
 ```
 ### Gitea API (Direct HTTP Calls)
 #### ❌ Wrong Hostname
 ```bash
 curl http://gitea.local:3000/api/v1/...
 # Error: Could not resolve host: gitea.local
 ```
 **💡 Reason:** Gitea instance runs on `localhost:3000`, not `gitea.local`.
 **✅ Solution:** Use correct hostname (but prefer tea CLI)
 ```bash
 curl http://localhost:3000/api/v1/repos/jihoson/The-Ouroboros/issues \
  -H "Authorization: token $GITEA_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"title":"...", "body":"..."}'
 ```
 **📝 Notes:**
 - Prefer `tea` CLI over direct API calls
 - Only use curl for operations tea doesn't support
 ### Git Commands
 #### ❌ User Not Configured
 ```bash
 git commit -m "message"
 # Error: Author identity unknown
 ```
 **💡 Reason:** Git `user.name` and `user.email` are not set.
 **✅ Solution:** Configure the git user for this repository
 ```bash
 git config user.name "agentson"
 git config user.email "agentson@localhost"
 ```
 #### ❌ Permission Denied on Push
 ```bash
 git push origin branch
 # Error: User permission denied for writing
 ```
 **💡 Reason:** Repository access token lacks write permissions or user lacks repo write access.
 **✅ Solution:**
 1. Verify user has write access to repository (admin grants this)
 2. Ensure git credential has correct token with `write:repository` scope
 3. Check remote URL uses correct authentication
 ### Python/Pytest
 #### ❌ Module Import Error
 ```bash
 pytest tests/test_foo.py
 # ModuleNotFoundError: No module named 'src'
 ```
 **💡 Reason:** Package not installed in development mode.
 **✅ Solution:** Install package with dev dependencies
 ```bash
 pip install -e ".[dev]"
 ```
 #### ❌ Async Test Hangs
 ```python
 async def test_something():  # Hangs forever
    result = await async_function()
 ```
 **💡 Reason:** Missing pytest-asyncio or wrong configuration.
 **✅ Solution:** Already configured in pyproject.toml
 ```toml
 [tool.pytest.ini_options]
 asyncio_mode = "auto"
 ```
 No decorator needed for async tests.
 ## Build & Test Commands
 ```bash
 # Install all dependencies (production + dev)
 pip install -e ".[dev]"
 # Run full test suite with coverage
 pytest -v --cov=src --cov-report=term-missing
 # Run a single test file
 pytest tests/test_risk.py -v
 # Run a single test by name
 pytest tests/test_brain.py -k "test_parse_valid_json" -v
 # Lint
 ruff check src/ tests/
 # Type check (strict mode, non-blocking in CI)
 mypy src/ --strict
 # Run the trading agent
 python -m src.main --mode=paper
 # Docker
 docker compose up -d ouroboros          # Run agent
 docker compose --profile test up test   # Run tests in container
 ```
 ## Environment Setup
 ```bash
 # Create .env file from example
 cp .env.example .env
 # Edit .env with your credentials
 # Required: KIS_APP_KEY, KIS_APP_SECRET, KIS_ACCOUNT_NO, GEMINI_API_KEY
 # Verify configuration
 python -c "from src.config import Settings; print(Settings())"
 ```
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -0,0 +1,213 @@
 # Testing Guidelines
 ## Test Structure
 **54 tests** across four files. `asyncio_mode = "auto"` in pyproject.toml — async tests need no special decorator.
 The `settings` fixture in `conftest.py` provides safe defaults with test credentials and in-memory DB.
 ### Test Files
 #### `tests/test_risk.py` (11 tests)
 - Circuit breaker boundaries
 - Fat-finger edge cases
 - P&L calculation edge cases
 - Order validation logic
 **Example:**
 ```python
 def test_circuit_breaker_exact_threshold(risk_manager):
    """Circuit breaker should trip at exactly -3.0%."""
    with pytest.raises(CircuitBreakerTripped):
        risk_manager.validate_order(
            current_pnl_pct=-3.0,
            order_amount=1000,
            total_cash=10000
        )
 ```
 #### `tests/test_broker.py` (6 tests)
 - OAuth token lifecycle
 - Rate limiting enforcement
 - Hash key generation
 - Network error handling
 - SSL context configuration
 **Example:**
 ```python
 async def test_rate_limiter(broker):
    """Rate limiter should delay requests to stay under 10 RPS."""
    start = time.monotonic()
    for _ in range(15):  # 15 requests
        await broker._rate_limiter.acquire()
    elapsed = time.monotonic() - start
    assert elapsed >= 1.0  # Should take at least 1 second
 ```
 #### `tests/test_brain.py` (18 tests)
 - Valid JSON parsing
 - Markdown-wrapped JSON handling
 - Malformed JSON fallback
 - Missing fields handling
 - Invalid action validation
 - Confidence threshold enforcement
 - Empty response handling
 - Prompt construction for different markets
 **Example:**
 ```python
 async def test_confidence_below_threshold_forces_hold(brain):
    """Decisions below confidence threshold should force HOLD."""
    decision = brain.parse_response('{"action":"BUY","confidence":70,"rationale":"test"}')
    assert decision.action == "HOLD"
    assert decision.confidence == 70
 ```
 #### `tests/test_market_schedule.py` (19 tests)
 - Market open/close logic
 - Timezone handling (UTC, Asia/Seoul, America/New_York, etc.)
 - DST (Daylight Saving Time) transitions
 - Weekend handling
 - Lunch break logic
 - Multiple market filtering
 - Next market open calculation
 **Example:**
 ```python
 def test_is_market_open_during_trading_hours():
    """Market should be open during regular trading hours."""
    # KRX: 9:00-15:30 KST, no lunch break
    market = MARKETS["KR"]
    trading_time = datetime(2026, 2, 3, 10, 0, tzinfo=ZoneInfo("Asia/Seoul"))  # Tuesday 10:00
    assert is_market_open(market, trading_time) is True
 ```
 ## Coverage Requirements
 **Minimum coverage: 80%**
 Check coverage:
 ```bash
 pytest -v --cov=src --cov-report=term-missing
 ```
 Expected output:
 ```
 Name                          Stmts   Miss  Cover   Missing
 -----------------------------------------------------------
 src/brain/gemini_client.py       85      5    94%   165-169
 src/broker/kis_api.py           120     12    90%   ...
 src/core/risk_manager.py         35      2    94%   ...
 src/db.py                        25      1    96%   ...
 src/main.py                     150     80    47%   (excluded from CI)
 src/markets/schedule.py          95      3    97%   ...
 -----------------------------------------------------------
 TOTAL                           510    103    80%
 ```
 **Note:** `main.py` has lower coverage because it contains the main loop, which is exercised through integration and manual testing rather than unit tests.
 ## Test Configuration
 ### `pyproject.toml`
 ```toml
 [tool.pytest.ini_options]
 asyncio_mode = "auto"
 testpaths = ["tests"]
 python_files = ["test_*.py"]
 ```
 ### `tests/conftest.py`
 ```python
 @pytest.fixture
 def settings() -> Settings:
    """Provide test settings with safe defaults."""
    return Settings(
        KIS_APP_KEY="test_key",
        KIS_APP_SECRET="test_secret",
        KIS_ACCOUNT_NO="12345678-01",
        GEMINI_API_KEY="test_gemini_key",
        MODE="paper",
        DB_PATH=":memory:",  # In-memory SQLite
        CONFIDENCE_THRESHOLD=80,
        ENABLED_MARKETS="KR",
    )
 ```
 ## Writing New Tests
 ### Naming Convention
 - Test files: `test_<module>.py`
 - Test functions: `test_<feature>_<scenario>()`
 - Use descriptive names that explain what is being tested
 ### Good Test Example
 ```python
 async def test_send_order_with_market_price(broker, settings):
    """Market orders should use price=0 and ORD_DVSN='01'."""
    # Arrange
    stock_code = "005930"
    order_type = "BUY"
    quantity = 10
    # Act
    with patch.object(broker._session, 'post') as mock_post:
        mock_post.return_value.__aenter__.return_value.status = 200
        mock_post.return_value.__aenter__.return_value.json = AsyncMock(
            return_value={"rt_cd": "0", "msg1": "OK"}
        )
        await broker.send_order(stock_code, order_type, quantity, price=0)
    # Assert
    call_args = mock_post.call_args
    body = call_args.kwargs['json']
    assert body['ORD_DVSN'] == '01'  # Market order
    assert body['ORD_UNPR'] == '0'   # Price 0
 ```
 ### Test Checklist
 - [ ] Test passes in isolation (`pytest tests/test_foo.py::test_bar -v`)
 - [ ] Test has clear docstring explaining what it tests
 - [ ] Arrange-Act-Assert structure
 - [ ] Uses appropriate fixtures from conftest.py
 - [ ] Mocks external dependencies (API calls, network)
 - [ ] Tests edge cases and error conditions
 - [ ] Doesn't rely on test execution order
 ## Running Tests
 ```bash
 # All tests
 pytest -v
 # Specific file
 pytest tests/test_risk.py -v
 # Specific test
 pytest tests/test_brain.py::test_parse_valid_json -v
 # With coverage
 pytest -v --cov=src --cov-report=term-missing
 # Stop on first failure
 pytest -x
 # Verbose output with print statements
 pytest -v -s
 ```
 ## CI/CD Integration
 Tests run automatically on:
 - Every commit to feature branches
 - Every PR to main
 - Scheduled daily runs
 **Blocking conditions:**
 - Test failures → PR blocked
 - Coverage < 80% → PR blocked (warning only for main.py)
 **Non-blocking:**
 - `mypy --strict` errors (type hints encouraged but not enforced)
 - `ruff check` warnings (must be acknowledged)
--- a/docs/workflow.md
+++ b/docs/workflow.md
@@ -0,0 +1,75 @@
 # Development Workflow
 ## Git Workflow Policy
 **CRITICAL: All code changes MUST follow this workflow. Direct pushes to `main` are ABSOLUTELY PROHIBITED.**
 1. **Create Gitea Issue First** — All features, bug fixes, and policy changes require a Gitea issue before any code is written
 2. **Create Feature Branch** — Branch from `main` using format `feature/issue-{N}-{short-description}`
 3. **Implement Changes** — Write code, tests, and documentation on the feature branch
 4. **Create Pull Request** — Submit PR to `main` branch referencing the issue number
 5. **Review & Merge** — After approval, merge via PR (squash or merge commit)
 **Never commit directly to `main`.** This policy applies to all changes, no exceptions.
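The branch naming rule in step 2 can be checked mechanically. The following is a minimal sketch, not part of this repository: the helper names `branch_name_for_issue` and `is_valid_branch` are hypothetical, and the rule that the short description is lowercase words joined by hyphens is an assumption of this example.

```python
import re

# Matches the documented format: feature/issue-{N}-{short-description}.
# Restricting the description to lowercase hyphenated words is an assumption.
BRANCH_PATTERN = re.compile(r"^feature/issue-(\d+)-([a-z0-9]+(?:-[a-z0-9]+)*)$")


def branch_name_for_issue(issue_number: int, title: str) -> str:
    """Build a branch name from an issue number and a free-form title (hypothetical helper)."""
    if issue_number <= 0:
        raise ValueError("issue_number must be positive")
    # Collapse every run of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain at least one letter or digit")
    return f"feature/issue-{issue_number}-{slug}"


def is_valid_branch(name: str) -> bool:
    """Return True when the name follows the feature branch format."""
    return BRANCH_PATTERN.match(name) is not None
```

A check like `is_valid_branch` could run in a pre-push hook or CI step to reject branches that skip the issue-first rule.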
 ## Agent Workflow
 **Delegate work to specialized agents so that independent tasks can run concurrently.**
 ### Parallel Execution Strategy
 Use **git worktree** or **subagents** (via the Task tool) to handle multiple requirements simultaneously:
 - Each task runs in an independent context
 - Parallel branches for concurrent features
 - Isolated test environments prevent interference
 - Faster iteration with distributed workload
 ### Specialized Agent Roles
 Deploy task-specific agents as needed instead of handling everything in the main conversation:
 - **Conversational Agent** (main) — Interface with user, coordinate other agents
 - **Ticket Management Agent** — Create/update Gitea issues, track task status
 - **Design Agent** — Architectural planning, RFC documents, API design
 - **Code Writing Agent** — Implementation following specs
 - **Testing Agent** — Write tests, verify coverage, run test suites
 - **Documentation Agent** — Update docs, docstrings, CLAUDE.md, README
 - **Review Agent** — Code review, lint checks, security audits
 - **Custom Agents** — Created dynamically for specialized tasks (performance analysis, migration scripts, etc.)
 ### When to Use Agents
 **Prefer spawning specialized agents for:**
 1. Complex multi-file changes requiring exploration
 2. Tasks with clear, isolated scope (e.g., "write tests for module X")
 3. Parallel work streams (feature A + bugfix B simultaneously)
 4. Long-running analysis (codebase search, dependency audit)
 5. Tasks requiring different contexts (multiple git worktrees)
 **Use the main conversation for:**
 1. User interaction and clarification
 2. Quick single-file edits
 3. Coordinating agent work
 4. High-level decision making
 ### Implementation
 ```python
 # Example: Spawn parallel test and documentation agents
 task_tool(
    subagent_type="general-purpose",
    prompt="Write comprehensive tests for src/markets/schedule.py",
    description="Write schedule tests"
 )
 task_tool(
    subagent_type="general-purpose",
    prompt="Update README.md with global market feature documentation",
    description="Update README"
 )
 ```
 Use `run_in_background=True` for independent tasks that don't block subsequent work.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -8,6 +8,7 @@ dependencies = [
    "pydantic>=2.5,<3",
    "pydantic-settings>=2.1,<3",
    "google-genai>=1.0,<2",
    "scipy>=1.11,<2",
 ]
 [project.optional-dependencies]
--- a/src/db.py
+++ b/src/db.py
@@ -55,6 +55,28 @@ def init_db(db_path: str) -> sqlite3.Connection:
        """
    )
    # Decision logging table for comprehensive audit trail
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS decision_logs (
            decision_id TEXT PRIMARY KEY,
            timestamp TEXT NOT NULL,
            stock_code TEXT NOT NULL,
            market TEXT NOT NULL,
            exchange_code TEXT NOT NULL,
            action TEXT NOT NULL,
            confidence INTEGER NOT NULL,
            rationale TEXT NOT NULL,
            context_snapshot TEXT NOT NULL,
            input_data TEXT NOT NULL,
            outcome_pnl REAL,
            outcome_accuracy INTEGER,
            reviewed INTEGER DEFAULT 0,
            review_notes TEXT
        )
        """
    )
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS context_metadata (
@@ -71,6 +93,16 @@ def init_db(db_path: str) -> sqlite3.Connection:
    conn.execute("CREATE INDEX IF NOT EXISTS idx_contexts_timeframe ON contexts(timeframe)")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_contexts_updated ON contexts(updated_at)")
    # Create indices for efficient decision log queries
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_decision_logs_timestamp ON decision_logs(timestamp)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_decision_logs_reviewed ON decision_logs(reviewed)"
    )
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_decision_logs_confidence ON decision_logs(confidence)"
    )
    conn.commit()
    return conn
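The `decision_logs` table above can be exercised against an in-memory database. This is a minimal sketch under stated assumptions: the table is reduced to the columns the query needs, the sample rows are invented, and the query is only a guess at how losing high-confidence decisions might be selected. The real `DecisionLogger.get_losing_decisions()` implementation is not shown in this part of the diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row

# Reduced copy of the decision_logs schema (only the columns used below).
conn.execute(
    """
    CREATE TABLE decision_logs (
        decision_id TEXT PRIMARY KEY,
        timestamp TEXT NOT NULL,
        stock_code TEXT NOT NULL,
        action TEXT NOT NULL,
        confidence INTEGER NOT NULL,
        outcome_pnl REAL,
        reviewed INTEGER DEFAULT 0
    )
    """
)
conn.execute(
    "CREATE INDEX idx_decision_logs_confidence ON decision_logs(confidence)"
)

# Invented sample rows: (id, timestamp, stock, action, confidence, pnl).
rows = [
    ("d1", "2026-02-04T09:30:00+09:00", "005930", "BUY", 90, -250.0),
    ("d2", "2026-02-04T10:00:00+09:00", "005930", "BUY", 85, 120.0),
    ("d3", "2026-02-04T10:30:00+09:00", "000660", "SELL", 60, -400.0),
    ("d4", "2026-02-04T11:00:00+09:00", "000660", "HOLD", 95, None),
]
conn.executemany(
    "INSERT INTO decision_logs "
    "(decision_id, timestamp, stock_code, action, confidence, outcome_pnl) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    rows,
)

# Assumed selection rule: confidence at or above 80 and a loss of 100 or more.
# Rows with a NULL outcome_pnl never satisfy the comparison, so they drop out.
losing = conn.execute(
    """
    SELECT decision_id, confidence, outcome_pnl
    FROM decision_logs
    WHERE confidence >= ? AND outcome_pnl <= ?
    ORDER BY outcome_pnl ASC
    """,
    (80, -100.0),
).fetchall()
losing_ids = [row["decision_id"] for row in losing]
```

Only `d1` qualifies here: `d2` is profitable, `d3` is below the confidence threshold, and `d4` has no recorded outcome yet.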
--- a/src/evolution/__init__.py
+++ b/src/evolution/__init__.py
@@ -0,0 +1,19 @@
 """Evolution engine for self-improving trading strategies."""
 from src.evolution.ab_test import ABTester, ABTestResult, StrategyPerformance
 from src.evolution.optimizer import EvolutionOptimizer
 from src.evolution.performance_tracker import (
    PerformanceDashboard,
    PerformanceTracker,
    StrategyMetrics,
 )
 __all__ = [
    "EvolutionOptimizer",
    "ABTester",
    "ABTestResult",
    "StrategyPerformance",
    "PerformanceTracker",
    "PerformanceDashboard",
    "StrategyMetrics",
 ]
--- a/src/evolution/ab_test.py
+++ b/src/evolution/ab_test.py
@@ -0,0 +1,220 @@
 """A/B Testing framework for strategy comparison.
 Runs multiple strategies in parallel, tracks their performance,
 and uses statistical significance testing to determine winners.
 """
 from __future__ import annotations
 import logging
 from dataclasses import dataclass
 from typing import Any
 import scipy.stats as stats
 logger = logging.getLogger(__name__)
@dataclass
 class StrategyPerformance:
    """Performance metrics for a single strategy."""
    strategy_name: str
    total_trades: int
    wins: int
    losses: int
    total_pnl: float
    avg_pnl: float
    win_rate: float
    sharpe_ratio: float | None = None
@dataclass
 class ABTestResult:
    """Result of an A/B test between two strategies."""
    strategy_a: str
    strategy_b: str
    winner: str | None
    p_value: float
    confidence_level: float
    is_significant: bool
    performance_a: StrategyPerformance
    performance_b: StrategyPerformance
 class ABTester:
    """A/B testing framework for comparing trading strategies."""
    def __init__(self, significance_level: float = 0.05) -> None:
        """Initialize A/B tester.
        Args:
            significance_level: P-value threshold for statistical significance (default 0.05)
        """
        self._significance_level = significance_level
    def calculate_performance(
        self, trades: list[dict[str, Any]], strategy_name: str
    ) -> StrategyPerformance:
        """Calculate performance metrics for a strategy.
        Args:
            trades: List of trade records with pnl values
            strategy_name: Name of the strategy
        Returns:
            StrategyPerformance object with calculated metrics
        """
        if not trades:
            return StrategyPerformance(
                strategy_name=strategy_name,
                total_trades=0,
                wins=0,
                losses=0,
                total_pnl=0.0,
                avg_pnl=0.0,
                win_rate=0.0,
                sharpe_ratio=None,
            )
        total_trades = len(trades)
        wins = sum(1 for t in trades if t.get("pnl", 0) > 0)
        losses = sum(1 for t in trades if t.get("pnl", 0) < 0)
        pnls = [t.get("pnl", 0.0) for t in trades]
        total_pnl = sum(pnls)
        avg_pnl = total_pnl / total_trades if total_trades > 0 else 0.0
        win_rate = (wins / total_trades * 100) if total_trades > 0 else 0.0
        # Calculate Sharpe ratio (risk-adjusted return)
        sharpe_ratio = None
        if len(pnls) > 1:
            mean_return = avg_pnl
            std_return = (
                sum((p - mean_return) ** 2 for p in pnls) / (len(pnls) - 1)
            ) ** 0.5
            if std_return > 0:
                sharpe_ratio = mean_return / std_return
        return StrategyPerformance(
            strategy_name=strategy_name,
            total_trades=total_trades,
            wins=wins,
            losses=losses,
            total_pnl=round(total_pnl, 2),
            avg_pnl=round(avg_pnl, 2),
            win_rate=round(win_rate, 2),
            sharpe_ratio=round(sharpe_ratio, 4) if sharpe_ratio is not None else None,
        )
    def compare_strategies(
        self,
        trades_a: list[dict[str, Any]],
        trades_b: list[dict[str, Any]],
        strategy_a_name: str = "Strategy A",
        strategy_b_name: str = "Strategy B",
    ) -> ABTestResult:
        """Compare two strategies using statistical testing.
        Uses Welch's two-sample t-test (unequal variances) to determine whether the
        performance difference is significant.
        Args:
            trades_a: List of trades from strategy A
            trades_b: List of trades from strategy B
            strategy_a_name: Name of strategy A
            strategy_b_name: Name of strategy B
        Returns:
            ABTestResult with comparison details
        """
        perf_a = self.calculate_performance(trades_a, strategy_a_name)
        perf_b = self.calculate_performance(trades_b, strategy_b_name)
        # Extract PnL arrays for statistical testing
        pnls_a = [t.get("pnl", 0.0) for t in trades_a]
        pnls_b = [t.get("pnl", 0.0) for t in trades_b]
        # Perform two-sample t-test
        if len(pnls_a) > 1 and len(pnls_b) > 1:
            t_stat, p_value = stats.ttest_ind(pnls_a, pnls_b, equal_var=False)
            is_significant = p_value < self._significance_level
            confidence_level = (1 - p_value) * 100
        else:
            # Not enough data for statistical test
            p_value = 1.0
            is_significant = False
            confidence_level = 0.0
        # Determine winner based on average PnL
        winner = None
        if is_significant:
            if perf_a.avg_pnl > perf_b.avg_pnl:
                winner = strategy_a_name
            elif perf_b.avg_pnl > perf_a.avg_pnl:
                winner = strategy_b_name
        return ABTestResult(
            strategy_a=strategy_a_name,
            strategy_b=strategy_b_name,
            winner=winner,
            p_value=round(p_value, 4),
            confidence_level=round(confidence_level, 2),
            is_significant=is_significant,
            performance_a=perf_a,
            performance_b=perf_b,
        )
    def should_deploy(
        self,
        result: ABTestResult,
        min_win_rate: float = 60.0,
        min_trades: int = 20,
    ) -> bool:
        """Determine if a winning strategy should be deployed.
        Args:
            result: A/B test result
            min_win_rate: Minimum win rate percentage for deployment (default 60%)
            min_trades: Minimum number of trades required (default 20)
        Returns:
            True if the winning strategy meets deployment criteria
        """
        if not result.is_significant or result.winner is None:
            return False
        # Get performance of winning strategy
        if result.winner == result.strategy_a:
            winning_perf = result.performance_a
        else:
            winning_perf = result.performance_b
        # Check deployment criteria
        has_enough_trades = winning_perf.total_trades >= min_trades
        has_good_win_rate = winning_perf.win_rate >= min_win_rate
        is_profitable = winning_perf.avg_pnl > 0
        meets_criteria = has_enough_trades and has_good_win_rate and is_profitable
        if meets_criteria:
            logger.info(
                "Strategy '%s' meets deployment criteria: "
                "win_rate=%.2f%%, trades=%d, avg_pnl=%.2f",
                result.winner,
                winning_perf.win_rate,
                winning_perf.total_trades,
                winning_perf.avg_pnl,
            )
        else:
            logger.info(
                "Strategy '%s' does NOT meet deployment criteria: "
                "win_rate=%.2f%% (min %.2f%%), trades=%d (min %d), avg_pnl=%.2f",
                result.winner,
                winning_perf.win_rate,
                min_win_rate,
                winning_perf.total_trades,
                min_trades,
                winning_perf.avg_pnl,
            )
        return meets_criteria
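For reference, the Sharpe calculation in `calculate_performance` reduces to the mean PnL divided by the sample standard deviation (n - 1 denominator), with no risk-free rate and no annualization. Below is a minimal standard-library sketch of that same formula. The function name `sharpe_ratio` and the sample PnL values are illustrative and not part of the module.

```python
from __future__ import annotations

import statistics


def sharpe_ratio(pnls: list[float]) -> float | None:
    """Mean PnL divided by the sample standard deviation of PnL.

    Returns None when fewer than two trades exist or when all PnLs are
    identical, since the ratio is undefined in both cases.
    """
    if len(pnls) < 2:
        return None
    std = statistics.stdev(pnls)  # sample standard deviation (n - 1)
    if std == 0:
        return None
    return statistics.fmean(pnls) / std


# Illustrative per-trade PnLs: mean is 10.0, sample variance is 350 / 3.
example = sharpe_ratio([10.0, -5.0, 15.0, 20.0])
```

Because this is a per-trade ratio, values from strategies with very different trade frequencies are not directly comparable.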
--- a/src/evolution/optimizer.py
+++ b/src/evolution/optimizer.py
@@ -1,10 +1,10 @@
 """Evolution Engine — analyzes trade logs and generates new strategies.
 This module:
-1. Reads trade_logs.db to identify failing patterns
+1. Uses DecisionLogger.get_losing_decisions() to identify failing patterns
-2. Asks Gemini to generate a new strategy class
+2. Analyzes failure patterns by time, market conditions, stock characteristics
-3. Runs pytest on the generated file
+3. Asks Gemini to generate improved strategy recommendations
-4. Creates a simulated PR if tests pass
+4. Generates new strategy classes with enhanced decision-making logic
 """
 from __future__ import annotations
@@ -14,6 +14,7 @@ import logging
 import sqlite3
 import subprocess
 import textwrap
 from collections import Counter
 from datetime import UTC, datetime
 from pathlib import Path
 from typing import Any
@@ -21,6 +22,8 @@ from typing import Any
 from google import genai
 from src.config import Settings
 from src.db import init_db
 from src.logging.decision_logger import DecisionLog, DecisionLogger
 logger = logging.getLogger(__name__)
@@ -53,29 +56,105 @@ class EvolutionOptimizer:
        self._db_path = settings.DB_PATH
        self._client = genai.Client(api_key=settings.GEMINI_API_KEY)
        self._model_name = settings.GEMINI_MODEL
        self._conn = init_db(self._db_path)
        self._decision_logger = DecisionLogger(self._conn)
    # ------------------------------------------------------------------
    # Analysis
    # ------------------------------------------------------------------
    def analyze_failures(self, limit: int = 50) -> list[dict[str, Any]]:
-        """Find trades where high confidence led to losses."""
+        """Find high-confidence decisions that resulted in losses.
-        conn = sqlite3.connect(self._db_path)
+
-        conn.row_factory = sqlite3.Row
+        Uses DecisionLogger.get_losing_decisions() to retrieve failures.
-        try:
+        """
-            rows = conn.execute(
+        losing_decisions = self._decision_logger.get_losing_decisions(
-                """
+            min_confidence=80, min_loss=-100.0
-                SELECT stock_code, action, confidence, pnl, rationale, timestamp
+        )
-                FROM trades
+
-                WHERE confidence >= 80 AND pnl < 0
+        # Limit results
-                ORDER BY pnl ASC
+        if len(losing_decisions) > limit:
-                LIMIT ?
+            losing_decisions = losing_decisions[:limit]
-                """,
+
-                (limit,),
+        # Convert to dict format for analysis
-            ).fetchall()
+        failures = []
-            return [dict(r) for r in rows]
+        for decision in losing_decisions:
-        finally:
+            failures.append({
-            conn.close()
+                "decision_id": decision.decision_id,
                "timestamp": decision.timestamp,
                "stock_code": decision.stock_code,
                "market": decision.market,
                "exchange_code": decision.exchange_code,
                "action": decision.action,
                "confidence": decision.confidence,
                "rationale": decision.rationale,
                "outcome_pnl": decision.outcome_pnl,
                "outcome_accuracy": decision.outcome_accuracy,
                "context_snapshot": decision.context_snapshot,
                "input_data": decision.input_data,
            })
        return failures
    def identify_failure_patterns(
        self, failures: list[dict[str, Any]]
    ) -> dict[str, Any]:
        """Identify patterns in losing decisions.
        Analyzes:
        - Time patterns (hour of day)
        - Market distribution
        - Action distribution (BUY/SELL/HOLD)
        - Average confidence and average loss
        """
        if not failures:
            return {"pattern_count": 0, "patterns": {}}
        patterns = {
            "markets": Counter(),
            "actions": Counter(),
            "hours": Counter(),
            "avg_confidence": 0.0,
            "avg_loss": 0.0,
            "total_failures": len(failures),
        }
        total_confidence = 0
        total_loss = 0.0
        for failure in failures:
            # Market distribution
            patterns["markets"][failure.get("market", "UNKNOWN")] += 1
            # Action distribution
            patterns["actions"][failure.get("action", "UNKNOWN")] += 1
            # Time pattern (extract hour from ISO timestamp)
            timestamp = failure.get("timestamp", "")
            if timestamp:
                try:
                    dt = datetime.fromisoformat(timestamp)
                    patterns["hours"][dt.hour] += 1
                except (ValueError, AttributeError):
                    pass
            # Aggregate metrics
            total_confidence += failure.get("confidence", 0)
            # outcome_pnl may be present but None, so fall back explicitly
            total_loss += failure.get("outcome_pnl") or 0.0
        patterns["avg_confidence"] = (
            round(total_confidence / len(failures), 2) if failures else 0.0
        )
        patterns["avg_loss"] = (
            round(total_loss / len(failures), 2) if failures else 0.0
        )
        # Convert Counters to regular dicts for JSON serialization
        patterns["markets"] = dict(patterns["markets"])
        patterns["actions"] = dict(patterns["actions"])
        patterns["hours"] = dict(patterns["hours"])
        return patterns
    def get_performance_summary(self) -> dict[str, Any]:
        """Return aggregate performance metrics from trade logs."""
@@ -109,14 +188,25 @@ class EvolutionOptimizer:
    async def generate_strategy(self, failures: list[dict[str, Any]]) -> Path | None:
        """Ask Gemini to generate a new strategy based on failure analysis.
        Integrates failure patterns and market conditions to create improved strategies.
        Returns the path to the generated strategy file, or None on failure.
        """
        # Identify failure patterns first
        patterns = self.identify_failure_patterns(failures)
        prompt = (
            "You are a quantitative trading strategy developer.\n"
-            "Analyze these failed trades and generate an improved strategy.\n\n"
+            "Analyze these failed trades and their patterns, then generate an improved strategy.\n\n"
-            f"Failed trades:\n{json.dumps(failures, indent=2, default=str)}\n\n"
+            f"Failure Patterns:\n{json.dumps(patterns, indent=2)}\n\n"
-            "Generate a Python class that inherits from BaseStrategy.\n"
+            f"Sample Failed Trades (first 5):\n"
-            "The class must have an `evaluate(self, market_data: dict) -> dict` method.\n"
+            f"{json.dumps(failures[:5], indent=2, default=str)}\n\n"
            "Based on these patterns, generate an improved trading strategy.\n"
            "The strategy should:\n"
            "1. Avoid the identified failure patterns\n"
            "2. Consider market-specific conditions\n"
            "3. Adjust confidence based on historical performance\n\n"
            "Generate a Python method body that inherits from BaseStrategy.\n"
            "The method signature is: evaluate(self, market_data: dict) -> dict\n"
            "The method must return a dict with keys: action, confidence, rationale.\n"
            "Respond with ONLY the method body (Python code), no class definition.\n"
        )
@@ -147,10 +237,15 @@ class EvolutionOptimizer:
        # Indent the body for the class method
        indented_body = textwrap.indent(body, "            ")
        # Generate rationale from patterns
        rationale = f"Auto-evolved from {len(failures)} failures. "
        rationale += f"Primary failure markets: {list(patterns.get('markets', {}).keys())}. "
        rationale += f"Average loss: {patterns.get('avg_loss', 0.0)}"
        content = STRATEGY_TEMPLATE.format(
            name=version,
            timestamp=datetime.now(UTC).isoformat(),
-            rationale="Auto-evolved from failure analysis",
+            rationale=rationale,
            class_name=class_name,
            body=indented_body.strip(),
        )
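The pattern analysis in `identify_failure_patterns` comes down to counting failures per market and per hour of day, then averaging the losses. The sketch below reproduces that idea on its own, using invented failure records. It is an illustration of the approach, not a copy of the method.

```python
from collections import Counter
from datetime import datetime

# Invented failure records; only the keys used below are included.
failures = [
    {"market": "KR", "timestamp": "2026-02-04T09:15:00+09:00", "outcome_pnl": -150.0},
    {"market": "KR", "timestamp": "2026-02-04T09:45:00+09:00", "outcome_pnl": -300.0},
    {"market": "US_NASDAQ", "timestamp": "not-a-timestamp", "outcome_pnl": -120.0},
]

markets: Counter[str] = Counter()
hours: Counter[int] = Counter()

for failure in failures:
    markets[failure["market"]] += 1
    try:
        # The hour is taken in the timestamp's own UTC offset, without conversion.
        hours[datetime.fromisoformat(failure["timestamp"]).hour] += 1
    except ValueError:
        pass  # Malformed timestamps are skipped rather than aborting the analysis.

avg_loss = round(sum(f["outcome_pnl"] for f in failures) / len(failures), 2)
```

Note that hour buckets from different markets are only comparable if all timestamps share one offset, since no timezone normalization happens here.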
--- a/src/evolution/performance_tracker.py
+++ b/src/evolution/performance_tracker.py
@@ -0,0 +1,303 @@
 """Performance tracking system for strategy monitoring.
 Tracks win rates, monitors improvement over time,
 and provides performance metrics dashboard.
 """
 from __future__ import annotations
 import json
 import logging
 import sqlite3
 from dataclasses import asdict, dataclass
 from datetime import UTC, datetime, timedelta
 from typing import Any
 logger = logging.getLogger(__name__)
@dataclass
 class StrategyMetrics:
    """Performance metrics for a strategy over a time period."""
    strategy_name: str
    period_start: str
    period_end: str
    total_trades: int
    wins: int
    losses: int
    holds: int
    win_rate: float
    avg_pnl: float
    total_pnl: float
    best_trade: float
    worst_trade: float
    avg_confidence: float
@dataclass
 class PerformanceDashboard:
    """Comprehensive performance dashboard."""
    generated_at: str
    overall_metrics: StrategyMetrics
    daily_metrics: list[StrategyMetrics]
    weekly_metrics: list[StrategyMetrics]
    improvement_trend: dict[str, Any]
 class PerformanceTracker:
    """Tracks and monitors strategy performance over time."""
    def __init__(self, db_path: str) -> None:
        """Initialize performance tracker.
        Args:
            db_path: Path to the trade logs database
        """
        self._db_path = db_path
    def get_strategy_metrics(
        self,
        strategy_name: str | None = None,
        start_date: str | None = None,
        end_date: str | None = None,
    ) -> StrategyMetrics:
        """Get performance metrics for a strategy over a time period.
        Args:
            strategy_name: Name of the strategy (None = all strategies)
            start_date: Start date in ISO format (None = beginning of time)
            end_date: End date in ISO format (None = now)
        Returns:
            StrategyMetrics object with performance data
        """
        conn = sqlite3.connect(self._db_path)
        conn.row_factory = sqlite3.Row
        try:
            # Build query with optional filters
            query = """
                SELECT
                    COUNT(*) as total_trades,
                    SUM(CASE WHEN pnl > 0 THEN 1 ELSE 0 END) as wins,
                    SUM(CASE WHEN pnl < 0 THEN 1 ELSE 0 END) as losses,
                    SUM(CASE WHEN action = 'HOLD' THEN 1 ELSE 0 END) as holds,
                    COALESCE(AVG(CASE WHEN pnl IS NOT NULL THEN pnl END), 0) as avg_pnl,
                    COALESCE(SUM(CASE WHEN pnl IS NOT NULL THEN pnl ELSE 0 END), 0) as total_pnl,
                    COALESCE(MAX(pnl), 0) as best_trade,
                    COALESCE(MIN(pnl), 0) as worst_trade,
                    COALESCE(AVG(confidence), 0) as avg_confidence,
                    MIN(timestamp) as period_start,
                    MAX(timestamp) as period_end
                FROM trades
                WHERE 1=1
            """
            params: list[Any] = []
            if start_date:
                query += " AND timestamp >= ?"
                params.append(start_date)
            if end_date:
                query += " AND timestamp <= ?"
                params.append(end_date)
            # Note: Currently trades table doesn't have strategy_name column
            # This is a placeholder for future extension
            row = conn.execute(query, params).fetchone()
            total_trades = row["total_trades"] or 0
            wins = row["wins"] or 0
            win_rate = (wins / total_trades * 100) if total_trades > 0 else 0.0
            return StrategyMetrics(
                strategy_name=strategy_name or "default",
                period_start=row["period_start"] or "",
                period_end=row["period_end"] or "",
                total_trades=total_trades,
                wins=wins,
                losses=row["losses"] or 0,
                holds=row["holds"] or 0,
                win_rate=round(win_rate, 2),
                avg_pnl=round(row["avg_pnl"], 2),
                total_pnl=round(row["total_pnl"], 2),
                best_trade=round(row["best_trade"], 2),
                worst_trade=round(row["worst_trade"], 2),
                avg_confidence=round(row["avg_confidence"], 2),
            )
        finally:
            conn.close()
    def get_daily_metrics(
        self, days: int = 7, strategy_name: str | None = None
    ) -> list[StrategyMetrics]:
        """Get daily performance metrics for the last N days.
        Args:
            days: Number of days to retrieve (default 7)
            strategy_name: Name of the strategy (None = all strategies)
        Returns:
            List of StrategyMetrics, one per day
        """
        metrics = []
        end_date = datetime.now(UTC)
        for i in range(days):
            day_end = end_date - timedelta(days=i)
            day_start = day_end - timedelta(days=1)
            day_metrics = self.get_strategy_metrics(
                strategy_name=strategy_name,
                start_date=day_start.isoformat(),
                end_date=day_end.isoformat(),
            )
            metrics.append(day_metrics)
        return metrics
    def get_weekly_metrics(
        self, weeks: int = 4, strategy_name: str | None = None
    ) -> list[StrategyMetrics]:
        """Get weekly performance metrics for the last N weeks.
        Args:
            weeks: Number of weeks to retrieve (default 4)
            strategy_name: Name of the strategy (None = all strategies)
        Returns:
            List of StrategyMetrics, one per week
        """
        metrics = []
        end_date = datetime.now(UTC)
        for i in range(weeks):
            week_end = end_date - timedelta(weeks=i)
            week_start = week_end - timedelta(weeks=1)
            week_metrics = self.get_strategy_metrics(
                strategy_name=strategy_name,
                start_date=week_start.isoformat(),
                end_date=week_end.isoformat(),
            )
            metrics.append(week_metrics)
        return metrics
    def calculate_improvement_trend(
        self, metrics_history: list[StrategyMetrics]
    ) -> dict[str, Any]:
        """Calculate improvement trend from historical metrics.
        Args:
            metrics_history: List of StrategyMetrics ordered from oldest to newest
        Returns:
            Dictionary with trend analysis
        """
        if len(metrics_history) < 2:
            return {
                "trend": "insufficient_data",
                "win_rate_change": 0.0,
                "pnl_change": 0.0,
                "confidence_change": 0.0,
            }
        oldest = metrics_history[0]
        newest = metrics_history[-1]
        win_rate_change = newest.win_rate - oldest.win_rate
        pnl_change = newest.avg_pnl - oldest.avg_pnl
        confidence_change = newest.avg_confidence - oldest.avg_confidence
        # Determine overall trend
        if win_rate_change > 5.0 and pnl_change > 0:
            trend = "improving"
        elif win_rate_change < -5.0 or pnl_change < 0:
            trend = "declining"
        else:
            trend = "stable"
        return {
            "trend": trend,
            "win_rate_change": round(win_rate_change, 2),
            "pnl_change": round(pnl_change, 2),
            "confidence_change": round(confidence_change, 2),
            "period_count": len(metrics_history),
        }
    def generate_dashboard(
        self, strategy_name: str | None = None
    ) -> PerformanceDashboard:
        """Generate a comprehensive performance dashboard.
        Args:
            strategy_name: Name of the strategy (None = all strategies)
        Returns:
            PerformanceDashboard with all metrics
        """
        # Get overall metrics
        overall_metrics = self.get_strategy_metrics(strategy_name=strategy_name)
        # Get daily metrics (last 7 days)
        daily_metrics = self.get_daily_metrics(days=7, strategy_name=strategy_name)
        # Get weekly metrics (last 4 weeks)
        weekly_metrics = self.get_weekly_metrics(weeks=4, strategy_name=strategy_name)
        # Calculate improvement trend
        improvement_trend = self.calculate_improvement_trend(weekly_metrics[::-1])
        return PerformanceDashboard(
            generated_at=datetime.now(UTC).isoformat(),
            overall_metrics=overall_metrics,
            daily_metrics=daily_metrics,
            weekly_metrics=weekly_metrics,
            improvement_trend=improvement_trend,
        )
    def export_dashboard_json(
        self, dashboard: PerformanceDashboard
    ) -> str:
        """Export dashboard as JSON string.
        Args:
            dashboard: PerformanceDashboard object
        Returns:
            JSON string representation
        """
        data = {
            "generated_at": dashboard.generated_at,
            "overall_metrics": asdict(dashboard.overall_metrics),
            "daily_metrics": [asdict(m) for m in dashboard.daily_metrics],
            "weekly_metrics": [asdict(m) for m in dashboard.weekly_metrics],
            "improvement_trend": dashboard.improvement_trend,
        }
        return json.dumps(data, indent=2)
    def log_dashboard(self, dashboard: PerformanceDashboard) -> None:
        """Log dashboard summary to logger.
        Args:
            dashboard: PerformanceDashboard object
        """
        logger.info("=" * 60)
        logger.info("PERFORMANCE DASHBOARD")
        logger.info("=" * 60)
        logger.info("Generated: %s", dashboard.generated_at)
        logger.info("")
        logger.info("Overall Performance:")
        logger.info("  Total Trades: %d", dashboard.overall_metrics.total_trades)
        logger.info("  Win Rate: %.2f%%", dashboard.overall_metrics.win_rate)
        logger.info("  Average P&L: %.2f", dashboard.overall_metrics.avg_pnl)
        logger.info("  Total P&L: %.2f", dashboard.overall_metrics.total_pnl)
        logger.info("")
        logger.info("Improvement Trend (%s):", dashboard.improvement_trend["trend"])
        logger.info("  Win Rate Change: %+.2f%%", dashboard.improvement_trend["win_rate_change"])
        logger.info("  P&L Change: %+.2f", dashboard.improvement_trend["pnl_change"])
        logger.info("=" * 60)
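The trend rule in `calculate_improvement_trend` is asymmetric: "improving" requires both a win-rate gain above 5 points and a PnL gain, while "declining" is triggered by either a win-rate drop below -5 points or any PnL decrease. The sketch below isolates that rule in a standalone function. The name `classify_trend` is hypothetical and used only for illustration.

```python
def classify_trend(win_rate_change: float, pnl_change: float) -> str:
    """Classify a trend from the change in win rate (points) and average PnL."""
    if win_rate_change > 5.0 and pnl_change > 0:
        return "improving"
    if win_rate_change < -5.0 or pnl_change < 0:
        return "declining"
    return "stable"
```

One consequence is that a strategy whose win rate rises by 10 points while average PnL slips slightly is still reported as "declining".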
--- a/src/logging/__init__.py
+++ b/src/logging/__init__.py
@@ -0,0 +1,5 @@
 """Decision logging and audit trail for trade decisions."""
 from src.logging.decision_logger import DecisionLog, DecisionLogger
 __all__ = ["DecisionLog", "DecisionLogger"]
--- a/src/logging/decision_logger.py
+++ b/src/logging/decision_logger.py
@@ -0,0 +1,235 @@
 """Decision logging system with context snapshots for comprehensive audit trail."""
 from __future__ import annotations
 import json
 import sqlite3
 import uuid
 from dataclasses import dataclass
 from datetime import UTC, datetime
 from typing import Any
@dataclass
 class DecisionLog:
    """A logged trading decision with context and outcome."""
    decision_id: str
    timestamp: str
    stock_code: str
    market: str
    exchange_code: str
    action: str
    confidence: int
    rationale: str
    context_snapshot: dict[str, Any]
    input_data: dict[str, Any]
    outcome_pnl: float | None = None
    outcome_accuracy: int | None = None
    reviewed: bool = False
    review_notes: str | None = None
 class DecisionLogger:
    """Logs trading decisions with full context for review and evolution."""
    def __init__(self, conn: sqlite3.Connection) -> None:
        """Initialize the decision logger with a database connection."""
        self.conn = conn
    def log_decision(
        self,
        stock_code: str,
        market: str,
        exchange_code: str,
        action: str,
        confidence: int,
        rationale: str,
        context_snapshot: dict[str, Any],
        input_data: dict[str, Any],
    ) -> str:
        """Log a trading decision with full context.
        Args:
            stock_code: Stock symbol
            market: Market code (e.g., "KR", "US_NASDAQ")
            exchange_code: Exchange code (e.g., "KRX", "NASDAQ")
            action: Trading action (BUY/SELL/HOLD)
            confidence: Confidence level (0-100)
            rationale: Reasoning for the decision
            context_snapshot: L1-L7 context snapshot at decision time
            input_data: Market data inputs (price, volume, orderbook, etc.)
        Returns:
            decision_id: Unique identifier for this decision
        """
        decision_id = str(uuid.uuid4())
        timestamp = datetime.now(UTC).isoformat()
        self.conn.execute(
            """
            INSERT INTO decision_logs (
                decision_id, timestamp, stock_code, market, exchange_code,
                action, confidence, rationale, context_snapshot, input_data
            )
            VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
            """,
            (
                decision_id,
                timestamp,
                stock_code,
                market,
                exchange_code,
                action,
                confidence,
                rationale,
                json.dumps(context_snapshot),
                json.dumps(input_data),
            ),
        )
        self.conn.commit()
        return decision_id
    def get_unreviewed_decisions(
        self, min_confidence: int = 80, limit: int | None = None
    ) -> list[DecisionLog]:
        """Get unreviewed decisions with high confidence.
        Args:
            min_confidence: Minimum confidence threshold (default 80)
            limit: Maximum number of results (None = unlimited)
        Returns:
            List of unreviewed DecisionLog objects
        """
        query = """
            SELECT
                decision_id, timestamp, stock_code, market, exchange_code,
                action, confidence, rationale, context_snapshot, input_data,
                outcome_pnl, outcome_accuracy, reviewed, review_notes
            FROM decision_logs
            WHERE reviewed = 0 AND confidence >= ?
            ORDER BY timestamp DESC
        """
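        params: list[Any] = [min_confidence]
        if limit is not None:
            # Bind LIMIT as a parameter rather than interpolating it into the SQL.
            query += " LIMIT ?"
            params.append(limit)
        cursor = self.conn.execute(query, params)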
        return [self._row_to_decision_log(row) for row in cursor.fetchall()]
    def mark_reviewed(self, decision_id: str, notes: str) -> None:
        """Mark a decision as reviewed with notes.
        Args:
            decision_id: Decision identifier
            notes: Review notes and insights
        """
        self.conn.execute(
            """
            UPDATE decision_logs
            SET reviewed = 1, review_notes = ?
            WHERE decision_id = ?
            """,
            (notes, decision_id),
        )
        self.conn.commit()
    def update_outcome(
        self, decision_id: str, pnl: float, accuracy: int
    ) -> None:
        """Update the outcome of a decision after trade execution.
        Args:
            decision_id: Decision identifier
            pnl: Actual profit/loss realized
            accuracy: 1 if decision was correct, 0 if wrong
        """
        self.conn.execute(
            """
            UPDATE decision_logs
            SET outcome_pnl = ?, outcome_accuracy = ?
            WHERE decision_id = ?
            """,
            (pnl, accuracy, decision_id),
        )
        self.conn.commit()
    def get_decision_by_id(self, decision_id: str) -> DecisionLog | None:
        """Get a specific decision by ID.
        Args:
            decision_id: Decision identifier
        Returns:
            DecisionLog object or None if not found
        """
        cursor = self.conn.execute(
            """
            SELECT
                decision_id, timestamp, stock_code, market, exchange_code,
                action, confidence, rationale, context_snapshot, input_data,
                outcome_pnl, outcome_accuracy, reviewed, review_notes
            FROM decision_logs
            WHERE decision_id = ?
            """,
            (decision_id,),
        )
        row = cursor.fetchone()
        return self._row_to_decision_log(row) if row else None
    def get_losing_decisions(
        self, min_confidence: int = 80, min_loss: float = -100.0
    ) -> list[DecisionLog]:
        """Get high-confidence decisions that resulted in losses.
        Useful for identifying patterns in failed predictions.
        Args:
            min_confidence: Minimum confidence threshold (default 80)
            min_loss: P&L ceiling; only decisions with outcome_pnl <= min_loss are returned (default -100.0, i.e., a loss of at least 100)
        Returns:
            List of losing DecisionLog objects
        """
        cursor = self.conn.execute(
            """
            SELECT
                decision_id, timestamp, stock_code, market, exchange_code,
                action, confidence, rationale, context_snapshot, input_data,
                outcome_pnl, outcome_accuracy, reviewed, review_notes
            FROM decision_logs
            WHERE confidence >= ?
              AND outcome_pnl IS NOT NULL
              AND outcome_pnl <= ?
            ORDER BY outcome_pnl ASC
            """,
            (min_confidence, min_loss),
        )
        return [self._row_to_decision_log(row) for row in cursor.fetchall()]
    def _row_to_decision_log(self, row: tuple[Any, ...]) -> DecisionLog:
        """Convert a database row to a DecisionLog object.
        Args:
            row: Database row tuple
        Returns:
            DecisionLog object
        """
        return DecisionLog(
            decision_id=row[0],
            timestamp=row[1],
            stock_code=row[2],
            market=row[3],
            exchange_code=row[4],
            action=row[5],
            confidence=row[6],
            rationale=row[7],
            context_snapshot=json.loads(row[8]),
            input_data=json.loads(row[9]),
            outcome_pnl=row[10],
            outcome_accuracy=row[11],
            reviewed=bool(row[12]),
            review_notes=row[13],
        )
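Review note on `get_losing_decisions`: the filter keeps a row only when its confidence meets the threshold, its outcome has been recorded, and its P&L is at or below `min_loss`. The sketch below runs the same `WHERE` clause on a trimmed-down table. The three-column schema and the `losing_ids` helper are assumptions for illustration; the real `decision_logs` schema is created by `init_db` in `src/db.py`, which is not shown in this diff.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, reduced schema: only the columns the filter touches.
conn.execute(
    """
    CREATE TABLE decision_logs (
        decision_id TEXT PRIMARY KEY,
        confidence INTEGER NOT NULL,
        outcome_pnl REAL
    )
    """
)
conn.executemany(
    "INSERT INTO decision_logs VALUES (?, ?, ?)",
    [
        ("win", 85, 3000.0),             # profitable: excluded
        ("big-loss", 90, -2000.0),       # confident and large loss: included
        ("small-loss", 95, -50.0),       # loss smaller than the threshold: excluded
        ("low-conf-loss", 70, -1500.0),  # below the confidence threshold: excluded
        ("pending", 90, None),           # no outcome recorded yet: excluded
    ],
)


def losing_ids(
    conn: sqlite3.Connection,
    min_confidence: int = 80,
    min_loss: float = -100.0,
) -> list[str]:
    """Return decision IDs matching the losing-decision filter, worst loss first."""
    rows = conn.execute(
        """
        SELECT decision_id
        FROM decision_logs
        WHERE confidence >= ?
          AND outcome_pnl IS NOT NULL
          AND outcome_pnl <= ?
        ORDER BY outcome_pnl ASC
        """,
        (min_confidence, min_loss),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(losing_ids(conn))
    print(losing_ids(conn, min_confidence=70))
```

Because `min_loss` is a negative P&L value, lowering it (for example to -1000.0) narrows the result to larger losses.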
--- a/src/main.py
+++ b/src/main.py
@@ -19,6 +19,7 @@ from src.broker.overseas import OverseasBroker
 from src.config import Settings
 from src.core.risk_manager import CircuitBreakerTripped, RiskManager
 from src.db import init_db, log_trade
 from src.logging.decision_logger import DecisionLogger
 from src.logging_config import setup_logging
 from src.markets.schedule import MarketInfo, get_next_market_open, get_open_markets
@@ -42,6 +43,7 @@ async def trading_cycle(
    brain: GeminiClient,
    risk: RiskManager,
    db_conn: Any,
    decision_logger: DecisionLogger,
    market: MarketInfo,
    stock_code: str,
 ) -> None:
@@ -101,6 +103,39 @@ async def trading_cycle(
        decision.confidence,
    )
    # 2.5. Log decision with context snapshot
    context_snapshot = {
        "L1": {
            "current_price": current_price,
            "foreigner_net": foreigner_net,
        },
        "L2": {
            "total_eval": total_eval,
            "total_cash": total_cash,
            "purchase_total": purchase_total,
            "pnl_pct": pnl_pct,
        },
        # L3-L7 will be populated when context tree is implemented
    }
    input_data = {
        "current_price": current_price,
        "foreigner_net": foreigner_net,
        "total_eval": total_eval,
        "total_cash": total_cash,
        "pnl_pct": pnl_pct,
    }
    decision_logger.log_decision(
        stock_code=stock_code,
        market=market.code,
        exchange_code=market.exchange_code,
        action=decision.action,
        confidence=decision.confidence,
        rationale=decision.rationale,
        context_snapshot=context_snapshot,
        input_data=input_data,
    )
    # 3. Execute if actionable
    if decision.action in ("BUY", "SELL"):
        # Determine order size (simplified: 1 lot)
@@ -151,6 +186,7 @@ async def run(settings: Settings) -> None:
    brain = GeminiClient(settings)
    risk = RiskManager(settings)
    db_conn = init_db(settings.DB_PATH)
    decision_logger = DecisionLogger(db_conn)
    shutdown = asyncio.Event()
@@ -218,6 +254,7 @@ async def run(settings: Settings) -> None:
                                brain,
                                risk,
                                db_conn,
                                decision_logger,
                                market,
                                stock_code,
                            )
--- a/tests/test_decision_logger.py
+++ b/tests/test_decision_logger.py
@@ -0,0 +1,292 @@
 """Tests for decision logging and audit trail."""
 from __future__ import annotations
 import sqlite3
 from datetime import UTC, datetime
 import pytest
 from src.db import init_db
 from src.logging.decision_logger import DecisionLog, DecisionLogger
@pytest.fixture
 def db_conn() -> sqlite3.Connection:
    """Provide an in-memory database with initialized schema."""
    conn = init_db(":memory:")
    return conn
@pytest.fixture
 def logger(db_conn: sqlite3.Connection) -> DecisionLogger:
    """Provide a DecisionLogger instance."""
    return DecisionLogger(db_conn)
 def test_log_decision_creates_record(logger: DecisionLogger, db_conn: sqlite3.Connection) -> None:
    """Test that log_decision creates a database record."""
    context_snapshot = {
        "L1": {"quote": {"price": 100.0, "volume": 1000}},
        "L2": {"orderbook": {"bid": [99.0], "ask": [101.0]}},
    }
    input_data = {"price": 100.0, "volume": 1000, "foreigner_net": 500}
    decision_id = logger.log_decision(
        stock_code="005930",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=85,
        rationale="Strong upward momentum",
        context_snapshot=context_snapshot,
        input_data=input_data,
    )
    # Verify decision_id has the canonical UUID string length
    assert decision_id is not None
    assert len(decision_id) == 36  # 32 hex digits plus 4 hyphens
    # Verify record exists in database
    cursor = db_conn.execute(
        "SELECT decision_id, action, confidence FROM decision_logs WHERE decision_id = ?",
        (decision_id,),
    )
    row = cursor.fetchone()
    assert row is not None
    assert row[0] == decision_id
    assert row[1] == "BUY"
    assert row[2] == 85
 def test_log_decision_stores_context_snapshot(logger: DecisionLogger) -> None:
    """Test that context snapshot is stored as JSON."""
    context_snapshot = {
        "L1": {"real_time": "data"},
        "L3": {"daily": "aggregate"},
        "L7": {"legacy": "wisdom"},
    }
    input_data = {"price": 50000.0, "volume": 2000}
    decision_id = logger.log_decision(
        stock_code="035420",
        market="KR",
        exchange_code="KRX",
        action="HOLD",
        confidence=75,
        rationale="Waiting for clearer signal",
        context_snapshot=context_snapshot,
        input_data=input_data,
    )
    # Retrieve and verify context snapshot
    decision = logger.get_decision_by_id(decision_id)
    assert decision is not None
    assert decision.context_snapshot == context_snapshot
    assert decision.input_data == input_data
 def test_get_unreviewed_decisions(logger: DecisionLogger) -> None:
    """Test retrieving unreviewed decisions with confidence filter."""
    # Log multiple decisions with varying confidence
    logger.log_decision(
        stock_code="005930",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=90,
        rationale="High confidence buy",
        context_snapshot={},
        input_data={},
    )
    logger.log_decision(
        stock_code="000660",
        market="KR",
        exchange_code="KRX",
        action="SELL",
        confidence=75,
        rationale="Low confidence sell",
        context_snapshot={},
        input_data={},
    )
    logger.log_decision(
        stock_code="035420",
        market="KR",
        exchange_code="KRX",
        action="HOLD",
        confidence=85,
        rationale="Medium confidence hold",
        context_snapshot={},
        input_data={},
    )
    # Get unreviewed decisions with default threshold (80)
    unreviewed = logger.get_unreviewed_decisions()
    assert len(unreviewed) == 2  # Only confidence >= 80
    assert all(d.confidence >= 80 for d in unreviewed)
    assert all(not d.reviewed for d in unreviewed)
    # Get with lower threshold
    unreviewed_all = logger.get_unreviewed_decisions(min_confidence=70)
    assert len(unreviewed_all) == 3
 def test_mark_reviewed(logger: DecisionLogger) -> None:
    """Test marking a decision as reviewed."""
    decision_id = logger.log_decision(
        stock_code="005930",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=85,
        rationale="Test decision",
        context_snapshot={},
        input_data={},
    )
    # Initially unreviewed
    decision = logger.get_decision_by_id(decision_id)
    assert decision is not None
    assert not decision.reviewed
    assert decision.review_notes is None
    # Mark as reviewed
    review_notes = "Good decision, captured bullish momentum correctly"
    logger.mark_reviewed(decision_id, review_notes)
    # Verify updated
    decision = logger.get_decision_by_id(decision_id)
    assert decision is not None
    assert decision.reviewed
    assert decision.review_notes == review_notes
    # Should not appear in unreviewed list
    unreviewed = logger.get_unreviewed_decisions()
    assert all(d.decision_id != decision_id for d in unreviewed)
 def test_update_outcome(logger: DecisionLogger) -> None:
    """Test updating decision outcome with P&L and accuracy."""
    decision_id = logger.log_decision(
        stock_code="005930",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=90,
        rationale="Expecting price increase",
        context_snapshot={},
        input_data={},
    )
    # Initially no outcome
    decision = logger.get_decision_by_id(decision_id)
    assert decision is not None
    assert decision.outcome_pnl is None
    assert decision.outcome_accuracy is None
    # Update outcome (profitable trade)
    logger.update_outcome(decision_id, pnl=5000.0, accuracy=1)
    # Verify updated
    decision = logger.get_decision_by_id(decision_id)
    assert decision is not None
    assert decision.outcome_pnl == 5000.0
    assert decision.outcome_accuracy == 1
 def test_get_losing_decisions(logger: DecisionLogger) -> None:
    """Test retrieving high-confidence losing decisions."""
    # Profitable decision
    id1 = logger.log_decision(
        stock_code="005930",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=85,
        rationale="Correct prediction",
        context_snapshot={},
        input_data={},
    )
    logger.update_outcome(id1, pnl=3000.0, accuracy=1)
    # High-confidence loss
    id2 = logger.log_decision(
        stock_code="000660",
        market="KR",
        exchange_code="KRX",
        action="SELL",
        confidence=90,
        rationale="Wrong prediction",
        context_snapshot={},
        input_data={},
    )
    logger.update_outcome(id2, pnl=-2000.0, accuracy=0)
    # Low-confidence loss (should be ignored)
    id3 = logger.log_decision(
        stock_code="035420",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=70,
        rationale="Low confidence, wrong",
        context_snapshot={},
        input_data={},
    )
    logger.update_outcome(id3, pnl=-1500.0, accuracy=0)
    # Get high-confidence losing decisions
    losers = logger.get_losing_decisions(min_confidence=80, min_loss=-1000.0)
    assert len(losers) == 1
    assert losers[0].decision_id == id2
    assert losers[0].outcome_pnl == -2000.0
    assert losers[0].confidence == 90
 def test_get_decision_by_id_not_found(logger: DecisionLogger) -> None:
    """Test that get_decision_by_id returns None for non-existent ID."""
    decision = logger.get_decision_by_id("non-existent-uuid")
    assert decision is None
 def test_unreviewed_limit(logger: DecisionLogger) -> None:
    """Test that get_unreviewed_decisions respects limit parameter."""
    # Create 5 unreviewed decisions
    for i in range(5):
        logger.log_decision(
            stock_code=f"00{i}",
            market="KR",
            exchange_code="KRX",
            action="HOLD",
            confidence=85,
            rationale=f"Decision {i}",
            context_snapshot={},
            input_data={},
        )
    # Get only 3
    unreviewed = logger.get_unreviewed_decisions(limit=3)
    assert len(unreviewed) == 3
 def test_decision_log_dataclass() -> None:
    """Test DecisionLog dataclass creation."""
    now = datetime.now(UTC).isoformat()
    log = DecisionLog(
        decision_id="test-uuid",
        timestamp=now,
        stock_code="005930",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=85,
        rationale="Test",
        context_snapshot={"L1": "data"},
        input_data={"price": 100.0},
    )
    assert log.decision_id == "test-uuid"
    assert log.action == "BUY"
    assert log.confidence == 85
    assert log.reviewed is False
    assert log.outcome_pnl is None
--- a/tests/test_evolution.py
+++ b/tests/test_evolution.py
@@ -0,0 +1,686 @@
 """Tests for the Evolution Engine components.
 Tests cover:
 - EvolutionOptimizer: failure analysis and strategy generation
 - ABTester: A/B testing and statistical comparison
 - PerformanceTracker: metrics tracking and dashboard
 """
 from __future__ import annotations
 import json
 import sqlite3
 import tempfile
 from datetime import UTC, datetime, timedelta
 from pathlib import Path
 from unittest.mock import AsyncMock, MagicMock, Mock, patch
 import pytest
 from src.config import Settings
 from src.db import init_db, log_trade
 from src.evolution.ab_test import ABTester, ABTestResult, StrategyPerformance
 from src.evolution.optimizer import EvolutionOptimizer
 from src.evolution.performance_tracker import (
    PerformanceDashboard,
    PerformanceTracker,
    StrategyMetrics,
 )
 from src.logging.decision_logger import DecisionLogger
 # ------------------------------------------------------------------
 # Fixtures
 # ------------------------------------------------------------------
@pytest.fixture
 def db_conn() -> sqlite3.Connection:
    """Provide an in-memory database with initialized schema."""
    return init_db(":memory:")
@pytest.fixture
 def settings() -> Settings:
    """Provide test settings."""
    return Settings(
        KIS_APP_KEY="test_key",
        KIS_APP_SECRET="test_secret",
        KIS_ACCOUNT_NO="12345678-01",
        GEMINI_API_KEY="test_gemini_key",
        GEMINI_MODEL="gemini-pro",
        DB_PATH=":memory:",
    )
@pytest.fixture
 def optimizer(settings: Settings) -> EvolutionOptimizer:
    """Provide an EvolutionOptimizer instance."""
    return EvolutionOptimizer(settings)
@pytest.fixture
 def decision_logger(db_conn: sqlite3.Connection) -> DecisionLogger:
    """Provide a DecisionLogger instance."""
    return DecisionLogger(db_conn)
@pytest.fixture
 def ab_tester() -> ABTester:
    """Provide an ABTester instance."""
    return ABTester(significance_level=0.05)
@pytest.fixture
 def performance_tracker(settings: Settings) -> PerformanceTracker:
    """Provide a PerformanceTracker instance."""
    return PerformanceTracker(db_path=":memory:")
 # ------------------------------------------------------------------
 # EvolutionOptimizer Tests
 # ------------------------------------------------------------------
 def test_analyze_failures_uses_decision_logger(optimizer: EvolutionOptimizer) -> None:
    """Test that analyze_failures uses DecisionLogger.get_losing_decisions()."""
    # Add some losing decisions to the database
    logger = optimizer._decision_logger
    # High-confidence loss
    id1 = logger.log_decision(
        stock_code="005930",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=85,
        rationale="Expected growth",
        context_snapshot={"L1": {"price": 70000}},
        input_data={"price": 70000, "volume": 1000},
    )
    logger.update_outcome(id1, pnl=-2000.0, accuracy=0)
    # Another high-confidence loss
    id2 = logger.log_decision(
        stock_code="000660",
        market="KR",
        exchange_code="KRX",
        action="SELL",
        confidence=90,
        rationale="Expected drop",
        context_snapshot={"L1": {"price": 100000}},
        input_data={"price": 100000, "volume": 500},
    )
    logger.update_outcome(id2, pnl=-1500.0, accuracy=0)
    # Low-confidence loss (should be ignored)
    id3 = logger.log_decision(
        stock_code="035420",
        market="KR",
        exchange_code="KRX",
        action="HOLD",
        confidence=70,
        rationale="Uncertain",
        context_snapshot={},
        input_data={},
    )
    logger.update_outcome(id3, pnl=-500.0, accuracy=0)
    # Analyze failures
    failures = optimizer.analyze_failures(limit=10)
    # Should get 2 failures (confidence >= 80)
    assert len(failures) == 2
    assert all(f["confidence"] >= 80 for f in failures)
    assert all(f["outcome_pnl"] <= -100.0 for f in failures)
 def test_analyze_failures_empty_database(optimizer: EvolutionOptimizer) -> None:
    """Test analyze_failures with no losing decisions."""
    failures = optimizer.analyze_failures()
    assert failures == []
 def test_identify_failure_patterns(optimizer: EvolutionOptimizer) -> None:
    """Test identification of failure patterns."""
    failures = [
        {
            "decision_id": "1",
            "timestamp": "2024-01-15T09:30:00+00:00",
            "stock_code": "005930",
            "market": "KR",
            "exchange_code": "KRX",
            "action": "BUY",
            "confidence": 85,
            "rationale": "Test",
            "outcome_pnl": -1000.0,
            "outcome_accuracy": 0,
            "context_snapshot": {},
            "input_data": {},
        },
        {
            "decision_id": "2",
            "timestamp": "2024-01-15T14:30:00+00:00",
            "stock_code": "000660",
            "market": "KR",
            "exchange_code": "KRX",
            "action": "SELL",
            "confidence": 90,
            "rationale": "Test",
            "outcome_pnl": -2000.0,
            "outcome_accuracy": 0,
            "context_snapshot": {},
            "input_data": {},
        },
        {
            "decision_id": "3",
            "timestamp": "2024-01-15T09:45:00+00:00",
            "stock_code": "035420",
            "market": "US_NASDAQ",
            "exchange_code": "NASDAQ",
            "action": "BUY",
            "confidence": 80,
            "rationale": "Test",
            "outcome_pnl": -500.0,
            "outcome_accuracy": 0,
            "context_snapshot": {},
            "input_data": {},
        },
    ]
    patterns = optimizer.identify_failure_patterns(failures)
    assert patterns["total_failures"] == 3
    assert patterns["markets"]["KR"] == 2
    assert patterns["markets"]["US_NASDAQ"] == 1
    assert patterns["actions"]["BUY"] == 2
    assert patterns["actions"]["SELL"] == 1
    assert 9 in patterns["hours"]  # 09:30 and 09:45
    assert 14 in patterns["hours"]  # 14:30
    assert patterns["avg_confidence"] == 85.0
    assert patterns["avg_loss"] == -1166.67
 def test_identify_failure_patterns_empty(optimizer: EvolutionOptimizer) -> None:
    """Test pattern identification with no failures."""
    patterns = optimizer.identify_failure_patterns([])
    assert patterns["pattern_count"] == 0
    assert patterns["patterns"] == {}
@pytest.mark.asyncio
 async def test_generate_strategy_creates_file(optimizer: EvolutionOptimizer, tmp_path: Path) -> None:
    """Test that generate_strategy creates a strategy file."""
    failures = [
        {
            "decision_id": "1",
            "timestamp": "2024-01-15T09:30:00+00:00",
            "stock_code": "005930",
            "market": "KR",
            "action": "BUY",
            "confidence": 85,
            "outcome_pnl": -1000.0,
            "context_snapshot": {},
            "input_data": {},
        }
    ]
    # Mock Gemini response
    mock_response = Mock()
    mock_response.text = """
    # Simple strategy
    price = market_data.get("current_price", 0)
    if price > 50000:
        return {"action": "BUY", "confidence": 70, "rationale": "Price above threshold"}
    return {"action": "HOLD", "confidence": 50, "rationale": "Waiting"}
    """
    with patch.object(optimizer._client.aio.models, "generate_content", new=AsyncMock(return_value=mock_response)):
        with patch("src.evolution.optimizer.STRATEGIES_DIR", tmp_path):
            strategy_path = await optimizer.generate_strategy(failures)
    assert strategy_path is not None
    assert strategy_path.exists()
    assert strategy_path.suffix == ".py"
    assert "class Strategy_" in strategy_path.read_text()
    assert "def evaluate" in strategy_path.read_text()
@pytest.mark.asyncio
 async def test_generate_strategy_handles_api_error(optimizer: EvolutionOptimizer) -> None:
    """Test that generate_strategy handles Gemini API errors gracefully."""
    failures = [{"decision_id": "1", "timestamp": "2024-01-15T09:30:00+00:00"}]
    with patch.object(
        optimizer._client.aio.models,
        "generate_content",
        side_effect=Exception("API Error"),
    ):
        strategy_path = await optimizer.generate_strategy(failures)
    assert strategy_path is None
 def test_get_performance_summary() -> None:
    """Test getting performance summary from trades table."""
    # Create a temporary database with trades
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
        tmp_path = tmp.name
    conn = init_db(tmp_path)
    log_trade(conn, "005930", "BUY", 85, "Test win", quantity=10, price=70000, pnl=1000.0)
    log_trade(conn, "000660", "SELL", 90, "Test loss", quantity=5, price=100000, pnl=-500.0)
    log_trade(conn, "035420", "BUY", 80, "Test win", quantity=8, price=50000, pnl=800.0)
    conn.close()
    # Create settings with temp database path
    settings = Settings(
        KIS_APP_KEY="test_key",
        KIS_APP_SECRET="test_secret",
        KIS_ACCOUNT_NO="12345678-01",
        GEMINI_API_KEY="test_gemini_key",
        GEMINI_MODEL="gemini-pro",
        DB_PATH=tmp_path,
    )
    optimizer = EvolutionOptimizer(settings)
    summary = optimizer.get_performance_summary()
    assert summary["total_trades"] == 3
    assert summary["wins"] == 2
    assert summary["losses"] == 1
    assert summary["total_pnl"] == 1300.0
    assert summary["avg_pnl"] == 433.33
    # Clean up
    Path(tmp_path).unlink()
 def test_validate_strategy_success(optimizer: EvolutionOptimizer, tmp_path: Path) -> None:
    """Test strategy validation when tests pass."""
    strategy_file = tmp_path / "test_strategy.py"
    strategy_file.write_text("# Valid strategy file")
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = Mock(returncode=0, stdout="", stderr="")
        result = optimizer.validate_strategy(strategy_file)
    assert result is True
    assert strategy_file.exists()
 def test_validate_strategy_failure(optimizer: EvolutionOptimizer, tmp_path: Path) -> None:
    """Test strategy validation when tests fail."""
    strategy_file = tmp_path / "test_strategy.py"
    strategy_file.write_text("# Invalid strategy file")
    with patch("subprocess.run") as mock_run:
        mock_run.return_value = Mock(returncode=1, stdout="FAILED", stderr="")
        result = optimizer.validate_strategy(strategy_file)
    assert result is False
    # File should be deleted on failure
    assert not strategy_file.exists()
 # ------------------------------------------------------------------
 # ABTester Tests
 # ------------------------------------------------------------------
 def test_calculate_performance_basic(ab_tester: ABTester) -> None:
    """Test basic performance calculation."""
    trades = [
        {"pnl": 1000.0},
        {"pnl": -500.0},
        {"pnl": 800.0},
        {"pnl": 200.0},
    ]
    perf = ab_tester.calculate_performance(trades, "TestStrategy")
    assert perf.strategy_name == "TestStrategy"
    assert perf.total_trades == 4
    assert perf.wins == 3
    assert perf.losses == 1
    assert perf.total_pnl == 1500.0
    assert perf.avg_pnl == 375.0
    assert perf.win_rate == 75.0
    assert perf.sharpe_ratio is not None
 def test_calculate_performance_empty(ab_tester: ABTester) -> None:
    """Test performance calculation with no trades."""
    perf = ab_tester.calculate_performance([], "EmptyStrategy")
    assert perf.total_trades == 0
    assert perf.wins == 0
    assert perf.losses == 0
    assert perf.total_pnl == 0.0
    assert perf.avg_pnl == 0.0
    assert perf.win_rate == 0.0
    assert perf.sharpe_ratio is None
 def test_compare_strategies_significant_difference(ab_tester: ABTester) -> None:
    """Test strategy comparison with significant performance difference."""
    # Strategy A: consistently profitable
    trades_a = [{"pnl": 1000.0} for _ in range(30)]
    # Strategy B: consistently losing
    trades_b = [{"pnl": -500.0} for _ in range(30)]
    result = ab_tester.compare_strategies(trades_a, trades_b, "Strategy A", "Strategy B")
    # scipy returns np.True_ instead of Python bool
    assert bool(result.is_significant) is True
    assert result.winner == "Strategy A"
    assert result.p_value < 0.05
    assert result.performance_a.avg_pnl > result.performance_b.avg_pnl
 def test_compare_strategies_no_difference(ab_tester: ABTester) -> None:
    """Test strategy comparison with no significant difference."""
    # Both strategies have similar performance
    trades_a = [{"pnl": 100.0}, {"pnl": -50.0}, {"pnl": 80.0}]
    trades_b = [{"pnl": 90.0}, {"pnl": -60.0}, {"pnl": 85.0}]
    result = ab_tester.compare_strategies(trades_a, trades_b, "Strategy A", "Strategy B")
    # With small samples and similar performance, likely not significant
    assert result.winner is None or not result.is_significant
 def test_should_deploy_meets_criteria(ab_tester: ABTester) -> None:
    """Test deployment decision when criteria are met."""
    # Create a winning result that meets criteria
    trades_a = [{"pnl": 1000.0} for _ in range(25)]  # 100% win rate
    trades_b = [{"pnl": -500.0} for _ in range(25)]
    result = ab_tester.compare_strategies(trades_a, trades_b, "Winner", "Loser")
    should_deploy = ab_tester.should_deploy(result, min_win_rate=60.0, min_trades=20)
    assert should_deploy is True
 def test_should_deploy_insufficient_trades(ab_tester: ABTester) -> None:
    """Test deployment decision with insufficient trades."""
    trades_a = [{"pnl": 1000.0} for _ in range(10)]  # Only 10 trades
    trades_b = [{"pnl": -500.0} for _ in range(10)]
    result = ab_tester.compare_strategies(trades_a, trades_b, "Winner", "Loser")
    should_deploy = ab_tester.should_deploy(result, min_win_rate=60.0, min_trades=20)
    assert should_deploy is False
 def test_should_deploy_low_win_rate(ab_tester: ABTester) -> None:
    """Test deployment decision with low win rate."""
    # Mix of wins and losses, below 60% win rate
    trades_a = [{"pnl": 100.0}] * 10 + [{"pnl": -100.0}] * 15  # 40% win rate
    trades_b = [{"pnl": -500.0} for _ in range(25)]
    result = ab_tester.compare_strategies(trades_a, trades_b, "LowWinner", "Loser")
    should_deploy = ab_tester.should_deploy(result, min_win_rate=60.0, min_trades=20)
    assert should_deploy is False
 def test_should_deploy_not_significant(ab_tester: ABTester) -> None:
    """Test deployment decision when difference is not significant."""
    # Use more varied data to ensure statistical insignificance
    trades_a = [{"pnl": 100.0}, {"pnl": -50.0}] * 12 + [{"pnl": 100.0}]
    trades_b = [{"pnl": 95.0}, {"pnl": -45.0}] * 12 + [{"pnl": 95.0}]
    result = ab_tester.compare_strategies(trades_a, trades_b, "A", "B")
    should_deploy = ab_tester.should_deploy(result, min_win_rate=60.0, min_trades=20)
    # The PnL difference is not expected to be significant; even if it were,
    # the win rate is 13/25 = 52%, which is below the 60% threshold.
    assert should_deploy is False
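The four `should_deploy` tests above pin down a gating rule: a candidate is deployed only when the comparison is significant, a winner exists, and that winner clears both the win-rate and trade-count thresholds. The sketch below illustrates one way such a gate could look. `Perf` and `Comparison` are hypothetical stand-ins that mirror only the fields the tests read; the real `ABTester.should_deploy` is not shown in this diff and may differ, in particular on whether the thresholds are inclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Perf:
    """Hypothetical stand-in for a per-strategy performance record."""
    name: str
    total_trades: int
    win_rate: float  # percentage in the range 0-100


@dataclass
class Comparison:
    """Hypothetical stand-in for the A/B comparison result."""
    performance_a: Perf
    performance_b: Perf
    winner: Optional[str]
    is_significant: bool


def should_deploy(result: Comparison, min_win_rate: float = 60.0, min_trades: int = 20) -> bool:
    """Return True only if a significant winner clears both thresholds."""
    # bool() guards against NumPy booleans coming out of a statistics library.
    if not bool(result.is_significant) or result.winner is None:
        return False
    # Pick the performance record that belongs to the declared winner.
    if result.winner == result.performance_a.name:
        perf = result.performance_a
    else:
        perf = result.performance_b
    # Assumption: thresholds are inclusive; the real code may compare strictly.
    return perf.total_trades >= min_trades and perf.win_rate >= min_win_rate


winner = Perf("Winner", total_trades=25, win_rate=100.0)
loser = Perf("Loser", total_trades=25, win_rate=0.0)
print(should_deploy(Comparison(winner, loser, "Winner", True)))  # → True
```

Keeping the gate as a pure function of the comparison result makes each rejection path (too few trades, low win rate, no significance) testable without running a t-test.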
 # ------------------------------------------------------------------
 # PerformanceTracker Tests
 # ------------------------------------------------------------------
 def test_get_strategy_metrics(db_conn: sqlite3.Connection) -> None:
    """Test getting strategy metrics."""
    # Add some trades
    log_trade(db_conn, "005930", "BUY", 85, "Win 1", quantity=10, price=70000, pnl=1000.0)
    log_trade(db_conn, "000660", "SELL", 90, "Loss 1", quantity=5, price=100000, pnl=-500.0)
    log_trade(db_conn, "035420", "BUY", 80, "Win 2", quantity=8, price=50000, pnl=800.0)
    log_trade(db_conn, "005930", "HOLD", 75, "Hold", quantity=0, price=70000, pnl=0.0)
    tracker = PerformanceTracker(db_path=":memory:")
    # A fresh ":memory:" connection would be an empty database, so patch
    # sqlite3.connect to hand the tracker the fixture connection that holds the trades.
    with patch("sqlite3.connect", return_value=db_conn):
        metrics = tracker.get_strategy_metrics()
    assert metrics.total_trades == 4
    assert metrics.wins == 2
    assert metrics.losses == 1
    assert metrics.holds == 1
    assert metrics.win_rate == 50.0
    assert metrics.total_pnl == 1300.0
 def test_calculate_improvement_trend_improving(performance_tracker: PerformanceTracker) -> None:
    """Test improvement trend calculation for improving strategy."""
    metrics = [
        StrategyMetrics(
            strategy_name="test",
            period_start="2024-01-01",
            period_end="2024-01-07",
            total_trades=10,
            wins=5,
            losses=5,
            holds=0,
            win_rate=50.0,
            avg_pnl=100.0,
            total_pnl=1000.0,
            best_trade=500.0,
            worst_trade=-300.0,
            avg_confidence=75.0,
        ),
        StrategyMetrics(
            strategy_name="test",
            period_start="2024-01-08",
            period_end="2024-01-14",
            total_trades=10,
            wins=7,
            losses=3,
            holds=0,
            win_rate=70.0,
            avg_pnl=200.0,
            total_pnl=2000.0,
            best_trade=600.0,
            worst_trade=-200.0,
            avg_confidence=80.0,
        ),
    ]
    trend = performance_tracker.calculate_improvement_trend(metrics)
    assert trend["trend"] == "improving"
    assert trend["win_rate_change"] == 20.0
    assert trend["pnl_change"] == 100.0
    assert trend["confidence_change"] == 5.0
 def test_calculate_improvement_trend_declining(performance_tracker: PerformanceTracker) -> None:
    """Test improvement trend calculation for declining strategy."""
    metrics = [
        StrategyMetrics(
            strategy_name="test",
            period_start="2024-01-01",
            period_end="2024-01-07",
            total_trades=10,
            wins=7,
            losses=3,
            holds=0,
            win_rate=70.0,
            avg_pnl=200.0,
            total_pnl=2000.0,
            best_trade=600.0,
            worst_trade=-200.0,
            avg_confidence=80.0,
        ),
        StrategyMetrics(
            strategy_name="test",
            period_start="2024-01-08",
            period_end="2024-01-14",
            total_trades=10,
            wins=4,
            losses=6,
            holds=0,
            win_rate=40.0,
            avg_pnl=-50.0,
            total_pnl=-500.0,
            best_trade=300.0,
            worst_trade=-400.0,
            avg_confidence=70.0,
        ),
    ]
    trend = performance_tracker.calculate_improvement_trend(metrics)
    assert trend["trend"] == "declining"
    assert trend["win_rate_change"] == -30.0
    assert trend["pnl_change"] == -250.0
 def test_calculate_improvement_trend_insufficient_data(performance_tracker: PerformanceTracker) -> None:
    """Test improvement trend with insufficient data."""
    metrics = [
        StrategyMetrics(
            strategy_name="test",
            period_start="2024-01-01",
            period_end="2024-01-07",
            total_trades=10,
            wins=5,
            losses=5,
            holds=0,
            win_rate=50.0,
            avg_pnl=100.0,
            total_pnl=1000.0,
            best_trade=500.0,
            worst_trade=-300.0,
            avg_confidence=75.0,
        )
    ]
    trend = performance_tracker.calculate_improvement_trend(metrics)
    assert trend["trend"] == "insufficient_data"
    assert trend["win_rate_change"] == 0.0
    assert trend["pnl_change"] == 0.0
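The three trend tests above imply that the change values are plain differences between two periods, and that fewer than two periods yields `insufficient_data` with zeroed changes. The sketch below reproduces the expected numbers under stated assumptions: `Snapshot` is a hypothetical, trimmed-down stand-in for `StrategyMetrics`, the comparison uses the first and last entries, and the `stable` label for mixed signals is invented for illustration. The actual `calculate_improvement_trend` is not shown in this diff and may classify trends differently.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Snapshot:
    """Hypothetical, trimmed-down stand-in for StrategyMetrics."""
    win_rate: float
    avg_pnl: float
    avg_confidence: float


def improvement_trend(history: List[Snapshot]) -> Dict[str, Union[str, float]]:
    """Compare the first and last periods and label the direction of change."""
    if len(history) < 2:
        # A single period gives nothing to compare against.
        return {
            "trend": "insufficient_data",
            "win_rate_change": 0.0,
            "pnl_change": 0.0,
            "confidence_change": 0.0,
        }
    first, last = history[0], history[-1]
    win_rate_change = last.win_rate - first.win_rate
    pnl_change = last.avg_pnl - first.avg_pnl
    confidence_change = last.avg_confidence - first.avg_confidence
    # Assumption: both win rate and average PnL must move the same way.
    if win_rate_change > 0 and pnl_change > 0:
        trend = "improving"
    elif win_rate_change < 0 and pnl_change < 0:
        trend = "declining"
    else:
        trend = "stable"  # hypothetical label for mixed signals
    return {
        "trend": trend,
        "win_rate_change": win_rate_change,
        "pnl_change": pnl_change,
        "confidence_change": confidence_change,
    }


week1 = Snapshot(win_rate=50.0, avg_pnl=100.0, avg_confidence=75.0)
week2 = Snapshot(win_rate=70.0, avg_pnl=200.0, avg_confidence=80.0)
print(improvement_trend([week1, week2])["trend"])  # → improving
```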
 def test_export_dashboard_json(performance_tracker: PerformanceTracker) -> None:
    """Test exporting dashboard as JSON."""
    overall_metrics = StrategyMetrics(
        strategy_name="test",
        period_start="2024-01-01",
        period_end="2024-01-31",
        total_trades=100,
        wins=60,
        losses=40,
        holds=0,
        win_rate=60.0,
        avg_pnl=150.0,
        total_pnl=15000.0,
        best_trade=1000.0,
        worst_trade=-500.0,
        avg_confidence=80.0,
    )
    dashboard = PerformanceDashboard(
        generated_at=datetime.now(UTC).isoformat(),
        overall_metrics=overall_metrics,
        daily_metrics=[],
        weekly_metrics=[],
        improvement_trend={"trend": "improving", "win_rate_change": 10.0},
    )
    json_output = performance_tracker.export_dashboard_json(dashboard)
    # Verify it's valid JSON
    data = json.loads(json_output)
    assert "generated_at" in data
    assert "overall_metrics" in data
    assert data["overall_metrics"]["total_trades"] == 100
    assert data["overall_metrics"]["win_rate"] == 60.0
 def test_generate_dashboard() -> None:
    """Test generating a complete dashboard."""
    # Create tracker with temp database
    with tempfile.NamedTemporaryFile(suffix=".db", delete=False) as tmp:
        tmp_path = tmp.name
    try:
        # Initialize with data
        conn = init_db(tmp_path)
        log_trade(conn, "005930", "BUY", 85, "Win", quantity=10, price=70000, pnl=1000.0)
        log_trade(conn, "000660", "SELL", 90, "Loss", quantity=5, price=100000, pnl=-500.0)
        conn.close()
        tracker = PerformanceTracker(db_path=tmp_path)
        dashboard = tracker.generate_dashboard()
        assert isinstance(dashboard, PerformanceDashboard)
        assert dashboard.overall_metrics.total_trades == 2
        assert len(dashboard.daily_metrics) == 7
        assert len(dashboard.weekly_metrics) == 4
        assert "trend" in dashboard.improvement_trend
    finally:
        # Clean up the temp database even if an assertion above fails
        Path(tmp_path).unlink()
 # ------------------------------------------------------------------
 # Integration Tests
 # ------------------------------------------------------------------
 @pytest.mark.asyncio
 async def test_full_evolution_pipeline(optimizer: EvolutionOptimizer, tmp_path: Path) -> None:
    """Test the complete evolution pipeline."""
    # Add losing decisions
    logger = optimizer._decision_logger
    id1 = logger.log_decision(
        stock_code="005930",
        market="KR",
        exchange_code="KRX",
        action="BUY",
        confidence=85,
        rationale="Expected growth",
        context_snapshot={},
        input_data={},
    )
    logger.update_outcome(id1, pnl=-2000.0, accuracy=0)
    # Mock Gemini and subprocess
    mock_response = Mock()
    mock_response.text = 'return {"action": "HOLD", "confidence": 50, "rationale": "Test"}'
    with patch.object(optimizer._client.aio.models, "generate_content", new=AsyncMock(return_value=mock_response)):
        with patch("src.evolution.optimizer.STRATEGIES_DIR", tmp_path):
            with patch("subprocess.run") as mock_run:
                mock_run.return_value = Mock(returncode=0, stdout="", stderr="")
                result = await optimizer.evolve()
    assert result is not None
    assert "title" in result
    assert "branch" in result
    assert "status" in result
Author	SHA1	Message	Date
agentson	ae7195c829	feat: implement evolution engine for self-improving strategies Complete Pillar 4 implementation with comprehensive testing and analysis. Components: - EvolutionOptimizer: Analyzes losing decisions from DecisionLogger, identifies failure patterns (time, market, action), and uses Gemini to generate improved strategies with auto-deployment capability - ABTester: A/B testing framework with statistical significance testing (two-sample t-test), performance comparison, and deployment criteria (>60% win rate, >20 trades minimum) - PerformanceTracker: Tracks strategy win rates, monitors improvement trends over time, generates comprehensive dashboards with daily/weekly metrics and trend analysis Key Features: - Uses DecisionLogger.get_losing_decisions() for failure identification - Pattern analysis: market distribution, action types, time-of-day patterns - Gemini integration for AI-powered strategy generation - Statistical validation using scipy.stats.ttest_ind - Sharpe ratio calculation for risk-adjusted returns - Auto-deploy strategies meeting 60% win rate threshold - Performance dashboard with JSON export capability Testing: - 24 comprehensive tests covering all evolution components - 90% coverage of evolution module (304 lines, 31 missed) - Integration tests for full evolution pipeline - All 105 project tests passing with 72% overall coverage Dependencies: - Added scipy>=1.11,<2 for statistical analysis Closes #19 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-04 16:34:10 +09:00
agentson	2f9efdad64	feat: integrate decision logger with main trading loop - Add DecisionLogger to main.py trading cycle - Log all decisions with context snapshot (L1-L2 layers) - Capture market data and balance info in context - Add comprehensive tests (9 tests, 100% coverage) - All tests passing (63 total) Implements issue #17 acceptance criteria: - ✅ decision_logs table with proper schema - ✅ DecisionLogger class with all required methods - ✅ Automatic logging in trading loop - ✅ Tests achieve 100% coverage of decision_logger.py - ⚠️ Context snapshot uses L1-L2 data (L3-L7 pending issue #15) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-04 15:47:53 +09:00
agentson	6551d7af79	WIP: Add decision logging infrastructure - Add decision_logs table to database schema - Create decision logger module with comprehensive logging - Prepare for decision tracking and audit trail Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-04 15:47:53 +09:00
jihoson	7515a5a314	Merge pull request 'feat: implement L1-L7 context tree for multi-layered memory management' (#16) from feature/issue-15-context-tree into main Reviewed-on: #16	2026-02-04 15:40:00 +09:00
agentson	254b543c89	Merge main into feature/issue-15-context-tree Resolved conflicts in CLAUDE.md by: - Keeping main's refactored structure (docs split into separate files) - Added Context Tree documentation link to docs section - Preserved all constraints and guidelines from main Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-04 15:25:13 +09:00
jihoson	2becbddb4a	Merge pull request 'refactor: split CLAUDE.md into focused documentation structure' (#14) from feature/issue-13-docs-refactor into main Reviewed-on: #14	2026-02-04 10:15:09 +09:00
agentson	05e8986ff5	refactor: split CLAUDE.md into focused documentation structure - Restructure docs into topic-specific files to minimize context - Create docs/workflow.md (Git + Agent workflow) - Create docs/commands.md (Common failures + build commands) - Create docs/architecture.md (System design + data flow) - Create docs/testing.md (Test structure + guidelines) - Rewrite CLAUDE.md as concise hub with links to detailed docs - Update .gitignore to exclude data/ directory Benefits: - Reduced context size for AI assistants - Faster reference lookups - Better maintainability - Topic-focused documentation Closes #13 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>	2026-02-04 10:13:48 +09:00