feat: implement token efficiency optimization for issue #24
Implement a comprehensive token-efficiency system to reduce LLM API costs:
- Add prompt_optimizer.py: Token counting, compression, abbreviations
- Add context_selector.py: Smart L1-L7 context layer selection
- Add summarizer.py: Historical data aggregation and summarization
- Add cache.py: TTL-based response caching with hit rate tracking
- Enhance gemini_client.py: Integrate optimization, caching, metrics
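As a rough illustration of the abbreviation-based compression described for prompt_optimizer.py (the function name and abbreviation table below are hypothetical, not the module's actual API):

```python
# Hypothetical sketch of abbreviation-based prompt compression.
# The abbreviation table and compress_prompt() name are illustrative;
# they are not the actual prompt_optimizer.py API.
ABBREVIATIONS = {
    "confidence": "conf",
    "recommendation": "rec",
    "portfolio": "pf",
}

def compress_prompt(prompt: str) -> str:
    """Collapse whitespace runs and substitute known abbreviations."""
    compressed = " ".join(prompt.split())  # collapse whitespace
    for long_form, short_form in ABBREVIATIONS.items():
        compressed = compressed.replace(long_form, short_form)
    return compressed
```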
Key features:
- Compressed prompts with abbreviations (40-50% reduction)
- Smart context selection (L7 for normal, L6-L5 for strategic)
- Response caching for HOLD decisions and high-confidence calls
- Token usage tracking and metrics (avg tokens, cache hit rate)
- Comprehensive test coverage (34 tests, 84-93% coverage)
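The TTL-based cache with hit-rate tracking might look roughly like the following sketch (class and method names are assumptions, not the actual cache.py interface):

```python
import time

class TTLCache:
    """Minimal TTL cache with hit-rate tracking.

    Illustrative sketch only; not the actual cache.py API.
    """

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}   # key -> (value, expiry timestamp)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None:
            value, expires_at = entry
            if time.monotonic() < expires_at:
                self.hits += 1
                return value
            del self._store[key]  # drop expired entry
        self.misses += 1
        return None

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Caching HOLD decisions is a natural fit for this pattern because they recur often and change slowly relative to the TTL window.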
Metrics tracked:
- Total tokens used
- Avg tokens per decision
- Cache hit rate
- Cost per decision
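The tracked metrics above reduce to simple running aggregates; a hedged sketch (field names and the per-1k-token pricing parameter are assumptions for illustration):

```python
class TokenMetrics:
    """Illustrative aggregation of token-usage metrics.

    Names and the cost model (flat price per 1k tokens) are assumptions,
    not the actual gemini_client.py implementation.
    """

    def __init__(self, cost_per_1k_tokens: float = 0.002):
        self.cost_per_1k = cost_per_1k_tokens
        self.total_tokens = 0
        self.decisions = 0

    def record(self, tokens_used: int) -> None:
        """Record token usage for one decision."""
        self.total_tokens += tokens_used
        self.decisions += 1

    @property
    def avg_tokens_per_decision(self) -> float:
        return self.total_tokens / self.decisions if self.decisions else 0.0

    @property
    def cost_per_decision(self) -> float:
        return self.avg_tokens_per_decision / 1000 * self.cost_per_1k
```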
All tests passing (191 total, 76% overall coverage).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>