Implement Token Efficiency - Context optimization #24

Closed
opened 2026-02-04 16:14:29 +09:00 by agentson · 0 comments
Collaborator

Goal

Behavioral Rule: Token Efficiency

Optimize LLM token usage to reduce costs and latency while maintaining decision quality.

Background

Personal-profile content unrelated to investing (small talk) should be removed to save resources.

Tasks

1. Prompt Optimization

  • Remove irrelevant context from prompts
  • Template-based prompts with variable slots
  • Compress historical data before sending
  • Use abbreviations for repeated terms
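The template and abbreviation ideas above could be sketched as follows. This is a minimal illustration, not the project's actual implementation; the template fields, abbreviation table, and `build_prompt` name are all assumptions.

```python
# Hypothetical prompt template with variable slots; field names are illustrative.
PROMPT_TEMPLATE = (
    "You are a trading assistant.\n"
    "Symbol: {symbol}\n"
    "Price: {price}\n"
    "Trend (7d): {trend}\n"
    "Decide: BUY / SELL / HOLD."
)

# Abbreviate repeated long terms to cut tokens (assumed mapping).
ABBREVIATIONS = {
    "moving average": "MA",
    "relative strength index": "RSI",
    "volume-weighted average price": "VWAP",
}

def build_prompt(symbol: str, price: float, trend: str) -> str:
    """Fill the template slots, then replace long forms with abbreviations."""
    prompt = PROMPT_TEMPLATE.format(symbol=symbol, price=price, trend=trend)
    for long_form, abbr in ABBREVIATIONS.items():
        prompt = prompt.replace(long_form, abbr)
    return prompt
```

Keeping the static instructions in one shared template also makes the prompt prefix cache-friendly for providers that support prompt caching.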

2. Smart Context Selection

  • Don't send all L1-L7 every time
  • L7 (real-time) for normal decisions
  • L6-L5 (daily/weekly) for strategic decisions
  • L4-L1 (monthly/legacy) only for major events
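The layer-selection rule above can be expressed as a small dispatch function. The decision-type names and the `select_layers` helper are assumptions; the L1-L7 layer labels follow the issue's Context Tree convention (L7 = real-time, L1 = legacy).

```python
def select_layers(decision_type: str) -> list[str]:
    """Return only the context layers needed for this decision type."""
    if decision_type == "normal":
        return ["L7"]                      # real-time only
    if decision_type == "strategic":
        return ["L7", "L6", "L5"]          # add daily/weekly context
    if decision_type == "major_event":
        return [f"L{i}" for i in range(7, 0, -1)]  # full tree L7..L1
    raise ValueError(f"unknown decision type: {decision_type}")
```

Routing most calls through the `"normal"` branch is what delivers the bulk of the token savings, since only one layer is serialized into the prompt.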

3. Context Summarization

  • Summarize old context instead of raw data
  • Key metrics only (averages, trends)
  • Rolling window (keep last N days detailed)
  • Aggregate older data
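A rolling-window summarizer along these lines might look like the sketch below. The function name, the price-list input, and the chosen aggregate metrics are illustrative assumptions.

```python
from statistics import mean

def summarize_history(daily_prices: list[float], keep_detailed: int = 7) -> dict:
    """Keep the last N days raw; collapse older data into key metrics only."""
    recent = daily_prices[-keep_detailed:]
    older = daily_prices[:-keep_detailed] if len(daily_prices) > keep_detailed else []
    summary = {"recent": recent}
    if older:
        summary["older"] = {
            "avg": round(mean(older), 2),
            "min": min(older),
            "max": max(older),
            "trend": "up" if older[-1] >= older[0] else "down",
        }
    return summary
```

Sending four aggregate numbers instead of weeks of raw candles is where the token reduction comes from; the window size N is a tunable trade-off against decision quality.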

4. Response Caching

  • Cache common Gemini responses
  • TTL-based cache invalidation
  • Cache hit rate monitoring
  • Cache per market conditions
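A TTL cache keyed by prompt and market regime, with hit-rate tracking, could be sketched as below. The class name, the `regime` key, and the default TTL are assumptions, not the project's API.

```python
import time

class ResponseCache:
    """TTL cache for LLM responses, keyed by (prompt, market regime)."""

    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store: dict[tuple[str, str], tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, prompt: str, regime: str):
        """Return a cached response if present and not expired, else None."""
        entry = self._store.get((prompt, regime))
        if entry and time.monotonic() - entry[0] < self.ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        return None

    def put(self, prompt: str, regime: str, response: str) -> None:
        self._store[(prompt, regime)] = (time.monotonic(), response)

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Keying on the market regime ensures a response cached in a bull market is never replayed in a bear market; the TTL bounds staleness for same-regime reuse.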

5. Prompt Compression

  • Token counting before API call
  • Automatic truncation if too long
  • Priority-based context inclusion
  • A/B test compressed vs full prompts
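The counting-then-truncation flow might work as in this sketch. The ~4-chars-per-token heuristic is a rough placeholder, and the `(priority, text)` section format is an assumption; a real implementation should count with the model's own tokenizer.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 chars/token); replace with the
    model's real tokenizer in production."""
    return max(1, len(text) // 4)

def compress_prompt(sections: list[tuple[int, str]], max_tokens: int) -> str:
    """Include sections in priority order (lower number = higher priority)
    until the token budget is exhausted; drop whatever does not fit."""
    included = []
    budget = max_tokens
    for _, text in sorted(sections, key=lambda s: s[0]):
        cost = estimate_tokens(text)
        if cost <= budget:
            included.append(text)
            budget -= cost
    return "\n".join(included)
```

Because inclusion is priority-ordered rather than positional, truncation drops the least important context first, which is what the A/B test against full prompts should validate.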

6. Metrics and Monitoring

  • Track tokens per decision
  • Cost per decision
  • Identify expensive prompts
  • Optimization opportunities
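Per-decision tracking could start as simply as the recorder below. The class, the per-1k-token pricing parameter, and the method names are illustrative assumptions; actual Gemini pricing should be configured from the provider's published rates.

```python
class TokenMetrics:
    """Track tokens per decision and surface the most expensive prompts."""

    def __init__(self, cost_per_1k_tokens: float = 0.002):
        self.cost_per_1k = cost_per_1k_tokens
        self.records: list[tuple[str, int]] = []  # (prompt_name, tokens)

    def record(self, prompt_name: str, tokens: int) -> None:
        self.records.append((prompt_name, tokens))

    def cost_per_decision(self) -> float:
        """Average dollar cost across all recorded decisions."""
        if not self.records:
            return 0.0
        total = sum(t for _, t in self.records)
        return (total / 1000) * self.cost_per_1k / len(self.records)

    def most_expensive(self) -> str:
        """Name of the prompt that consumed the most tokens."""
        return max(self.records, key=lambda r: r[1])[0]
```

Feeding these numbers into monitoring gives a direct readout against the target metrics below (tokens/decision, cost/decision) and flags which prompts to optimize first.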

Implementation Files

  • `src/brain/prompt_optimizer.py` - Prompt optimization
  • `src/brain/context_selector.py` - Smart context selection
  • `src/brain/cache.py` - Response caching
  • `src/context/summarizer.py` - Context summarization
  • `tests/test_token_efficiency.py` - Tests

Target Metrics

  • Reduce average tokens/decision by 50%
  • Maintain >85% decision quality
  • Cache hit rate >30%
  • Cost/decision < $0.01

Acceptance Criteria

  • Prompt templates implemented
  • Smart context selection (L1-L7)
  • Context summarization working
  • Response caching with >20% hit rate
  • Token usage reduced by >40%
  • Decision quality maintained
  • Tests achieve ≥80% coverage

Priority

LOW - Cost optimization, not critical for functionality

Related

Behavioral Rule: Token Efficiency
Impacts: Operational costs, latency
Synergy with: Context Tree


Reference: jihoson/The-Ouroboros#24