# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Commands for Development

### Build and Test Commands

```bash
# Build the binary
go build -o kvs .

# Run with default config (auto-generates config.yaml)
./kvs start config.yaml

# Run with custom config
./kvs start /path/to/config.yaml

# Check running instances
./kvs status

# Stop instance
./kvs stop config

# Run comprehensive integration tests
./integration_test.sh

# Create test conflict data for debugging
go run test_conflict.go data1 data2

# Build and test in one go
go build -o kvs . && ./integration_test.sh
```

### Process Management Commands

```bash
# Start as background daemon
./kvs start <config>       # .yaml extension optional

# Stop daemon
./kvs stop <config>        # Graceful SIGTERM shutdown

# Restart daemon
./kvs restart <config>     # Stop then start

# Show status
./kvs status               # All instances
./kvs status <config>      # Specific instance

# Run in foreground (for debugging)
./kvs <config>             # Logs to stdout, blocks terminal

# View daemon logs
tail -f ~/.kvs/logs/kvs_<config>.yaml.log

# Global state directories
~/.kvs/pids/   # PID files (works from any directory)
~/.kvs/logs/   # Daemon log files
```

### Development Workflow

```bash
# Format and check code
go fmt ./...
go vet ./...

# Manage dependencies
go mod tidy

# Check build without artifacts
go build .

# Test specific cluster scenarios
./kvs start node1.yaml
./kvs start node2.yaml

# Wait for cluster formation
sleep 5

# Test data operations
curl -X PUT http://localhost:8081/kv/test/data -H "Content-Type: application/json" -d '{"test":"data"}'
curl http://localhost:8082/kv/test/data   # Should replicate within ~30 seconds

# Check daemon status
./kvs status

# View logs
tail -f ~/.kvs/logs/kvs_node1.yaml.log

# Cleanup
./kvs stop node1
./kvs stop node2
```

## Architecture Overview

### High-Level Structure

KVS is a **distributed, eventually consistent key-value store** built around three core systems:

1. **Gossip Protocol** (`cluster/gossip.go`) - Decentralized membership management and failure detection
2. **Merkle Tree Sync** (`cluster/sync.go`, `cluster/merkle.go`) - Efficient data synchronization and conflict resolution
3. **Modular Server** (`server/`) - HTTP API with pluggable feature modules

### Key Architectural Patterns

#### Modular Package Design

- **`auth/`** - Complete JWT authentication system with POSIX-inspired permissions
- **`cluster/`** - Distributed systems logic (gossip, sync, merkle trees)
- **`daemon/`** - Process management (daemonization, PID files, lifecycle)
- **`storage/`** - BadgerDB abstraction with compression and revision history
- **`server/`** - HTTP handlers, routing, and lifecycle management
- **`features/`** - Utility functions for TTL, rate limiting, tamper logging, backup
- **`types/`** - Centralized type definitions for all components
- **`config/`** - Configuration loading with auto-generation
- **`utils/`** - Cryptographic hashing utilities

#### Core Data Model

```go
// Primary storage format
type StoredValue struct {
    UUID      string          `json:"uuid"`      // Unique version identifier
    Timestamp int64           `json:"timestamp"` // Unix timestamp (milliseconds)
    Data      json.RawMessage `json:"data"`      // Actual user JSON payload
}
```
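To make the version model concrete, here is a minimal sketch of how a write could be wrapped into a `StoredValue` before persistence. The `wrapValue` helper is hypothetical and the use of `github.com/google/uuid` is an assumption for illustration; only the `StoredValue` struct itself comes from the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"

	"github.com/google/uuid"
)

// StoredValue mirrors the primary storage format shown above.
type StoredValue struct {
	UUID      string          `json:"uuid"`
	Timestamp int64           `json:"timestamp"`
	Data      json.RawMessage `json:"data"`
}

// wrapValue (hypothetical helper) stamps a raw JSON payload with a fresh
// version UUID and a millisecond Unix timestamp before it is persisted.
func wrapValue(payload json.RawMessage) StoredValue {
	return StoredValue{
		UUID:      uuid.New().String(),
		Timestamp: time.Now().UnixMilli(),
		Data:      payload,
	}
}

func main() {
	sv := wrapValue(json.RawMessage(`{"test":"data"}`))
	out, _ := json.Marshal(sv)
	fmt.Println(string(out)) // {"uuid":"...","timestamp":...,"data":{"test":"data"}}
}
```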
#### Critical System Interactions

**Conflict Resolution Flow:**

1. Merkle trees detect divergent data between nodes (`cluster/merkle.go`)
2. Sync service fetches conflicting keys (`cluster/sync.go:fetchAndCompareData`)
3. Conflict resolution logic in `resolveConflict()` (see the sketch after this list):
   - Same timestamp → Apply the "oldest-node rule" (earliest `joined_timestamp` wins)
   - Tie-breaker → UUID comparison for deterministic results
   - Winner's data is automatically replicated to the losing nodes
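The same-timestamp branch of this logic can be sketched as below. The `Version` struct, its field names, and the function shape are illustrative assumptions; only the oldest-node rule (earliest `joined_timestamp` wins) and the UUID tie-breaker come from the description above.

```go
package main

import "fmt"

// Version pairs a stored value's identity with metadata about the node
// that produced it (field names here are illustrative).
type Version struct {
	UUID            string
	Timestamp       int64 // value timestamp, Unix milliseconds
	JoinedTimestamp int64 // producing node's joined_timestamp
}

// resolveSameTimestamp sketches the same-timestamp branch of resolveConflict():
// the version from the oldest node (earliest joined_timestamp) wins, and a
// lexicographic UUID comparison breaks any remaining tie deterministically.
func resolveSameTimestamp(a, b Version) Version {
	if a.JoinedTimestamp != b.JoinedTimestamp {
		if a.JoinedTimestamp < b.JoinedTimestamp {
			return a // oldest-node rule
		}
		return b
	}
	if a.UUID < b.UUID { // deterministic tie-breaker
		return a
	}
	return b
}

func main() {
	a := Version{UUID: "aaa", Timestamp: 1000, JoinedTimestamp: 50}
	b := Version{UUID: "bbb", Timestamp: 1000, JoinedTimestamp: 10}
	fmt.Println(resolveSameTimestamp(a, b).UUID) // "bbb": its node joined earlier
}
```

Because every node compares the same fields in the same order, all replicas converge on the same winner without any coordination.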
**Authentication & Authorization:**

- JWT tokens with scoped permissions (`auth/jwt.go`)
- POSIX-inspired 12-bit permission system (`types/types.go:52-75`)
- Resource ownership metadata with TTL support (`types/ResourceMetadata`)

**Storage Strategy:**

- **Main keys**: Direct path mapping (`users/john/profile`)
- **Index keys**: `_ts:{timestamp}:{path}` for time-based queries
- **Compression**: Optional ZSTD compression (`storage/compression.go`)
- **Revisions**: Optional revision history (`storage/revision.go`)

### Configuration Architecture

The system uses feature toggles extensively (`types/Config:271-280`):

```yaml
auth_enabled: true              # JWT authentication system
tamper_logging_enabled: true    # Cryptographic audit trail
clustering_enabled: true        # Gossip protocol and sync
rate_limiting_enabled: true     # Per-client rate limiting
revision_history_enabled: true  # Automatic versioning

# Anonymous access control (Issue #5 - when auth_enabled: true)
allow_anonymous_read: false     # Allow unauthenticated read access to KV endpoints
allow_anonymous_write: false    # Allow unauthenticated write access to KV endpoints
```

**Security Note**: DELETE operations always require authentication when `auth_enabled: true`, regardless of anonymous access settings.

### Testing Strategy

#### Integration Test Suite (`integration_test.sh`)

- **Build verification** - Ensures the binary compiles correctly
- **Basic functionality** - Single-node CRUD operations
- **Cluster formation** - 2-node gossip protocol and data replication
- **Conflict resolution** - Automated conflict detection and resolution using `test_conflict.go`
- **Authentication middleware** - Comprehensive security testing (Issue #4):
  - Admin endpoints properly reject unauthenticated requests
  - Admin endpoints work with valid JWT tokens
  - KV endpoints respect anonymous access configuration
  - Automatic root account creation and token extraction

The test suite uses retry logic and generous timing to handle the eventually consistent nature of the system.

#### Conflict Testing Utility (`test_conflict.go`)

Creates two BadgerDB instances with intentionally conflicting data (same path, same timestamp, different UUIDs) to exercise the conflict resolution algorithm.

### Development Notes

#### Key Constraints

- **Eventually Consistent**: All operations succeed locally first, then replicate
- **Local-First Truth**: Nodes operate independently and sync in the background
- **No Transactions**: Each key operation is atomic and independent
- **Hierarchical Keys**: Support for path-like structures (`/home/room/closet/socks`)

#### Critical Timing Considerations

- **Gossip intervals**: 1-2 minutes for membership updates
- **Sync intervals**: 5 minutes for regular data sync, 2 minutes for catch-up
- **Conflict resolution**: Typically resolves within 10-30 seconds of detection
- **Bootstrap sync**: Up to 30 days of historical data for new nodes

#### Main Entry Point Flow

1. `main.go` parses command-line arguments for subcommands (`start`, `stop`, `status`, `restart`)
2. For daemon mode: `daemon.Daemonize()` spawns a background process and manages PID files
3. For server mode: loads config (auto-generates a default if missing)
4. `server.NewServer()` initializes all subsystems
5. Graceful shutdown handling with `SIGINT`/`SIGTERM`
6. All business logic is delegated to the modular packages

#### Daemon Architecture

- **PID Management**: Global PID files stored in `~/.kvs/pids/` for cross-directory access
- **Logging**: Daemon logs written to `~/.kvs/logs/{config-name}.log`
- **Process Lifecycle**: Spawns a detached process via `exec.Command()` with `Setsid: true`
- **Config Normalization**: Supports both `node1` and `node1.yaml` formats
- **Stale PID Detection**: Checks process existence via `Signal(0)` before operations (see the sketch below)

This architecture enables easy feature addition, comprehensive testing, and reliable operation in distributed environments while maintaining simplicity for single-node deployments.
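As a footnote to the stale PID detection point above: on Unix, sending signal 0 performs the existence check without delivering a signal. The sketch below is a minimal illustration of that idiom; `isProcessAlive` is a hypothetical helper name, not the repository's actual implementation.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// isProcessAlive (hypothetical helper) reports whether a PID read from a
// PID file still refers to a running process. On Unix, os.FindProcess
// always succeeds, so the real test is Signal(0): it delivers no signal
// but returns an error if the process no longer exists.
func isProcessAlive(pid int) bool {
	proc, err := os.FindProcess(pid)
	if err != nil {
		return false
	}
	return proc.Signal(syscall.Signal(0)) == nil
}

func main() {
	// Our own PID is certainly alive; an arbitrary large PID usually is not.
	fmt.Println(isProcessAlive(os.Getpid())) // true
	fmt.Println(isProcessAlive(999999))      // very likely false
}
```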