CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Commands for Development

Build and Test Commands

# Build the binary
go build -o kvs .

# Run with default config (auto-generates config.yaml)
./kvs start config.yaml

# Run with custom config
./kvs start /path/to/config.yaml

# Check running instances
./kvs status

# Stop instance
./kvs stop config

# Run comprehensive integration tests
./integration_test.sh

# Create test conflict data for debugging
go run test_conflict.go data1 data2

# Build and test in one go
go build -o kvs . && ./integration_test.sh

Process Management Commands

# Start as background daemon
./kvs start <config.yaml>       # .yaml extension optional

# Stop daemon
./kvs stop <config>             # Graceful SIGTERM shutdown

# Restart daemon
./kvs restart <config>          # Stop then start

# Show status
./kvs status                    # All instances
./kvs status <config>           # Specific instance

# Run in foreground (for debugging)
./kvs <config.yaml>             # Logs to stdout, blocks terminal

# View daemon logs
tail -f ~/.kvs/logs/kvs_<config>.yaml.log

# Global state directories
~/.kvs/pids/                    # PID files (works from any directory)
~/.kvs/logs/                    # Daemon log files

Development Workflow

# Format and check code
go fmt ./...
go vet ./...

# Tidy module dependencies
go mod tidy

# Check build without artifacts
go build .

# Test specific cluster scenarios
./kvs start node1.yaml
./kvs start node2.yaml

# Wait for cluster formation
sleep 5

# Test data operations
curl -X PUT http://localhost:8081/kv/test/data -H "Content-Type: application/json" -d '{"test":"data"}'
curl http://localhost:8082/kv/test/data  # Should replicate within ~30 seconds

# Check daemon status
./kvs status

# View logs
tail -f ~/.kvs/logs/kvs_node1.yaml.log

# Cleanup
./kvs stop node1
./kvs stop node2

Architecture Overview

High-Level Structure

KVS is a distributed, eventually consistent key-value store built around three core systems:

  1. Gossip Protocol (cluster/gossip.go) - Decentralized membership management and failure detection
  2. Merkle Tree Sync (cluster/sync.go, cluster/merkle.go) - Efficient data synchronization and conflict resolution
  3. Modular Server (server/) - HTTP API with pluggable feature modules

Key Architectural Patterns

Modular Package Design

  • auth/ - Complete JWT authentication system with POSIX-inspired permissions
  • cluster/ - Distributed systems logic (gossip, sync, merkle trees)
  • daemon/ - Process management (daemonization, PID files, lifecycle)
  • storage/ - BadgerDB abstraction with compression and revision history
  • server/ - HTTP handlers, routing, and lifecycle management
  • features/ - Utility functions for TTL, rate limiting, tamper logging, backup
  • types/ - Centralized type definitions for all components
  • config/ - Configuration loading with auto-generation
  • utils/ - Cryptographic hashing utilities

Core Data Model

// Primary storage format
type StoredValue struct {
    UUID      string          `json:"uuid"`      // Unique version identifier
    Timestamp int64           `json:"timestamp"` // Unix timestamp (milliseconds)  
    Data      json.RawMessage `json:"data"`      // Actual user JSON payload
}
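
Writes wrap the caller's JSON in this envelope before it reaches storage. A minimal sketch of that wrapping, assuming the google/uuid package and a hypothetical helper name (not the project's actual write path):

// Hypothetical illustration of building a StoredValue around a user payload;
// the uuid dependency and helper name are assumptions, not the real code.
import (
    "encoding/json"
    "time"

    "github.com/google/uuid"
)

func wrapValue(userJSON json.RawMessage) StoredValue {
    return StoredValue{
        UUID:      uuid.NewString(),       // unique version identifier
        Timestamp: time.Now().UnixMilli(), // millisecond Unix timestamp
        Data:      userJSON,               // caller's JSON stored verbatim
    }
}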

Critical System Interactions

Conflict Resolution Flow:

  1. Merkle trees detect divergent data between nodes (cluster/merkle.go)
  2. Sync service fetches conflicting keys (cluster/sync.go:fetchAndCompareData)
  3. Sophisticated conflict resolution logic in resolveConflict() (see the sketch after this list):
    • Same timestamp → Apply "oldest-node rule" (earliest joined_timestamp wins)
    • Tie-breaker → UUID comparison for deterministic results
    • Winner's data automatically replicated to losing nodes
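
A minimal sketch of that ordering, assuming a last-write-wins default when timestamps differ; the simplified signature and field names are illustrative, not the actual cluster/sync.go implementation:

// Hypothetical sketch of the resolution order described above; not the
// project's resolveConflict. Assumes newer timestamps win outright.
type conflictSide struct {
    Value           StoredValue
    NodeJoinedStamp int64 // joined_timestamp of the node holding this version
}

// resolveConflict returns the winning side.
func resolveConflict(a, b conflictSide) conflictSide {
    // Different timestamps: assume last-write-wins.
    if a.Value.Timestamp != b.Value.Timestamp {
        if a.Value.Timestamp > b.Value.Timestamp {
            return a
        }
        return b
    }
    // Same timestamp: oldest-node rule (earliest joined_timestamp wins).
    if a.NodeJoinedStamp != b.NodeJoinedStamp {
        if a.NodeJoinedStamp < b.NodeJoinedStamp {
            return a
        }
        return b
    }
    // Final tie-breaker: deterministic UUID comparison.
    if a.Value.UUID < b.Value.UUID {
        return a
    }
    return b
}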

Authentication & Authorization:

  • JWT tokens with scoped permissions (auth/jwt.go)
  • POSIX-inspired 12-bit permission system (types/types.go:52-75; illustrated in the sketch below)
  • Resource ownership metadata with TTL support (types/ResourceMetadata)
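
Purely as a hypothetical illustration of checking a 12-bit mask (the layout below, three classes times create/read/update/delete, is an assumption; the authoritative layout is in types/types.go:52-75):

// Hypothetical bit layout: owner/group/others x create/read/update/delete.
// The real 12-bit scheme lives in types/types.go:52-75 and may differ.
const (
    PermOwnerCreate uint16 = 1 << 11
    PermOwnerRead   uint16 = 1 << 10
    PermOwnerUpdate uint16 = 1 << 9
    PermOwnerDelete uint16 = 1 << 8
    // ... group bits occupy 7-4, others bits occupy 3-0
)

// hasPermission reports whether every bit in required is set in mask.
func hasPermission(mask, required uint16) bool {
    return mask&required == required
}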

Storage Strategy:

  • Main keys: Direct path mapping (users/john/profile)
  • Index keys: _ts:{timestamp}:{path} for time-based queries (key construction sketched below)
  • Compression: Optional ZSTD compression (storage/compression.go)
  • Revisions: Optional revision history (storage/revision.go)
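
A small sketch of deriving both key shapes from a path and timestamp (helper names are illustrative, not the storage package's API):

// Illustrative helpers for the key layout above; names are assumptions,
// not the project's storage package API.
import "fmt"

// mainKey maps a hierarchical path directly, e.g. "users/john/profile".
func mainKey(path string) []byte {
    return []byte(path)
}

// indexKey builds the time-based index entry, e.g. "_ts:1700000000000:users/john/profile".
func indexKey(timestampMillis int64, path string) []byte {
    return []byte(fmt.Sprintf("_ts:%d:%s", timestampMillis, path))
}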

Configuration Architecture

The system uses feature toggles extensively (types/Config:271-280):

auth_enabled: true              # JWT authentication system
tamper_logging_enabled: true    # Cryptographic audit trail  
clustering_enabled: true        # Gossip protocol and sync
rate_limiting_enabled: true     # Per-client rate limiting
revision_history_enabled: true  # Automatic versioning

# Anonymous access control (Issue #5 - when auth_enabled: true)
allow_anonymous_read: false     # Allow unauthenticated read access to KV endpoints
allow_anonymous_write: false    # Allow unauthenticated write access to KV endpoints

Security Note: DELETE operations always require authentication when auth_enabled: true, regardless of anonymous access settings.

Testing Strategy

Integration Test Suite (integration_test.sh)

  • Build verification - Ensures binary compiles correctly
  • Basic functionality - Single-node CRUD operations
  • Cluster formation - 2-node gossip protocol and data replication
  • Conflict resolution - Automated conflict detection and resolution using test_conflict.go
  • Authentication middleware - Comprehensive security testing (Issue #4):
    • Admin endpoints properly reject unauthenticated requests
    • Admin endpoints work with valid JWT tokens
    • KV endpoints respect anonymous access configuration
    • Automatic root account creation and token extraction

The test suite uses sophisticated retry logic and timing to handle the eventually consistent nature of the system.

Conflict Testing Utility (test_conflict.go)

Creates two BadgerDB instances with intentionally conflicting data (same path, same timestamp, different UUIDs) to test the conflict resolution algorithm.
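
A minimal sketch of that setup, assuming BadgerDB's standard Go API (the module path/version and helper name are assumptions; the real test_conflict.go may differ):

// Illustrative sketch of seeding one store with a conflicting version of a key;
// the badger module path/version is an assumption.
import (
    "encoding/json"

    badger "github.com/dgraph-io/badger/v4"
)

func seedConflict(dir, versionUUID string, ts int64) error {
    db, err := badger.Open(badger.DefaultOptions(dir))
    if err != nil {
        return err
    }
    defer db.Close()

    val, err := json.Marshal(StoredValue{
        UUID:      versionUUID, // differs between the two stores
        Timestamp: ts,          // identical in both stores
        Data:      json.RawMessage(`{"conflict":"demo"}`),
    })
    if err != nil {
        return err
    }
    return db.Update(func(txn *badger.Txn) error {
        return txn.Set([]byte("test/conflict"), val) // same path in both stores
    })
}

// Calling this for "data1" and "data2" with the same ts but different UUIDs
// reproduces the conflict the resolver has to settle.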

Development Notes

Key Constraints

  • Eventually Consistent: All operations succeed locally first, then replicate
  • Local-First Truth: Nodes operate independently and sync in background
  • No Transactions: Each key operation is atomic and independent
  • Hierarchical Keys: Support for path-like structures (/home/room/closet/socks)

Critical Timing Considerations

  • Gossip intervals: 1-2 minutes for membership updates
  • Sync intervals: 5 minutes for regular data sync, 2 minutes for catch-up
  • Conflict resolution: Typically resolves within 10-30 seconds after detection
  • Bootstrap sync: Up to 30 days of historical data for new nodes

Main Entry Point Flow

  1. main.go parses command-line arguments for subcommands (start, stop, status, restart)
  2. For daemon mode: daemon.Daemonize() spawns background process and manages PID files
  3. For server mode: loads config (auto-generates default if missing)
  4. server.NewServer() initializes all subsystems
  5. Graceful shutdown handling with SIGINT/SIGTERM
  6. All business logic delegated to modular packages
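
Step 5 in miniature; a generic, self-contained illustration of SIGINT/SIGTERM handling around an http.Server, not the project's actual main.go:

// Generic illustration of graceful shutdown (step 5 above); not main.go itself.
package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    srv := &http.Server{Addr: ":8081"} // placeholder; the real wiring is server.NewServer()

    go func() {
        if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
            log.Fatalf("listen: %v", err)
        }
    }()

    // Block until SIGINT or SIGTERM arrives.
    sig := make(chan os.Signal, 1)
    signal.Notify(sig, syscall.SIGINT, syscall.SIGTERM)
    <-sig

    // Give in-flight requests a bounded window before the process exits.
    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    _ = srv.Shutdown(ctx)
}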

Daemon Architecture

  • PID Management: Global PID files stored in ~/.kvs/pids/ for cross-directory access
  • Logging: Daemon logs written to ~/.kvs/logs/{config-name}.log
  • Process Lifecycle: Spawns detached process via exec.Command() with Setsid: true (sketched below)
  • Config Normalization: Supports both node1 and node1.yaml formats
  • Stale PID Detection: Checks process existence via Signal(0) before operations
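
Two of those mechanisms in miniature: spawning a detached child with Setsid and probing a recorded PID with Signal(0). This is a generic sketch of the standard-library calls involved, not the daemon package itself:

// Generic sketch of the standard-library mechanics described above;
// not the project's daemon package. Unix-only (Setsid, signal 0).
import (
    "os"
    "os/exec"
    "syscall"
)

// spawnDetached re-executes the binary in its own session so it survives
// the parent's terminal, redirecting output to a log file.
func spawnDetached(configPath, logPath string) (pid int, err error) {
    logFile, err := os.OpenFile(logPath, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
    if err != nil {
        return 0, err
    }
    cmd := exec.Command(os.Args[0], configPath)
    cmd.Stdout = logFile
    cmd.Stderr = logFile
    cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
    if err := cmd.Start(); err != nil {
        return 0, err
    }
    return cmd.Process.Pid, nil // the PID that would be recorded under ~/.kvs/pids/
}

// processAlive reports whether a recorded PID still refers to a live process,
// using the Signal(0) probe mentioned above.
func processAlive(pid int) bool {
    proc, err := os.FindProcess(pid)
    if err != nil {
        return false
    }
    return proc.Signal(syscall.Signal(0)) == nil
}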

This architecture enables easy feature addition, comprehensive testing, and reliable operation in distributed environments while maintaining simplicity for single-node deployments.