# KVS Development Phase 2: Implementation Specification

## Executive Summary

This document specifies the next development phase for the KVS (Key-Value Store) distributed database. Phase 2 adds authentication, authorization, data management improvements, and basic security features while maintaining backward compatibility with the existing Merkle tree-based replication system.

## 1. Authentication & Authorization System

### 1.1 Core Components

**Users**
- Identified by UUID (generated server-side)
- Nickname stored as SHA3-512 hash
- Can belong to multiple groups
- Storage key: `user:`

**Groups**
- Identified by UUID (generated server-side)
- Group name stored as SHA3-512 hash
- Contains list of member user UUIDs
- Storage key: `group:`

**API Tokens**
- JWT tokens; only their SHA3-512 hash is stored
- 1-hour default expiration (configurable)
- Storage key: `token:`

### 1.2 Permission Model

**POSIX-inspired ACL framework** with 12-bit permissions:
- 4 bits each for Owner/Group/Others
- Operations: Create(C), Delete(D), Write(W), Read(R)
- Default permissions: Owner(1111), Group(0110), Others(0010)
- Stored as integer bitmask in resource metadata

**Resource Metadata Schema**:
```json
{
  "owner_uuid": "string",
  "group_uuid": "string",
  "permissions": 3938,  // 12-bit integer; 0xF62 = defaults Owner(1111) Group(0110) Others(0010)
  "ttl": "24h"
}
```

### 1.3 API Endpoints

**User Management**
```
POST /api/users
Body: {"nickname": "string"}
Returns: {"uuid": "string"}

GET /api/users/{uuid}

PUT /api/users/{uuid}
Body: {"nickname": "string", "groups": ["uuid1", "uuid2"]}

DELETE /api/users/{uuid}
```

**Group Management**
```
POST /api/groups
Body: {"groupname": "string", "members": ["uuid1", "uuid2"]}
Returns: {"uuid": "string"}

GET /api/groups/{uuid}

PUT /api/groups/{uuid}
Body: {"members": ["uuid1", "uuid2"]}

DELETE /api/groups/{uuid}
```

**Token Management**
```
POST /api/tokens
Body: {"user_uuid": "string", "scopes": ["read", "write"]}
Returns: {"token": "jwt-string", "expires_at": "timestamp"}
```

All endpoints require an `Authorization: 
Bearer ` header.

### 1.4 Implementation Requirements

- Use `golang.org/x/crypto/sha3` for all hashing
- Store each token's SHA3-512 hash in BadgerDB with a TTL
- Implement a `CheckPermission(userUUID, resourceKey, operation) bool` function
- Include user/group data in existing Merkle tree replication
- Create a migration script for existing data (add default metadata)

## 2. Database Enhancements

### 2.1 ZSTD Compression

**Configuration**:
```yaml
database:
  compression_enabled: true
  compression_level: 3  # 1-19, balance performance/ratio
```

**Implementation**:
- Use `github.com/klauspost/compress/zstd`
- Compress all JSON values before BadgerDB storage
- Decompress on read operations
- `compression_level` follows the reference zstd 1-19 scale; `klauspost/compress/zstd` exposes only a few encoder presets, so map the configured value with `zstd.EncoderLevelFromZstd`
- Optional: Batch recompression of existing data on startup

### 2.2 TTL (Time-To-Live)

**Features**:
- Per-key TTL support via resource metadata
- Global default TTL configuration (optional)
- Automatic expiration via BadgerDB's native TTL
- TTL applied to main data and revision keys

**API Integration**:
```json
// In PUT/POST requests
{
  "data": {...},
  "ttl": "24h"  // Go duration format
}
```

### 2.3 Revision History

**Storage Pattern**:
- Main data: `data:`
- Revisions: `data::rev:1`, `data::rev:2`, `data::rev:3`
- Metadata: `data::metadata` includes `"revisions": [1,2,3]`

**Rotation Logic**:
- On write: delete rev:3, then rev:2→rev:3, rev:1→rev:2, and finally new→rev:1. Shifting oldest-first ensures no revision is overwritten before it has been moved.
- Store up to 3 revisions per key

**API Endpoints**:
```
GET /api/data/{key}/history
Returns: {"revisions": [{"number": 1, "timestamp": "..."}]}

GET /api/data/{key}/history/{revision}
Returns: StoredValue for specific revision
```

### 2.4 Backup System

**Configuration**:
```yaml
backups:
  enabled: true
  schedule: "0 0 * * *"  # Daily at midnight
  path: "/backups"
  retention: 7  # days
```

**Implementation**:
- Use `github.com/robfig/cron/v3` for scheduling
- Create ZSTD-compressed BadgerDB snapshots
- Filename format: `kvs-backup-YYYY-MM-DD.zstd`
- Automatic cleanup of backups older than the retention period
- Status API: `GET /api/backup/status`

### 2.5 JSON Size Limits
**Configuration**:
```yaml
database:
  max_json_size: 1048576  # 1MB default
```

**Implementation**:
- Check size before compression/storage
- Return HTTP 413 if exceeded
- Apply to main data and revisions
- Log oversized attempts

## 3. Security Features

### 3.1 Rate Limiting

**Configuration**:
```yaml
rate_limit:
  requests: 100
  window: "1m"
```

**Implementation**:
- Per-user rate limiting using BadgerDB counters
- Key pattern: `ratelimit::`
- Return HTTP 429 when limit exceeded
- Counters have TTL equal to window duration

### 3.2 Tamper-Evident Logs

**Log Entry Schema**:
```json
{
  "timestamp": "2025-09-11T17:29:00Z",
  "action": "data_write",  // Configurable actions
  "user_uuid": "string",
  "resource": "string",
  "signature": "sha3-512 hash"  // Hash of all other fields
}
```

**Storage**:
- Key: `log::`
- Compressed with ZSTD
- Hourly Merkle tree roots: `log:merkle:`
- Include in cluster replication

**Configurable Actions**:
```yaml
tamper_logs:
  actions: ["data_write", "user_create", "auth_failure"]
```

## 4. Implementation Phases

### Phase 2.1: Core Authentication
1. Implement user/group storage schema
2. Add SHA3-512 hashing utilities
3. Create basic CRUD APIs for users/groups
4. Implement JWT token generation/validation
5. Add authorization middleware

### Phase 2.2: Data Features
1. Add ZSTD compression to BadgerDB operations
2. Implement TTL support in resource metadata
3. Build revision history system
4. Add JSON size validation

### Phase 2.3: Security & Operations
1. Implement rate limiting middleware
2. Add tamper-evident logging system
3. Build backup scheduling system
4. Create migration scripts for existing data

### Phase 2.4: Integration & Testing
1. Integrate auth with existing replication
2. End-to-end testing of all features
3. Performance benchmarking
4. Documentation updates

## 5. Configuration Example

```yaml
node_id: "node1"
bind_address: "127.0.0.1"
port: 8080
data_dir: "./data"

database:
  compression_enabled: true
  compression_level: 3
  max_json_size: 1048576
  default_ttl: "0"  # No default TTL

backups:
  enabled: true
  schedule: "0 0 * * *"
  path: "/backups"
  retention: 7

rate_limit:
  requests: 100
  window: "1m"

tamper_logs:
  actions: ["data_write", "user_create", "auth_failure"]
```

## 6. Migration Strategy

1. **Backward Compatibility**: All existing APIs remain functional
2. **Optional Features**: New features can be disabled via configuration

## 7. Dependencies

**New Libraries**:
- `golang.org/x/crypto/sha3` - SHA3-512 hashing
- `github.com/klauspost/compress/zstd` - Compression
- `github.com/robfig/cron/v3` - Backup scheduling
- `github.com/golang-jwt/jwt/v4` - JWT tokens (recommended)

**Existing Libraries** (no changes):
- `github.com/dgraph-io/badger/v4`
- `github.com/google/uuid`
- `github.com/gorilla/mux`
- `github.com/sirupsen/logrus`