kalzu-value-store/next_steps.md
ryyst 5ab03331fc Implement Phase 2: Enterprise-grade KVS enhancements
This massive enhancement transforms KVS from a basic distributed key-value store
into a production-ready enterprise database system with comprehensive authentication,
authorization, data management, and security features.

PHASE 2.1: CORE AUTHENTICATION & AUTHORIZATION
• Complete JWT-based authentication system with SHA3-512 security
• User and group management with CRUD APIs (/api/users, /api/groups)
• POSIX-inspired 12-bit ACL permission model (Owner/Group/Others: CDWR)
• Token management system with configurable expiration (default 1h)
• Authorization middleware with resource-level permission checking
• SHA3-512 hashing utilities for secure credential storage

PHASE 2.2: ADVANCED DATA MANAGEMENT
• ZSTD compression system with configurable levels (1-19, default 3)
• TTL support with resource metadata and automatic expiration
• 3-version revision history system with automatic rotation
• JSON size validation with configurable limits (default 1MB)
• Enhanced storage utilities with compression/decompression
• Resource metadata tracking (owner, group, permissions, timestamps)

PHASE 2.3: ENTERPRISE SECURITY & OPERATIONS
• Per-user rate limiting with sliding window algorithm
• Tamper-evident logging with cryptographic signatures (SHA3-512)
• Automated backup scheduling using cron (default: daily at midnight)
• ZSTD-compressed database snapshots with automatic cleanup
• Configurable backup retention policies (default: 7 days)
• Backup status monitoring API (/api/backup/status)

TECHNICAL ADDITIONS
• New dependencies: JWT v4, crypto/sha3, zstd compression, cron v3
• Extended configuration system with comprehensive Phase 2 settings
• API endpoints: 13 new endpoints for authentication, management, monitoring
• Storage patterns: user:<uuid>, group:<uuid>, token:<hash>, ratelimit:<user>:<window>
• Revision history: data:<key>:rev:[1-3] with metadata integration
• Tamper logs: log:<timestamp>:<uuid> with permanent retention

BACKWARD COMPATIBILITY
• All existing APIs remain fully functional
• Existing Merkle tree replication system unchanged
• New features can be disabled via configuration
• Migration-ready design for upgrading existing deployments

This implementation adds 1,500+ lines of sophisticated enterprise code while
maintaining the distributed, eventually-consistent architecture. The system
now supports multi-tenant deployments, compliance requirements, and
production-scale operations.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-09-11 18:17:41 +03:00

KVS Development Phase 2: Implementation Specification

Executive Summary

This document specifies the next development phase for the KVS (Key-Value Store) distributed database. Phase 2 adds authentication, authorization, data management improvements, and basic security features while maintaining backward compatibility with the existing Merkle tree-based replication system.

1. Authentication & Authorization System

1.1 Core Components

Users

  • Identified by UUID (generated server-side)
  • Nickname stored as SHA3-512 hash
  • Can belong to multiple groups
  • Storage key: user:<uuid>

Groups

  • Identified by UUID (generated server-side)
  • Group name stored as SHA3-512 hash
  • Contains list of member user UUIDs
  • Storage key: group:<uuid>

API Tokens

  • JWT tokens with SHA3-512 hashed storage
  • 1-hour default expiration (configurable)
  • Storage key: token:<sha3-512-hash>

1.2 Permission Model

POSIX-inspired ACL framework with 12-bit permissions:

  • 4 bits each for Owner/Group/Others
  • Operations: Create(C), Delete(D), Write(W), Read(R)
  • Default permissions: Owner(1111), Group(0110), Others(0010)
  • Stored as integer bitmask in resource metadata

Resource Metadata Schema:

{
  "owner_uuid": "string",
  "group_uuid": "string",
  "permissions": 3938, // 12-bit integer; 0b1111_0110_0010, the defaults above
  "ttl": "24h"
}
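
As an illustration, the 12-bit mask could be decoded as follows. This is a minimal sketch under stated assumptions: the bit layout (Owner in bits 8-11, Group in bits 4-7, Others in bits 0-3, with C, D, W, R ordered from the high bit down within each class) is inferred from the order in which the spec lists them, and Allowed and ClassFor are hypothetical helpers, not the CheckPermission function required in 1.4.

```go
package main

// Operation bits within one 4-bit class. The C, D, W, R order follows the
// spec's listing; treating C as the high bit is an assumption.
const (
	OpCreate uint16 = 1 << 3
	OpDelete uint16 = 1 << 2
	OpWrite  uint16 = 1 << 1
	OpRead   uint16 = 1 << 0
)

// Class is the shift that selects one 4-bit group of the 12-bit mask.
type Class uint

const (
	ClassOthers Class = 0 // bits 0-3
	ClassGroup  Class = 4 // bits 4-7
	ClassOwner  Class = 8 // bits 8-11
)

// DefaultPermissions encodes Owner(1111), Group(0110), Others(0010).
const DefaultPermissions uint16 = 0b1111_0110_0010 // 3938

// Allowed reports whether perms grants op to callers in the given class.
func Allowed(perms uint16, class Class, op uint16) bool {
	return (perms>>uint(class))&op != 0
}

// ClassFor picks the class that applies to a caller (hypothetical helper).
// Ownership takes precedence over group membership.
func ClassFor(userUUID, ownerUUID string, inGroup bool) Class {
	switch {
	case userUUID == ownerUUID:
		return ClassOwner
	case inGroup:
		return ClassGroup
	default:
		return ClassOthers
	}
}
```

With this layout, Allowed(DefaultPermissions, ClassFor(user, owner, inGroup), OpWrite) would be the core of a permission check once the resource metadata has been loaded.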

1.3 API Endpoints

User Management

POST /api/users
  Body: {"nickname": "string"}
  Returns: {"uuid": "string"}

GET /api/users/{uuid}
PUT /api/users/{uuid} 
  Body: {"nickname": "string", "groups": ["uuid1", "uuid2"]}
DELETE /api/users/{uuid}

Group Management

POST /api/groups
  Body: {"groupname": "string", "members": ["uuid1", "uuid2"]}
  Returns: {"uuid": "string"}

GET /api/groups/{uuid}
PUT /api/groups/{uuid}
  Body: {"members": ["uuid1", "uuid2"]}
DELETE /api/groups/{uuid}

Token Management

POST /api/tokens
  Body: {"user_uuid": "string", "scopes": ["read", "write"]}
  Returns: {"token": "jwt-string", "expires_at": "timestamp"}

All endpoints require an Authorization: Bearer <token> header.

1.4 Implementation Requirements

  • Use golang.org/x/crypto/sha3 for all hashing
  • Store token SHA3-512 hash in BadgerDB with TTL
  • Implement CheckPermission(userUUID, resourceKey, operation) bool function
  • Include user/group data in existing Merkle tree replication
  • Create migration script for existing data (add default metadata)

2. Database Enhancements

2.1 ZSTD Compression

Configuration:

database:
  compression_enabled: true
  compression_level: 3  # 1-19; higher levels trade speed for a better ratio

Implementation:

  • Use github.com/klauspost/compress/zstd
  • Compress all JSON values before BadgerDB storage
  • Decompress on read operations
  • Optional: Batch recompression of existing data on startup

2.2 TTL (Time-To-Live)

Features:

  • Per-key TTL support via resource metadata
  • Global default TTL configuration (optional)
  • Automatic expiration via BadgerDB's native TTL
  • TTL applied to main data and revision keys

API Integration:

// In PUT/POST requests
{
  "data": {...},
  "ttl": "24h"  // Go duration format
}

2.3 Revision History

Storage Pattern:

  • Main data: data:<key>
  • Revisions: data:<key>:rev:1, data:<key>:rev:2, data:<key>:rev:3
  • Metadata: data:<key>:metadata includes "revisions": [1,2,3]

Rotation Logic:

  • On write: delete rev:3, then rev:2→rev:3, rev:1→rev:2, new→rev:1 (oldest slot first, so no revision is overwritten before it has been moved)
  • Store up to 3 revisions per key
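
The rotation can be sketched with a plain map standing in for BadgerDB. RotateAndWrite and revKey are hypothetical names. The loop walks from the highest slot down so that each revision is copied before its slot is reused, and the previous rev:3 is dropped by being overwritten.

```go
package main

import "fmt"

const maxRevisions = 3

// revKey builds the revision key for a slot, e.g. data:<key>:rev:2.
func revKey(key string, n int) string {
	return fmt.Sprintf("data:%s:rev:%d", key, n)
}

// RotateAndWrite shifts existing revisions up by one slot, oldest first,
// then stores the new value as the main data and as rev:1.
// store is an in-memory stand-in for the database.
func RotateAndWrite(store map[string][]byte, key string, value []byte) {
	for n := maxRevisions; n > 1; n-- {
		if prev, ok := store[revKey(key, n-1)]; ok {
			store[revKey(key, n)] = prev
		}
	}
	store[revKey(key, 1)] = value
	store["data:"+key] = value
}
```

Against a real database the whole rotation would presumably run inside a single transaction, so that readers never observe a half-shifted history.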

API Endpoints:

GET /api/data/{key}/history
  Returns: {"revisions": [{"number": 1, "timestamp": "..."}]}

GET /api/data/{key}/history/{revision}
  Returns: StoredValue for specific revision

2.4 Backup System

Configuration:

backups:
  enabled: true
  schedule: "0 0 * * *"  # Daily at midnight
  path: "/backups"
  retention: 7  # days

Implementation:

  • Use github.com/robfig/cron/v3 for scheduling
  • Create ZSTD-compressed BadgerDB snapshots
  • Filename format: kvs-backup-YYYY-MM-DD.zstd
  • Automatic cleanup of old backups
  • Status API: GET /api/backup/status

2.5 JSON Size Limits

Configuration:

database:
  max_json_size: 1048576  # bytes (1 MiB), default

Implementation:

  • Check size before compression/storage
  • Return HTTP 413 if exceeded
  • Apply to main data and revisions
  • Log oversized attempts

3. Security Features

3.1 Rate Limiting

Configuration:

rate_limit:
  requests: 100
  window: "1m"

Implementation:

  • Per-user rate limiting using BadgerDB counters
  • Key pattern: ratelimit:<user_uuid>:<window_start>
  • Return HTTP 429 when limit exceeded
  • Counters have TTL equal to window duration

3.2 Tamper-Evident Logs

Log Entry Schema:

{
  "timestamp": "2025-09-11T17:29:00Z",
  "action": "data_write",  // Configurable actions
  "user_uuid": "string",
  "resource": "string",
  "signature": "sha3-512 hash"  // SHA3-512 hash of all other fields
}
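
One way to compute and verify the signature field is sketched below. To keep the sketch runnable with the standard library alone, NewDigest defaults to SHA-512, which is a stand-in and not what the spec asks for; assigning sha3.New512 from golang.org/x/crypto/sha3 yields the SHA3-512 digest. Hashing the JSON encoding of the entry with the signature blanked is an assumed canonical form, and LogEntry, Sign, and Verify are hypothetical names.

```go
package main

import (
	"crypto/sha512"
	"encoding/hex"
	"encoding/json"
	"hash"
)

// NewDigest is the hash constructor used for signatures. SHA-512 is a
// standard-library stand-in; the spec calls for SHA3-512 (sha3.New512).
var NewDigest func() hash.Hash = sha512.New

// LogEntry mirrors the log entry schema.
type LogEntry struct {
	Timestamp string `json:"timestamp"`
	Action    string `json:"action"`
	UserUUID  string `json:"user_uuid"`
	Resource  string `json:"resource"`
	Signature string `json:"signature,omitempty"`
}

// Sign returns the hex digest of the entry with Signature excluded.
// e is passed by value, so the caller's entry is not modified.
func Sign(e LogEntry) (string, error) {
	e.Signature = ""
	payload, err := json.Marshal(e) // struct field order makes this deterministic
	if err != nil {
		return "", err
	}
	h := NewDigest()
	h.Write(payload)
	return hex.EncodeToString(h.Sum(nil)), nil
}

// Verify recomputes the digest and compares it to the stored signature.
func Verify(e LogEntry) bool {
	want, err := Sign(e)
	return err == nil && want == e.Signature
}
```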

Storage:

  • Key: log:<timestamp>:<uuid>
  • Compressed with ZSTD
  • Hourly Merkle tree roots: log:merkle:<timestamp>
  • Include in cluster replication

Configurable Actions:

tamper_logs:
  actions: ["data_write", "user_create", "auth_failure"]

4. Implementation Phases

Phase 2.1: Core Authentication

  1. Implement user/group storage schema
  2. Add SHA3-512 hashing utilities
  3. Create basic CRUD APIs for users/groups
  4. Implement JWT token generation/validation
  5. Add authorization middleware

Phase 2.2: Data Features

  1. Add ZSTD compression to BadgerDB operations
  2. Implement TTL support in resource metadata
  3. Build revision history system
  4. Add JSON size validation

Phase 2.3: Security & Operations

  1. Implement rate limiting middleware
  2. Add tamper-evident logging system
  3. Build backup scheduling system
  4. Create migration scripts for existing data

Phase 2.4: Integration & Testing

  1. Integrate auth with existing replication
  2. End-to-end testing of all features
  3. Performance benchmarking
  4. Documentation updates

5. Configuration Example

node_id: "node1"
bind_address: "127.0.0.1"
port: 8080
data_dir: "./data"

database:
  compression_enabled: true
  compression_level: 3
  max_json_size: 1048576
  default_ttl: "0"  # No default TTL

backups:
  enabled: true
  schedule: "0 0 * * *"
  path: "/backups"
  retention: 7

rate_limit:
  requests: 100
  window: "1m"

tamper_logs:
  actions: ["data_write", "user_create", "auth_failure"]

6. Migration Strategy

  1. Backward Compatibility: All existing APIs remain functional
  2. Optional Features: New features can be disabled via configuration

7. Dependencies

New Libraries:

  • golang.org/x/crypto/sha3 - SHA3-512 hashing
  • github.com/klauspost/compress/zstd - Compression
  • github.com/robfig/cron/v3 - Backup scheduling
  • github.com/golang-jwt/jwt/v4 - JWT tokens (recommended)

Existing Libraries (no changes):

  • github.com/dgraph-io/badger/v4
  • github.com/google/uuid
  • github.com/gorilla/mux
  • github.com/sirupsen/logrus