diff --git a/--help b/--help
deleted file mode 100644
index 7ed33ae..0000000
--- a/--help
+++ /dev/null
@@ -1,33 +0,0 @@
-node_id: GALACTICA
-bind_address: 127.0.0.1
-port: 8080
-data_dir: ./data
-seed_nodes: []
-read_only: false
-log_level: info
-gossip_interval_min: 60
-gossip_interval_max: 120
-sync_interval: 300
-catchup_interval: 120
-bootstrap_max_age_hours: 720
-throttle_delay_ms: 100
-fetch_delay_ms: 50
-compression_enabled: true
-compression_level: 3
-default_ttl: "0"
-max_json_size: 1048576
-rate_limit_requests: 100
-rate_limit_window: 1m
-tamper_log_actions:
-  - data_write
-  - user_create
-  - auth_failure
-backup_enabled: true
-backup_schedule: 0 0 * * *
-backup_path: ./backups
-backup_retention: 7
-auth_enabled: true
-tamper_logging_enabled: true
-clustering_enabled: true
-rate_limiting_enabled: true
-revision_history_enabled: true
diff --git a/design_v2.md b/design_v2.md
deleted file mode 100644
index 325dae6..0000000
--- a/design_v2.md
+++ /dev/null
@@ -1,323 +0,0 @@
-# Gossip in Go: a lazily syncing K/V database
-
-## Software Design Document: Clustered Key-Value Store
-
-### 1. Introduction
-
-#### 1.1 Goals
-This document outlines the design for a minimalistic, clustered key-value database system written in Go. The primary goals are:
-* **Eventual Consistency:** Prioritize availability and partition tolerance over strong consistency.
-* **Local-First Truth:** Local operations should be fast, with replication happening in the background.
-* **Gossip-Style Membership:** A decentralized mechanism for nodes to discover and track each other.
-* **Hierarchical Keys:** Support for structured keys (e.g., `/home/room/closet/socks`).
-* **Minimalistic Footprint:** Efficient resource usage on servers.
-* **Simple Configuration & Operation:** Easy to deploy and manage.
-* **Read-Only Mode:** Ability for nodes to restrict external writes.
-* **Gradual Bootstrapping:** New nodes integrate smoothly without overwhelming the cluster.
-* **Sophisticated Conflict Resolution:** Handle timestamp collisions using a majority vote, with the oldest node as the tie-breaker.
-
-#### 1.2 Non-Goals
-* Strong (linearizable/serializable) consistency.
-* Complex querying or indexing beyond key-based lookups and timestamp-filtered UUID lists.
-* Transaction support across multiple keys.
-
-### 2. Architecture Overview
-
-The system will consist of independent Go services (nodes) that communicate via HTTP/REST. Each node will embed a BadgerDB instance for local data storage and manage its own membership list through a gossip protocol. External clients interact with any available node, which then participates in the cluster's eventual consistency model.
-
-**Key Architectural Principles:**
-* **Decentralized:** No central coordinator or leader.
-* **Peer-to-Peer:** Nodes communicate directly with each other for replication and membership.
-* **API-Driven:** All interactions, both external (clients) and internal (replication), occur over a RESTful HTTP API.
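As a rough orientation, the component boxes in the diagram below might map onto a single top-level struct along these lines. This is a hedged sketch: the type and field names are assumptions, not part of the design; only BadgerDB and `net/http` are real APIs here.

```go
package kvs

import (
	"net/http"

	badger "github.com/dgraph-io/badger/v4"
)

// Node bundles the four components shown in the diagram below.
// GossipManager and Replicator stand in for the mechanisms described
// in sections 6.1 and 6.2.
type Node struct {
	ID     string
	API    *http.Server   // HTTP server: external clients and peer traffic
	Gossip *GossipManager // membership discovery and failure detection
	Repl   *Replicator    // periodic and catch-up data synchronization
	Store  *badger.DB     // embedded local key-value storage
}

type GossipManager struct{} // member list and gossip timers (section 6.1)
type Replicator struct{}    // sync cycles and peer selection (section 6.2)
```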
- -``` -+----------------+ +----------------+ +----------------+ -| Node A | | Node B | | Node C | -| (Go Service) | | (Go Service) | | (Go Service) | -| | | | | | -| +------------+ | | +------------+ | | +------------+ | -| | HTTP Server| | <---- | | HTTP Server| | <---- | | HTTP Server| | -| | (API) | | ---> | | (API) | | ---> | | (API) | | -| +------------+ | | +------------+ | | +------------+ | -| | | | | | | | | -| +------------+ | | +------------+ | | +------------+ | -| | Gossip | | <---> | | Gossip | | <---> | | Gossip | | -| | Manager | | | | Manager | | | | Manager | | -| +------------+ | | +------------+ | | +------------+ | -| | | | | | | | | -| +------------+ | | +------------+ | | +------------+ | -| | Replication| | <---> | | Replication| | <---> | | Replication| | -| | Logic | | | | Logic | | | | Logic | | -| +------------+ | | +------------+ | | +------------+ | -| | | | | | | | | -| +------------+ | | +------------+ | | +------------+ | -| | BadgerDB | | | | BadgerDB | | | | BadgerDB | | -| | (Local KV) | | | | (Local KV) | | | | (Local KV) | | -| +------------+ | | +------------+ | | +------------+ | -+----------------+ +----------------+ +----------------+ - ^ - | - +----- External Clients (Interact with any Node's API) -``` - -### 3. Data Model - -#### 3.1 Logical Data Structure -Data is logically stored as a key-value pair, where the key is a hierarchical path and the value is a JSON object. Each pair also carries metadata for consistency and conflict resolution. - -* **Logical Key:** `string` (e.g., `/home/room/closet/socks`) -* **Logical Value:** `JSON object` (e.g., `{"count":7,"colors":["blue","red","black"]}`) - -#### 3.2 Internal Storage Structure (BadgerDB) -BadgerDB is a flat key-value store. To accommodate hierarchical keys and metadata, the following mapping will be used: - -* **BadgerDB Key:** The full logical key path, with the leading `/kv/` prefix removed. Path segments will be separated by `/`. **No leading `/` will be stored in the BadgerDB key.** - * Example: For logical key `/kv/home/room/closet/socks`, the BadgerDB key will be `home/room/closet/socks`. - -* **BadgerDB Value:** A marshaled JSON object containing the `uuid`, `timestamp`, and the actual `data` JSON object. This allows for consistent versioning and conflict resolution. - - ```json - // Example BadgerDB Value (marshaled JSON string) - { - "uuid": "a1b2c3d4-e5f6-7890-1234-567890abcdef", - "timestamp": 1672531200000, // Unix timestamp in milliseconds - "data": { - "count": 7, - "colors": ["blue", "red", "black"] - } - } - ``` * **`uuid` (string):** A UUIDv4, unique identifier for this specific version of the data. - * **`timestamp` (int64):** Unix timestamp representing the time of the last modification. **This will be in milliseconds since epoch**, providing higher precision and reducing collision risk. This is the primary mechanism for conflict resolution ("newest data wins"). - * **`data` (JSON object):** The actual user-provided JSON payload. - -### 4. API Endpoints - -All endpoints will communicate over HTTP/1.1 and utilize JSON for request/response bodies. - -#### 4.1 `/kv/` Endpoints (Data Operations - External/Internal) - -These endpoints are for direct key-value manipulation by external clients and are also used internally by nodes when fetching full data during replication. - -* **`GET /kv/{path}`** - * **Description:** Retrieves the JSON object associated with the given hierarchical key path. - * **Request:** No body. 
- * **Responses:** - * `200 OK`: `Content-Type: application/json` with the stored JSON object. - * `404 Not Found`: If the key does not exist. - * `500 Internal Server Error`: For server-side issues (e.g., BadgerDB error). - * **Example:** `GET /kv/home/room/closet/socks` -> `{"count":7,"colors":["blue","red","black"]}` - -* **`PUT /kv/{path}`** - * **Description:** Creates or updates a JSON object at the given path. This operation will internally generate a new UUIDv4 and assign the current Unix timestamp (milliseconds) to the stored value. - * **Request:** - * `Content-Type: application/json` - * Body: The JSON object to store. - * **Responses:** - * `200 OK` (Update) or `201 Created` (New): On success, returns `{"uuid": "new-uuid", "timestamp": new-timestamp_ms}`. - * `400 Bad Request`: If the request body is not valid JSON. - * `403 Forbidden`: If the node is in "read-only" mode and the request's origin is not a recognized cluster member (checked via IP/hostname). - * `500 Internal Server Error`: For server-side issues. - * **Example:** `PUT /kv/settings/theme` with body `{"color":"dark","font_size":14}` -> `{"uuid": "...", "timestamp": ...}` - -* **`DELETE /kv/{path}`** - * **Description:** Deletes the key-value pair at the given path. - * **Request:** No body. - * **Responses:** - * `204 No Content`: On successful deletion. - * `404 Not Found`: If the key does not exist. - * `403 Forbidden`: If the node is in "read-only" mode and the request is not from a recognized cluster member. - * `500 Internal Server Error`: For server-side issues. - -#### 4.2 `/members/` Endpoints (Membership & Internal Replication) - -These endpoints are primarily for internal communication between cluster nodes, managing membership and facilitating data synchronization. - -* **`GET /members/`** - * **Description:** Returns a list of known active members in the cluster. This list is maintained locally by each node based on the gossip protocol. - * **Request:** No body. - * **Responses:** - * `200 OK`: `Content-Type: application/json` with a JSON array of member details. - ```json - [ - {"id": "node-alpha", "address": "192.168.1.10:8080", "last_seen": 1672531200000, "joined_timestamp": 1672530000000}, - {"id": "node-beta", "address": "192.168.1.11:8080", "last_seen": 1672531205000, "joined_timestamp": 1672530100000} - ] - ``` - * `id` (string): Unique identifier for the node. - * `address` (string): `host:port` of the node's API endpoint. - * `last_seen` (int64): Unix timestamp (milliseconds) of when this node was last successfully contacted or heard from. - * `joined_timestamp` (int64): Unix timestamp (milliseconds) of when this node first joined the cluster. This is crucial for tie-breaking conflicts. - * `500 Internal Server Error`: For server-side issues. - -* **`POST /members/join`** - * **Description:** Used by a new node to announce its presence and attempt to join the cluster. Existing nodes use this to update their member list and respond with their current view of the cluster. - * **Request:** - * `Content-Type: application/json` - * Body: - ```json - {"id": "node-gamma", "address": "192.168.1.12:8080", "joined_timestamp": 1672532000000} - ``` - * `joined_timestamp` will be set by the joining node (its startup time). - * **Responses:** - * `200 OK`: Acknowledgment, returning the current list of known members to the joining node (same format as `GET /members/`). - * `400 Bad Request`: If the request body is malformed. - * `500 Internal Server Error`: For server-side issues. 
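To make the join handshake concrete, here is a hedged client-side sketch of the call a new node might make. The struct and function names are illustrative assumptions; only the endpoint, the JSON fields, and the status codes come from the design above.

```go
package kvs

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// Member mirrors the JSON objects returned by GET /members/.
type Member struct {
	ID              string `json:"id"`
	Address         string `json:"address"`
	LastSeen        int64  `json:"last_seen"`        // Unix ms
	JoinedTimestamp int64  `json:"joined_timestamp"` // Unix ms
}

// join announces this node to a seed node via POST /members/join and
// returns the seed's current view of the member list.
func join(seed, id, addr string) ([]Member, error) {
	body, _ := json.Marshal(map[string]any{
		"id":               id,
		"address":          addr,
		"joined_timestamp": time.Now().UnixMilli(), // the node's startup time
	})
	resp, err := http.Post("http://"+seed+"/members/join", "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("join rejected: %s", resp.Status)
	}
	var members []Member
	return members, json.NewDecoder(resp.Body).Decode(&members)
}
```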
- -* **`DELETE /members/leave` (Optional, for graceful shutdown)** - * **Description:** A member can proactively announce its departure from the cluster. This allows other nodes to quickly mark it as inactive. - * **Request:** - * `Content-Type: application/json` - * Body: `{"id": "node-gamma"}` - * **Responses:** - * `204 No Content`: On successful processing. - * `400 Bad Request`: If the request body is malformed. - * `500 Internal Server Error`: For server-side issues. - -* **`POST /members/pairs_by_time` (Internal/Replication Endpoint)** - * **Description:** Used by other cluster members to request a list of key paths, their UUIDs, and their timestamps within a specified time range, optionally filtered by a key prefix. This is critical for both gradual bootstrapping and the regular 5-minute synchronization. - * **Request:** - * `Content-Type: application/json` - * Body: - ```json - { - "start_timestamp": 1672531200000, // Unix milliseconds (inclusive) - "end_timestamp": 1672617600000, // Unix milliseconds (exclusive), or 0 for "up to now" - "limit": 15, // Max number of pairs to return - "prefix": "home/room/" // Optional: filter by BadgerDB key prefix - } - ``` - * `start_timestamp`: Earliest timestamp for data to be included. - * `end_timestamp`: Latest timestamp (exclusive). If `0` or omitted, it implies "up to the current time". - * `limit`: **Fixed at 15** for this design, to control batch size during sync. - * `prefix`: Optional, to filter keys by a common BadgerDB key prefix. - * **Responses:** - * `200 OK`: `Content-Type: application/json` with a JSON array of objects: - ```json - [ - {"path": "home/room/closet/socks", "uuid": "...", "timestamp": 1672531200000}, - {"path": "users/john/profile", "uuid": "...", "timestamp": 1672531205000} - ] - ``` - * `204 No Content`: If no data matches the criteria. - * `400 Bad Request`: If request body is malformed or timestamps are invalid. - * `500 Internal Server Error`: For server-side issues. - -### 5. BadgerDB Integration - -BadgerDB will be used as the embedded, local, single-node key-value store. - -* **Key Storage:** As described in section 3.2, the HTTP path (without `/kv/` prefix and no leading `/`) will directly map to the BadgerDB key. -* **Value Storage:** Values will be marshaled JSON objects (`uuid`, `timestamp`, `data`). -* **Timestamp Indexing (for `pairs_by_time`):** To efficiently query by timestamp, a manual secondary index will be maintained. Each `PUT` operation will write two BadgerDB entries: - 1. The primary data entry: `{badger_key}` -> `{uuid, timestamp, data}`. - 2. A secondary timestamp index entry: `_ts:{timestamp_ms}:{badger_key}` -> `{uuid}`. - * The `_ts` prefix ensures these index keys are grouped and don't conflict with data keys. - * The timestamp (milliseconds) ensures lexicographical sorting by time. - * The `badger_key` in the index key allows for uniqueness and points back to the main data. - * The value can simply be the `uuid` or even an empty string if only the key is needed. Storing the `uuid` here is useful for direct lookups. -* **`DELETE` Operations:** A `DELETE /kv/{path}` will remove both the primary data entry and its corresponding secondary index entry from BadgerDB. - -### 6. Clustering and Consistency - -#### 6.1 Membership Management (Gossip Protocol) -* Each node maintains a local list of known cluster members (Node ID, Address, Last Seen Timestamp, Joined Timestamp). -* Every node will randomly pick a time **between 1-2 minutes** after its last check-up to initiate a gossip round. 
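The gossip exchange itself is described in the bullet that follows; as a hedged sketch, the jittered timer just mentioned might look like this, driven by the `gossip_interval_min`/`gossip_interval_max` settings (60 and 120 seconds in the configuration file above). The function and parameter names are assumptions.

```go
package kvs

import (
	"math/rand"
	"time"
)

// gossipLoop sleeps a random interval between min and max after each
// round so that rounds do not synchronize across nodes (assumes max > min).
// With the defaults above, each node gossips once every 1 to 2 minutes.
func gossipLoop(min, max time.Duration, round func()) {
	for {
		jitter := min + time.Duration(rand.Int63n(int64(max-min)))
		time.Sleep(jitter)
		round() // one gossip exchange, as described in the next bullet
	}
}
```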
-* In a gossip round, the node randomly selects a subset of its healthy known members (e.g., 1-3 nodes) and performs a "gossip exchange":
-  1. It sends its current local member list to the selected peers.
-  2. Peers merge the received list with their own, updating `last_seen` timestamps for existing members and adding new ones.
-  3. If a node fails to respond to multiple gossip attempts, it is eventually marked as "suspected down", and then "dead" after a configurable timeout.
-
-#### 6.2 Data Replication (Periodic Syncs)
-* The system uses two types of data synchronization:
-  1. **Regular 5-Minute Sync:** Catching up on recent changes.
-  2. **Catch-Up Sync (2-Minute Cycles):** For nodes that detect they are significantly behind.
-
-* **Regular 5-Minute Sync:**
-  * Every **5 minutes**, each node initiates a data synchronization cycle.
-  * It selects a random healthy peer.
-  * It sends `POST /members/pairs_by_time` to the peer, requesting **the 15 latest UUIDs** (by setting `limit: 15` and `end_timestamp: current_time_ms`, with `start_timestamp: 0` or a very old value so that enough items are considered).
-  * The remote node responds with its 15 latest (path, uuid, timestamp) pairs.
-  * The local node compares these with its own latest 15. If it finds data it doesn't have, or data it holds only in an older version, it fetches the full data via `GET /kv/{path}` and updates its local store.
-  * If the local node detects that it is significantly behind (e.g., many of the remote node's latest 15 entries are missing locally or are much newer than the local copies, indicating a large gap), it triggers the **Catch-Up Sync**.
-
-* **Catch-Up Sync (2-Minute Cycles):**
-  * This mode is activated when a node determines it is behind its peers (e.g., during the 5-minute sync or bootstrapping).
-  * It runs every **2 minutes**, offset so that it does not simply coincide with the 5-minute sync.
-  * The node identifies the `oldest_known_timestamp_among_peers_latest_15` from its last regular sync.
-  * It then sends `POST /members/pairs_by_time` to a random healthy peer, requesting **15 UUIDs older than that timestamp** (e.g., `end_timestamp: oldest_known_timestamp_ms`, `limit: 15`, `start_timestamp: 0` or further back).
-  * It iterates backwards in time in 2-minute cycles, progressively asking for older sets of 15 UUIDs, until it has caught up to a reasonable historical depth (e.g., the configured `BOOTSTRAP_MAX_AGE_HOURS`).
-  * **History Depth:** The system aims to keep **at least 3 revisions per path** for conflict resolution and, eventually, versioning. `BOOTSTRAP_MAX_AGE_HOURS` (720 hours, i.e., 30 days, by default) governs how far back in time a node will actively fetch during a full sync.
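A hedged sketch of one regular sync cycle as just described. The `pairsByTime`, `localPair`, and `fetchFull` helpers are assumptions standing in for the HTTP calls and BadgerDB lookups; the comparison rule anticipates section 6.3 below.

```go
package kvs

import "time"

// Pair is one {path, uuid, timestamp} entry from POST /members/pairs_by_time.
type Pair struct {
	Path      string `json:"path"`
	UUID      string `json:"uuid"`
	Timestamp int64  `json:"timestamp"` // Unix ms
}

// syncOnce runs one regular 5-minute cycle against a chosen healthy peer.
func syncOnce(peer string) error {
	// Ask for the peer's 15 latest entries: start 0, end now, limit 15, no prefix.
	remote, err := pairsByTime(peer, 0, time.Now().UnixMilli(), 15, "")
	if err != nil {
		return err
	}
	for _, rp := range remote {
		lp, found := localPair(rp.Path)
		// Fetch when the key is missing locally or the remote copy is newer
		// ("newest data wins"; timestamp ties are resolved in section 6.3).
		if !found || lp.Timestamp < rp.Timestamp {
			if err := fetchFull(peer, rp.Path); err != nil {
				return err
			}
		}
	}
	return nil
}

// Assumed helpers, not part of the design document:
func pairsByTime(peer string, start, end int64, limit int, prefix string) ([]Pair, error) {
	return nil, nil // POST /members/pairs_by_time
}
func localPair(path string) (Pair, bool) { return Pair{}, false } // BadgerDB lookup
func fetchFull(peer, path string) error  { return nil }           // GET /kv/{path} + local write
```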
-
-#### 6.3 Conflict Resolution
-When two nodes hold different versions of the same key (the same BadgerDB key), the following conflict resolution logic is applied:
-
-1. **Timestamp Wins:** The version with the **most recent `timestamp` (Unix milliseconds)** is considered correct.
-2. **Timestamp Collision (Tie-Breaker):** If two conflicting versions carry the **exact same `timestamp`**:
-  * **Majority Vote:** The system queries a quorum of healthy peers (`GET /kv/{path}` or an internal check for UUID/timestamp) to see which UUID/timestamp pair the majority holds. The version held by the majority wins.
-  * **Oldest Node Priority (Tie-Breaker for the Majority Vote):** If the vote itself ties (possible with an even number of responding nodes; e.g., two nodes hold version A and two hold version B), the version held by the node with the **oldest `joined_timestamp`** (i.e., the oldest active member of the cluster) takes precedence. This provides a deterministic tie-breaker.
-  * *Implementation Note:* For the majority vote, a node may need to request the `{"uuid", "timestamp"}` pairs for a specific `path` from multiple peers. This implies an internal query mechanism, or aggregating responses from `pairs_by_time` for the specific key.
-
-### 7. Bootstrapping New Nodes (Gradual Full Sync)
-
-This process is initiated when a new node starts up with no existing data or member list.
-
-1. **Seed Node Configuration:** The new node must be configured with a list of initial `seed_nodes` (e.g., `["host1:port", "host2:port"]`).
-2. **Join Request:** The new node attempts a `POST /members/join` to one of its configured seed nodes, providing its own `id`, `address`, and `joined_timestamp` (its startup time).
-3. **Member List Discovery:** Upon a successful join, the seed node responds with its current list of known cluster members. The new node populates its local member list from it.
-4. **Gradual Data Synchronization Loop (Catch-Up Mode):**
-  * The new node sets `current_end_timestamp = current_time_ms`.
-  * It defines a `sync_batch_size` (e.g., 15 UUIDs per request, matching the `pairs_by_time` `limit`).
-  * It also defines a `throttle_delay` (e.g., 100 ms between `pairs_by_time` requests to different peers) and a `fetch_delay` (e.g., 50 ms between individual `GET /kv/{path}` requests for full data).
-  * **Loop backwards in time:**
-    * The node determines the `oldest_timestamp_fetched` from its *last* batch of `sync_batch_size` items. Initially, this is `current_time_ms`.
-    * It randomly picks a healthy peer from its member list.
-    * It sends `POST /members/pairs_by_time` to the peer with `end_timestamp: oldest_timestamp_fetched`, `limit: sync_batch_size`, and `start_timestamp: 0`. This asks for the 15 items *older than* the oldest one just processed.
-    * It processes the received `{"path", "uuid", "timestamp"}` pairs:
-      * For each remote pair, it fetches its local version from BadgerDB.
-      * **Conflict Resolution:** Apply the logic from section 6.3. If the local data is missing or older, issue a `GET /kv/{path}` to fetch the full data and store it.
-    * **Throttling:**
-      * Wait `throttle_delay` after each `pairs_by_time` request.
-      * Wait `fetch_delay` after each individual `GET /kv/{path}` request for full data.
-    * **Termination:** The loop continues until `oldest_timestamp_fetched` falls below the configured `BOOTSTRAP_MAX_AGE_HOURS` horizon (720 hours, i.e., 30 days, by default). The node may also terminate if multiple consecutive `pairs_by_time` queries return no new (older) data.
-5. **Full Participation:** Once the gradual sync is complete, the node fully participates in the regular 5-minute replication cycles and accepts external client writes (if not in read-only mode). During the sync, the node operates in a `syncing` mode, rejecting external client writes with `503 Service Unavailable`.
-
-### 8. Operational Modes
-
-* **Normal Mode:** Full read/write capabilities; participates in all replication and gossip activities.
-* **Read-Only Mode:**
-  * The node will reject `PUT` and `DELETE` requests from **external clients** with a `403 Forbidden` status.
- * It will **still accept** `PUT` and `DELETE` operations that originate from **recognized cluster members** during replication, allowing it to remain eventually consistent. - * `GET` requests are always allowed. - * This mode is primarily for reducing write load or protecting data on specific nodes. -* **Syncing Mode (Internal during Bootstrap):** - * While a new node is undergoing its initial gradual sync, it operates in this internal mode. - * External `PUT`/`DELETE` requests will be **rejected with `503 Service Unavailable`**. - * Internal replication from other members is fully active. - -### 9. Logging - -A structured logging library (e.g., `zap` or `logrus`) will be used. - -* **Log Levels:** Support for `DEBUG`, `INFO`, `WARN`, `ERROR`, `FATAL`. Configurable. -* **Log Format:** JSON for easy parsing by log aggregators. -* **Key Events to Log:** - * **Startup/Shutdown:** Server start/stop, configuration loaded. - * **API Requests:** Incoming HTTP request details (method, path, client IP, status code, duration). - * **BadgerDB Operations:** Errors during put/get/delete, database open/close, secondary index operations. - * **Membership:** Node joined/left, gossip rounds initiated/received, member status changes (up, suspected, down), tie-breaker decisions. - * **Replication:** Sync cycle start/end, type of sync (regular/catch-up), number of keys compared, number of keys fetched, conflict resolutions (including details of timestamp collision resolution). - * **Errors:** Data serialization/deserialization, network errors, unhandled exceptions. - * **Operational Mode Changes:** Entering/exiting read-only mode, syncing mode. - -### 10. Future Work (Rough Order of Priority) - -These items are considered out of scope for the initial design but are planned for future versions. - -* **Authentication/Authorization (Before First Release):** Implement robust authentication for API endpoints (e.g., API keys, mTLS) and potentially basic authorization for access to `kv` paths. -* **Client Libraries/Functions (Bash, Python, Go):** Develop official client libraries or helper functions to simplify interaction with the API for common programming environments. -* **Data Compression (gzip):** Implement Gzip compression for data values stored in BadgerDB to reduce storage footprint and potentially improve I/O performance. -* **Data Revisions & Simple Backups:** - * Hold **at least 3 revisions per path**. This would involve a mechanism to store previous versions of data when a `PUT` occurs, potentially using a separate BadgerDB key namespace (e.g., `_rev:{badger_key}:{timestamp_of_revision}`). - * The current `GET /kv/{path}` would continue to return only the latest. A new API might be introduced to fetch specific historical revisions. - * Simple backup strategies could leverage these revisions or BadgerDB's native snapshot capabilities. -* **Monitoring & Metrics (Grafana Support in v3):** Integrate with a metrics system like Prometheus, exposing key performance indicators (e.g., request rates, error rates, replication lag, BadgerDB stats) for visualization in dashboards like Grafana. diff --git a/next_steps.md b/next_steps.md deleted file mode 100644 index 1735324..0000000 --- a/next_steps.md +++ /dev/null @@ -1,291 +0,0 @@ -# KVS Development Phase 2: Implementation Specification - -## Executive Summary - -This document specifies the next development phase for the KVS (Key-Value Store) distributed database. 
Phase 2 adds authentication, authorization, data management improvements, and basic security features while maintaining backward compatibility with the existing Merkle tree-based replication system.
-
-## 1. Authentication & Authorization System
-
-### 1.1 Core Components
-
-**Users**
-- Identified by UUID (generated server-side)
-- Nickname stored as SHA3-512 hash
-- Can belong to multiple groups
-- Storage key: `user:<uuid>`
-
-**Groups**
-- Identified by UUID (generated server-side)
-- Group name stored as SHA3-512 hash
-- Contains list of member user UUIDs
-- Storage key: `group:<uuid>`
-
-**API Tokens**
-- JWT tokens with SHA3-512 hashed storage
-- 1-hour default expiration (configurable)
-- Storage key: `token:<token-hash>`
-
-### 1.2 Permission Model
-
-**POSIX-inspired ACL framework** with 12-bit permissions:
-- 4 bits each for Owner/Group/Others
-- Operations: Create (C), Delete (D), Write (W), Read (R)
-- Default permissions: Owner (1111), Group (0110), Others (0010)
-- Stored as an integer bitmask in resource metadata
-
-**Resource Metadata Schema**:
-```json
-{
-  "owner_uuid": "string",
-  "group_uuid": "string",
-  "permissions": 3826, // 12-bit integer
-  "ttl": "24h"
-}
-```
-
-### 1.3 API Endpoints
-
-**User Management**
-```
-POST /api/users
-  Body: {"nickname": "string"}
-  Returns: {"uuid": "string"}
-
-GET /api/users/{uuid}
-PUT /api/users/{uuid}
-  Body: {"nickname": "string", "groups": ["uuid1", "uuid2"]}
-DELETE /api/users/{uuid}
-```
-
-**Group Management**
-```
-POST /api/groups
-  Body: {"groupname": "string", "members": ["uuid1", "uuid2"]}
-  Returns: {"uuid": "string"}
-
-GET /api/groups/{uuid}
-PUT /api/groups/{uuid}
-  Body: {"members": ["uuid1", "uuid2"]}
-DELETE /api/groups/{uuid}
-```
-
-**Token Management**
-```
-POST /api/tokens
-  Body: {"user_uuid": "string", "scopes": ["read", "write"]}
-  Returns: {"token": "jwt-string", "expires_at": "timestamp"}
-```
-
-All endpoints require an `Authorization: Bearer <token>` header.
-
-### 1.4 Implementation Requirements
-
-- Use `golang.org/x/crypto/sha3` for all hashing
-- Store the token's SHA3-512 hash in BadgerDB with a TTL
-- Implement a `CheckPermission(userUUID, resourceKey, operation) bool` function
-- Include user/group data in the existing Merkle tree replication
-- Create a migration script for existing data (add default metadata)
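A hedged sketch of how the 12-bit check behind `CheckPermission` might be implemented, using the bit layout from section 1.2. The metadata-lookup and group-membership helpers are assumptions, not existing functions.

```go
package kvs

// Operation bits within one 4-bit class, in the order given in
// section 1.2: Create, Delete, Write, Read.
const (
	permCreate = 0b1000
	permDelete = 0b0100
	permWrite  = 0b0010
	permRead   = 0b0001
)

type metadata struct {
	OwnerUUID, GroupUUID string
	Permissions          int // 12-bit mask, e.g. 0b1111_0110_0010 (the stated defaults)
}

// CheckPermission applies the POSIX-style owner/group/others mask.
func CheckPermission(userUUID, resourceKey string, op int) bool {
	md, ok := loadMetadata(resourceKey)
	if !ok {
		return false
	}
	var shift uint
	switch {
	case userUUID == md.OwnerUUID:
		shift = 8 // owner nibble: bits 11..8
	case userInGroup(userUUID, md.GroupUUID):
		shift = 4 // group nibble: bits 7..4
	default:
		shift = 0 // others nibble: bits 3..0
	}
	return (md.Permissions>>shift)&op == op
}

// Assumed helpers standing in for BadgerDB lookups:
func loadMetadata(key string) (metadata, bool) { return metadata{}, false }
func userInGroup(user, group string) bool      { return false }
```

For example, `CheckPermission(u, k, permWrite)` against the default mask succeeds for the owner and the group but fails for others, which hold only the Read bit.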
-## 2. Database Enhancements
-
-### 2.1 ZSTD Compression
-
-**Configuration**:
-```yaml
-database:
-  compression_enabled: true
-  compression_level: 3 # 1-19, balancing speed against ratio
-```
-
-**Implementation**:
-- Use `github.com/klauspost/compress/zstd`
-- Compress all JSON values before BadgerDB storage
-- Decompress on read operations
-- Optional: batch recompression of existing data on startup
-
-### 2.2 TTL (Time-To-Live)
-
-**Features**:
-- Per-key TTL support via resource metadata
-- Global default TTL configuration (optional)
-- Automatic expiration via BadgerDB's native TTL
-- TTL applied to main data and revision keys
-
-**API Integration**:
-```json
-// In PUT/POST requests
-{
-  "data": {...},
-  "ttl": "24h" // Go duration format
-}
-```
-
-### 2.3 Revision History
-
-**Storage Pattern**:
-- Main data: `data:<key>`
-- Revisions: `data:<key>:rev:1`, `data:<key>:rev:2`, `data:<key>:rev:3`
-- Metadata: `data:<key>:metadata` includes `"revisions": [1,2,3]`
-
-**Rotation Logic**:
-- On write: rotate rev:2→rev:3 (discarding the old rev:3), then rev:1→rev:2, then store the previous current value as rev:1
-- Store up to 3 revisions per key
-
-**API Endpoints**:
-```
-GET /api/data/{key}/history
-  Returns: {"revisions": [{"number": 1, "timestamp": "..."}]}
-
-GET /api/data/{key}/history/{revision}
-  Returns: StoredValue for the specific revision
-```
-
-### 2.4 Backup System
-
-**Configuration**:
-```yaml
-backups:
-  enabled: true
-  schedule: "0 0 * * *" # Daily at midnight
-  path: "/backups"
-  retention: 7 # days
-```
-
-**Implementation**:
-- Use `github.com/robfig/cron/v3` for scheduling
-- Create ZSTD-compressed BadgerDB snapshots
-- Filename format: `kvs-backup-YYYY-MM-DD.zstd`
-- Automatic cleanup of old backups
-- Status API: `GET /api/backup/status`
-
-### 2.5 JSON Size Limits
-
-**Configuration**:
-```yaml
-database:
-  max_json_size: 1048576 # 1 MB default
-```
-
-**Implementation**:
-- Check size before compression/storage
-- Return HTTP 413 if exceeded
-- Apply to main data and revisions
-- Log oversized attempts
-
-## 3. Security Features
-
-### 3.1 Rate Limiting
-
-**Configuration**:
-```yaml
-rate_limit:
-  requests: 100
-  window: "1m"
-```
-
-**Implementation**:
-- Per-user rate limiting using BadgerDB counters
-- Key pattern: `ratelimit:<user_uuid>:<window>`
-- Return HTTP 429 when the limit is exceeded
-- Counters carry a TTL equal to the window duration
-
-### 3.2 Tamper-Evident Logs
-
-**Log Entry Schema**:
-```json
-{
-  "timestamp": "2025-09-11T17:29:00Z",
-  "action": "data_write", // Configurable actions
-  "user_uuid": "string",
-  "resource": "string",
-  "signature": "sha3-512 hash" // Hash of all other fields
-}
-```
-
-**Storage**:
-- Key: `log:<timestamp>:<entry-uuid>`
-- Compressed with ZSTD
-- Hourly Merkle tree roots: `log:merkle:<hour>`
-- Included in cluster replication
-
-**Configurable Actions**:
-```yaml
-tamper_logs:
-  actions: ["data_write", "user_create", "auth_failure"]
-```
-
-## 4. Implementation Phases
-
-### Phase 2.1: Core Authentication
-1. Implement the user/group storage schema
-2. Add SHA3-512 hashing utilities
-3. Create basic CRUD APIs for users/groups
-4. Implement JWT token generation/validation
-5. Add authorization middleware
-
-### Phase 2.2: Data Features
-1. Add ZSTD compression to BadgerDB operations
-2. Implement TTL support in resource metadata
-3. Build the revision history system
-4. Add JSON size validation
-
-### Phase 2.3: Security & Operations
-1. Implement rate limiting middleware
-2. Add the tamper-evident logging system
-3. Build the backup scheduling system
-4. Create migration scripts for existing data
-
-### Phase 2.4: Integration & Testing
-1. Integrate auth with the existing replication
-2. 
End-to-end testing of all features -3. Performance benchmarking -4. Documentation updates - -## 5. Configuration Example - -```yaml -node_id: "node1" -bind_address: "127.0.0.1" -port: 8080 -data_dir: "./data" - -database: - compression_enabled: true - compression_level: 3 - max_json_size: 1048576 - default_ttl: "0" # No default TTL - -backups: - enabled: true - schedule: "0 0 * * *" - path: "/backups" - retention: 7 - -rate_limit: - requests: 100 - window: "1m" - -tamper_logs: - actions: ["data_write", "user_create", "auth_failure"] -``` - -## 6. Migration Strategy - -1. **Backward Compatibility**: All existing APIs remain functional -2. **Optional Features**: New features can be disabled via configuration - - -## 7. Dependencies - -**New Libraries**: -- `golang.org/x/crypto/sha3` - SHA3-512 hashing -- `github.com/klauspost/compress/zstd` - Compression -- `github.com/robfig/cron/v3` - Backup scheduling -- `github.com/golang-jwt/jwt/v4` - JWT tokens (recommended) - -**Existing Libraries** (no changes): -- `github.com/dgraph-io/badger/v4` -- `github.com/google/uuid` -- `github.com/gorilla/mux` -- `github.com/sirupsen/logrus` - diff --git a/refactor.md b/refactor.md deleted file mode 100644 index 9f3e770..0000000 --- a/refactor.md +++ /dev/null @@ -1,68 +0,0 @@ -# Refactoring Proposal for KVS Main.go - -After analyzing your 3,990-line main.go file, I've identified clear functional areas that can be separated into manageable modules. -Here's my comprehensive refactoring proposal: - -Proposed File Structure - -kvs/ -├── main.go # Entry point + minimal server setup -├── config/ -│ └── config.go # Configuration structures and loading -├── types/ -│ └── types.go # All data structures and type definitions -├── auth/ -│ ├── auth.go # Authentication & authorization logic -│ ├── jwt.go # JWT token management -│ ├── middleware.go # Auth middleware -│ └── permissions.go # Permission checking utilities -├── storage/ -│ ├── storage.go # BadgerDB operations and utilities -│ ├── compression.go # ZSTD compression/decompression -│ ├── ttl.go # TTL and metadata management -│ └── revision.go # Revision history system -├── cluster/ -│ ├── gossip.go # Gossip protocol implementation -│ ├── members.go # Member management -│ ├── sync.go # Data synchronization -│ └── merkle.go # Merkle tree operations -├── server/ -│ ├── server.go # Server struct and core methods -│ ├── handlers.go # HTTP request handlers -│ ├── routes.go # Route setup -│ └── lifecycle.go # Server startup/shutdown logic -├── features/ -│ ├── ratelimit.go # Rate limiting middleware and utilities -│ ├── tamperlog.go # Tamper-evident logging -│ └── backup.go # Backup system -└── utils/ - └── hash.go # Hashing utilities (SHA3, etc.) - -Key Benefits - -1. Clear Separation of Concerns: Each package handles a specific responsibility -2. Better Testability: Smaller, focused functions are easier to unit test -3. Improved Maintainability: Changes to one feature don't affect others -4. Go Best Practices: Follows standard Go project layout conventions -5. Reduced Coupling: Clear interfaces between components - -Functional Areas Identified - -1. Configuration (~100 lines): Config structs, defaults, loading -2. Types (~400 lines): All data structures and constants -3. Authentication (~800 lines): User/Group/Token management, JWT, middleware -4. Storage (~600 lines): BadgerDB operations, compression, TTL, revisions -5. Clustering (~1,200 lines): Gossip, members, sync, Merkle trees -6. Server (~600 lines): Server struct, handlers, routes, lifecycle -7. 
Features (~200 lines): Rate limiting, tamper logging, backup
-8. Utilities (~90 lines): Hashing and other utilities
-
-Migration Strategy
-
-1. Start with the most independent modules (types, config, utils)
-2. Move the storage and authentication components
-3. Extract the clustering logic
-4. Refactor the server components last
-5. Create a commit for each major module migration
-
-The refactoring will maintain zero functional changes: it is purely a structural reorganization for better code organization.
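As a taste of the proposed layout, the extracted `config` package might start out roughly like this. A hedged sketch against the configuration keys shown in next_steps.md section 5; the field tags and the use of `gopkg.in/yaml.v3` (a new dependency) are assumptions for illustration only.

```go
// config/config.go (sketch)
package config

import (
	"os"

	"gopkg.in/yaml.v3"
)

// Config mirrors the top-level YAML keys from the example configuration.
type Config struct {
	NodeID      string   `yaml:"node_id"`
	BindAddress string   `yaml:"bind_address"`
	Port        int      `yaml:"port"`
	DataDir     string   `yaml:"data_dir"`
	Database    Database `yaml:"database"`
}

type Database struct {
	CompressionEnabled bool   `yaml:"compression_enabled"`
	CompressionLevel   int    `yaml:"compression_level"`
	MaxJSONSize        int64  `yaml:"max_json_size"`
	DefaultTTL         string `yaml:"default_ttl"`
}

// Load reads and parses a YAML configuration file.
func Load(path string) (*Config, error) {
	raw, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}
	var cfg Config
	if err := yaml.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```

Keeping parsing behind a single `Load` function is what lets the later server and feature packages depend on plain structs rather than on file I/O, which matches the "start with the most independent modules" ordering above.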