Add _ls and _tree Endpoints for Hierarchical Key Listing Using Merkle Tree #7

Open
opened 2025-09-22 15:25:22 +03:00 by MrKalzu · 0 comments
Contributor

Add _ls and _tree Endpoints for Hierarchical Key Listing Using Merkle Tree

KVS supports hierarchical keys (e.g., /home/room/closet/socks), which is great for organizing data like a file system. However, there's currently no built-in way for clients to discover or list subkeys under a given prefix/path. This makes it hard to build intuitive tools or UIs that need to navigate the keyspace, such as a web-based explorer or CLI client.

For example:

  • A client might want to list all direct children under /home/room/ (e.g., closet, bed, door).
  • Or recursively traverse the entire subtree for a full directory-like view.

Without this, users must either:

  • Maintain their own index outside of KVS (inefficient and error-prone).
  • Or scan the entire keyspace via full-range queries (slow and resource-intensive, especially in a large or distributed cluster).

Describe the solution you'd like

Add two new read-only endpoints that leverage the existing Merkle tree infrastructure for efficient prefix-based key listing. This aligns with KVS's modular design, eventual consistency model, and Merkle-based sync (no need for full DB scans—traverse the tree to identify relevant leaf nodes in O(log N) time).

Proposed Endpoints

  1. Direct Children Listing (_ls or _list):

    • Endpoint: GET /kv/{path}/_ls (or GET /kv/{path}/_list for clarity).
    • Purpose: Returns a sorted list of direct subkeys under the given path/prefix (non-recursive).
    • Query Params (optional):
      • limit: Max number of keys to return (default: 100, max: 1000).
      • include_metadata: If true, include basic metadata like timestamps (default: false).
    • Response (JSON):
      {
        "path": "/home/room",
        "children": [
          { "subkey": "closet", "timestamp": 1695280000000 },
          { "subkey": "bed", "timestamp": 1695279000000 }
        ],
        "total": 2,
        "truncated": false
      }
      
    • Behavior:
      • Treat {path} as a prefix (e.g., /home/room/ → keys starting with /home/room/ but not /home/room/sub/).
      • Use the Merkle tree to find leaf nodes in the prefix range [prefix, prefix~] (where ~ is the next lexicographical prefix).
      • Skip index keys (e.g., _ts:*).
      • Respect auth: Use existing middleware (e.g., read scope if auth_enabled: true).
      • In read-only/syncing modes: Allow if not modifying data.
  2. Recursive Tree View (_tree):

    • Endpoint: GET /kv/{path}/_tree.
    • Purpose: Returns a recursive tree structure of all subkeys under the given path (depth-first or breadth-first, configurable).
    • Query Params (optional):
      • depth: Max recursion depth (default: unlimited, but suggest 5 for safety).
      • limit: Max total keys (default: 500, max: 5000).
      • include_metadata: Include timestamps/UUIDs (default: false).
      • format: json (default) or nested (tree-like JSON).
    • Response (JSON, nested format):
      {
        "path": "/home/room",
        "children": [
          {
            "subkey": "closet",
            "children": [
              { "subkey": "socks", "timestamp": 1695281000000 }
            ],
            "timestamp": 1695280000000
          },
          {
            "subkey": "bed",
            "timestamp": 1695279000000
          }
        ],
        "total": 3,
        "truncated": false
      }
      
    • Behavior:
      • Build on _ls logic: Recursively query sub-prefixes via Merkle tree traversal.
      • Prune at depth or limit to avoid overload.
      • Same auth and mode rules as _ls.

Integration with Existing Systems

  • Merkle Tree Usage: Extend cluster/merkle.go (e.g., add GetKeysInRange(startKey, endKey) []string method) to traverse nodes covering the prefix range without fetching full values. Reuse buildMerkleTreeFromPairs and filterPairsByRange from handlers.go.
  • Range Query Reuse: Build on existing KVRangeRequest/KVRangeResponse in types.go and getKVRangeHandler (strip values to return just keys for efficiency).
  • Auth & Permissions: Apply via authService.Middleware (e.g., read scope). Respect allow_anonymous_read.
  • Config Toggle: Add key_listing_enabled: true to types.Config for optional disable (e.g., for security in public clusters).
  • Distributed Consistency: Since Merkle trees are synced, listings will be eventually consistent across nodes. Add a consistent: true query param to force a quick Merkle refresh if needed.

Example Usage

# List direct children under /home/room/
curl "http://localhost:8080/kv/home/room/_ls?limit=10"

# Recursive tree (limited depth)
curl "http://localhost:8080/kv/home/_tree?depth=2&include_metadata=true"

Additional context

  • Priority: Medium—enhances UX for hierarchical use cases without core changes.
  • Effort Estimate: Low-Medium (reuse Merkle/range infra; ~1-2 days).
  • Related Code:
    • cluster/merkle.go: Extend for key-only range traversal.
    • server/handlers.go: Add handlers like getKeyListHandler and getKeyTreeHandler.
    • server/routes.go: Wire up routes under KV section.
    • Tests: Extend integration_test.sh with prefix listing assertions.
  • Docs Update: Add to README.md under REST API > Data Operations.
# Add `_ls` and `_tree` Endpoints for Hierarchical Key Listing Using Merkle Tree ## Is your feature request related to a problem? Please describe. KVS supports hierarchical keys (e.g., `/home/room/closet/socks`), which is great for organizing data like a file system. However, there's currently no built-in way for clients to discover or list subkeys under a given prefix/path. This makes it hard to build intuitive tools or UIs that need to navigate the keyspace, such as a web-based explorer or CLI client. For example: - A client might want to list all direct children under `/home/room/` (e.g., `closet`, `bed`, `door`). - Or recursively traverse the entire subtree for a full directory-like view. Without this, users must either: - Maintain their own index outside of KVS (inefficient and error-prone). - Or scan the entire keyspace via full-range queries (slow and resource-intensive, especially in a large or distributed cluster). ## Describe the solution you'd like Add two new read-only endpoints that leverage the existing Merkle tree infrastructure for efficient prefix-based key listing. This aligns with KVS's modular design, eventual consistency model, and Merkle-based sync (no need for full DB scans—traverse the tree to identify relevant leaf nodes in O(log N) time). ### Proposed Endpoints 1. **Direct Children Listing (`_ls` or `_list`)**: - **Endpoint**: `GET /kv/{path}/_ls` (or `GET /kv/{path}/_list` for clarity). - **Purpose**: Returns a sorted list of **direct subkeys** under the given path/prefix (non-recursive). - **Query Params** (optional): - `limit`: Max number of keys to return (default: 100, max: 1000). - `include_metadata`: If true, include basic metadata like timestamps (default: false). - **Response** (JSON): ```json { "path": "/home/room", "children": [ { "subkey": "closet", "timestamp": 1695280000000 }, { "subkey": "bed", "timestamp": 1695279000000 } ], "total": 2, "truncated": false } ``` - **Behavior**: - Treat `{path}` as a prefix (e.g., `/home/room/` → keys starting with `/home/room/` but not `/home/room/sub/`). - Use the Merkle tree to find leaf nodes in the prefix range `[prefix, prefix~]` (where `~` is the next lexicographical prefix). - Skip index keys (e.g., `_ts:*`). - Respect auth: Use existing middleware (e.g., `read` scope if `auth_enabled: true`). - In read-only/syncing modes: Allow if not modifying data. 2. **Recursive Tree View (`_tree`)**: - **Endpoint**: `GET /kv/{path}/_tree`. - **Purpose**: Returns a **recursive tree structure** of all subkeys under the given path (depth-first or breadth-first, configurable). - **Query Params** (optional): - `depth`: Max recursion depth (default: unlimited, but suggest 5 for safety). - `limit`: Max total keys (default: 500, max: 5000). - `include_metadata`: Include timestamps/UUIDs (default: false). - `format`: `json` (default) or `nested` (tree-like JSON). - **Response** (JSON, nested format): ```json { "path": "/home/room", "children": [ { "subkey": "closet", "children": [ { "subkey": "socks", "timestamp": 1695281000000 } ], "timestamp": 1695280000000 }, { "subkey": "bed", "timestamp": 1695279000000 } ], "total": 3, "truncated": false } ``` - **Behavior**: - Build on `_ls` logic: Recursively query sub-prefixes via Merkle tree traversal. - Prune at `depth` or `limit` to avoid overload. - Same auth and mode rules as `_ls`. ### Integration with Existing Systems - **Merkle Tree Usage**: Extend `cluster/merkle.go` (e.g., add `GetKeysInRange(startKey, endKey) []string` method) to traverse nodes covering the prefix range without fetching full values. Reuse `buildMerkleTreeFromPairs` and `filterPairsByRange` from `handlers.go`. - **Range Query Reuse**: Build on existing `KVRangeRequest`/`KVRangeResponse` in `types.go` and `getKVRangeHandler` (strip values to return just keys for efficiency). - **Auth & Permissions**: Apply via `authService.Middleware` (e.g., `read` scope). Respect `allow_anonymous_read`. - **Config Toggle**: Add `key_listing_enabled: true` to `types.Config` for optional disable (e.g., for security in public clusters). - **Distributed Consistency**: Since Merkle trees are synced, listings will be eventually consistent across nodes. Add a `consistent: true` query param to force a quick Merkle refresh if needed. ### Example Usage ```bash # List direct children under /home/room/ curl "http://localhost:8080/kv/home/room/_ls?limit=10" # Recursive tree (limited depth) curl "http://localhost:8080/kv/home/_tree?depth=2&include_metadata=true" ``` ## Additional context - **Priority**: Medium—enhances UX for hierarchical use cases without core changes. - **Effort Estimate**: Low-Medium (reuse Merkle/range infra; ~1-2 days). - **Related Code**: - `cluster/merkle.go`: Extend for key-only range traversal. - `server/handlers.go`: Add handlers like `getKeyListHandler` and `getKeyTreeHandler`. - `server/routes.go`: Wire up routes under KV section. - Tests: Extend `integration_test.sh` with prefix listing assertions. - **Docs Update**: Add to `README.md` under REST API > Data Operations.
Sign in to join this conversation.
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: ryyst/kalzu-value-store#7
No description provided.