Secure Cluster Communication with PSK and Node Identity Problem Statement #13

Open
opened 2025-09-22 22:25:35 +03:00 by MrKalzu · 0 comments
Contributor

Currently, KVS cluster membership and replication synchronization endpoints (/members/*, /merkle_tree/*, /kv_range) are not protected by the authentication middleware. This leaves internal cluster communication vulnerable to unauthorized access and manipulation, even when external client authentication (auth_enabled: true) is active. This gap poses a significant security risk: any entity capable of reaching these endpoints can potentially join the cluster, extract data, or inject malicious information.
Proposed Solution: PSK-Authenticated Cluster Communication with Secure Bootstrapping

We will implement a security model for inter-node communication that combines a Global Cluster Secret (PSK) for authentication with the existing NodeID for identification. A secure bootstrapping mechanism, leveraging existing JWT tokens, will allow new nodes to obtain the Global Cluster Secret automatically.

Crucial Prerequisite: TLS Encryption
All inter-node communication MUST be encrypted using TLS (HTTPS). Transmitting any secrets (including the Global Cluster Secret) over unencrypted HTTP is unacceptable and negates the purpose of this security enhancement. This can be achieved either by configuring KVS nodes to serve HTTPS directly or by placing them behind a reverse proxy (e.g., Nginx, Traefik) that handles TLS termination for all inter-node traffic.
Core Components & How They Work

  1. Global Cluster Secret (GCS)

    Purpose: A single, strong, randomly generated secret key shared by all legitimate nodes in the cluster. It serves as the primary authentication credential for internal cluster communication, verifying that a request originates from a trusted cluster member.

    Configuration: The GCS will be configurable through multiple sources to facilitate automated deployments:

     Configuration File: cluster_secret: "your-strong-secret"
    
     Environment Variable: KVS_CLUSTER_SECRET
    
     File Path: cluster_secret_file: "/path/to/secret.key" (KVS reads the secret from this file)
    
     HTTPS URL: cluster_secret_url: "https://secrets.example.com/kvs_secret" (KVS fetches the secret from this URL)
    
     DNS TXT Record: cluster_secret_dns: "kvs-cluster.example.com" (KVS queries the TXT record for this domain)
    

    Usage: For every outgoing internal cluster request (e.g., /members/join, /members/gossip, /merkle_tree/root), the GCS will be included in a custom HTTP header, e.g., X-KVS-Cluster-Secret.

  2. Node Identity (NodeID)

    Purpose: The existing node_id from the configuration will be used to uniquely identify the specific node making an internal request. While the GCS authenticates membership, the NodeID provides granular identification for logging, auditing, and potential future node-specific authorization.

    Usage: The NodeID will be included in another custom HTTP header, e.g., X-KVS-Node-ID, alongside the GCS.

  3. Secure Bootstrapping Mechanism

    Purpose: To enable new nodes to securely obtain the Global Cluster Secret without manual intervention or having the GCS hardcoded in their initial configuration.

    Mechanism:

     A new API endpoint (e.g., GET /auth/cluster-bootstrap, or POST if a request body is needed later) will be introduced on existing cluster nodes.
    
     This endpoint will be protected by the standard JWT authentication middleware, requiring a token with sufficient administrative privileges (e.g., admin:cluster:bootstrap scope).
    
     A new node, during its Bootstrap() phase, will make a one-time request to this endpoint on a seed node. This request will include its NodeID and be authenticated using a temporary, admin-scoped JWT (e.g., the initial root token, or a specifically generated short-lived token).
    
     The endpoint will respond with the Global Cluster Secret.
    
     The new node will then store this GCS in its runtime configuration and use it for all subsequent internal cluster communications.
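
Component 1 above calls for a strong, randomly generated GCS. A minimal sketch of how an operator tool might produce one is shown here. The function name generateClusterSecret, the 32-byte minimum, and the base64 encoding are assumptions made for illustration; the proposal does not prescribe a format.

```go
package main

import (
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// generateClusterSecret returns a URL-safe, unpadded base64 string built from
// numBytes of cryptographically secure random data.
// The 32-byte floor is an assumption of this sketch, not a KVS requirement.
func generateClusterSecret(numBytes int) (string, error) {
	if numBytes < 32 {
		return "", fmt.Errorf("secret too short: want at least 32 bytes, got %d", numBytes)
	}
	buf := make([]byte, numBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", fmt.Errorf("reading random bytes: %w", err)
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

func main() {
	secret, err := generateClusterSecret(32)
	if err != nil {
		panic(err)
	}
	// The printed value would be placed in one of the cluster_secret* sources.
	fmt.Println(secret)
}
```

The output contains only URL-safe characters, so it can be used as an HTTP header value or stored in a file without escaping.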
    

Detailed Implementation Plan
Phase 1: Configuration & TLS Enforcement

Update types.Config (types/types.go):
    
type Config struct {
    // ... existing fields ...
    ClusterSecret      string `yaml:"cluster_secret"`
    ClusterSecretFile  string `yaml:"cluster_secret_file"`
    ClusterSecretURL   string `yaml:"cluster_secret_url"`
    ClusterSecretDNS   string `yaml:"cluster_secret_dns"`
    // ...
}

  

Update config.Load() (config/config.go):

    Implement logic to load the ClusterSecret from the specified sources in order of precedence (e.g., cluster_secret > KVS_CLUSTER_SECRET env var > cluster_secret_file > cluster_secret_url > cluster_secret_dns).

    Add a check: If ClusteringEnabled is true, but no ClusterSecret could be loaded, the server should fail to start with a clear error message.

TLS Enforcement:

    Documentation: Update README.md to explicitly state that TLS is mandatory for all inter-node communication when clustering is enabled and a ClusterSecret is configured. Provide guidance on setting up reverse proxies or configuring built-in TLS (if KVS ever supports it).

    Runtime Check (Optional but Recommended): Consider adding a runtime check in server.NewServer() to warn or fail if ClusteringEnabled is true and the bind_address is http:// instead of https:// (assuming KVS could support built-in TLS or if you enforce reverse proxy usage).

Phase 2: Cluster Authentication Middleware

Create clusterAuthMiddleware (server/server.go or server/cluster_middleware.go):

// clusterContextKey is an unexported type for context keys, so values stored
// by this middleware cannot collide with keys set by other packages.
type clusterContextKey string

const clusterNodeIDKey clusterContextKey = "cluster_node_id"

// Requires the "crypto/subtle" import in addition to the existing ones.
func (s *Server) clusterAuthMiddleware(next http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
        if !s.config.ClusteringEnabled {
            // If clustering is disabled, these endpoints should not be accessible.
            http.Error(w, "Clustering is disabled", http.StatusForbidden)
            return
        }

        providedSecret := r.Header.Get("X-KVS-Cluster-Secret")
        providedNodeID := r.Header.Get("X-KVS-Node-ID") // For logging/auditing

        if providedSecret == "" || providedNodeID == "" {
            s.logger.WithFields(logrus.Fields{
                "remote_addr":     r.RemoteAddr,
                "missing_secret":  providedSecret == "",
                "missing_node_id": providedNodeID == "",
            }).Warn("Missing cluster authentication headers on internal endpoint")
            http.Error(w, "Unauthorized: Missing cluster authentication headers", http.StatusUnauthorized)
            return
        }

        // Compare in constant time so response timing does not reveal how much
        // of the secret matched; a plain != comparison returns early on mismatch.
        if subtle.ConstantTimeCompare([]byte(providedSecret), []byte(s.config.ClusterSecret)) != 1 {
            s.logger.WithFields(logrus.Fields{
                "remote_addr":      r.RemoteAddr,
                "provided_node_id": providedNodeID,
            }).Warn("Unauthorized: Invalid cluster secret provided")
            http.Error(w, "Unauthorized: Invalid cluster secret", http.StatusUnauthorized)
            return
        }

        // Authentication successful, pass NodeID to context for handlers/logging
        ctx := context.WithValue(r.Context(), clusterNodeIDKey, providedNodeID)
        next(w, r.WithContext(ctx))
    }
}

Apply Middleware (server/routes.go):

Wrap all existing cluster-related handlers (/members/*, /merkle_tree/*, /kv_range) with s.clusterAuthMiddleware.

    
// In server/routes.go, inside setupRoutes():
if s.config.ClusteringEnabled {
    clusterAuth := s.clusterAuthMiddleware // Alias for brevity

    router.Handle("/members/", clusterAuth(s.getMembersHandler)).Methods("GET")
    router.Handle("/members/join", clusterAuth(s.joinMemberHandler)).Methods("POST")
    // ... apply to all other cluster endpoints ...
}

Phase 3: Outgoing Request Modification

Modify HTTP Clients in Cluster Services:

    Update all outgoing HTTP requests from cluster/bootstrap.go, cluster/gossip.go, and cluster/sync.go to include the X-KVS-Cluster-Secret and X-KVS-Node-ID headers.

    The NodeID can be retrieved from s.config.NodeID.

    The ClusterSecret can be retrieved from s.config.ClusterSecret.
    
// Example modification in cluster/gossip.go (gossipWithPeer function):
req, err := http.NewRequest("POST", url, bytes.NewBuffer(jsonData))
if err != nil { /* handle error */ }
req.Header.Set("Content-Type", "application/json")
req.Header.Set("X-KVS-Cluster-Secret", s.config.ClusterSecret) // Add this
req.Header.Set("X-KVS-Node-ID", s.config.NodeID)             // Add this

resp, err := client.Do(req)
// ... rest of the code ...

  

    Ensure the http.Client used has a reasonable timeout.

Phase 4: Secure Bootstrapping Endpoint

Add admin:cluster:bootstrap Scope (types/types.go):

    Define a new scope for this administrative action.

New Handler getClusterCredentialsHandler (server/handlers.go):

    This handler will serve the ClusterSecret.

// server/handlers.go
func (s *Server) getClusterCredentialsHandler(w http.ResponseWriter, r *http.Request) {
    // This endpoint must already be protected by JWT middleware requiring the admin:cluster:bootstrap scope.
    // No request body is needed, as the secret is global.
    resp := map[string]string{
        "cluster_secret": s.config.ClusterSecret,
    }
    w.Header().Set("Content-Type", "application/json")
    if err := json.NewEncoder(w).Encode(resp); err != nil {
        // Do not log success if the response could not be written.
        s.logger.WithError(err).WithField("remote_addr", r.RemoteAddr).Error("Failed to write cluster credentials response")
        return
    }
    s.logger.WithField("remote_addr", r.RemoteAddr).Info("Served cluster credentials to new node")
}

Add Route (server/routes.go):

Add a new route for the bootstrapping endpoint, protected by the authService.Middleware.


// In server/routes.go, inside setupRoutes():
if s.config.AuthEnabled {
    // ... other auth routes ...
    router.Handle("/auth/cluster-bootstrap", s.authService.Middleware(
        []string{"admin:cluster:bootstrap"}, nil, "",
    )(s.getClusterCredentialsHandler)).Methods("GET") // Or POST if a body is needed for future extensions
}

Modify BootstrapService (cluster/bootstrap.go):

Before calling attemptJoin(), the Bootstrap() function on a new node needs to securely fetch the ClusterSecret.

    
// In cluster/bootstrap.go, Bootstrap() function:
func (s *BootstrapService) Bootstrap() {
    // ... existing checks ...

    // NEW: Securely fetch ClusterSecret if not already configured directly
    if s.config.ClusterSecret == "" {
        s.logger.Info("ClusterSecret not configured directly, attempting to fetch via bootstrap endpoint")
        if err := s.fetchClusterSecretFromSeed(); err != nil {
            s.logger.WithError(err).Error("Failed to fetch ClusterSecret, running as standalone")
            s.setMode("normal")
            return
        }
        s.logger.Info("Successfully fetched ClusterSecret")
    }

    // ... rest of existing Bootstrap logic (attemptJoin, performGradualSync) ...
}

// NEW: fetchClusterSecretFromSeed function
func (s *BootstrapService) fetchClusterSecretFromSeed() error {
    if len(s.config.SeedNodes) == 0 {
        return fmt.Errorf("no seed nodes configured to fetch ClusterSecret")
    }

    // Use the initial root token (or any admin token) for this one-time secure call
    // This implies the initial root token needs to be passed to the new node's config
    // OR the new node needs a way to obtain a temporary admin token.
    // For simplicity, let's assume the initial root token is provided as `bootstrap_token` in config
    if s.config.BootstrapToken == "" { // Add BootstrapToken to types.Config
        return fmt.Errorf("bootstrap_token not provided in config to fetch ClusterSecret")
    }

    client := &http.Client{Timeout: 10 * time.Second}
    for _, seedAddr := range s.config.SeedNodes {
        url := fmt.Sprintf("https://%s/auth/cluster-bootstrap", seedAddr) // MUST use HTTPS
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            s.logger.WithError(err).WithField("seed", seedAddr).Warn("Failed to create bootstrap request")
            continue
        }
        req.Header.Set("Authorization", "Bearer "+s.config.BootstrapToken)

        resp, err := client.Do(req)
        if err != nil {
            s.logger.WithError(err).WithField("seed", seedAddr).Warn("Failed to contact seed for ClusterSecret")
            continue
        }

        if resp.StatusCode != http.StatusOK {
            // Close explicitly: a defer inside the loop would keep every
            // response body open until the function returns.
            resp.Body.Close()
            s.logger.WithFields(logrus.Fields{
                "seed":   seedAddr,
                "status": resp.StatusCode,
            }).Warn("Seed node rejected ClusterSecret bootstrap request")
            continue
        }

        var respData map[string]string
        err = json.NewDecoder(resp.Body).Decode(&respData)
        resp.Body.Close()
        if err != nil {
            s.logger.WithError(err).WithField("seed", seedAddr).Error("Failed to decode ClusterSecret response")
            continue
        }

        if secret, ok := respData["cluster_secret"]; ok && secret != "" {
            s.config.ClusterSecret = secret // Update runtime config
            return nil
        }
        s.logger.WithField("seed", seedAddr).Warn("ClusterSecret not found in bootstrap response")
    }
    return fmt.Errorf("failed to fetch ClusterSecret from any seed node")
}

  

    Note on BootstrapToken: This implies adding a BootstrapToken string `yaml:"bootstrap_token"` field to types.Config. This token would be the initial root token, or a specifically generated short-lived token with the admin:cluster:bootstrap scope, provided to the new node's configuration once for bootstrapping.

Security Considerations

TLS is Non-Negotiable: All internal cluster communication must use TLS. Without it, the ClusterSecret is transmitted in plaintext.

Strong Global Cluster Secret: The GCS must be a long, random, and cryptographically strong string. It should never be hardcoded or checked into version control.

Secure Storage: The GCS (and any BootstrapToken) should be stored securely on disk on each node (e.g., restricted file permissions) or fetched from a secure secret management service.

Secret Rotation: Develop a plan for rotating the GCS in case of compromise. This will require a coordinated update across all cluster nodes.

Auditing: Log all successful and failed cluster authentication attempts, including the NodeID from the X-KVS-Node-ID header.

BootstrapToken Lifespan: If using a BootstrapToken, ensure it's either very short-lived or a single-use token to minimize its exposure. The initial root token is suitable for this one-time use if it's discarded or revoked afterwards.
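
For the Secret Rotation item above, one way to avoid restarting every node at the same instant is a rotation window in which nodes accept both the new and the previous secret. This is a sketch of that idea only; the proposal does not define a previous-secret setting, and acceptSecret and secretMatches are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// secretMatches compares two secrets in constant time. Hashing first yields
// equal-length inputs, so the comparison does not depend on secret length.
func secretMatches(provided, expected string) bool {
	p := sha256.Sum256([]byte(provided))
	e := sha256.Sum256([]byte(expected))
	return subtle.ConstantTimeCompare(p[:], e[:]) == 1
}

// acceptSecret reports whether provided matches the current secret or, during
// a rotation window, the previous one. An empty previous disables the fallback.
func acceptSecret(provided, current, previous string) bool {
	if provided == "" {
		return false
	}
	ok := secretMatches(provided, current)
	if previous != "" && secretMatches(provided, previous) {
		ok = true
	}
	return ok
}

func main() {
	// During rotation a node still presenting the old secret is accepted.
	fmt.Println(acceptSecret("old", "new", "old")) // prints "true"
	// Once the window closes (previous cleared), the old secret is rejected.
	fmt.Println(acceptSecret("old", "new", "")) // prints "false"
}
```

After every node presents the new secret, the previous one would be cleared, which ends the window.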

Future Enhancements (Optional)

Node-Specific Keys: For even higher granularity, instead of a single GCS, each node could have a unique key. This would require a more complex key management system and potentially a node-specific key distribution mechanism via the bootstrapping endpoint.

Mutual TLS (mTLS): A stronger form of authentication, in which both client and server present certificates to verify each other's identity. This adds significant complexity to certificate management, but each node is then authenticated by its own certificate rather than by a secret shared across the cluster.

This comprehensive plan addresses the security vulnerability while providing flexible configuration and a secure bootstrapping process for new KVS nodes.

Reference: ryyst/kalzu-value-store#13