
Multi-Instance Deployment Guide

This document provides guidance for deploying multiple instances of each service for high availability and scalability.

Overview

All services in this distributed network mapping system are designed to support multi-instance deployments, but each has specific considerations and limitations.


Input Service (input_service/)

Multi-Instance Readiness: ⚠️ Partially Ready

How It Works

  • Each instance maintains its own per-consumer state and CIDR generators
  • State is stored locally in progress_state/ directory
  • Global hop deduplication (globalSeen map) is instance-local

Multi-Instance Deployment Strategies

Option 1: Session Affinity (Recommended)

Load Balancer (with sticky sessions based on source IP)
    ├── input_service instance 1
    ├── input_service instance 2
    └── input_service instance 3
  • Configure load balancer to route each ping worker to the same input_service instance
  • Ensures per-consumer state consistency
  • Simple to implement and maintain

Option 2: Broadcast Hop Submissions

output_service ---> POST /hops ---> ALL input_service instances

Modify output_service to POST discovered hops to all input_service instances instead of just one. This ensures hop deduplication works across instances.
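
A minimal sketch of this approach, assuming a hand-configured list of instance URLs and a JSON hop payload (the exact payload shape output_service uses is not shown in this guide):

// Sketch only: the instance list and the hop field name are assumptions.
import (
    "bytes"
    "encoding/json"
    "log"
    "net/http"
)

var inputServiceURLs = []string{ // hypothetical instance list
    "http://input1:8080",
    "http://input2:8080",
    "http://input3:8080",
}

type hop struct {
    IP string `json:"ip"` // assumed field name
}

// broadcastHop POSTs a discovered hop to every input_service instance so
// each one can update its local globalSeen map.
func broadcastHop(h hop) {
    body, _ := json.Marshal(h)
    for _, base := range inputServiceURLs {
        resp, err := http.Post(base+"/hops", "application/json", bytes.NewReader(body))
        if err != nil {
            // A failed POST only affects that instance's local deduplication.
            log.Printf("hop broadcast to %s failed: %v", base, err)
            continue
        }
        resp.Body.Close()
    }
}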

Option 3: Shared Deduplication Backend (Future Enhancement)

Implement Redis or database-backed globalSeen storage so all instances share deduplication state.
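
One possible shape for that backend, using the go-redis client and SETNX so that checking and recording a hop is a single atomic step across all instances (key prefix, TTL, and Redis address are assumptions):

// Sketch only: replaces the in-memory globalSeen map with a shared Redis key set.
import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

var rdb = redis.NewClient(&redis.Options{Addr: "redis:6379"}) // assumed address

// seenBefore atomically records the hop and reports whether any instance has
// already seen it. SETNX guarantees only one instance wins a race on the same IP.
func seenBefore(ctx context.Context, ip string) (bool, error) {
    created, err := rdb.SetNX(ctx, "hop_seen:"+ip, 1, 24*time.Hour).Result()
    if err != nil {
        return false, err
    }
    return !created, nil // created == true means this is the first sighting
}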

Known Limitations

  • Hop deduplication is instance-local: Different instances may serve duplicate hops if output_service sends hops to only one instance
  • Per-consumer state is instance-local: If a consumer switches instances, it gets a new generator and starts from the beginning
  • CIDR files must be present on all instances: The cloud-provider-ip-addresses/ directory must exist on each instance

Deployment Example

# Instance 1
./http_input_service &

# Instance 2 (different port)
PORT=8081 ./http_input_service &

# Load balancer (nginx example)
upstream input_service {
    ip_hash;  # Session affinity
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
}

Output Service (output_service/)

Multi-Instance Readiness: ✅ Fully Ready

How It Works

  • Each instance maintains its own SQLite database
  • Databases are independent and can be aggregated later
  • sentHops deduplication is instance-local with 24-hour TTL

Multi-Instance Deployment

ping_service workers ---> Load Balancer ---> output_service instances
  • No session affinity required
  • Each instance stores results independently
  • Use /dump endpoint to collect databases from all instances for aggregation

Aggregation Strategy

# Collect databases from all instances
curl http://instance1:8091/dump > instance1.db
curl http://instance2:8091/dump > instance2.db
curl http://instance3:8091/dump > instance3.db

# Merge using sqlite3
sqlite3 merged.db <<EOF
ATTACH 'instance1.db' AS db1;
ATTACH 'instance2.db' AS db2;
ATTACH 'instance3.db' AS db3;

INSERT INTO ping_results SELECT * FROM db1.ping_results;
INSERT INTO ping_results SELECT * FROM db2.ping_results;
INSERT INTO ping_results SELECT * FROM db3.ping_results;

INSERT INTO traceroute_hops SELECT * FROM db1.traceroute_hops;
INSERT INTO traceroute_hops SELECT * FROM db2.traceroute_hops;
INSERT INTO traceroute_hops SELECT * FROM db3.traceroute_hops;
EOF

Deployment Example

# Instance 1
./output_service --port=8081 --health-port=8091 --db-dir=/data/output1 &

# Instance 2
./output_service --port=8082 --health-port=8092 --db-dir=/data/output2 &

# Instance 3
./output_service --port=8083 --health-port=8093 --db-dir=/data/output3 &

Ping Service (ping_service/)

Multi-Instance Readiness: ✅ Fully Ready

How It Works

  • Designed from the ground up for distributed operation
  • Each worker independently polls input_service and submits results
  • Cooldown cache is instance-local (intentional: workers do not coordinate directly, so the cooldown duration must account for the total number of workers)

Multi-Instance Deployment

input_service <--- ping_service workers (many instances)
                         |
                         v
                  output_service
  • Deploy as many workers as needed across different networks/locations
  • Workers can run on Raspberry Pis, VPS, cloud instances, etc.
  • No coordination required between workers

Deployment Example

# Worker 1 (local network)
./ping_service -config config.yaml &

# Worker 2 (VPS)
ssh vps1 "./ping_service -config config.yaml" &

# Worker 3 (different geographic location)
ssh vps2 "./ping_service -config config.yaml" &

Manager (manager/)

Multi-Instance Readiness: ⚠️ Requires Configuration

How It Works

  • Session store is in-memory (not shared across instances)
  • User store uses file-based storage with file locking (multi-instance safe as of latest update)
  • Worker registry is instance-local

Multi-Instance Deployment Strategies

Option 1: Active-Passive with Failover

Load Balancer (active-passive)
    ├── manager instance 1 (active)
    └── manager instance 2 (standby)
  • Only one instance active at a time
  • Failover on primary failure
  • Simplest approach, no session coordination needed

Option 2: Shared Session Store (Recommended for Active-Active)

Implement Redis or database-backed session storage to enable true active-active multi-instance deployment.

Required Changes for Active-Active:

// Replace in-memory sessions (main.go:31-34) with Redis
var sessions = redis.NewSessionStore(redisClient)
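
The redis.NewSessionStore call above is illustrative; a minimal sketch of such a store using the go-redis client follows (the type and method names are assumptions, not the manager's actual session interface):

// Sketch only: a Redis-backed session store with a per-session TTL.
import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

type RedisSessionStore struct {
    rdb *redis.Client
    ttl time.Duration
}

func NewRedisSessionStore(addr string, ttl time.Duration) *RedisSessionStore {
    return &RedisSessionStore{
        rdb: redis.NewClient(&redis.Options{Addr: addr}),
        ttl: ttl,
    }
}

// Set maps a session token to a username and refreshes its expiry.
func (s *RedisSessionStore) Set(ctx context.Context, token, user string) error {
    return s.rdb.Set(ctx, "session:"+token, user, s.ttl).Err()
}

// Get returns the username for a token, or "" if the session is unknown.
func (s *RedisSessionStore) Get(ctx context.Context, token string) (string, error) {
    user, err := s.rdb.Get(ctx, "session:"+token).Result()
    if err == redis.Nil {
        return "", nil
    }
    return user, err
}

// Delete removes a session on logout.
func (s *RedisSessionStore) Delete(ctx context.Context, token string) error {
    return s.rdb.Del(ctx, "session:"+token).Err()
}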

Current Limitations

  • Sessions are not shared: User authenticated on instance A cannot access instance B
  • Worker registry is not shared: Each instance maintains its own worker list
  • dy.fi updates may conflict: Multiple instances updating the same domain simultaneously

User Store File Locking (✅ Fixed)

As of the latest update, the user store uses file locking to prevent race conditions:

  • Shared locks for reads (multiple readers allowed)
  • Exclusive locks for writes (blocks all readers and writers)
  • Atomic write-then-rename prevents corruption
  • Safe for multi-instance deployment when instances share the same filesystem

Deployment Example (Active-Passive)

# Primary instance
./manager --port=8080 --domain=manager.dy.fi &

# Secondary instance (standby)
MANAGER_PORT=8081 ./manager &

# Load balancer health-checks both instances and routes traffic to the active one only

General Multi-Instance Recommendations

Health Checks

All services expose /health and /ready endpoints. Configure your load balancer to:

  • Route traffic only to healthy instances
  • Remove failed instances from rotation automatically
  • Monitor /metrics endpoint for Prometheus integration

Monitoring

Add instance_id labels to metrics for per-instance monitoring:

// Recommended enhancement for all services
var instanceID, _ = os.Hostname()  // os.Hostname returns (string, error)
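
If the services use the Prometheus Go client, one way to attach the label is ConstLabels, so every series exported by the instance carries it (the metric name below is hypothetical):

import (
    "os"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var instanceID, _ = os.Hostname()

// Every series from this counter carries instance_id, so dashboards can
// break metrics down per instance.
var requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
    Name:        "requests_total", // hypothetical metric name
    Help:        "Total requests handled by this instance.",
    ConstLabels: prometheus.Labels{"instance_id": instanceID},
})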

File Locking

Services that write to shared storage should use file locking (like the manager user store) to prevent corruption:

syscall.Flock(fd, syscall.LOCK_EX)  // Exclusive lock
syscall.Flock(fd, syscall.LOCK_SH)  // Shared lock
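
A minimal helper built on those calls (Unix-only; the actual manager user store code may be structured differently):

import (
    "os"
    "syscall"
)

// withExclusiveLock opens path, takes an exclusive flock, runs fn, then
// releases the lock when fn returns.
func withExclusiveLock(path string, fn func(*os.File) error) error {
    f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o600)
    if err != nil {
        return err
    }
    defer f.Close()

    // Blocks until any other holder releases the lock.
    if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
        return err
    }
    defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

    return fn(f)
}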

Network Considerations

  • Latency: Place input_service close to ping workers to minimize polling latency
  • Bandwidth: output_service should have sufficient bandwidth for result ingestion
  • NAT Traversal: Use manager gateway mode for ping workers behind NAT

Troubleshooting Multi-Instance Deployments

Input Service: Duplicate Hops Served

Symptom: Same hop appears multiple times in different workers
Cause: Hop deduplication is instance-local
Solution: Implement session affinity or broadcast hop submissions

Manager: Sessions Lost After Reconnect

Symptom: User logged out when load balancer switches instances
Cause: Sessions are in-memory, not shared
Solution: Use session affinity in load balancer or implement shared session store

Output Service: Database Conflicts

Symptom: Database file corruption or lock timeouts
Cause: Multiple instances writing to same database file
Solution: Each instance MUST have its own --db-dir, then aggregate later

Ping Service: Excessive Pinging

Symptom: Same IP pinged too frequently
Cause: Too many workers with short cooldown period
Solution: Increase cooldown_minutes in config.yaml
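
A rough sizing rule, assuming workers draw from overlapping target ranges: each worker probes an IP at most once per cooldown_minutes, so N workers can probe it up to N times in that window. To keep the average interval at or above T minutes, set cooldown_minutes to at least N × T; for example, 10 workers and a target of at most one probe per IP per hour means cooldown_minutes of at least 600.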


Production Deployment Checklist

  • Input service: Configure session affinity or hop broadcast
  • Output service: Each instance has unique --db-dir
  • Ping service: Cooldown duration accounts for total worker count
  • Manager: Decide active-passive or implement shared sessions
  • All services: Health check endpoints configured in load balancer
  • All services: Metrics exported to monitoring system
  • All services: Logs aggregated to central logging system
  • File-based state: Shared filesystem or backup/sync strategy
  • Database rotation: Automated collection of output service dumps

Future Enhancements

High Priority

  1. Shared session store for manager (Redis/database)
  2. Shared hop deduplication for input_service (Redis)
  3. Distributed worker coordination for ping_service cooldowns

Medium Priority

  1. Instance ID labels in metrics for better observability
  2. Graceful shutdown coordination to prevent data loss
  3. Health check improvements to verify actual functionality

Low Priority

  1. Automated database aggregation for output_service
  2. Service mesh integration (Consul, etcd) for discovery
  3. Horizontal autoscaling based on load metrics

Summary Table

| Service | Multi-Instance Ready | Session Affinity Needed | Shared Storage Needed | Notes |
|---------|----------------------|-------------------------|-----------------------|-------|
| input_service | ⚠️ Partial | Yes (recommended) | No | Hop dedup is instance-local |
| output_service | ✅ Full | No | No | Each instance has own DB |
| ping_service | ✅ Full | No | No | Fully distributed by design |
| manager | ⚠️ Requires config | Yes (sessions) | Yes (user store) | Sessions in-memory; user store file-locked |

For questions or issues with multi-instance deployments, refer to the service-specific README files or open an issue in the project repository.