Multi-Instance Deployment Guide
This document provides guidance for deploying multiple instances of each service for high availability and scalability.
Overview
All services in this distributed network mapping system are designed to support multi-instance deployments, but each has specific considerations and limitations.
Input Service (input_service/)
Multi-Instance Readiness: ⚠️ Partially Ready
How It Works
- Each instance maintains its own per-consumer state and CIDR generators
- State is stored locally in the `progress_state/` directory
- Global hop deduplication (the `globalSeen` map) is instance-local (see the sketch below)
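To make "instance-local" concrete, here is a minimal sketch of per-consumer state and hop deduplication held in process memory. The type and field names are assumptions for illustration, not the real input_service code:

```go
// Illustrative only: both the per-consumer generators and the hop
// deduplication set live in this process's memory.
package input

import "sync"

type cidrGenerator struct{ next uint32 } // stand-in for the real generator

type instanceState struct {
	mu         sync.Mutex
	generators map[string]*cidrGenerator // per-consumer state, this instance only
	globalSeen map[string]struct{}       // hop dedup, also this instance only
}

func newInstanceState() *instanceState {
	return &instanceState{
		generators: make(map[string]*cidrGenerator),
		globalSeen: make(map[string]struct{}),
	}
}

// generatorFor returns the consumer's generator, creating a fresh one when the
// consumer is unknown here, which is exactly what happens after a worker is
// routed to a different instance.
func (s *instanceState) generatorFor(consumerID string) *cidrGenerator {
	s.mu.Lock()
	defer s.mu.Unlock()
	g, ok := s.generators[consumerID]
	if !ok {
		g = &cidrGenerator{}
		s.generators[consumerID] = g
	}
	return g
}
```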
Multi-Instance Deployment Strategies
Option 1: Session Affinity (Recommended)
```
Load Balancer (with sticky sessions based on source IP)
├── input_service instance 1
├── input_service instance 2
└── input_service instance 3
```
- Configure load balancer to route each ping worker to the same input_service instance
- Ensures per-consumer state consistency
- Simple to implement and maintain
Option 2: Broadcast Hop Submissions
output_service ---> POST /hops ---> ALL input_service instances
Modify output_service to POST discovered hops to all input_service instances instead of just one. This ensures hop deduplication works across instances.
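A minimal sketch of that change on the output_service side. It posts to the `/hops` path shown above on every configured instance; the payload handling and error collection are assumptions for illustration:

```go
// Sketch of the "broadcast" option: POST each discovered hop to every
// configured input_service instance instead of a single URL.
package broadcast

import (
	"bytes"
	"fmt"
	"net/http"
	"time"
)

var httpClient = &http.Client{Timeout: 5 * time.Second}

// BroadcastHop sends the same JSON payload to all instances; failures are
// collected so one unreachable instance does not block the others.
func BroadcastHop(instanceURLs []string, hopJSON []byte) []error {
	var errs []error
	for _, base := range instanceURLs {
		resp, err := httpClient.Post(base+"/hops", "application/json", bytes.NewReader(hopJSON))
		if err != nil {
			errs = append(errs, fmt.Errorf("%s: %w", base, err))
			continue
		}
		resp.Body.Close()
		if resp.StatusCode >= 300 {
			errs = append(errs, fmt.Errorf("%s: unexpected status %d", base, resp.StatusCode))
		}
	}
	return errs
}
```

Even this naive fan-out keeps every instance's `globalSeen` map populated; per-instance retries or a queue would harden it further.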
Option 3: Shared Deduplication Backend (Future Enhancement)
Implement Redis or database-backed globalSeen storage so all instances share deduplication state.
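If this enhancement is implemented, the core can be as small as an atomic SETNX per hop. A sketch using github.com/redis/go-redis/v9; the key prefix and TTL are assumptions, not part of the current code:

```go
// Sketch of Redis-backed deduplication shared by all input_service instances.
package dedup

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

type sharedDeduper struct {
	rdb *redis.Client
	ttl time.Duration
}

// MarkNew returns true only for the first instance that records this hop;
// SETNX is atomic, so all instances share one view of what has been seen.
func (d *sharedDeduper) MarkNew(ctx context.Context, hopIP string) (bool, error) {
	return d.rdb.SetNX(ctx, "globalSeen:"+hopIP, 1, d.ttl).Result()
}
```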
Known Limitations
- Hop deduplication is instance-local: Different instances may serve duplicate hops if output_service sends hops to only one instance
- Per-consumer state is instance-local: If a consumer switches instances, it gets a new generator and starts from the beginning
- CIDR files must be present on all instances: The `cloud-provider-ip-addresses/` directory must exist on each instance
Deployment Example
```bash
# Instance 1
./http_input_service &

# Instance 2 (different port)
PORT=8081 ./http_input_service &
```

```nginx
# Load balancer (nginx example)
upstream input_service {
    ip_hash;  # Session affinity
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
}
```
Output Service (output_service/)
Multi-Instance Readiness: ✅ Fully Ready
How It Works
- Each instance maintains its own SQLite database
- Databases are independent and can be aggregated later
- `sentHops` deduplication is instance-local with a 24-hour TTL (sketched below)
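For reference, a time-bounded seen-set along these lines is enough to get the behavior described; this is an illustrative sketch, not the actual `sentHops` implementation:

```go
// Minimal instance-local deduplication with a TTL (24 hours in this system).
package output

import (
	"sync"
	"time"
)

type ttlSet struct {
	mu      sync.Mutex
	expires map[string]time.Time
	ttl     time.Duration
}

func newTTLSet(ttl time.Duration) *ttlSet {
	return &ttlSet{expires: make(map[string]time.Time), ttl: ttl}
}

// ShouldSend reports whether the hop has not been forwarded within the TTL
// and records it; entries are only visible inside this one instance.
func (s *ttlSet) ShouldSend(hop string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	now := time.Now()
	if exp, ok := s.expires[hop]; ok && now.Before(exp) {
		return false
	}
	s.expires[hop] = now.Add(s.ttl)
	return true
}
```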
Multi-Instance Deployment
ping_service workers ---> Load Balancer ---> output_service instances
- No session affinity required
- Each instance stores results independently
- Use the `/dump` endpoint to collect databases from all instances for aggregation
Aggregation Strategy
```bash
# Collect databases from all instances
curl http://instance1:8091/dump > instance1.db
curl http://instance2:8091/dump > instance2.db
curl http://instance3:8091/dump > instance3.db

# Merge using sqlite3
# (merged.db must already contain the ping_results and traceroute_hops tables,
#  e.g. by copying one instance's schema first)
sqlite3 merged.db <<EOF
ATTACH 'instance1.db' AS db1;
ATTACH 'instance2.db' AS db2;
ATTACH 'instance3.db' AS db3;
INSERT INTO ping_results SELECT * FROM db1.ping_results;
INSERT INTO ping_results SELECT * FROM db2.ping_results;
INSERT INTO ping_results SELECT * FROM db3.ping_results;
INSERT INTO traceroute_hops SELECT * FROM db1.traceroute_hops;
INSERT INTO traceroute_hops SELECT * FROM db2.traceroute_hops;
INSERT INTO traceroute_hops SELECT * FROM db3.traceroute_hops;
EOF
```
Deployment Example
```bash
# Instance 1
./output_service --port=8081 --health-port=8091 --db-dir=/data/output1 &

# Instance 2
./output_service --port=8082 --health-port=8092 --db-dir=/data/output2 &

# Instance 3
./output_service --port=8083 --health-port=8093 --db-dir=/data/output3 &
```
Ping Service (ping_service/)
Multi-Instance Readiness: ✅ Fully Ready
How It Works
- Designed from the ground up for distributed operation
- Each worker independently polls input_service and submits results
- Cooldown cache is instance-local (intentional: distributed workers coordinate via the cooldown duration; see the sketch below)
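A rough sketch of such a cooldown cache; field names and the check are illustrative, not the ping_service code:

```go
// Each worker remembers when it last probed a target and skips it until the
// cooldown has elapsed. Other workers keep their own caches; coordination
// happens only through the shared cooldown duration.
package worker

import (
	"sync"
	"time"
)

type cooldownCache struct {
	mu       sync.Mutex
	lastSent map[string]time.Time
	cooldown time.Duration
}

func newCooldownCache(cooldown time.Duration) *cooldownCache {
	return &cooldownCache{lastSent: make(map[string]time.Time), cooldown: cooldown}
}

// Allow reports whether this worker may probe the target now and records it.
func (c *cooldownCache) Allow(target string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	if t, ok := c.lastSent[target]; ok && time.Since(t) < c.cooldown {
		return false // this worker probed it recently
	}
	c.lastSent[target] = time.Now()
	return true
}
```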
Multi-Instance Deployment
```
input_service <--- ping_service workers (many instances)
                          |
                          v
                   output_service
```
- Deploy as many workers as needed across different networks/locations
- Workers can run on Raspberry Pis, VPS, cloud instances, etc.
- No coordination required between workers
Deployment Example
# Worker 1 (local network)
./ping_service -config config.yaml &
# Worker 2 (VPS)
ssh vps1 "./ping_service -config config.yaml" &
# Worker 3 (different geographic location)
ssh vps2 "./ping_service -config config.yaml" &
Manager (manager/)
Multi-Instance Readiness: ⚠️ Requires Configuration
How It Works
- Session store is in-memory (not shared across instances)
- User store uses file-based storage with file locking (multi-instance safe as of latest update)
- Worker registry is instance-local
Multi-Instance Deployment Strategies
Option 1: Active-Passive with Failover
```
Load Balancer (active-passive)
├── manager instance 1 (active)
└── manager instance 2 (standby)
```
- Only one instance active at a time
- Failover on primary failure
- Simplest approach, no session coordination needed
Option 2: Shared Session Store (Recommended for Active-Active)
Implement Redis- or database-backed session storage to enable true active-active multi-instance deployment.
Required Changes for Active-Active:
```go
// Replace in-memory sessions (main.go:31-34) with Redis
var sessions = redis.NewSessionStore(redisClient)
```
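For illustration, one possible shape of such a store, sketched with github.com/redis/go-redis/v9; the interface, key prefix, and TTL handling are assumptions, not existing manager code:

```go
// Shared session store: any manager instance behind the load balancer sees
// the same logged-in users.
package session

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

type Store struct {
	rdb *redis.Client
	ttl time.Duration
}

func NewStore(rdb *redis.Client, ttl time.Duration) *Store {
	return &Store{rdb: rdb, ttl: ttl}
}

// Set writes the session with an expiry; Get resolves a session ID back to
// the username (redis.Nil is returned when the session does not exist).
func (s *Store) Set(ctx context.Context, sessionID, username string) error {
	return s.rdb.Set(ctx, "session:"+sessionID, username, s.ttl).Err()
}

func (s *Store) Get(ctx context.Context, sessionID string) (string, error) {
	return s.rdb.Get(ctx, "session:"+sessionID).Result()
}
```

Sessions stored this way survive an instance restart and are visible to every manager instance, which is what makes active-active operation possible.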
Current Limitations
- Sessions are not shared: User authenticated on instance A cannot access instance B
- Worker registry is not shared: Each instance maintains its own worker list
- dy.fi updates may conflict: Multiple instances updating the same domain simultaneously
User Store File Locking (✅ Fixed)
As of the latest update, the user store uses file locking to prevent race conditions:
- Shared locks for reads (multiple readers allowed)
- Exclusive locks for writes (blocks all readers and writers)
- Atomic write-then-rename prevents corruption
- Safe for multi-instance deployment when instances share the same filesystem (see the sketch below)
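The pattern above can be sketched as follows; the sidecar lock file, paths, and JSON encoding are illustrative, and the actual user store may differ in detail:

```go
// Exclusive flock around a write-temp-then-rename update.
package userstore

import (
	"encoding/json"
	"os"
	"syscall"
)

func saveUsers(path string, users map[string]string) error {
	// Lock a sidecar lock file exclusively so concurrent writers (including
	// other instances on the same filesystem) serialize here.
	lock, err := os.OpenFile(path+".lock", os.O_CREATE|os.O_RDWR, 0o600)
	if err != nil {
		return err
	}
	defer lock.Close()
	if err := syscall.Flock(int(lock.Fd()), syscall.LOCK_EX); err != nil {
		return err
	}
	defer syscall.Flock(int(lock.Fd()), syscall.LOCK_UN)

	data, err := json.Marshal(users)
	if err != nil {
		return err
	}
	// Write to a temp file, then rename: readers never observe a half-written file.
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}
```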
Deployment Example (Active-Passive)
```bash
# Primary instance
./manager --port=8080 --domain=manager.dy.fi &

# Secondary instance (standby)
MANAGER_PORT=8081 ./manager &

# Load balancer: health-check both instances, route traffic to the active one only
```
General Multi-Instance Recommendations
Health Checks
All services expose `/health` and `/ready` endpoints (a minimal handler sketch follows this list). Configure your load balancer to:
- Route traffic only to healthy instances
- Remove failed instances from rotation automatically
- Monitor the `/metrics` endpoint for Prometheus integration
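As a rough illustration of the liveness/readiness split, the handlers usually look like this; the wiring and readiness criteria are assumptions, not the services' actual code:

```go
// /health answers as long as the process is up; /ready only answers 200 when
// dependencies are usable, so load balancers route traffic on /ready.
package health

import "net/http"

func Register(mux *http.ServeMux, ready func() bool) {
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		if !ready() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}
```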
Monitoring
Add `instance_id` labels to metrics for per-instance monitoring:
```go
// Recommended enhancement for all services
var instanceID, _ = os.Hostname()
```
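A hedged sketch of what that could look like with github.com/prometheus/client_golang, attaching the instance as a constant label; the metric name is made up for illustration:

```go
// Every sample carries instance_id, so dashboards can break results down
// per instance without any relabeling at scrape time.
package metrics

import (
	"os"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var instanceID, _ = os.Hostname()

var resultsIngested = promauto.NewCounter(prometheus.CounterOpts{
	Name:        "results_ingested_total",
	Help:        "Results accepted by this instance.",
	ConstLabels: prometheus.Labels{"instance_id": instanceID},
})
```

Note that Prometheus also attaches an `instance` label per scrape target, which may already be sufficient depending on how targets are configured.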
File Locking
Services that write to shared storage should use file locking (as the manager user store does) to prevent corruption:
```go
syscall.Flock(fd, syscall.LOCK_EX) // Exclusive lock
syscall.Flock(fd, syscall.LOCK_SH) // Shared lock
```
Network Considerations
- Latency: Place input_service close to ping workers to minimize polling latency
- Bandwidth: output_service should have sufficient bandwidth for result ingestion
- NAT Traversal: Use manager gateway mode for ping workers behind NAT
Troubleshooting Multi-Instance Deployments
Input Service: Duplicate Hops Served
Symptom: Same hop appears multiple times in different workers
Cause: Hop deduplication is instance-local
Solution: Implement session affinity or broadcast hop submissions
Manager: Sessions Lost After Reconnect
Symptom: User logged out when load balancer switches instances
Cause: Sessions are in-memory, not shared
Solution: Use session affinity in load balancer or implement shared session store
Output Service: Database Conflicts
Symptom: Database file corruption or lock timeouts
Cause: Multiple instances writing to same database file
Solution: Each instance MUST have its own `--db-dir`; aggregate the databases later
Ping Service: Excessive Pinging
Symptom: Same IP pinged too frequently
Cause: Too many workers with short cooldown period
Solution: Increase `cooldown_minutes` in `config.yaml`
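As a rough rule of thumb: because each worker's cooldown cache is instance-local, a given IP can in the worst case be probed once per cooldown window by every worker, i.e. up to N probes per `cooldown_minutes` with N workers. Scaling the cooldown roughly in proportion to the worker count keeps the aggregate rate per target about constant (for example, 10 workers with a 60-minute cooldown allow at most 10 probes of one IP per hour).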
Production Deployment Checklist
- Input service: Configure session affinity or hop broadcast
- Output service: Each instance has a unique `--db-dir`
- Ping service: Cooldown duration accounts for total worker count
- Manager: Decide active-passive or implement shared sessions
- All services: Health check endpoints configured in load balancer
- All services: Metrics exported to monitoring system
- All services: Logs aggregated to central logging system
- File-based state: Shared filesystem or backup/sync strategy
- Database rotation: Automated collection of output service dumps
Future Enhancements
High Priority
- Shared session store for manager (Redis/database)
- Shared hop deduplication for input_service (Redis)
- Distributed worker coordination for ping_service cooldowns
Medium Priority
- Instance ID labels in metrics for better observability
- Graceful shutdown coordination to prevent data loss
- Health check improvements to verify actual functionality
Low Priority
- Automated database aggregation for output_service
- Service mesh integration (Consul, etcd) for discovery
- Horizontal autoscaling based on load metrics
Summary Table
| Service | Multi-Instance Ready | Session Affinity Needed | Shared Storage Needed | Notes |
|---|---|---|---|---|
| input_service | ⚠️ Partial | ✅ Yes (recommended) | ❌ No | Hop dedup is instance-local |
| output_service | ✅ Full | ❌ No | ❌ No | Each instance has own DB |
| ping_service | ✅ Full | ❌ No | ❌ No | Fully distributed by design |
| manager | ⚠️ Requires config | ✅ Yes (sessions) | ✅ Yes (user store) | Sessions in-memory; user store file-locked |
For questions or issues with multi-instance deployments, refer to the service-specific README files or open an issue in the project repository.