
Multi-Instance Deployment Guide

This document provides guidance for deploying multiple instances of each service for high availability and scalability.

Overview

All services in this distributed network mapping system are designed to support multi-instance deployments, but each has specific considerations and limitations.


Input Service (input_service/)

Multi-Instance Readiness: ⚠️ Partially Ready

How It Works

  • Each instance maintains its own per-consumer state and CIDR generators
  • State is stored locally in progress_state/ directory
  • Global hop deduplication (globalSeen map) is instance-local

Multi-Instance Deployment Strategies

Option 1: Session Affinity (Recommended)

Load Balancer (with sticky sessions based on source IP)
    ├── input_service instance 1
    ├── input_service instance 2
    └── input_service instance 3
  • Configure load balancer to route each ping worker to the same input_service instance
  • Ensures per-consumer state consistency
  • Simple to implement and maintain

Option 2: Broadcast Hop Submissions

output_service ---> POST /hops ---> ALL input_service instances

Modify output_service to POST discovered hops to all input_service instances instead of just one. This ensures hop deduplication works across instances.
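
A minimal sketch of this approach, assuming a hand-configured list of instance URLs and a JSON hop payload (the exact payload shape output_service uses is not shown in this guide):

// Sketch only: the instance list and the hop field name are assumptions.
import (
    "bytes"
    "encoding/json"
    "log"
    "net/http"
)

var inputServiceURLs = []string{ // hypothetical instance list
    "http://input1:8080",
    "http://input2:8080",
    "http://input3:8080",
}

type hop struct {
    IP string `json:"ip"` // assumed field name
}

// broadcastHop POSTs a discovered hop to every input_service instance so
// each one can update its local globalSeen map.
func broadcastHop(h hop) {
    body, _ := json.Marshal(h)
    for _, base := range inputServiceURLs {
        resp, err := http.Post(base+"/hops", "application/json", bytes.NewReader(body))
        if err != nil {
            // A failed POST only affects that instance's local deduplication.
            log.Printf("hop broadcast to %s failed: %v", base, err)
            continue
        }
        resp.Body.Close()
    }
}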

Option 3: Shared Deduplication Backend (Future Enhancement)

Implement Redis or database-backed globalSeen storage so all instances share deduplication state.
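
One possible shape for that backend, using the go-redis client and SETNX so that checking and recording a hop is a single atomic step across all instances (key prefix, TTL, and Redis address are assumptions):

// Sketch only: replaces the in-memory globalSeen map with a shared Redis key set.
import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

var rdb = redis.NewClient(&redis.Options{Addr: "redis:6379"}) // assumed address

// seenBefore atomically records the hop and reports whether any instance has
// already seen it. SETNX guarantees only one instance wins a race on the same IP.
func seenBefore(ctx context.Context, ip string) (bool, error) {
    created, err := rdb.SetNX(ctx, "hop_seen:"+ip, 1, 24*time.Hour).Result()
    if err != nil {
        return false, err
    }
    return !created, nil // created == true means this is the first sighting
}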

Known Limitations

  • Hop deduplication is instance-local: Different instances may serve duplicate hops if output_service sends hops to only one instance
  • Per-consumer state is instance-local: If a consumer switches instances, it gets a new generator and starts from the beginning
  • CIDR files must be present on all instances: The cloud-provider-ip-addresses/ directory must exist on each instance

Deployment Example

# Instance 1
./http_input_service &

# Instance 2 (different port)
PORT=8081 ./http_input_service &

# Load balancer (nginx example)
upstream input_service {
    ip_hash;  # Session affinity
    server 127.0.0.1:8080;
    server 127.0.0.1:8081;
}

Output Service (output_service/)

Multi-Instance Readiness: ✅ Fully Ready

How It Works

  • Each instance maintains its own SQLite database
  • Databases are independent and can be aggregated later
  • sentHops deduplication is instance-local with 24-hour TTL

Multi-Instance Deployment

ping_service workers ---> Load Balancer ---> output_service instances
  • No session affinity required
  • Each instance stores results independently
  • Use /dump endpoint to collect databases from all instances for aggregation

Aggregation Strategy

# Collect databases from all instances
curl http://instance1:8091/dump > instance1.db
curl http://instance2:8091/dump > instance2.db
curl http://instance3:8091/dump > instance3.db

# Merge using sqlite3
sqlite3 merged.db <<EOF
ATTACH 'instance1.db' AS db1;
ATTACH 'instance2.db' AS db2;
ATTACH 'instance3.db' AS db3;

INSERT INTO ping_results SELECT * FROM db1.ping_results;
INSERT INTO ping_results SELECT * FROM db2.ping_results;
INSERT INTO ping_results SELECT * FROM db3.ping_results;

INSERT INTO traceroute_hops SELECT * FROM db1.traceroute_hops;
INSERT INTO traceroute_hops SELECT * FROM db2.traceroute_hops;
INSERT INTO traceroute_hops SELECT * FROM db3.traceroute_hops;
EOF

Deployment Example

# Instance 1
./output_service --port=8081 --health-port=8091 --db-dir=/data/output1 &

# Instance 2
./output_service --port=8082 --health-port=8092 --db-dir=/data/output2 &

# Instance 3
./output_service --port=8083 --health-port=8093 --db-dir=/data/output3 &

Ping Service (ping_service/)

Multi-Instance Readiness: ✅ Fully Ready

How It Works

  • Designed from the ground up for distributed operation
  • Each worker independently polls input_service and submits results
  • Cooldown cache is instance-local (intentional: workers do not coordinate directly, so the cooldown duration must account for the total number of workers)

Multi-Instance Deployment

input_service <--- ping_service workers (many instances)
                         |
                         v
                  output_service
  • Deploy as many workers as needed across different networks/locations
  • Workers can run on Raspberry Pis, VPS, cloud instances, etc.
  • No coordination required between workers

Deployment Example

# Worker 1 (local network)
./ping_service -config config.yaml &

# Worker 2 (VPS)
ssh vps1 "./ping_service -config config.yaml" &

# Worker 3 (different geographic location)
ssh vps2 "./ping_service -config config.yaml" &

Manager (manager/)

Multi-Instance Readiness: ⚠️ Requires Configuration

How It Works

  • Session store is in-memory (not shared across instances)
  • User store uses file-based storage with file locking (multi-instance safe as of latest update)
  • Worker registry is instance-local

Multi-Instance Deployment Strategies

Option 1: Active-Passive with Failover

Load Balancer (active-passive)
    ├── manager instance 1 (active)
    └── manager instance 2 (standby)
  • Only one instance active at a time
  • Failover on primary failure
  • Simplest approach, no session coordination needed

Option 2: Shared Session Store (Recommended for Active-Active)

Implement Redis or database-backed session storage to enable true active-active multi-instance deployment.

Required Changes for Active-Active:

// Replace in-memory sessions (main.go:31-34) with Redis
var sessions = redis.NewSessionStore(redisClient)
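
The redis.NewSessionStore call above is illustrative; a minimal sketch of such a store using the go-redis client follows (the type and method names are assumptions, not the manager's actual session interface):

// Sketch only: a Redis-backed session store with a per-session TTL.
import (
    "context"
    "time"

    "github.com/redis/go-redis/v9"
)

type RedisSessionStore struct {
    rdb *redis.Client
    ttl time.Duration
}

func NewRedisSessionStore(addr string, ttl time.Duration) *RedisSessionStore {
    return &RedisSessionStore{
        rdb: redis.NewClient(&redis.Options{Addr: addr}),
        ttl: ttl,
    }
}

// Set maps a session token to a username and refreshes its expiry.
func (s *RedisSessionStore) Set(ctx context.Context, token, user string) error {
    return s.rdb.Set(ctx, "session:"+token, user, s.ttl).Err()
}

// Get returns the username for a token, or "" if the session is unknown.
func (s *RedisSessionStore) Get(ctx context.Context, token string) (string, error) {
    user, err := s.rdb.Get(ctx, "session:"+token).Result()
    if err == redis.Nil {
        return "", nil
    }
    return user, err
}

// Delete removes a session on logout.
func (s *RedisSessionStore) Delete(ctx context.Context, token string) error {
    return s.rdb.Del(ctx, "session:"+token).Err()
}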

Current Limitations

  • Sessions are not shared: User authenticated on instance A cannot access instance B
  • Worker registry is not shared: Each instance maintains its own worker list
  • dy.fi updates may conflict: Multiple instances updating the same domain simultaneously

User Store File Locking (✅ Fixed)

As of the latest update, the user store uses file locking to prevent race conditions:

  • Shared locks for reads (multiple readers allowed)
  • Exclusive locks for writes (blocks all readers and writers)
  • Atomic write-then-rename prevents corruption
  • Safe for multi-instance deployment when instances share the same filesystem

Deployment Example (Active-Passive)

# Primary instance
./manager --port=8080 --domain=manager.dy.fi &

# Secondary instance (standby)
MANAGER_PORT=8081 ./manager &

# Load balancer health-checks both instances and routes traffic to the active one only

General Multi-Instance Recommendations

Health Checks

All services expose /health and /ready endpoints. Configure your load balancer to:

  • Route traffic only to healthy instances
  • Remove failed instances from rotation automatically
  • Monitor /metrics endpoint for Prometheus integration

Monitoring

Add instance_id labels to metrics for per-instance monitoring:

// Recommended enhancement for all services
var instanceID, _ = os.Hostname()  // os.Hostname returns (string, error)
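
If the services use the Prometheus Go client, one way to attach the label is ConstLabels, so every series exported by the instance carries it (the metric name below is hypothetical):

import (
    "os"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
)

var instanceID, _ = os.Hostname()

// Every series from this counter carries instance_id, so dashboards can
// break metrics down per instance.
var requestsTotal = promauto.NewCounter(prometheus.CounterOpts{
    Name:        "requests_total", // hypothetical metric name
    Help:        "Total requests handled by this instance.",
    ConstLabels: prometheus.Labels{"instance_id": instanceID},
})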

File Locking

Services that write to shared storage should use file locking (like the manager user store) to prevent corruption:

syscall.Flock(fd, syscall.LOCK_EX)  // Exclusive lock
syscall.Flock(fd, syscall.LOCK_SH)  // Shared lock
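
A minimal helper built on those calls (Unix-only; the actual manager user store code may be structured differently):

import (
    "os"
    "syscall"
)

// withExclusiveLock opens path, takes an exclusive flock, runs fn, then
// releases the lock when fn returns.
func withExclusiveLock(path string, fn func(*os.File) error) error {
    f, err := os.OpenFile(path, os.O_RDWR|os.O_CREATE, 0o600)
    if err != nil {
        return err
    }
    defer f.Close()

    // Blocks until any other holder releases the lock.
    if err := syscall.Flock(int(f.Fd()), syscall.LOCK_EX); err != nil {
        return err
    }
    defer syscall.Flock(int(f.Fd()), syscall.LOCK_UN)

    return fn(f)
}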

Network Considerations

  • Latency: Place input_service close to ping workers to minimize polling latency
  • Bandwidth: output_service should have sufficient bandwidth for result ingestion
  • NAT Traversal: Use manager gateway mode for ping workers behind NAT

Troubleshooting Multi-Instance Deployments

Input Service: Duplicate Hops Served

Symptom: Same hop appears multiple times in different workers
Cause: Hop deduplication is instance-local
Solution: Implement session affinity or broadcast hop submissions

Manager: Sessions Lost After Reconnect

Symptom: User logged out when load balancer switches instances
Cause: Sessions are in-memory, not shared
Solution: Use session affinity in load balancer or implement shared session store

Output Service: Database Conflicts

Symptom: Database file corruption or lock timeouts
Cause: Multiple instances writing to same database file
Solution: Each instance MUST have its own --db-dir, then aggregate later

Ping Service: Excessive Pinging

Symptom: Same IP pinged too frequently
Cause: Too many workers with short cooldown period
Solution: Increase cooldown_minutes in config.yaml
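
A rough sizing rule, assuming workers draw from overlapping target ranges: each worker probes an IP at most once per cooldown_minutes, so N workers can probe it up to N times in that window. To keep the average interval at or above T minutes, set cooldown_minutes to at least N × T; for example, 10 workers and a target of at most one probe per IP per hour means cooldown_minutes of at least 600.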


Production Deployment Checklist

  • Input service: Configure session affinity or hop broadcast
  • Output service: Each instance has unique --db-dir
  • Ping service: Cooldown duration accounts for total worker count
  • Manager: Decide active-passive or implement shared sessions
  • All services: Health check endpoints configured in load balancer
  • All services: Metrics exported to monitoring system
  • All services: Logs aggregated to central logging system
  • File-based state: Shared filesystem or backup/sync strategy
  • Database rotation: Automated collection of output service dumps

Future Enhancements

High Priority

  1. Shared session store for manager (Redis/database)
  2. Shared hop deduplication for input_service (Redis)
  3. Distributed worker coordination for ping_service cooldowns

Medium Priority

  1. Instance ID labels in metrics for better observability
  2. Graceful shutdown coordination to prevent data loss
  3. Health check improvements to verify actual functionality

Low Priority

  1. Automated database aggregation for output_service
  2. Service mesh integration (Consul, etcd) for discovery
  3. Horizontal autoscaling based on load metrics

Summary Table

| Service | Multi-Instance Ready | Session Affinity Needed | Shared Storage Needed | Notes |
|---------|----------------------|-------------------------|-----------------------|-------|
| input_service | ⚠️ Partial | Yes (recommended) | No | Hop dedup is instance-local |
| output_service | ✅ Full | No | No | Each instance has own DB |
| ping_service | ✅ Full | No | No | Fully distributed by design |
| manager | ⚠️ Requires config | Yes (sessions) | Yes (user store) | Sessions in-memory; user store file-locked |

For questions or issues with multi-instance deployments, refer to the service-specific README files or open an issue in the project repository.