ping_service/output_service/README.md
2026-01-08 12:11:26 +02:00

# Output Service
HTTP service that receives ping and traceroute results from distributed `ping_service` nodes, stores them in SQLite databases with automatic rotation, extracts intermediate hops from traceroute data, and feeds them back to `input_service`.
## Purpose
- **Data Collection**: Store ping results and traceroute paths from multiple ping_service instances
- **Hop Discovery**: Extract intermediate hop IPs from traceroute data
- **Feedback Loop**: Send discovered hops to input_service to grow the target pool organically
- **Data Management**: Automatic database rotation and retention policy
- **Observability**: Expose metrics and statistics for monitoring
## Features
- **Multi-Instance Ready**: Each instance maintains its own SQLite database
- **Automatic Rotation**: Databases rotate after 7 days or on reaching 100 MB, whichever comes first (both thresholds are configurable)
- **Retention Policy**: Keeps the 5 most recent database files by default and deletes older ones automatically
- **Hop Deduplication**: Tracks sent hops to minimize duplicate network traffic to input_service
- **Manual Operations**: API endpoints for manual rotation and database dumps
- **Health Monitoring**: Prometheus metrics, stats, and health checks
## Requirements
- Go 1.25+
- SQLite3 (via go-sqlite3 driver)
## Building
```bash
cd output_service
go build -o output_service main.go
```
## Usage
### Basic
```bash
./output_service
```
By default, the service listens on port 8081 for results and on port 8091 for health checks.
### With Custom Configuration
```bash
./output_service \
--port=8082 \
--health-port=8092 \
--input-url=http://input-service:8080/hops \
--db-dir=/var/lib/output_service \
--max-size-mb=200 \
--rotation-days=14 \
--keep-files=10 \
--verbose
```
### Command Line Flags
| Flag | Default | Description |
|------|---------|-------------|
| `--port` | 8081 | Port for receiving results |
| `--health-port` | 8091 | Port for health/metrics endpoints |
| `--input-url` | `http://localhost:8080/hops` | Input service URL for hop submission |
| `--db-dir` | `./output_data` | Directory for database files |
| `--max-size-mb` | 100 | Max database size (MB) before rotation |
| `--rotation-days` | 7 | Rotate database after N days |
| `--keep-files` | 5 | Number of database files to retain |
| `-v, --verbose` | false | Enable verbose logging |
| `--version` | - | Show version |
| `--help` | - | Show help |
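The flag table above can be sketched with Go's standard `flag` package, which accepts both `-name` and `--name` spellings. This is a minimal illustration, not the service's actual code: the `Config` struct, its field names, and the `parseFlags` helper are hypothetical, and the real service may use a different flag library.

```go
package main

import (
	"flag"
	"fmt"
)

// Config mirrors the documented flags; the struct and field names are hypothetical.
type Config struct {
	Port         int
	HealthPort   int
	InputURL     string
	DBDir        string
	MaxSizeMB    int
	RotationDays int
	KeepFiles    int
	Verbose      bool
}

// parseFlags parses args (without the program name) using the documented defaults.
func parseFlags(args []string) (Config, error) {
	var c Config
	fs := flag.NewFlagSet("output_service", flag.ContinueOnError)
	fs.IntVar(&c.Port, "port", 8081, "port for receiving results")
	fs.IntVar(&c.HealthPort, "health-port", 8091, "port for health/metrics endpoints")
	fs.StringVar(&c.InputURL, "input-url", "http://localhost:8080/hops", "input service URL for hop submission")
	fs.StringVar(&c.DBDir, "db-dir", "./output_data", "directory for database files")
	fs.IntVar(&c.MaxSizeMB, "max-size-mb", 100, "max database size (MB) before rotation")
	fs.IntVar(&c.RotationDays, "rotation-days", 7, "rotate database after N days")
	fs.IntVar(&c.KeepFiles, "keep-files", 5, "number of database files to retain")
	// Register both spellings of the verbose flag against the same variable.
	fs.BoolVar(&c.Verbose, "verbose", false, "enable verbose logging")
	fs.BoolVar(&c.Verbose, "v", false, "shorthand for --verbose")
	if err := fs.Parse(args); err != nil {
		return Config{}, err
	}
	return c, nil
}

func main() {
	c, err := parseFlags([]string{"--port=8082", "--keep-files=10", "-v"})
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Port, c.HealthPort, c.KeepFiles, c.Verbose) // 8082 8091 10 true
}
```

Flags that are not passed keep the defaults from the table, so `--health-port` stays at 8091 in this run.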
## API Endpoints
### Main Service (Port 8081)
#### `POST /results`
Receives ping results, including any embedded traceroute data, from ping_service nodes.
**Request Body**: JSON array of ping results
```json
[
{
"ip": "8.8.8.8",
"sent": 4,
"received": 4,
"packet_loss": 0,
"avg_rtt": 15000000,
"timestamp": "2026-01-07T22:30:00Z",
"traceroute": {
"method": "icmp",
"completed": true,
"hops": [
{"ttl": 1, "ip": "192.168.1.1", "rtt": 2000000},
{"ttl": 2, "ip": "10.0.0.1", "rtt": 5000000},
{"ttl": 3, "ip": "8.8.8.8", "rtt": 15000000}
]
}
}
]
```
**Response**:
```json
{
"status": "ok",
"received": 1
}
```
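For clients or tooling written in Go, the request body above maps naturally onto a few structs. The following is a sketch only: the type names and the `decodeResults` helper are assumptions made for illustration, and only the JSON field names come from the example payload.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// Hop mirrors one entry of the "hops" array in the request example.
type Hop struct {
	TTL int    `json:"ttl"`
	IP  string `json:"ip"`
	RTT int64  `json:"rtt"` // nanoseconds
}

// Traceroute mirrors the "traceroute" object.
type Traceroute struct {
	Method    string `json:"method"`
	Completed bool   `json:"completed"`
	Hops      []Hop  `json:"hops"`
}

// PingResult mirrors one element of the top-level JSON array.
type PingResult struct {
	IP         string      `json:"ip"`
	Sent       int         `json:"sent"`
	Received   int         `json:"received"`
	PacketLoss float64     `json:"packet_loss"`
	AvgRTT     int64       `json:"avg_rtt"` // nanoseconds
	Timestamp  time.Time   `json:"timestamp"`
	Traceroute *Traceroute `json:"traceroute,omitempty"`
}

// decodeResults parses a /results request body into typed values.
func decodeResults(body []byte) ([]PingResult, error) {
	var results []PingResult
	if err := json.Unmarshal(body, &results); err != nil {
		return nil, err
	}
	return results, nil
}

func main() {
	body := []byte(`[{"ip":"8.8.8.8","sent":4,"received":4,"packet_loss":0,
		"avg_rtt":15000000,"timestamp":"2026-01-07T22:30:00Z",
		"traceroute":{"method":"icmp","completed":true,
		"hops":[{"ttl":1,"ip":"192.168.1.1","rtt":2000000}]}}]`)
	results, err := decodeResults(body)
	if err != nil {
		panic(err)
	}
	r := results[0]
	// time.Duration counts nanoseconds, so the RTT converts directly.
	fmt.Println(r.IP, time.Duration(r.AvgRTT), len(r.Traceroute.Hops)) // 8.8.8.8 15ms 1
}
```

Because RTT values are plain nanosecond counts, converting them to `time.Duration` gives readable output without any scaling.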
#### `POST /rotate`
Manually trigger database rotation.
**Response**:
```json
{
"status": "rotated",
"file": "results_2026-01-07_22-30-45.db"
}
```
#### `GET /dump`
Download current SQLite database file.
**Response**: Binary SQLite database file
### Health Service (Port 8091)
#### `GET /health`
Overall health status and statistics.
**Response**:
```json
{
"status": "healthy",
"version": "0.0.1",
"uptime": "2h15m30s",
"stats": {
"total_results": 15420,
"successful_pings": 14890,
"failed_pings": 530,
"hops_discovered": 2341,
"hops_sent": 2341,
"last_result_time": "2026-01-07T22:30:15Z",
"current_db_file": "results_2026-01-07_00-00-00.db",
"current_db_size": 52428800,
"last_rotation": "2026-01-07T00:00:00Z"
}
}
```
#### `GET /ready`
Readiness check (verifies database connectivity).
**Response**: `200 OK` if ready, `503 Service Unavailable` if not
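A readiness handler of this kind can be sketched as follows. This is an illustration under assumptions, not the service's implementation: the `pinger` interface, `readyHandler`, and the 2-second ping timeout are hypothetical. `*sql.DB` satisfies `pinger` through its `PingContext` method, and a fake stands in for it here so the example runs without a database driver.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// pinger is satisfied by *sql.DB; the interface keeps this sketch driver-free.
type pinger interface {
	PingContext(ctx context.Context) error
}

// readyHandler answers 200 when the database responds to a ping, 503 otherwise.
func readyHandler(db pinger) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// The 2-second budget is an assumed value, not taken from the service.
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		if err := db.PingContext(ctx); err != nil {
			http.Error(w, "database unavailable", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}

// fakeDB stands in for *sql.DB in this example only.
type fakeDB struct{ err error }

func (f fakeDB) PingContext(context.Context) error { return f.err }

var errDown = errors.New("database closed")

// probe runs the handler once in memory and returns the HTTP status code.
func probe(db pinger) int {
	rec := httptest.NewRecorder()
	readyHandler(db)(rec, httptest.NewRequest(http.MethodGet, "/ready", nil))
	return rec.Code
}

func main() {
	fmt.Println(probe(fakeDB{}))             // 200
	fmt.Println(probe(fakeDB{err: errDown})) // 503
}
```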
#### `GET /metrics`
Prometheus-compatible metrics.
**Response** (text/plain):
```
# HELP output_service_total_results Total number of results processed
# TYPE output_service_total_results counter
output_service_total_results 15420
# HELP output_service_successful_pings Total successful pings
# TYPE output_service_successful_pings counter
output_service_successful_pings 14890
...
```
#### `GET /stats`
Detailed statistics in JSON format.
**Response**: Same as the `stats` object returned by `/health`
#### `GET /recent?limit=100&ip=8.8.8.8`
Query recent ping results.
**Query Parameters**:
- `limit` (optional): Max results to return (default 100, max 1000)
- `ip` (optional): Filter by specific IP address
**Response**:
```json
[
{
"id": 12345,
"ip": "8.8.8.8",
"sent": 4,
"received": 4,
"packet_loss": 0,
"avg_rtt": 15000000,
"timestamp": "2026-01-07T22:30:00Z"
}
]
```
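The `limit` rules (default 100, capped at 1000) can be sketched as a small parsing helper. The function name is hypothetical, and rejecting a non-numeric or non-positive `limit` is an assumption: this README does not say how the service handles invalid values.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

const (
	defaultLimit = 100
	maxLimit     = 1000
)

// parseRecentQuery extracts limit and ip from a /recent query string.
// A missing limit falls back to the default; values above the cap are clamped.
func parseRecentQuery(rawQuery string) (limit int, ip string, err error) {
	q, err := url.ParseQuery(rawQuery)
	if err != nil {
		return 0, "", err
	}
	limit = defaultLimit
	if raw := q.Get("limit"); raw != "" {
		n, convErr := strconv.Atoi(raw)
		if convErr != nil || n < 1 {
			// Assumed behavior: reject rather than silently ignore bad input.
			return 0, "", fmt.Errorf("invalid limit %q", raw)
		}
		limit = n
	}
	if limit > maxLimit {
		limit = maxLimit
	}
	return limit, q.Get("ip"), nil
}

func main() {
	limit, ip, err := parseRecentQuery("limit=5000&ip=8.8.8.8")
	if err != nil {
		panic(err)
	}
	fmt.Println(limit, ip) // 1000 8.8.8.8
}
```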
## Database Schema
### `ping_results`
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| ip | TEXT | Target IP address |
| sent | INTEGER | Packets sent |
| received | INTEGER | Packets received |
| packet_loss | REAL | Packet loss percentage |
| avg_rtt | INTEGER | Average RTT (nanoseconds) |
| timestamp | DATETIME | Ping timestamp |
| error | TEXT | Error message if failed |
| created_at | DATETIME | Record creation time |
**Indexes**: `ip`, `timestamp`
### `traceroute_results`
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| ping_result_id | INTEGER | Foreign key to ping_results |
| method | TEXT | Traceroute method (icmp/tcp) |
| completed | BOOLEAN | Whether trace completed |
| error | TEXT | Error message if failed |
### `traceroute_hops`
| Column | Type | Description |
|--------|------|-------------|
| id | INTEGER | Primary key |
| traceroute_id | INTEGER | Foreign key to traceroute_results |
| ttl | INTEGER | Time-to-live / hop number |
| ip | TEXT | Hop IP address |
| rtt | INTEGER | Round-trip time (nanoseconds) |
| timeout | BOOLEAN | Whether hop timed out |
**Indexes**: `ip` (for hop discovery)
## Database Rotation
Rotation triggers automatically when **either** condition is met:
- **Time**: Database age exceeds `--rotation-days` (default 7 days)
- **Size**: Database size exceeds `--max-size-mb` (default 100 MB)
Rotation process:
1. Close current database connection
2. Create new database with timestamp filename (`results_2026-01-07_22-30-45.db`)
3. Initialize schema in new database
4. Delete the oldest database files if the file count exceeds `--keep-files`
Manual rotation: `curl -X POST http://localhost:8081/rotate`
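The rotation rules and retention policy described in this section can be sketched as pure functions. All names here (`RotationPolicy`, `shouldRotate`, `dbFileName`, `filesToDelete`) are hypothetical, and treating 100 MB as 100 × 1024 × 1024 bytes is an assumption. Because the timestamped filenames sort lexicographically in chronological order, a plain string sort is enough to find the oldest files.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// RotationPolicy holds the thresholds described above.
type RotationPolicy struct {
	MaxAge    time.Duration
	MaxBytes  int64
	KeepFiles int
}

// defaultPolicy reflects the documented defaults (7 days, 100 MB, 5 files).
func defaultPolicy() RotationPolicy {
	return RotationPolicy{
		MaxAge:    7 * 24 * time.Hour,
		MaxBytes:  100 * 1024 * 1024, // assumes "MB" means MiB
		KeepFiles: 5,
	}
}

// shouldRotate reports whether either threshold has been exceeded.
func (p RotationPolicy) shouldRotate(age time.Duration, size int64) bool {
	return age > p.MaxAge || size > p.MaxBytes
}

// dbFileName builds a name in the results_YYYY-MM-DD_HH-MM-SS.db format.
func dbFileName(t time.Time) string {
	return "results_" + t.Format("2006-01-02_15-04-05") + ".db"
}

// filesToDelete returns the oldest names beyond the newest KeepFiles entries.
func (p RotationPolicy) filesToDelete(names []string) []string {
	sorted := append([]string(nil), names...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	if len(sorted) <= p.KeepFiles {
		return nil
	}
	return sorted[:len(sorted)-p.KeepFiles]
}

func main() {
	p := defaultPolicy()
	fmt.Println(p.shouldRotate(8*24*time.Hour, 0))       // true: older than 7 days
	fmt.Println(p.shouldRotate(time.Hour, 50*1024*1024)) // false: both under threshold
	fmt.Println(dbFileName(time.Date(2026, 1, 7, 22, 30, 45, 0, time.UTC)))
}
```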
## Hop Discovery and Feedback
1. **Extraction**: For each traceroute, extract non-timeout hop IPs
2. **Deduplication**: Track sent hops in memory to avoid re-sending
3. **Submission**: HTTP POST to input_service `/hops` endpoint:
```json
{
"hops": ["10.0.0.1", "172.16.5.3", "8.8.8.8"]
}
```
4. **Statistics**: Track `hops_discovered` and `hops_sent` metrics
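The extraction and deduplication steps above might look roughly like this in Go. The `HopTracker` type and its methods are hypothetical. The sketch keeps every non-timeout hop, including the final one, because this README does not say whether the destination is filtered out, and it marks hops as sent before any HTTP request succeeds, which a real implementation may prefer to do only after a successful POST.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Hop carries the two fields this sketch needs from a traceroute hop.
type Hop struct {
	IP      string
	Timeout bool
}

// HopTracker remembers which hop IPs were already handed out for submission.
type HopTracker struct {
	mu   sync.Mutex
	sent map[string]struct{}
}

func NewHopTracker() *HopTracker {
	return &HopTracker{sent: make(map[string]struct{})}
}

// NewHops returns the IPs of responding hops that have not been returned before.
func (t *HopTracker) NewHops(hops []Hop) []string {
	t.mu.Lock()
	defer t.mu.Unlock()
	var fresh []string
	for _, h := range hops {
		if h.Timeout || h.IP == "" {
			continue // skip hops that never answered
		}
		if _, seen := t.sent[h.IP]; seen {
			continue // already submitted earlier
		}
		t.sent[h.IP] = struct{}{}
		fresh = append(fresh, h.IP)
	}
	return fresh
}

// hopPayload builds the JSON body for POST /hops.
func hopPayload(ips []string) ([]byte, error) {
	return json.Marshal(map[string][]string{"hops": ips})
}

func main() {
	tracker := NewHopTracker()
	fresh := tracker.NewHops([]Hop{
		{IP: "10.0.0.1"},
		{Timeout: true},
		{IP: "172.16.5.3"},
		{IP: "10.0.0.1"}, // duplicate within the same trace
	})
	body, err := hopPayload(fresh)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // {"hops":["10.0.0.1","172.16.5.3"]}
}
```

Since the set lives in memory, it is lost on restart, so some hops will be resubmitted after the service comes back up.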
## Multi-Instance Deployment
Each output_service instance:
- Maintains its **own SQLite database** in `--db-dir`
- Manages its **own rotation schedule** independently
- Tracks its **own hop deduplication** (some duplicate hop submissions across instances are acceptable)
- Can receive results from **multiple ping_service nodes**
For central data aggregation:
- Use `/dump` endpoint to collect database files from all instances
- Merge databases offline for analysis/visualization
- Or use shared network storage for `--db-dir` (note that SQLite file locking is unreliable on many network filesystems, so instances should never share a database file)
## Integration with ping_service
Configure ping_service to send results to output_service:
**`config.yaml`** (ping_service):
```yaml
output_file: "http://output-service:8081/results"
```
## Integration with input_service
Output service expects input_service to have a `/hops` endpoint:
**Expected endpoint**: `POST /hops`
**Payload**:
```json
{
"hops": ["10.0.0.1", "172.16.5.3"]
}
```
## Monitoring
**Check health**:
```bash
curl http://localhost:8091/health
```
**View metrics**:
```bash
curl http://localhost:8091/metrics
```
**Query recent failures**:
```bash
curl 'http://localhost:8091/recent?limit=50' | jq '.[] | select(.error != null)'
```
**Download database backup**:
```bash
curl http://localhost:8081/dump -o backup.db
```
## Development Testing
Use the Python demo output server to see the example data format:
```bash
cd output_service
python3 http_ouput_demo.py # Note: file has typo in name
```
## Graceful Shutdown
Press `Ctrl+C` to trigger a graceful shutdown with a 10-second timeout.
The service will:
1. Stop accepting new requests
2. Finish processing in-flight requests
3. Close database connections cleanly
4. Exit
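The shutdown sequence above follows a common Go pattern built on `http.Server.Shutdown`, which stops accepting new connections and waits for in-flight requests until its context expires. The sketch below assumes that pattern; `serveUntil`, `newServer`, and `shutdownTimeout` are hypothetical names, and the service itself may be structured differently.

```go
package main

import (
	"context"
	"errors"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

// shutdownTimeout matches the 10s drain window described above.
const shutdownTimeout = 10 * time.Second

// newServer builds a server with an empty mux; real handlers would be registered here.
func newServer(addr string) *http.Server {
	return &http.Server{Addr: addr, Handler: http.NewServeMux()}
}

// serveUntil serves until stop is closed, then drains in-flight requests
// for at most timeout before returning.
func serveUntil(stop <-chan struct{}, srv *http.Server, timeout time.Duration) error {
	errCh := make(chan error, 1)
	go func() { errCh <- srv.ListenAndServe() }()

	select {
	case err := <-errCh:
		return err // the listener failed before any shutdown was requested
	case <-stop:
	}

	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		return err // timeout hit while requests were still in flight
	}
	// ListenAndServe returns ErrServerClosed after a clean Shutdown.
	if err := <-errCh; !errors.Is(err, http.ErrServerClosed) {
		return err
	}
	return nil
}

func main() {
	// The context is cancelled on Ctrl+C (SIGINT) or SIGTERM.
	ctx, cancel := signal.NotifyContext(context.Background(), os.Interrupt, syscall.SIGTERM)
	defer cancel()
	if err := serveUntil(ctx.Done(), newServer(":8081"), shutdownTimeout); err != nil {
		log.Fatal(err)
	}
	log.Println("server stopped; database connections would be closed here")
}
```

Closing the database only after `serveUntil` returns ensures no in-flight request is still writing when the connection goes away.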
## Version
Current version: **0.0.1**
## Dependencies
- `github.com/mattn/go-sqlite3` - SQLite driver (requires CGO)