# Ping Service Distributed Internet Network Mapper
## Overview
Ping Service is an experimental **distributed internet mapping system** designed to observe, learn, and visualize how packets traverse the internet.
Multiple geographically and topologically diverse servers cooperate to run **pings and traceroutes** against a shared and continuously evolving target set. The discovered network hops are fed back into the system as new targets, allowing the mapper to *grow organically* and track **routing changes over time**.
The end goal is an **auto-updating map of internet routes**, their stability, and how they change.
All components live in a single Git repository: **`ping_service`**.
All code is MIT licensed.
---
## Core Idea
1. Start with a large bootstrap list of IP addresses (currently ~19,000 cloud provider IPs).
2. Distributed nodes ping these targets.
3. Targets that respond reliably are tracerouted.
4. Intermediate hops discovered via traceroute are extracted.
5. New hops are shared back into the system as fresh targets.
6. Over time, this builds a continuously updating graph of routes and paths.
The system is intentionally **decentralized, fault-tolerant, and latency-tolerant**, reflecting real-world residential and low-end hosting environments.
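
The loop can be pictured as a single queue that feeds itself. The Go sketch below is purely illustrative: the IPs, the `Target` struct, and the in-memory queue are invented for this example, and the real system spreads these steps across separate services.
```go
package main

import "fmt"

type Target struct {
	IP     string
	Source string // "bootstrap" or "traceroute"
}

func main() {
	// Hypothetical bootstrap target; the real bootstrap list has ~19,000 IPs.
	targets := []Target{{IP: "203.0.113.10", Source: "bootstrap"}}
	seen := map[string]bool{}

	for len(targets) > 0 {
		t := targets[0]
		targets = targets[1:]
		if seen[t.IP] {
			continue
		}
		seen[t.IP] = true

		// Placeholder for a real ping + traceroute; pretend one hop was discovered.
		hops := []string{"198.51.100.1"}

		// Feed newly discovered hops back in as fresh targets (steps 4-5 above).
		for _, h := range hops {
			if !seen[h] {
				targets = append(targets, Target{IP: h, Source: "traceroute"})
			}
		}
		fmt.Printf("measured %s (source: %s), discovered %d hop(s)\n", t.IP, t.Source, len(hops))
	}
}
```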
---
## Architecture
The system is composed of four main parts:
### 1. `ping_service`
The worker agent running on every participating node.
Responsibilities:
* Execute ICMP/TCP pings
* Apply per-IP cooldowns
* Optionally run traceroute on successful pings
* Output structured JSON results
* Expose health and metrics endpoints
This component is designed to run unattended under **systemd** on Debian-based systems.
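
The exact result schema is defined in `ping_service.go`; the struct below is only a hypothetical illustration of what a structured JSON result could look like (field names and units are assumptions).
```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// PingResult is a hypothetical result shape; the real schema may differ.
type PingResult struct {
	Target     string        `json:"target"`
	Method     string        `json:"method"` // "icmp" or "tcp"
	Success    bool          `json:"success"`
	RTT        time.Duration `json:"rtt_ns,omitempty"`     // nanoseconds when marshalled
	Traceroute []string      `json:"traceroute,omitempty"` // intermediate hop IPs, if run
	Timestamp  time.Time     `json:"timestamp"`
}

func main() {
	r := PingResult{
		Target:    "203.0.113.10",
		Method:    "icmp",
		Success:   true,
		RTT:       23 * time.Millisecond,
		Timestamp: time.Now().UTC(),
	}
	b, _ := json.Marshal(r)
	fmt.Println(string(b))
}
```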
---
### 2. `input_service`
Responsible for **feeding targets** into the system.
Responsibilities:
* Provide IPs from files, HTTP endpoints, or other sources
* Accept newly discovered hop IPs from the output pipeline
* Act as a simple shared job source for workers
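
The repository ships `http_input_service.go` and a Python demo; the sketch below is not that code, just a minimal illustration of an HTTP job source with assumed `/target` and `/hop` endpoints.
```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"sync"
)

// queue is an in-memory stand-in for whatever backs the real target list.
type queue struct {
	mu  sync.Mutex
	ips []string
}

func (q *queue) next() (string, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.ips) == 0 {
		return "", false
	}
	ip := q.ips[0]
	q.ips = q.ips[1:]
	return ip, true
}

func (q *queue) add(ip string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.ips = append(q.ips, ip)
}

func main() {
	q := &queue{ips: []string{"203.0.113.10"}} // hypothetical bootstrap target

	// Workers pull one target per request.
	http.HandleFunc("/target", func(w http.ResponseWriter, r *http.Request) {
		ip, ok := q.next()
		if !ok {
			w.WriteHeader(http.StatusNoContent)
			return
		}
		json.NewEncoder(w).Encode(map[string]string{"ip": ip})
	})

	// The output pipeline pushes newly discovered hops back in here.
	http.HandleFunc("/hop", func(w http.ResponseWriter, r *http.Request) {
		var body struct {
			IP string `json:"ip"`
		}
		if err := json.NewDecoder(r.Body).Decode(&body); err != nil || body.IP == "" {
			http.Error(w, "bad request", http.StatusBadRequest)
			return
		}
		q.add(body.IP)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```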
---
### 3. `output_service`
Processes results coming from `ping_service` nodes.
Responsibilities:
* Store ping and traceroute results in a mapping-friendly format
* Extract intermediate hops from traceroute data
* Forward newly discovered hops back into `input_service`
This component is the bridge between **measurement** and **graph growth**.
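
As an illustration of that bridge, the sketch below extracts hops from a hypothetical result shape and posts them to the assumed `/hop` endpoint from the previous sketch; none of this is the project's actual code.
```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// Result is an assumed subset of a worker result containing traceroute hops.
type Result struct {
	Target     string   `json:"target"`
	Traceroute []string `json:"traceroute"`
}

// forwardHops sends each intermediate hop back to the input service.
func forwardHops(inputURL string, res Result) error {
	for _, hop := range res.Traceroute {
		if hop == res.Target {
			continue // the final hop is the destination, already a known target
		}
		body, _ := json.Marshal(map[string]string{"ip": hop})
		resp, err := http.Post(inputURL+"/hop", "application/json", bytes.NewReader(body))
		if err != nil {
			return err
		}
		resp.Body.Close()
	}
	return nil
}

func main() {
	res := Result{
		Target:     "203.0.113.10",
		Traceroute: []string{"192.0.2.1", "198.51.100.1", "203.0.113.10"},
	}
	if err := forwardHops("http://localhost:8080", res); err != nil {
		fmt.Println("forward failed:", err)
	}
}
```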
---
### 4. `manager`
A centralized control and visibility plane.
Responsibilities:
* Web UI for observing system state
* Control and coordination of job execution
* Certificate and crypto handling
* Storage and templating
The manager may also evolve into a **viewer-only frontend** for map visualization.
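
As a rough illustration of the UI side only, the sketch below renders a node status page with `html/template`; the `Node` fields, node names, and endpoint are invented for this example (the real manager's templating lives in `template.go`).
```go
package main

import (
	"html/template"
	"log"
	"net/http"
	"time"
)

type Node struct {
	Name     string
	LastSeen time.Time
}

// A tiny status template, purely for illustration.
var page = template.Must(template.New("status").Parse(`
<h1>ping_service nodes</h1>
<ul>
{{range .}}<li>{{.Name}} (last seen {{.LastSeen.Format "15:04:05"}})</li>
{{end}}</ul>
`))

func main() {
	nodes := []Node{{Name: "pi3-home", LastSeen: time.Now()}} // hypothetical node

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		if err := page.Execute(w, nodes); err != nil {
			log.Println("render:", err)
		}
	})
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```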
---
## Repository Layout
```
ping_service/
├── config.yaml
├── go.mod
├── install.sh
├── ping_service.go
├── ping_service.service
├── README.md
├── input_service/
│   ├── http_input_demo.py
│   ├── http_input_service.go
│   └── README.md
├── output_service/
│   ├── http_output_demo.py
│   └── README.md
└── manager/
    ├── main.go
    ├── store.go
    ├── logger.go
    ├── template.go
    ├── crypto.go
    ├── cert.go
    ├── dyfi.go
    ├── gr.go
    ├── README.md
    └── go.mod
```
---
## Technology Choices
* **Languages**: Go, Python 3
* **OS**: Debian-based Linux (systemd assumed)
* **Networking**:
  * ICMP and TCP probing (ping and traceroute)
  * WireGuard VPN interconnect between nodes
* **Deployment style**:
  * Long-running services
  * Designed for unreliable environments
---
## Network Reality & Constraints
The system is intentionally designed around *imperfect infrastructure*:
* Nodes include:
  * Raspberry Pi 3 / 4
  * Low-core amd64 servers
  * Cheap VPS instances
* Network conditions:
  * Some nodes behind consumer NAT
  * Some nodes on 4G/LTE connections
  * At least one node cannot receive external ICMP (see the TCP probe sketch after this list)
* Availability:
  * Nodes may disappear without warning
  * Power and connectivity are not guaranteed
**Resilience is a core design requirement.**
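
Because ICMP is not universally available, `ping_service` also supports TCP pings. The sketch below shows one way a TCP connect probe could look; the port and timeout are assumptions, not project defaults.
```go
package main

import (
	"fmt"
	"net"
	"time"
)

// tcpPing reports whether a TCP connection to ip:port completes within timeout.
// It measures reachability and connect latency without any ICMP traffic.
func tcpPing(ip string, port int, timeout time.Duration) (time.Duration, bool) {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", fmt.Sprintf("%s:%d", ip, port), timeout)
	if err != nil {
		return 0, false
	}
	conn.Close()
	return time.Since(start), true
}

func main() {
	if rtt, ok := tcpPing("203.0.113.10", 443, 2*time.Second); ok { // hypothetical target
		fmt.Println("reachable via TCP, connect time:", rtt)
	} else {
		fmt.Println("unreachable via TCP within timeout")
	}
}
```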
---
## Distributed Design Goals
* Nodes can join and leave freely
* Partial failures are expected and tolerated
* Latency variations are normal
* No assumption of always-online workers
* Central components should degrade gracefully
The system must continue operating even when:
* Only a subset of nodes are reachable
* Some nodes cannot perform ICMP
* Network paths fluctuate
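
One common way to get this kind of graceful degradation on the worker side is capped exponential backoff with jitter when a central service is unreachable. The constants and the stubbed fetch below are assumptions for illustration, not the project's actual values.
```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// fetchTarget stands in for an HTTP call to the input service; here it
// always fails so the retry path is exercised.
func fetchTarget() (string, error) {
	return "", errors.New("input service unreachable")
}

func main() {
	backoff := time.Second
	const maxBackoff = 5 * time.Minute

	for attempt := 0; attempt < 3; attempt++ { // bounded only so the example terminates
		ip, err := fetchTarget()
		if err == nil {
			fmt.Println("got target:", ip)
			backoff = time.Second // reset after a success
			continue
		}
		// Jitter avoids synchronized retries from many workers at once.
		sleep := backoff + time.Duration(rand.Int63n(int64(backoff/2)))
		fmt.Printf("fetch failed (%v), retrying in %v\n", err, sleep)
		time.Sleep(sleep)
		if backoff *= 2; backoff > maxBackoff {
			backoff = maxBackoff
		}
	}
}
```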
---
## Future Expansion
* Allow external contributors to run **only `ping_service`**
* Reduce assumptions about node ownership
* Improve trust, isolation, and input validation
* Add permissions or scoped job execution
---
## Visualization (Open Problem)
There is currently **no finalized design** for route visualization.
Open questions:
* Static vs real-time maps
* Graph layout for internet-scale paths
* Time-based route change visualization
* Data reduction and aggregation strategies
This is an explicit area for future experimentation.
---
## Bootstrapping Strategy
Initial targets are sourced from:
* Public cloud provider IP address lists (~19,000 IPs)
From there, the system relies on:
* Reliability scoring
* Traceroute hop discovery
* Feedback loops into the input pipeline
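
Reliability scoring is not specified in detail here; one simple possibility is an exponentially weighted success rate per target, as sketched below (the weighting factor and traceroute threshold are assumptions).
```go
package main

import "fmt"

// targetStats tracks an exponentially weighted success rate for one target.
type targetStats struct {
	score float64 // 0..1
}

const (
	alpha               = 0.2 // weight of the most recent probe
	tracerouteThreshold = 0.8 // promote the target to traceroute above this score
)

func (s *targetStats) record(success bool) {
	v := 0.0
	if success {
		v = 1.0
	}
	s.score = (1-alpha)*s.score + alpha*v
}

func main() {
	s := &targetStats{}
	for _, ok := range []bool{true, true, false, true, true, true, true, true, true, true} {
		s.record(ok)
	}
	fmt.Printf("score=%.2f traceroute=%v\n", s.score, s.score >= tracerouteThreshold)
}
```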
---
## Project Status
* Functional distributed ping + traceroute workers
* Basic input and output services
* Central manager with early UI and control logic
* Mapping and visualization still exploratory
---
## Project Vision (Short)
> *Build a living, distributed map of the internet—measured from the edges, shaped by reality, and resilient to failure.*