# Kattila Agent Implementation Plan

This document outlines the detailed architecture and implementation steps for the Go-based Kattila Agent.

## Overview

The Kattila Agent continuously gathers network topology information from the host OS (using the `ip` and `wg` commands), cryptographically signs the data, and pushes it to the Kattila Manager. If direct communication fails, it uses peer scanning to find a relay path.

## User Review Required

> [!IMPORTANT]
> - Can we assume that the `wg` and `ip` commands are always available in the agent's `PATH`?
> - The TXT record is returned with wrapping quotes (e.g., `"955f333e5b9cc..."`). The agent will strip these quotes. Is the PSK then used exactly as-is as the HMAC key?
> - For WireGuard peer scanning during relay fallback: should the agent scan the *entire subnet* of each `allowed ips` entry (e.g., `172.16.100.8/29`) to find other agents on port `5087`, or only guess based on endpoints? Scanning a small subnet is usually reliable.
> - We should parse `wg show all dump` instead of the human-readable `wg` output if possible, since tab-separated output is much easier and safer to parse programmatically. Is it okay to use `wg show all dump` instead of human-readable `wg`?

## Proposed Architecture / Packages

### 1. `config` Package

- Responsibilities: Load the `.env` file containing `DNS`, `MANAGER_URL`, etc., and provide access to the environment configuration.
- Store the agent's unique ID, which is generated and saved locally on first run so that it persists across restarts until `/status/reset` is called.

### 2. `security` Package

- **Key Discovery**: Periodically resolve the TXT record of the configured `DNS` name to get the Bootstrap PSK. Strip any surrounding quotes. Keep a history of the current and two previous keys.
- **HMAC Generation**: Provide a function to calculate the `HMAC-SHA256` of JSON payloads using the current PSK.
- **Nonce Generation**: Generate cryptographically secure base64 strings for the `nonce` field.

### 3. `network` Package

- Execute OS commands and parse their outputs:
  - `ip -j a`: Parse the JSON output into `Interface` structs.
  - `ip -j -4 r`: Parse the JSON output into `Route` structs.
  - `wg show all dump`: Parse the tab-separated WireGuard connection data. If parsing the human-readable `wg` output is strictly required, we will build a custom state-machine parser for the provided format.
- Maintain a gathering function `GatherStatus()` that bundles all these details into the expected `data` payload.

### 4. `api` Package (Agent HTTP Server)

- Runs an HTTP server on `0.0.0.0:5087` using the standard `net/http` package.
- Endpoints:
  - `GET /status/healthcheck`: Return `200 OK {"status": "ok"}`.
  - `POST /status/reset`: Delete the local `agent_id` state, delete the internal cache, and trigger a fresh registration loop.
  - `GET /status/peer`: Return local network info so peers can decide routing paths.
  - `POST /status/relay`: Accept relay payloads, ensure the agent's own `agent_id` is not in `relay_path` (loop detection), and forward to the manager.

### 5. `reporter` Package (Main Loop)

- Triggers every 30 seconds.
- Gathers data from the `network` package.
- Wraps it in the report envelope: `version`, `tick`, `type`, `nonce`, `timestamp`, `agent_id`, `hmac`.
- Sends a POST request to the Manager.
- **Relay Fallback**: On failure, queries local WireGuard interfaces, probes port `5087` on known subnets, and attempts to find a working peer to relay through.

## Verification Plan

### Automated Testing

- Write unit tests for parsing the provided `ip` and `wg` example files.
- Write a unit test for the PSK rotation logic.

### Manual Verification

- Run the agent locally and verify the logs show successful gathering of interfaces and routes.
- Force a bad manager URL and observe logs indicating relay peer scanning behavior.
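
To make the `security` package responsibilities more concrete, the following is a minimal sketch of quote stripping, a three-entry key history, HMAC-SHA256 signing, and nonce generation. All type and function names (`keyRing`, `stripTXTQuotes`, `signPayload`, `verifyPayload`, `newNonce`) are hypothetical. It assumes the PSK string is used byte-for-byte as the HMAC key and that the signature is hex-encoded; both points are still open in the review questions, and the TXT value shown is a placeholder.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// keyRing holds the current PSK first, followed by up to two previous keys.
type keyRing struct{ keys []string }

// rotate records a newly resolved PSK; an unchanged key is ignored.
func (r *keyRing) rotate(k string) {
	if len(r.keys) > 0 && r.keys[0] == k {
		return
	}
	r.keys = append([]string{k}, r.keys...)
	if len(r.keys) > 3 {
		r.keys = r.keys[:3]
	}
}

// stripTXTQuotes trims whitespace and one pair of wrapping double quotes.
func stripTXTQuotes(txt string) string {
	txt = strings.TrimSpace(txt)
	if len(txt) >= 2 && strings.HasPrefix(txt, `"`) && strings.HasSuffix(txt, `"`) {
		return txt[1 : len(txt)-1]
	}
	return txt
}

// signPayload returns the hex-encoded HMAC-SHA256 of payload keyed by psk.
// Assumption: the PSK bytes are the key as-is; hex output is also an assumption.
func signPayload(psk string, payload []byte) string {
	mac := hmac.New(sha256.New, []byte(psk))
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verifyPayload accepts a signature made with any key in the history,
// using a constant-time comparison.
func verifyPayload(keys []string, payload []byte, sig string) bool {
	for _, k := range keys {
		if hmac.Equal([]byte(signPayload(k, payload)), []byte(sig)) {
			return true
		}
	}
	return false
}

// newNonce returns n bytes from the system CSPRNG, base64-encoded.
func newNonce(n int) (string, error) {
	buf := make([]byte, n)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf), nil
}

func main() {
	ring := &keyRing{}
	ring.rotate(stripTXTQuotes(`"example-psk"`)) // placeholder TXT value
	payload := []byte(`{"agent_id":"demo"}`)
	sig := signPayload(ring.keys[0], payload)
	nonce, err := newNonce(16)
	if err != nil {
		fmt.Println("nonce error:", err)
		return
	}
	fmt.Println("hmac:", sig)
	fmt.Println("nonce:", nonce)
	fmt.Println("valid:", verifyPayload(ring.keys, payload, sig))
}
```

Keeping the newest key at index 0 means signing always uses `keys[0]`, while verification can tolerate a report signed just before a rotation. Whether the agent itself needs to verify with older keys, or only the Manager does, is not specified in this plan.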
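
As an illustration of the `wg show all dump` parsing proposed for the `network` package, here is a minimal sketch. It relies on the field layout documented in `wg(8)`: with `all`, every line is prefixed by the interface name, so interface lines have five tab-separated fields and peer lines have nine. The `WGPeer` struct and `parseWGDump` function are hypothetical names, and the sample input is made up; the real agent would feed in the output of the `wg` command instead.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// WGPeer is a hypothetical struct holding the peer fields the agent may report.
type WGPeer struct {
	Interface       string
	PublicKey       string
	Endpoint        string   // empty when wg prints "(none)"
	AllowedIPs      []string // nil when wg prints "(none)"
	LatestHandshake int64    // Unix seconds, 0 if no handshake yet
	TransferRx      int64
	TransferTx      int64
}

// parseWGDump parses the output of `wg show all dump`.
// Interface lines (5 fields) are skipped, which also keeps the private key
// they contain out of the agent's report.
func parseWGDump(out string) ([]WGPeer, error) {
	var peers []WGPeer
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "" {
			continue
		}
		f := strings.Split(line, "\t")
		switch len(f) {
		case 5:
			continue // interface, private-key, public-key, listen-port, fwmark
		case 9:
			hs, err := strconv.ParseInt(f[5], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("latest-handshake: %w", err)
			}
			rx, err := strconv.ParseInt(f[6], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("transfer-rx: %w", err)
			}
			tx, err := strconv.ParseInt(f[7], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("transfer-tx: %w", err)
			}
			p := WGPeer{Interface: f[0], PublicKey: f[1],
				LatestHandshake: hs, TransferRx: rx, TransferTx: tx}
			if f[3] != "(none)" {
				p.Endpoint = f[3]
			}
			if f[4] != "(none)" {
				p.AllowedIPs = strings.Split(f[4], ",")
			}
			peers = append(peers, p)
		default:
			return nil, fmt.Errorf("unexpected field count %d in line %q", len(f), line)
		}
	}
	return peers, sc.Err()
}

func main() {
	// Made-up sample; keys are placeholders, not real WireGuard keys.
	sample := strings.Join([]string{
		"wg0\tPRIVATE\tPUBLIC\t51820\toff",
		"wg0\tPEERKEY\t(none)\t203.0.113.5:51820\t172.16.100.8/29\t1700000000\t1024\t2048\t25",
	}, "\n")
	peers, err := parseWGDump(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range peers {
		fmt.Printf("%s peer %s allowed=%v\n", p.Interface, p.PublicKey, p.AllowedIPs)
	}
}
```

Returning an error on an unexpected field count, rather than skipping the line, makes a format change in `wg` visible in the logs instead of silently producing an incomplete report.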
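
The relay pieces (loop detection in `POST /status/relay` and subnet probing in the reporter's relay fallback) could look roughly like the sketch below. It assumes the subnet-scan option from the review questions is chosen, that `relay_path` is a list of agent ID strings, and that a successful TCP connect to port `5087` is enough to treat an address as a candidate. The helper names `inRelayPath`, `hostsInPrefix`, and `probeAgent`, the `minBits` guard, and the agent IDs in `main` are all hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"time"
)

const agentPort = 5087 // agent HTTP port from this plan

// inRelayPath reports whether selfID already appears in the relay path,
// in which case forwarding again would create a loop.
func inRelayPath(relayPath []string, selfID string) bool {
	for _, id := range relayPath {
		if id == selfID {
			return true
		}
	}
	return false
}

// hostsInPrefix lists every IPv4 address in cidr. Prefixes shorter than
// minBits are rejected so a broad allowed-ips entry cannot trigger a huge scan.
func hostsInPrefix(cidr string, minBits int) ([]netip.Addr, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	if !p.Addr().Is4() || p.Bits() < minBits {
		return nil, fmt.Errorf("refusing to scan %s", cidr)
	}
	p = p.Masked() // start from the first address of the subnet
	var out []netip.Addr
	for a := p.Addr(); a.IsValid() && p.Contains(a); a = a.Next() {
		out = append(out, a)
	}
	return out, nil
}

// probeAgent reports whether a TCP connection to addr on the agent port
// succeeds within timeout.
func probeAgent(addr netip.Addr, timeout time.Duration) bool {
	target := netip.AddrPortFrom(addr, agentPort).String()
	conn, err := net.DialTimeout("tcp", target, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	hosts, err := hostsInPrefix("172.16.100.8/29", 24)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Each candidate would then be checked with probeAgent(h, timeout).
	fmt.Println("candidates:", hosts)
	fmt.Println("loop:", inRelayPath([]string{"agent-a", "agent-b"}, "agent-b"))
}
```

A follow-up `GET /status/healthcheck` on each reachable address would confirm that the listener is really a Kattila Agent rather than some other service on the same port. The agent's own address is included in the candidate list here and would need to be filtered out before probing.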