Makefile and agent start.
55
agent/# Kattila Agent Implementation Plan.md
Normal file
@@ -0,0 +1,55 @@
# Kattila Agent Implementation Plan
This document outlines the detailed architecture and implementation steps for the Go-based Kattila Agent.
## Overview
The Kattila Agent periodically gathers network topology information from the host OS (using the `ip` and `wg` commands), authenticates the data with an HMAC-SHA256 signature, and pushes it to the Kattila Manager. If direct communication fails, it scans its WireGuard peers to find a relay path.
## User Review Required
> [!IMPORTANT]
> - Can we assume that the `wg` and `ip` commands are always available in the agent's `PATH`?
> - The TXT record is returned with wrapping quotes (e.g., `"955f333e5b9cc..."`). The agent will strip these quotes. Is the PSK used exactly as-is for the HMAC key?
> - For WireGuard peer scanning during relay fallback: should the agent scan the *entire subnet* listed in `allowed ips` (e.g., every address in `172.16.100.8/29`) to find other agents on port `5087`, or only try addresses guessed from peer endpoints? Scanning the whole subnet is cheap and exhaustive when the subnet is this small.
> - We should parse `wg show all dump` instead of raw `wg` if possible, since it's much easier and safer to parse TSV outputs programmatically. Is it okay to use `wg show all dump` instead of human-readable `wg`?
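To make the subnet-scanning question concrete, the following is a minimal sketch of what scanning a small `allowed ips` prefix could look like. The function names `hostsIn` and `probe`, the `/24` size limit, and the decision to include the first and last address of the prefix are all assumptions for illustration, not decisions taken in this plan.

```go
package main

import (
	"fmt"
	"net"
	"net/netip"
	"strconv"
	"time"
)

// hostsIn lists every address inside an IPv4 prefix. The first and last
// addresses are included because WireGuard is a layer-3 tunnel without a
// broadcast address (assumption: the agent may want to probe them too).
// Prefixes larger than /24 are refused so a scan stays small (assumed limit).
func hostsIn(cidr string) ([]netip.Addr, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	if !p.Addr().Is4() || p.Bits() < 24 {
		return nil, fmt.Errorf("refusing to scan %s: only IPv4 /24 or smaller", cidr)
	}
	p = p.Masked() // normalise 172.16.100.10/29 to 172.16.100.8/29
	var out []netip.Addr
	for a := p.Addr(); a.IsValid() && p.Contains(a); a = a.Next() {
		out = append(out, a)
	}
	return out, nil
}

// probe reports whether a TCP connection to addr:port succeeds within timeout.
func probe(addr netip.Addr, port int, timeout time.Duration) bool {
	target := net.JoinHostPort(addr.String(), strconv.Itoa(port))
	conn, err := net.DialTimeout("tcp", target, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	hosts, err := hostsIn("172.16.100.8/29")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// A real scan would call probe(h, 5087, 300*time.Millisecond) for each
	// candidate; here the candidates are only listed.
	for _, h := range hosts {
		fmt.Println("candidate:", h)
	}
}
```

For the `/29` in the example this yields eight candidate addresses, so even a sequential scan with a short dial timeout finishes within a few seconds.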
## Proposed Architecture / Packages
### 1. `config` Package
- Responsibilities: Load the `.env` file containing `DNS`, `MANAGER_URL`, etc., and provide access to the resulting configuration values.
- Store the agent's unique ID, which is generated on first run and saved locally so that it persists across restarts until `/status/reset` is called.
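The ID persistence described above could be sketched as follows. The file location, the 16-byte random hex format, and the function names `loadOrCreateAgentID` and `resetAgentID` are assumptions made for this example; the plan does not specify them.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// loadOrCreateAgentID returns the ID stored at path. If the file is missing
// or empty, it generates a new random ID and saves it.
func loadOrCreateAgentID(path string) (string, error) {
	data, err := os.ReadFile(path)
	if err == nil {
		if id := strings.TrimSpace(string(data)); id != "" {
			return id, nil
		}
	} else if !errors.Is(err, os.ErrNotExist) {
		return "", err
	}
	buf := make([]byte, 16) // assumed ID size: 128 random bits
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	id := hex.EncodeToString(buf)
	if err := os.WriteFile(path, []byte(id+"\n"), 0o600); err != nil {
		return "", err
	}
	return id, nil
}

// resetAgentID removes the stored ID, as a reset handler might do.
// A file that is already missing is not treated as an error.
func resetAgentID(path string) error {
	err := os.Remove(path)
	if errors.Is(err, os.ErrNotExist) {
		return nil
	}
	return err
}

func main() {
	// Hypothetical location; a real agent would likely use a state directory.
	path := filepath.Join(os.TempDir(), "kattila_agent_id")
	id, err := loadOrCreateAgentID(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, "agent id:", err)
		os.Exit(1)
	}
	fmt.Println("agent_id:", id)
}
```

Calling `loadOrCreateAgentID` twice with the same path returns the same ID; after `resetAgentID`, the next call generates a fresh one.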
### 2. `security` Package
- **Key Discovery**: Periodically resolve the TXT record of the configured `DNS` name to get the Bootstrap PSK. Strip any surrounding quotes. Keep a history of the current and two previous keys.
- **HMAC Generation**: Provide a function to calculate `HMAC-SHA256` of JSON payloads using the current PSK.
- **Nonce Generation**: Generate cryptographically secure base64 strings for the `nonce` field.
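The three responsibilities above could look roughly like the sketch below. Several details are assumptions, since they are still open questions in this plan: the PSK string is used byte-for-byte as the HMAC key, the HMAC is hex-encoded, and the nonce is 16 random bytes. The names `keyRing`, `rotate`, `sign`, `verify`, and `newNonce` are hypothetical. Go's `net.LookupTXT` normally returns record text without the presentation-format quotes, so the quote trimming here is defensive.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"encoding/hex"
	"fmt"
	"strings"
)

// keyRing holds the current PSK first, followed by up to two previous keys.
type keyRing struct{ keys []string }

// rotate records a freshly resolved TXT value. Surrounding quotes are
// stripped, and an unchanged key does not shift the history.
func (r *keyRing) rotate(txt string) {
	psk := strings.Trim(strings.TrimSpace(txt), `"`)
	if psk == "" || (len(r.keys) > 0 && r.keys[0] == psk) {
		return
	}
	r.keys = append([]string{psk}, r.keys...)
	if len(r.keys) > 3 {
		r.keys = r.keys[:3]
	}
}

// sign returns the hex-encoded HMAC-SHA256 of payload under psk.
func sign(psk string, payload []byte) string {
	mac := hmac.New(sha256.New, []byte(psk))
	mac.Write(payload)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify accepts a signature made with the current key or either of the two
// previous keys, using a constant-time comparison.
func verify(r *keyRing, payload []byte, sig string) bool {
	for _, k := range r.keys {
		if hmac.Equal([]byte(sign(k, payload)), []byte(sig)) {
			return true
		}
	}
	return false
}

// newNonce returns 16 cryptographically secure random bytes as base64.
func newNonce() (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf), nil
}

func main() {
	ring := &keyRing{}
	ring.rotate(`"example-psk-1"`) // placeholder values, not real keys
	ring.rotate(`"example-psk-2"`)
	payload := []byte(`{"interfaces":[]}`)
	sig := sign(ring.keys[0], payload)
	nonce, _ := newNonce()
	fmt.Println("hmac:", sig)
	fmt.Println("nonce:", nonce)
	fmt.Println("verifies:", verify(ring, payload, sig))
}
```

Keeping the two previous keys lets a receiver accept reports signed just before a rotation, which is presumably why the history is part of the plan.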
### 3. `network` Package
- Execute OS commands and parse their outputs:
  - `ip -j a`: Parse the JSON output into `Interface` structs.
  - `ip -j -4 r`: Parse the JSON output into `Route` structs.
  - `wg show all dump`: Parse the tab-separated WireGuard interface and peer records. If parsing the human-readable `wg` output is strictly required, we will build a custom state-machine parser for that format.
- Provide a `GatherStatus()` function that bundles all of these details into the expected `data` payload.
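As an illustration of why the dump format is easier to handle, here is a minimal parser sketch for `wg show all dump`. It relies on the layout documented in the `wg(8)` man page: with `all`, each line starts with the interface name, interface lines have 5 tab-separated fields, and peer lines have 9. The `WGPeer` type, the `parseWGDump` function, and the sample data (with placeholder keys) are made up for this example.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// WGPeer holds the peer fields this sketch cares about (hypothetical type).
type WGPeer struct {
	Interface       string
	PublicKey       string
	Endpoint        string   // empty when wg reports "(none)"
	AllowedIPs      []string // CIDR strings
	LatestHandshake int64    // Unix seconds, 0 if never
}

// parseWGDump parses the output of `wg show all dump`.
// Interface line: ifname, private-key, public-key, listen-port, fwmark.
// Peer line: ifname, public-key, preshared-key, endpoint, allowed-ips,
// latest-handshake, transfer-rx, transfer-tx, persistent-keepalive.
func parseWGDump(out string) ([]WGPeer, error) {
	var peers []WGPeer
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		if strings.TrimSpace(line) == "" {
			continue
		}
		f := strings.Split(line, "\t")
		switch len(f) {
		case 5:
			continue // interface line: nothing to record in this sketch
		case 9:
			hs, err := strconv.ParseInt(f[5], 10, 64)
			if err != nil {
				return nil, fmt.Errorf("bad handshake %q: %w", f[5], err)
			}
			p := WGPeer{Interface: f[0], PublicKey: f[1], LatestHandshake: hs}
			if f[3] != "(none)" {
				p.Endpoint = f[3]
			}
			if f[4] != "(none)" {
				p.AllowedIPs = strings.Split(f[4], ",")
			}
			peers = append(peers, p)
		default:
			return nil, fmt.Errorf("unexpected field count %d", len(f))
		}
	}
	return peers, sc.Err()
}

// sampleDump is fabricated example data with placeholder keys.
const sampleDump = "wg0\tPRIV_KEY\tPUB_KEY_SELF\t51820\toff\n" +
	"wg0\tPUB_KEY_A\t(none)\t203.0.113.7:51820\t172.16.100.8/29\t1700000000\t1024\t2048\t25\n" +
	"wg0\tPUB_KEY_B\t(none)\t(none)\t172.16.100.16/29,10.0.0.0/24\t0\t0\t0\toff\n"

func main() {
	peers, err := parseWGDump(sampleDump)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range peers {
		fmt.Printf("%s %s endpoint=%q allowed=%v\n", p.Interface, p.PublicKey, p.Endpoint, p.AllowedIPs)
	}
}
```

In the agent, the input string would come from running the command (for example via `os/exec`) rather than from a constant.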
### 4. `api` Package (Agent HTTP Server)
- Runs an HTTP server on `0.0.0.0:5087` using standard `net/http`.
- Endpoints:
  - `GET /status/healthcheck`: Return `200 OK` with the body `{"status": "ok"}`.
  - `POST /status/reset`: Delete local `agent_id` state, delete internal cache, and trigger a fresh registration loop.
  - `GET /status/peer`: Return local network info so peers can decide routing paths.
  - `POST /status/relay`: Accept relay payloads, ensure own `agent_id` is not in `relay_path` (loop detection), and forward to manager.
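A minimal sketch of two of these endpoints with `net/http` follows, focusing on the healthcheck response and relay loop detection. The `relayEnvelope` shape (including the `payload` field), the `409 Conflict` response on a detected loop, the `202 Accepted` response on success, and appending the agent's own ID before forwarding are assumptions; the plan only states that the agent must not already appear in `relay_path`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// relayEnvelope is a hypothetical wire format for relayed reports.
type relayEnvelope struct {
	RelayPath []string        `json:"relay_path"`
	Payload   json.RawMessage `json:"payload"`
}

// inRelayPath reports whether agentID already appears in the relay path.
func inRelayPath(path []string, agentID string) bool {
	for _, id := range path {
		if id == agentID {
			return true
		}
	}
	return false
}

// newMux wires the healthcheck and relay handlers. forward stands in for the
// code that would send the envelope on to the manager.
func newMux(agentID string, forward func(relayEnvelope) error) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/status/healthcheck", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(map[string]string{"status": "ok"})
	})
	mux.HandleFunc("/status/relay", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		var env relayEnvelope
		if err := json.NewDecoder(r.Body).Decode(&env); err != nil {
			http.Error(w, "invalid JSON", http.StatusBadRequest)
			return
		}
		if inRelayPath(env.RelayPath, agentID) {
			http.Error(w, "relay loop detected", http.StatusConflict)
			return
		}
		env.RelayPath = append(env.RelayPath, agentID) // assumed behaviour
		if err := forward(env); err != nil {
			http.Error(w, "forwarding failed", http.StatusBadGateway)
			return
		}
		w.WriteHeader(http.StatusAccepted)
	})
	return mux
}

// do sends one in-memory request to h and returns the status code and body.
func do(h http.Handler, method, target, body string) (int, string) {
	req := httptest.NewRequest(method, target, strings.NewReader(body))
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	mux := newMux("agent-local", func(relayEnvelope) error { return nil })
	// The real agent would call http.ListenAndServe("0.0.0.0:5087", mux).
	code, body := do(mux, http.MethodGet, "/status/healthcheck", "")
	fmt.Println(code, strings.TrimSpace(body))
}
```

Passing `forward` as a function keeps the handler testable without a running manager.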
### 5. `reporter` Package (Main Loop)
- Triggers every 30 seconds.
- Gathers data from `network` package.
- Wraps it in the report envelope: `version`, `tick`, `type`, `nonce`, `timestamp`, `agent_id`, `hmac`.
- Sends a POST request to the Manager.
- **Relay Fallback**: On failure, queries the local WireGuard interfaces, probes port `5087` on the known peer subnets, and attempts to find a reachable peer to relay through.
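The envelope and the 30-second loop could be sketched as below. The Go field types, the `data` field carrying the gathered payload, the `"report"` type value, the version number, and computing the HMAC over the serialized `data` only are all assumptions; the plan lists the envelope field names but not their encoding. `buildReport` and `run` are hypothetical names.

```go
package main

import (
	"context"
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// Report mirrors the envelope fields named in the plan; types are assumed.
type Report struct {
	Version   int             `json:"version"`
	Tick      uint64          `json:"tick"`
	Type      string          `json:"type"`
	Nonce     string          `json:"nonce"`
	Timestamp int64           `json:"timestamp"`
	AgentID   string          `json:"agent_id"`
	Data      json.RawMessage `json:"data"`
	HMAC      string          `json:"hmac"`
}

// buildReport serializes data once and signs exactly those bytes, so the
// receiver can verify the HMAC against the raw `data` field (assumption).
func buildReport(psk, agentID, nonce string, tick uint64, unixTime int64, data any) (Report, error) {
	raw, err := json.Marshal(data)
	if err != nil {
		return Report{}, err
	}
	mac := hmac.New(sha256.New, []byte(psk))
	mac.Write(raw)
	return Report{
		Version:   1,
		Tick:      tick,
		Type:      "report",
		Nonce:     nonce,
		Timestamp: unixTime,
		AgentID:   agentID,
		Data:      raw,
		HMAC:      hex.EncodeToString(mac.Sum(nil)),
	}, nil
}

// run calls report once per interval with an increasing tick until ctx is
// cancelled. The agent would use 30*time.Second as the interval.
func run(ctx context.Context, every time.Duration, report func(tick uint64)) {
	t := time.NewTicker(every)
	defer t.Stop()
	var tick uint64
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			tick++
			report(tick)
		}
	}
}

func main() {
	// Placeholder PSK, agent ID, and nonce for illustration only.
	data := map[string]any{"interfaces": []string{}, "routes": []string{}}
	rep, err := buildReport("example-psk", "agent-1", "bm9uY2U=", 1, time.Now().Unix(), data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	out, _ := json.MarshalIndent(rep, "", "  ")
	fmt.Println(string(out))
}
```

In the callback passed to `run`, a failed POST to the Manager is where the relay fallback would be triggered.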
## Verification Plan
### Automated Testing
- Write unit tests for parsing the provided `ip` and `wg` example files.
- Write unit tests for the PSK rotation logic.
### Manual Verification
- Run the agent locally and verify the logs show successful gathering of interfaces and routes.
- Configure an unreachable Manager URL and check that the logs show the relay peer scanning behavior.