Bug fixes in input and onramp. Hot config reload on signals. Added example utility scripts for signals.

Kalzu Rekku
2026-01-17 14:47:13 +02:00
parent 7d7038d6bd
commit aa216981d2
16 changed files with 2339 additions and 306 deletions

input/README.md Normal file

@@ -0,0 +1,173 @@
# WebSocket Streamer (JSONL Logger)
This Go program connects to a WebSocket endpoint, subscribes to a topic, and continuously writes incoming messages to hourly-rotated `.jsonl` files.
It is designed for **long-running, low-overhead data capture** with basic observability and operational safety.
Typical use cases include:
* Market data capture (e.g. crypto trades)
* Event stream archiving
* Lightweight ingestion on small servers (VPS, Raspberry Pi, etc.)
---
## Features
* WebSocket client with automatic reconnect
* Topic subscription (configurable)
* Hourly file rotation (`.jsonl` format)
* Buffered channel to decouple network I/O from disk writes
* Atomic message counters
* Periodic status logging
* Unix domain socket for live status queries
* Graceful shutdown on SIGINT / SIGTERM
* Configurable logging (file and/or stdout)
---
## How It Works
1. Connects to a WebSocket endpoint
2. Sends a subscription message for the configured topic
3. Keeps the connection alive with periodic `ping` messages
4. Reads messages and pushes them into a buffered channel
5. Writes messages line-by-line to hourly JSONL files
6. Exposes runtime status via:
   * Periodic log output
   * Unix socket query (`-status` mode)
---
## Output Format
Messages are written **verbatim** as received, one JSON object per line. As the listing below shows, file names combine the topic and a Unix timestamp:
```
output/
├── publicTrade.BTCUSDT_1700000000.jsonl
├── publicTrade.BTCUSDT_1700003600.jsonl
└── ...
```
Each file contains data for exactly one UTC hour.
---
## Configuration
The application is configured via a JSON file.
### Example `config.json`
```json
{
"output_dir": "./output",
"topic": "publicTrade.BTCUSDT",
"ws_url": "wss://stream.bybit.com/v5/public/linear",
"buffer_size": 10000,
"status_interval": 30,
"log_file": "system.log",
"log_to_stdout": false,
"status_socket": "/tmp/streamer.sock"
}
```
### Configuration Fields
| Field | Description |
| ----------------- | ----------------------------------- |
| `output_dir` | Directory for JSONL output files |
| `topic` | WebSocket subscription topic |
| `ws_url` | WebSocket endpoint URL |
| `buffer_size` | Capacity of the internal message buffer, in messages |
| `status_interval` | Seconds between status log messages |
| `log_file` | Log file path |
| `log_to_stdout` | Also log to stdout |
| `status_socket` | Unix socket path for status queries |
Defaults are applied automatically if fields are omitted.
---
## Command Line Flags
| Flag | Description |
| --------- | -------------------------------------------- |
| `-config` | Path to config file (default: `config.json`) |
| `-debug` | Force logs to stdout (overrides config) |
| `-status` | Query running instance status and exit |
---
## Running the Streamer
```bash
go run main.go -config config.json
```
Or build a binary:
```bash
go build -o streamer
./streamer -config config.json
```
---
## Querying Runtime Status
While the streamer is running:
```bash
./streamer -status -config config.json
```
Example output:
```
Uptime: 12m34s | Total Msgs: 152340 | Rate: 7260.12 msg/min
```
This works via a Unix domain socket and does **not** interrupt the running process.
---
## Logging
* Logs are written to `log_file`
* Optional stdout logging for debugging (`log_to_stdout` in the config, or the `-debug` flag)
* Includes:
  * Startup information
  * Connection errors and reconnects
  * Buffer overflow warnings
  * Periodic status summaries
---
## Graceful Shutdown
On `SIGINT` or `SIGTERM`:
* WebSocket connection closes
* Status socket is removed
* Current output file is flushed and closed
This makes the streamer safe to run under systemd, Docker, or supervisord.
---
## Dependencies
* Go 1.20+
* [`github.com/gorilla/websocket`](https://github.com/gorilla/websocket)
---
## Notes & Design Choices
* **JSONL** is used for easy streaming, compression, and downstream processing
* Hourly rotation avoids large files and simplifies retention policies
* Unix socket status avoids HTTP overhead and exposed ports
* Minimal memory footprint, suitable for low-end machines