Rewrote the onramp in Go with a much better architecture. Created onramp/DATABASE.md to document the data store for the next development cycle.
# Data Source Documentation: `candles.db`

## 1. Database Overview

The database is an **Aggregated Trade Store**. Instead of storing millions of individual trades (which are kept in the raw `.jsonl` files), this database stores **OHLCV** (Open, High, Low, Close, Volume) data across multiple timeframes.

* **Database Engine:** SQLite 3
* **Concurrency Mode:** WAL (Write-Ahead Logging) enabled (see the check after this list).
* **Update Frequency:** Real-time (updated as trades arrive).
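
To confirm these settings from the analysis side, here is a quick check (a sketch; the `candles.db` path is an example):

```python
import sqlite3

conn = sqlite3.connect("candles.db")
# Prints 'wal' once the onramp writer has enabled write-ahead logging
print(conn.execute("PRAGMA journal_mode;").fetchone()[0])
conn.close()
```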
---

## 2. Schema Definition

The database contains a single primary table: `candles`.

### Table: `candles`

| Column | Type | Description |
| :--- | :--- | :--- |
| `timeframe` | `TEXT` | The aggregation window: `1m`, `5m`, `15m`, or `1h`. |
| `timestamp` | `INTEGER` | Unix timestamp (seconds) representing the **start** of the candle. |
| `open` | `REAL` | Price of the first trade in this window. |
| `high` | `REAL` | Highest price reached during this window. |
| `low` | `REAL` | Lowest price reached during this window. |
| `close` | `REAL` | Price of the last trade received for this window. |
| `volume` | `REAL` | Total base currency (BTC) volume traded. |
| `buy_volume` | `REAL` | Total volume from trades marked as "Buy" (Taker Buy). |

**Primary Key:** `(timeframe, timestamp)`

*This ensures no duplicate candles exist for the same timeframe and time slot.*
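
For reference, here is a minimal sketch of DDL matching the documented schema, created via Python's standard `sqlite3` module. The exact `CREATE TABLE` statement the `onramp` service runs is an assumption here, not taken from the service's source.

```python
import sqlite3

# Assumed DDL reconstructed from the schema table above; the onramp
# service's actual statement may differ in details.
SCHEMA = """
CREATE TABLE IF NOT EXISTS candles (
    timeframe  TEXT    NOT NULL,
    timestamp  INTEGER NOT NULL,
    open       REAL,
    high       REAL,
    low        REAL,
    close      REAL,
    volume     REAL,
    buy_volume REAL,
    PRIMARY KEY (timeframe, timestamp)
)
"""

conn = sqlite3.connect("candles.db")
conn.execute(SCHEMA)
conn.close()
```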
---

## 3. Key Data Logic

### Buy/Sell Pressure

Unlike standard exchange OHLCV, this database includes `buy_volume`, which lets you derive buy/sell pressure directly (see the pandas sketch after this list):

* **Sell Volume** = `volume - buy_volume`.
* **Net Flow** = `buy_volume - (volume - buy_volume)`.
* **Buy Ratio** = `buy_volume / volume`.
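
A minimal pandas sketch of these derivations, assuming a DataFrame with the documented columns (for example, one returned by `load_candles` in Section 5); the zero-volume guard is an assumption about how you want to handle empty windows:

```python
import pandas as pd


def add_flow_metrics(df: pd.DataFrame) -> pd.DataFrame:
    # Sell volume: the part of total volume that was not taker-buy
    df['sell_volume'] = df['volume'] - df['buy_volume']
    # Net flow: positive when buyers dominate, negative when sellers do
    df['net_flow'] = df['buy_volume'] - df['sell_volume']
    # Buy ratio in [0, 1]; NaN for candles with zero volume
    df['buy_ratio'] = (df['buy_volume'] / df['volume']).where(df['volume'] > 0)
    return df
```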

### Candle Completion

Because the `onramp` service tails a live file, the **latest** candle for any timeframe is "unstable": it will continue to update until the next time window begins. Your analysis engine should account for this by either (see the sketch after this list):

1. Filtering for `timestamp < current_window_start` (to get only closed candles).
2. Treating the latest row as "Live" data.
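
A minimal sketch of option 1, assuming the timeframe-to-seconds mapping below (the service's own constants are not documented here):

```python
import sqlite3
import time

# Assumed window lengths for the documented timeframes
TIMEFRAME_SECONDS = {'1m': 60, '5m': 300, '15m': 900, '1h': 3600}


def load_closed_candles(conn: sqlite3.Connection, timeframe: str, limit: int = 100):
    # Start of the current, still-updating window
    window = TIMEFRAME_SECONDS[timeframe]
    current_window_start = int(time.time()) // window * window
    # Candles that started before the current window are closed
    return conn.execute(
        "SELECT * FROM candles WHERE timeframe = ? AND timestamp < ? "
        "ORDER BY timestamp DESC LIMIT ?",
        (timeframe, current_window_start, limit),
    ).fetchall()
```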
---

## 4. Accessing the Data

### Recommended Connection Settings (Python/Analysis Engine)

Since the `onramp` service is constantly writing to the database, you **must** use specific connection settings to avoid "database is locked" errors.

```python
import sqlite3


def get_connection(db_path):
    # Connect with a timeout to wait for the writer to finish
    conn = sqlite3.connect(db_path, timeout=10)
    # Enable WAL mode for high-concurrency reading
    conn.execute("PRAGMA journal_mode=WAL;")
    return conn
```
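
If the analysis engine only ever reads, a read-only connection is an even safer variant. This is a sketch using SQLite's standard URI syntax (`mode=ro` is SQLite behavior, not anything specific to `onramp`):

```python
import sqlite3


def get_readonly_connection(db_path):
    # mode=ro guarantees this connection can never block or alter the writer
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True, timeout=10)
```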

### Common Query Patterns

**Get the last 100 closed 1-minute candles:**

```sql
SELECT * FROM candles
WHERE timeframe = '1m'
  -- exclude the current, still-updating window (see Section 3)
  AND timestamp < (CAST(strftime('%s', 'now') AS INTEGER) / 60) * 60
ORDER BY timestamp DESC
LIMIT 100;
```

**Calculate 5-minute volatility (High-Low) over the last hour:**

```sql
SELECT timestamp, (high - low) AS volatility
FROM candles
WHERE timeframe = '5m'
  AND timestamp > (strftime('%s', 'now') - 3600)
ORDER BY timestamp ASC;
```
---

## 5. Integration with Analysis Engine (Pandas Example)

If you are building an analysis engine in Python, this is an efficient way to load data for processing:

```python
import pandas as pd
import sqlite3

DB_PATH = "path/to/your/candles.db"


def load_candles(timeframe="1m", limit=1000):
    # Use a timeout, as recommended in Section 4, since the writer is live
    conn = sqlite3.connect(DB_PATH, timeout=10)
    query = """
        SELECT * FROM candles
        WHERE timeframe = ?
        ORDER BY timestamp DESC
        LIMIT ?
    """
    df = pd.read_sql_query(query, conn, params=(timeframe, limit))
    conn.close()

    # Convert timestamp to readable datetime
    df['datetime'] = pd.to_datetime(df['timestamp'], unit='s')

    # Sort back to chronological order for analysis
    return df.sort_values('timestamp').reset_index(drop=True)


# Usage
df = load_candles("1m")
print(df.tail())
```
---

## 6. Maintenance & Performance Notes

1. **Index Optimization:** The primary key already creates an index on `(timeframe, timestamp)`. This makes queries filtered by timeframe and sorted by time extremely fast.
2. **Storage:** SQLite handles millions of rows easily. However, if the database exceeds several gigabytes, you may want to run `VACUUM;` occasionally (though this requires briefly stopping the `onramp` service).
3. **Backups:** You can safely copy the `candles.db` file while the system is running, provided you also copy the `candles.db-wal` and `candles.db-shm` files (or use the SQLite `.backup` command; see the sketch after this list).
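
A minimal online-backup sketch using Python's standard `sqlite3.Connection.backup` API, which is equivalent to the CLI `.backup` command (the destination path here is just an example):

```python
import sqlite3


def backup_candles(src_path="candles.db", dst_path="candles_backup.db"):
    # Open the live database read-only so the backup cannot disturb the writer
    src = sqlite3.connect(f"file:{src_path}?mode=ro", uri=True)
    dst = sqlite3.connect(dst_path)
    with dst:
        # Page-by-page copy; WAL lets the onramp writer keep committing meanwhile
        src.backup(dst)
    src.close()
    dst.close()
```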