Files
BytbitBTC/analysis/DATABASE.md

249 lines
6.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Data Output Documentation: `analysis.db`
## 1. Database Overview
The `analysis.db` database is a **Derived Market Analysis Store**.
It contains **technical indicator outputs** calculated from the raw OHLCV candle data stored in `candles.db`.
This database is **append-only**, preserves full historical analysis, and is designed to act as a **clean input source** for downstream services such as:
* Signal generators
* Strategy evaluators
* Paper/live trading engines
* Dashboards and visualization layers
* Alerting systems
- **Database Engine:** SQLite 3
- **Concurrency Mode:** WAL (Write-Ahead Logging) enabled
- **Update Frequency:** Near real-time (continuous processing loop)
- **Retention Model:** Rolling window (currently ~1 month via cleanup + incremental vacuum)
---
## 2. Relationship to `candles.db`
This database is **fully derived** from `candles.db`.
* Each row in `analysis.db` corresponds **1:1** with a *closed* candle from `candles.db`
* No raw trade or OHLCV data is duplicated beyond what is needed for indicator outputs
* Only **completed candles** are analyzed
(the currently-forming candle per timeframe is intentionally excluded)
This separation ensures:
* Clean responsibility boundaries
* Safe concurrent reads
* Zero risk of contaminating raw market data
---
## 3. Schema Definition
The database contains a single primary table: `analysis`.
### Table: `analysis`
| Column | Type | Description |
| -------------- | --------- | -------------------------------------------------------------- |
| `timeframe` | `TEXT` | Candle timeframe: `1m`, `5m`, `15m`, `1h` |
| `timestamp` | `INTEGER` | Unix timestamp (seconds) **start of the candle window** |
| `ema_9` | `REAL` | Exponential Moving Average (9) |
| `ema_21` | `REAL` | Exponential Moving Average (21) |
| `sma_50` | `REAL` | Simple Moving Average (50) |
| `sma_200` | `REAL` | Simple Moving Average (200) |
| `rsi_14` | `REAL` | Relative Strength Index (14) |
| `macd` | `REAL` | MACD line (EMA12 EMA26) |
| `macd_signal` | `REAL` | MACD signal line (9-period EMA of MACD) |
| `macd_hist` | `REAL` | MACD histogram (`macd macd_signal`) |
| `bb_upper` | `REAL` | Bollinger Band upper (20, 2σ) |
| `bb_middle` | `REAL` | Bollinger Band middle (20 SMA) |
| `bb_lower` | `REAL` | Bollinger Band lower (20, 2σ) |
| `bb_squeeze` | `INTEGER` | Volatility squeeze flag (`1` = squeeze detected, `0` = normal) |
| `volume_ma_20` | `REAL` | 20-period moving average of volume |
### Primary Key
```
(timeframe, timestamp)
```
This guarantees:
* No duplicate indicator rows
* Perfect alignment with candle boundaries
* Deterministic joins with `candles.db`
---
## 4. Indicator Semantics
### Moving Averages
* **EMA 9 / 21**: Short-term momentum and trend confirmation
* **SMA 50 / 200**: Medium- and long-term trend structure
### RSI (14)
* Range: `0100`
* Typical interpretations:
* `>70` → Overbought
* `<30` → Oversold
* Calculated using classic Wilder-style average gains/losses
### MACD (12, 26, 9)
* `macd`: Momentum direction
* `macd_signal`: Smoothed momentum
* `macd_hist`: Acceleration / deceleration of momentum
### Bollinger Bands (20, 2)
* Measures volatility expansion and contraction
* Bands are calculated using 20-period SMA ± 2 standard deviations
#### Bollinger Squeeze (`bb_squeeze`)
* `1` when Bollinger Band width is at a **local minimum**
* Indicates **volatility compression**
* Often precedes large directional moves
* Designed for breakout-style strategies
### Volume MA (20)
* Used for:
* Breakout confirmation
* Divergence detection
* Trend strength validation
---
## 5. Time Handling & Candle Validity
* All timestamps represent the **start** of the candle window
* Example:
* A `1m` candle at `12:00:00` covers `12:00:00 → 12:00:59`
* Only **closed candles** are analyzed
* No partial or live candle values exist in this database
This makes `analysis.db` safe for:
* Backtesting
* Deterministic replay
* Strategy evaluation without repainting risk
---
## 6. Accessing the Data
### Recommended Connection Settings (Python)
```python
import sqlite3
import pandas as pd
def get_connection(db_path):
conn = sqlite3.connect(db_path, timeout=10)
conn.execute("PRAGMA journal_mode=WAL;")
return conn
```
---
### Common Query Patterns
#### Latest analysis for a timeframe
```sql
SELECT *
FROM analysis
WHERE timeframe = '5m'
ORDER BY timestamp DESC
LIMIT 1;
```
---
#### Last 200 RSI values (1-minute)
```sql
SELECT timestamp, rsi_14
FROM analysis
WHERE timeframe = '1m'
ORDER BY timestamp DESC
LIMIT 200;
```
---
#### Detect active Bollinger squeezes
```sql
SELECT timeframe, timestamp
FROM analysis
WHERE bb_squeeze = 1
ORDER BY timestamp DESC;
```
---
#### Join with raw candles for strategy logic
```sql
SELECT
c.timestamp,
c.close,
a.ema_21,
a.rsi_14,
a.macd_hist
FROM candles c
JOIN analysis a
ON c.timeframe = a.timeframe
AND c.timestamp = a.timestamp
WHERE c.timeframe = '15m'
ORDER BY c.timestamp DESC
LIMIT 100;
```
---
## 7. Retention, Cleanup & Vacuuming
* The analysis engine enforces a **rolling ~1-month window**
* Rows older than the cutoff are deleted
* `PRAGMA incremental_vacuum` is used to reclaim disk space gradually
This avoids:
* Long blocking `VACUUM` operations
* Database bloat over long-running operation
---
## 8. Intended Usage Patterns
`analysis.db` is designed to be:
* **Read-heavy**
* **Safe for multiple consumers**
* **Stable and deterministic**
Typical consumers include:
* Signal engines (rule-based or ML-driven)
* Strategy backtesters
* Alert pipelines
* Visualization / dashboards
* Risk and regime detection modules
---
## 9. Design Philosophy
* **Separation of concerns**: raw data vs. derived signals
* **Determinism first**: no repainting, no live-candle noise
* **SQLite-friendly**: WAL mode, append-only, incremental vacuum
* **Composable**: easy joins, easy extensions, predictable schema