sliding_sqlite/usage_by_grok.md
2025-03-16 10:30:47 +02:00


# SlidingSQLite Usage Documentation
This document provides detailed instructions on how to use the `SlidingSQLite` library, including its API, configuration options, and best practices.
## Table of Contents
1. [Overview](#overview)
2. [Installation](#installation)
3. [Configuration](#configuration)
4. [Basic Usage](#basic-usage)
   - [Initializing the Database](#initializing-the-database)
   - [Executing Write Queries](#executing-write-queries)
   - [Executing Read Queries](#executing-read-queries)
   - [Retrieving Results](#retrieving-results)
   - [Shutting Down](#shutting-down)
5. [Advanced Usage](#advanced-usage)
   - [Multi-Threaded Applications](#multi-threaded-applications)
   - [Managing Database Retention](#managing-database-retention)
   - [Customizing Cleanup](#customizing-cleanup)
   - [Querying Across Time Windows](#querying-across-time-windows)
6. [API Reference](#api-reference)
7. [Error Handling](#error-handling)
8. [Best Practices](#best-practices)
9. [Example](#example)
## Overview
`SlidingSQLite` is a thread-safe SQLite wrapper that supports time-based database rotation, making it ideal for applications that need to manage time-series data or logs with automatic cleanup. It provides asynchronous query execution, automatic database rotation, and retention policies, all while ensuring thread safety through a queue-based worker system.
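The queue-based worker idea behind this design can be illustrated with a minimal, stdlib-only sketch (an illustration of the pattern, not the library's actual implementation): callers enqueue work items, and a single worker thread drains the queue, so writes are serialized without callers ever sharing a connection.

```python
import queue
import threading

write_queue = queue.Queue()
store = []  # stands in for the SQLite database in this sketch

def worker():
    # One worker drains the queue, so "writes" are applied one at a time.
    while True:
        item = write_queue.get()
        if item is None:  # sentinel: shut down
            break
        store.append(item)  # serialized write
        write_queue.task_done()

t = threading.Thread(target=worker, daemon=True)
t.start()

# Any number of threads can enqueue safely:
for i in range(3):
    write_queue.put(f"row-{i}")

write_queue.put(None)  # ask the worker to stop
t.join()
print(store)  # rows applied in enqueue order
```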
## Installation
To use `SlidingSQLite`, ensure you have Python 3.7 or higher installed. The library uses only the standard library and SQLite, which is included with Python.
1. Copy the `SlidingSqlite.py` file into your project directory.
2. Import the `SlidingSQLite` class in your Python code:
```python
from SlidingSqlite import SlidingSQLite
```
## Configuration
The `SlidingSQLite` class is initialized with several configuration parameters:
- **`db_dir`**: Directory where database files will be stored.
- **`schema`**: SQL schema to initialize new database files (e.g., table definitions).
- **`rotation_interval`**: Time interval (in seconds) after which a new database file is created (default: 3600 seconds, or 1 hour).
- **`retention_period`**: Time period (in seconds) to retain database files before deletion (default: 604800 seconds, or 7 days).
- **`cleanup_interval`**: Frequency (in seconds) of the cleanup process for old databases and stale queries (default: 3600 seconds, or 1 hour).
- **`auto_delete_old_dbs`**: Boolean flag to enable or disable automatic deletion of old databases (default: `True`).
Example configuration:
```python
schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    rotation_interval=3600,   # Rotate every hour
    retention_period=604800,  # Keep databases for 7 days
    cleanup_interval=3600,    # Run cleanup every hour
    auto_delete_old_dbs=True,
)
```
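A useful consequence of these two settings: the number of database files on disk at any moment is roughly `retention_period / rotation_interval`, plus the currently active file. A quick sanity check of the defaults:

```python
retention_period = 604800   # 7 days
rotation_interval = 3600    # 1 hour

# Approximate number of rotated files retained at once
max_files = retention_period // rotation_interval
print(max_files)  # 168 hourly files over a 7-day window
```

Keep this ratio in mind when tuning: read queries fan out across all retained files, so very short rotation intervals combined with long retention mean many files per query.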
## Basic Usage
### Initializing the Database
Create an instance of `SlidingSQLite` with your desired configuration. This will set up the database directory, initialize the metadata database, and start the background workers for write operations and cleanup.
```python
from SlidingSqlite import SlidingSQLite
import logging
logging.basicConfig(level=logging.INFO)
schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
)
```
### Executing Write Queries
Use the `execute_write` method to perform write operations (e.g., `INSERT`, `UPDATE`, `DELETE`). This method is asynchronous and returns a UUID that can be used to retrieve the result.
```python
import time
query_id = db.execute_write(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Hello, SlidingSQLite!"),
)
```
For synchronous execution, use `execute_write_sync`, which blocks until the operation completes or times out:
```python
result = db.execute_write_sync(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Synchronous write"),
    timeout=5.0,
)
if result.success:
    logging.info("Write operation successful")
else:
    logging.error(f"Write operation failed: {result.error}")
```
### Executing Read Queries
Use the `execute_read` method to perform read operations (e.g., `SELECT`). This method executes the query across all relevant database files, providing a seamless view of time-windowed data. It is asynchronous and returns a UUID.
```python
query_id = db.execute_read(
    "SELECT * FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 86400,),  # Last 24 hours
)
```
For synchronous execution, use `execute_read_sync`:
```python
result = db.execute_read_sync(
    "SELECT * FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 86400,),
    timeout=5.0,
)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
else:
    logging.error(f"Read operation failed: {result.error}")
```
### Retrieving Results
For asynchronous operations, use `get_result` (for write queries) or `get_read_result` (for read queries) to retrieve the results using the UUID returned by `execute_write` or `execute_read`.
```python
# Write result
result = db.get_result(query_id, timeout=5.0)
if result.success:
    logging.info("Write operation successful")
else:
    logging.error(f"Write operation failed: {result.error}")

# Read result
result = db.get_read_result(query_id, timeout=5.0)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
else:
    logging.error(f"Read operation failed: {result.error}")
```
### Shutting Down
Always call the `shutdown` method when you are done with the database to ensure graceful cleanup of resources:
```python
db.shutdown()
```
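If you prefer not to call `shutdown` by hand on every exit path, a small context-manager wrapper guarantees cleanup even when an exception escapes. This helper is not part of the library; it assumes only that the wrapped object exposes a `shutdown()` method:

```python
from contextlib import contextmanager

@contextmanager
def managed(db):
    """Yield db, guaranteeing db.shutdown() runs on exit, even on error."""
    try:
        yield db
    finally:
        db.shutdown()

# Usage sketch:
# with managed(SlidingSQLite(db_dir="./databases", schema=schema)) as db:
#     db.execute_write("INSERT INTO logs (timestamp, message) VALUES (?, ?)", ...)
```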
## Advanced Usage
### Multi-Threaded Applications
`SlidingSQLite` is designed for multi-threaded environments. It uses queues and locks to ensure thread safety. Here is an example of using multiple writer and reader threads:
```python
import threading
import time
import random
from SlidingSqlite import SlidingSQLite
import logging
logging.basicConfig(level=logging.INFO)
schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""
db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    rotation_interval=10,  # Rotate every 10 seconds for testing
    retention_period=60,   # Keep databases for 60 seconds
    cleanup_interval=30,   # Run cleanup every 30 seconds
)
def writer_thread():
    while True:
        db.execute_write(
            "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
            (time.time(), f"Message from thread {threading.current_thread().name}"),
        )
        time.sleep(random.uniform(0.05, 0.15))

def reader_thread():
    while True:
        result = db.execute_read_sync(
            "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5",
            timeout=5.0,
        )
        if result.success:
            logging.info(f"Recent logs: {result.data}")
        time.sleep(random.uniform(0.5, 1.5))
threads = []
for _ in range(4):  # Start 4 writer threads
    t = threading.Thread(target=writer_thread, daemon=True)
    t.start()
    threads.append(t)
for _ in range(2):  # Start 2 reader threads
    t = threading.Thread(target=reader_thread, daemon=True)
    t.start()
    threads.append(t)

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nShutting down...")
    db.shutdown()
```
### Managing Database Retention
You can configure the retention period and control database deletion:
- **Set Retention Period**: Use `set_retention_period` to change how long databases are kept:

  ```python
  db.set_retention_period(86400)  # Keep databases for 1 day
  ```

- **Enable/Disable Auto-Delete**: Use `set_auto_delete` to control automatic deletion of old databases:

  ```python
  db.set_auto_delete(False)  # Disable automatic deletion
  ```

- **Manual Deletion**: Use `delete_databases_before` or `delete_databases_in_range` to manually delete databases:

  ```python
  import time

  # Delete all databases before a specific timestamp
  count = db.delete_databases_before(time.time() - 86400)
  logging.info(f"Deleted {count} databases")

  # Delete databases in a specific time range
  count = db.delete_databases_in_range(time.time() - 172800, time.time() - 86400)
  logging.info(f"Deleted {count} databases in range")
  ```
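The cutoff arguments to these calls are plain Unix timestamps in seconds. A small hypothetical helper (not part of the library) makes the "N days ago" arithmetic explicit and easy to test:

```python
SECONDS_PER_DAY = 86400

def days_ago(days: float, now: float) -> float:
    """Unix timestamp for `days` days before `now`."""
    return now - days * SECONDS_PER_DAY

# Fixed "now" so the example is reproducible:
now = 1_700_000_000.0
cutoff = days_ago(1, now)  # equivalent to time.time() - 86400 at that moment
print(cutoff)
```

In real code you would pass `days_ago(1, time.time())` (or simply `time.time() - 86400`) to `delete_databases_before`.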
### Customizing Cleanup
You can adjust the cleanup interval to control how often the system checks for old databases and stale queries:
```python
db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    cleanup_interval=1800,  # Run cleanup every 30 minutes
)
```
### Querying Across Time Windows
Read queries are automatically executed across all relevant database files, providing a unified view of data across time windows. This is particularly useful for time-series data or logs. For example:
```python
result = db.execute_read_sync(
    "SELECT timestamp, message FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 604800,),  # Last 7 days
)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
```
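Conceptually, each database file covers one rotation window, and a read touches every window that overlaps the query's time range. The overlap calculation can be sketched in a few lines of stdlib Python (an illustration of the concept, not the library's internals):

```python
def overlapping_windows(query_start, query_end, rotation_interval):
    """Start times of rotation windows [w, w + interval) that overlap
    the half-open query range [query_start, query_end)."""
    first = int(query_start // rotation_interval) * rotation_interval
    return list(range(first, int(query_end), rotation_interval))

# A 2.5-hour query span with hourly rotation touches 3 window files:
print(overlapping_windows(0, 9000, 3600))  # [0, 3600, 7200]
```

This is why widening the query range (or shortening `rotation_interval`) increases the number of files each read must visit.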
## API Reference
### `SlidingSQLite` Class
#### Initialization
```python
SlidingSQLite(
    db_dir: str,
    schema: str,
    retention_period: int = 604800,
    rotation_interval: int = 3600,
    cleanup_interval: int = 3600,
    auto_delete_old_dbs: bool = True,
)
```
- **Parameters**:
  - `db_dir`: Directory to store database files.
  - `schema`: SQL schema to initialize new databases.
  - `retention_period`: Seconds to keep databases before deletion.
  - `rotation_interval`: Seconds between database rotations.
  - `cleanup_interval`: Seconds between cleanup operations.
  - `auto_delete_old_dbs`: Whether to automatically delete old databases.
#### Methods
- **`execute(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Smart query executor that routes read or write operations appropriately.
- **`execute_write(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Execute a write query asynchronously. Returns a UUID for result retrieval.
- **`execute_write_sync(query: str, params: Tuple[Any, ...] = (), timeout: float = 5.0) -> QueryResult[bool]`**:
  Execute a write query synchronously. Returns a `QueryResult` object.
- **`execute_read(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Execute a read query asynchronously across all databases. Returns a UUID.
- **`execute_read_sync(query: str, params: Tuple[Any, ...] = (), timeout: float = 5.0) -> QueryResult[List[Tuple[Any, ...]]]`**:
  Execute a read query synchronously across all databases. Returns a `QueryResult`.
- **`get_result(query_id: uuid.UUID, timeout: float = 5.0) -> QueryResult[bool]`**:
  Retrieve the result of a write query using its UUID.
- **`get_read_result(query_id: uuid.UUID, timeout: float = 5.0) -> QueryResult[List[Tuple[Any, ...]]]`**:
  Retrieve the result of a read query using its UUID.
- **`set_retention_period(seconds: int) -> None`**:
  Set the retention period for databases.
- **`set_auto_delete(enabled: bool) -> None`**:
  Enable or disable automatic deletion of old databases.
- **`delete_databases_before(timestamp: float) -> int`**:
  Delete all databases with `end_time` before the specified timestamp. Returns the number of databases deleted.
- **`delete_databases_in_range(start_time: float, end_time: float) -> int`**:
  Delete all databases overlapping with the specified time range. Returns the number of databases deleted.
- **`get_databases_info() -> List[DatabaseTimeframe]`**:
  Get information about all available databases, including file paths and time ranges.
- **`shutdown() -> None`**:
  Gracefully shut down the database, stopping workers and closing connections.
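The docs do not specify how `execute` decides whether a statement is a read or a write. One plausible heuristic, shown purely as an illustration (this is not necessarily how `SlidingSQLite` routes internally), inspects the leading SQL keyword:

```python
# Hypothetical routing heuristic: statements beginning with a read
# keyword go to the read path; everything else goes to the write queue.
READ_PREFIXES = ("SELECT", "WITH", "EXPLAIN")

def is_read_query(sql: str) -> bool:
    """Illustrative only -- not necessarily SlidingSQLite's actual logic."""
    return sql.lstrip().upper().startswith(READ_PREFIXES)

print(is_read_query("SELECT * FROM logs"))           # True
print(is_read_query("INSERT INTO logs VALUES (1)"))  # False
```

If your application knows which kind of statement it is issuing, calling `execute_write` or `execute_read` directly avoids relying on any routing at all.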
### `QueryResult` Class
A generic class to handle query results with error handling.
- **Attributes**:
  - `data`: The result data (if successful).
  - `error`: The exception (if failed).
  - `success`: Boolean indicating if the query was successful.
- **Usage**:
```python
result = db.execute_write_sync(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Test"),
)
if result.success:
    print("Success:", result.data)
else:
    print("Error:", result.error)
```
### Exceptions
- **`DatabaseError`**: Base exception for all database errors.
- **`QueryError`**: Exception raised when a query fails.
## Error Handling
`SlidingSQLite` provides robust error handling through the `QueryResult` class and custom exceptions. Always check the `success` attribute of a `QueryResult` object and handle potential errors:
```python
result = db.execute_read_sync("SELECT * FROM logs", timeout=5.0)
if result.success:
    print("Data:", result.data)
else:
    print("Error:", result.error)
```
Common errors include:
- **Query Timeout**: If a query takes longer than the specified timeout, a `QueryError` with "Query timed out" is returned.
- **Invalid Query ID**: Attempting to retrieve results with an invalid UUID results in a `QueryError`.
- **Database Errors**: SQLite errors are wrapped in `DatabaseError` or `QueryError`.
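Because a timed-out query surfaces as a failed `QueryResult` rather than a raised exception, retrying is a simple loop. The wrapper below is a generic sketch, not part of the library: it assumes only that the callable returns an object with a boolean `success` attribute.

```python
import time

def with_retries(run_query, attempts=3, backoff=0.5):
    """Call run_query() until it reports success or attempts run out.
    run_query is any zero-argument callable returning an object with
    a boolean `success` attribute; the last result is returned either way."""
    result = None
    for attempt in range(attempts):
        result = run_query()
        if result.success:
            return result
        time.sleep(backoff * (attempt + 1))  # linear backoff between tries
    return result

# Usage sketch:
# result = with_retries(lambda: db.execute_read_sync("SELECT * FROM logs", timeout=5.0))
```

Retries make the most sense for transient failures such as timeouts; a query that fails because its SQL is malformed will fail identically on every attempt.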
## Best Practices
1. **Always Shut Down**: Call `db.shutdown()` when your application exits to ensure resources are cleaned up properly.
2. **Use Timeouts**: Specify appropriate timeouts for synchronous operations to avoid blocking indefinitely.
3. **Handle Errors**: Always check the `success` attribute of `QueryResult` objects and handle errors appropriately.
4. **Configure Retention**: Choose a retention period that balances disk usage and data availability needs.
5. **Monitor Disk Space**: Even with automatic cleanup, monitor disk space usage in production environments.
6. **Thread Safety**: Use `SlidingSQLite` in multi-threaded applications without additional synchronization, as it is thread-safe by design.
7. **Optimize Queries**: For read operations across many databases, optimize your queries to reduce execution time, especially if the number of database files is large.
## Example
Here is a complete example demonstrating multi-threaded usage, including configuration, query execution, and cleanup:
```python
import time
import threading
import random
from SlidingSqlite import SlidingSQLite
import logging
# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)
# Configuration
NUM_WRITER_THREADS = 4
NUM_READER_THREADS = 2
TARGET_OPS_PER_SECOND = 10
# Define a schema
db_schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""
# Initialize SlidingSQLite
db = SlidingSQLite(
    db_dir="./databases",
    schema=db_schema,
    rotation_interval=10,  # Rotate every 10 seconds for testing
    retention_period=60,   # Keep databases for 60 seconds
    cleanup_interval=30,   # Run cleanup every 30 seconds
    auto_delete_old_dbs=True,
)
def writer_thread():
    while True:
        db.execute_write(
            "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
            (time.time(), f"Message from thread {threading.current_thread().name}"),
        )
        time.sleep(random.uniform(0.05, 0.15))  # Target ~10 ops/sec

def reader_thread():
    while True:
        result = db.execute_read_sync(
            "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5",
            timeout=5.0,
        )
        if result.success:
            logging.info(f"Recent logs: {result.data}")
        time.sleep(random.uniform(0.5, 1.5))  # Randomized sleep for natural load
# Start threads
threads = []
for _ in range(NUM_WRITER_THREADS):
    t = threading.Thread(target=writer_thread, daemon=True)
    t.start()
    threads.append(t)
for _ in range(NUM_READER_THREADS):
    t = threading.Thread(target=reader_thread, daemon=True)
    t.start()
    threads.append(t)

try:
    print("Running multi-threaded SlidingSQLite test. Press Ctrl+C to stop.")
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nShutting down...")
    db.shutdown()
```
This example demonstrates how to set up a multi-threaded application with `SlidingSQLite`, including logging, configuration, and proper shutdown handling.