# SlidingSQLite Usage Documentation

This document provides detailed instructions on how to use the `SlidingSQLite` library, including its API, configuration options, and best practices.

## Table of Contents

1. [Overview](#overview)
2. [Installation](#installation)
3. [Configuration](#configuration)
4. [Basic Usage](#basic-usage)
   - [Initializing the Database](#initializing-the-database)
   - [Executing Write Queries](#executing-write-queries)
   - [Executing Read Queries](#executing-read-queries)
   - [Retrieving Results](#retrieving-results)
   - [Shutting Down](#shutting-down)
5. [Advanced Usage](#advanced-usage)
   - [Multi-Threaded Applications](#multi-threaded-applications)
   - [Managing Database Retention](#managing-database-retention)
   - [Customizing Cleanup](#customizing-cleanup)
   - [Querying Across Time Windows](#querying-across-time-windows)
6. [API Reference](#api-reference)
7. [Error Handling](#error-handling)
8. [Best Practices](#best-practices)
9. [Example](#example)

## Overview

`SlidingSQLite` is a thread-safe SQLite wrapper that supports time-based database rotation, making it ideal for applications that need to manage time-series data or logs with automatic cleanup. It provides asynchronous query execution, automatic database rotation, and retention policies, all while ensuring thread safety through a queue-based worker system.

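
The queue-based worker idea can be illustrated with the standard library alone. The sketch below is not `SlidingSQLite`'s actual implementation; `WriteWorker`, `submit`, `result`, and `demo` are hypothetical names chosen for illustration. It shows one plausible shape of the pattern: callers enqueue writes and get a UUID back, while a single thread that owns the SQLite connection executes them in order.

```python
import os
import queue
import sqlite3
import tempfile
import threading
import uuid


class WriteWorker:
    """Hypothetical sketch: serialize writes through one connection-owning thread."""

    def __init__(self, path):
        self._path = path
        self._tasks = queue.Queue()
        self._results = {}  # query_id -> one-slot queue holding the outcome
        self._lock = threading.Lock()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, sql, params=()):
        """Enqueue a write and return a UUID that identifies its result."""
        query_id = uuid.uuid4()
        with self._lock:
            self._results[query_id] = queue.Queue(maxsize=1)
        self._tasks.put((query_id, sql, params))
        return query_id

    def result(self, query_id, timeout=5.0):
        """Return None on success or the sqlite3.Error; raises queue.Empty on timeout."""
        with self._lock:
            box = self._results[query_id]
        outcome = box.get(timeout=timeout)
        with self._lock:
            del self._results[query_id]
        return outcome

    def close(self):
        self._tasks.put(None)  # sentinel: stop after draining earlier tasks
        self._thread.join()

    def _run(self):
        conn = sqlite3.connect(self._path)  # created and used on this thread only
        while True:
            task = self._tasks.get()
            if task is None:
                break
            query_id, sql, params = task
            try:
                with conn:  # commit on success, roll back on error
                    conn.execute(sql, params)
                outcome = None
            except sqlite3.Error as exc:
                outcome = exc
            with self._lock:
                box = self._results[query_id]
            box.put(outcome)
        conn.close()


def demo():
    """Four threads write through one worker; returns the resulting row count."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    worker = WriteWorker(path)
    worker.result(worker.submit("CREATE TABLE logs (message TEXT)"))

    def produce(name):
        worker.result(worker.submit("INSERT INTO logs VALUES (?)", (name,)))

    threads = [threading.Thread(target=produce, args=(f"t{i}",)) for i in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    worker.close()

    conn = sqlite3.connect(path)
    try:
        return conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
    finally:
        conn.close()
```

Because only the worker thread ever touches the connection, callers need no locking of their own around writes, which is the property the overview describes.
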
## Installation

To use `SlidingSQLite`, ensure you have Python 3.7 or higher installed. The library depends only on the Python standard library, including the bundled `sqlite3` module.

1. Copy the `SlidingSqlite.py` file into your project directory.
2. Import the `SlidingSQLite` class in your Python code:

   ```python
   from SlidingSqlite import SlidingSQLite
   ```

## Configuration

The `SlidingSQLite` class is initialized with several configuration parameters:

- **`db_dir`**: Directory where database files will be stored.
- **`schema`**: SQL schema to initialize new database files (e.g., table definitions).
- **`rotation_interval`**: Time interval (in seconds) after which a new database file is created (default: 3600 seconds, or 1 hour).
- **`retention_period`**: Time period (in seconds) to retain database files before deletion (default: 604800 seconds, or 7 days).
- **`cleanup_interval`**: Frequency (in seconds) of the cleanup process for old databases and stale queries (default: 3600 seconds, or 1 hour).
- **`auto_delete_old_dbs`**: Boolean flag to enable or disable automatic deletion of old databases (default: `True`).

Example configuration:

```python
schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    rotation_interval=3600,    # Rotate every hour
    retention_period=604800,   # Keep databases for 7 days
    cleanup_interval=3600,     # Run cleanup every hour
    auto_delete_old_dbs=True
)
```

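
To get a feel for how the three intervals interact, the following sketch does the arithmetic. It is a rough estimate under stated assumptions, not the library's own logic: it assumes rotation windows are aligned to multiples of `rotation_interval`, that a file becomes eligible for deletion once its end time is older than `retention_period`, and that an expired file can linger for up to one `cleanup_interval`. The helper names `window_bounds` and `max_live_files` are hypothetical.

```python
import math


def window_bounds(timestamp, rotation_interval):
    """Return (start, end) of the assumed epoch-aligned window containing timestamp."""
    start = math.floor(timestamp / rotation_interval) * rotation_interval
    return start, start + rotation_interval


def max_live_files(rotation_interval, retention_period, cleanup_interval):
    """Rough upper bound on database files present on disk at the same time.

    Counts windows that ended within retention_period + cleanup_interval,
    plus the window currently being written.
    """
    return math.ceil((retention_period + cleanup_interval) / rotation_interval) + 1


# Defaults from this section: hourly rotation, 7-day retention, hourly cleanup.
default_estimate = max_live_files(3600, 604800, 3600)
# Short intervals used for testing elsewhere in this document.
testing_estimate = max_live_files(10, 60, 30)
```

With the defaults this gives an estimate of 170 files, which is worth keeping in mind because read queries span every file that is still on disk.
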
## Basic Usage

### Initializing the Database

Create an instance of `SlidingSQLite` with your desired configuration. This will set up the database directory, initialize the metadata database, and start the background workers for write operations and cleanup.

```python
from SlidingSqlite import SlidingSQLite
import logging

logging.basicConfig(level=logging.INFO)

schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema
)
```

### Executing Write Queries

Use the `execute_write` method to perform write operations (e.g., `INSERT`, `UPDATE`, `DELETE`). This method is asynchronous and returns a UUID that can be used to retrieve the result.

```python
import time

query_id = db.execute_write(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Hello, SlidingSQLite!")
)
```

For synchronous execution, use `execute_write_sync`, which blocks until the operation completes or times out:

```python
result = db.execute_write_sync(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Synchronous write"),
    timeout=5.0
)
if result.success:
    logging.info("Write operation successful")
else:
    logging.error(f"Write operation failed: {result.error}")
```

### Executing Read Queries

Use the `execute_read` method to perform read operations (e.g., `SELECT`). This method executes the query across all relevant database files, providing a seamless view of time-windowed data. It is asynchronous and returns a UUID.

```python
query_id = db.execute_read(
    "SELECT * FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 86400,)  # Last 24 hours
)
```

For synchronous execution, use `execute_read_sync`:

```python
result = db.execute_read_sync(
    "SELECT * FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 86400,),
    timeout=5.0
)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
else:
    logging.error(f"Read operation failed: {result.error}")
```

### Retrieving Results

For asynchronous operations, pass the UUID returned by `execute_write` to `get_result`, or the UUID returned by `execute_read` to `get_read_result`. Keep the two IDs separate: a write ID belongs with `get_result` and a read ID with `get_read_result`.

```python
# Write result: write_id is the UUID returned by an earlier db.execute_write(...)
result = db.get_result(write_id, timeout=5.0)
if result.success:
    logging.info("Write operation successful")
else:
    logging.error(f"Write operation failed: {result.error}")

# Read result: read_id is the UUID returned by an earlier db.execute_read(...)
result = db.get_read_result(read_id, timeout=5.0)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
else:
    logging.error(f"Read operation failed: {result.error}")
```

### Shutting Down

Always call the `shutdown` method when you are done with the database to ensure graceful cleanup of resources:

```python
db.shutdown()
```

## Advanced Usage

### Multi-Threaded Applications

`SlidingSQLite` is designed for multi-threaded environments. It uses queues and locks to ensure thread safety. Here is an example of using multiple writer and reader threads:

```python
import threading
import time
import random
from SlidingSqlite import SlidingSQLite
import logging

logging.basicConfig(level=logging.INFO)

schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    rotation_interval=10,  # Rotate every 10 seconds for testing
    retention_period=60,   # Keep databases for 60 seconds
    cleanup_interval=30    # Run cleanup every 30 seconds
)

def writer_thread():
    while True:
        db.execute_write(
            "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
            (time.time(), f"Message from thread {threading.current_thread().name}")
        )
        time.sleep(random.uniform(0.05, 0.15))

def reader_thread():
    while True:
        result = db.execute_read_sync(
            "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5",
            timeout=5.0
        )
        if result.success:
            logging.info(f"Recent logs: {result.data}")
        time.sleep(random.uniform(0.5, 1.5))

threads = []
for _ in range(4):  # Start 4 writer threads
    t = threading.Thread(target=writer_thread, daemon=True)
    t.start()
    threads.append(t)
for _ in range(2):  # Start 2 reader threads
    t = threading.Thread(target=reader_thread, daemon=True)
    t.start()
    threads.append(t)

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nShutting down...")
    db.shutdown()
```

### Managing Database Retention

You can configure the retention period and control database deletion:

- **Set Retention Period**: Use `set_retention_period` to change how long databases are kept:

  ```python
  db.set_retention_period(86400)  # Keep databases for 1 day
  ```

- **Enable/Disable Auto-Delete**: Use `set_auto_delete` to control automatic deletion of old databases:

  ```python
  db.set_auto_delete(False)  # Disable automatic deletion
  ```

- **Manual Deletion**: Use `delete_databases_before` or `delete_databases_in_range` to manually delete databases:

  ```python
  import logging
  import time

  # Delete all databases before a specific timestamp
  count = db.delete_databases_before(time.time() - 86400)
  logging.info(f"Deleted {count} databases")

  # Delete databases in a specific time range
  count = db.delete_databases_in_range(time.time() - 172800, time.time() - 86400)
  logging.info(f"Deleted {count} databases in range")
  ```

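
The two manual deletion calls select files differently: one by end time, the other by overlap with a range. The sketch below illustrates that distinction with plain tuples. It is an assumption-laden illustration rather than the library's code: `Window` is a hypothetical stand-in for a database file's time range, and whether the real comparisons are inclusive or exclusive at the boundaries is not stated in this document.

```python
from typing import NamedTuple


class Window(NamedTuple):
    """Hypothetical stand-in for the time range covered by one database file."""
    start_time: float
    end_time: float


def ends_before(window, timestamp):
    """Selection rule assumed for delete_databases_before: the file ended earlier."""
    return window.end_time < timestamp


def overlaps(window, start_time, end_time):
    """Selection rule assumed for delete_databases_in_range: any shared instant."""
    return window.start_time <= end_time and window.end_time >= start_time


# Three consecutive one-hour windows.
windows = [Window(0, 3600), Window(3600, 7200), Window(7200, 10800)]

before_cutoff = [w for w in windows if ends_before(w, 7200)]
in_range = [w for w in windows if overlaps(w, 4000, 5000)]
```

Here `before_cutoff` contains only the first window, while `in_range` contains only the second. A range that touches a file even partially selects the whole file, so a range delete can remove rows outside the requested range.
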
### Customizing Cleanup

You can adjust the cleanup interval to control how often the system checks for old databases and stale queries:

```python
db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    cleanup_interval=1800  # Run cleanup every 30 minutes
)
```

### Querying Across Time Windows

Read queries are automatically executed across all relevant database files, providing a unified view of data across time windows. This is particularly useful for time-series data or logs. For example:

```python
result = db.execute_read_sync(
    "SELECT timestamp, message FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 604800,)  # Last 7 days
)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
```

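
What "executed across all relevant database files" can mean in practice is sketched below using only `sqlite3`. This is an illustration of the general fan-out technique, not a description of how `SlidingSQLite` merges results; `read_across` and `demo` are hypothetical names. It runs the same query against each file and concatenates the rows, which shows why an `ORDER BY` applied per file does not by itself yield a globally ordered result.

```python
import os
import sqlite3
import tempfile


def read_across(paths, sql, params=()):
    """Run the same read query against each database file and concatenate rows."""
    rows = []
    for path in paths:
        conn = sqlite3.connect(path)
        try:
            rows.extend(conn.execute(sql, params).fetchall())
        finally:
            conn.close()
    return rows


def demo():
    """Build two small window files, query both, and return (raw, merged) rows."""
    tmp = tempfile.mkdtemp()
    paths = []
    for index, stamps in enumerate([(1.0, 2.0), (3.0, 4.0)]):
        path = os.path.join(tmp, f"window_{index}.db")
        conn = sqlite3.connect(path)
        with conn:  # commit the setup statements
            conn.execute("CREATE TABLE logs (timestamp REAL, message TEXT)")
            conn.executemany(
                "INSERT INTO logs VALUES (?, ?)",
                [(t, f"message at {t}") for t in stamps],
            )
        conn.close()
        paths.append(path)

    raw = read_across(
        paths,
        "SELECT timestamp, message FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
        (1.5,),
    )
    # Each file was sorted on its own, so sort again for a global ordering.
    merged = sorted(raw, key=lambda row: row[0], reverse=True)
    return raw, merged
```

In this sketch the raw timestamps come back as 2.0, 4.0, 3.0 and the merged ones as 4.0, 3.0, 2.0. If global ordering or a global `LIMIT` matters in your application, it is worth checking the order of `result.data` and sorting or trimming on the client side if needed.
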
## API Reference

### `SlidingSQLite` Class

#### Initialization

```python
SlidingSQLite(
    db_dir: str,
    schema: str,
    retention_period: int = 604800,
    rotation_interval: int = 3600,
    cleanup_interval: int = 3600,
    auto_delete_old_dbs: bool = True
)
```

- **Parameters**:
  - `db_dir`: Directory to store database files.
  - `schema`: SQL schema to initialize new databases.
  - `retention_period`: Seconds to keep databases before deletion.
  - `rotation_interval`: Seconds between database rotations.
  - `cleanup_interval`: Seconds between cleanup operations.
  - `auto_delete_old_dbs`: Whether to automatically delete old databases.

#### Methods

- **`execute(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Smart query executor that routes read or write operations appropriately.

- **`execute_write(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Execute a write query asynchronously. Returns a UUID for result retrieval.

- **`execute_write_sync(query: str, params: Tuple[Any, ...] = (), timeout: float = 5.0) -> QueryResult[bool]`**:
  Execute a write query synchronously. Returns a `QueryResult` object.

- **`execute_read(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Execute a read query asynchronously across all databases. Returns a UUID.

- **`execute_read_sync(query: str, params: Tuple[Any, ...] = (), timeout: float = 5.0) -> QueryResult[List[Tuple[Any, ...]]]`**:
  Execute a read query synchronously across all databases. Returns a `QueryResult`.

- **`get_result(query_id: uuid.UUID, timeout: float = 5.0) -> QueryResult[bool]`**:
  Retrieve the result of a write query using its UUID.

- **`get_read_result(query_id: uuid.UUID, timeout: float = 5.0) -> QueryResult[List[Tuple[Any, ...]]]`**:
  Retrieve the result of a read query using its UUID.

- **`set_retention_period(seconds: int) -> None`**:
  Set the retention period for databases.

- **`set_auto_delete(enabled: bool) -> None`**:
  Enable or disable automatic deletion of old databases.

- **`delete_databases_before(timestamp: float) -> int`**:
  Delete all databases with `end_time` before the specified timestamp. Returns the number of databases deleted.

- **`delete_databases_in_range(start_time: float, end_time: float) -> int`**:
  Delete all databases overlapping with the specified time range. Returns the number of databases deleted.

- **`get_databases_info() -> List[DatabaseTimeframe]`**:
  Get information about all available databases, including file paths and time ranges.

- **`shutdown() -> None`**:
  Gracefully shut down the database, stopping workers and closing connections.

### `QueryResult` Class

A generic class to handle query results with error handling.

- **Attributes**:
  - `data`: The result data (if successful).
  - `error`: The exception (if failed).
  - `success`: Boolean indicating if the query was successful.

- **Usage**:

  ```python
  result = db.execute_write_sync("INSERT INTO logs (timestamp, message) VALUES (?, ?)", (time.time(), "Test"))
  if result.success:
      print("Success:", result.data)
  else:
      print("Error:", result.error)
  ```

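
For readers who want a mental model of such a result type, here is a minimal sketch of a generic container with the three attributes listed above. It is not the library's `QueryResult`; the name `ResultSketch` is hypothetical, and deriving `success` from the absence of an error is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Generic, List, Optional, Tuple, TypeVar

T = TypeVar("T")


@dataclass
class ResultSketch(Generic[T]):
    """Hypothetical stand-in: holds either data or the exception that occurred."""
    data: Optional[T] = None
    error: Optional[Exception] = None

    @property
    def success(self) -> bool:
        # Assumption: a result counts as successful when no error was recorded.
        return self.error is None


# A read-style result carrying rows, and a failed result carrying an exception.
rows: ResultSketch[List[Tuple[int, str]]] = ResultSketch(data=[(1, "hello")])
failed: ResultSketch[bool] = ResultSketch(error=RuntimeError("Query timed out"))
```

The type parameter mirrors the signatures in the method list: `bool` for writes and a list of row tuples for reads.
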
### Exceptions

- **`DatabaseError`**: Base exception for all database errors.
- **`QueryError`**: Exception raised when a query fails.

## Error Handling

`SlidingSQLite` reports errors through the `QueryResult` class and custom exceptions. Always check the `success` attribute of a `QueryResult` object and handle potential errors:

```python
result = db.execute_read_sync("SELECT * FROM logs", timeout=5.0)
if result.success:
    print("Data:", result.data)
else:
    print("Error:", result.error)
```

Common errors include:

- **Query Timeout**: If a query takes longer than the specified timeout, a `QueryError` with "Query timed out" is returned.
- **Invalid Query ID**: Attempting to retrieve results with an invalid UUID results in a `QueryError`.
- **Database Errors**: SQLite errors are wrapped in `DatabaseError` or `QueryError`.

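
If checking `success` at every call site becomes repetitive, one option is a small helper that turns a failed result back into a raised exception. The sketch below assumes only the three documented attributes (`success`, `data`, `error`); `unwrap` is a hypothetical helper, not part of the library, and the `SimpleNamespace` objects merely stand in for real `QueryResult` instances.

```python
from types import SimpleNamespace


def unwrap(result):
    """Return result.data on success; otherwise raise the recorded error."""
    if result.success:
        return result.data
    if isinstance(result.error, BaseException):
        raise result.error
    # Fallback for a failed result that carries no exception object.
    raise RuntimeError("query failed without an error object")


# Stand-ins for QueryResult objects, used here only for demonstration.
ok = SimpleNamespace(success=True, data=[(1, "a")], error=None)
failed = SimpleNamespace(success=False, data=None, error=RuntimeError("Query timed out"))

rows = unwrap(ok)
try:
    unwrap(failed)
except RuntimeError as exc:
    message = str(exc)
```

With the real library, the same helper would be called as `unwrap(db.execute_read_sync(...))`, assuming the returned object exposes those attributes as documented.
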
## Best Practices

1. **Always Shut Down**: Call `db.shutdown()` when your application exits to ensure resources are cleaned up properly.
2. **Use Timeouts**: Specify appropriate timeouts for synchronous operations to avoid blocking indefinitely.
3. **Handle Errors**: Always check the `success` attribute of `QueryResult` objects and handle errors appropriately.
4. **Configure Retention**: Choose a retention period that balances disk usage and data availability needs.
5. **Monitor Disk Space**: Even with automatic cleanup, monitor disk space usage in production environments.
6. **Thread Safety**: Use `SlidingSQLite` in multi-threaded applications without additional synchronization, as it is thread-safe by design.
7. **Optimize Queries**: For read operations across many databases, optimize your queries to reduce execution time, especially if the number of database files is large.

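
One way to make the "always shut down" rule hard to forget is to wrap the database in a context manager so that `shutdown()` runs even when an exception escapes. The helper below is a sketch, not part of `SlidingSQLite`: `shutting_down` is a hypothetical name, and `FakeDb` is a stand-in used so the example runs without the library. It relies only on the documented `shutdown()` method.

```python
from contextlib import contextmanager


@contextmanager
def shutting_down(db):
    """Yield db and call db.shutdown() on exit, whether or not an error occurred."""
    try:
        yield db
    finally:
        db.shutdown()


class FakeDb:
    """Stand-in for a SlidingSQLite instance; records whether shutdown ran."""

    def __init__(self):
        self.is_shut_down = False

    def shutdown(self):
        self.is_shut_down = True


fake = FakeDb()
try:
    with shutting_down(fake) as db:
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass
# fake.is_shut_down is now True even though the block raised.
```

With the real class, the same pattern would presumably read `with shutting_down(SlidingSQLite(db_dir="./databases", schema=schema)) as db:`.
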
## Example

Here is a complete example demonstrating multi-threaded usage, including configuration, query execution, and cleanup:

```python
import time
import threading
import random
from SlidingSqlite import SlidingSQLite
import logging

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)

# Configuration
NUM_WRITER_THREADS = 4
NUM_READER_THREADS = 2
TARGET_OPS_PER_SECOND = 10  # Per writer thread

# Define a schema
db_schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

# Initialize SlidingSQLite
db = SlidingSQLite(
    db_dir="./databases",
    schema=db_schema,
    rotation_interval=10,  # Rotate every 10 seconds for testing
    retention_period=60,   # Keep databases for 60 seconds
    cleanup_interval=30,   # Run cleanup every 30 seconds
    auto_delete_old_dbs=True,
)

def writer_thread():
    while True:
        db.execute_write(
            "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
            (time.time(), f"Message from thread {threading.current_thread().name}")
        )
        # Sleeps 0.05-0.15 s, i.e. roughly TARGET_OPS_PER_SECOND writes/sec per thread
        time.sleep(random.uniform(0.5, 1.5) / TARGET_OPS_PER_SECOND)

def reader_thread():
    while True:
        result = db.execute_read_sync(
            "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5",
            timeout=5.0
        )
        if result.success:
            logging.info(f"Recent logs: {result.data}")
        time.sleep(random.uniform(0.5, 1.5))  # Randomized sleep for natural load

# Start threads
threads = []
for _ in range(NUM_WRITER_THREADS):
    t = threading.Thread(target=writer_thread, daemon=True)
    t.start()
    threads.append(t)
for _ in range(NUM_READER_THREADS):
    t = threading.Thread(target=reader_thread, daemon=True)
    t.start()
    threads.append(t)

try:
    print("Running multi-threaded SlidingSQLite test. Press Ctrl+C to stop.")
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nShutting down...")
    db.shutdown()
```

This example demonstrates how to set up a multi-threaded application with `SlidingSQLite`, including logging, configuration, and proper shutdown handling.