# SlidingSQLite Usage Documentation

This document provides detailed instructions on how to use the `SlidingSQLite` library, including its API, configuration options, and best practices.

## Table of Contents

1. [Overview](#overview)
2. [Installation](#installation)
3. [Configuration](#configuration)
4. [Basic Usage](#basic-usage)
   - [Initializing the Database](#initializing-the-database)
   - [Executing Write Queries](#executing-write-queries)
   - [Executing Read Queries](#executing-read-queries)
   - [Retrieving Results](#retrieving-results)
   - [Shutting Down](#shutting-down)
5. [Advanced Usage](#advanced-usage)
   - [Multi-Threaded Applications](#multi-threaded-applications)
   - [Managing Database Retention](#managing-database-retention)
   - [Customizing Cleanup](#customizing-cleanup)
   - [Querying Across Time Windows](#querying-across-time-windows)
6. [API Reference](#api-reference)
7. [Error Handling](#error-handling)
8. [Best Practices](#best-practices)
9. [Example](#example)

## Overview

`SlidingSQLite` is a thread-safe SQLite wrapper that supports time-based database rotation, making it ideal for applications that need to manage time-series data or logs with automatic cleanup. It provides asynchronous query execution, automatic database rotation, and retention policies, all while ensuring thread safety through a queue-based worker system.
## Installation

To use `SlidingSQLite`, ensure you have Python 3.7 or higher installed. The library uses only the Python standard library; SQLite support comes from the built-in `sqlite3` module.

1. Copy the `SlidingSqlite.py` file into your project directory.
2. Import the `SlidingSQLite` class in your Python code:

   ```python
   from SlidingSqlite import SlidingSQLite
   ```
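With the class imported, a minimal end-to-end session looks like the sketch below, condensed from the sections that follow; the schema and directory are placeholders, so adapt them to your application.

```python
import time

from SlidingSqlite import SlidingSQLite

schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(db_dir="./databases", schema=schema)

# Synchronous write: blocks until the write worker confirms the insert (or times out).
write_result = db.execute_write_sync(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "quick start"),
    timeout=5.0,
)
if not write_result.success:
    print("Write failed:", write_result.error)

# Synchronous read: executed across all database files in the retention window.
read_result = db.execute_read_sync("SELECT COUNT(*) FROM logs", timeout=5.0)
if read_result.success:
    print("Row count:", read_result.data)

db.shutdown()
```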
## Configuration

The `SlidingSQLite` class is initialized with several configuration parameters:

- **`db_dir`**: Directory where database files will be stored.
- **`schema`**: SQL schema to initialize new database files (e.g., table definitions).
- **`rotation_interval`**: Time interval (in seconds) after which a new database file is created (default: 3600 seconds, or 1 hour).
- **`retention_period`**: Time period (in seconds) to retain database files before deletion (default: 604800 seconds, or 7 days).
- **`cleanup_interval`**: Frequency (in seconds) of the cleanup process for old databases and stale queries (default: 3600 seconds, or 1 hour).
- **`auto_delete_old_dbs`**: Boolean flag to enable or disable automatic deletion of old databases (default: `True`).

Example configuration:

```python
schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    rotation_interval=3600,   # Rotate every hour
    retention_period=604800,  # Keep databases for 7 days
    cleanup_interval=3600,    # Run cleanup every hour
    auto_delete_old_dbs=True
)
```
## Basic Usage

### Initializing the Database

Create an instance of `SlidingSQLite` with your desired configuration. This will set up the database directory, initialize the metadata database, and start the background workers for write operations and cleanup.

```python
import logging

from SlidingSqlite import SlidingSQLite

logging.basicConfig(level=logging.INFO)

schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema
)
```
### Executing Write Queries

Use the `execute_write` method to perform write operations (e.g., `INSERT`, `UPDATE`, `DELETE`). This method is asynchronous and returns a UUID that can be used to retrieve the result.

```python
import time

query_id = db.execute_write(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Hello, SlidingSQLite!")
)
```

For synchronous execution, use `execute_write_sync`, which blocks until the operation completes or times out:

```python
result = db.execute_write_sync(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Synchronous write"),
    timeout=5.0
)
if result.success:
    logging.info("Write operation successful")
else:
    logging.error(f"Write operation failed: {result.error}")
```
### Executing Read Queries

Use the `execute_read` method to perform read operations (e.g., `SELECT`). This method executes the query across all relevant database files, providing a seamless view of time-windowed data. It is asynchronous and returns a UUID.

```python
query_id = db.execute_read(
    "SELECT * FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 86400,)  # Last 24 hours
)
```

For synchronous execution, use `execute_read_sync`:

```python
result = db.execute_read_sync(
    "SELECT * FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 86400,),
    timeout=5.0
)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
else:
    logging.error(f"Read operation failed: {result.error}")
```
### Retrieving Results

For asynchronous operations, use `get_result` (for write queries) or `get_read_result` (for read queries) to retrieve the results using the UUID returned by `execute_write` or `execute_read`.

```python
# Write result
result = db.get_result(query_id, timeout=5.0)
if result.success:
    logging.info("Write operation successful")
else:
    logging.error(f"Write operation failed: {result.error}")

# Read result
result = db.get_read_result(query_id, timeout=5.0)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
else:
    logging.error(f"Read operation failed: {result.error}")
```
### Shutting Down

Always call the `shutdown` method when you are done with the database to ensure graceful cleanup of resources:

```python
db.shutdown()
```
## Advanced Usage

### Multi-Threaded Applications

`SlidingSQLite` is designed for multi-threaded environments. It uses queues and locks to ensure thread safety. Here is an example of using multiple writer and reader threads:

```python
import logging
import random
import threading
import time

from SlidingSqlite import SlidingSQLite

logging.basicConfig(level=logging.INFO)

schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    rotation_interval=10,  # Rotate every 10 seconds for testing
    retention_period=60,   # Keep databases for 60 seconds
    cleanup_interval=30    # Run cleanup every 30 seconds
)

def writer_thread():
    while True:
        db.execute_write(
            "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
            (time.time(), f"Message from thread {threading.current_thread().name}")
        )
        time.sleep(random.uniform(0.05, 0.15))

def reader_thread():
    while True:
        result = db.execute_read_sync(
            "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5",
            timeout=5.0
        )
        if result.success:
            logging.info(f"Recent logs: {result.data}")
        time.sleep(random.uniform(0.5, 1.5))

threads = []
for _ in range(4):  # Start 4 writer threads
    t = threading.Thread(target=writer_thread, daemon=True)
    t.start()
    threads.append(t)
for _ in range(2):  # Start 2 reader threads
    t = threading.Thread(target=reader_thread, daemon=True)
    t.start()
    threads.append(t)

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nShutting down...")
    db.shutdown()
```
### Managing Database Retention

You can configure the retention period and control database deletion (a combined sketch follows this list):

- **Set Retention Period**: Use `set_retention_period` to change how long databases are kept:

  ```python
  db.set_retention_period(86400)  # Keep databases for 1 day
  ```

- **Enable/Disable Auto-Delete**: Use `set_auto_delete` to control automatic deletion of old databases:

  ```python
  db.set_auto_delete(False)  # Disable automatic deletion
  ```

- **Manual Deletion**: Use `delete_databases_before` or `delete_databases_in_range` to manually delete databases:

  ```python
  import time

  # Delete all databases before a specific timestamp
  count = db.delete_databases_before(time.time() - 86400)
  logging.info(f"Deleted {count} databases")

  # Delete databases in a specific time range
  count = db.delete_databases_in_range(time.time() - 172800, time.time() - 86400)
  logging.info(f"Deleted {count} databases in range")
  ```
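If you prefer to manage deletion yourself, the following sketch combines `set_auto_delete` with `delete_databases_before`. It continues with the `db` instance from above; `prune_old_databases` is an illustrative helper, not part of the library:

```python
import logging
import time

# Turn off the built-in cleanup so files are only removed when you decide.
db.set_auto_delete(False)

def prune_old_databases(max_age_seconds: float = 86400) -> int:
    """Delete database files whose time window ended more than max_age_seconds ago."""
    cutoff = time.time() - max_age_seconds
    deleted = db.delete_databases_before(cutoff)
    logging.info(f"Manual cleanup removed {deleted} database file(s)")
    return deleted

# Call this from your own scheduler (cron job, background thread, etc.).
prune_old_databases()
```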
### Customizing Cleanup

You can adjust the cleanup interval to control how often the system checks for old databases and stale queries:

```python
db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    cleanup_interval=1800  # Run cleanup every 30 minutes
)
```

### Querying Across Time Windows

Read queries are automatically executed across all relevant database files, providing a unified view of data across time windows. This is particularly useful for time-series data or logs. For example:

```python
result = db.execute_read_sync(
    "SELECT timestamp, message FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 604800,)  # Last 7 days
)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
```
## API Reference

### `SlidingSQLite` Class

#### Initialization

```python
SlidingSQLite(
    db_dir: str,
    schema: str,
    retention_period: int = 604800,
    rotation_interval: int = 3600,
    cleanup_interval: int = 3600,
    auto_delete_old_dbs: bool = True
)
```

- **Parameters**:
  - `db_dir`: Directory to store database files.
  - `schema`: SQL schema to initialize new databases.
  - `retention_period`: Seconds to keep databases before deletion.
  - `rotation_interval`: Seconds between database rotations.
  - `cleanup_interval`: Seconds between cleanup operations.
  - `auto_delete_old_dbs`: Whether to automatically delete old databases.
#### Methods

- **`execute(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Smart query executor that routes read or write operations appropriately (see the sketch after this list).

- **`execute_write(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Execute a write query asynchronously. Returns a UUID for result retrieval.

- **`execute_write_sync(query: str, params: Tuple[Any, ...] = (), timeout: float = 5.0) -> QueryResult[bool]`**:
  Execute a write query synchronously. Returns a `QueryResult` object.

- **`execute_read(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID`**:
  Execute a read query asynchronously across all databases. Returns a UUID.

- **`execute_read_sync(query: str, params: Tuple[Any, ...] = (), timeout: float = 5.0) -> QueryResult[List[Tuple[Any, ...]]]`**:
  Execute a read query synchronously across all databases. Returns a `QueryResult`.

- **`get_result(query_id: uuid.UUID, timeout: float = 5.0) -> QueryResult[bool]`**:
  Retrieve the result of a write query using its UUID.

- **`get_read_result(query_id: uuid.UUID, timeout: float = 5.0) -> QueryResult[List[Tuple[Any, ...]]]`**:
  Retrieve the result of a read query using its UUID.

- **`set_retention_period(seconds: int) -> None`**:
  Set the retention period for databases.

- **`set_auto_delete(enabled: bool) -> None`**:
  Enable or disable automatic deletion of old databases.

- **`delete_databases_before(timestamp: float) -> int`**:
  Delete all databases with `end_time` before the specified timestamp. Returns the number of databases deleted.

- **`delete_databases_in_range(start_time: float, end_time: float) -> int`**:
  Delete all databases overlapping with the specified time range. Returns the number of databases deleted.

- **`get_databases_info() -> List[DatabaseTimeframe]`**:
  Get information about all available databases, including file paths and time ranges (see the sketch after this list).

- **`shutdown() -> None`**:
  Gracefully shut down the database, stopping workers and closing connections.

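
Below is a brief, illustrative sketch for the two methods above that lack examples elsewhere in this document, the `execute()` router and `get_databases_info()`. It assumes the `db` instance and `logs` table from [Basic Usage](#basic-usage).

```python
import logging
import time

# execute() routes the statement to the write or read path and returns a UUID;
# fetch the outcome with the matching getter.
write_id = db.execute(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "routed write"),
)
write_result = db.get_result(write_id, timeout=5.0)

read_id = db.execute("SELECT COUNT(*) FROM logs")
read_result = db.get_read_result(read_id, timeout=5.0)
if read_result.success:
    logging.info(f"COUNT(*) result: {read_result.data}")

# List the databases currently known to the store (file paths and time ranges).
for timeframe in db.get_databases_info():
    logging.info(f"Database window: {timeframe}")
```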
### `QueryResult` Class

A generic class to handle query results with error handling.

- **Attributes**:
  - `data`: The result data (if successful).
  - `error`: The exception (if failed).
  - `success`: Boolean indicating if the query was successful.

- **Usage**:

  ```python
  result = db.execute_write_sync("INSERT INTO logs (timestamp, message) VALUES (?, ?)", (time.time(), "Test"))
  if result.success:
      print("Success:", result.data)
  else:
      print("Error:", result.error)
  ```

### Exceptions

- **`DatabaseError`**: Base exception for all database errors.
- **`QueryError`**: Exception raised when a query fails.
## Error Handling

`SlidingSQLite` provides robust error handling through the `QueryResult` class and custom exceptions. Always check the `success` attribute of a `QueryResult` object and handle potential errors:

```python
result = db.execute_read_sync("SELECT * FROM logs", timeout=5.0)
if result.success:
    print("Data:", result.data)
else:
    print("Error:", result.error)
```

Common errors include (see the sketch after this list):

- **Query Timeout**: If a query takes longer than the specified timeout, a `QueryError` with "Query timed out" is returned.
- **Invalid Query ID**: Attempting to retrieve results with an invalid UUID results in a `QueryError`.
- **Database Errors**: SQLite errors are wrapped in `DatabaseError` or `QueryError`.

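
The sketch below shows one way to branch on these cases. It assumes the `db` instance from earlier sections, that failures are reported on `QueryResult.error` rather than raised (as described above), and that the exception classes are importable from `SlidingSqlite`; adjust the import if your copy organizes them differently.

```python
import logging

from SlidingSqlite import DatabaseError, QueryError  # assumed to be exported by the module

result = db.execute_read_sync("SELECT * FROM logs", timeout=0.5)
if not result.success:
    if isinstance(result.error, QueryError):
        # Covers query timeouts ("Query timed out") and invalid query IDs.
        logging.warning(f"Query-level failure: {result.error}")
    elif isinstance(result.error, DatabaseError):
        # Wrapped SQLite errors.
        logging.error(f"Database-level failure: {result.error}")
    else:
        logging.error(f"Unexpected error: {result.error}")
```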
## Best Practices

1. **Always Shut Down**: Call `db.shutdown()` when your application exits to ensure resources are cleaned up properly (see the sketch after this list).
2. **Use Timeouts**: Specify appropriate timeouts for synchronous operations to avoid blocking indefinitely.
3. **Handle Errors**: Always check the `success` attribute of `QueryResult` objects and handle errors appropriately.
4. **Configure Retention**: Choose a retention period that balances disk usage and data availability needs.
5. **Monitor Disk Space**: Even with automatic cleanup, monitor disk space usage in production environments.
6. **Thread Safety**: Use `SlidingSQLite` in multi-threaded applications without additional synchronization, as it is thread-safe by design.
7. **Optimize Queries**: For read operations across many databases, optimize your queries to reduce execution time, especially if the number of database files is large.

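
One way to honor points 1 through 3 together is to wrap your application code in `try`/`finally`; this is a minimal sketch rather than a required pattern:

```python
import logging
import time

from SlidingSqlite import SlidingSQLite

schema = "CREATE TABLE IF NOT EXISTS logs (id INTEGER PRIMARY KEY AUTOINCREMENT, timestamp REAL, message TEXT);"

db = SlidingSQLite(db_dir="./databases", schema=schema)
try:
    result = db.execute_write_sync(
        "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
        (time.time(), "guarded write"),
        timeout=5.0,              # Use Timeouts: never block indefinitely
    )
    if not result.success:        # Handle Errors: always inspect the outcome
        logging.error(f"Write failed: {result.error}")
finally:
    db.shutdown()                 # Always Shut Down: release workers and connections
```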
## Example

Here is a complete example demonstrating multi-threaded usage, including configuration, query execution, and cleanup:

```python
import logging
import random
import threading
import time

from SlidingSqlite import SlidingSQLite

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)

# Configuration
NUM_WRITER_THREADS = 4
NUM_READER_THREADS = 2
TARGET_OPS_PER_SECOND = 10

# Define a schema
db_schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

# Initialize SlidingSQLite
db = SlidingSQLite(
    db_dir="./databases",
    schema=db_schema,
    rotation_interval=10,  # Rotate every 10 seconds for testing
    retention_period=60,   # Keep databases for 60 seconds
    cleanup_interval=30,   # Run cleanup every 30 seconds
    auto_delete_old_dbs=True,
)

def writer_thread():
    while True:
        db.execute_write(
            "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
            (time.time(), f"Message from thread {threading.current_thread().name}")
        )
        time.sleep(random.uniform(0.05, 0.15))  # Target ~10 ops/sec per thread

def reader_thread():
    while True:
        result = db.execute_read_sync(
            "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5",
            timeout=5.0
        )
        if result.success:
            logging.info(f"Recent logs: {result.data}")
        time.sleep(random.uniform(0.5, 1.5))  # Randomized sleep for natural load

# Start threads
threads = []
for _ in range(NUM_WRITER_THREADS):
    t = threading.Thread(target=writer_thread, daemon=True)
    t.start()
    threads.append(t)
for _ in range(NUM_READER_THREADS):
    t = threading.Thread(target=reader_thread, daemon=True)
    t.start()
    threads.append(t)

try:
    print("Running multi-threaded SlidingSQLite test. Press Ctrl+C to stop.")
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nShutting down...")
    db.shutdown()
```

This example demonstrates how to set up a multi-threaded application with `SlidingSQLite`, including logging, configuration, and proper shutdown handling.