2025-03-16 10:30:47 +02:00

SlidingSQLite Usage Documentation

This document provides detailed instructions on how to use the SlidingSQLite library, including its API, configuration options, and best practices.

Table of Contents

  1. Overview
  2. Installation
  3. Configuration
  4. Basic Usage
  5. Advanced Usage
  6. API Reference
  7. Error Handling
  8. Best Practices
  9. Example

Overview

SlidingSQLite is a thread-safe SQLite wrapper that supports time-based database rotation, making it ideal for applications that need to manage time-series data or logs with automatic cleanup. It provides asynchronous query execution, automatic database rotation, and retention policies, all while ensuring thread safety through a queue-based worker system.
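The queue-based worker idea can be illustrated with a minimal, self-contained sketch (this is illustrative only, not SlidingSQLite's actual implementation): all writes are funneled through a queue to a single worker thread that owns the SQLite connection, so the connection is never shared across threads.

```python
import os
import queue
import sqlite3
import tempfile
import threading

class SingleWriter:
    """Sketch: all writes funnel through one queue to one worker-owned connection."""

    _SENTINEL = object()

    def __init__(self, path):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._worker, args=(path,), daemon=True)
        self._thread.start()

    def _worker(self, path):
        conn = sqlite3.connect(path)  # the connection never leaves this thread
        while True:
            item = self._queue.get()
            if item is self._SENTINEL:  # sentinel tells the worker to stop
                break
            sql, params = item
            conn.execute(sql, params)
            conn.commit()
        conn.close()

    def submit(self, sql, params=()):
        self._queue.put((sql, params))

    def shutdown(self):
        self._queue.put(self._SENTINEL)
        self._thread.join()

path = os.path.join(tempfile.mkdtemp(), "demo.db")
w = SingleWriter(path)
w.submit("CREATE TABLE logs (ts REAL, message TEXT)")
w.submit("INSERT INTO logs VALUES (?, ?)", (0.0, "hello"))
w.shutdown()
rows = sqlite3.connect(path).execute("SELECT message FROM logs").fetchall()
print(rows)
```

Because only the worker thread ever touches the connection, callers on any thread can submit writes safely without extra locking around SQLite itself.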

Installation

To use SlidingSQLite, ensure you have Python 3.7 or higher installed. The library depends only on the Python standard library, including the built-in sqlite3 module, so no third-party packages are required.

  1. Copy the SlidingSqlite.py file into your project directory.
  2. Import the SlidingSQLite class in your Python code:
    from SlidingSqlite import SlidingSQLite
    

Configuration

The SlidingSQLite class is initialized with several configuration parameters:

  • db_dir: Directory where database files will be stored.
  • schema: SQL schema to initialize new database files (e.g., table definitions).
  • rotation_interval: Time interval (in seconds) after which a new database file is created (default: 3600 seconds, or 1 hour).
  • retention_period: Time period (in seconds) to retain database files before deletion (default: 604800 seconds, or 7 days).
  • cleanup_interval: Frequency (in seconds) of the cleanup process for old databases and stale queries (default: 3600 seconds, or 1 hour).
  • auto_delete_old_dbs: Boolean flag to enable or disable automatic deletion of old databases (default: True).

Example configuration:

schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    rotation_interval=3600,  # Rotate every hour
    retention_period=604800,  # Keep databases for 7 days
    cleanup_interval=3600,   # Run cleanup every hour
    auto_delete_old_dbs=True
)

Basic Usage

Initializing the Database

Create an instance of SlidingSQLite with your desired configuration. This will set up the database directory, initialize the metadata database, and start the background workers for write operations and cleanup.

from SlidingSqlite import SlidingSQLite
import logging

logging.basicConfig(level=logging.INFO)

schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema
)

Executing Write Queries

Use the execute_write method to perform write operations (e.g., INSERT, UPDATE, DELETE). This method is asynchronous and returns a UUID that can be used to retrieve the result.

import time

query_id = db.execute_write(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Hello, SlidingSQLite!")
)

For synchronous execution, use execute_write_sync, which blocks until the operation completes or times out:

result = db.execute_write_sync(
    "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
    (time.time(), "Synchronous write"),
    timeout=5.0
)
if result.success:
    logging.info("Write operation successful")
else:
    logging.error(f"Write operation failed: {result.error}")

Executing Read Queries

Use the execute_read method to perform read operations (e.g., SELECT). This method executes the query across all relevant database files, providing a seamless view of time-windowed data. It is asynchronous and returns a UUID.

query_id = db.execute_read(
    "SELECT * FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 86400,)  # Last 24 hours
)

For synchronous execution, use execute_read_sync:

result = db.execute_read_sync(
    "SELECT * FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 86400,),
    timeout=5.0
)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
else:
    logging.error(f"Read operation failed: {result.error}")

Retrieving Results

For asynchronous operations, use get_result (for write queries) or get_read_result (for read queries) to retrieve the results using the UUID returned by execute_write or execute_read.

# Write result
result = db.get_result(query_id, timeout=5.0)
if result.success:
    logging.info("Write operation successful")
else:
    logging.error(f"Write operation failed: {result.error}")

# Read result
result = db.get_read_result(query_id, timeout=5.0)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
else:
    logging.error(f"Read operation failed: {result.error}")
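A common way to implement this UUID-based handoff (shown here as an illustrative sketch, not the library's actual internals) is to map each query id to a small result queue: the worker delivers the result into the queue, and the caller blocks on it with a timeout.

```python
import queue
import threading
import uuid

results = {}               # query_id -> Queue that will hold exactly one result
lock = threading.Lock()

def submit(compute):
    """Register a query, run it on a background thread, and return its id."""
    query_id = uuid.uuid4()
    box = queue.Queue(maxsize=1)
    with lock:
        results[query_id] = box
    threading.Thread(target=lambda: box.put(compute()), daemon=True).start()
    return query_id

def get_result(query_id, timeout=5.0):
    """Block until the result for `query_id` arrives, or time out."""
    with lock:
        box = results.pop(query_id, None)
    if box is None:
        raise KeyError(f"Unknown query id: {query_id}")
    try:
        return box.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError("Query timed out")

qid = submit(lambda: 21 * 2)
print(get_result(qid))
```

Popping the queue on retrieval also means each result is consumed exactly once, which is why a second lookup with the same UUID fails.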

Shutting Down

Always call the shutdown method when you are done with the database to ensure graceful cleanup of resources:

db.shutdown()

Advanced Usage

Multi-Threaded Applications

SlidingSQLite is designed for multi-threaded environments. It uses queues and locks to ensure thread safety. Here is an example of using multiple writer and reader threads:

import threading
import time
import random
from SlidingSqlite import SlidingSQLite
import logging

logging.basicConfig(level=logging.INFO)

schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    rotation_interval=10,  # Rotate every 10 seconds for testing
    retention_period=60,   # Keep databases for 60 seconds
    cleanup_interval=30    # Run cleanup every 30 seconds
)

def writer_thread():
    while True:
        db.execute_write(
            "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
            (time.time(), f"Message from thread {threading.current_thread().name}")
        )
        time.sleep(random.uniform(0.05, 0.15))

def reader_thread():
    while True:
        result = db.execute_read_sync(
            "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5",
            timeout=5.0
        )
        if result.success:
            logging.info(f"Recent logs: {result.data}")
        time.sleep(random.uniform(0.5, 1.5))

threads = []
for _ in range(4):  # Start 4 writer threads
    t = threading.Thread(target=writer_thread, daemon=True)
    t.start()
    threads.append(t)
for _ in range(2):  # Start 2 reader threads
    t = threading.Thread(target=reader_thread, daemon=True)
    t.start()
    threads.append(t)

try:
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nShutting down...")
    db.shutdown()

Managing Database Retention

You can configure the retention period and control database deletion:

  • Set Retention Period: Use set_retention_period to change how long databases are kept:

    db.set_retention_period(86400)  # Keep databases for 1 day
    
  • Enable/Disable Auto-Delete: Use set_auto_delete to control automatic deletion of old databases:

    db.set_auto_delete(False)  # Disable automatic deletion
    
  • Manual Deletion: Use delete_databases_before or delete_databases_in_range to manually delete databases:

    import time
    
    # Delete all databases before a specific timestamp
    count = db.delete_databases_before(time.time() - 86400)
    logging.info(f"Deleted {count} databases")
    
    # Delete databases in a specific time range
    count = db.delete_databases_in_range(time.time() - 172800, time.time() - 86400)
    logging.info(f"Deleted {count} databases in range")
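Timestamp-based deletion can be pictured with a simplified standalone sketch. Here file modification times stand in for the time windows; the real library instead tracks each file's window in its metadata database, so treat this purely as an illustration of the cutoff logic:

```python
import os
import tempfile
import time

def delete_before(db_dir, cutoff):
    """Delete .db files last modified before `cutoff`; return how many were removed.

    Sketch only: SlidingSQLite uses per-file time windows from its metadata
    database, not filesystem mtimes.
    """
    deleted = 0
    for entry in os.scandir(db_dir):
        if entry.name.endswith(".db") and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            deleted += 1
    return deleted

tmp = tempfile.mkdtemp()
old, new = os.path.join(tmp, "old.db"), os.path.join(tmp, "new.db")
for p in (old, new):
    open(p, "w").close()
backdated = time.time() - 7200
os.utime(old, (backdated, backdated))      # pretend old.db is 2 hours old
count = delete_before(tmp, time.time() - 3600)  # keep the last hour
print(count, os.path.exists(new))
```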
    

Customizing Cleanup

You can adjust the cleanup interval to control how often the system checks for old databases and stale queries:

db = SlidingSQLite(
    db_dir="./databases",
    schema=schema,
    cleanup_interval=1800  # Run cleanup every 30 minutes
)

Querying Across Time Windows

Read queries are automatically executed across all relevant database files, providing a unified view of data across time windows. This is particularly useful for time-series data or logs. For example:

result = db.execute_read_sync(
    "SELECT timestamp, message FROM logs WHERE timestamp > ? ORDER BY timestamp DESC",
    (time.time() - 604800,)  # Last 7 days
)
if result.success:
    logging.info(f"Found {len(result.data)} log entries: {result.data}")
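Under the hood this is conceptually similar to running the same SELECT against each database file and concatenating the rows, as in this self-contained sketch (the library performs the equivalent merge internally):

```python
import os
import sqlite3
import tempfile

def read_across(db_paths, sql, params=()):
    """Run the same SELECT against every database file and concatenate the rows."""
    rows = []
    for path in db_paths:
        conn = sqlite3.connect(path)
        try:
            rows.extend(conn.execute(sql, params).fetchall())
        finally:
            conn.close()
    return rows

# Build two tiny database files, as if produced by two rotation windows.
tmp = tempfile.mkdtemp()
paths = []
for i, msg in enumerate(["from window 0", "from window 1"]):
    path = os.path.join(tmp, f"window_{i}.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE logs (timestamp REAL, message TEXT)")
    conn.execute("INSERT INTO logs VALUES (?, ?)", (float(i), msg))
    conn.commit()
    conn.close()
    paths.append(path)

rows = read_across(paths, "SELECT message FROM logs ORDER BY timestamp")
print(rows)
```

Note that in this sketch any ORDER BY applies per file, so a global ordering would require a final sort over the merged rows; this is one reason to keep cross-window queries selective.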

API Reference

SlidingSQLite Class

Initialization

SlidingSQLite(
    db_dir: str,
    schema: str,
    retention_period: int = 604800,
    rotation_interval: int = 3600,
    cleanup_interval: int = 3600,
    auto_delete_old_dbs: bool = True
)
  • Parameters:
    • db_dir: Directory to store database files.
    • schema: SQL schema to initialize new databases.
    • retention_period: Seconds to keep databases before deletion.
    • rotation_interval: Seconds between database rotations.
    • cleanup_interval: Seconds between cleanup operations.
    • auto_delete_old_dbs: Whether to automatically delete old databases.

Methods

  • execute(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID: Smart query executor that routes read or write operations appropriately.

  • execute_write(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID: Execute a write query asynchronously. Returns a UUID for result retrieval.

  • execute_write_sync(query: str, params: Tuple[Any, ...] = (), timeout: float = 5.0) -> QueryResult[bool]: Execute a write query synchronously. Returns a QueryResult object.

  • execute_read(query: str, params: Tuple[Any, ...] = ()) -> uuid.UUID: Execute a read query asynchronously across all databases. Returns a UUID.

  • execute_read_sync(query: str, params: Tuple[Any, ...] = (), timeout: float = 5.0) -> QueryResult[List[Tuple[Any, ...]]]: Execute a read query synchronously across all databases. Returns a QueryResult.

  • get_result(query_id: uuid.UUID, timeout: float = 5.0) -> QueryResult[bool]: Retrieve the result of a write query using its UUID.

  • get_read_result(query_id: uuid.UUID, timeout: float = 5.0) -> QueryResult[List[Tuple[Any, ...]]]: Retrieve the result of a read query using its UUID.

  • set_retention_period(seconds: int) -> None: Set the retention period for databases.

  • set_auto_delete(enabled: bool) -> None: Enable or disable automatic deletion of old databases.

  • delete_databases_before(timestamp: float) -> int: Delete all databases with end_time before the specified timestamp. Returns the number of databases deleted.

  • delete_databases_in_range(start_time: float, end_time: float) -> int: Delete all databases overlapping with the specified time range. Returns the number of databases deleted.

  • get_databases_info() -> List[DatabaseTimeframe]: Get information about all available databases, including file paths and time ranges.

  • shutdown() -> None: Gracefully shut down the database, stopping workers and closing connections.

QueryResult Class

A generic container for query results that captures either the result data or an error.

  • Attributes:

    • data: The result data (if successful).
    • error: The exception (if failed).
    • success: Boolean indicating if the query was successful.
  • Usage:

    result = db.execute_write_sync("INSERT INTO logs (timestamp, message) VALUES (?, ?)", (time.time(), "Test"))
    if result.success:
        print("Success:", result.data)
    else:
        print("Error:", result.error)
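A minimal shape compatible with the attributes above can be sketched like this (the library's actual class may differ in details such as how `success` is stored):

```python
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

@dataclass
class QueryResult(Generic[T]):
    """Holds either result data or the exception that caused a failure."""
    data: Optional[T] = None
    error: Optional[Exception] = None

    @property
    def success(self) -> bool:
        return self.error is None

ok = QueryResult(data=[(1, "hello")])
failed = QueryResult(error=RuntimeError("boom"))
print(ok.success, failed.success)
```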
    

Exceptions

  • DatabaseError: Base exception for all database errors.
  • QueryError: Exception raised when a query fails.

Error Handling

SlidingSQLite provides robust error handling through the QueryResult class and custom exceptions. Always check the success attribute of a QueryResult object and handle potential errors:

result = db.execute_read_sync("SELECT * FROM logs", timeout=5.0)
if result.success:
    print("Data:", result.data)
else:
    print("Error:", result.error)

Common errors include:

  • Query Timeout: If a query takes longer than the specified timeout, a QueryError with "Query timed out" is returned.
  • Invalid Query ID: Attempting to retrieve results with an invalid UUID results in a QueryError.
  • Database Errors: SQLite errors are wrapped in DatabaseError or QueryError.
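The timeout case can be sketched in isolation: a blocking wait on an internal result queue expires, and the failure is surfaced as an error value rather than an unhandled exception (illustrative only; names such as `await_result` are hypothetical):

```python
import queue

class QueryError(Exception):
    """Stand-in for the library's QueryError."""

def await_result(box, timeout):
    """Convert a queue timeout into a failed (False, error) pair."""
    try:
        return True, box.get(timeout=timeout)
    except queue.Empty:
        return False, QueryError("Query timed out")

empty_box = queue.Queue()          # no worker ever delivers a result here
ok, value = await_result(empty_box, timeout=0.01)
print(ok, value)
```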

Best Practices

  1. Always Shut Down: Call db.shutdown() when your application exits to ensure resources are cleaned up properly.
  2. Use Timeouts: Specify appropriate timeouts for synchronous operations to avoid blocking indefinitely.
  3. Handle Errors: Always check the success attribute of QueryResult objects and handle errors appropriately.
  4. Configure Retention: Choose a retention period that balances disk usage and data availability needs.
  5. Monitor Disk Space: Even with automatic cleanup, monitor disk space usage in production environments.
  6. Thread Safety: Use SlidingSQLite in multi-threaded applications without additional synchronization, as it is thread-safe by design.
  7. Optimize Queries: For read operations across many databases, optimize your queries to reduce execution time, especially if the number of database files is large.

Example

Here is a complete example demonstrating multi-threaded usage, including configuration, query execution, and cleanup:

import time
import uuid
import threading
import random
from datetime import datetime, timezone
from SlidingSqlite import SlidingSQLite
import logging

# Set up logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler()],
)

# Configuration
NUM_WRITER_THREADS = 4
NUM_READER_THREADS = 2
TARGET_OPS_PER_SECOND = 10

# Define a schema
db_schema = """
CREATE TABLE IF NOT EXISTS logs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp REAL,
    message TEXT
);
"""

# Initialize SlidingSQLite
db = SlidingSQLite(
    db_dir="./databases",
    schema=db_schema,
    rotation_interval=10,  # Rotate every 10 seconds for testing
    retention_period=60,   # Keep databases for 60 seconds
    cleanup_interval=30,   # Run cleanup every 30 seconds
    auto_delete_old_dbs=True,
)

def writer_thread():
    while True:
        db.execute_write(
            "INSERT INTO logs (timestamp, message) VALUES (?, ?)",
            (time.time(), f"Message from thread {threading.current_thread().name}")
        )
        time.sleep(random.uniform(0.05, 0.15))  # Target ~10 ops/sec

def reader_thread():
    while True:
        result = db.execute_read_sync(
            "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 5",
            timeout=5.0
        )
        if result.success:
            logging.info(f"Recent logs: {result.data}")
        time.sleep(random.uniform(0.5, 1.5))  # Randomized sleep for natural load

# Start threads
threads = []
for _ in range(NUM_WRITER_THREADS):
    t = threading.Thread(target=writer_thread, daemon=True)
    t.start()
    threads.append(t)
for _ in range(NUM_READER_THREADS):
    t = threading.Thread(target=reader_thread, daemon=True)
    t.start()
    threads.append(t)

try:
    print("Running multi-threaded SlidingSQLite test. Press Ctrl+C to stop.")
    while True:
        time.sleep(1)
except KeyboardInterrupt:
    print("\nShutting down...")
    db.shutdown()

This example demonstrates how to set up a multi-threaded application with SlidingSQLite, including logging, configuration, and proper shutdown handling.