Technical Overview

What is Musoq?

Musoq is a SQL-like query engine that lets developers query diverse data sources without first loading the data into a traditional database. It translates each SQL query into C# code, compiles it, and runs it against files, APIs, databases, and other sources exposed through a flexible plugin architecture.

Core Architecture Principles

1. SQL-First Design

  • Familiar SQL syntax for data querying
  • Extensions for non-relational data scenarios
  • Type-safe query processing

2. Plugin-Based Extensibility

  • Modular data source plugins
  • Custom function libraries
  • Clean separation of concerns

3. Dynamic Compilation

  • Runtime C# code generation
  • JIT compilation for performance
  • Type safety throughout the pipeline

4. Unified Data Access

  • Single query language for multiple data sources
  • Consistent API across different data types
  • Seamless data source composition

Key Components at a Glance

┌─────────────────────────────────────────────────────────────┐
│                     MUSOQ ARCHITECTURE                     │
├─────────────────────────────────────────────────────────────┤
│  Parser     │  Converter  │  Evaluator  │  Schema System  │
│  --------   │  ---------  │  ---------  │  -------------  │
│  • Lexer    │  • AST      │  • Compiler │  • Data Sources │
│  • AST      │    Transform│  • Runtime  │  • Type System  │
│  • Syntax   │  • Code Gen │  • Execute  │  • Plugins      │
│    Analysis │  • Optimize │             │                 │
└─────────────────────────────────────────────────────────────┘

Parser Module

  • Input: SQL query string
  • Output: Abstract Syntax Tree (AST)
  • Purpose: Validate syntax and create structured representation
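
To make the first half of this step concrete, here is a minimal lexer sketch. It is not Musoq's actual lexer: the names TokenKind, Token, and TinyLexer are hypothetical, and the sketch only recognizes the handful of token shapes that appear in the example query later in this overview.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical token kinds for illustration; the real lexer defines its own set.
public enum TokenKind { Keyword, Identifier, Comma, Hash, Dot, LeftParen, RightParen, String, Operator }

public readonly record struct Token(TokenKind Kind, string Text);

public static class TinyLexer
{
    private static readonly HashSet<string> Keywords =
        new(StringComparer.OrdinalIgnoreCase) { "SELECT", "FROM", "WHERE", "ORDER", "BY", "DESC", "ASC" };

    public static List<Token> Tokenize(string query)
    {
        var tokens = new List<Token>();
        var i = 0;
        while (i < query.Length)
        {
            var c = query[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }

            // Words are either keywords or identifiers.
            if (char.IsLetter(c) || c == '_')
            {
                var start = i;
                while (i < query.Length && (char.IsLetterOrDigit(query[i]) || query[i] == '_')) i++;
                var word = query[start..i];
                tokens.Add(new Token(Keywords.Contains(word) ? TokenKind.Keyword : TokenKind.Identifier, word));
                continue;
            }

            // Single-quoted string literal (no escape handling in this sketch).
            if (c == '\'')
            {
                var end = query.IndexOf('\'', i + 1);
                if (end < 0) throw new FormatException($"Unterminated string literal at position {i}.");
                tokens.Add(new Token(TokenKind.String, query[(i + 1)..end]));
                i = end + 1;
                continue;
            }

            // Single-character punctuation and operators.
            var kind = c switch
            {
                ',' => TokenKind.Comma,
                '#' => TokenKind.Hash,
                '.' => TokenKind.Dot,
                '(' => TokenKind.LeftParen,
                ')' => TokenKind.RightParen,
                '=' or '<' or '>' or '*' => TokenKind.Operator,
                _ => throw new FormatException($"Unexpected character '{c}' at position {i}.")
            };
            tokens.Add(new Token(kind, c.ToString()));
            i++;
        }
        return tokens;
    }
}
```

A parser would then consume this token list to build the AST; syntax errors surface either here (unknown characters) or during that later step.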

Schema System

  • Input: Data source specifications
  • Output: Type information and data access methods
  • Purpose: Provide unified interface to diverse data sources
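
As a rough illustration of the "unified interface" idea, the sketch below maps a schema and method name to a factory that produces rows. SketchSchemaRegistry and the Row alias are hypothetical names invented for this example, not Musoq types, and the sketch covers only data access, not the type information a real schema also provides.

```csharp
using System;
using System.Collections.Generic;
using Row = System.Collections.Generic.IReadOnlyDictionary<string, object>;

// Hypothetical registry: maps "schema.method" to a factory that produces rows.
public sealed class SketchSchemaRegistry
{
    private readonly Dictionary<string, Func<object[], IEnumerable<Row>>> _sources =
        new(StringComparer.OrdinalIgnoreCase);

    // Registers a data source under a schema and method name.
    public void Register(string schema, string method, Func<object[], IEnumerable<Row>> factory) =>
        _sources[$"{schema}.{method}"] = factory;

    // Resolves a reference such as #os.files('/Documents') to its rows.
    public IEnumerable<Row> Resolve(string schema, string method, params object[] args)
    {
        if (!_sources.TryGetValue($"{schema}.{method}", out var factory))
            throw new KeyNotFoundException($"No data source registered for #{schema}.{method}.");
        return factory(args);
    }
}
```

Whatever the source is (files, an API, a database), the caller sees the same shape: a sequence of rows addressed by column name.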

Converter Module

  • Input: AST + Schema information
  • Output: Optimized AST + Generated C# code
  • Purpose: Transform query logic into executable code

Evaluator Module

  • Input: Generated C# code
  • Output: Query results
  • Purpose: Compile and execute queries with runtime support
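
The Converter and Evaluator together turn query logic into something the runtime can execute. Musoq does this by generating C# code and compiling it; the sketch below swaps in a different, smaller technique, .NET expression trees, purely to show the general idea of building a predicate at runtime and compiling it to a delegate. PredicateCompiler is a hypothetical name, and rows are modeled as dictionaries for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public static class PredicateCompiler
{
    // Builds and compiles the lambda: row => object.Equals(row[column], value)
    public static Func<IReadOnlyDictionary<string, object>, bool> CompileEquals(string column, object value)
    {
        var rowType = typeof(IReadOnlyDictionary<string, object>);
        var row = Expression.Parameter(rowType, "row");

        // row[column], via the dictionary indexer.
        var cell = Expression.MakeIndex(
            row,
            rowType.GetProperty("Item")!,
            new Expression[] { Expression.Constant(column) });

        // Static object.Equals(object, object) is null-safe on both sides.
        var equalsMethod = typeof(object).GetMethod(
            nameof(object.Equals), new[] { typeof(object), typeof(object) })!;
        var body = Expression.Call(equalsMethod, cell, Expression.Constant(value, typeof(object)));

        return Expression
            .Lambda<Func<IReadOnlyDictionary<string, object>, bool>>(body, row)
            .Compile();
    }
}
```

For example, PredicateCompiler.CompileEquals("Extension", ".pdf") yields a delegate that returns true only for rows whose Extension column equals ".pdf". Once compiled, the delegate runs as ordinary JIT-compiled code with no per-row interpretation.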

Data Flow Example

Let’s trace a simple query through the system:

Input Query

SELECT Name, Size 
FROM #os.files('/Documents') 
WHERE Extension = '.pdf'
ORDER BY Size DESC

1. Parsing Phase

// Tokens: SELECT, Name, Comma, Size, FROM, Hash, os, Dot, files, ...
// AST: SelectNode { Fields: [Name, Size], From: FunctionNode { Schema: "os", Method: "files" }, ... }

2. Schema Resolution

// Resolve #os.files -> OSSchema.GetFilesRowSource
// Infer types: Name (string), Size (long), Extension (string)

3. Code Generation

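// Generated C# (simplified):
public Table Execute() {
    // FROM #os.files('/Documents')
    var source = new OSFilesRowSource("/Documents");
    return source.Rows
        // WHERE Extension = '.pdf' (Equals avoids a null dereference on missing values)
        .Where(f => Equals(f["Extension"], ".pdf"))
        // SELECT Name, Size (Size was inferred as long during schema resolution)
        .Select(f => new { Name = f["Name"], Size = (long)f["Size"] })
        // ORDER BY Size DESC
        .OrderByDescending(f => f.Size)
        .ToTable();
}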

4. Compilation & Execution

// Compile to assembly, create instance, execute
var results = compiledQuery.Run();
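
For comparison, the same question can be answered by hand with System.IO and LINQ. The sketch below is not what Musoq generates; it only shows the result the example query is expected to produce. PdfListing is a hypothetical name, and the sketch assumes a non-recursive listing and a case-sensitive extension match, neither of which this overview specifies.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class PdfListing
{
    // Hand-written equivalent of:
    //   SELECT Name, Size FROM #os.files(directory) WHERE Extension = '.pdf' ORDER BY Size DESC
    public static List<(string Name, long Size)> LargestPdfFirst(string directory) =>
        new DirectoryInfo(directory)
            .EnumerateFiles()                                   // top-level files only (assumption)
            .Where(f => string.Equals(f.Extension, ".pdf", StringComparison.Ordinal))
            .OrderByDescending(f => f.Length)
            .Select(f => (Name: f.Name, Size: f.Length))
            .ToList();
}
```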

Plugin Development Overview

Creating a Data Source Plugin

// 1. Schema Definition
public class MyDataSchema : SchemaBase
{
    public override string Name => "mydata";
    
    public override RowSource GetRowSource(string name, RuntimeContext context, params object[] parameters)
    {
        return new MyDataRowSource(parameters);
    }
}

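// 2. Data Source Implementation
public class MyDataRowSource : RowSource
{
    private readonly object[] _parameters;

    // Receives the arguments passed in the query, e.g. 'parameter' in the usage below.
    public MyDataRowSource(object[] parameters)
    {
        _parameters = parameters;
    }

    // Rows are yielded lazily, one resolver per item.
    public override IEnumerable<IObjectResolver> Rows
    {
        get
        {
            // GetMyData, MyDataItem, and MyDataMapping are placeholders you supply.
            foreach (var item in GetMyData(_parameters))
                yield return new EntityResolver<MyDataItem>(item, MyDataMapping.Instance);
        }
    }
}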

// 3. Usage in Queries
// SELECT * FROM #mydata.source('parameter')

Performance Characteristics

Compilation Strategy

  • First Execution: Parse → Transform → Compile → Execute
  • Subsequent Executions: Cache hit → Execute directly
  • Memory Usage: Minimal object allocation during execution
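
The "cache hit" path can be pictured as a dictionary of compiled queries. The sketch below is an assumption about how such a cache might look, not Musoq's implementation: CompiledQueryCache is a hypothetical name, and keying on the exact query text is a simplification.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical cache: compiles each distinct query text once and reuses the result.
public sealed class CompiledQueryCache<TCompiled>
{
    private readonly ConcurrentDictionary<string, Lazy<TCompiled>> _cache = new(StringComparer.Ordinal);
    private readonly Func<string, TCompiled> _compile;

    public CompiledQueryCache(Func<string, TCompiled> compile) => _compile = compile;

    // Lazy<T> ensures the expensive compile step runs at most once per key,
    // even if several threads ask for the same query at the same time.
    public TCompiled GetOrCompile(string queryText) =>
        _cache.GetOrAdd(queryText, text => new Lazy<TCompiled>(() => _compile(text))).Value;
}
```

With this shape, the first call for a given query pays for parsing, transformation, and compilation; later calls with the same text skip straight to execution.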

Optimization Techniques

  • Predicate Pushdown: WHERE clauses pushed to data sources
  • Lazy Evaluation: Data processed on-demand
  • Type Inference: Compile-time type checking
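
Predicate pushdown and lazy evaluation can be sketched together: the source applies the filter itself and yields rows only as the consumer asks for them. FileRow and SketchFileSource are hypothetical names for this illustration, and the RowsProduced counter exists only to make the laziness observable.

```csharp
using System;
using System.Collections.Generic;

public sealed record FileRow(string Name, string Extension, long Size);

public sealed class SketchFileSource
{
    private readonly IReadOnlyList<FileRow> _files;

    public SketchFileSource(IReadOnlyList<FileRow> files) => _files = files;

    // Counts rows actually handed to the consumer.
    public int RowsProduced { get; private set; }

    // Pushdown: the filter runs inside the source, so non-matching rows never leave it.
    // Lazy: 'yield return' produces one row at a time, only when requested.
    public IEnumerable<FileRow> Rows(Func<FileRow, bool> pushedDownFilter)
    {
        foreach (var file in _files)
        {
            if (!pushedDownFilter(file)) continue;
            RowsProduced++;
            yield return file;
        }
    }
}
```

If a consumer takes only the first matching row, the loop stops there and the remaining files are never examined.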

Scalability Considerations

  • Data Size: Optimized for small to medium datasets
  • Query Complexity: Handles complex SQL constructs efficiently
  • Plugin Ecosystem: Modular design supports growing data source library

Use Cases and Benefits

Development Scenarios

  • Code Analysis: Query source code structure and metrics
  • Log Processing: Analyze application logs with SQL
  • File System Operations: Directory traversal and file analysis
  • Data Transformation: Convert between different data formats

Operational Benefits

  • No Database Required: Direct data source querying
  • Familiar Syntax: SQL knowledge immediately applicable
  • Rapid Prototyping: Quick data exploration and analysis
  • Integration Friendly: Easy to embed in applications

Integration Patterns

Embedded Usage

var engine = new MusoqEngine();
var results = engine.Execute("SELECT COUNT(*) FROM #os.files('/logs')");

CLI Usage

musoq "SELECT Name FROM #git.commits('/repo') WHERE AuthorEmail LIKE '%@company.com'"

Custom Plugin Integration

engine.RegisterSchema<MyCustomSchema>();
var results = engine.Execute("SELECT * FROM #mycustom.data()");

Next Steps

For comprehensive technical details, implementation guides, and advanced usage patterns, see the complete Architecture Documentation.

Key topics covered in the full documentation:

  • Detailed component architecture
  • Plugin development guide
  • Performance optimization strategies
  • Error handling patterns
  • Testing approaches
  • Extension points and customization options