Musoq Architecture Guide
This document provides a comprehensive overview of Musoq’s architecture, designed for developers who want to understand the internal workings, extend functionality, or contribute to the project.
Table of Contents
- Architecture Overview
- Core Components
- Query Processing Pipeline
- Plugin System
- Implementation Details
- Extension Guide
Architecture Overview
Musoq follows a layered architecture that transforms SQL queries into executable code through a multi-stage pipeline:
graph TD
A[SQL Query Input] --> B[Parser Module]
B --> C[AST Generation]
C --> D[Converter Module]
D --> E[Semantic Analysis]
E --> F[Code Generation]
F --> G[Evaluator Module]
G --> H[Compilation]
H --> I[Execution]
I --> J[Results Output]
K[Schema System] --> D
K --> G
L[Plugin System] --> K
L --> G
Core Components
Parser Module (Musoq.Parser)
The parser module is responsible for transforming SQL text into an Abstract Syntax Tree (AST).
Key Responsibilities:
- Lexical Analysis: Breaking SQL text into tokens
- Syntax Analysis: Building AST from token stream
- Error Detection: Identifying and reporting syntax errors
- Extension Support: Handling Musoq-specific SQL extensions
Architecture Pattern:
- Recursive Descent Parser: For predictable parsing behavior
- Visitor Pattern: For AST traversal and transformation
- Token-based Processing: Clean separation between lexing and parsing
Key Classes:
// Main parser class
public class Parser
{
public RootNode ComposeAll() // Parse complete SQL statement
public StatementNode ComposeStatement() // Parse individual statements
}
// Lexical analyzer
public class Lexer
{
public Token Current() // Get current token
public void Next() // Advance to next token
}
// AST node hierarchy
public abstract class Node
{
public abstract void Accept(IExpressionVisitor visitor);
}
Schema System (Musoq.Schema)
The schema system provides abstraction for different data sources and manages type information.
Key Responsibilities:
- Data Source Abstraction: Unified interface for diverse data sources
- Type Management: Handling type information and conversions
- Method Resolution: Finding appropriate methods for operations
- Metadata Management: Caching and managing schema information
Plugin Interface:
public interface ISchema
{
string Name { get; }
// Get table metadata
ISchemaTable GetTableByName(string name, RuntimeContext context, params object[] parameters);
// Get data source
RowSource GetRowSource(string name, RuntimeContext context, params object[] parameters);
// Method resolution
bool TryResolveMethod(string method, Type[] parameters, Type entityType, out MethodInfo methodInfo);
}
Data Flow Pattern:
// Data source implementation pattern
public abstract class RowSource
{
public abstract IEnumerable<IObjectResolver> Rows { get; }
}
public interface IObjectResolver
{
object this[string name] { get; }
object this[int index] { get; }
}
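For orientation, a resolver can be as small as a wrapper over a dictionary. The following is a minimal sketch that assumes only the two indexers shown above; it is not taken from Musoq itself, and the real interface may expose additional members.
using System.Collections.Generic;
// Minimal sketch: a dictionary-backed resolver where column order defines the indices.
public class DictionaryResolver : IObjectResolver
{
    private readonly IReadOnlyDictionary<string, object> _values;
    private readonly string[] _columnOrder;
    public DictionaryResolver(IReadOnlyDictionary<string, object> values, string[] columnOrder)
    {
        _values = values;
        _columnOrder = columnOrder;
    }
    public object this[string name] => _values[name];
    public object this[int index] => _values[_columnOrder[index]];
}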
Converter Module (Musoq.Converter)
The converter transforms the AST into executable C# code through a series of transformation passes.
Transformation Pipeline:
- CreateTree: Initial AST processing and setup
- TransformTree: Semantic analysis and optimization
- TurnQueryIntoRunnableCode: Final C# code generation
Key Transformation Patterns:
// Chain of responsibility for transformations
public abstract class BuildChain
{
protected readonly BuildChain Successor;
public abstract void Build(BuildItems items);
}
// Visitor pattern for AST traversal
public abstract class ExpressionVisitor
{
public virtual void Visit(SelectNode node) { /* implementation */ }
public virtual void Visit(FromNode node) { /* implementation */ }
// ... other node types
}
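As an illustration of the visitor pattern in this context, the sketch below derives a visitor that collects field aliases. It relies only on the SelectNode and FieldNode shapes shown later in this guide and is not the converter's actual rewriter.
using System.Collections.Generic;
// Minimal sketch: a visitor that records the alias of every projected field.
public class FieldAliasCollector : ExpressionVisitor
{
    public List<string> Aliases { get; } = new();
    public override void Visit(SelectNode node)
    {
        foreach (var field in node.Fields)
        {
            if (!string.IsNullOrEmpty(field.Alias))
                Aliases.Add(field.Alias);
        }
    }
}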
Evaluator Module (Musoq.Evaluator)
The evaluator compiles generated C# code and executes queries.
Key Components:
// Compiled query wrapper
public class CompiledQuery
{
public Table Run() // Execute query
public Table Run(CancellationToken token) // Execute with cancellation
}
// Executable interface
public interface IRunnable
{
Table Run(CancellationToken token);
}
Query Processing Pipeline
1. Parsing Phase
// Input: SQL string
var sql = "SELECT Name FROM #os.files('/path') WHERE Extension = '.txt'";
// Lexical analysis
var lexer = new Lexer(sql);
var parser = new Parser(lexer);
// Syntax analysis
var ast = parser.ComposeAll();
2. Semantic Analysis Phase
// Type inference and schema resolution
var buildItems = new BuildItems(ast, schemaManager);
// Apply transformations
var createTree = new CreateTree(transformTree);
createTree.Build(buildItems);
3. Code Generation Phase
// Transform AST to C# code
var codeGenerator = new ToCSharpRewriteTreeVisitor();
ast.Accept(codeGenerator);
var generatedCode = codeGenerator.ToString();
4. Compilation Phase
// Compile generated C# code
var compilation = CSharpCompilation.Create(
    assemblyName: "DynamicQuery",
    syntaxTrees: new[] { CSharpSyntaxTree.ParseText(generatedCode) },
    references: references,
    options: new CSharpCompilationOptions(OutputKind.DynamicallyLinkedLibrary));
// Emit to an in-memory stream and load the resulting assembly
using var stream = new MemoryStream();
var emitResult = compilation.Emit(stream);
if (!emitResult.Success)
    throw new InvalidOperationException(string.Join(Environment.NewLine, emitResult.Diagnostics));
var assembly = Assembly.Load(stream.ToArray());
// Create executable
var runnableType = assembly.GetType("GeneratedQuery");
var runnable = (IRunnable)Activator.CreateInstance(runnableType);
5. Execution Phase
// Execute query
var compiledQuery = new CompiledQuery(runnable);
var results = compiledQuery.Run();
Plugin System
Data Source Plugin Development
1. Schema Implementation
public class MyCustomSchema : SchemaBase
{
public override string Name => "mycustom";
public override ISchemaTable GetTableByName(string name, RuntimeContext context, params object[] parameters)
{
return name switch
{
"table1" => new MyTable1(parameters),
"table2" => new MyTable2(parameters),
_ => throw new UnknownTableException(name)
};
}
public override RowSource GetRowSource(string name, RuntimeContext context, params object[] parameters)
{
return name switch
{
"table1" => new MyRowSource1(parameters),
"table2" => new MyRowSource2(parameters),
_ => throw new UnknownTableException(name)
};
}
}
2. Row Source Implementation
public class MyRowSource : RowSource
{
private readonly string[] _parameters;
public MyRowSource(params object[] parameters)
{
_parameters = parameters.Cast<string>().ToArray();
}
public override IEnumerable<IObjectResolver> Rows
{
get
{
// Implementation-specific data retrieval
foreach (var dataItem in GetDataItems())
{
yield return new EntityResolver<MyDataItem>(dataItem, MySchemaMapping.Instance);
}
}
}
private IEnumerable<MyDataItem> GetDataItems()
{
    // Custom data retrieval logic goes here:
    // file reading, API calls, database queries, etc.
    yield break; // placeholder so the sketch compiles
}
}
3. Entity Resolver
public class MyDataItem
{
public string Name { get; set; }
public int Size { get; set; }
public DateTime Modified { get; set; }
}
public class MySchemaMapping : ISchemaMapping
{
public static readonly MySchemaMapping Instance = new();
public ISchemaColumn[] GetColumns()
{
return new ISchemaColumn[]
{
new SchemaColumn("Name", typeof(string)),
new SchemaColumn("Size", typeof(int)),
new SchemaColumn("Modified", typeof(DateTime))
};
}
}
Function Plugin Development
// Custom function library
public class MyFunctionLibrary : LibraryBase
{
[BindableMethod]
public string FormatSize(long bytes)
{
string[] sizes = { "B", "KB", "MB", "GB", "TB" };
double len = bytes;
int order = 0;
while (len >= 1024 && order < sizes.Length - 1)
{
order++;
len = len / 1024;
}
return $"{len:0.##} {sizes[order]}";
}
[BindableMethod]
public DateTime ParseCustomDate(string dateString, string format)
{
return DateTime.ParseExact(dateString, format, CultureInfo.InvariantCulture);
}
}
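Assuming the schema above is registered under the name mycustom and its function library is exposed to queries, both extension points can be exercised from a single query. The table name, argument, and column names below are taken from the earlier sketches and are purely illustrative.
// Hypothetical query combining the custom data source with the custom function
var sql = "SELECT Name, FormatSize(Size) FROM #mycustom.table1('some-argument') WHERE Size > 1024";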
Implementation Details
AST Node Structure
// Base node class
public abstract class Node
{
public abstract void Accept(IExpressionVisitor visitor);
public abstract Type ReturnType { get; set; }
}
// Query structure nodes
public class SelectNode : Node
{
public FieldNode[] Fields { get; set; }
public FromNode From { get; set; }
public WhereNode Where { get; set; }
public GroupByNode GroupBy { get; set; }
public HavingNode Having { get; set; }
public OrderByNode OrderBy { get; set; }
}
// Expression nodes
public class FieldNode : Node
{
public Node Expression { get; set; }
public string Alias { get; set; }
}
Code Generation Patterns
// Generated code structure
public class GeneratedQuery : IRunnable
{
public Table Run(CancellationToken token)
{
// Data source initialization
var dataSource = GetDataSource();
// Query execution pipeline
var results = dataSource
.Where(item => /* generated condition */)
.Select(item => new
{
// generated projections
})
.GroupBy(/* generated grouping */)
.OrderBy(/* generated ordering */);
// Result materialization
return new Table(results);
}
}
Error Handling Strategy
// Custom exception hierarchy
public abstract class MusoqException : Exception
{
protected MusoqException(string message) : base(message) { }
}
public class ParseException : MusoqException
{
public int Position { get; }
public string ExpectedToken { get; }
}
public class CompilationException : MusoqException
{
public IEnumerable<string> CompilerErrors { get; }
}
public class SchemaException : MusoqException
{
public string SchemaName { get; }
public string TableName { get; }
}
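With such a hierarchy, callers can react to each failure mode separately. The sketch below uses only the exception shapes defined above; RunQuery is a hypothetical helper standing in for the parse/compile/execute pipeline described earlier.
using System;
var sql = "SELECT Name FROM #os.files('/path') WHERE Extension = '.txt'";
try
{
    // RunQuery is a hypothetical helper wrapping parse -> compile -> execute.
    var table = RunQuery(sql);
}
catch (ParseException ex)
{
    Console.Error.WriteLine($"Syntax error at position {ex.Position}; expected {ex.ExpectedToken}.");
}
catch (CompilationException ex)
{
    Console.Error.WriteLine(string.Join(Environment.NewLine, ex.CompilerErrors));
}
catch (SchemaException ex)
{
    Console.Error.WriteLine($"Unknown schema object: {ex.SchemaName}.{ex.TableName}.");
}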
Extension Guide
Adding New Data Sources
- Create Schema Class
public class MySchema : SchemaBase { /* Implementation */ }
- Implement Row Source
public class MyRowSource : RowSource { /* Data iteration logic */ }
- Define Entity Mapping
public class MyEntityResolver : IObjectResolver { /* Property access implementation */ }
- Register Schema
schemaManager.RegisterSchema<MySchema>();
Adding Custom Functions
- Create Function Library
public class MyFunctions : LibraryBase { [BindableMethod] public ReturnType MyFunction(ParameterType param) { /* Implementation */ } }
- Register Library
runtimeContext.RegisterLibrary(new MyFunctions());
Performance Optimization Tips
- Lazy Evaluation: Use yield return in row sources (see the sketch after this list)
- Predicate Pushdown: Implement WHERE clause optimization in data sources
- Connection Pooling: Reuse expensive resources
- Memory Management: Dispose resources properly
- Caching: Cache expensive computations and metadata
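To make the first tip concrete, a row source can enumerate its input instead of buffering it, so rows flow to the engine as they are produced. The sketch below follows the RowSource and EntityResolver pattern from the plugin examples above; FileSchemaMapping is a hypothetical mapping class named only for illustration.
using System.Collections.Generic;
using System.IO;
// Minimal sketch of a lazily streaming row source.
public class StreamingFileRowSource : RowSource
{
    private readonly string _directory;
    public StreamingFileRowSource(string directory)
    {
        _directory = directory;
    }
    public override IEnumerable<IObjectResolver> Rows
    {
        get
        {
            // EnumerateFiles yields paths one at a time, so the directory listing is
            // never materialized in memory and rows reach the engine as soon as
            // they are produced.
            foreach (var path in Directory.EnumerateFiles(_directory))
            {
                yield return new EntityResolver<FileInfo>(new FileInfo(path), FileSchemaMapping.Instance);
            }
        }
    }
}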
Testing Strategies
- Unit Tests: Test individual components in isolation (see the example after this list)
- Integration Tests: Test complete query processing pipeline
- Performance Tests: Benchmark query execution times
- Plugin Tests: Validate custom data sources and functions
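For example, the FormatSize function from the plugin example above can be covered by an ordinary unit test. MSTest attributes are used here only for illustration; any test framework works.
using Microsoft.VisualStudio.TestTools.UnitTesting;
[TestClass]
public class MyFunctionLibraryTests
{
    [TestMethod]
    public void FormatSize_ScalesBytesToKilobytes()
    {
        var library = new MyFunctionLibrary();
        // 2048 bytes should be rendered as "2 KB" by the sample implementation above
        Assert.AreEqual("2 KB", library.FormatSize(2048));
    }
}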
This architecture guide provides the foundation for understanding and extending Musoq. The modular design enables developers to add new data sources, functions, and optimizations while maintaining system stability and performance.