Interpretation Schemas
Interpretation schemas are a powerful feature that allows you to define how to parse binary data (bytes) or text data (strings) directly in SQL. This enables structured extraction of data from file formats, network protocols, log files, and other structured data sources.
Overview
Interpretation schemas let you:
- Define binary schemas to parse byte sequences (files, packets, etc.)
- Define text schemas to parse character sequences (log files, CSV, etc.)
- Use the
Interpret()function to apply schemas to data - Combine with
CROSS APPLYfor powerful data transformations
Binary Schema Syntax
Binary schemas define how to interpret byte sequences:
INTERPRET binary Header {
Magic: int le,
Version: short le,
Length: int le,
Flags: byte
}
Primitive Types
| Type | Description | Size |
|---|---|---|
byte | Unsigned 8-bit | 1 byte |
sbyte | Signed 8-bit | 1 byte |
short | Signed 16-bit | 2 bytes |
ushort | Unsigned 16-bit | 2 bytes |
int | Signed 32-bit | 4 bytes |
uint | Unsigned 32-bit | 4 bytes |
long | Signed 64-bit | 8 bytes |
ulong | Unsigned 64-bit | 8 bytes |
float | 32-bit floating point | 4 bytes |
double | 64-bit floating point | 8 bytes |
Endianness
Multi-byte types require an endianness specifier:
le- Little-endian (Intel, x86/x64)be- Big-endian (network byte order, MIPS, PowerPC)
INTERPRET binary NetworkPacket {
SourcePort: ushort be, -- Network byte order
DestPort: ushort be,
Length: int le -- Local byte order
}
Byte Arrays
Fixed-size byte arrays:
INTERPRET binary ImageHeader {
Magic: byte[4], -- Fixed 4-byte signature
Reserved: byte[16], -- 16 reserved bytes
Payload: byte[256] -- 256-byte payload
}
Text Schema Syntax
Text schemas define how to parse character sequences:
INTERPRET text LogEntry {
Timestamp: between '[' ']',
_: literal ' ',
Level: until ':',
_: literal ': ',
Message: rest
}
Field Types
| Type | Description | Example |
|---|---|---|
literal | Match exact string | literal ': ' |
until | Capture until delimiter | until ',' |
between | Capture between delimiters | between '[' ']' |
chars[n] | Capture exactly n characters | chars[10] |
token | Whitespace-delimited token | token |
rest | Remaining content | rest |
whitespace | Skip whitespace | whitespace |
pattern | Regex pattern match | pattern '\d+' |
Modifiers
Text fields support optional modifiers:
INTERPRET text Record {
Name: chars[20] trim, -- Trim whitespace
Value: until ',' lower, -- Convert to lowercase
Code: chars[5] upper -- Convert to uppercase
}
Available modifiers:
trim- Trim both endsltrim- Trim leftrtrim- Trim rightlower- Convert to lowercaseupper- Convert to uppercasenested- Handle nested delimiters (forbetween)optional- Field is optional
Using Interpretation Schemas
The Interpret() Function
Apply a schema to data using Interpret():
INTERPRET binary BmpHeader {
Signature: byte[2],
FileSize: int le,
Reserved: int le,
DataOffset: int le
}
SELECT
h.FileSize,
h.DataOffset
FROM #os.file('image.bmp') f
CROSS APPLY Interpret(f.GetBytes(), BmpHeader) h
The Parse() Function
For text schemas, use Parse():
INTERPRET text CsvRow {
Name: until ',',
_: literal ',',
Age: until ',',
_: literal ',',
Email: rest
}
SELECT
r.Name,
r.Age,
r.Email
FROM #csv.lines('data.csv') l
CROSS APPLY Parse(l.Line, CsvRow) r
InterpretAt() for Offset Access
Read at a specific offset:
SELECT
h.Magic
FROM #os.file('data.bin') f
CROSS APPLY InterpretAt(f.GetBytes(), 100, Header) h
Schema Composition
Schemas can reference other schemas:
INTERPRET binary Point {
X: int le,
Y: int le
}
INTERPRET binary Rectangle {
TopLeft: Point,
BottomRight: Point,
Color: int le
}
Error Handling
Interpretation functions throw exceptions on parse errors:
- Use
TryInterpret()orTryParse()for non-throwing variants (returns null on failure) - Error codes follow the ISExxxx format (Interpretation Schema Error)
Common error codes:
ISE001- Insufficient dataISE002- Validation failedISE003- Pattern mismatchISE004- Literal not foundISE005- Delimiter not found
Practical Examples
Reading a ZIP File Header
INTERPRET binary ZipLocalHeader {
Signature: int le,
VersionNeeded: short le,
Flags: short le,
CompressionMethod: short le,
LastModTime: short le,
LastModDate: short le,
Crc32: int le,
CompressedSize: int le,
UncompressedSize: int le,
FileNameLength: short le,
ExtraFieldLength: short le
}
SELECT
z.CompressedSize,
z.UncompressedSize,
z.CompressionMethod
FROM #os.file('archive.zip') f
CROSS APPLY Interpret(f.GetBytes(), ZipLocalHeader) z
WHERE z.Signature = 0x04034B50 -- PK\x03\x04
Parsing Apache Log Format
INTERPRET text ApacheLogEntry {
IpAddress: until ' ',
_: literal ' - - [',
Timestamp: until ']',
_: literal '] "',
Method: until ' ',
_: literal ' ',
Path: until ' ',
_: until '" ',
StatusCode: until ' ',
_: literal ' ',
Size: rest
}
SELECT
a.IpAddress,
a.Timestamp,
a.Method,
a.Path,
a.StatusCode
FROM #text.lines('/var/log/apache/access.log') l
CROSS APPLY Parse(l.Line, ApacheLogEntry) a
WHERE a.StatusCode = '404'
Discard Fields
Use _ for fields you want to skip but not capture:
INTERPRET text KeyValue {
Key: until ':',
_: literal ': ', -- Match but don't capture
Value: rest
}
Multiple discard fields are allowed - they’re matched in order but not included in the result.