Git Operations with SQL Queries
There are many Git commands that can be represented using SQL queries in more or less useful ways. This section will progressively showcase examples of such queries.
Working with Patches
To see the overall changes introduced in a given branch, we need to find the branching point
and examine the difference between it and the last commit in that branch. Let’s consider a repository with the following structure:
ff067d6177e3c94970e8ae308412c6d1bed0d42a (HEAD -> feature/feature_1) fiveth commit for feature_1
f3414f10b6cce06910dfb06bd9a96f52a73b9e91 fourth commit for feature_1
0933e9b9846a8bb81b66fd94c0c666c9ed2914fc third commit for feature_1
eac5aec2c1f9ac2dca8c7537226bb9c4fe9ccabd second commit for feature_1
edd4d2c2f95b1d8b6a202ee150ca8277b9cf6373 first commit for feature_1
2cdb0d0d384ba387a0d05c62054139a18f3bda91 (master) initial commit
Using Git Diff
Using the git diff
command, we can calculate the difference like this:
git diff 2cdb0d0d384ba387a0d05c62054139a18f3bda91 ff067d6177e3c94970e8ae308412c6d1bed0d42a --shortstat
Here, 2cdb0d0d384ba387a0d05c62054139a18f3bda91
represents the “branching point” and ff067d6177e3c94970e8ae308412c6d1bed0d42a
is the last commit in the branch.
Using SQL Queries
The equivalent SQL query would be:
with BranchCommits as (
select
r.MinCommit(c.Self) as FirstCommit,
r.MaxCommit(c.Self) as LastCommit
from #git.repository('/some/git/repo') r
cross apply r.SearchForBranches('feature/feature_1') b
cross apply r.GetBranchSpecificCommits(r.Self, b.Self, false) c
group by 'fake'
)
select
b.FirstCommit.Sha,
b.FirstCommit.CommittedWhen,
b.FirstCommit.MessageShort,
b.LastCommit.Sha,
b.LastCommit.CommittedWhen,
b.LastCommit.MessageShort,
p.LinesAdded,
p.LinesDeleted
from BranchCommits b
inner join #git.repository('/some/git/repo') r on 1 = 1
cross apply r.PatchBetween(r.CommitFrom(b.FirstCommit.Sha), r.CommitFrom(b.LastCommit.Sha)) p
This query consists of two main parts:
- A CTE (Common Table Expression) query that retrieves the first and last commit in the branch
- The main query that shows the changes between these commits
Important note: GetBranchSpecificCommits
with the false
parameter returns branch-specific commits including the “branching point” (which doesn’t belong to the branch). You can exclude this commit by changing the flag to true
.
The query returns results in this format:
FirstCommit.Sha | FirstCommit.CommittedWhen | FirstCommit.MessageShort | LastCommit.Sha | LastCommit.CommittedWhen | LastCommit.MessageShort | LinesAdded | LinesDeleted |
---|---|---|---|---|---|---|---|
2cdb0d0d384… | 12/14/2024 21:45:49 | initial commit | ff067d6177… | 12/14/2024 22:02:33 | fiveth commit for feature_1 | 5 | 0 |
Viewing Changed Files
To see which specific files were modified, we can modify the query to use a private table for the calculated patch:
with BranchCommits as (
select
r.MinCommit(c.Self) as FirstCommit,
r.MaxCommit(c.Self) as LastCommit
from #git.repository('/some/git/repo') r
cross apply r.SearchForBranches('feature/feature_1') b
cross apply r.GetBranchSpecificCommits(r.Self, b.Self, false) c
group by 'fake'
)
select
c.Path,
c.LinesAdded,
c.LinesDeleted
from BranchCommits b
inner join #git.repository('/some/git/repo') r on 1 = 1
cross apply r.PatchBetween(r.CommitFrom(b.FirstCommit.Sha), r.CommitFrom(b.LastCommit.Sha)) p
cross apply p.Changes c
This query shows the changes by file:
Path | LinesAdded | LinesDeleted |
---|---|---|
application/main.py | 2 | 0 |
application/meta.py | 0 | 0 |
ci/test_1.py | 3 | 0 |