Querying Multiple Repositories at Once

A quite useful feature of the tool is the ability to query multiple Git repositories at once. Sometimes I need to track what I did and when in different projects. Currently, I’m involved in several projects (actively contributing to around ~10), so I need a general overview of what I did in them and when.

Simply listing commits from all repositories

with ProjectsToAnalyze as (
    select
        dir2.FullName as FullName
    from #os.directories('D:\repos', false) dir1
    cross apply #os.directories(dir1.FullName, false) dir2
    where
        dir2.Name = '.git'
)
select
    c.Message,
    c.Author,
    c.CommittedWhen
from ProjectsToAnalyze p cross apply #git.repository(p.FullName) r 
cross apply r.Commits c
where c.AuthorEmail = 'my-email@email.ok'
order by c.CommitedWhen desc

I keep all repositories at the same level, and I identify whether a folder is a repository by checking for the existence of a .git subfolder.

The CTE expression naturally returns matching folders:

FullName
D:\repos\Musoq.DataSources.git
D:\repos\Musoq.git

The second part lists all commits for the found repositories. Of course, in descending order as that’s likely what I’ll be most interested in.

p.RepositoryName c.Message c.Author c.AuthorEmail c.CommittedWhen
Musoq.DataSources Update runtime, rework directories source, new tests for multi repo… puchacz puchala.czwa@gmail.com 11/21/2024 21:50:25
Musoq Merge remote-tracking branch ‘origin/master’ puchacz puchala.czwa@gmail.com 11/13/2024 21:23:20
Musoq Fixes around the documentation puchacz puchala.czwa@gmail.com 11/13/2024 21:23:13
Musoq.DataSources 1. Runtime Update 2. Git library update puchacz puchala.czwa@gmail.com 11/11/2024 15:51:31
Musoq.DataSources further git plugin implementations puchacz puchala.czwa@gmail.com 11/09/2024 00:02:07
Musoq.DataSources 1. Evaluator raised up + changes within git plugin puchacz puchala.czwa@gmail.com 11/06/2024 23:54:05

How many commits do all repositories have

Sometimes we want to learn some basic statistics about repositories - how many commits they have, when was the last commit, who are the authors. We can do this with the following query:

with Repositories as (
    select
        dir2.FullName as GitPath,
        dir2.Parent.Name as RepositoryName
    from #os.directories('D:\repos', false) dir1
    cross apply #os.directories(dir1.FullName, false) dir2
    where
        dir2.Name = '.git'
)
select
    r.RepositoryName,
    repo.Count(c.Sha) as CommitCount,
    repo.Length(repo.Distinct(repo.Split(repo.AggregateValues(c.AuthorEmail), ','))) as AuthorsCount,
    repo.StringsJoin(',', repo.Distinct(repo.Split(repo.AggregateValues(c.AuthorEmail), ','))) as Authors,
    repo.MaxDateTimeOffset(c.CommittedWhen) as LastCommitDate
from Repositories r
cross apply #git.repository(r.GitPath) repo
cross apply repo.Commits c
group by r.RepositoryName
order by repo.Count(c.Sha) desc
r.RepositoryName CommitCount AuthorsCount Authors LastCommitDate
Musoq 529 7 pu…ail.com,pu…om,4969…github.com,jth…com,… 11/19/2024 06:23:14
Musoq.DataSources 70 2 pu…@gmail.com,pu…@gmail.com 11/21/2024 21:50:25

Worth noting here is the somewhat enigmatic:

repo.Length(repo.Distinct(repo.Split(repo.AggregateValues(c.AuthorEmail), ','))) as AuthorsCount,

and

repo.StringsJoin(',', repo.Distinct(repo.Split(repo.AggregateValues(c.AuthorEmail), ',')))

which is simply a workaround for the current lack of support for the distinct operator. I therefore used the aggregating method AggregateValues(...) to join all emails in a given repository, then this aggregated string was split using the Split method which transformed the string into an array of strings, and using the Distinct method our emails were limited to only unique ones. On this we can count either Length or StringsJoin which counts how many elements are in the array and joins the array of strings using a comma character.