Performing Git Repository Analysis with PowerShell

Maintaining a repository gets harder once it has been in use for months, through many iterations, and by a growing number of teams. The web interfaces for hosted Git repositories (GitHub, Azure Repos, etc.) offer several views into its current state, such as:

  • Who last committed to a branch
  • How old a branch is
  • Quick buttons to delete stale branches

Even with these tools, it may not be easy to identify which branches truly require cleanup. In this article, we'll explore how to use PowerShell to assist with targeting and operating on Git branches that meet criteria we establish.

To demonstrate the commands, I will run PowerShell inside a Jupyter notebook, using the jaykul/powershell-notebook-base container image authored by GitHub user @jaykul.

Setting Up For This Demo

First, I followed the instructions in the Docker image's README, slightly adjusted for my environment:

docker run --rm -Pd jaykul/powershell-notebook-base

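I retrieved the Jupyter login token with docker exec. Replace container_name with the name or ID reported by docker ps; docker port container_name shows which host port -P assigned to the notebook server.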

docker exec container_name jupyter notebook list

Lastly, since Git was not installed in my container, I installed it with:

docker exec -u root container_name apt update
docker exec -u root container_name apt install -y git

Now we're ready to explore Git with the PowerShell notebook.

Initializing An Example

I will be using the repository at microsoft/angle on GitHub because it is publicly accessible and has a few branches to explore. Substitute the repository you are working with.

You may already have a clone, or would normally just run git clone. The following cells instead delete and re-clone the repository so that the example can be rerun from the top.

In [1]:
set-location $env:HOME
In [2]:
bash -c "rm -rf angle"
In [3]:
bash -c "git -C angle remote -v 2>&1 || git clone https://github.com/microsoft/angle.git 2>&1"
set-location $env:HOME/angle
Out[3]:
fatal: Cannot change to 'angle': No such file or directory
Cloning into 'angle'...

Getting Oriented to the Repository

With a repository in hand, let's see what we're working with: listing the remote branches shows a handful. I will focus on remote branches, but the same operations work on local branches.

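When setting the $branches variable, I use a series of commands to trim the output because the Git version in the container is 2.7.4. Newer versions of Git support --format on git branch, which reduces this to one command (the where-object filter drops the origin/HEAD symbolic ref, which lstrip=3 shortens to HEAD):

$branches = git branch -r --format "%(refname:lstrip=3)" | where-object { $_ -ne 'HEAD' }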
In [4]:
git --version
git branch -r
Out[4]:
git version 2.7.4
  origin/HEAD -> origin/ms-master
  origin/future-dev
  origin/ms-holographic-experimental
  origin/ms-master
  origin/ms-win8.1
  origin/winrt
In [5]:
$branches = git branch -r | grep -v '\->' | sed -e 's;^\*;;' -e 's;^  ;;' -e 's;^[^/]*/;;'
$branches
Out[5]:
future-dev
ms-holographic-experimental
ms-master
ms-win8.1
winrt
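If grep and sed aren't on the path (Windows PowerShell, for example), the same trimming can be done with PowerShell's own -notmatch and -replace operators. This is a sketch: $raw is a hard-coded copy of the Out[4] listing so that it runs without a repository; against a real clone you would pipe git branch -r into the same pipeline instead.

```powershell
# Hard-coded copy of the Out[4] listing (assumption: stands in for `git branch -r`).
$raw = @(
    '  origin/HEAD -> origin/ms-master'
    '  origin/future-dev'
    '  origin/ms-holographic-experimental'
    '  origin/ms-master'
    '  origin/ms-win8.1'
    '  origin/winrt'
)

# Drop the symbolic "origin/HEAD -> ..." line, trim the indentation,
# then strip the remote name ("origin/") from the front of each ref.
$branches = $raw |
    Where-Object { $_ -notmatch '->' } |
    ForEach-Object { $_.Trim() -replace '^[^/]*/', '' }

$branches
```

Keeping the work inside PowerShell means the same cell runs unchanged on Windows, where grep and sed are usually absent.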

Diving In

The example which follows addresses the question of "What branches are old (and how old), and/or have commits not already merged into my comparison branch?"
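The cells below answer the ahead/behind half of that question. For the "how old" half, one option is to read the committer date of each branch tip with git log -1 --format=%ct, which prints it as a Unix timestamp. A minimal sketch follows; Get-BranchAge is a hypothetical helper name, and it assumes it runs inside the clone with the remote refs already fetched.

```powershell
# Hypothetical helper (not from the original walkthrough): reports when a
# ref's tip commit was committed and how many whole days ago that was.
function Get-BranchAge {
    param(
        [Parameter(Mandatory)] [string] $Ref
    )
    # %ct = committer date of the tip commit, as Unix epoch seconds
    $seconds = [long](git log -1 '--format=%ct' $Ref)
    $lastCommit = [DateTimeOffset]::FromUnixTimeSeconds($seconds)
    [pscustomobject]@{
        branch     = $Ref
        lastCommit = $lastCommit
        ageDays    = [int][math]::Floor(([DateTimeOffset]::UtcNow - $lastCommit).TotalDays)
    }
}

# Usage against the clone (assumes $branches holds names without "origin/"):
# $branches | ForEach-Object { Get-BranchAge -Ref "origin/$_" } |
#     Sort-Object -Descending ageDays
```

Committer date is used rather than author date because it reflects when the commit last landed on the branch (for example after a rebase), which is usually closer to what "stale" means.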

GitHub and Azure Repos provide a quick view of "commits ahead" and "commits behind" a comparison branch, so we'll recreate the same numbers. To do this, we'll use git log with a range: git log commit1..commit2 lists the commits reachable from commit2 but not from commit1. The --format=oneline option puts one commit per line, and wrapping the output in @( ... ) collects those lines into an array so that we can count them.

In the example, I am using the shorthand notation of %{ ... } for foreach-object { ... }.

In [6]:
$compareto = "future-dev"
In [7]:
$branches `
| %{
    $behind = @((git log "--format=oneline" "origin/${_}..origin/${compareto}" ));
    $ahead = @((git log "--format=oneline" "origin/${compareto}..origin/${_}"));
    write-output "${_}`t$($behind.length)`t$($ahead.length)"
  } `
| convertfrom-csv -delimiter "`t" -header "branch", "behind", "ahead" `
| sort-object -descending behind
Out[7]:
branch                      behind ahead
------                      ------ -----
ms-holographic-experimental 3      2016 
ms-master                   3      3938 
ms-win8.1                   3      2221 
winrt                       2065   491  
future-dev                  0      0

Fixing Up

The previous command writes one tab-separated line per branch: the branch name, the number of commits behind, and the number of commits ahead. Piping those lines through convertfrom-csv with a tab (`t) delimiter turns them into objects that later stages of the pipeline can work with.

However, convertfrom-csv produces string properties, so sort-object compares the behind values as text. In a descending text sort, "3" comes before "2065" because the character 3 sorts after 2.
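A small standalone reproduction makes the cause visible. This sketch hard-codes three rows copied from Out[7] instead of calling Git, then shows the property type and the two sort orders.

```powershell
# Tab-separated lines in the same shape as In [7] writes (values from Out[7]).
$lines = @(
    "ms-master`t3`t3938"
    "winrt`t2065`t491"
    "future-dev`t0`t0"
)
$rows = $lines | ConvertFrom-Csv -Delimiter "`t" -Header branch, behind, ahead

# ConvertFrom-Csv produces string properties.
$rows[0].behind.GetType().Name    # String

# Text comparison: "3" > "2065" > "0", so ms-master comes first.
($rows | Sort-Object -Descending behind).branch

# Sorting on a scriptblock that casts to [int] compares numbers instead.
($rows | Sort-Object -Descending { [int]$_.behind }).branch
```

Sorting on a casting scriptblock fixes only the order; recasting the properties themselves with select-object also makes later comparisons such as -gt numeric, since a string on the left of -gt makes PowerShell compare as text.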

To clean this up, we can use select-object with calculated properties to cast the numeric fields to [int] before sorting.

In [8]:
$branches `
| %{
    $behind = @((git log "--format=oneline" "origin/${_}..origin/${compareto}" ));
    $ahead = @((git log "--format=oneline" "origin/${compareto}..origin/${_}"));
    write-output "${_}`t$($behind.length)`t$($ahead.length)"
  } `
| convertfrom-csv -delimiter "`t" -header "branch", "behind", "ahead" `
| select-object -property branch,
  @{name='behind'; expression={[int]$_.behind}},
  @{name='ahead'; expression={[int]$_.ahead}} `
| sort-object -descending behind
Out[8]:
branch                      behind ahead
------                      ------ -----
winrt                         2065   491
ms-holographic-experimental      3  2016
ms-master                        3  3938
ms-win8.1                        3  2221
future-dev                       0     0
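Each branch above costs two git log calls whose output is only counted. git rev-list --left-right --count A...B returns both numbers in one call, separated by a tab: first the commits reachable only from A, then those reachable only from B. Here is a minimal sketch wrapping it; Get-AheadBehind is a hypothetical name, and its counts should match the git log approach because both count the same two one-sided ranges.

```powershell
# Hypothetical helper: one git call per branch instead of two.
# "A...B" is the symmetric difference; with --left-right --count Git prints
# "<commits only in A><TAB><commits only in B>".
function Get-AheadBehind {
    param(
        [Parameter(Mandatory)] [string] $Branch,
        [Parameter(Mandatory)] [string] $CompareTo
    )
    $left, $right = (git rev-list --left-right --count "${CompareTo}...${Branch}") -split '\s+'
    [pscustomobject]@{
        branch = $Branch
        behind = [int]$left     # commits only in $CompareTo
        ahead  = [int]$right    # commits only in $Branch
    }
}

# Usage against the clone (remote-tracking refs):
# $branches |
#     ForEach-Object { Get-AheadBehind -Branch "origin/$_" -CompareTo "origin/$compareto" } |
#     Sort-Object -Descending behind
```

Because the helper casts to [int] as it builds each object, the string-sorting problem never arises.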

Conclusion

Git commands have options to format their output as data-rich text on the command line. PowerShell is an expressive shell language that allows quick sorting, filtering, and fluid operation on rich data types. Combining the two makes quick work of repository maintenance, producing reports, and integrating your repository with other tools.

Calculated metrics such as commits ahead and behind make it possible to automate process rules your team agrees on. With all the rich data available through Git, there are bound to be several DevOps workflows your team can automate with PowerShell.
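As one illustration, here is a minimal sketch of such a rule applied to objects shaped like the In [8] output. The thresholds and the "merged"/"stale" labels are assumptions for the example, not rules from this walkthrough, and the rows are hard-coded from Out[8] so that it runs without a repository.

```powershell
# Rows in the shape produced by In [8] (values copied from Out[8]).
$report = @(
    [pscustomobject]@{ branch = 'winrt';                       behind = 2065; ahead = 491 }
    [pscustomobject]@{ branch = 'ms-holographic-experimental'; behind = 3;    ahead = 2016 }
    [pscustomobject]@{ branch = 'ms-master';                   behind = 3;    ahead = 3938 }
    [pscustomobject]@{ branch = 'ms-win8.1';                   behind = 3;    ahead = 2221 }
    [pscustomobject]@{ branch = 'future-dev';                  behind = 0;    ahead = 0 }
)

# Hypothetical team rules (illustrative thresholds):
#   merged: no commits ahead, so every commit is already in the comparison branch
#   stale:  at least $maxBehind commits behind the comparison branch
$compareto = 'future-dev'
$maxBehind = 1000

$candidates = $report |
    Where-Object { $_.branch -ne $compareto } |
    Where-Object { $_.ahead -eq 0 -or $_.behind -ge $maxBehind } |
    Select-Object branch, behind, ahead,
        @{ name = 'reason'; expression = { if ($_.ahead -eq 0) { 'merged' } else { 'stale' } } }

$candidates
```

The numeric comparisons here rely on behind and ahead already being [int], which is why the recast in In [8] matters beyond sorting.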

Return to the AIS Blog at https://www.appliedis.com/performing-git-repository-analysis-with-powershell/.