Who can use this?
Core | Plus | ||
✓ |
Data aggregation
Aggregation is a powerful function that returns a single value based on a set of values. A common use case is generating results based on aggregating values as you would in SQL. You should be familiar with the GROUP BY clause in SQL to use this feature. Flow provides a robust REST-based syntax that emulates SQL aggregates.
For example, you may want the average rework over the last year or the largest commit ever. Normally, you would have to download all the COMMITS and perform the aggregate function locally in your client. Using Flow's built-in aggregate function you can accomplish this with a single call.
Tip: The churn fields within the API correspond to the rework metric in Flow.
The Flow API provides aggregate functions to spare you from ingesting large quantities of data to perform your own aggregation. This functionality includes:
- Multiple aggregate functions in a single request
- Customizable grouping
- Ordering that includes aggregate fields
Aggregate functions that are currently supported:
- avg – Mean Average
- count – Count
- max – Maximum
- min – Minimum
- std – Standard Deviation
- sum – Sum
- var – Variance
Basic API usage
Let’s look at an example of what a basic (but common) request might look like: we want to count the number of new lines of code in the last 30 days by User Bob Smith who has a user_id of 12345. In order to accomplish this without the benefit of aggregates, we would likely need to write a function that first collects all of Bob’s commits documents by iterating through numerous sets of results using limit and offset in individual GET requests. In a second loop, we’d then have to iterate through each commit document and sum the value of the new_work field. That is a lot of work and overhead in order to simply add up a single set of values.
Request
Response:

Advanced aggregate usage
Request:
With this response:
{ “count”: 1 “results”: [ { “new_work_sum”: 8989 } ], “records”: 120 }
You’ll notice that a few things are different:
The .agg suffix on our COMMITS URL resource tells the API that we are going to process this request as an aggregated request.
- The aggregate query parameter key defines which aggregate function to apply, while the value after the equal sign is the field to perform the aggregate function on.
- Our response document has changed. count still reflects the number of items in the result array. records has been added and reflects the number of rows that match our filter criteria. In our example, you can see that Bob Smith had 120 matching COMMITS based on our 30-day filter criteria with a total “new_work” sum of 8,989 lines of code.
- The naming convention for fields in our aggregate results is {field_name}_{aggregatefunction}. In our example, we performed a sum against the field new_work, resulting in the field name new_work_sum.
More complex aggregate API example
Let’s say we wanted to see the count of COMMITS and average new_work lines of code per commit for each user on the team from a particular date forward. We also want this data grouped by the user's alias ID and the unique ID of the Repo.
Request:
Response:
{ “count”: 3, “results”: [ { “apex_user_id”: 12345, “repo_id: 1, “new_work_avg”: 560, “id_count”: 25 } { “apex_user_id”: 12345, “repo_id”: 2, “new_work_avg”: 39 “id_count”: 75, } { “apex_user_id”: 67891, “repo_id”: 1, “new_work_avg”: 14 “id_count”: 50, } ] "records": 150 }
You can see from this example, there are 3 items in the results array. User 12345 has been working on both Repo 1 and Repo 2, while User 67891 is working only on Repo 1. There have been 150 total COMMITS by the two users on our team. Also, note that you can see that the average quantity and distribution of their new_work differs significantly.
Advanced aggregates
These are relatively simple examples. You can use as many aggregate functions as you please and group by as many fields as you like based on the fields made available by each resource.
Additionally, ordering your aggregate results works in the exact same way as the conventional API calls, with the ordering query parameter and a comma-separated list of fields (fields prefixed with - will order descending). If you wish to order by one of your aggregate fields such as new_work_avg, you can do that by saying &ordering=-new_work_avg.
With this in mind, we could easily build more complex queries, for instance:
Request:
Which translates to:
Give me the total number of commits [for the given author_local_date], grouped by user. For each user I want to see the total lines of new work and rework, the average lines of new work and rework, and the most recent commit date of each user, ordered by total commits in descending order.
If you need help, please contact Pluralsight Support.