Excel GROUPBY Function Example
If you're familiar with PivotTables, you can think of the GROUPBY function as a PivotTable that doesn't have any fields going across the columns. In the image below we can see them side by side:
data:image/s3,"s3://crabby-images/5ba06/5ba064b4d046cebb28d4bfd27d6f302fa6e3d269" alt="compare pivottable and groupby function"
In its simplest form GROUPBY takes the following arguments:
=GROUPBY(
row_fields - Column(s) you want to group by,
values - Column(s) of values to aggregate,
function - how you want to aggregate them,
field_headers - include/exclude headers
)
To demonstrate, I'll use this data formatted in an Excel Table called, Table1:
data:image/s3,"s3://crabby-images/64930/649309ffbddfb97b5416cca5ce09cc020f4b7874" alt="example data in table"
The GROUPBY formula above is referencing the Table using Structured References which makes it relatively easy to read. Here it is again:
=GROUPBY(Table1[[#All],[Country]],Table1[[#All],[Units Sold]],SUM,3)
In English it translates to group the Country column and take the Units sold, sum them and include a header.
See the next section for more on headers.
GROUPBY Function Syntax
The full GROUPBY function syntax is:
=GROUPBY(row_fields, values, function, [field_headers], [total_depth], [sort_order], [filter_array])
The arguments are explained as
follows:
The Row Fields are the columns that contain the values which are used to group rows and generate row headers.
The Values are the columns of data you want to aggregate.
The array or range may contain
multiple columns. If so, the output will have multiple row group levels.
The Function argument is an explicit* or eta reduced lambda (SUM, PERCENTOF, AVERAGE, COUNT, etc) that is used to aggregate values. There's a long list of eta lambdas you can choose from.
data:image/s3,"s3://crabby-images/d3d4c/d3d4ce6a7a5b19a79c13d69f9e9f79a1d5f5c96b" alt="List of function arguments for GROUPBY function"
*You can insert your own custom LAMBDA in this argument. See this post for more on how to write custom Excel LAMBDA functions.
A vector of lambdas can also be provided. In which case the output will have multiple aggregations. The orientation of the vector will determine whether they are laid out row- or column-wise.
Field Headers allows you to specify whether you want to display headers and in what form:
- Omitted: Automatic headers*.
- 0: No
- 1: Yes and don't show
- 2:
No but generate
- 3: Yes and show
*Note: Automatic assumes the data contains headers based on the values argument. If the 1st value is text and the 2nd value is a number, then the data is assumed to have headers. Fields headers are shown if there are multiple row or column group levels.
Total Depth Determines whether the row headers should contain totals. The possible values are:
- Omitted: Automatic: Grand totals and, where possible, subtotals.
- 0: No Totals
- 1: Grand Totals
- 2: Grand and Subtotals
- -1: Grand Totals at Top
- -2: Grand and Subtotals at Top
Note: For subtotals, fields must have at least 2 columns. Numbers greater than 2 are supported provided field has sufficient
columns.
Sort Order is a number indicating how rows should be sorted. Numbers correspond with columns in row_fields followed by the columns in values. If the number is negative, the rows are sorted in descending/reverse order.
A vector of numbers can be provided when sorting based on only row_fields.
Filter Array is a column-oriented 1D array of Booleans that indicate whether the corresponding row of data should be considered.
Note: The length of the array must match the length of those provided to row_fields.
PIVOTBY Function
The PIVOTBY function enables you to generate a summarized version of your dataset with a formula.
PIVOTBY is essentially the same as GROUPBY except it has additional arguments for the columns making it adept at
organizing data across two axes and performing aggregation on the related values.
Syntax:
=PIVOTBY(row_fields, col_fields, values, function, [field_headers], [row_total_depth], [row_sort_order], [col_total_depth], [col_sort_order], [filter_array])
In the example below I've used PIVOTBY to summarise the sales by Segment in the rows, and Products in the columns:
data:image/s3,"s3://crabby-images/938c4/938c48058103a29771e1e62844435655f2a31560" alt="Using PIVOTBY function to summarize
data"
Returning Multiple Columns
You can reference multiple columns in the first argument of PIVOTBY and GROUPBY:
data:image/s3,"s3://crabby-images/6b3c4/6b3c46c10ec18fc6c2a6dd5d93a1db8ed92adf9e" alt="Reference multiple columns in first argument of GROUPBY and PIVOTBY"
Notice I've used 2 in the Total Depth argument to return totals and subtotals.
Tip: for non-contiguous columns use the HSTACK function to join them e.g. I can join the Segment and Product columns with HSTACK:
data:image/s3,"s3://crabby-images/a39c3/a39c334453910c64e6324782c6e989fd29792577" alt="Use HSTACK function to join non-contiguous columns"
Notice it automatically inserts subtotals at each change in group. We'll look at controlling subtotals soon.
Similarly, we can return multiple contiguous value fields as shown below (for non-contiguous fields use HSTACK):
data:image/s3,"s3://crabby-images/7d0d6/7d0d6e499fec33061981678261b09375eb004684" alt="Return multiple contiguous value fields"
Tip: you can also rearrange the order of the columns with HSTACK.
Sorting
The Sort Order argument is a number indicating how rows should be sorted. Numbers correspond with
columns in row_fields followed by the columns in values. If the number is negative, the rows are sorted in descending/reverse order.
A vector of numbers can be provided when sorting based on only row_fields.
In the example below I've sorted in ascending order by Segment then descending order by Country in GROUPBY, but it works the same in PIVOTBY:
data:image/s3,"s3://crabby-images/f3ea8/f3ea80cefdb2055fa585fe0a1cea7ae61c6153ef" alt="Setting the sorting order in GROUPBY"
Filtering
The Filter Array argument is a logical
test applied to a column that returns TRUE or FALSE Boolean values that indicate whether the corresponding row of data should be included.
Note: The length of the array must match the length of those provided to row_fields.
In the example below
I've filtered the formula to exclude 'Midmarket' from the Segment column.
data:image/s3,"s3://crabby-images/7b45c/7b45c9a21233e5321fcc45db61db602b97c4b370" alt="Using Filter in GROUPBY function"
Tip: you can filter on columns not included in the GROUPBY formula result e.g. I could filter based on Country in the above formula.
Connecting GROUPBY & PIVOTBY to Slicers
Slicers are one of the best things about PivotTables, so it'd be a shame not to be able to use them with these new functions. Thankfully,
we can use Slicers with Excel tables and leverage the filter argument to pass the filtered state to the GROUPBY and PIVOTBY functions.
First, select a cell in the Table and add the Slicers you need via the Insert tab > Slicers:
data:image/s3,"s3://crabby-images/85867/85867c51d2d059e6063fcbccf5f118080fb1a927" alt="Insert slicer for table"
Then add a column to your Table to detect if the row is filtered or visible using
the SUBTOTAL function or AGGREGATE function. These functions can include or exclude rows hidden by a
filter.
data:image/s3,"s3://crabby-images/345b5/345b549ff2d1fc5f8038f1ebb89682784879b355" alt="Add column to table to detect if row is filtered"
103 in SUBTOTAL is the COUNTA function that ignores hidden/filtered values. If the row is visible, SUBTOTAL returns 1, which is equivalent to TRUE and if it's hidden it returns 0 which is equivalent to FALSE.
Tip: the formula can reference any cell in the row that will never be empty.
Then simply reference the Filter column in the Filter argument of GROUPBY or PIVOTBY. Visible rows = 1 and are included in the GROUPBY formula:
data:image/s3,"s3://crabby-images/cd4cc/cd4cc9ef2a8780b9de2b2793a0b08bec18b9137c" alt="Reference the filter column in the filter argument of GROUPBY and PIVOTBY"
Now when you make selections in the Slicers, the GROUPBY or PIVOTBY formulas will filter accordingly:
data:image/s3,"s3://crabby-images/9efd7/9efd7753f0f7b68b48a4e3c59dd5ed32a3d5d60b" alt="Control GROUPBY and PIVOTBY filters using slicer"
Automatically Format Total Rows
Another nice feature of PivotTables is their built-in formatting that automatically highlights totals and
subtotals:
data:image/s3,"s3://crabby-images/19b9e/19b9ed2515f2f26e4e36a20e83d880044a69a307" alt="Default highlighting of totals in pivottable"
With GROUPBY and PIVOTBY we can use
Conditional Formatting to replicate this and have it dynamically update with the results of the formula:
data:image/s3,"s3://crabby-images/1b99c/1b99c68800e166de22631f7d88b584fd5662e2ff" alt="Ue conditional formatting to highlight totals"
We can rely on the second column having a blank cell on the subtotal and total rows. All we need to do is detect if the cell is blank and if so, format the row in bold font and a cell border.
To set up a conditional format, go to the Home tab > Conditional Formatting > New Rule > Use a formula to determine which
cells to format.
In the 'Format values where this formula is true' field, select the first cell in the second column and set the absolute reference to the column only and check if that cell is blank:
data:image/s3,"s3://crabby-images/13983/13983b182d57d117bae7a0e918912b475ec16ec0" alt="Creating conditional formatting rule using a formula"
Note: you can't use ISBLANK here because technically the cell isn't blank because it contains a
formula.
Then go to the Format tab and format the font bold and on the Border tab add a border to the top of the cell:
data:image/s3,"s3://crabby-images/f1d82/f1d8216cd98bd40a5b30a83b79d575a33dc3010d" alt="Set cell format for
conditional formatting rule"
Tip: you could create another rule and add fill colour to the Grand Total to replicate PivotTables.
PivotTable Advantages
There is no doubt that these two functions are game changers for grouping and pivoting data in Excel, but there are some good reasons to still use PivotTables:
- Working with Big Data: PivotTables don't require you to bring the data into the grid to summarise it. PivotTables can reference data in queries or external files enabling them to work with a lot of
data with a relatively small impact on the file size. GROUPBY and PIVOTBY require the data in the grid.
- Multiple Aggregations: if you want to add multiple aggregation types e.g. see the data summed, averaged, and counted it's super easy in a PivotTable, whereas writing this into a GROUPBY or PIVOTBY is complicated.
- Backward Compatibility: All versions of Excel support PivotTables. Whereas GROUPBY and PIVOTBY are only available to Excel users with Microsoft 365.
For these reasons, PivotTables will still be an important tool in your Excel skillset. Master PivotTables in my PivotTable Quick Start course.