Grouping data and descriptive statistics
How to set statistical aggregators for a single column
Aggregators provide summary statistics for grouped rows.
Command | Operation |
---|---|
+ aggregator |
adds aggregator to current column |
z+ aggregator |
displays result of aggregator over values in selected rows for current column |
The following statistical aggregators are available:
Aggregator | Description |
---|---|
min |
smallest value in the group |
max |
largest value in the group |
avg /mean |
average value of the group |
mode |
most frequently appearing value in group |
median |
median value in the group |
q3/q4/q5/q10 |
add quantile aggregators to group (e.g. q4 adds p25, p50, p75) |
sum |
total summation of all numbers in the group |
distinct |
number of distinct values in the group |
count |
number of values in the group |
keymax |
key of the row with the largest value in the group |
list |
gathers values in column into a list |
stdev |
standard deviation of values |
The follow howtos will have examples of workflows involving grouping of data and statistical aggregation.
How to create a pivot table
This example uses the file sample.tsv.
- Press
!
on the independent variable to set it as a key column. - Press
+
on a second column to add an aggregate. This configures which column is to be composed in relation to others. - Optional: Set the type for the column being aggregated by pressing
~
(string),#
(int),%
(float),$
(currency), or@
(date). - Press
Shift+W
on the dependent categorical variable to pivot on it.
How to create a frequency chart
The following examples use the file sample.tsv.
How to make a simple histogram
Question How many of each Item were sold?
- Move the cursor to the Item column.
- Press
Shift+F
to open the Frequency table.
How to use the Frequency table to view the results of statistical aggregation
Question What was the monthly revenue?
- On the OrderDate column, type
;
followed by(\d+-\d+)
to create a column with only the year and the month using regex capture groups. - Press
-
to hide the OrderDate column. - On the **OrderDate_re0** column, type
^
followed byOrderDate
to rename it. - On the Total column, press
$
to set its type to currency. - Type
+
followed bysum
to add a statistical aggregator to Total. - On the OrderDate column type
Shift+F
to open the Frequency table. - On the OrderDate column, press
[
to sort the table in chrononological order. - On the **sum_Total** column, type
^
followed byRevenue
to rename the column.
How to calculate some descriptive statistics
- Press
Shift+I
to open the Describe sheet.
How to filter for grouped or described rows
How to filter for grouped rows
- Press
Shift+F
to open the Frequency table. - Press
s
ort
on the groups you are interested in to select those entries in the source sheet. - Press
q
orCtrl+^
to return to the source sheet. - Press
"
to open a duplicate sheet with selected rows.
or
- Press
Shift+F
to open the Frequency table. - Press
Enter
on the grouping you are interested in to open a sheet of the source rows that are part of that group.
How to filter for described rows
- Press
Shift+I
to open the Describe sheet. - Use
zs
to select rows on source sheet which are being described in the cells of interest. - Press
q
orCtrl+^
to return to the source sheet. - Press
"
to open a duplicate sheet with selected rows.
or
- Press
Shift+I
to open the Describe sheet. - Press
zEnter
open copy of source sheet with rows being described in the current cell of interest.
How to filter for the rows aggregated in a pivot table
Enter
/zEnter
can both be used in the Pivot table to open a sheet of the source rows which are aggregated in the current pivot row/cell.
How to combine cells in one for column for matching rows in another
This tutorial shows you how to go from this dataset:
field | value |
---|---|
A | 1 |
A | 2 |
B | 3 |
C | 4 |
C | 5 |
to
field | value |
---|---|
A | 1;2 |
B | 3 |
C | 4;5 |
in VisiData.
- On the value column, type
+
followed bylist
to add to it a list aggregator. - Press
Shift+F
on the field column to open the Frequency table. - Type
=
followed by';'.join(value_list)
to add an expression column with the joined values.