Today, we celebrate 5 years of VisiData. In honor of the occasion, I thought I'd put together a little retrospective, and some thoughts on what the future might hold.
VisiData is an interactive tabulator, in the same way that VisiCalc is an interactive calculator and vim is an interactive text editor. It's the ultimate tool for dealing with tabular data: exploring, cleaning, analyzing, and transmuting. VisiData is not a spreadsheet, where cells are the first-class elements. However, it can load and interact with data in spreadsheet format, and in many other formats too, including some you might not think of at first, like filesystem metadata, API results, and packet captures. Because everything is data.
I've been working on VisiData since October 2016, and in the beginning, there was only one user (me).
Back then it was a single 1000-line script with loaders for csv, json, xlsx, and hdf5, and it could already create frequency tables and join sheets. Many people would be quite happy with just this functionality.
I kept going.
After only a few months, VisiData was already incredibly useful. But I had a lot more ideas, so in 2017 I worked on VisiData extensively, always thinking a 1.0 was just around the corner. The first major release before v1.0 came out in June 2017 and made it to the top of Hacker News.
I didn't have firm criteria for what "version 1.0" meant; I just knew I didn't want it to be a version 0 project forever. I mainly wanted 1.0 to be robust and useful without major maintenance.
In Jan 2018, I released version 1.0, and then took a step back. Shortly after 1.0 I started getting some basic usage data. I kept plugging away at it, and Anja (who I met at the Recurse Center in 2017) became involved in the project. Jeremy Singer-Vine, who had discovered VisiData the year before, wrote the excellent Introduction to VisiData tutorial.
Here's what VisiData usage looked like for the first 2.5 years of its existence: from 40 daily M-F users in early 2018, to almost 100 in mid-2019.
During 2019 I worked full-time, but still added functionality to VisiData on the side, or as I needed features for my paid work. These stayed in the develop branch and were only used by a handful of dedicated users.
I wanted to clean up some elements of the API but I was aware of the responsibilities of maintaining a stable platform. So I let the develop branch accumulate changes, and we only made some internal beta releases.
Then 2020 happened, and I stopped working full-time to keep things together. Anja and I put a bunch of effort into cleaning up the API, and we released v2.0 in October 2020.
In 2021, we've continued to put out new releases every few months. The Devottys and I released the xdplayer terminal crossword player and DarkDraw (a Unicode drawing tool). I reorganized most of my VisiData addons into VisiData Plus.
Here's what the usage graph looks like now:
- ~3000 monthly users
- ~300 daily users
- ~30 community members
- ~3 regular contributors
In short, VisiData now has roughly 15k lines of code and several thousand regular users, and it is frequently mentioned wherever people talk about tabular data tools and/or the terminal. Its architecture for working with row/column data is powerful, and the interface is remarkably flexible and ergonomic for experienced users. Users regularly call VisiData "delightful" and "amazing", and say they "love" it. My favorite response (as for most of my work) is "holy shit".
According to Lindy's Law, the remaining life expectancy of software is proportional to its current age. So now that VisiData has been around for 5 years, we can look forward to, and plan for, another 5 years.
The second graph has 4x daily users on the Y axis. Let's extrapolate this graph another 4x and another 5 years:
This graph makes even a "modest" 4x usage growth seem unlikely. But consider this next graph, which shows the view from 2019 (when VisiData was 2.5 years old, 2.5 years ago) through today, with the axes expanded to their current scale:
Its growth to today would have seemed as unlikely then, and yet here we are.
What will VisiData usage look like in 5 years? If it continues to double every year, then in 5 years that would be 32x (2^5), possibly approaching 10,000 daily users! That seems a little excessive. But with millions of terminal users, millions of Python users, and millions of data professionals who could benefit from diving quickly into a new dataset and whipping it into shape, it's not impossible.
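The gap between a "modest" 4x and a doubling-every-year 32x is just compound growth. Here is a back-of-the-envelope sketch of that arithmetic, assuming the ~300 daily users quoted above as the starting point; the helper names are illustrative only.

```python
# Back-of-the-envelope compound-growth arithmetic for the usage projections.
# Assumption: ~300 daily users today (the approximate figure quoted earlier).

DAILY_USERS_NOW = 300
YEARS = 5


def project(users: float, annual_factor: float, years: int) -> float:
    """Compound growth: users * annual_factor ** years."""
    return users * annual_factor ** years


def implied_annual_factor(total_factor: float, years: int) -> float:
    """The per-year growth factor that compounds to total_factor over `years`."""
    return total_factor ** (1 / years)


# A "modest" 4x over 5 years needs only about 32% growth per year...
modest = implied_annual_factor(4, YEARS)
print(f"4x in {YEARS} years = {modest:.3f}x per year")

# ...while doubling every year compounds to 2**5 = 32x.
doubling = project(DAILY_USERS_NOW, 2, YEARS)
print(f"doubling yearly: {2 ** YEARS}x -> ~{doubling:,.0f} daily users")
```

Doubling every year from about 300 daily users lands near 9,600, which is why 10,000 is within reach but not a given.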
Like many power tools, VisiData has so much functionality that it can be off-putting to new users. Even though a handful of commands would already provide a large amount of power and flexibility, new users often bounce off of it because it just seems overwhelming.
So a major, ongoing effort is to make VisiData even easier to use without degrading the experience for power users, and in fact to guide new users into becoming power users without a lot of effort. The new menu system in v2.6 is a big step towards better discoverability. Next on the list will be a sidebar for "wizard" commands like join, group, and plot; a more holistic and guided fancy chooser.
VisiData is great for dealing with a few million rows of data, but beyond that it starts to buckle. Modern hardware and databases can readily work with 100 million rows on a single machine if the data is stored and accessed correctly. In the next few years, VisiData will be able to access these larger datasets without having to load everything into memory. Imagine being able to browse a BigQuery table using Shift+F for a Frequency Sheet and Enter to dive into a subset, and then copying the resulting SQL into a notebook!
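To make the idea concrete: a Frequency Sheet corresponds roughly to a GROUP BY query, and diving into one of its rows corresponds to a WHERE filter, so the database does the heavy lifting and only the summary rows need to fit in memory. The sketch below illustrates that mapping using Python's built-in sqlite3 as a stand-in for BigQuery. The table, the data, and the helpers `frequency_sql` and `subset_sql` are hypothetical illustrations, not VisiData's implementation or API.

```python
import sqlite3


def frequency_sql(table: str, column: str) -> str:
    """Hypothetical: SQL roughly equivalent to a Frequency Sheet on one column."""
    # Identifiers are interpolated for brevity; real code must quote/validate them.
    return (f"SELECT {column}, COUNT(*) AS n FROM {table} "
            f"GROUP BY {column} ORDER BY n DESC")


def subset_sql(table: str, column: str) -> str:
    """Hypothetical: SQL for 'diving into' the rows behind one frequency bin."""
    return f"SELECT * FROM {table} WHERE {column} = ?"


# Small in-memory table standing in for a large remote one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, state TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, "CA"), (2, "NY"), (3, "CA"), (4, "TX"), (5, "CA")])

# The database does the grouping; only one row per distinct value comes back.
freq = conn.execute(frequency_sql("orders", "state")).fetchall()
print(freq)

# "Press Enter" on the top bin: fetch just the matching rows.
top_value = freq[0][0]
query = subset_sql("orders", "state")
rows = conn.execute(query, (top_value,)).fetchall()
print(query, rows)
```

The generated `query` string (plus its parameter) is the kind of SQL that could then be pasted into a notebook.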
What Emacs is to text, VisiData is to tables. If my wildest ambitions were fulfilled, VisiData would be the solution to the following analogy:
Emacs : text : Lisp :: VisiData : tables : Python
and with that in mind, there's lots of room to grow!
As always, what makes VisiData so great to use is the absurd amount of polish and attention to detail. One by one, we find the points of friction and try to remove them. The minor inconsistencies. The edge cases. The little things that are deemed "not worth it" in most software.