October 31, 2020

Producing histograms in terminal


Quite often I need to quickly visualize data in a terminal as a histogram or some other kind of chart. Here are the three methods I use most often.

Gnuplot

When it comes to charts, the first thing that comes to mind is gnuplot, a versatile tool for producing all kinds of charts and graphs. Its coolest feature is support for the dumb terminal, which renders plots as plain text, so you can easily get charts like this:

                                       ping google.com
     55 +---------------------------------------------------------------------------+
        |       +        +       +        +       +        +       +        +       |
        |                                                                           |
     50 |-+                                                                 *     +-|
        |                                                                   *       |
        |                                         *                        * *      |
        |                                        * *                       * *      |
     45 |-+                                      *  *                     *   *   +-|
        |                                       *   *                     *   *     |
        |                                       *    *                   *     *    |
     40 |-+                                    *      *                  *     *  +-|
        |                                      *       *                *       *   |
 ms     |                                     *         *               *       *   |
     35 |-+                                   *          *             *        * +-|
        |                                    *           *             *         *  |
        |                                    *            *           *          *  |
     30 |-+            *****                *              ***        *           *-|
        |          ****     **              *                 *      *            * |
        |**********           **           *                   **    *             *|
        |                       *          *                     ** *              *|
     25 |-+                      **********                         *             +-|
        |                                                          *                |
        |       +        +       +        +       +        +       +        +       |
     20 +---------------------------------------------------------------------------+
        0       1        2       3        4       5        6       7        8       9
                                            count

I’ve generated this graph with the following command:

$ ping -c 10 google.com -i 0.2 | awk '/time=/{ print $(NF-1) }' | cut -d= -f2 | \
  gnuplot -e \
  "set terminal dumb size 90, 30; set autoscale; set title 'ping google.com';
   set ylabel 'ms'; set xlabel 'count'; plot '-'  with lines notitle";
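The extraction half of that pipeline is easy to check on its own. The sketch below wraps it in a function (the name extract_times is hypothetical) and feeds it two made-up ping reply lines. Real ping output differs a little between platforms, so treat the position of the time= field as an assumption to verify on your system.

```shell
#!/bin/sh
# Hypothetical helper: extract_times
# Reads ping output on stdin and prints one round-trip time per line.
extract_times() {
    # $(NF-1) is the "time=27.4" field; cut keeps the part after "=".
    awk '/time=/{ print $(NF-1) }' | cut -d= -f2
}

# Two made-up reply lines standing in for real `ping` output.
sample='64 bytes from example.com (203.0.113.10): icmp_seq=1 ttl=117 time=27.4 ms
64 bytes from example.com (203.0.113.10): icmp_seq=2 ttl=117 time=31.9 ms'

printf '%s\n' "$sample" | extract_times
```

It prints 27.4 and 31.9, one value per line. With single-column input, plot '-' uses each value as y and its line index (starting at 0) as x, which is why the chart's x axis runs from 0 to 9 for ten pings.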

A slightly more useful example:

$ sar | awk '/^(09|10)/{print substr($1,1,5), $4}' | \
  gnuplot -e \
  "set terminal dumb size 90, 30; set title '% CPU User'; set autoscale; 
   plot '-' using 2:xtic(1) with lines notitle";

                                       % CPU User
  18 +------------------------------------------------------------------------------+
     |      +      +       +      +      +      +      *******+       +      +      |
     |                                                *       ***                   |
  16 |-+                                              *          *                +-|
     |                                               *            **                |
  14 |-+                                            *               **            +-|
     |                                              *                               |
     |                                             *                  ***           |
  12 |-+                                          *                      *        +-|
     |                                           *                        *         |
     |                                           *                         **       |
  10 |-+                                       **                                 +-|
     |                                       **                              *******|
   8 |-+                                   **                                     +-|
     |                                  ***                                         |
     |                              ****                                            |
   6 |-+                          **                                              +-|
     |                           *                                                  |
     |                          *                                                   |
   4 |-+                       *                                                  +-|
     |                         *                                                    |
   2 |-+                      *                                                   +-|
     |                       *                                                      |
     |*********************+*     +      +      +      +      +       +      +      |
   0 +------------------------------------------------------------------------------+
   09:00  09:10  09:20   09:30  09:40  09:50  10:00  10:10  10:20   10:30  10:40  10:50

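The important difference here is that we’re using xtic() for the x tic labels. (The awk filter takes $4 because in sar’s 12-hour output an AM/PM field follows the time, which makes %user the fourth field; with 24-hour timestamps %user is $3.) Let’s look at a sample of the input data: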

$ sar | awk '/^(09|10)/{print substr($1,1,5), $4}'
09:00 0.44
09:10 0.44
09:20 0.41
09:30 0.37
09:40 6.37
09:50 7.35
10:00 9.58
10:10 17.54
10:20 16.42
10:30 12.82
10:40 9.33
10:50 8.85

In other words, column one holds the x tic labels and column two holds the actual data. The using 2:xtic(1) instruction tells gnuplot exactly that: take the y values from column 2 and label the x tics with the strings from column 1.


The key is to remember (or write down) this skeleton command:

gnuplot -e "set term dumb 120, 30; set autoscale; plot '-' with lines notitle"

That’s enough to draw a simple chart of a single data series. To use x tic labels, just add using 2:xtic(1) to the plot instruction.
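If you would rather not retype the skeleton, it can live in a small shell function. This is a minimal sketch, not a standard tool: the names tplot and tplot_script and the COLS/ROWS environment variables are all hypothetical. The script text is built separately so it can be inspected without running gnuplot.

```shell
#!/bin/sh
# Hypothetical helper: tplot [xtic]
# Wraps the skeleton gnuplot command; plot data is read from stdin.
# COLS and ROWS (hypothetical variables) override the default 120x30 size.

# Print the gnuplot script; pass "xtic" when column 1 holds labels.
tplot_script() {
    using=""
    if [ "$1" = "xtic" ]; then
        using="using 2:xtic(1) "
    fi
    printf "set term dumb %s, %s; set autoscale; plot '-' %swith lines notitle\n" \
        "${COLS:-120}" "${ROWS:-30}" "$using"
}

# Run gnuplot with that script; stdin flows through as the '-' data.
tplot() {
    gnuplot -e "$(tplot_script "$1")"
}
```

With it, the sar example becomes `sar | awk '/^(09|10)/{print substr($1,1,5), $4}' | tplot xtic`, and a single-column series such as the ping times needs just `| tplot`.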

Finally, for histogram data, set a box width and switch the plot style from lines to boxes: set boxwidth 0.2; plot ... with boxes.
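with boxes expects data that is already binned, one x position and one count per line. When all you have is a column of raw values, a small awk filter can do the binning first. This is a sketch under assumptions: the function name bin_values is hypothetical, bins have a fixed width, and empty bins are simply not printed.

```shell
#!/bin/sh
# Hypothetical helper: bin_values [WIDTH]
# Reads one number per line on stdin and prints "bin_start count" lines,
# sorted by bin start, ready for gnuplot's `with boxes`.
bin_values() {
    awk -v w="${1:-1}" '
        # Skip blank lines so they are not counted as zeros.
        NF == 0 { next }
        {
            # Lower edge of the bin containing this value.
            b = int($1 / w) * w
            # int() truncates toward zero, so move negative values down a bin.
            if ($1 < b) b -= w
            count[b]++
        }
        END { for (b in count) print b, count[b] }
    ' | sort -n
}
```

For example, `ping -c 50 google.com -i 0.2 | awk '/time=/{ print $(NF-1) }' | cut -d= -f2 | bin_values 5 | gnuplot -e "set term dumb 120, 30; set autoscale; set boxwidth 4; plot '-' using 1:2 with boxes notitle"` draws one box per 5 ms bin. gnuplot centers each box on its x value, which here is the lower edge of the bin.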

Perl one-liner

gnuplot is an incredibly powerful tool, but often I just want a quick and dirty histogram-like view of some data. I don’t want to fetch the data from a remote system just to process it with gnuplot on my laptop, nor am I willing to install gnuplot on the remote system. In such situations I fall back on a very old perl-based approach, since perl is installed almost everywhere.

Let’s return to the CPU usage example from the sar utility above. Given the same input, we can do this:

$ sar | awk '/^(09|10)/{print substr($1,1,5), $4}' | \
  perl -lane 'print $F[0], "\t", "=" x $F[1]'
09:00
09:10
09:20
09:30
09:40	======
09:50	=======
10:00	=========
10:10	=================
10:20	================
10:30	============
10:40	=========
10:50	========

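It’s amazing how simple this command is for the results it produces! The -n switch loops over input lines, -l strips each newline and adds one back to every print, and -a splits each line into the @F array, so $F[0] is the time label and $F[1] is the value. "=" x $F[1] then repeats the = character that many times; the repeat count is truncated to an integer, which is why values below 1 produce empty bars. When the raw numbers are too big for the terminal width, it’s easy to add scaling with ($F[1]/<scale_factor>), for example: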

perl -lane 'print $F[0], "\t", "=" x ($F[1]/5)'
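Picking the scale factor by hand works, but the bars can also be scaled automatically so that the largest value fills a chosen width. awk is about as ubiquitous as perl, so here is a sketch of that idea in awk. The function name bars and the default width of 60 columns are assumptions, and the input has to be read completely before anything is printed, because the maximum is only known at the end.

```shell
#!/bin/sh
# Hypothetical helper: bars [WIDTH]
# Reads "label value" lines on stdin and prints "label<TAB>====...",
# with the largest value drawn WIDTH characters wide (default 60).
bars() {
    awk -v width="${1:-60}" '
        {
            label[NR] = $1
            value[NR] = $2
            if (NR == 1 || $2 > max) max = $2
        }
        END {
            for (i = 1; i <= NR; i++) {
                # Truncate like perl x does; draw nothing if max <= 0.
                n = (max > 0) ? int(value[i] * width / max) : 0
                bar = ""
                for (j = 0; j < n; j++) bar = bar "="
                printf "%s\t%s\n", label[i], bar
            }
        }
    '
}
```

For instance, `sar | awk '/^(09|10)/{print substr($1,1,5), $4}' | bars 40` draws the busiest interval 40 characters wide and scales the rest proportionally.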

Sparklines

The third method for drawing histograms and charts in a terminal that I want to mention is sparklines. I first discovered them when I learnt about the spark tool, a simple bash script that generates Tufte’s sparklines.

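Here’s an example showing the magnitudes of all M2.5+ earthquakes in the last 24 hours, taken from a USGS feed (sed '1d' drops the CSV header row and cut -d, -f5 picks the mag column):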

$ curl -s https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.csv | \
  sed '1d' | \
  cut -d, -f5 | \
  spark
▃█▅▅█▅▃▃▅█▃▃▁▅▅▃▃▅▁▁▃▃▃▃▃▅▃█▅▁▃▅▃█▃▁
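The core idea behind spark is easy to reproduce when the script isn’t installed: map each value linearly from the minimum to the maximum onto the eight block characters ▁ through █. The awk sketch below does that for one value per line. The function name sparkline is hypothetical, and its rounding is not guaranteed to match spark’s output character for character.

```shell
#!/bin/sh
# Hypothetical helper: sparkline
# Reads one number per line on stdin and prints a single line of block
# characters, scaled linearly from the minimum (▁) to the maximum (█).
sparkline() {
    awk '
        NF == 0 { next }
        {
            v[++n] = $1
            if (n == 1 || $1 < min) min = $1
            if (n == 1 || $1 > max) max = $1
        }
        END {
            # Split on spaces so the multibyte ticks work in any awk.
            k = split("▁ ▂ ▃ ▄ ▅ ▆ ▇ █", ticks, " ")
            range = max - min
            for (i = 1; i <= n; i++) {
                # Input where all values are equal maps to the lowest tick.
                idx = (range > 0) ? int((v[i] - min) * (k - 1) / range) + 1 : 1
                printf "%s", ticks[idx]
            }
            printf "\n"
        }
    '
}
```

With it, the earthquake pipeline above can end in `| sparkline` instead of `| spark`.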

Last updated: Feb 7, 2021 at 13:38 (EET)