Introduction
The use case is a common one: you notice that storage space is running out and look for a way to identify where it is all being consumed.
Many people have ended up writing their own programs for that purpose. Some of them are graphical, like the well-known WinDirStat on Windows.
Let's review a few good command-line options for Linux systems.
du
Part of GNU Coreutils, du stands for "disk usage" and is the most accessible and direct utility for getting quick storage usage information.
From the starting directory you want to analyze, the basic usage is:
du -h
Here -h stands for "human-readable" and converts sizes to their most readable unit (KiB, MiB, GiB, etc.).
However, it lists everything, that is, every single directory and subdirectory, which makes it impractical when run from a large directory tree.
The very last line of the output, e.g.:
6.5G .
is the total size of the current directory.
When only that total is of interest, add the -s ("summarize") option:
du -sh
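Because du separates the size from the path with a tab, that summary line can be trimmed down to the bare size, which is handy in scripts. A minimal sketch; dir_total is a hypothetical helper name, not part of du:

```shell
# dir_total [DIR]: print only the human-readable total size of DIR (hypothetical helper)
dir_total() {
    # -s: summary line only, -h: human-readable units
    # du prints "SIZE<TAB>PATH", so keep just the first tab-separated field
    du -sh -- "${1:-.}" | cut -f1
}
```

For instance, dir_total /var/log prints a single size figure with no path attached.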
To still get an overview of the directories inside the current directory while limiting the output, use the -d (max depth) option with a value of 1, so that only the summaries for the first level of directories are listed:
$ du -h -d 1
5.8G ./.git
3.3M ./mm
8.5M ./kernel
200K ./virt
163K ./ipc
415M ./drivers
6.0M ./lib
1.3M ./block
1.6M ./rust
109M ./arch
54M ./tools
1.5M ./samples
60K ./usr
2.2M ./crypto
27M ./sound
21M ./net
564K ./io_uring
41M ./include
75K ./certs
65M ./Documentation
153K ./init
222K ./LICENSES
27M ./fs
2.5M ./security
3.6M ./scripts
6.5G .
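Since du prints each directory after its contents, the starting directory's own total is always the last line of the -d 1 output. When only the subdirectories are wanted, that line can simply be dropped. A small sketch, with first_level as a hypothetical helper name:

```shell
# first_level [DIR]: one summary line per first-level subdirectory of DIR,
# without the trailing total for DIR itself (hypothetical helper)
first_level() {
    # du lists a directory after everything inside it, so the total comes last;
    # sed '$d' deletes that final line
    du -h -d 1 -- "${1:-.}" | sed '$d'
}
```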
One last common recipe is to pipe that into sort, either by making du use the same unit for every entry (for instance MiB everywhere) or by taking advantage of the -h (human-numeric) option of sort:
$ du -h -d 1 | sort -hr
6.5G .
5.8G ./.git
415M ./drivers
109M ./arch
65M ./Documentation
54M ./tools
41M ./include
27M ./sound
27M ./fs
21M ./net
8.5M ./kernel
6.0M ./lib
3.6M ./scripts
3.3M ./mm
2.5M ./security
2.2M ./crypto
1.6M ./rust
1.5M ./samples
1.3M ./block
564K ./io_uring
222K ./LICENSES
200K ./virt
163K ./ipc
153K ./init
75K ./certs
60K ./usr
The -r option reverses the order so that the largest items appear first.
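The other approach, forcing one unit for every entry, only needs a plain numeric sort. A sketch using GNU du's -m option (sizes in MiB, rounded up); largest_mib and its default of 10 lines are hypothetical choices:

```shell
# largest_mib [DIR] [N]: the N largest first-level entries of DIR, sizes in MiB
# (hypothetical helper; -m is GNU du shorthand for --block-size=1M)
largest_mib() {
    # every size shares the same unit, so a numeric sort (-n) is enough;
    # -r puts the largest first, head keeps the top N lines (10 by default)
    du -m -d 1 -- "${1:-.}" | sort -nr | head -n "${2:-10}"
}
```

The trade-off is resolution: anything under 1 MiB shows up as 1, whereas sort -h keeps the finer human-readable sizes.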
ncdu
Of the interactive disk usage tools covered here, ncdu is by far the oldest.
It uses (and derives its name from) the ncurses library for the C language, a classic way to create TUIs.
It is available in the Debian package repository:
apt install ncdu
Notable features
- Interactive
- Old and safe
The main perk is being able to move around the filesystem with the arrow keys (or vim direction keys) and quickly find the big files.
The versions packaged in Linux distributions tend to be quite old. The latest versions of ncdu support multithreading, but the one in the Debian package repository does not, which makes it the slowest program of the bunch.
Interesting options
From within an interactive session:
- d — Delete the selected file or directory; probably better done directly from the command line, so be careful
- n — Order by name (also toggles ascending/descending order); sometimes useful to find something specific
- s — Order by size (default, also toggles ascending/descending order)
du-dust
The first of a series of newer utilities written in Rust.
This one is available in the Debian package repository:
apt install du-dust
The command to invoke afterwards is dust. It's not an interactive tool and behaves more like a replacement for the traditional du command. Some people even symlink or alias it so that it replaces du outright.
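For the alias route, one possible ~/.bashrc fragment (an assumption about your shell setup, not something dust requires) only swaps du for dust when dust is actually installed:

```shell
# ~/.bashrc fragment (assumes bash; adapt for other shells)
# Aliases only take effect in interactive shells, so scripts that parse
# du output keep getting the real du
if command -v dust >/dev/null 2>&1; then
    alias du='dust'
fi
```

With the alias in place, command du (or \du) still reaches the original du.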
Notable features
- Fast — Not the fastest of the selection, but the difference doesn't really matter
- Automatically limits the number of displayed lines to the current terminal height minus a fixed margin (about 10) — It tries to show as much as it can within a single screen
- Visual representation of space consumption (if enough horizontal space is available)
- Defaults to "human readable" units
By default, the output shows the heaviest items last.
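To make that default layout concrete, here is a rough approximation using only du and sort: smallest first, heaviest last, trimmed to the terminal height minus about 10 lines. It is a sketch of the behavior described above, not how dust is implemented; dust_like and the 24-row fallback are hypothetical choices:

```shell
# dust_like [DIR]: rough du-based imitation of dust's default layout (hypothetical helper)
dust_like() {
    # terminal height, falling back to 24 rows when tput cannot tell (e.g. no TERM set)
    rows=$(tput lines 2>/dev/null) || rows=24
    # keep about 10 lines of margin, but always show at least one line
    n=$((rows - 10))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    # ascending human-numeric sort, so the heaviest entries (and the total) end up last
    du -h -d 1 -- "${1:-.}" | sort -h | tail -n "$n"
}
```

Unlike dust, this only lists directories and draws no usage bars.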
Interesting options
- -d — The same max depth option as du; helps fit a lot more on the screen in some situations
- -r — Reverse the order so that the heaviest items appear first (usually what I want)
- -n NUMBER — Max number of lines to output, overriding the default behavior
- -z SIZE — Minimum file size to include in the output
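Combining those options gives a compact "top N, largest first" view. The sketch below only uses the -d, -r and -n options listed above, and falls back to an approximate du and sort pipeline when dust is not installed; top_usage and its defaults (depth 1, 20 lines) are hypothetical:

```shell
# top_usage [DIR] [DEPTH] [N]: the N heaviest entries of DIR, largest first (hypothetical helper)
top_usage() {
    dir=${1:-.} depth=${2:-1} n=${3:-20}
    if command -v dust >/dev/null 2>&1; then
        # -r: largest first, -d: max depth, -n: max number of output lines
        dust -r -d "$depth" -n "$n" "$dir"
    else
        # approximate equivalent with GNU coreutils (directories only, unlike dust)
        du -h -d "$depth" -- "$dir" | sort -hr | head -n "$n"
    fi
}
```

For example, top_usage /var 2 30 shows the 30 heaviest entries down to two levels below /var.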
dua
Another tool written in Rust, which can be used either interactively or as a one-shot command. This one requires manual installation.
The easiest way to install it is from its GitHub releases page, where the "unknown Linux MUSL x86_64" build is usually what we want.
NB: To be completely safe, it's always better to read the source code and compile it yourself than to download binaries from GitHub, unless they come from a very trusted repository.
Unlike du, dua also lists files in its output by default, but otherwise it works like du called with both the -h and -d 1 options.
The TUI is invoked using:
dua i
Interesting features
- Very fast
- The default options in command line mode are usually what we want
- Interactive mode can mark files for deletion
diskonaut
The last entry in this article. It's another interactive tool written in Rust (yes, there is a trend), but its way of presenting the data as surfaces gives the best visual feedback of them all (well, that might be a matter of opinion).
It has to be installed manually from its GitHub page.
Select the MUSL build from its releases page.
NB: As with dua, it's safer to read the source code and compile it yourself than to download binaries from GitHub, unless they come from a very trusted repository.
Interesting features
- Shows disk usage as squares of relative surfaces, largest first
- Only shows significantly large directories, so there is very little noise
- Quite fast
- Easy deletion with the backspace key (though that's somewhat dangerous)
The choice of directory isn't great, as the .git directory dwarfs everything else here, but in my opinion that is good information to see right off the bat.
You can then click on a directory to enter it.
Conclusion
We hope you found something you may want to add to your servers or workstations.
As a final remark, note that even though the multithreaded programs are all faster than du or ncdu, they can max out your I/O capacity and disrupt other operations on the server.
Try to be mindful of the I/O pressure these tools apply: some providers even bill specifically for I/O, and none of these tools cache their results (though the Linux kernel does, to some extent).
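One way to soften that impact on a busy machine is to run the scan at the lowest CPU and I/O priority. A minimal sketch, assuming util-linux's ionice is available; note that the idle I/O class (-c 3) is only honored by some I/O schedulers (BFQ, for instance), so treat it as best effort. gentle_du is a hypothetical name:

```shell
# gentle_du [DIR]: du at the lowest CPU and I/O priority (hypothetical helper)
gentle_du() {
    if command -v ionice >/dev/null 2>&1; then
        # -c 3: idle I/O class, only gets disk time when nothing else needs it
        # -t: still run the command if the I/O priority cannot be set
        # nice -n 19: lowest CPU scheduling priority
        ionice -c 3 -t nice -n 19 du -h -d 1 -- "${1:-.}"
    else
        # ionice missing (e.g. minimal containers): lower the CPU priority only
        nice -n 19 du -h -d 1 -- "${1:-.}"
    fi
}
```

The same ionice -c 3 -t prefix can be placed in front of any of the other tools, interactive ones included.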