15 March 2024
Downtime Projects
My usual overarching workflow is: 1) do a big main project, 2) release said project, 3) take some downtime and work on smaller stuff. Well, I’m basically in step 3 now that Benben 0.4.x is out, and if we ignore my Doom/Quake maps for a moment, I basically have two projects on the table for this downtime:
- Updating midi123 to have the same sort of interface as Benben. This would involve making the player multi-threaded, then adding the S-Lang TUI. Partially done already.
- I’m scratching an itch I’ve had for probably over 10 years now and writing my own system monitoring daemon and clients¹.
The first isn’t too exciting. midi123 is basically feature-complete and just gets polish once in a while. Since Benben’s overall design is proven, it’s really just a matter of fighting procrastination and porting the architecture over to midi123. So it won’t be that big of a project.
The second is more interesting…
I usually use two instances of GKrellM, one to monitor my desktop, and a second that connects to a daemon to monitor my server. I’ve tried other monitors in the past and I just keep coming back to GKrellM because 1) it shows me what I want, 2) it’s easy to configure, 3) IT PURDY (in that older skeuomorphic way that I love so dearly). But, I often have more than one server up and running, and having a bunch of GKrellM instances on my desktop is kinda awkward to handle. Also, GKrellM is still using GTK2, and though I have no problems with this (it’s still probably my favorite UI toolkit), I’m not blind to the change going on around me. With the future of the Linux desktop looking quite different, I gotta start thinking about how to future-proof myself and my setup. GTK2 has, after all, been EOL since December 2020, and X11 is ever so slowly being put out to pasture, regardless of how much I hate to admit it². Plus I just like to do new things to learn and because it’s fun ^_^
So, I’ve been thinking about how I’d go about designing the architecture of a monitoring package, and I’ve settled on a solid foundation:
- Separated client and server. Always. The server is what gathers the information, and clients receive it from the server so they can display it.
- The server is made up of “monitors”, where each one is designed to monitor a specific thing. Each one does its monitoring on its own thread, and updates its information every X tenths of a second (configurable). All their info is accessible in a thread-safe way, preferably using lock-free atomics when possible (there’s a rough sketch of this after the list).
- The server’s config file dictates what gets monitored.
- A “concentrator” takes all of the monitors, assembles their current data into an Update Protocol Message, and then sends this message to each client. This happens every N tenths of a second. The Update Protocol is a custom binary serialization format.
- Update Protocol Messages can be optionally compressed with LZ4 to save a few bytes of bandwidth.
- Clients listen for messages, then display the data how they wish. I’ll probably go for a snazzy S-Lang interface since I don’t foresee TUIs going away, and I don’t like web apps.
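To make the monitor idea a bit more concrete, here’s a rough sketch of what one monitor’s update loop could look like, using bordeaux-threads. I’m using a plain lock here for portability instead of the lock-free atomics I’d prefer, and every name in it (UPTIME-MONITOR, START-MONITOR, and so on) is just illustrative, not the actual code:

```lisp
;; A minimal sketch of one monitor on its own thread, assuming
;; bordeaux-threads (bt). All names here are hypothetical.
(defclass uptime-monitor ()
  ((uptime   :initform 0.0 :accessor uptime)
   (interval :initarg :interval :initform 10 :accessor interval
             :documentation "Update interval in tenths of a second.")
   (lock     :initform (bt:make-lock "uptime-monitor") :reader monitor-lock)
   (thread   :initform nil :accessor monitor-thread)))

(defun read-uptime ()
  "Read the first field of /proc/uptime: seconds since boot."
  (with-open-file (in #p"/proc/uptime")
    (read in)))

(defun start-monitor (monitor)
  "Spawn the monitor's update loop on its own thread."
  (setf (monitor-thread monitor)
        (bt:make-thread
         (lambda ()
           (loop
             ;; The real design would prefer a lock-free atomic here.
             (bt:with-lock-held ((monitor-lock monitor))
               (setf (uptime monitor) (read-uptime)))
             ;; INTERVAL is in tenths of a second, per the notes above.
             (sleep (/ (interval monitor) 10))))
         :name "uptime-monitor")))

(defun current-uptime (monitor)
  "Thread-safe read of the latest value, e.g. for the concentrator."
  (bt:with-lock-held ((monitor-lock monitor))
    (uptime monitor)))
```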
The Update Protocol is pretty simple: you get a short header, then a list of monitors and their data fields. Each monitor has a unique Type ID (e.g. the Network Usage monitor has a Type ID of 3), and then a set of data fields. Each data field has a Data Type ID, followed by its data. To implement this, I’ve elected to define a single monitor as a class that inherits from MONITOR and uses a special MONITOR-CLASS metaclass to manage a “Type ID”. Then I use CL-SDM’s “Pseudo Enum” feature to define the data types, and set up some library functions for serializing the fields properly. I figure this way, I can update the protocol separately from the monitor definitions themselves while I work on it (though officially, the fields the monitors include are part of Update Protocol).
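For illustration, here’s roughly what the metaclass side of that could look like, assuming the closer-mop portability library. The real code uses CL-SDM’s Pseudo Enums for the Data Type IDs; the DEFCONSTANTs below are just hypothetical stand-ins, as is everything else here:

```lisp
;; A metaclass that attaches a protocol Type ID to each monitor class.
(defclass monitor-class (standard-class)
  ((type-id :initarg :type-id :initform nil)))

;; Let MONITOR-CLASS classes inherit from STANDARD-CLASS classes.
(defmethod closer-mop:validate-superclass
    ((class monitor-class) (super standard-class))
  t)

(defun monitor-type-id (class)
  ;; DEFCLASS hands nonstandard class options over as lists, hence FIRST.
  (first (slot-value class 'type-id)))

(defclass monitor () ()
  (:metaclass monitor-class))

;; Hypothetical Network Usage monitor with a Type ID of 3, as above.
(defclass network-usage-monitor (monitor)
  ((bytes-in  :initform 0 :accessor bytes-in)
   (bytes-out :initform 0 :accessor bytes-out))
  (:metaclass monitor-class)
  (:type-id 3))

;; (monitor-type-id (find-class 'network-usage-monitor)) => 3

;; Stand-ins for the Data Type IDs a Pseudo Enum would define.
(defconstant +dt-uint64+ 0)
(defconstant +dt-string+ 1)

(defun serialize-field (data-type-id payload stream)
  "Write one data field: its Data Type ID byte, then its raw data."
  (write-byte data-type-id stream)
  (write-sequence payload stream))
```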
Similar to Protocol Buffers, no semantic meaning is transferred with the data - it’s just raw data. The only way to interpret it is to know which version of Update Protocol is being used. For this reason, I plan to have a short preamble sent when a client first connects that lists the protocol version, as well as a few other things that probably won’t change while the system is running (like its CPU architecture). This will also let a newer client connect to older servers, since newer clients can just fall back to using an older Update Protocol version.
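Sketching that out, the preamble could be as simple as a few magic bytes, a version byte, and a couple of length-prefixed strings. The magic bytes and layout here are made up purely for illustration:

```lisp
;; Hypothetical preamble; STREAM is a binary (unsigned-byte 8) stream.
(defconstant +protocol-version+ 1)

(defun write-preamble (stream)
  "Write the one-time preamble a client sees when it first connects."
  ;; Four magic bytes so clients can sanity-check the connection.
  (write-sequence (map 'vector #'char-code "SMON") stream)
  ;; One version byte; a newer client that sees an older version here
  ;; can fall back to that version of Update Protocol.
  (write-byte +protocol-version+ stream)
  ;; CPU architecture as a length-prefixed ASCII string.
  (let ((arch (machine-type)))  ; e.g. "X86-64" on SBCL
    (write-byte (length arch) stream)
    (write-sequence (map 'vector #'char-code arch) stream)))
```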
The LZ4 compression I have planned may get thrown out before I’m done. Right now, the total uncompressed size of an Update Protocol Message is 289 bytes, which includes the header, uptime, load averages, active and total processes, and data on three separate network interfaces plus their names. If I save this raw data to a file, then manually compress it with LZ4 at compression level 9, it becomes 143 bytes, or 49.48% of the original size. The size of each message will of course grow as I add more monitoring data, but I’m not sure how much it’ll grow and whether that growth will be worth the extra few CPU cycles. Still, LZ4 compression is extremely cheap, so it may be a non-issue. I need to do some more testing first.
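If the compression does stay, the per-message logic could be as simple as a flag byte, only compressing when it actually wins. A sketch, where LZ4-COMPRESS is a made-up stand-in for whatever the real LZ4 binding provides:

```lisp
;; LZ4-COMPRESS is hypothetical: it takes and returns an
;; (unsigned-byte 8) vector. STREAM is a binary stream to one client.
(defun send-update-message (payload stream)
  "Send one Update Protocol Message, compressed only when it helps."
  (let ((compressed (lz4-compress payload)))
    (if (< (length compressed) (length payload))
        (progn
          (write-byte 1 stream)               ; flag: LZ4-compressed
          (write-sequence compressed stream))
        (progn
          (write-byte 0 stream)               ; flag: raw payload
          (write-sequence payload stream)))))
```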
Anyway, I already have a basic prototype server with three types of monitors (uptime, load averages + procs, and network usage) up and running in Common Lisp, and it seems to be working perfectly fine. For now I’m just printing the data out in the SLIME REPL in a loop, but as long as this proves to be stable, I’ll start working on designing a simple test client. I also need to look into how I’ll be encrypting the messages, whether encryption will be optional, and how to handle authentication.
Footnotes
- 1: By “system monitor” I mean something that can tell me CPU usage, memory usage, disk I/O stuff, network I/O stats, etc.
- 2: No, I am not a fan of Wayland. It lacks things I need on a daily basis. I do not plan to move to it until I absolutely, positively have to.