04 January 2024
Benchmarking The New Crystal Compiler Options (-O0 to -O3)
I noticed that v1.11.0 of Crystal is going to ship with some new compiler options that control how much optimization is done. This should help compile times a bit, so I decided to check out the git trunk, compile Crystal, and see how it handles Benben. At the same time, I decided to check the performance of the Common Lisp version of Benben to see how it's coming along, and whether it's still a good idea to continue working on that port.
For these benchmarks, I used my desktop computer, which has these specifications:
- Intel Core i9-10850K, 10 physical cores, 20 logical cores
- 64 GB RAM
- Slackware 15.0 + a custom kernel v6.6.8
- Far too much disk space on both HDDs and SSDs.
The software had these versions:
- Crystal: git commit e3200d9eb8814e92849503e646325b09647cfe9e (the current WIP v1.11.0)
- SBCL: v2.4.0
- Benben (Crystal): fossil commit 93e7e91c61c31f2738fb9ef5804ea32bd2c727a1f31f3fa04f9b5f482db234e0
- Benben (Lisp): fossil commit b9b57d54983253bf092ca319e2ac6205c210231e6ed8aa1a5c96315bc1c96b00
- YunoSynth: fossil commit dd7d35f6457a78a25f15dbbab958e096d7bce626
- SatouSynth: fossil commit 705f3f0624a216ccbf62ca134047c13c0d680efe (not online yet)
Things to remember:
- YunoSynth will always be in Crystal. Same repo as always.
- SatouSynth is a rewrite of YunoSynth in Common Lisp, and is in a separate repo.
- Benben is getting ported, not forked. Same repo, different branches.
- SatouSynth has about 9k lines of code right now, while YunoSynth has 24,205.
The method I used for the Crystal benchmarks was this, in order:
- Remove the Crystal cache (~/.cache/crystal)
- Remove the /tmp/foo directory
- rake clean
- Build
- Render a pre-defined playlist to /tmp/foo using the new binary.
For the Crystal version, I used this command when building, adding the appropriate -Ox and --single-module flags as needed:
time shards build -p -Dpreview_mt -Dremiconf_no_hjson --no-debug -Dyunosynth_wd40 -Dremiaudio_wd40 -Dremisound_wd40
This is the same command the Rakefile would use for a release build, except I removed the --release parameter. Those -Dxxx_wd40 defines just enable some unsafe optimizations in my libraries.
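For reference, the Crystal procedure can be sketched as a dry run in Python. It only builds the command lines as argument lists and prints them, so nothing is deleted or compiled. The flags are copied from the shards build command above. The helper names (build_argv, crystal_steps) are hypothetical. Timing is left to wrapping the build in time from a shell. The render step is omitted because Benben's command line for it isn't shown here.

```python
import os
import shlex

# Flags copied from the shards build command in the post (no --release).
BASE_FLAGS = [
    "-p", "-Dpreview_mt", "-Dremiconf_no_hjson", "--no-debug",
    "-Dyunosynth_wd40", "-Dremiaudio_wd40", "-Dremisound_wd40",
]


def build_argv(opt_level: int, single_module: bool) -> list[str]:
    """Hypothetical helper: shards build arguments for one -Ox variant."""
    if opt_level not in range(4):
        raise ValueError("opt_level must be 0-3")
    argv = ["shards", "build", *BASE_FLAGS, f"-O{opt_level}"]
    if single_module:
        argv.append("--single-module")
    return argv


def crystal_steps(opt_level: int, single_module: bool) -> list[list[str]]:
    """Hypothetical helper: cleanup and build commands for one benchmark run.

    The render-to-/tmp/foo step is intentionally left out.
    """
    return [
        ["rm", "-rf", os.path.expanduser("~/.cache/crystal")],  # Crystal cache
        ["rm", "-rf", "/tmp/foo"],                              # old render output
        ["rake", "clean"],
        build_argv(opt_level, single_module),
    ]


if __name__ == "__main__":
    # Dry run: print the commands for the O1sm mode instead of executing them.
    for step in crystal_steps(1, single_module=True):
        print(shlex.join(step))
```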
Building Benben for Common Lisp is a bit different right now. I have some prepare-deps-XXX.sh scripts that pre-build some of the dependencies for either a debug or release build. Then I use make release=1 or make debug=1 to build the binary. The two-step approach matches my personal workflow, and with Lisp I can get away with pre-building most things once and not rebuilding them later. This is just temporary, but it does work.
So, building for Common Lisp was done like this:
- Remove the Common Lisp build cache (~/.cache/common-lisp)
- Remove the /tmp/foo directory
- make clean
- time ./prepare-deps-release.sh
- time make release=1
- Render a pre-defined playlist to /tmp/foo using the new binary.
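The Common Lisp steps can be sketched the same way, again as a dry run that only returns argument lists. The helper name lisp_steps is hypothetical. The name prepare-deps-debug.sh is an assumption for the debug variant of the prepare-deps-XXX.sh scripts, since only the release script name appears above.

```python
import os
import shlex


def lisp_steps(release: bool) -> list[list[str]]:
    """Hypothetical helper: cleanup, dependency prep, and build for the Lisp port."""
    variant = "release" if release else "debug"
    return [
        ["rm", "-rf", os.path.expanduser("~/.cache/common-lisp")],  # Lisp build cache
        ["rm", "-rf", "/tmp/foo"],                                  # old render output
        ["make", "clean"],
        # Assumption: the debug script follows the prepare-deps-XXX.sh pattern.
        [f"./prepare-deps-{variant}.sh"],
        ["make", f"{variant}=1"],  # make release=1 or make debug=1
    ]


if __name__ == "__main__":
    for step in lisp_steps(release=True):
        print(shlex.join(step))
```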
Finally, I rendered a playlist containing 79 songs to a directory on an SSD (/tmp/foo). The songs were mostly from the X68000, with a few MSX2 songs thrown in as well. I chose these because both versions of Benben currently support their chips (I could have thrown in the PC Engine as well, but oh well). So the chip emulators exercised were the YM2151, the OKI MSM6258, and the AY-3-8910.
Reading the Results
The times are all in seconds. Lower is better.
The “Avg. Samples/sec” column is the average number of audio samples that Benben is rendering per second. Higher is better.
The “Mode” column indicates how Benben was built. For Crystal, this is either -Ox (so -O1, -O3, etc., without --single-module) or -Oxsm (the same, but with --single-module added). With Crystal, --release is the same as -O3 --single-module according to the docs, so I just labeled that one release.
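Here is a small sketch of how the Mode labels map back to compiler settings. It assumes the naming described above: Ox, Oxsm, and release as shorthand for -O3 --single-module. The function names are hypothetical, and the sketch only makes the labeling scheme explicit.

```python
import re

# Ox or Oxsm, where x is an optimization level from 0 to 3.
MODE_RE = re.compile(r"O([0-3])(sm)?")


def parse_mode(label: str) -> tuple[int, bool]:
    """Return (optimization level, single_module) for a Mode label."""
    if label == "release":
        # Per the Crystal docs as described above: --release == -O3 --single-module.
        return (3, True)
    m = MODE_RE.fullmatch(label)
    if m is None:
        raise ValueError(f"unrecognized mode: {label!r}")
    return (int(m.group(1)), m.group(2) is not None)


def mode_to_flags(label: str) -> list[str]:
    """Compiler flags implied by a Mode label."""
    level, single_module = parse_mode(label)
    return [f"-O{level}"] + (["--single-module"] if single_module else [])
```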
My Common Lisp setup only has two modes right now: release or debug.
Crystal Results
Build Times:
| Mode | Time (s) |
|---|---|
| O0 | 9 |
| O1 | 15 |
| O2 | 16 |
| O3 | 33 |
| O0sm | 12 |
| O1sm | 81 |
| O2sm | 87 |
| release | 91 |
Run Times:
| Mode | Time (s) | Avg. Samples/sec |
|---|---|---|
| O0 | 204 | 926,130 |
| O1 | 93 | 2,022,830 |
| O2 | 85 | 2,228,536 |
| O3 | 87 | 2,178,511 |
| O0sm | 200 | 939,526 |
| O1sm | 32 | 5,864,521 |
| O2sm | 28 | 6,830,182 |
| release | 27 | 7,071,019 |
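To make the run-time table easier to compare, here is a sketch that computes each mode's speedup over O0. It uses only the times listed above (O0 time divided by the mode's time) and adds no new measurements.

```python
# (run time in seconds, Avg. Samples/sec), copied from the Crystal run-time table.
RUN = {
    "O0": (204, 926_130),
    "O1": (93, 2_022_830),
    "O2": (85, 2_228_536),
    "O3": (87, 2_178_511),
    "O0sm": (200, 939_526),
    "O1sm": (32, 5_864_521),
    "O2sm": (28, 6_830_182),
    "release": (27, 7_071_019),
}


def speedups(baseline: str = "O0") -> dict[str, float]:
    """Each mode's speedup over the baseline: baseline time / mode time."""
    base_secs, _ = RUN[baseline]
    return {mode: base_secs / secs for mode, (secs, _) in RUN.items()}


if __name__ == "__main__":
    for mode, factor in speedups().items():
        print(f"{mode:8} {factor:5.2f}x")
```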
Common Lisp Results
Build Times:
| Mode | Time (s) |
|---|---|
| debug | 66 |
| release | 38 |
Run Times:
| Mode | Time (s) | Avg. Samples/sec |
|---|---|---|
| debug | 120 | 1,578,634 |
| release | 41 | 4,633,274 |
Thoughts
So there we have it. The new -Ox options are… kinda cool, I guess? I can see myself using -O1 or -O2 to build a quick debug binary during development, and pretty much abandoning -O0 for most of my use cases. But overall, the difference between them is pretty underwhelming unless you add --single-module. And adding --single-module doesn't look like something I would ever do because of the increased build times (I'm not counting --release here). So really, I don't see the new options being of much use to me.
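Since the tradeoff here is build time versus render time, one rough way to weigh it is to add the two for one clean build plus one render of the playlist. This sketch works only from the Crystal tables above. It assumes every cycle starts from an empty cache, as in these benchmarks. Incremental rebuilds during real development would change the picture.

```python
# Seconds, copied from the Crystal build-time and run-time tables.
BUILD = {"O0": 9, "O1": 15, "O2": 16, "O3": 33,
         "O0sm": 12, "O1sm": 81, "O2sm": 87, "release": 91}
RUN = {"O0": 204, "O1": 93, "O2": 85, "O3": 87,
       "O0sm": 200, "O1sm": 32, "O2sm": 28, "release": 27}


def turnaround(mode: str) -> int:
    """Seconds for one clean build plus one render of the playlist."""
    return BUILD[mode] + RUN[mode]


def best_mode() -> str:
    """Mode with the lowest build + render total."""
    return min(BUILD, key=turnaround)


if __name__ == "__main__":
    for mode in BUILD:
        print(f"{mode:8} {turnaround(mode):4d} s")
    print("lowest:", best_mode())
```

By this crude measure, O2 comes out lowest at 101 seconds, with O1 close behind at 108. The --single-module modes all land between 113 and 118 seconds.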
On the Lisp side, it's interesting that the release build took less time than the debug build. This is pretty neat, though not useful during development, since I'd then lack debug information. The speed of a release build is quite good, though a -O1 --single-module build is enough to inch past it with Crystal. Still, it's impressive.
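For the Lisp comparison, here is a sketch that compares a few Crystal modes against the Lisp release build. It measures by both wall-clock time and Avg. Samples/sec, using only the numbers from the tables. By these numbers, O1sm comes out roughly 1.27 to 1.28 times faster than the Lisp release build. The plain O2 build is slower than it.

```python
# (run time in seconds, Avg. Samples/sec), copied from the result tables.
LISP_RELEASE = (41, 4_633_274)
CRYSTAL = {
    "O2": (85, 2_228_536),
    "O1sm": (32, 5_864_521),
    "release": (27, 7_071_019),
}


def vs_lisp(mode: str) -> tuple[float, float]:
    """(time speedup, samples/sec ratio) of a Crystal mode over Lisp release.

    Values above 1.0 mean the Crystal build was faster.
    """
    secs, rate = CRYSTAL[mode]
    lisp_secs, lisp_rate = LISP_RELEASE
    return lisp_secs / secs, rate / lisp_rate


if __name__ == "__main__":
    for mode in CRYSTAL:
        by_time, by_rate = vs_lisp(mode)
        print(f"{mode:8} time {by_time:.2f}x  samples/sec {by_rate:.2f}x")
```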
I'm not sure yet whether I'll use this info to decide anything, or even what I'll do. So for now, I'll continue on my present course. Regardless, it was interesting to see the new compiler options for Crystal.