Experimental clang/llvm based build with nightly rust

2020-11-24

Ok, so this little side track has been on a slow burn for a while now, but I've currently got a rootfs that I can chroot into with the following features:

  • Built almost entirely with llvm/clang (see caveats below)
  • Rust nightly compiler installed early in the build, accessible for all users
  • Directory hierarchy flattened, directories in /usr are symlinks to directories in /
  • libc++ standard C++ library from the llvm project instead of libstdc++ from gcc
  • Both GNU binutils and the lld linker installed (default is still binutils ld for now)
  • x86_64 only at this time, but should be portable to Arm

Some caveats:

  • llvm/clang wants to build against libstdc++, so the toolchain is not self-hosting unless we go back to libstdc++
  • There were issues building the compiler-rt module, so llvm and clang link against libgcc
  • The GNU m4 package will not build with clang, so it had to be built with the gcc compiler from the temporary toolchain. I'm sure this is fixable, but I haven't researched it much yet.
  • The biggie: glibc refuses to try to build with any compiler except gcc. The. Sheer. Fucking. Hubris. ...again...

Nevertheless, this is all quite promising and brings up a number of possibilities going forward. As llvm by default supports multiple backends for code generation, a huge benefit is that HitchHiker would come out of the box with cross-compilation ability. Also, including Rust by default gives us the ability to use Rust code in the base system, with numerous potential benefits that I will explain below.

The directory hierarchy is simple to understand, but is reversed in implementation from what the "big" distros have done. Basically, most "modern" distros have ditched the /usr split by making /bin and /lib into symlinks to /usr/bin and /usr/lib. In this build we instead just give all programs a blank install prefix. Our root directory now takes not only all of our binaries and libraries but also has program data under /share, and we only have /usr because it is expected. Everything in /usr is, in fact, just symlinks.
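A rough sketch of the flattened layout in Rust, not the actual build scripts. The function name `flatten_usr`, the list of directories, and the use of relative link targets are all assumptions made for illustration; the only thing taken from the description above is that real directories live in / and /usr holds symlinks to them.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::Path;

/// Directories to flatten. This list is an assumption for illustration;
/// a real build would use whatever directories its packages install into.
const FLATTENED_DIRS: [&str; 4] = ["bin", "lib", "include", "share"];

/// Create the real directories directly under `root` and make
/// `<root>/usr/<dir>` a symlink back to `<root>/<dir>`.
fn flatten_usr(root: &Path) -> io::Result<()> {
    let usr = root.join("usr");
    fs::create_dir_all(&usr)?;
    for dir in FLATTENED_DIRS.iter() {
        // The real directory lives in the root of the filesystem.
        fs::create_dir_all(root.join(dir))?;
        let link = usr.join(dir);
        // symlink_metadata does not follow links, so anything already
        // present at that path (link or directory) is left untouched.
        if fs::symlink_metadata(&link).is_err() {
            // A relative target ("../bin") resolves the same way inside
            // and outside of a chroot (an assumption, not from the post).
            symlink(Path::new("..").join(dir), &link)?;
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Hypothetical staging directory; pass a different root as the first argument.
    let root = std::env::args()
        .nth(1)
        .map(std::path::PathBuf::from)
        .unwrap_or_else(|| std::env::temp_dir().join("hh-rootfs-demo"));
    flatten_usr(&root)?;
    println!("flattened /usr under {}", root.display());
    Ok(())
}
```

With this layout a package configured with an empty prefix installs into /bin, /lib and /share, while anything that hardcodes /usr/bin or /usr/lib still finds the same files through the links.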

Rust is installed in a novel way using rustup. As root, we set the RUSTUP_HOME and CARGO_HOME environment variables to /rust, and make /rust/bin a symlink to /bin. That way, we can track a nightly toolchain system-wide and it will install binaries into /bin, for all users to access. We could, of course, build rustc from source, and I have in fact done so previously. However, apart from the ridiculous compilation time that it takes, this is not a great option until the language is truly stable, as a lot of the interesting bits of the Rust ecosystem need something newer than the stable branch. By using rustup and the binary toolchains, we always have access to the most up to date toolchain, and the ability to easily switch from one to another.
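The setup described above could be sketched roughly as follows. This is an illustration under assumptions, not the actual install procedure: the helper names `prepare_rust_home` and `rustup_init_command` are hypothetical, the link is created before the installer runs, and a relative link target is used so the sketch also works against a staging directory. The command is only constructed and printed, never executed.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::symlink;
use std::path::{Path, PathBuf};
use std::process::Command;

/// Create `<root>/rust` and make `<root>/rust/bin` a symlink to `<root>/bin`,
/// so that anything installed into `$CARGO_HOME/bin` lands in /bin.
fn prepare_rust_home(root: &Path) -> io::Result<PathBuf> {
    let rust_home = root.join("rust");
    fs::create_dir_all(root.join("bin"))?;
    fs::create_dir_all(&rust_home)?;
    let link = rust_home.join("bin");
    // Only create the link if nothing exists at that path yet.
    if fs::symlink_metadata(&link).is_err() {
        // Relative target is an assumption; inside the chroot it is /bin.
        symlink("../bin", &link)?;
    }
    Ok(rust_home)
}

/// Build (but do not run) an installer invocation with both home
/// variables pointing at the shared directory. The path to the
/// installer binary is supplied by the caller.
fn rustup_init_command(installer: &Path, rust_home: &Path) -> Command {
    let mut cmd = Command::new(installer);
    cmd.env("RUSTUP_HOME", rust_home)
        .env("CARGO_HOME", rust_home)
        .args(&["-y", "--no-modify-path", "--default-toolchain", "nightly"]);
    cmd
}

fn main() -> io::Result<()> {
    // Hypothetical staging directory standing in for the rootfs.
    let root = std::env::temp_dir().join("hh-rust-demo");
    let rust_home = prepare_rust_home(&root)?;
    let cmd = rustup_init_command(Path::new("rustup-init"), &rust_home);
    println!("{:?}", cmd);
    Ok(())
}
```

When run inside the chroot itself, the root would simply be / and the home directory /rust, matching the description above.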

So what are the benefits of having Rust available early on? For starters, Rust can access C functions and export functions to C, making it relatively easy to replace parts of the software stack previously written in C with Rust code that enjoys the memory and concurrency safety checks which are not present in C. As an example, relibc is a complete C standard library written in Rust. This alternative C library is a potential future path of investigation. But on a smaller scale, there are a number of interesting CLI utilities and programs written in Rust.

  • sd is a stream editor similar to sed, but much faster and with a much easier syntax and reduced scope.
  • fd is a file finding utility which is orders of magnitude faster than find, and has interesting features like skipping .git folders by default.
  • exa is a file lister with some unique features over ls.
  • The amp editor, while somewhat vi-like, has interesting extra features such as a fuzzy file finder and a token-based jump, which make moving around even faster than in vim.

I have also begun, partially as a learning exercise, writing my own Rust implementations of various small Unix utilities. In the future some may find their way into HitchHiker if we retain Rust in the base system.
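As a minimal sketch of the C interoperability mentioned earlier, the snippet below calls `strlen` from the system C library and exports a function that C code could call in turn. The exported name `hh_count_byte` is hypothetical, and the `unsafe extern` and `#[unsafe(no_mangle)]` spellings assume a recent toolchain (Rust 1.82 or later); older compilers use plain `extern "C"` and `#[no_mangle]`.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Import: declare a function that the system C library provides.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Safe wrapper around the C `strlen`.
fn c_strlen(s: &CStr) -> usize {
    // SAFETY: a CStr is always a valid, NUL-terminated C string.
    unsafe { strlen(s.as_ptr()) }
}

/// Export: a hypothetical helper callable from C as
/// `size_t hh_count_byte(const char *s, unsigned char needle);`
#[unsafe(no_mangle)]
pub extern "C" fn hh_count_byte(s: *const c_char, needle: u8) -> usize {
    if s.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `s` points to a NUL-terminated string.
    let bytes = unsafe { CStr::from_ptr(s) }.to_bytes();
    bytes.iter().filter(|&&b| b == needle).count()
}

fn main() {
    let text = CString::new("hitchhiker").expect("no interior NUL bytes");
    println!("strlen reports {} bytes", c_strlen(&text));
    println!("'h' occurs {} times", hh_count_byte(text.as_ptr(), b'h'));
}
```

The unsafe parts stay confined to the boundary: once the pointer has been turned into a byte slice, the counting logic is ordinary checked Rust, which is the appeal of replacing small pieces of C this way.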

While this is still in an early evaluation phase, I think it likely that at the very least we will make the switch from gcc to clang/llvm at some future point, with the possibility of bringing some of the other features along as well. What may not make the cut is libc++, as replacing the C++ standard library is almost as problematic as replacing the C standard library, leading to the need to port a large amount of third party software in the future. However, I will likely be going further with Rust, and with the flattened filesystem, as I want HitchHiker to be a cutting edge distro while still adhering to traditional Unix practices. Basically, we're going to push the envelope when it makes sense to do so, but not follow fads and trends in the Linux community that might very well be dead ends (like I believe Systemd in particular to be).


Tags for this post:

C Programming Rust llvm Clang NonGNU Packages