Porting NetBSD userland, infrastructure improvements

2020-08-17

As noted previously, I have some issues with the complexity of GNU autotools and the bloat that it introduces into building what should be very simple programs. I pointed out coreutils as one of the worst offenders at that time, and have been exploring ways to either port coreutils to a simpler build system or outright replace the package.

To be fair, there is nothing particularly wrong with the utilities themselves, and functionally they work as expected. One benefit that coreutils provides over almost every other implementation is improved localization. However, there is complexity that is at times not really justified. Let's take the utility program "true" as an example, whose sole purpose is to do nothing and exit with success. A simple C program can be constructed to perform this function perfectly well with the following code:

int main(void) { return 0; }

This simple code compiles to a 16k executable (stripped) and does everything that we need it to do. But we can actually do even better with a single-line shell script:

#!/bin/sh
exit 0

This little gem takes up a whopping 4k on disk on my machine and also does exactly what we expect the program to do. So why exactly is the file true.c in coreutils 80 lines long, and why does it compile to a 40k executable after stripping?

I had at one point tried replacing coreutils and util-linux completely with the sbase and ubase packages from suckless.org. The main issue with doing this is that not all of the utilities are feature complete yet. They are fine for day-to-day use navigating around in a shell environment, but begin to fail when running scripts that are meant to be portable or when building software, where an unsupported flag causes the utility call to fail with an error. I do have a branch containing the entirety of sbase and ubase with the source reconfigured to match our build tree layout of one program per directory, built using hhl.cprog.mk.

I have always had a lot of admiration for BSD systems and ran FreeBSD as my main OS on several machines for a number of years. Too much is actually made of the differences between a BSD userland and a GNU userland, as the vast majority of the time they are functionally equivalent. As mentioned previously, the GNU utilities have better localization. They also accept GNU long options. I have never found the lack of long options to be an issue; on the contrary, short options are faster to type and are generally the go-to choice for those familiar with the shell interface. Therefore I decided to start working on porting BSD userland to Linux with Glibc, using our HitchHiker build system. As there are a great many small utilities, this is a process that isn't going to happen overnight, but it has already yielded a surprising amount of success with modest effort.

There are a few projects already in place that do something similar, such as lobase, which approach the issues of missing functions and macros by creating a compatibility library and then linking the utilities to it. While this is a perfectly valid approach, it has the drawback of increasing code size somewhat. I have found so far that simply removing macros that don't exist on a Glibc system is often enough to get the code to compile with a simple gcc -Wall file.c. At other times one can translate from one function to another, for instance from strlcpy to strncpy, which is available with Glibc and does very nearly the same thing. The one difference to watch is that strncpy does not NUL-terminate the destination when the source gets truncated, so the terminator has to be written explicitly.

Without getting too far into the details, so far I have working copies of the utilities apply, banner, basename, cat, dirname, grep and tr taken from NetBSD. Grep was previously a separate package, which has now been removed. We already had the awesome pax utility imported from MirOS, and have now removed GNU tar in favor of a symlink to pax, which functions very well as a tar replacement. Additionally, the utilities true, false, and which have been replaced with one-line shell scripts ("which" is a zsh builtin, so we can call it from another shell with a one-line zsh script).

The result so far is a mixed userland that is predominately composed of GNU utilities with a sprinkling of BSD licensed utilities thrown in. The endgame is to replace the bulk of the GNU utilities with ports of the BSD versions, including many utilities such as apply, banner and pax that do not traditionally even exist on a Linux system. I may supplement this over time with utilities taken from sbase, ubase, or lobase as difficulties are encountered in porting the NetBSD utils, but would like to do as much straight porting against the actual system libraries as possible, as opposed to linking against compatibility libraries.

On to the build tree infrastructure improvements mentioned in the post title. A few posts back I mentioned standardizing the way that we handle packages that need extra steps beyond "make install". Similarly, I'm working on simplifying the use of build systems other than autotools. One feature of autotools that we leverage by default is that we can build outside of the source tree in a separate object directory. This is the default, but not all packages support it. There are even a few packages out there that mostly mimic autotools from an end user perspective, having a home-brewed configure script, but generally do not support building in an object directory. So now by simply setting the variable ${no_objdir} we can handle those cases in a unified way, and our build system knows, for instance, that ${objdir}/.dirstamp is not a dependency of the configuration stage.

Additionally, some packages eschew any kind of configuration step entirely and build just by running make. We now handle this in a more unified manner as well, rather than on a per-package basis, by setting ${use_configure} to false.

These changes are minor, but are going to go a long way towards implementing a ports tree with a more concise and understandable codebase.


Tags for this post:

NonGNU Porting