Linux and Tiny C Compiler in the browser, part one
Introduction
Current C compilers running in the browser are experimental, though Clang In Browser is pretty impressive. Instead of porting a compiler to WASM, I'm going to take a different approach and use my favourite method for a lot of things: virtual machines. It's slower, especially since I'm using a CPU emulator written in JavaScript, but decent performance is possible with a fast compiler like Tiny C Compiler and a custom Linux.
Demo
Try `cat /opt/test.c` and `tcc -run /opt/test.c`.
Motivation
Back in the day I could sit for hours tweaking the Linux kernel on my Pentium-something in an attempt to make the system boot faster. Most of the time I just broke things and had to recompile Gentoo. Today there's rarely a need to compile Linux; if you need something bare-bones, you probably use Docker with Alpine Linux. Compiling Linux is still useful in the embedded space, and with a C compiler in the mix you get to learn the basics of how programs work.
In the meantime, unikernels such as MirageOS and Unikraft have surfaced as a supplement or even an alternative to Docker. One of the ways they differ is that your code is compiled into an operating system instead of running on top of Linux. Imagine you could compile Linux into your code, with dead code elimination on every feature you don't use! The sales pitch is this: reduced attack surface, fast boot times and better performance. Building a custom Linux then becomes even more exciting because unikernels borrow many concepts from Linux. For example, Unikraft is configured in the same TUI as the Linux kernel (and Buildroot), make and gcc are used to a great extent, and you can choose between multiple libc implementations. But what exactly is a libc?
So...
What to expect
This tutorial teaches you how to compile a small Linux image that runs in the browser via v86, a 32-bit x86 CPU emulator written in JavaScript. You get insights into cross compilation with a modern implementation of the C standard library, and into C internals when we add a fast compiler to the image. Remote debugging via gdb is described at the end, using gdbserver, virtual serial ports and qemu.
Prerequisites
A Linux installation that is not WSL, at least an hour to spare for compilation, and the following packages needed by Buildroot:
I've built this on Ubuntu 20.04 and 22.04 using bash, but most modern distros should be fine.
Before you start, create a directory for the project, perhaps ~/my-v86-linux, then cd into that and run all commands from there. All commands will assume you are in that directory. Name it whatever you like, and it doesn't have to be in ~/.
The v86 CPU emulator
v86 runs in the browser and emulates an x86-compatible CPU and its surrounding hardware. Machine code is translated to WebAssembly modules at runtime. The list of emulated hardware is impressive:
- x86 instruction set similar to Pentium III
- Keyboard and mouse support
- VGA card
- IDE disk controller
- Network card
- virtio filesystem
- Sound card
View the full list of emulated hardware in v86's readme.
You're not limited to Linux on this emulator. It runs Windows (1.01, 3.1, 95, 98, 2000), ReactOS, FreeBSD, OpenBSD and various hobby operating systems.
v86 is a hobby project written by an anonymous developer under the pseudonym "copy". Previous work according to copy's webpage includes an impossible game, Game of Life and a brainfuck interpreter written in javascript.
Buildroot
Buildroot is a tool for generating embedded Linux systems through cross compilation. It's a huge collection of cross compilation scripts and configuration files put together behind a nice terminal UI, and you can tweak just about anything. It also acts as a customizable toolchain that provides us with all the tools necessary to cross compile applications that don't come as Buildroot packages. Read more on https://buildroot.org.
Let's get started.
cd into your project directory, then download and extract Buildroot:
Hint: Tab through commands and copy instead of using your mouse.
Instead of building Linux from the default Buildroot configuration, we use a template that sets the right cpu and architecture among other things:
Remove commands that compress licenses. We'll get to that later.
Tell Buildroot to create a new .config file with preloaded settings from the template
You're almost ready to build the initial image. Execute:
Go to Toolchain -> C library and pick musl, exit and save. Then build everything.
This is going to take a while, but the good thing is that caching is enabled, so next time will be substantially faster.
About musl: it's an implementation of the C standard library, like uClibc and glibc. Your distro is probably using glibc, the GNU C Library, which is large and not well suited for embedded Linux where size matters. uClibc is better suited here, and so is musl, which seems to be the clear winner in this (biased) comparison. I prefer musl's MIT license over (L)GPL, which makes it interesting for proprietary applications running in unikernels. It's developed and maintained by Rich Felker, with a long list of contributors, and in this podcast (at 01:01:17) the source code is recommended as reference code to study for systems programming.
Preparing the website
While waiting for Buildroot to compile, let's create the website that will host the emulator and run Buildroot Linux:
When Buildroot is done compiling, run
Then open a new terminal and start a simple web server pointing to the web directory, e.g.
and open http://localhost:8000 to see v86 in action. Log in as root, no password needed.
Customize your image
Buildroot is all about customization. Try the following commands:
There's a lot to explore.
menuconfig
menuconfig is where you configure Buildroot itself: the Linux kernel version, which bootloader to use (grub2, syslinux etc.), which libc to use (this is where you chose musl) and which architecture to compile for. There are many packages to choose from, ranging from small libraries and utilities to X11 and Qt.
busybox-menuconfig
Busybox combines hundreds of Linux utilities into one binary and is also highly configurable with busybox-menuconfig. It provides you with ls, grep, diff and many other utilities you're used to on Linux, and I'd encourage you to remove all the tools you don't use to create a smaller image. Ideally Busybox would come with the bare minimum instead of having to manually remove unnecessary things. This is where unikernels shine, because they take the opposite approach, where you start with almost nothing and add what you need.
linux-menuconfig
linux-menuconfig is where you configure the Linux kernel. There's a million things to configure, and you can easily break something unless you know what you're doing. In one of the following tutorials for this series, I'll show you how to tweak the kernel by trial and error, since that's how I do it: Remove one feature, test the system, rinse and repeat.
Resist the temptation to make changes for now.
rootfs_overlay
Located in buildroot-v86/board/v86/rootfs_overlay, this is where you place files that you want to add to the image. Our template includes two files: etc/fstab and etc/inittab.
Disable kernel messages after login
Some things are not critical for booting the system but are run as part of the boot process anyway. They can be slow to start and clutter the terminal after login, potentially printing log messages in the middle of a command you are typing. To disable kernel messages after login, create the following file
All .sh files in etc/profile.d are run on login.
Auto login
etc/inittab prepares the file system, mounts what is listed in etc/fstab, runs init scripts and "spawns" applications after boot. One of the spawn commands ends with the comment "# GENERIC_SERIAL", and that line needs to be changed so that it doesn't prompt for login and just starts /bin/sh.
Notice that the command starts with console::respawn. Respawn means that whenever sh exits, whether it crashed or you typed exit, Busybox starts it again.
getty is replaced here because it's the application that prompts for login. It also prevents us from sending messages between ttys, which only makes sense in a multi-user system: If user A is logged into tty1 and user B is in tty2, then A shouldn't be able to bother B with `echo "Hi B!" >/dev/tty2`. Instead we spawn -/bin/sh, where the hyphen instructs Busybox to treat the shell as a login shell. Without it, /etc/profile and scripts in /etc/profile.d are ignored.
To add the new files to your image, you simply compile again
Add Tiny C Compiler
Tiny C Compiler, or tcc, is:
- ANSI C compliant, with most C99 extensions.
- Small, roughly 300 KB.
- Fast according to the homepage, specifically 9 times faster than gcc.
I've used tcc to compile win32 applications with OpenGL and GDI+, and a PDF library that we'll use later to benchmark performance. There are limitations to what can be compiled (I haven't managed to compile libpng, for instance), but you can use gcc to provide a shared library that tcc can link with.
The compiler is written by Fabrice Bellard, author of qemu, ffmpeg, quickjs, jslinux and the list goes on. You've likely used his software in one way or another. I will use the last version he released before abandoning tcc, but it's alive and well in this fork.
To get tcc working we have to compile it twice. The first pass exists only to produce libtcc1.a. According to the Makefile, gcc is used to compile tcc, and then tcc builds and outputs libtcc1.a. If we start by compiling against musl, the resulting tcc is not going to run on the host, and thus libtcc1.a cannot be built. So the first step is to configure the build with --enable-cross, which builds a cross compiler that produces the right libtcc1.a. After that, we can compile for a single architecture and libc: x86 musl.
Configure the tcc cross compilers for the current CPU architecture to get the i386 version of libtcc1.a
Malloc hooks were removed in glibc 2.34, and Ubuntu 22.04 ships with glibc 2.35. The next two commands are unnecessary on Ubuntu 20.04, but harmless.
Then build libtcc on the host and copy to the file system overlay.
Next step is to configure and build the compiler for x86 musl.
--assume-old makes make skip libtcc1.a. We also skip the steps that require makeinfo: documentation ends up in the directory "output-unused", which is specified in a slightly hacky way with --sharedir=-unused. DESTDIR is set when installing because configuring with --prefix=./output would compile tcc with search paths beginning with that prefix.
--elfinterp points to the dynamic linker in the image, which is responsible for locating the shared libraries an application needs, preparing the application to run and then executing it. Because we use musl, this file is called ld-musl-i386.so.1, but on your glibc-based distro it's (likely) ld-linux-x86-64.so.2. Without it, the system won't know how to start applications and you'll get `/bin/sh: {your command}: not found`
For tcc to create executables, it needs startup routines that are linked into the executable. Those files start with crt, short for C runtime, and we have configured tcc to search for them in /lib. Since tcc supports running C without creating an executable via `tcc -run file.c`, you only need these files if you want to build executables (and if you plan on continuing this tutorial). Here's a quick summary of crt files from https://dev.gentoo.org/~vapier/crt.txt:
- crt1.o: Contains the _start symbol which sets up the env with argc/argv/libc _init/libc _fini before jumping to the libc main.
- crti.o: Defines the function prolog; _init in the .init section and _fini in the .fini section.
- crtn.o: Defines the function epilog.
That is what's needed for running tcc in v86, but it doesn't do much without musl's standard C headers. We pick only the bare minimum, because the full set of headers is ~5 MB uncompressed.
Hello world
With tcc compiled and installed into our image, it's time to prepare some code to test if the compiler works.
Rebuild image with the new files:
If you've closed your server, open a new terminal and run
Go to http://localhost:8000 and try this in the emulator:
Benchmarking
Time for a quick benchmark to see what performance we can expect. We'll use the excellent pdf writer library, libharu.