Compiling Rust for RISC-V with the rv32e ISA
One of the interesting things about Rust 1.78 is that it now ships with LLVM 18. I had a particular interest in that update, because LLVM 18 supports RISC-V's rv32e ISA.
The rv32e is not a very special instruction set (it is for me though ❤️). If you're not familiar, RISC-V has an interesting take on modular ISAs. The rv32e is one of the most basic ones (the "e" stands for embedded), focusing on a reduced chip area footprint. It halves the number of general-purpose registers (16 instead of the usual 32), and it has no floating-point instructions, amongst other simplifications. As of this writing, the rv32e instruction set is still not signed off, but there are already a few implementations in production.
Before Rust 1.78, if you wanted to compile your crate for rv32e, you'd have to build your own Rust compiler with a few patches. But now, given LLVM 18 and the "custom target" feature, that's not needed anymore.
I haven't written linker scripts in years, but let's give it a shot (using QEMU as our platform).
Creating a Custom Target#
Even though LLVM 18 shipped with Rust 1.78, we'll still need some unstable features to make this work (we'll have to compile the "core library" for our new target), so make sure you have a nightly toolchain installed and up to date. Also, make sure you have a linker available on your system (on Ubuntu, the build-essential package should help with that).
Now, let's check the RISC-V targets our Rust compiler offers out of the box:
rustc --print target-list | grep riscv
We can print a JSON description for each of those. Let's try riscv32i-unknown-none-elf: that's rv32i, the closest to the bare-metal rv32e we want. We'll save that specification into a file called riscv32i-unknown-none-elf.json.
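rustc +nightly -Z unstable-options --print target-spec-json --target riscv32i-unknown-none-elf | tee riscv32i-unknown-none-elf.json
{
  ...
}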
There are a few things we'll change in that file: features becomes +forced-atomics,+e, and is-builtin is set to false. Then we'll rename the file to riscv32e-unknown-none-elf.json.
The Rust Code#
For the Rust code, we'll start with a basic binary crate.
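We'll make some small, but impactful, changes to main.rs:
#![no_std]
#![no_main]

use core::panic::PanicInfo;

#[no_mangle]
pub extern "C" fn entry() -> ! {
    loop{}
}

#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    // Nothing to report to and nowhere to go: just spin forever.
    loop{}
}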
If you're not used to embedded/bare-metal Rust, this piece of code may look a bit weird. If you're not used to embedded at all, it's even weirder. But let's clarify why the structure of this program looks so different from ordinary Rust code you see around.
- #![no_std]: this is necessary for "bare-metal programming". It tells Rust not to link the standard library into our crate. The standard library requires support from an OS, which we don't have in this context; besides, it also adds runtime behaviour that we don't need or that wouldn't make sense in our case.
- #![no_main]: this tells Rust we won't use the conventional entry point function main. main expects runtime support, and again, we don't have that here. We want to start execution straight into our code!
- The entry function: note that the entry function uses the extern "C" notation. As Rust doesn't have a stable ABI at the moment, it's common to enforce the C ABI in cases like this. The name of this function is arbitrary: the compiler or linker won't treat the entry symbol as a special case. Later, we'll find a way to tell the linker, or the loader, that this is the entry point of our code. Its implementation is trivial, just an infinite loop.
- #[panic_handler]: this attribute lets us specify how panics will be handled by the system. As we don't have std and no operating system is providing us default/error output facilities, Rust enforces the use of this handler. Our implementation in this example is just an infinite loop.
Building#
If we try to build the above code:
cargo build --target riscv32e-unknown-none-elf.json
The reason is that we don't have the core crate available for our custom target, and that makes sense: it's a custom target after all. But there's good news too: we can compile our own core crate for our custom target, and that's pretty simple to do. Let's use the unstable option build-std, as suggested by the error message.
cargo build -Zbuild-std --target riscv32e-unknown-none-elf.json
As if the borrow checker was not enough... But if we look at the error message again, what it's basically suggesting is that we install the standard library source code alongside our current Rust environment, so that's what we're missing for build-std.
You may find it kind of awkward that we're installing the source code for an "x86_64-unknown-linux-gnu" toolchain, but rustc is in fact a cross-compiler by default; we're just tinkering with the final product generated by the compiler by passing some custom options in the JSON file we've created.
Enough with the talk.
rustup component add rust-src --toolchain nightly-x86_64-unknown-linux-gnu
And after trying to compile again, more errors...
   
If you look at what's going on, though, we're compiling std, which isn't supported on our custom target. We're even using #![no_std] in our code, so that doesn't make sense. The thing is, we can make the build-std option more restrictive and compile only the core library.
cargo build -Zbuild-std=core --target riscv32e-unknown-none-elf.json
Oof, this is a long error; I've even stripped out some parts to make it a bit clearer. Initially I thought this was some sort of bug in rustc, as it seemed that the linker was trying to put together a binary using a stdlib that was not compatible with our rv32e ELF. After checking the compiler's code (using the diff of a previous patch supporting rv32e), it became clear it was just a memory layout issue. Long story short: we need to tell LLVM that the stack is aligned to 32 bits (instead of the original 128 for rv32i). The reason is that we use the "ilp32e" ABI, and we need to tell LLVM that too. So we'll change our target JSON once again:
...
"data-layout": "e-m:e-p:32:32-i64:64-n32-S32",
...
"llvm-abiname": "ilp32e"
Note that the llvm-abiname attribute wasn't there before. If you want more (boring) details about this ABI, a quick internet search will get you the official RISC-V documents. Now let's run the compilation again:
Awesome, that's looking much better. Let's check our output file. To do that, we'll use LLVM's binutils.
That doesn't look good: it's like our file has no text section (or an empty one). ELF files are supposed to contain sections, which basically hold data or text (code to be executed), plus information on how to load them into memory. They also tell where execution should begin. There's nothing like that in our output. We'll get that fixed.
Linker Script#
Linkers tend to be very aggressive when eliminating dead code. As the linker doesn't know the starting point of our code, it assumes none of the functions we've implemented are reachable. We'll write a linker script to tell the linker where to start. So in our crate's root directory we'll create a link.ld file with the following content:
ENTRY(entry)
SECTIONS {
  . = 0x80000000;
  .text : { *(.text); *(.text.*) }
}
This is quite a minimal linker script. It tells the linker to use our entry function as the starting point. It also creates a .text section in the resulting ELF, places all of our program's executable code in there, and signals that the section should be loaded at memory address 0x80000000.
There are a few ways to tell cargo to pass that script as an input to the linker. The tidy one would probably use a build.rs script. But let's avoid that complexity for now, using a workaround that keeps things in a single command line. We'll override the build.rustflags Cargo configuration, which passes extra arguments to rustc on every compiler invocation.
More specifically, we'll pass -Clink-arg=-Tlink.ld, which will, in turn, pass -Tlink.ld to the linker. Note that this is very platform dependent, as Rust may use different flavors of linkers.
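cargo build -Zbuild-std=core --target riscv32e-unknown-none-elf.json --config 'build.rustflags=["-Clink-arg=-Tlink.ld"]'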
That seemed successful. Checking now if that solves our problem with the final ELF:
That's looking much better now. entry is there, at the right address. You can see the j instruction jumping to its own address in an infinite loop. Probably not the smartest piece of code you've ever written, though.
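As an aside, the tidier build.rs route mentioned earlier could look roughly like the sketch below. link_script_directives is a hypothetical helper, and it assumes link.ld sits in the crate root, as it does in this post. The two cargo: lines are ordinary build-script output: one asks Cargo to re-run the script when the linker script changes, the other forwards -Tlink.ld to the linker.

```rust
/// Returns the lines a build script would print so that Cargo hands
/// `script` to the linker, mirroring `-Clink-arg=-T<script>`.
/// (Hypothetical helper, for illustration only.)
fn link_script_directives(script: &str) -> Vec<String> {
    vec![
        // Re-run the build script whenever the linker script changes.
        format!("cargo:rerun-if-changed={script}"),
        // Forward `-T<script>` to the linker.
        format!("cargo:rustc-link-arg=-T{script}"),
    ]
}

// In build.rs, `main` would simply print each line, for example:
//     for line in link_script_directives("link.ld") { println!("{line}"); }
```

That would let a plain cargo build pick up the linker script without the extra rustflags override, at the cost of one more file in the crate.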
Running on QEMU#
Let's give it a shot with QEMU now. We'll install qemu-system and gdb-multiarch.
In order to run our binary on QEMU, we do:
Some quick explanations of what's happening in the above.
- -s tells QEMU to listen as a GDB server on TCP localhost:1234.
- -S tells QEMU to wait for commands from the debugger before the CPU starts (this way we have total control from GDB).
- -cpu rv32 should be quite obvious. Note though that we don't specify a subset here (in our case "e").
- -machine virt is the most generic machine from QEMU.
- -bios none means no BIOS is used for booting, so we can't go more bare-metal than this. There's no initial code setting up modes, basic IO or whatever; we're all by ourselves here.
- -nographic tells QEMU not to use its graphical interface.
- Now my favourite: -device loader. It's the first time I've used this device on QEMU (even though I used a lot of QEMU in the past, mainly for ARM), and I love it. It's able to load a few different formats into memory. But what I like the most is that it loads ELF binaries, which is just great in scenarios where you're exploring an architecture and you can just focus on your linker script. So all we need to do is specify the output ELF from our compilation and it's done! Well, almost...
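Putting those flags together, here's a rough sketch of the invocation, written as a small Rust helper (the kind of thing you might drop into a runner or xtask-style tool). The function names are hypothetical, the emulator binary name qemu-system-riscv32 is an assumption about your installation, and the ELF path is left to the caller because it depends on your crate name and build profile.

```rust
use std::process::Command;

/// Collects the QEMU flags discussed above. `elf` is the path to the ELF
/// produced by the build (left to the caller on purpose).
fn qemu_args(elf: &str) -> Vec<String> {
    let mut args: Vec<String> = [
        "-s", // GDB server on TCP localhost:1234
        "-S", // hold the CPU until the debugger takes over
        "-cpu", "rv32",
        "-machine", "virt",
        "-bios", "none",
        "-nographic",
        "-device",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();
    // The loader device takes the file to load as a property.
    args.push(format!("loader,file={elf}"));
    args
}

/// Builds, but does not spawn, the QEMU command.
/// Assumption: the emulator binary is called `qemu-system-riscv32`.
fn qemu_command(elf: &str) -> Command {
    let mut cmd = Command::new("qemu-system-riscv32");
    cmd.args(qemu_args(elf));
    cmd
}
```

Calling qemu_command(path).status() would then start the emulator and block until it exits.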
As soon as you kick off that command, the emulated CPU will freeze, and we need GDB now to get things really running.
(gdb) target remote localhost:1234
There we are, the beginning of everything: our rv32e core has the address 0x1000 loaded into its pc register. Even though we specified "no BIOS" for QEMU, there are a few instructions executed before it jumps to the actual address of entry.
(gdb) disassemble 0x1000,0x101e
Dump of assembler code from 0x1000 to 0x1024:
=> 0x00001000:  auipc   t0,0x0
   0x00001004:  addi    a2,t0,40 # 0x1028
   0x00001008:  .insn   4, 0xf1402573
   0x0000100c:  lw      a1,32(t0)
   0x00001010:  lw      t0,24(t0)
   0x00001014:  jr      t0
   0x00001018:  .insn   2, 0x0000
   0x0000101a:  .insn   2, 0x8000
   0x0000101c:  .insn   2, 0x0000
End of assembler dump.
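One line in that dump is left as a raw word: .insn 4, 0xf1402573. Splitting it into the standard RISC-V I-type fields by hand suggests it's a CSR read, csrr a0, mhartid, that is, the hart ID being placed in a0 before the jump. Here's a minimal sketch of that decoding; the struct and function names are made up for illustration.

```rust
/// Fields of a RISC-V I-type instruction word, as used by CSR instructions.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct CsrInsn {
    opcode: u32, // bits 6..0
    rd: u32,     // bits 11..7, destination register number
    funct3: u32, // bits 14..12, selects the CSR operation
    rs1: u32,    // bits 19..15, source register number
    csr: u32,    // bits 31..20, CSR address
}

/// Splits a 32-bit instruction word into its I-type fields.
fn decode_csr_insn(word: u32) -> CsrInsn {
    CsrInsn {
        opcode: word & 0x7f,
        rd: (word >> 7) & 0x1f,
        funct3: (word >> 12) & 0x7,
        rs1: (word >> 15) & 0x1f,
        csr: word >> 20,
    }
}
```

For 0xf1402573 this gives opcode 0x73 (SYSTEM), rd 10 (a0), funct3 2 (CSRRS), rs1 0 (x0) and csr 0xf14 (mhartid); a CSRRS with x0 as source is what assemblers write as csrr.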
If we keep stepping over these initial instructions (si command), voilà... We jump into our entry function.
...
(gdb) si
riscv32e_hello_world::entry () at src/main.rs:8
8           loop{}
(gdb)
Does it count as a "hello world"? Probably not, but I believe this is a great start. There are a few caveats here if we want to do anything more complex than this. For example, we don't initialize the stack pointer, which means that if we do anything that produces a store to the stack, we're doomed (we'd be writing to a, very likely, invalid memory address). Also, we created a very minimalist linker script, which basically strips out any data sections from our binary, meaning that if any static data is needed, we'd also get into trouble.
Putting together a more complete bare-metal program doesn't fit well with this post's format. Hopefully, in an upcoming post, we can use everything done here as scaffolding and dig into more details on getting a proper Rust runtime running.