static-keys
Static key is a mechanism used by Linux kernel to speed up checks of seldomly changed features. We brought it to Rust userland applications for Linux, macOS and Windows! (Currently nightly Rust required. For reasons, see FAQ).
Currently CI-tested platforms:
-
Linux
x86_64-unknown-linux-gnu
x86_64-unknown-linux-musl
i686-unknown-linux-gnu
aarch64-unknown-linux-gnu
riscv64gc-unknown-linux-gnu
loongarch64-unknown-linux-gnu
-
macOS
aarch64-apple-darwin
-
Windows
x86_64-pc-windows-msvc
i686-pc-windows-msvc
-
Bare metal (No CI)
- Should work with above-mentioned architectures. For more detail, see FAQ.
Note that when using cross-rs to build loongarch64-unknown-linux-gnu
target, you should use latest cross-rs avaiable on GitHub. See Evian-Zhang/static-keys#4 for more details.
For more comprehensive explanations and FAQs, you can refer to GitHub Pages (中文版文档).
Motivation
It's a common practice for modern applications to be configurable, either by CLI options or config files. Those values controlled by configuration flags are usually not changed after application initialization, and are frequently accessed during the whole application lifetime.
let flag = CommandlineArgs::parse();
loop {
if flag {
do_something();
}
do_common_routines();
}
Although flag
will not be modified after application initialization, the if
-check still happens in each round, and in x86-64, it may be compiled to the following test
-jnz
instructions.
test eax, eax ; Check whether eax register is 0
jnz do_something ; If not zero, jump to do_something
do_common_routines:
; Do common routines
ret
do_something:
; Do something
jmp do_common_routines ; Jump to do_common_routines
Although the if
-check is just test
-jnz
instructions, it can still be speedup. What about making the check just a jmp
(skip over the do_something
branch) or nop
(always do_something
)? This is what static keys do. To put it simply, we modify the instruction at runtime. After getting the flag
passed from commandline, we dynamically modify the if flag {}
check to be a jmp
or nop
according to the flag
value.
For example, if user-specified flag
is false
, the assembled instructions will be dynamically modified to the following nop
instruction.
nop DWORD PTR [rax+rax*1+0x0]
do_common_routines:
; Do common routines
ret
do_something:
; Do something
jmp do_common_routines
If flag
is true
, then we will dynamically modify the instruction to an unconditional jump instruction:
jmp do_something
do_common_routines:
; Do common routines
ret
do_something:
; Do something
jmp do_common_routines
There is no more test
and conditional jumps, just a nop
(which means this instruction does nothing) or jmp
.
Although replacing a test
-jnz
pair to nop
may be minor improvement, however, as documented in linux kernel, if the configuration check involves global variables, this replacement can decrease memory cache pressure. And in server applications, such configuration may be shared between multiple threads in Arc
s, which has much more overhead than just nop
or jmp
.
Usage
To use this crate, currently nightly Rust is required. And in the crate root top, you should declare usage of unstable feature asm_goto
.
#![allow(unused)] #![feature(asm_goto)] fn main() { }
First, add this crate to your Cargo.toml
:
[dependencies]
static-keys = "0.6"
At the beginning of main
function, you should invoke static_keys::global_init
to initialize.
fn main() { static_keys::global_init(); // Do other things... }
Then you should define a static key to hold the value affected by user-controlled flag, and enable or disable it according to the user passed flag.
#![allow(unused)] fn main() { use static_keys::define_static_key_false; struct CommandlineArgs {} impl CommandlineArgs { fn parse() -> bool { true } } // FLAG_STATIC_KEY is defined with initial value `false` define_static_key_false!(FLAG_STATIC_KEY); fn application_initialize() { let flag = CommandlineArgs::parse(); if flag { unsafe { FLAG_STATIC_KEY.enable(); } } } }
Note that you can enable or disable the static key any number of times at any time. And more importantly, it is very dangerous if you modify a static key in a multi-threads environment. Always spawn threads after you complete the modification of such static keys. And to make it more clear, it is absolutely safe to use this static key in multi-threads environment as below. The modification of static keys may be less efficient, while since the static keys are used to seldomly changed features, the modifications rarely take place, so the inefficiency does not matter. See FAQ for more explanation.
After the definition, you can use this static key at if
-check as usual (you can see here and here to know more about the likely
-unlikely
API semantics). A static key can be used at multiple if
-checks. If the static key is modified, all locations using this static key will be modified to jmp
or nop
accordingly.
#![allow(unused)] fn main() { #![feature(asm_goto)] use static_keys::{define_static_key_false, static_branch_unlikely}; struct CommandlineArgs {} impl CommandlineArgs { fn parse() -> bool { true } } fn do_something() {} fn do_common_routines() {} define_static_key_false!(FLAG_STATIC_KEY); fn run() { loop { if static_branch_unlikely!(FLAG_STATIC_KEY) { do_something(); } do_common_routines(); } } }
References
- The Linux kernel official documentation : static-keys
- Linux
static_key
internals - Rust for Linux also has an implementation for static keys, please refer to Rust-for-Linux/linux#1084 for more information. My implementaion's
break
-based inline assembly layout is inspired by this great work.
Internal Design
As described in the introduction, the workflow of static keys is:
- Globally initialize related structures.
- Define a static key.
- Modify the static key according to user-specified values.
- Use the static key at the
if
-check.
In this section, we will use three terms to make things more clear:
-
Static key
A static variable storing information to control the enabling of static branches.
-
Jump entry
A static variable storing information about static branches. This is used to locate the static branch.
-
Static branch
A
if
-check which utilizes static keys.
Simplified Logics
In simplified logics, we can think of the static key and jump entry as the following structures:
struct StaticKey {
enabled: bool,
jump_entries: Vec<JumpEntry>,
}
struct JumpEntry {
code: &'static Instruction,
}
When we modify a static key, the following things happen:
- The static key's
enable
field is modified - We find all jump entries associated with this static key by
jump_entries
field - For each jump entry, we locate the static branch by
static_branch
field - We modify the static branch to
nop
/jmp
according to theenable
value
And when we use a static key at if
-check, we just push another jump entry into the static key, which records the location of current if
-check.
After understanding the simplified logics, we can then add some more comprehensive supplements to the simplified logic:
- Jump entry location.
- Static branch modification content.
- Static branch modification rule.
- Static branch modification approach.
Jump Entry Location
As described in Usage, we can use one static key at multiple if
checks. As a result, one static key may be associated with multiple jump entries. However, we cannot construct a compile-time vector in ad-hoc: we cannot define a static vector, and push to this vector at compile time across the crate. As a result, we must store the jump entries in the generated binary, and construct the static key's jump entries at run time to collect associated jump entries.
In practice, we store these jump entries in an individual section in generated binary. The name of such section differs on each OS. For example, in Linux ELF, this section is named __static_keys
.
Then at runtime, when initializing, we will collect jump entries to each static key. However, as the jump entries have already in a section loaded into memory, we don't want to double the memory usage to push those jump entries content into the static key's vector.
The solution is to make the jump_entries
field of StaticKey
a pointer instead of vector. It can just point to the jump entries in the individual section, thus decrease the memory usage. To do so, we then must sort the jump entries in such section to make sure jump entries associated with same static key are adjacent to each other, and then the jump_entries
field can point to the first jump entry which associated with the static key.
To make the sort work, we should add another field to the JumpEntry
: the address of static key. Then the sort can conduct according to static key address. Note that in implementaion, such addresses are all relative due to ASLR.
As a result, now the structure should be written as
struct StaticKey {
enabled: bool,
jump_entries: *const JumpEntry,
}
struct JumpEntry {
code: &'static Instruction,
/// Relative address to static key
key: usize,
}
The jump_entries
field is null
at beginning, and when initializing, this field is updated to point to the first jump entry which asscoiated with this static key.
Static Branch Modification Content
When modifying a static branch, we may modify nop
to jmp
, or jmp
to nop
. In most architectures, the nop
instruction can be of many byte length. For example, in x86-64, since the jmp
is usually 5-byte long, we select a 5-byte nop
to do the replacement. This can be used to make sure that we do not mess up with the following instruction.
However, when modifying nop
to jmp
, which target should be jumped to? This cannot be deduced trivially. As a result, we need another field in JumpEntry
to record the address of jump target:
#![allow(unused)] fn main() { struct JumpEntry { code: &'static Instruction, /// Relative address to jump target target: usize, key: usize, } }
To construct such jump entry at static branch, we use the following inline assembly instruction (take x86-64 for example). When using static_branch_likely!
and static_branch_unlikely!
, the following code snippet will be generated (details may be different).
'my_label {
::core::arch::asm!(
r#"
2:
.byte 0x0f,0x1f,0x44,0x00,0x00
.pushsection __static_keys, "awR"
.balign 8
.quad 2b - .
.quad {0} - .
.quad {1} + {2} - .
.popsection
"#
label {
break 'my_label false;
},
sym MY_STATIC_KEY,
const true as usize,
);
break 'my_label true;
}
It seems complicated, and let me break it down to explain.
Assembly part
The first line 2:
indicate an assembly label, which used to mark the location of current instruction: 0x0f, 0x1f, 0x44, 0x00, 0x00
, which is a 5-byte nop
instruction.
Then we use .pushsection
and .popsection
pair to switch to another section (current section is .text
, which is used to record instructions), which is used to store the jump entries.
Inside the new section, we use three .quad
to define three 8-byte values, which corresponds to three fields of JumpEntry
struct. The first 8-byte value is 2b - .
, where 2b
indicates the nearest label with name 2
, which is the address of nop
instruction we just defined. The .
represents current location, which is the address of this 8-byte value. Using 2b - .
to indicate a relative address to the nop
instruction, which is the code
field of JumpEntry
.
The second 8-byte value is {0} - .
, where {0}
indicates the first argument in this inline assembly, which is label { break 'my_label false; }
. This is the target address of jmp
instruction, which corresponds to the target
field of JumpEntry
. This will be explained later.
The third 8-byte value is {1} + {2} - .
, which stores information about static key and initial status of static branch (note that static key is always 8-byte aligned, so the last byte of its address is always 0x00
, which allows us to use this to record additional information). The initial status will be explained later as well.
By conducting this inline assembly, a jump entry will be generated in "__static_keys" section at compile time.
Jump label part
Since the inline assembly does not affect the control flow, let's simply the code snippet to see more clear about jump label part:
'my_label {
// Some inline-assembly
break 'my_label true;
}
This will be treated as a true
value expression by Rust compiler. Since we use the static_branch_likely!
and static_branch_unlikely!
in the if
-check, the if
-check then become
if true {
do_a();
} else {
do_b();
}
do_c();
As a result, the Rust compiler will optimize the instruction to be
nop ; 0x0f,0x1f,0x44,0x00,0x00
call do_a ; do_a()
However, do_b()
will not be optimized out: there is an argument in the inline assembly which reference it --- the label { break 'my_label false; }
. As described above, this argument is used as an address to the statement break 'my_label false;
. When fitting this statement into the if
-check, it become a false
condition. As a result, this statement is translated to a call to do_b()
, which this call is never executed in the static control flow. To make it more clear, let's see what is the generated assembly look like:
nop ; 0x0f,0x1f,0x44,0x00,0x00
call do_a ; do_a()
DO_C:
call do_c ; do_c()
ret ; End of this function
DO_B:
call do_b ; do_b()
jmp DO_C ; goto DO_C
The basic block at DO_B
will never be executed in the static control flow, while we do pass its address to a jump entry stored in the individual section.
When we modify the static branch to a jmp
, the assembly become:
jmp DO_B ; Modified!
call do_a ; do_a()
DO_C:
call do_c ; do_c()
ret ; End of this function
DO_B:
call do_b ; do_b()
jmp DO_C ; goto DO_C
Things go right!
Static Branch Modification Rule
Branch layout
As described in the static branch modification content, there are two branches to be executed: one can be executed after nop
, and is adjacent to the main part; another shall be executed with two additional jmp
, and its location is in the end of function. This difference will make a little impact on the performance. Usually, the branch that unlikely to be executed should be the latter one, and the other should be the former one. In this crate, the layout is controlled by static_branch_likely!
and static_branch_unlikely!
.
When using static_branch_likely!
, the true
branch will become a likely branch, which will be positioned near the main part, and can be just nop
ed to it. The false
branch is positioned in some other places, and is involved with two additional jmp
s.
In the inline assembly, the difference is represented by break 'my_label true
or break 'my_label false
in the end of block.
Initial instruction
After getting the right branch layout, then which instruction should be the initial instruction generated into the binary? It is used for the situation where, we do not update the static key, then its associated static branches need to take the correct path.
The rule is:
-
For
static_branch_likely!
- If static key is defined with initial value
true
, then generatenop
. - If static key is defined with initial value
false
, then generatejmp
.
- If static key is defined with initial value
-
For
static_branch_unlikely!
- If static key is defined with initial value
false
, then generatenop
. - If static key is defined with initial value
true
, then generatejmp
.
- If static key is defined with initial value
Modification direction
Another question is, when enabling/disabling a static key, what instruction should we update to? Should we update jmp
to nop
, or update nop
to jmp
? To solve this question, we shall use the initial status recorded in the last byte of static key address in key
field of JumpEntry
.
The initial status is a bool, which indicates whether the likely branch is true
branch. As described above, the likely branch should always be adjacent to the main part. And this status is controlled by whether we use static_branch_likely!
or static_branch_unlikely!
, and the initial value of static key.
Then when modifying static branches, the modification direction can be determined by xor
ing the new value of static key, and the initial status recorded in jump entry. For example, if the likely branch is true
branch, and the new value of static key is true
, then we shall update jmp
to nop
, since we need to execute the block adjacent to the static branch check.
Static Branch Modification Approach
The last question need to be solved, is how to modify static branch.
As a ground knowledge, the instructions are in text section. In most platforms, the text section has executable protection, and is non-writable to avoid attackers to modify instructions to execute malicious logic. This kind of protection mechanism is called DEP (Data Execution Protection) or W^X (Writable Xor eXecutable).
In order to modify static branch instructions, we then need to bypass the DEP in a short moment. This may be dangerous and vulnerable, while the DEP bypassing only happens in the modification of static key. After modification done, the protection is restored. So pay attention to the modification!
FAQs
Why static key should be applied to seldomly changed features?
Two reasons:
- The modification of static key will impose a potential security risk since we need to bypass DEP. The DEP will be restored after modification done.
- The modification is less efficient since many syscalls are involved.
Why static keys must only be modified in a single-thread environment?
In userland, it is very complicated to modify an instruction which may be executed by another thread. Linux kernel community once proposed a text_poke
syscall, but is still not available nowadays. BTW, Linus doesn't seem to like it, and his reasons do make sense.
Another reason is that we need to manipulate memory protection to bypass DEP, which may involves race condition on the protection itself in multi-thread environment. Mutex may be used to avoid data race, while if cargo resolves multi-version static-key crates dependencies, the mutexes would be duplicated for each version, and this approach is thus useless. This shall be resolved when RFC 1977: public & private dependencies is stabilized. rust-lang/cargo#2363 is also a reference.
Why is nightly Rust required?
We use inline assembly internally, where asm_goto
and asm_const
feature is required. As long as these two features are stablized, we can use stable Rust then.
Why static_branch_likely!
and static_branch_unlikely!
are implemented as macros?
Because when passing a static variable to inline assembly as sym
argument, it requires the argument to be a static path. We cannot construct such path in a function.
What OS-specific features are required to extend to new OSs?
-
Symbols indicating a custom section's start and stop
Used for sorting static keys section and iterating stop signs.
-
Approaches to keep custom section retained from linker's GC
-
Approaches to bypass DEP
Used to update static branch instructions.
What arch-specific features are required to extend to new architectures?
- A
nop
instruction with same length asjmp
(or can divide the length, e.g. 2-bytenop
and 4-bytejmp
) - Approaches to clear instruction cache in Linux. (Should be added to Evian-Zhang/clear-cache)
- Inline assembly supported by Rust
Can I use this crate in no_std
?
Yes.
How can I use this crate in bare metal?
You should modify your linker script to add __start
and __stop
prefix symbols for marking start and end address of corresponding sections. For more details, see Evian-Zhang/static-keys#6.