Avoid infinite looping when using interceptors

## Overview
BorrowSanitizer's runtime library has two pieces: [a Rust piece](https://github.com/BorrowSanitizer/rust/tree/bsan/src/tools/bsan/bsanrt) and [an LLVM piece](https://github.com/BorrowSanitizer/llvm-project/tree/d7e940021eb632c76473c10632e0193f1a045c74/compiler-rt/lib/bsan). The LLVM piece is where we can implement interceptors, which are functions that "intercept" calls to a libc function such as `malloc` and replace it with our custom functionality. Here's how to declare an interceptor for `malloc` that does nothing aside from returning the result of the regular `malloc`:
```
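// Declare both REAL(malloc) and the interceptor symbol.
DECLARE_REAL_AND_INTERCEPTOR(void *, malloc, uptr)

// The interceptor body: forward straight to the original malloc.
INTERCEPTOR(void *, malloc, SIZE_T size) {
  return REAL(malloc)(size);
}

// Activate the interceptor; invoke this once, e.g. while the runtime initializes.
INTERCEPT_FUNCTION(malloc)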
```
The macro `DECLARE_REAL_AND_INTERCEPTOR` declares symbols for both the actual `malloc` and the interceptor. When we need to call the actual implementation of `malloc`, we go through the `REAL` macro. The interceptor is not active until we invoke the `INTERCEPT_FUNCTION` macro; after that, it stays in place until the program exits.

## The Problem
Interceptors are useful because they are often more efficient than manually instrumenting calls to libc functions. They also work for uninstrumented programs: every library that uses our runtime, whether statically or dynamically linked, will now call our interceptor instead of `malloc`. This is wonderful and terrible. Our Rust library also needs to allocate memory, and we want to call into it from *inside* our interceptor, so our interceptor for `malloc` needs to be able to call `malloc`. We need to make sure that our runtime calls `REAL(malloc)` instead of `malloc`; otherwise, it will trigger an endless loop. Statically linking our Rust component to our LLVM component breaks this loop on macOS, where interceptors require dynamic linking. However, this is not the case on Linux: programs will still loop endlessly even if we statically link everything.
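To make the loop concrete, here is a toy model in plain Rust. It does not perform any real symbol interposition, and every name in it (`real_malloc`, `intercepted_malloc`, `RuntimeAlloc`, `DEPTH_LIMIT`) is hypothetical. The only point is to show that the runtime's own bookkeeping allocation must go through the real function, not back through the interceptor.

```rust
/// Stand-in for `REAL(malloc)`: the underlying allocator, never intercepted.
fn real_malloc(size: usize) -> Vec<u8> {
    vec![0; size]
}

/// Which entry point the runtime's own bookkeeping uses when it allocates.
#[derive(Clone, Copy)]
pub enum RuntimeAlloc {
    /// The runtime calls `REAL(malloc)`.
    Real,
    /// The runtime calls plain `malloc`, which resolves to the interceptor.
    Intercepted,
}

/// A real program would recurse until the stack overflows; the model gives up
/// after this many nested interceptor entries instead.
const DEPTH_LIMIT: u32 = 64;

/// Stand-in for the `malloc` interceptor. Before serving the request it lets
/// the runtime allocate 16 bytes of (made-up) metadata.
pub fn intercepted_malloc(size: usize, mode: RuntimeAlloc, depth: u32) -> Option<Vec<u8>> {
    if depth >= DEPTH_LIMIT {
        return None; // models the endless loop
    }
    let _metadata = match mode {
        RuntimeAlloc::Real => real_malloc(16),
        // Re-enters the interceptor, which allocates metadata again, and so on.
        RuntimeAlloc::Intercepted => intercepted_malloc(16, mode, depth + 1)?,
    };
    Some(real_malloc(size))
}
```

With `RuntimeAlloc::Real` the call returns a buffer after a single interceptor entry; with `RuntimeAlloc::Intercepted` every entry triggers another one, and the model only stops because of the artificial depth limit.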

## Our Current Solution - `#![no_std]`
There are a bunch of other libc functions that we want to intercept (`memcpy`, `strcpy`, `mmap`) and that are used throughout Rust's standard library. Even if we find a workaround for `malloc`, we'll still end up running into this issue elsewhere. To avoid this, we implement our runtime with the `#![no_std]` attribute. This attribute ensures that our runtime library does not depend on any part of libc by replacing all relevant functions with Rust's [compiler builtins](https://github.com/rust-lang/compiler-builtins). It also bars us from using Rust's standard library, but we can still access all of the data structures that we need through the crate `alloc`.
```
extern crate alloc;

use alloc::collections::BinaryHeap;
use alloc::vec::*;
```
To use `alloc`, we need to bring our own allocator, and this allocator has to call the function pointers `REAL(malloc)` and `REAL(free)` so that we avoid endless looping.

In the LLVM piece, we declare a struct containing pointers to the `REAL` instances of the functions that we need to access:
```
const BsanAllocator gBsanAlloc = 
  BsanAllocator {
    .malloc = REAL(malloc),
    .free = REAL(free),
    .mmap = REAL(mmap),
    .munmap = REAL(munmap)
  };
```
Then, we implement the `Allocator` trait for `BsanAllocator` within our Rust runtime, which lets us reach the non-intercepted allocation functions whenever we need to allocate memory. This trait is experimental and only available on nightly, but so are we, so this is one of those rare situations where using it is appropriate. There's [a stable fork](https://crates.io/crates/allocator-api2) that's used by libraries [with millions of daily downloads](https://crates.io/crates/hashbrown), and the Rust for Linux project has shown interest in this trait recently, so it's likely to be stabilized at some point in the future.
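As a rough sketch of what the Rust side of this table could look like: the block below mirrors two of the fields and dispatches allocations through them. It is not the project's implementation. To stay on stable Rust it swaps in the `GlobalAlloc` trait for the nightly `Allocator` trait, the field types and the 8-byte alignment guarantee are assumptions, and `demo_malloc`/`demo_free` are hypothetical stand-ins for `REAL(malloc)`/`REAL(free)` that exist only so the example runs on its own (they use `std`, which the real runtime cannot).

```rust
use core::alloc::{GlobalAlloc, Layout};
use core::ffi::c_void;

/// Hypothetical Rust mirror of the C++ table (only `malloc` and `free`).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct BsanAllocator {
    pub malloc: unsafe extern "C" fn(usize) -> *mut c_void,
    pub free: unsafe extern "C" fn(*mut c_void),
}

/// Assumption: the real `malloc` returns memory aligned to at least 8 bytes.
const MALLOC_ALIGN: usize = 8;

unsafe impl GlobalAlloc for BsanAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        if layout.align() > MALLOC_ALIGN {
            // Over-aligned requests are out of scope here; null reports failure.
            return core::ptr::null_mut();
        }
        // Call through the pointer, never through the intercepted symbol.
        unsafe { (self.malloc)(layout.size()) as *mut u8 }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, _layout: Layout) {
        unsafe { (self.free)(ptr as *mut c_void) }
    }
}

// Stand-ins for `REAL(malloc)` / `REAL(free)`. Each block carries a 16-byte
// header that records its total size so `demo_free` can rebuild the layout.
const HEADER: usize = 16;

unsafe extern "C" fn demo_malloc(size: usize) -> *mut c_void {
    let Some(total) = size.checked_add(HEADER) else {
        return core::ptr::null_mut();
    };
    let Ok(layout) = Layout::from_size_align(total, HEADER) else {
        return core::ptr::null_mut();
    };
    unsafe {
        let base = std::alloc::alloc(layout);
        if base.is_null() {
            return core::ptr::null_mut();
        }
        (base as *mut usize).write(total);
        base.add(HEADER) as *mut c_void
    }
}

unsafe extern "C" fn demo_free(ptr: *mut c_void) {
    if ptr.is_null() {
        return;
    }
    unsafe {
        let base = (ptr as *mut u8).sub(HEADER);
        let total = (base as *const usize).read();
        std::alloc::dealloc(base, Layout::from_size_align_unchecked(total, HEADER));
    }
}

/// Builds the table the way the C++ side would hand it over.
pub fn demo_allocator() -> BsanAllocator {
    BsanAllocator { malloc: demo_malloc, free: demo_free }
}
```

Because every allocation goes through the stored pointers, nothing in this path refers to the `malloc` symbol by name, which is what keeps the interceptor from being re-entered.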

The `alloc` crate also requires us to declare a global allocator: a `static`, registered with `#[global_allocator]`, whose type implements `GlobalAlloc`. Since this is a static object, it needs to be initialized before we call `bsan_init`; otherwise, a bunch of safe calls to functions in `alloc` that allocate memory will be inherently unsafe, since they will trigger undefined behavior if we forget to initialize our global allocator. There's no way for us to access our unintercepted functions statically, so instead, we implemented a ["dummy" global allocator](https://github.com/rust-lang/wg-allocators/issues/123) that panics whenever it is used.
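A minimal sketch of such a placeholder, under stated assumptions: the type name `PanickingAllocator` and the panic messages are hypothetical, and the registration is left as a comment so the snippet can be compiled and exercised in an ordinary program. The `GlobalAlloc` contract does not allow a registered global allocator to unwind, so this sketch also assumes the runtime is built with `panic = "abort"`.

```rust
use core::alloc::{GlobalAlloc, Layout};

/// Hypothetical placeholder global allocator. Reaching it means some code
/// allocated without going through the explicit, pointer-backed allocator.
pub struct PanickingAllocator;

unsafe impl GlobalAlloc for PanickingAllocator {
    unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
        panic!("global allocator used; allocate through the explicit allocator instead")
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        panic!("global allocator used; deallocate through the explicit allocator instead")
    }
}

// In the `#![no_std]` runtime this would be registered as:
//
// #[global_allocator]
// static GLOBAL: PanickingAllocator = PanickingAllocator;
```

The design trades a silent source of undefined behavior for an immediate, loud failure: an accidental use of the global allocator stops the program instead of calling through uninitialized function pointers.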

## Potential Changes

1. Don't use interceptors at all, avoiding the problem entirely.

2. **Current Solution** - Intercept everything and use function pointers to access the unintercepted versions of `malloc`, `free`, etc.

3. Intercept everything *except* functions that allocate or deallocate memory. Have our instrumentation pass use LLVM's [`isMallocOrCallocLikeFn`](https://github.com/llvm/llvm-project/blob/8885b5c0626065274cb8f8a634d45779a0f6ff2b/llvm/lib/Analysis/MemoryBuiltins.cpp#L306) and related operations to identify functions that allocate or free memory and then add the corresponding instrumentation, allowing us to use Rust's [`System`](https://doc.rust-lang.org/std/alloc/struct.System.html) allocator without needing the function pointer workaround. 

We cannot use solution 1 unless we manually replace a lot of libc calls within our instrumentation pass. Interceptors save a lot of implementation effort and are more robust to future changes to LLVM.

We are currently using solution 2, but it's tedious, and there may also be a little bit of runtime overhead from calling `malloc` through function pointers (it's unclear whether this overhead is measurable, and LTO might mitigate it).

We *could* use solution 3, but that would potentially make us incompatible with other sanitizers that intercept `malloc`. However, that might not be a problem, since we already do everything that LSAN and ASAN do.
