
Avoid infinite looping when using interceptors #3

@icmccorm


Overview

BorrowSanitizer's runtime library has two pieces: a Rust piece and an LLVM piece. The LLVM piece is where we implement interceptors, which are functions that "intercept" calls to a library function such as malloc and replace it with our custom functionality. Here's how to declare an interceptor for malloc that does nothing aside from returning the result of the regular malloc.

DECLARE_REAL_AND_INTERCEPTOR(void *, malloc, uptr)

INTERCEPTOR(void *, malloc, SIZE_T size) {
  return REAL(malloc)(size);
}

INTERCEPT_FUNCTION(malloc)

The macro DECLARE_REAL_AND_INTERCEPTOR declares symbols for both the real malloc and the interceptor. When we need to call the actual implementation of malloc, we wrap its name in the REAL macro. The interceptor is not active until we invoke the INTERCEPT_FUNCTION macro. Afterward, it stays in place until the program exits.

The Problem

Interceptors are useful because they are often more efficient than manually instrumenting calls, and they also work for uninstrumented programs. Every library that uses our runtime, whether statically or dynamically linked, will now call our interceptor instead of malloc. This is wonderful and terrible. Our Rust library also needs to allocate memory, and we want to call into it from inside our interceptor, so our interceptor for malloc needs to be able to call malloc. We need to make sure that our runtime calls REAL(malloc) instead of malloc; otherwise, it will trigger an endless loop. Statically linking our Rust component to our LLVM component breaks this loop on macOS, where interceptors require dynamic linking. However, this is not the case on Linux: programs will still loop endlessly even if we statically link everything.
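To make the loop concrete, here is a small simulation in plain Rust. It does not use any sanitizer machinery: `real_malloc`, `runtime_track`, both interceptor functions, and `MAX_DEPTH` are hypothetical stand-ins invented for illustration, and the recursion is capped so that the demo terminates where the real program would not.

```rust
use std::cell::Cell;

// Cap on re-entry so the simulation terminates; the real loop has no such cap.
const MAX_DEPTH: u32 = 5;

thread_local! {
    // Counts how many times a simulated interceptor has been entered.
    static ENTRIES: Cell<u32> = Cell::new(0);
}

/// Stand-in for REAL(malloc): allocates without touching the interceptor.
fn real_malloc(size: usize) -> Vec<u8> {
    vec![0u8; size]
}

/// Stand-in for the Rust runtime's bookkeeping, which itself needs memory.
/// `alloc` is whichever allocation entry point the runtime ends up calling.
fn runtime_track(alloc: fn(usize) -> Vec<u8>) {
    let _metadata = alloc(16);
}

/// The runtime allocates through the intercepted symbol, so it re-enters
/// the interceptor on every call.
fn interceptor_looping(size: usize) -> Vec<u8> {
    let entries = ENTRIES.with(|e| {
        e.set(e.get() + 1);
        e.get()
    });
    if entries < MAX_DEPTH {
        runtime_track(interceptor_looping);
    }
    real_malloc(size)
}

/// The runtime allocates through the real function, so the interceptor
/// runs exactly once per user allocation.
fn interceptor_fixed(size: usize) -> Vec<u8> {
    ENTRIES.with(|e| e.set(e.get() + 1));
    runtime_track(real_malloc);
    real_malloc(size)
}

/// Resets the counter, performs one user allocation, and reports how many
/// times the interceptor was entered.
fn entries_for(interceptor: fn(usize) -> Vec<u8>) -> u32 {
    ENTRIES.with(|e| e.set(0));
    let _buffer = interceptor(8);
    ENTRIES.with(|e| e.get())
}
```

With these definitions, `entries_for(interceptor_looping)` hits the cap, while `entries_for(interceptor_fixed)` reports a single entry.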

Our Current Solution - #![no_std]

There are a bunch of other functions that we want to intercept (memcpy, strcpy, mmap) that are used throughout Rust's standard library. Even if we find a workaround for malloc, we'll still run into this issue elsewhere. To avoid this, we implement our runtime with the #![no_std] attribute. This attribute ensures that our runtime library does not depend on any part of libc by replacing all relevant functions with Rust's compiler builtins. It also bars us from using Rust's standard library, but we can still access all of the data structures that we need through the alloc crate.

extern crate alloc;

use alloc::collections::BinaryHeap;
use alloc::vec::*;

To use alloc, we need to bring our own allocator. However, we need this allocator to call the function pointers REAL(malloc) and REAL(free) so that we can avoid endless looping.

In LLVM, we declare a struct containing pointers to the REAL instances of the functions that we need to access.

const BsanAllocator gBsanAlloc = 
  BsanAllocator {
    .malloc = REAL(malloc),
    .free = REAL(free),
    .mmap = REAL(mmap),
    .munmap = REAL(munmap)
  };

Then, we implement the Allocator trait for BsanAllocator within our Rust runtime, which will allow us to access the non-intercepted allocation functions whenever we need to allocate memory. This trait is experimental and it is only available on nightly, but so are we, so this is one of those rare situations where this usage is appropriate. There's a stable fork that's being used in libraries with millions of daily downloads, and the Rust for Linux project has shown interest in this trait recently, so it's likely to be stabilized at some point in the future.
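As a rough illustration of the Rust side, here is a sketch that runs on stable Rust. It is not our actual implementation: it uses inherent methods shaped like Allocator::allocate and Allocator::deallocate instead of the nightly trait, it keeps only the malloc and free pointers, and `demo_malloc`, `demo_free`, `demo_allocator`, `HEADER`, and `MALLOC_ALIGN` are hypothetical stand-ins so that the example runs without the LLVM component. In the runtime, the two pointers would be REAL(malloc) and REAL(free), and the imports would come from core rather than std.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::ffi::c_void;
use std::ptr::{null_mut, NonNull};

/// Simplified, hypothetical Rust-side counterpart of the C++ struct.
#[derive(Clone, Copy)]
pub struct BsanAllocator {
    malloc: unsafe extern "C" fn(usize) -> *mut c_void,
    free: unsafe extern "C" fn(*mut c_void),
}

/// Assumption: the underlying malloc guarantees at least 8-byte alignment.
const MALLOC_ALIGN: usize = 8;

impl BsanAllocator {
    /// Allocates through the stored function pointer, never through the
    /// (possibly intercepted) `malloc` symbol.
    pub fn allocate(&self, layout: Layout) -> Option<NonNull<u8>> {
        if layout.align() > MALLOC_ALIGN {
            return None; // over-aligned requests are out of scope here
        }
        // malloc(0) may return null, so always request at least one byte.
        let raw = unsafe { (self.malloc)(layout.size().max(1)) };
        NonNull::new(raw as *mut u8)
    }

    /// # Safety
    /// `ptr` must come from `allocate` on this allocator and not be freed yet.
    pub unsafe fn deallocate(&self, ptr: NonNull<u8>) {
        unsafe { (self.free)(ptr.as_ptr() as *mut c_void) }
    }
}

// Demo-only stand-ins for REAL(malloc) and REAL(free). They store the
// requested size in a header so that `demo_free` can rebuild the layout.
const HEADER: usize = 16;

unsafe extern "C" fn demo_malloc(size: usize) -> *mut c_void {
    let Some(total) = size.checked_add(HEADER) else { return null_mut() };
    let Ok(layout) = Layout::from_size_align(total, HEADER) else { return null_mut() };
    unsafe {
        let base = alloc(layout);
        if base.is_null() {
            return null_mut();
        }
        (base as *mut usize).write(size);
        base.add(HEADER) as *mut c_void
    }
}

unsafe extern "C" fn demo_free(ptr: *mut c_void) {
    if ptr.is_null() {
        return;
    }
    unsafe {
        let base = (ptr as *mut u8).sub(HEADER);
        let size = (base as *const usize).read();
        dealloc(base, Layout::from_size_align_unchecked(size + HEADER, HEADER));
    }
}

/// Builds an allocator from the demo stand-ins.
pub fn demo_allocator() -> BsanAllocator {
    BsanAllocator { malloc: demo_malloc, free: demo_free }
}
```

On nightly, the same two method bodies could back an implementation of the Allocator trait, which would let collections from alloc be parameterized over BsanAllocator.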

The alloc crate also requires us to declare a global allocator: a static that implements GlobalAlloc and is registered with #[global_allocator]. Since this is a static object, it needs to be initialized before we call bsan_init; otherwise, safe calls to functions in alloc that allocate memory would be unsound, since they would trigger undefined behavior if we forgot to initialize the global allocator. There's no way for us to access our unintercepted functions statically, so instead, we implemented a "dummy" global allocator that panics whenever it is used.
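A minimal sketch of such a dummy allocator, assuming the stable GlobalAlloc trait. The name `DummyAllocator` and the panic messages are hypothetical, and the registration is left commented out so that the snippet can be compiled and exercised inside an ordinary std program.

```rust
use std::alloc::{GlobalAlloc, Layout};

/// Hypothetical placeholder allocator: every entry point panics, so any
/// accidental use of the global allocator fails loudly.
pub struct DummyAllocator;

unsafe impl GlobalAlloc for DummyAllocator {
    unsafe fn alloc(&self, _layout: Layout) -> *mut u8 {
        panic!("global allocator used; allocate through BsanAllocator instead")
    }

    unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
        panic!("global allocator used; deallocate through BsanAllocator instead")
    }
}

// In the runtime crate, the registration would look like this:
//
// #[global_allocator]
// static GLOBAL: DummyAllocator = DummyAllocator;
//
// Assumption: the runtime is built with panic=abort. The GlobalAlloc
// documentation says a registered global allocator must not unwind, so
// aborting is the safe way for these panics to surface.
```

Every collection that should work at run time then has to be constructed with an explicit allocator; anything that silently falls back to the global one hits the panic.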

Potential Changes

  1. Don't use interceptors at all, avoiding the problem entirely.

  2. Current Solution - Intercept everything and use function pointers to access the unintercepted versions of malloc, free, etc.

  3. Intercept everything except functions that allocate or deallocate memory. Have our instrumentation pass use LLVM's isMallocOrCallocLikeFn and related operations to identify functions that allocate or free memory and then add the corresponding instrumentation, allowing us to use Rust's System allocator without needing the function pointer workaround.

We cannot use solution 1 unless our instrumentation pass manually replaces a lot of library calls. Interceptors save a lot of implementation effort and are more robust to future changes to LLVM.

We are currently using solution 2, but it's tedious, and there may also be a little run-time overhead from calling malloc through function pointers. It's unclear whether that overhead is significant, and it could be mitigated through LTO.

We could use solution 3, but that would potentially make us incompatible with other sanitizers that intercept malloc. However, that might not be a problem, since we already do everything that LSAN and ASAN do.
