Software Engineering

Auto Off-Target: Enabling Thorough and Scalable Testing for Complex Software Systems

By Tomasz Kuchta Samsung R&D Institute Poland
By Bartosz Zator Samsung R&D Institute Poland

Complex Low-Level Systems Are Hard to Test

Complex software systems such as OS kernels, firmware, baseband stacks, and IoT or automotive software are the building blocks and the foundation on which many other systems depend. Ultimately, these systems are used and relied upon by billions of people every day.

The importance and complexity of these systems call for an increased effort to eradicate software bugs, which lead to reliability issues and, even more importantly, open up security holes. Thorough testing is crucial, given that these systems are often written in memory-unsafe languages such as C/C++.

Unfortunately, the use of common automated software testing techniques such as fuzzing or symbolic execution (symbex) poses significant challenges on these targets. Usually, there is no executable program or easy entry point to run. Moreover, the systems often run on custom hardware, e.g., a System-on-Chip (SoC) in a smartphone, while most of the tools are built for x86_64. On-device testing and debugging is often hard due to the constraints of the target device. For example, we might not be able to use AddressSanitizer on a low-memory IoT chip, and it might be hard to debug a bootloader in a car's infotainment system.

How About Virtualization?

Rehosting addresses these challenges through virtualization, i.e., running the system, or a part of it, under a VM which emulates the hardware platform, e.g., in QEMU. Although rehosting has the advantage of capturing deep software and hardware interactions, it also has a significant shortcoming: the target hardware needs to be modeled.

While emulation exists for standard CPUs, e.g., ARM cores, out-of-the-box emulation is not available for customized and new hardware such as basebands, SoCs, microcontrollers or DSPs, due to custom architectures or peripherals. For example, no public emulators exist for such popular and important SoCs as Exynos, Snapdragon and M1. Creating and maintaining an emulator is a considerable, non-trivial effort which requires expert knowledge of the emulated hardware platform. Once the system is rehosted, we obtain a very good testing environment, but also one that is specific to the emulated target and unlikely to be useful for others.

Off-Target Testing to the Rescue

Off-target (OT) testing is a promising technique in which parts of the source code are extracted and adapted to run on a different hardware platform. As an example, let's consider testing a baseband message parser on-target: in order to exercise the functionality on the target device, we need to set up a network, then generate and transmit protocol frames over the air. Once a bug is detected, we likely need to halt the test, download logs and start a non-trivial on-device debugging process. By contrast, an off-target test of the same subsystem could be constructed by extracting the parsing code, providing stubs for the remaining functionality, compiling the code on an x86_64 machine and running a fuzzer which would automatically generate and feed test data into the parser.
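To make this concrete, below is a minimal sketch of what such a fuzzing harness could look like. The names `parse_frame()` and `hw_send_ack()` are hypothetical stand-ins for extracted baseband code and its stubbed-out hardware dependency; a real harness would link against the actual extracted sources.

```c
/* ot_harness.c: a minimal off-target fuzzing harness (sketch). */
#include <stdint.h>
#include <stdio.h>

/* Stub for functionality left out of the off-target build: on the
 * device this would touch the modem hardware. */
static void hw_send_ack(uint32_t frame_id) { (void)frame_id; }

/* Toy stand-in for the extracted parser; in a real OT this body
 * would come verbatim from the target sources. */
static int parse_frame(const uint8_t *buf, size_t len) {
    if (len < 4)
        return -1;              /* too short: no header */
    if (buf[1] == 0x42)         /* hypothetical "ack" opcode */
        hw_send_ack(buf[0]);
    return 0;
}

int main(void) {
    /* Read fuzzer-generated input from stdin (AFL-style). */
    uint8_t buf[4096];
    size_t len = fread(buf, 1, sizeof(buf), stdin);
    return parse_frame(buf, len) < 0;
}
```

Compiled on an x86_64 host, e.g., with AddressSanitizer enabled, such a harness can be fed directly to a fuzzer like AFL++.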

The off-target approach has the considerable advantage of focusing testing efforts on security-critical functionality, e.g., complex input format parsers, in an environment which is much better equipped for debugging and root cause analysis and offers higher test throughput. In fact, we often do not need to run the entire system in order to discover critical security issues.

Unfortunately, the process of creating an OT program is manual and challenging. First, the code in scope needs to be extracted. Next, the code dependencies, e.g., types, need to be resolved by pulling in more code. Finally, the off-target approach comes with an inherent challenge: since we extract a part of a system and run it elsewhere, the original program state is missing. This state includes memory allocations and the exact values in memory, which need to be provided by the user. Missing allocations or incorrect constraints on the program state result in false positives (FPs): bugs which are present in the off-target code but not in the target system. As a result of all these challenges, the technique has mostly been used in an ad hoc manner for creating one-off OTs.
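The following sketch illustrates the missing-state problem with a hypothetical example (the names are ours, not taken from any real target):

```c
#include <stddef.h>
#include <string.h>

/* On the device, a hypothetical modem_init() routine (not pulled into
 * the OT) allocates this buffer before any frame handling code runs. */
char *rx_buffer;

void store_frame(const char *data, size_t len) {
    /* In the OT, rx_buffer is still NULL because modem_init() was left
     * out, so this memcpy crashes: a false positive, not a real bug. */
    memcpy(rx_buffer, data, len);
}
```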

Auto Off-Target: Automatically Generating Off-Target Programs

In this blog post, we present an overview of our novel complex system testing approach called Auto Off-Target, or AoT for short. AoT can automatically generate off-target programs in C based on information extracted from the source code and the build process. This is our main contribution; however, AoT goes beyond code generation and also helps to tackle the missing program state challenge. AoT generates memory allocation code, leverages dynamic testing techniques to discover program state and uses data flow analysis to reject false positives.

The generated OT code is independent of the original environment and decoupled from the build process, i.e., no special knowledge of the build flags is required to compile and run the OT. As a result, pieces of complex or embedded software can be easily run, analyzed, debugged and tested on a standard x86_64 machine.

How Does AoT Work?

A high-level overview of the proposed AoT testing approach is presented in Fig. 1. Two fundamental building blocks required to run AoT are build information (Build Info) and code information (Code Info).

Figure 1.  An overview of the AoT approach

The build information includes a list of compiled files, linked modules and compiler flags used, as well as the information about dependencies between the modules, source and intermediate files.

This information is extracted during a full build of the target system. It is worth noting that the build needs to be performed only once per target as AoT operates entirely on the previously collected data.

The code information includes the source code of functions, types, global variables (globals), references among types, globals and functions, as well as code metadata such as variable assignments, casts and structural type member dereferences. The code information is generated from the original source files by the Code Processor with the help of the build information, or from the build configuration (Build Config) alone if the meta build system provides enough details. The information required by the Code Processor is the list of compiled files and the build flags used.

Build Info and Code Info databases are both generated by our CAS toolchain. We encourage you to find more details on CAS in our paper [1], talks [2, 3] and the project web page [4].

An OT is generated by taking a list of Base Functions that represent the functionality we would like to test, and recursively pulling in (Pull) all the functions they call, along with the required types and globals. Since AoT operates on precise information, only the necessary code is pulled in, unlike including entire headers, which could contain code the OT does not use.
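For illustration, consider a hypothetical base function `validate_msg()` (the names below are ours): a generated OT would contain roughly the following, and nothing else from the headers involved.

```c
/* Sketch of what a generated OT might contain for the base function
 * validate_msg(): only the definitions it actually reaches. */
#include <stdint.h>
#include <stddef.h>

struct msg {                 /* pulled: referenced by validate_msg() */
    uint16_t len;
    uint8_t  payload[256];
};

static uint16_t checksum(const uint8_t *p, size_t n) {  /* pulled: called */
    uint16_t s = 0;
    while (n--) s += *p++;
    return s;
}

int validate_msg(const struct msg *m) {  /* base function */
    if (m->len > sizeof(m->payload)) return -1;
    return checksum(m->payload, m->len) != 0;
}
/* Other functions and types from the same headers are NOT emitted. */
```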

Next, a decision is made about which functions should be kept and which should be left out based on the cut-off algorithm used (Cut-Off). The functions included entirely in the OT are called internal and those left out are called external as illustrated in Fig. 2.

For the external functions, AoT generates function stubs, i.e., replacement definitions without the original body. Since the original bodies of the external functions are removed, the functions they call are not included in the OT, unless they are internal.
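As a sketch, suppose the cut-off leaves out a hardware access routine: the OT keeps the internal caller with its original body and replaces the external callee with a stub. The shape below is illustrative, not AoT's exact generated code, and the names are hypothetical.

```c
#include <stdint.h>

/* External: left out by the cut-off, so a stub takes its place. The
 * real read_hw_register() would access a device register; its callees
 * are therefore not pulled into the OT. */
uint32_t read_hw_register(uint32_t addr) {
    (void)addr;
    return 0;   /* a fixed default; testing tools could vary it instead */
}

/* Internal: pulled in with its original body. */
int channel_ready(void) {
    return (read_hw_register(0x40001000u) & 0x1u) != 0;
}
```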

The off-target testing approach comes with an inherent challenge: since a part of the code is extracted, the generated OT is missing the original system state, e.g., the values of global variables or function parameters that are normally set on the running target. This is an important problem as the lack of proper memory initialization could prevent correct execution (and testing) of the OT and result in false positives (FPs).

Figure 2.  OT creation: base, internal and external functions

AoT implements three automated approaches to help tackle the open challenge of recreating program state in the OT code: smart init, state discovery and FP rejection, as detailed in our paper [1].
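As an illustration of the first of these, the sketch below shows the kind of allocation code needed before an entry function taking a pointer argument can run; the `struct request` type and the helper are hypothetical, and AoT's actual generated code differs in detail. Nested pointers are the tricky part: every pointer reachable from the entry function's arguments and globals must point to valid memory.

```c
/* Sketch of OT state initialization for an entry function
 * handle_request(struct request *r). Illustrative only. */
#include <stdlib.h>

struct request {
    char  *name;     /* nested pointer: needs its own allocation */
    size_t name_len;
};

struct request *init_request(void) {
    struct request *r = calloc(1, sizeof(*r));
    if (!r) exit(1);
    r->name_len = 64;
    r->name = calloc(1, r->name_len);   /* allocate the nested buffer */
    if (!r->name) exit(1);
    return r;
}
```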

Once an OT is generated, it can be debugged, analyzed or tested (Test in Fig. 1). AoT is independent of the testing technique used and it implements support for fuzzing and symbex.
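For instance, a symbex driver for the hypothetical parser from the earlier sketch could look as follows when using KLEE; the driver is assumed to be linked against the generated OT code.

```c
#include <klee/klee.h>
#include <stdint.h>
#include <stddef.h>

/* Provided by the generated OT code this driver is linked against. */
int parse_frame(const uint8_t *buf, size_t len);

int main(void) {
    uint8_t buf[64];
    /* Mark the input buffer as symbolic; KLEE then explores the
     * parser's paths and emits a concrete test case for each one. */
    klee_make_symbolic(buf, sizeof(buf), "frame");
    return parse_frame(buf, sizeof(buf));
}
```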

AoT automates the laborious process of creating an OT, which usually involves multiple iterations of pulling in code, checking if it compiles, pulling in more code, initializing state, etc. As a result, with AoT the user can focus on creating meaningful tests rather than on creating the OT itself. Moreover, we can select any part of a very complex system and thoroughly test it in a way similar to unit testing.

Testing Real-world Systems with AoT

We evaluate AoT on four target systems:

• T1: the Android oriole Linux kernel, which is arguably one of the most complex and thoroughly tested pieces of publicly available software, powering the Google Pixel 6 phone,

• T2: the Little Kernel Embedded Operating System (lk), which is a non-Linux operating system based on a microkernel architecture,

• T3: Das U-Boot (uboot), which is a bootloader for embedded devices,

• T4: The IUH module of the Osmocom project, which implements the IUH interface for femtocell communication from a 3GPP standard.

Table 1.  OT Creation stats. All the values are averages. Created OTs is the number of successfully generated OTs. LOC presents the average size of OT including the AoT library functions. Files, Types, Struct Types, Globals and Funcs show the code stats for OTs. Builds is the percentage of the generated OTs that compile.

All the selected systems represent complex software which is largely written in C. We selected projects spanning operating systems, embedded software, bootloaders and telecommunications to illustrate the wide applicability of AoT. The largest target by far is T1, which contains over 166k functions; T2 contains 1,732 functions, T3 contains 4,771 and T4 contains 3,507.

Table 2.  OT Testing Stats. Testable OTs is the number of OTs for which tests were run and coverage data collected (in brackets we provide the value as the percentage of OTs that compile). The remaining columns present average numbers per OT.

As we can see in Table 1, AoT generated over 50k small-to-medium-sized OT instances, and we were able to build the majority of the generated programs out of the box.

Furthermore, as presented in Table 2, we were able to run automated testing techniques, symbolic execution (KLEE) and fuzzing (AFL++), out of the box on the majority of the compiled OT programs. Both techniques generated a few test cases per OT on average.

Finally, we performed a bug finding campaign with AoT on Google Pixel phone kernels. In the campaign, AoT was used as an aid to security engineers. The campaign revealed three new security issues that were awarded CVEs, and rediscovered one further issue which had already been fixed.

AoT: A New Way of Testing Low-Level Complex System Code

One of the open challenges in testing complex software systems is providing strong software quality guarantees at scale and making popular testing techniques easy to apply. In our paper [1], we present AoT, a novel approach for the automatic creation of off-target programs.

AoT makes it possible to select arbitrary parts of the original target code, extract them, and compile them as pure C programs on a different platform, without dependencies and at scale. AoT goes beyond this main contribution and implements several techniques to address the challenge of missing program state in the OT programs.

We evaluated AoT on tens of thousands of functions from various complex and embedded targets and demonstrated that on average 86% of the generated OTs can be tested out of the box with popular tools. By using AoT in a bug finding campaign, we discovered seven bugs in two kernels powering Google Pixel phones.

We are not aware of any existing system that can generate compiling and testable OT code the way AoT does. AoT enables and encourages unit-like testing, debugging and the use of sophisticated techniques such as symbex on complex systems code. We believe it is a step towards deeper and more thorough software testing.

AoT is still in active development. Currently, the main directions for future work are: (1) better program state initialization, and (2) adding support for C++. We plan to extend AoT with the ability to initialize program state using memory dumps performed on the target with our tool KFLAT [5].

Last but not least, the core engines of AoT [6], CAS [7], and KFLAT [8] are open-source. We hope you find our tools useful and we welcome your feedback and contributions.

References

[1] Tomasz Kuchta, Bartosz Zator
Auto Off-Target: Enabling Thorough and Scalable Testing for Complex Software Systems
https://dl.acm.org/doi/10.1145/3551349.3556915

[2] Code Aware Service in the service of vulnerability detection @ Linux Security Summit NA 2022
https://youtu.be/M7gl7MFU_Bc?t=648

[3] CAS & AoT: Enabling Symbolic Execution on Complex System Code via Automatic Test Harness Generation
https://www.youtube.com/watch?v=Xzn_kmtW3_c

[4] CAS Project Web Page: https://samsung.github.io/CAS/

[5] KFLAT - Selective Kernel Memory Serialization for Security and Debugging
https://www.youtube.com/watch?v=Ynunpuk-Vfo

[6] AoT: Auto Off-Target @ Samsung GitHub
https://github.com/Samsung/auto_off_target

[7] Code Aware Services @ Samsung GitHub
https://github.com/samsung/CAS

[8] KFLAT @ Samsung GitHub
https://github.com/samsung/kflat