Security & Privacy

UTopia: From Unit Tests To Fuzzing

By Hayoon Yi, Samsung Research
By Joonun Jang, Samsung Research
By Bokdeuk Jeong, Samsung Research
By WooChul Shim, Vice President, Samsung Research

Automatic Fuzz Driver Generation For Library Fuzzing

Domain: What is Library Fuzzing?

Fuzzing, which tests software with pseudo-random input and observes its execution for anomalous behavior, is an effective tool for finding real bugs and vulnerabilities within software. Conventional fuzzing, or end-to-end fuzzing, performs this task by feeding random input to the entry point of executable binaries so that it may test the executable with input that could actually be given by a user. However, as this approach requires an executable that accepts external input, it is not fit for testing code that does not come in executable form, namely, software libraries.

Software libraries are collections of code that provide specific functionalities through application programming interfaces (APIs) so that those functionalities may easily be reused throughout various software. Therefore, any bug within library API code potentially propagates to every piece of software that uses the buggy API. Library fuzzing aims to enable fuzz testing for such library code so that we may find and fix bugs within libraries and prevent them from affecting other software.

In order to perform fuzzing on libraries, library fuzzing makes use of fuzz drivers, or fuzz harnesses: code that calls the APIs of a target library and relays pseudo-random fuzz input values to the arguments of each API call. Fuzz drivers act as executables for the library code and transform the problem of library fuzzing into that of end-to-end fuzzing. However, though seemingly a straightforward approach, library fuzzing has not been as widely adopted as end-to-end fuzzing due to the manual effort required in addressing the challenges of fuzz driver generation.
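For concreteness, here is a minimal sketch of what such a fuzz driver can look like in the libFuzzer style. The library and its APIs (png_parser_create, png_parser_parse, png_parser_destroy) are hypothetical placeholders, not from any real project:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical library APIs; any real library would differ.
extern "C" {
struct png_parser;
png_parser *png_parser_create(void);
int png_parser_parse(png_parser *p, const uint8_t *buf, size_t len);
void png_parser_destroy(png_parser *p);
}

// A fuzzing engine such as libFuzzer links against this entry point and
// repeatedly invokes it with mutated inputs, turning the library-fuzzing
// problem into an end-to-end fuzzing one.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  png_parser *parser = png_parser_create();  // valid sequence: init first,
  png_parser_parse(parser, data, size);      // fuzz input flows into the API,
  png_parser_destroy(parser);                // then clean up.
  return 0;
}
```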

Challenge: Fuzz Driver Generation and Scalability.

Figure 1.  Incorrect API usage can render a fuzz driver useless.

The main challenges of fuzz driver generation boil down to two: 1) valid API sequence synthesis (i.e., how to select and order library API calls), and 2) valid API parameter synthesis (i.e., how to handle the arguments for the selected API calls). Failing to address either of these is typically detrimental to the performance of a fuzz driver. For example, as in Figure 1, if a fuzz driver calls a termination API before an initialization API, any observed anomalous behavior would likely be credited to calling the APIs in the wrong order and would not be considered the result of a bug. Likewise, a fuzz driver providing independent random fuzz values to arguments that should maintain a certain relationship (e.g., a pointer to an array and a value holding its length) would also waste fuzzing effort, as random combinations will either end in a segfault (length value > array size) or squander cycles creating a needlessly large array (array size > length value). Therefore, one must take care to generate fuzz drivers with valid API usage, because simply calling APIs at random with random arguments would often produce meaningless results.
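To make the second pitfall concrete, the following sketch (process_buffer is a hypothetical API that reads exactly len bytes from buf) contrasts a driver that fuzzes the length independently of the buffer with one that preserves the relationship:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical API that reads exactly `len` bytes from `buf`.
extern "C" void process_buffer(const uint8_t *buf, size_t len);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // BAD: deriving `len` from fuzz input independently of the buffer breaks
  // the pointer/length relationship; len > size reads out of bounds and
  // crashes the driver itself rather than exposing a library bug.
  //   size_t len = data[0];
  //   process_buffer(data + 1, len);

  // GOOD: keep the length argument tied to the actual buffer size.
  process_buffer(data, size);
  return 0;
}
```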

Due to the care required in driver generation, most fuzz drivers are crafted by hand. A developer or a tester carefully studies the library in question, selects and orders a sequence of APIs deemed proper for fuzzing, and defines how and where fuzz input values will be provided to the API arguments. As you can guess, this is quite a time-consuming process and, as mentioned earlier, is the main reason why library fuzzing has not flourished as much as end-to-end fuzzing. To give a ballpark example, the Tizen open source project has hundreds of libraries which contain thousands of APIs. If we simply assume it takes an hour to understand a single API, we would need around 125 work days just to study 1,000 APIs. This approach is hardly scalable.

We believe this to be among the top reasons why even Google’s OSS-Fuzz project (https://google.github.io/oss-fuzz/), the most successful fuzzing project for various open source libraries, has only one or two fuzz drivers for most of its target libraries. Though writing only a couple of drivers per project may allow one to cover more projects, it means that among all the APIs that are not being fuzzed, bugs could lie dormant for years even while the library is constantly being fuzzed (which was actually the case for some of the bugs we found in libraries fuzzed by OSS-Fuzz: e.g., CVE-2021-30473, CVE-2021-30474, CVE-2021-30475). So, scalability in covering most APIs in a library is just as important as scalability in covering many libraries, which makes the job tougher.

Our Approach: Unit Tests to the Rescue!

In order to perform library fuzzing at scale, one requires a way to automatically synthesize valid API usage patterns and bypass the need for manual involvement. To this end, we have observed that unit tests (UTs) contain valid API usage patterns (both sequence and arguments), as library developers design UTs to test specific usage cases, and that many mature projects have UTs covering most of their APIs.
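As an illustration, consider a hypothetical gtest test case for the same imaginary png_parser library from the earlier sketch; the developer has already encoded everything a fuzz driver needs:

```cpp
#include <cstdint>
#include <gtest/gtest.h>
#include "png_parser.h"  // hypothetical library under test

// The developer has already encoded a valid API sequence (create ->
// parse -> destroy) and a valid argument relationship (a buffer plus its
// true length) -- exactly the usage pattern a fuzz driver needs.
TEST(PngParserTest, ParsesMinimalHeader) {
  const uint8_t kHeader[] = {0x89, 'P', 'N', 'G'};
  png_parser *parser = png_parser_create();
  EXPECT_EQ(0, png_parser_parse(parser, kHeader, sizeof(kHeader)));
  png_parser_destroy(parser);
}
```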

Based on this observation, we have developed UTopia, which employs techniques to convert each existing UT into an effective fuzz driver in an automated and scalable manner. The key ideas behind UTopia are to 1) leverage UT-specific properties to unravel the complexity of UT analysis, 2) perform root definition analysis, a new technique we introduce, to trace back the source of API arguments so that fuzz input can be injected while maintaining the inter-procedural relations and data flow intended by developers, and 3) reflect in fuzz input mutation an analysis of the impact each argument may have within its API’s internals. This enables UTopia to explore the code space deeply and avoid crashes resulting from invalid API usage, making it possible to provide a push-button solution that automatically synthesizes high-quality fuzz drivers with no human involvement.

UTopia analyzes both UT and target library code to transform UTs into effective fuzz drivers. Figure 2 illustrates UTopia's overall workflow.

Figure 2.  The workflow of UTopia to generate fuzz drivers.

(1) UTopia takes advantage of the architectural nature of UT frameworks so that it only needs to analyze developer-implemented test functions instead of analyzing across the entire UT framework.
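To see why this is possible, note that gtest's TEST macro roughly expands into a framework-derived class whose TestBody method holds the developer's code, as in this simplified sketch:

```cpp
#include <gtest/gtest.h>

// Roughly what TEST(PngParserTest, ParsesMinimalHeader) expands to:
class PngParserTest_ParsesMinimalHeader_Test : public ::testing::Test {
 private:
  // The only developer-written logic lives in this method; the rest is
  // framework boilerplate, so analysis can be limited to TestBody
  // implementations.
  void TestBody() override;
};
```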

(2) UTopia also analyzes the library to identify attributes of API arguments in order to provide more valid input to those arguments. We mainly look for five attributes: output, loop count, allocation size, file path, and array-length.

(3) Then, UT analysis is performed to identify root definitions, where we can inject fuzzing input without affecting valid API usage semantics (Figure 3). The identified root definitions are selected as fuzz targets (i.e., they will receive fuzzing input in the synthesized driver), and their original assignment values are extracted and collected into the initial seed corpus.
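As a sketch of the idea (again using the hypothetical png_parser API), the driver synthesized from the earlier test case would inject fuzz input at the root definition of the parsed bytes while leaving the call sequence intact:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical library APIs, matching the earlier sketches.
extern "C" {
struct png_parser;
png_parser *png_parser_create(void);
int png_parser_parse(png_parser *p, const uint8_t *buf, size_t len);
void png_parser_destroy(png_parser *p);
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // In the original UT, `kHeader` was the root definition of the bytes
  // flowing into png_parser_parse():
  //   const uint8_t kHeader[] = {0x89, 'P', 'N', 'G'};  // -> seed corpus
  // The synthesized driver injects fuzz input at that root definition
  // instead, leaving the call sequence and data flow untouched.
  std::vector<uint8_t> header(data, data + size);

  png_parser *parser = png_parser_create();
  png_parser_parse(parser, header.data(), header.size());
  png_parser_destroy(parser);
  return 0;
}
```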

(4) Finally, driver synthesis is performed based on the analysis results. Based on the argument attributes found in step (2), the code for relaying fuzz input to the root definitions is adjusted. If a root definition is related to an argument with the output attribute, we do not provide any fuzzing value, as the API will overwrite any value we provide. For loop-count and allocation-size attributes, we limit the value that can be assigned to the root definition so the fuzz driver avoids spurious out-of-memory errors or timeouts. For file-path attributes, instead of assigning a random string that would have little meaning when parsed as a path, we write the fuzzing input into a file and assign that file's path to the root definition (see the sketch below). For array-length-attributed root definitions, we find the corresponding array and assign the array's length minus one, while placing a null terminator at the end of the array. By doing so we avoid wasted fuzzing effort and enhance the fuzzing performance of our generated drivers.
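For instance, a driver synthesized for a file-path-attributed root definition might look like the following sketch, where load_config is a hypothetical API expecting a path:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical API whose argument carries the file-path attribute.
extern "C" int load_config(const char *path);

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // A random string is meaningless as a path, so the fuzz input is
  // materialized as a file and its path is handed to the API instead.
  const char *path = "/tmp/utopia_fuzz_input";
  FILE *f = fopen(path, "wb");
  if (!f) return 0;
  fwrite(data, 1, size, f);
  fclose(f);

  load_config(path);  // the root definition receives the path, not raw bytes
  return 0;
}
```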

With the resulting drivers and seed corpus, we can immediately perform fuzzing for the library.

Figure 3.  Fuzz drivers generated naively (left) and with root definition analysis (right). Fuzz1 and Fuzz2 indicate where fuzzing input values will be inserted.

Does It Work?

To evaluate its automation capability (i.e., its scalability), we measured UTopia's performance on 25 open source libraries from varying sources (see Figure 4). The 25 libraries were selected to determine whether UTopia is truly a scalable approach that can be applied to libraries of various sizes, build systems, and unit test frameworks.

Figure 4.  Projects used for evaluation and results for UTopia-generated fuzz drivers. Target Library: SR = source repository (O: OSS-Fuzz / G: GitHub / A: Android / T: Tizen), BS = build system (cm: CMake / gn: GNU Make / nj: Ninja / bz: Bazel), eFn = exported functions in a library. Unit Tests: TF = testing framework (G: gtest / B: Boost), TC Cov. = region coverage of the target library with the test cases from which fuzz drivers are generated, TC = total number of test cases. UTopia-Generated Fuzz Drivers: Oths. = TCs implemented with macro functions other than TEST, TEST_F, or BOOST_AUTO_TEST_CASE_FIXTURE; Ign. = TCs ignored by UTopia based on the exclusion criteria; AT = per-core time to analyze library and unit test code; GT = per-core time to generate fuzz driver code; UCov = unique coverage of UTopia compared with TC Cov.; AG = the ratio of the coverage of the aggregated unique regions across all fuzzers to that of the TCs from which the fuzzers were made; MG = the maximum individual growth ratio of a single fuzzer compared to execution with the initial seed.

Out of 5,523 total test cases (TCs) in the 25 libraries, 2,715 were valid candidate TCs (1,039 TCs were excluded for containing test macros not handled by our prototype, and 1,769 TCs were excluded as they did not contain any API capable of accepting fuzzing input), and from those, UTopia produced 2,715 fuzz drivers (100% of the valid candidates). This shows that UTopia can properly generate fuzz drivers across the various libraries. Furthermore, from the UCov column, we can see that UTopia-generated drivers are even capable of exploring library code not initially covered by the original UTs.

However, automatic fuzz driver generation capability alone does not establish the value of UTopia. The ultimate goal of fuzzing is finding bugs in target code, so the bug-finding capability of a proposed approach matters most in determining its merit. As this capability is near impossible to quantify formally, a list of actual bugs found by a given approach is typically accepted as a substitute. To this end, UTopia found 109 bugs in the 25 open source libraries simply by running each generated fuzz driver for one hour on a single CPU core. We reported 74 of the bugs to the maintainers (excluding 35 that had already been found and patched by the time we prepared our reports). This demonstrates that UTopia-generated fuzz drivers can find real bugs in real library code, validating the merit of UTopia.

Furthermore, UTopia was easily applied to the Tizen open source project, where it fuzzed 30 libraries (from which 2,411 drivers were generated) and found 14 bugs, which have been promptly reported and fixed. The fuzz drivers for Tizen have been adopted by the Tizen community, and their results can be seen on the community's own dashboard (https://dashboard.tizen.org/fuzz.code). (To access the dashboard, a Tizen account is required, which can be created free of cost.)

Can I Check Out UTopia More Closely?

Most definitely! A more detailed paper discussing the specifics of UTopia has been accepted to appear at the 44th IEEE Symposium on Security and Privacy, scheduled for May 2023. We will add a link to the paper as soon as it becomes available.

In the meantime, why don’t you check out the open-sourced code of UTopia at https://github.com/Samsung/UTopia?

Yes, that’s right. We have opened up the code for UTopia because we at Samsung Research believe in the strengths and values of the open source community. We hope UTopia enables fuzz testing for library developers seeking security, and that it provides a good starting point for future security researchers.

We are still actively applying UTopia to dozens of open source libraries as well as internal Samsung libraries. All bugs found, reported, and fixed with UTopia, including new bugs found after the evaluation of Figure 4, can be seen on our Trophies page (https://github.com/Samsung/UTopia/blob/main/Trophy.md). If you find a bug while trying out UTopia, please feel free to ask us to add it as a trophy alongside your GitHub ID! Any questions or contributions are welcome as well : )

Happy bug hunting!