
KFLAT - Selective Kernel Memory Serialization for Security and Debugging

By Bartosz Zator, Samsung R&D Institute Poland
By Paweł Wieczorek, Samsung R&D Institute Poland

Introduction


The Linux OS is a very complex piece of software that requires advanced testing and debugging capabilities. Memory dumps come in very handy for that task, whether as a core dump loaded into gdb or a memory image fed into a VM. However, commonly used memory dump tools lack granularity: they dump the memory of an entire process or system without understanding which source code structures it represents. In the Mobile Security Group at Samsung R&D Institute Poland, we developed a novel tool called KFLAT [1], which makes a fine-grained copy of kernel memory for selected variables and structures. Such a copy can be used to recreate the layout of kernel memory in a user space process on another machine. KFLAT produces a flattened memory image, which makes loading almost instantaneous and allows for high test throughput.

How does KFLAT work?

Imagine there is a specific structure in the Linux kernel memory that is of particular interest to you, and that an instance of this structure is stored in a concrete variable inside the kernel. You can instruct the KFLAT engine to serialize that variable, with all the data it contains, and save the resulting memory image to a file. First, the KFLAT engine copies the memory of the selected variable. Second, it recursively follows all the dependencies, i.e., the pointer members of the given structure. KFLAT knows how to dump memory based on a set of data descriptions called recipes. All the copied memory regions are then assembled into a single memory block, and all the pointers inside the block are adjusted to point to the new locations (offsets) within the block. Finally, the generated memory block is saved to a file, which can later be used to restore the memory contents.
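The flattening steps described above can be illustrated with a small, self-contained model. The sketch below is written in Python rather than kernel C, and it does not use KFLAT's actual recipe syntax or image format: the `memory` dictionary standing in for kernel memory, the `recipes` table, the 8-byte little-endian pointer layout, the `node` type and the `flatten` function are all assumptions made for illustration only.

```python
import struct

PTR = struct.Struct("<Q")    # assumed pointer layout: 8 bytes, little-endian
NODE = struct.Struct("<QQ")  # hypothetical struct node { u64 value; node *next; }

def flatten(memory, recipes, root_addr, root_type):
    """Copy the object at root_addr and everything reachable from it into
    one contiguous block, replacing pointers with offsets into that block.

    memory  -- dict: address -> bytes of the object stored there
    recipes -- dict: type name -> list of (member offset, pointee type)
    Returns (image, fixups), where fixups lists the image offsets that
    hold pointers and must be relocated when the image is loaded.
    """
    placed = {}   # original address -> offset inside the image
    chunks = []   # (image offset, copied bytes, type name)
    size = 0

    # Step 1 and 2: copy the root, then follow pointer members.
    worklist = [(root_addr, root_type)]
    while worklist:
        addr, type_name = worklist.pop()
        if addr == 0 or addr in placed:   # NULL or already copied (cycle)
            continue
        data = bytearray(memory[addr])
        placed[addr] = size
        chunks.append((size, data, type_name))
        size += len(data)
        for member_off, pointee_type in recipes[type_name]:
            (target,) = PTR.unpack_from(data, member_off)
            worklist.append((target, pointee_type))

    # Step 3: assemble one block and rewrite pointers as offsets.
    image = bytearray(size)
    fixups = []
    for base, data, type_name in chunks:
        for member_off, _ in recipes[type_name]:
            (target,) = PTR.unpack_from(data, member_off)
            if target != 0:
                PTR.pack_into(data, member_off, placed[target])
                fixups.append(base + member_off)
        image[base:base + len(data)] = data
    return bytes(image), fixups

# A circular list of three nodes scattered across "kernel" addresses.
memory = {
    0x1000: NODE.pack(1, 0x3000),
    0x3000: NODE.pack(2, 0x2000),
    0x2000: NODE.pack(3, 0x1000),
}
recipes = {"node": [(8, "node")]}   # the member at offset 8 points to a node
image, fixups = flatten(memory, recipes, 0x1000, "node")
```

Recording the pointer locations in a separate `fixups` list is a design choice of this sketch: it keeps a pointer to offset 0 (the root object) distinguishable from a NULL pointer, which is left untouched and never listed.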

Figure 1. High-level KFLAT operation

What could you use it for?


The resulting portable memory dump can be loaded into an application and used like any other C memory. What could you use it for, then? You can use it for fast memory initialization: for instance, you can dump some memory from a running system and use it as an initial corpus for a fuzzer (an automatic testing tool). You can use it for debugging: you can dump selected internal kernel structures and then inspect them in user space, just like with kgdb, except that you don't have to recompile the whole kernel with CONFIG_KGDB enabled or use any additional hardware. You can use it for gathering statistics: you can extract new metrics from the kernel by accessing its internal structures and viewing them in user space. Finally, you can also use KFLAT to serialize the memory of a user space process (this variant is called UFLAT). For example, an application may spend a lot of time reading or computing large chunks of data before it does anything useful. A good example is a large build system that parses a lot of Makefiles before running a very small incremental build. In such a case, the memory state of the application can be computed once, saved to a file as a memory image, and restored almost instantaneously in a multitude of future executions.
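To show why restoring a flattened image can be so fast, here is a minimal sketch of the loading side. It is a Python model, not KFLAT's real loader or file format: the image layout (a hypothetical `node` with a value and a `next` pointer stored as an offset), the `fixups` list of pointer locations, the `BASE` address and the helper names are all assumptions made for illustration.

```python
import struct

PTR = struct.Struct("<Q")    # assumed pointer layout: 8 bytes, little-endian
NODE = struct.Struct("<QQ")  # hypothetical struct node { u64 value; node *next; }

def load_image(image, fixups, base):
    """Relocate a flattened image for use at address `base`.

    Every location listed in `fixups` holds an offset into the image;
    adding `base` turns it back into an ordinary pointer. No parsing or
    per-object allocation is needed, only one pass over the pointers.
    """
    mem = bytearray(image)
    for off in fixups:
        (rel,) = PTR.unpack_from(mem, off)
        PTR.pack_into(mem, off, base + rel)
    return mem

def read_values(mem, base, head):
    """Walk the restored list through its relocated pointers."""
    values = []
    addr = head
    while addr != 0:                       # 0 models a NULL pointer
        value, addr = NODE.unpack_from(mem, addr - base)
        values.append(value)
    return values

# A hand-built image of a three-node list; the last `next` is NULL,
# so only the first two pointer locations appear in the fixup list.
image = NODE.pack(10, 16) + NODE.pack(20, 32) + NODE.pack(30, 0)
fixups = [8, 24]

BASE = 0x7F0000000000                      # hypothetical load address
loaded = load_image(image, fixups, BASE)
print(read_values(loaded, BASE, BASE))
```

In this model the cost of loading grows only with the number of pointers to relocate, which is the intuition behind using such images for fast memory initialization.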

Figure 2. Restoring KFLAT image in the user space process

How do we use it?


One of the projects that we developed and use extensively is the Auto Off-Target (AoT) project [2]. It makes it possible to extract the code of a particular function (with selected dependencies) from a larger system, compile it, and test it on a development machine, which provides higher throughput and easy access to a debugging and testing toolchain. One possible use of AoT in security testing of the Linux kernel is to generate test harnesses for kernel entry points, e.g., ioctl syscalls. The generated harnesses can be extensively tested with fuzzers or other automatic testing techniques, e.g., symbolic execution, in a scalable fashion. One of the challenges in running such harnesses is to initialize the program state correctly, so that it reflects the kernel state at the original entry point. This is where KFLAT comes in handy. KFLAT makes it possible to save the kernel state (i.e., the global variables and function parameters used in the function of interest) at the moment the entry point executes and to restore it in the test harness. When the harness state is properly initialized, fuzzing is significantly more precise, as it produces fewer false positives.

Figure 3. Using AoT to test the Linux kernel entry points

More detailed documentation, which explains some of the intricacies of KFLAT, is available at the following link [3]:
https://samsung.github.io/kflat/

Related work


We are not aware of any existing solution that can selectively dump memory from a complex low-level system while being aware of the source code structures the way KFLAT is.
Memory dumps have been studied in the context of debugging and crash analysis, as well as digital forensics. KFLAT is a generic tool which could be applied in both domains.
In the field of crash analysis, memory dumps are used to reproduce and debug issues based on the limited information developers have after a crash [4], [5], [6], [7], [8], [12], [13], [14], [15], [16]. Crash analysis tools try to infer information from the program executable, raw memory dumps, backtraces and core dumps. KFLAT, in contrast, is a code-aware tool which can dump and restore memory on a live system with the precision of exact C source code structures.
Memory dumps are also used extensively in digital forensics [9], [10], [11], [17], [18]. The focus there is on extracting higher-level information which might be of value, e.g., images, audio files, or critical OS information. KFLAT, by contrast, operates at the level of source code structures and can dump data residing in the memory of a live system, rather than data stored on a hard drive. However, given the right recipe definitions, KFLAT could be used to dump selected parts of the OS state.

KFLAT is open source


The KFLAT project was developed from scratch at Samsung R&D Institute Poland and has been released as open source [1]. Our goal is to build a community around the project, enhance the available tooling for finding and analyzing software problems in the Linux kernel, and improve the general security of the Linux kernel code. We encourage all developers interested in Linux development to play with the tool, improve the engine, or write libraries of recipes for common kernel structures. We eagerly welcome PRs that add new functionality or fix existing bugs. KFLAT was first presented at one of the top conferences on Linux and open source in general, the Open Source Summit North America 2023 [19].

Open Source Summit NA 2023


Quoting the organizers: "Open Source Summit is the premier event for open source developers, technologists, and community leaders to collaborate, share information, solve problems, and gain knowledge, furthering open source innovation and ensuring a sustainable open source ecosystem. It is the gathering place for open-source code and community contributors". More specifically, the part of the Open Source Summit that focuses on Linux kernel development is the LinuxCon conference: "LinuxCon is an event for maintainers, developers and project leads in the Linux community to gather for updates, education, collaboration, and problem-solving to further the Linux ecosystem". In 2023, Open Source Summit / LinuxCon was held in Vancouver, Canada. We gave a talk there on the KFLAT project, titled "KFLAT - Selective Kernel Memory Serialization for Security and Debugging". The video recording [20] and the slides [21] are now available. In the talk, we provide more details on how KFLAT works under the hood and how to use it to retrieve valuable information from the Linux kernel internals. We also discuss the more advanced topic of automatic recipe generation for Linux kernel structures.

Figure 4. Open Source Summit 2023 @ Vancouver, Canada

Other tools


If you’re interested, please also check out the other tools developed at Samsung R&D Institute Poland. We’ve already mentioned the Auto Off-Target project [2]. More information on AoT can be found in the dedicated blog post [22]. AoT is based on another project we created: Code Aware Services (CAS) [23]. CAS is a set of tools for extracting information from the build process and the source code. This includes data on how a particular software image is created and information on functions, types and the dependencies between them. CAS makes this data easily accessible to external applications. CAS is also a cornerstone of the Developer Productivity Engineering (DPE) concept for the low-level native part of the software stack. More information can be found in another blog post [24] as well as in our recent presentation from the DPE Summit 2023 in San Francisco [25]. Last but not least, SEAL [26] is a tool for collecting information about files on a running Linux system that might be useful, e.g., during a security assessment. In particular, SEAL makes it possible to automatically match Linux device nodes with the functions inside the kernel that handle the various operations on these nodes.
We believe that the KFLAT project, as well as the other tools described here, can be really helpful in finding issues in large software systems (like the Linux kernel) and can contribute significantly to the overall security of the end products.

Selected References


[1] KFLAT @ Samsung GitHub
https://github.com/samsung/KFLAT
[2] Auto Off-Target @ Samsung GitHub
https://github.com/Samsung/auto_off_target
[3] KFLAT - selective kernel memory serialization for security and debugging (documentation)
https://samsung.github.io/kflat/
[4] Fu, Y., Lin, Z. and Brumley, D., 2015, August. Automatically deriving pointer reference expressions from binary code for memory dump analysis. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (pp. 614-624).
[5] Liebler, L. and Breitinger, F., 2018, May. mrsh-mem: Approximate matching on raw memory dumps. In 2018 11th International Conference on IT Security Incident Management & IT Forensics (IMF) (pp. 47-64). IEEE.
[6] Duffey, D.W. and Andresen, D., 2002. DUMP: Dump User Memory, Please.
[7] Goyal, V., Biederman, E.W. and Nellitheertha, H., 2005, July. Kdump, A Kexec-based Kernel Crash Dumping Mechanism. In Linux Symposium (p. 169).
[8] Yang, H., Zhuge, J., Liu, H. and Liu, W., 2016. A tool for volatile memory acquisition from Android devices. In Advances in Digital Forensics XII: 12th IFIP WG 11.9 International Conference, New Delhi, January 4-6, 2016, Revised Selected Papers 12 (pp. 365-378). Springer International Publishing.
[9] Dangi, S., Ghanshala, K. and Sharma, S., 2023, June. Approaches to Selective Imaging of Live Systems via Memory Forensics. In 2023 3rd International Conference on Intelligent Technologies (CONIT) (pp. 1-5). IEEE.
[10] Faust, F., Thierry, A., Müller, T. and Freiling, F., 2020. Technical report: Selective imaging of file system data on live systems. arXiv preprint arXiv:2012.02573.
[11] Stüttgen, J., Dewald, A. and Freiling, F.C., 2013, March. Selective imaging revisited. In 2013 Seventh International Conference on IT Security Incident Management and IT Forensics (pp. 45-58). IEEE.
[12] Cui, W., Peinado, M., Cha, S.K., Fratantonio, Y. and Kemerlis, V.P., 2016, May. Retracer: Triaging crashes by reverse execution from partial memory dumps. In Proceedings of the 38th International Conference on Software Engineering (pp. 820-831).
[13] Zamfir, C. and Candea, G., 2010, April. Execution synthesis: a technique for automated software debugging. In Proceedings of the 5th European conference on Computer systems (pp. 321-334).
[14] Artzi, S., Kim, S. and Ernst, M.D., 2008. Recrash: Making software failures reproducible by preserving object states. In ECOOP 2008–Object-Oriented Programming: 22nd European Conference Paphos, Cyprus, July 7-11, 2008 Proceedings 22 (pp. 542-565). Springer Berlin Heidelberg.
[15] Cao, Y., Zhang, H. and Ding, S., 2014, September. Symcrash: Selective recording for reproducing crashes. In Proceedings of the 29th ACM/IEEE international conference on Automated software engineering (pp. 791-802).
[16] Wang, H., Xie, X., Lin, S.W., Lin, Y., Li, Y., Qin, S., Liu, Y. and Liu, T., 2019, August. Locating vulnerabilities in binaries via memory layout recovering. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (pp. 718-728).
[17] Google Rekall project
https://github.com/google/rekall
[18] Volatility Framework - Volatile memory extraction utility framework
https://github.com/volatilityfoundation/volatility/wiki
[19] Open Source Summit North America 2023
https://events.linuxfoundation.org/archive/2023/open-source-summit-north-america/
[20] KFLAT - Selective Kernel Memory Serialization for Security and Debugging (OSS2023 presentation video)
https://www.youtube.com/watch?v=Ynunpuk-Vfo
[21] KFLAT - Selective Kernel Memory Serialization for Security and Debugging (OSS2023 presentation slides)
https://static.sched.com/hosted_files/ossna2023/95/KFLAT_OSS2023.pdf
[22] Auto Off-Target: Enabling Thorough and Scalable Testing for Complex Software Systems @ Samsung Research Blog
https://research.samsung.com/blog/Auto-Off-Target-Enabling-Thorough-and-Scalable-Testing-for-Complex-Software-Systems
[23] Code Aware Services @ Samsung GitHub
https://github.com/samsung/CAS
[24] Developer Productivity Engineering in the Complex Low-level Systems World @ Samsung Research Blog
https://research.samsung.com/blog/Developer-Productivity-Engineering-in-the-Complex-Low-level-Systems-World
[25] CAS @ DPE Summit'23
https://dpesummit.com/sessions/bartosz-zator/developer-productivity-engineering-in-the-complex-low-level-systems-world/
[26] SEAL @ Samsung GitHub
https://github.com/samsung/SEAL