Software Engineering

Developer Productivity Engineering in the Complex Low-level Systems World

By Bartosz Zator Samsung R&D Institute Poland

By Adrian Niec Samsung R&D Institute Poland

Introduction

Software systems powering sophisticated embedded devices are very complex these days. Systems running on mobile phones, operating systems in cars, software that runs production processes in factories, and so on form the foundation of infrastructure that billions of people rely on every day. Development of such systems poses significant challenges. Usually, these systems consist of many distinct H/W components with corresponding S/W modules running on each component. Let's take a mobile phone as an example. We have an application processor running the main OS and applications. We have a modem processor that communicates with the mobile network. We have specialized processors for digital signal processing (DSP) as well as the recently added neural processing units (NPUs) for accelerating AI-related operations. Finally, we have specialized processors and operating systems dedicated to security operations, and so on.

Such diversity of modules requires a multitude of software systems to be developed and integrated to handle the operation of the entire product. We have a secure OS, bootloaders, operating system kernels, specialized firmware for components such as WLAN or the touchpad, a middleware layer of libraries, applications, web engines, etc. Large parts of these systems are usually written in C or C++, Java is used in the service and application layers, and the lowest-level parts are programmed in assembly on bare metal.

Testing these software systems is very difficult for several reasons. First, their complexity and scale are challenging. Second, setting up testing and debugging on the device might be non-trivial – for example, imagine testing a modem on a mobile phone, or a memory constrained IoT device. Furthermore, we are often dealing with custom hardware, for which there might be no emulator available to test the code off the target device. As a result, the commonly used testing techniques and tools are not always easily applicable.

Last but not least, creating the final S/W image that runs on the end product is a very difficult task in itself. First, we have to download the entire product source code from the repository. Usually, the number of source files needed at this point is extremely large. For example, for the latest Android Open Source Project (AOSP) platform with the common Linux kernel we need to download more than 1 million files from the repository, and for actual mobile end products this number can be significantly larger. Once we have the source code, we need to build it. The build process is commonly handled by a plethora of build systems producing a large number of artifacts with non-obvious dependencies, usually glued together by programs written in various scripting languages. For example, the AOSP build produces a few thousand linked modules. When the build of all individual components completes, there is usually a large final step of assembling the produced binary code into a single S/W image file, which can then be transferred to the product H/W. There might also be some additional steps along the way, such as signing the produced binaries to ensure the security of the running S/W.

Considering all of the challenges discussed above, it is reasonable to pose the following questions:
      - What precisely is going on during the product creation?
      - Which exact parts of the source code are incorporated into the final image and how are the source files processed?
      - Finally, how could we improve the productivity of the myriad of engineers working on large, complex product code to speed up the development process and help to better test it?

Developer Productivity Engineering (DPE)

Figure 1. DPE Summit 2023 Conference

We might be able to address these questions with the help of Developer Productivity Engineering (DPE) techniques. The DPE concept was first introduced by Gradle and described in a white paper [1]: "Developer Productivity Engineering (DPE) is a software development practice used by leading software development organizations to maximize developer productivity and happiness". The two core concepts of DPE are: (1) ensuring fast feedback cycles in the development process, i.e., reducing how long it takes a developer to verify a change introduced into the final product, and (2) troubleshooting failures in the development process itself. The first concept is implemented primarily by making the build and test process as fast as possible, e.g., by caching the build artifacts and running the tests in parallel. The second concept is implemented by providing insight into the build process itself through build instrumentation technology and data collection. Gradle achieves both goals through the Build Cache and Build Scan® technologies, which are part of the Gradle Enterprise (now Develocity [3]) solutions.
We believe that two additional concepts could be added to the core DPE list: (3) tools that boost the productivity of developers working with the code, as well as (4) tools for automation of test execution and issue detection.
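
To make the artifact caching idea behind concept (1) more concrete, here is a minimal sketch in Python. It is not how Gradle's Build Cache is implemented; the `BuildCache` class, the `fake_compile` stand-in, and the in-memory store are all hypothetical names chosen for illustration. The only point is that a build step can be skipped when the command and the content of its inputs hash to a key that has already been seen.

```python
import hashlib


class BuildCache:
    """Toy in-memory build cache keyed by a hash of a command and its inputs.

    Hypothetical illustration only; real build caches also track the
    environment, tool versions, and store artifacts on disk or remotely.
    """

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def cache_key(command, inputs):
        # inputs maps a file name to its content (bytes).
        digest = hashlib.sha256()
        digest.update(command.encode("utf-8"))
        for name in sorted(inputs):
            digest.update(b"\0" + name.encode("utf-8") + b"\0")
            digest.update(hashlib.sha256(inputs[name]).digest())
        return digest.hexdigest()

    def run(self, command, inputs, action):
        # Reuse the stored artifact if the same command already ran
        # on identical inputs; otherwise execute the action and store it.
        key = self.cache_key(command, inputs)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        output = action(inputs)
        self._store[key] = output
        return output


def fake_compile(inputs):
    # Stand-in for a real compiler: concatenates the input contents.
    return b"".join(inputs[name] for name in sorted(inputs))


cache = BuildCache()
sources = {"main.c": b"int main(void) { return 0; }"}

first = cache.run("cc -c main.c", sources, fake_compile)   # executed
second = cache.run("cc -c main.c", sources, fake_compile)  # served from cache

sources["main.c"] = b"int main(void) { return 1; }"
third = cache.run("cc -c main.c", sources, fake_compile)   # input changed, executed again
```

In this sketch the second call is a cache hit because neither the command nor the file content changed, while editing `main.c` changes the key and forces the step to run again.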

DPE techniques are getting more and more attention in the S/W Engineering community. They have been adopted by a number of software companies and are now at the core of their development processes. Since 2022 Gradle has been organizing an annual conference dedicated entirely to DPE topics: The DPE Summit in San Francisco, California (https://dpesummit.com/). The summit is attended by members of top engineering teams from across the world who gather to discuss various DPE concepts. This year we represented Samsung at the summit.

A wide range of topics was covered: from productivity metrics, through build observability, developer productivity tools, improving the testing experience, and project management techniques used to boost developer productivity, to the use of AI in the DPE context. There were also some discussions about potential interactions with customers and the role of internal communication in improving the development process. We also saw some interesting cases of infrastructure incidents that we could draw conclusions from.

Currently, the DPE concepts are applied mostly to the higher layers of the product S/W stack. For example, the Gradle build system is dedicated mostly to building Java or other JVM-based projects using Kotlin, Groovy or Scala. It can be extended via plugins, but in order for supporting tools like Build Cache or Build Scan® to work we would still need to describe the entire build hierarchy in the Gradle language. We would have to do the same if we wanted to use other build systems like Bazel or CMake. Unfortunately, this might be a big problem when several build systems are used in combination, which is the case for a large mobile product S/W image. Porting a large Makefile-based build system might be infeasible or, at best, could significantly delay the shipment of the product. Considering this challenge, we ask the following question: could we employ some of the DPE techniques and create (at least partially) counterparts of the existing DPE tools described above for such multifaceted product builds?

DPE in the Complex Low-level System World

We have been trying to bridge this gap and attack the problem from several angles over the last few years. Our job is to perform a security assessment of the entire product code before it is released to the market. This involves security code review of the low-level system code and automated testing to find S/W problems that could be exploited as security vulnerabilities. When performing these tasks we frequently have to wander into unknown territory, i.e., we often face code that we have never seen before. Usually the code area that requires scrutiny is very large. Finally, the analysis and verification need to be done quickly (well before the production process of the product is completed). All of the above required us to think of ways to improve the development process and transform the way security engineers work with the code on a daily basis. This has led us to the realization that we have to acquire (at least to some degree) the competence and responsibilities of a DPE team. Over the years we have introduced several tools to support low-level developers of a large mobile product. The cornerstone of that ecosystem is the Code Aware Services (CAS) project [4], which has significantly improved the security code review process for very large code bases.

So what is the CAS project? CAS is a set of tools for extracting information from the build process and the source code. This includes data such as how a particular software image is created, or information on functions, types and the dependencies between them. CAS makes this data easily accessible to external applications. It is composed of two parts. The first part is called Build Awareness Service (BAS), a system which provides detailed information acquired during the full build of a product. The second part is called Function/Type Database (FTDB), which provides code information extracted from the original source files of the product build.
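
To illustrate the kind of question that build information of this sort can answer, here is a small Python sketch. It does not use the CAS or BAS API and does not reflect their data format; the `TRACE` record layout, the file names, and the helper functions are hypothetical and exist only for this example. Assuming we know which files each build process read and wrote, we can work backwards from the final image to the set of original source files that actually contributed to it.

```python
from collections import defaultdict

# Hypothetical build trace: one record per process observed during the build.
# This layout is an assumption made for illustration, not the BAS format.
TRACE = [
    {"cmd": "cc -c boot.c -o boot.o", "reads": ["boot.c", "board.h"], "writes": ["boot.o"]},
    {"cmd": "cc -c kernel.c -o kernel.o", "reads": ["kernel.c", "board.h"], "writes": ["kernel.o"]},
    {"cmd": "cc -c unused.c -o unused.o", "reads": ["unused.c"], "writes": ["unused.o"]},
    {"cmd": "ld boot.o kernel.o -o image.bin", "reads": ["boot.o", "kernel.o"], "writes": ["image.bin"]},
]


def build_producers(trace):
    """Map every written file to the set of files read by its producing process."""
    producers = defaultdict(set)
    for record in trace:
        for out in record["writes"]:
            producers[out].update(record["reads"])
    return producers


def all_dependencies(target, producers):
    """Collect every file the target depends on, directly or transitively."""
    seen = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for dep in producers.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def source_files(target, trace):
    """Return dependencies of the target that no build process produced."""
    producers = build_producers(trace)
    deps = all_dependencies(target, producers)
    return sorted(d for d in deps if d not in producers)


image_sources = source_files("image.bin", TRACE)
print(image_sources)  # ['board.h', 'boot.c', 'kernel.c']
```

Note that `unused.c` is compiled during the build but never reaches `image.bin`, so it is not reported. On a real product the same query would have to cope with millions of files and many build systems, which is exactly where a dedicated tracing and database layer is needed.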

Recently, we had the great pleasure of presenting the CAS project at The DPE Summit 2023. CAS was first introduced at the Linux Security Summit NA'22 [5], where we put the emphasis on how CAS supports the security code review process and the automation of vulnerability detection. In our recent DPE Summit'23 talk we focused on how CAS can be used in a more general S/W Engineering context at the scale of a very large product code base.

Figure 2. Bartosz Zator & Adrian Nieć: Developer Productivity Engineering in the complex low-level systems world

We showed a number of examples of how the CAS project, which serves as a foundation for creating other tools, can improve the productivity of engineers working with system code. We demonstrated that our process and dependency visualization tools, equipped with execution time measurements, together with the Python BAS API, which gives easy programmatic access to the raw database, can provide deep insight into the product build operation. This insight can help solve build failures and investigate build speed bottlenecks. We also demonstrated that custom build script generation can directly address the feedback cycle problem by providing a means to replay specific parts of the build where the configuration didn't change. This enables fine-grained incremental rebuilds of selected pieces of the underlying systems. The code search and IDE indexing improvements we presented boost the productivity of developers working with the code. We also demonstrated that leveraging FTDB helps us automate the code review process and opens a new range of possibilities in automated testing. We showed our novel Auto Off-Target approach, which makes it possible to generate off-target test harnesses at scale. Finally, we presented our KFLAT system, which allows us to capture and serialize data and use it for off-target testing, to help fuzzing tools keep the bounds and the structure of the input data in check, or to speed up the load time in I/O-intensive applications.
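
The idea of replaying only selected parts of a build can be sketched in a few lines of Python. This is a simplified illustration, not the CAS build script generator: the `TRACE` layout and the `commands_to_replay` function are hypothetical, and the sketch assumes that the recorded commands are listed in their original execution order, so that every file is produced before it is consumed.

```python
# Hypothetical recorded build, in original execution order.
# The record layout is an assumption made for this example only.
TRACE = [
    {"cmd": "cc -c boot.c -o boot.o", "reads": ["boot.c", "board.h"], "writes": ["boot.o"]},
    {"cmd": "cc -c kernel.c -o kernel.o", "reads": ["kernel.c", "board.h"], "writes": ["kernel.o"]},
    {"cmd": "cc -c unused.c -o unused.o", "reads": ["unused.c"], "writes": ["unused.o"]},
    {"cmd": "ld boot.o kernel.o -o image.bin", "reads": ["boot.o", "kernel.o"], "writes": ["image.bin"]},
]


def commands_to_replay(trace, changed):
    """Return the recorded commands that must run again after `changed` files were edited.

    A command is replayed when it reads a changed file or a file written
    by an already replayed command; its outputs then become dirty too.
    """
    dirty = set(changed)
    replay = []
    for record in trace:
        if dirty.intersection(record["reads"]):
            replay.append(record["cmd"])
            dirty.update(record["writes"])
    return replay


# Editing kernel.c requires recompiling it and relinking the image,
# while the boot.c and unused.c compilations can be skipped.
script = commands_to_replay(TRACE, ["kernel.c"])
print("\n".join(script))
```

A generated script of this kind is only valid as long as the build configuration itself has not changed, which matches the condition stated above for replaying parts of the build.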

More details regarding the CAS project and comprehensive examples of how it can be used to implement some of the DPE concepts for a wide range of large, complex low-level system products can be found at the following page [6]: https://samsung.github.io/CAS/

We believe that DPE is a very promising direction in software engineering. More tools and techniques are needed to address the challenges of DPE, especially for low-level system code. We believe that the solutions we presented at the DPE Summit’23 conference (and described in detail in [6]) are a step towards better DPE for complex systems.

References

[1] The Developer Productivity Engineering Handbook
A Complete Guide to Developer Productivity Engineering for Practitioners
https://gradle.com/wp-content/uploads/2019/09/Developer-Productivity-Engineering-eBook.pdf

[2] Press Release
EngFlow and tipi.build Reveal CMake Remote Build Execution Solution for C and C++ Community
https://www.engflow.com/news/2023-10-05

[3] Gradle Inc’s Gradle Enterprise is Now Develocity!
https://gradle.com/press-media/gradle-enterprise-is-now-develocity/

[4] Code Aware Services @ Samsung Github
https://github.com/samsung/CAS

[5] Code Aware Service in the service of vulnerability detection @ Linux Security Summit NA 2022
https://youtu.be/M7gl7MFU_Bc?t=648

[6] Code Aware Services project page
https://samsung.github.io/CAS/
