Open Source

Where Supply Chain Security and License Obligations Meet

By Alexander Mazuruk, Samsung R&D Institute Poland

One could definitely call 2021 the year of the supply chain. The Suez Canal obstruction halted traffic for six days on one of the busiest trade routes in the world. Both the pandemic and severe weather disrupted supply chains all over the world [4]. And it’s not just freighters, trucks and planes: the digital world was affected too.

Right before the year started, the famous SolarWinds [5] cyberattack was disclosed. This operation is an example of a digital supply chain attack, where malicious code is inserted into trusted third-party software, infecting all of the compromised software company's customers. It affected 99 of the top 100 Fortune 500 companies. The most devastating fact about this attack is that it took over a year to discover and disclose the vulnerability.

The number of supply chain attacks increased rapidly throughout the year, and the trend is estimated to keep growing exponentially over the coming years. This is why the US president, Joe Biden, issued an executive order as a strong signal to government agencies and the industry to immediately improve digital supply chain security. One of the very first steps towards that goal is to ensure that the exact software composition is known at each and every stage of the pipeline by providing a Software Bill of Materials (SBoM) together with the product.

A Software Bill of Materials is a list of all software components with additional metadata (see image below). Which components are listed depends on how the application is packaged. For example, a typical application SBoM would consist of the application itself as well as all of its dependencies.

Figure 1. Example of SBoM components in industry
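
To make the idea of "components plus metadata" more concrete, here is a minimal sketch of how an SBoM could be represented and serialized. The field names (name, version, license, supplier, depends_on) and the example packages are assumptions chosen for illustration; they do not follow any particular SBoM standard or the output format of any tool mentioned in this article.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Component:
    """One entry in the bill of materials (field names are illustrative)."""
    name: str
    version: str
    license: str = "UNKNOWN"   # declared license, if any
    supplier: str = "UNKNOWN"  # who provided the component
    depends_on: list = field(default_factory=list)  # names of direct dependencies


@dataclass
class Sbom:
    """An application plus every component it ships with."""
    product: str
    components: list = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into the nested Component dataclasses.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical application with two dependencies.
sbom = Sbom(
    product="example-app",
    components=[
        Component("example-app", "1.0.0", license="Apache-2.0",
                  depends_on=["libfoo", "libbar"]),
        Component("libfoo", "2.3.1", license="MIT"),
        Component("libbar", "0.9.0"),  # license not declared
    ],
)

print(sbom.to_json())
```

Even in this toy form, the missing license of the hypothetical libbar shows up explicitly as UNKNOWN, which is the kind of gap an SBoM is meant to expose.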

Nowadays, in the network industry, software is often packaged into a container, which complicates things further. Shipping a software package in a container usually means that the image is based on some publicly available, pre-built base image with technology relevant to the package that "just works". Yet there are many other things in those images that are rarely screened at any stage of development or release. Generating an SBoM and providing it alongside the container image supplies the data needed to check the container against known vulnerabilities that could be exploited by malicious actors.
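
As a rough illustration of checking a container against known vulnerabilities, the sketch below matches SBoM entries against an advisory list. Everything in it is hypothetical: the package names, the advisory identifiers (placeholders, not real CVE numbers) and the exact-version matching, which is a simplification of the version-range matching a real scanner would need.

```python
# Hypothetical advisory feed: (package, affected version) -> advisory id.
# The ids are placeholders, not real CVE numbers.
ADVISORIES = {
    ("libfoo", "2.3.1"): "EXAMPLE-ADVISORY-1",
    ("openssl-like", "1.0.0"): "EXAMPLE-ADVISORY-2",
}

# Hypothetical SBoM entries for a container image: (package, version, origin).
IMAGE_SBOM = [
    ("example-app", "1.0.0", "vendor layer"),
    ("libfoo", "2.3.1", "base image"),
    ("busybox-like", "1.35.0", "base image"),
]


def find_known_vulnerabilities(sbom, advisories):
    """Return (package, version, origin, advisory id) for every SBoM match."""
    findings = []
    for name, version, origin in sbom:
        advisory = advisories.get((name, version))
        if advisory is not None:
            findings.append((name, version, origin, advisory))
    return findings


for finding in find_known_vulnerabilities(IMAGE_SBOM, ADVISORIES):
    print("%s %s (%s): %s" % finding)
```

Note that the only finding here comes from the base image rather than from the vendor's own layer, which is exactly the part of the image that tends to go unscreened.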

As mentioned above, containers are a bit more problematic: there are many ways to deploy an app in a container, which makes automatic inspection much more complicated. One example is package metadata that has been stripped or was never provided in the first place. And sometimes the build systems and/or packages were not designed to include the information needed for an SBoM.

Another issue that can be solved with an SBoM is the automation of open source license compliance for containers. According to a legal analysis published by the Linux Foundation, when shipping software as part of a container image, the vendor is responsible for the compliance process not only for what the vendor added, but also for every underlying layer inherited from the "just works" base image.

What is inherited from the base image is often problematic for many companies licensing-wise. It is still unclear how GPL-3 stipulations relate to cloud environments, but neither vendors nor operators want to accept the risk. This is why the ONAP community has been shifting from Debian-based base images to Alpine-based ones. But there is a problem with Alpine packages: their license provenance data is often insufficient.
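
The two concerns above, obligations inherited from base-image layers and licenses a company may not want to accept, can be sketched as a simple policy check over an image's package list. The policy set, the layer names and the packages below are assumptions made up for this example; an actual compliance policy is a legal decision and will differ between organizations.

```python
from collections import defaultdict

# Hypothetical policy: licenses that need legal review before shipping.
NEEDS_REVIEW = {"GPL-3.0-only", "UNKNOWN"}

# Hypothetical image contents: (layer, package, license).
IMAGE_PACKAGES = [
    ("base image", "libc-like", "MIT"),
    ("base image", "shell-like", "GPL-3.0-only"),
    ("base image", "tool-like", "UNKNOWN"),
    ("vendor layer", "example-app", "Apache-2.0"),
]


def packages_needing_review(packages, needs_review):
    """Group packages whose license is on the review list by image layer."""
    flagged = defaultdict(list)
    for layer, name, license_id in packages:
        if license_id in needs_review:
            flagged[layer].append((name, license_id))
    return dict(flagged)


report = packages_needing_review(IMAGE_PACKAGES, NEEDS_REVIEW)
for layer, items in report.items():
    for name, license_id in items:
        print(f"{layer}: {name} ({license_id})")
```

Treating UNKNOWN as a license that needs review reflects the point about insufficient provenance data: a package with no usable license information cannot simply be assumed to be safe to ship.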

In reality, SBoMs are unavailable for most of the containers served on the web, so there is a huge demand for tools that can automate the container inspection process and provide a proper bill of materials with all the provenance data required for both security and license compliance processes. One of the best-in-class tools that can fulfill those needs is ScanCode-toolkit together with its new web interface, ScanCode.io.

It is an open-source Python 3 solution that has been used by many well-known projects and organizations: Eclipse Foundation, OpenEmbedded.org, the FSFE, the FSF, OSS Review Toolkit, ClearlyDefined.io, and Red Hat Fabric8 Analytics.

Samsung Open Source Group engineers host a public instance of ScanCode.io at scancode.onap.eu. It is currently available to all community members, who can scan their images either by providing a link to Nexus (ONAP's Docker image repository) or by uploading an image tarball. This instance is being integrated with ONAP CI so that every container image built is subject to software composition analysis and has an SBoM available.

Figure 2. Results of a scan of ONAP Integration Java11

There is still much to be done both in ONAP and ScanCode.io. One open problem is that many Java projects are placed in containers without license metadata.

Samsung Open Source Group has also created and published a PoC pipeline for ScanCode.io that scans Alpine-based containers. It works by downloading package source code and analyzing it. This provided missing metadata, such as copyright information and more precise license information, and proved that it is possible to gather that information automatically.
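
The general idea of recovering license and copyright data from source code can be illustrated with a deliberately simplified sketch. It is not the PoC pipeline and not how ScanCode-toolkit detects licenses: it only looks for SPDX-License-Identifier tags (single identifiers, not compound expressions) and lines starting with "Copyright" followed by a year, in a tiny made-up source tree that stands in for a downloaded package.

```python
import re
import tempfile
from pathlib import Path

# Simplified patterns; a real scanner such as ScanCode-toolkit uses far
# more robust detection than these two regular expressions.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([\w.+-]+)")
COPYRIGHT_RE = re.compile(r"Copyright\s+(?:\(c\)\s*)?(\d{4}.*)", re.IGNORECASE)


def scan_source_tree(root):
    """Collect license ids and copyright statements found under root."""
    licenses, copyrights = set(), set()
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        licenses.update(SPDX_RE.findall(text))
        copyrights.update(m.strip() for m in COPYRIGHT_RE.findall(text))
    return {"licenses": sorted(licenses), "copyrights": sorted(copyrights)}


def demo():
    """Scan a tiny, made-up source tree standing in for a downloaded package."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hypothetical-pkg-1.0"
        src.mkdir()
        (src / "main.c").write_text(
            "// SPDX-License-Identifier: MIT\n"
            "// Copyright (c) 2021 Example Authors\n"
            "int main(void) { return 0; }\n",
            encoding="utf-8",
        )
        (src / "util.py").write_text(
            "# SPDX-License-Identifier: Apache-2.0\n", encoding="utf-8"
        )
        return scan_source_tree(src)


result = demo()
print(result)
```

A package-level license field would typically record a single value for the whole package, whereas a file-level scan like this one can surface several licenses and the copyright holders behind them.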

This year we are working on a pipeline to scan Alpine packages on their own, and we want to fix those issues upstream. We plan to scan Alpine packages from the main and community namespaces on Samsung Open Source Group infrastructure and publish the results to the Alpine package repository (aports) [6].

Alexander Mazuruk and Krzysztof Opasiak gave two talks on this topic, at Open Source Summit/Embedded Linux Conference/OSPOCon 2021 [7] and at Open Networking & Edge Day Summit [8].

If you’d like to help with SBoM generation in ONAP, feel free to write to a.mazuruk@samsung.com. There is also a lot to do in ScanCode-toolkit as well as ScanCode.io; please take a look at their issue lists [9], [10] if you are interested in contributing!