We live in an online world where users sign up for multiple online accounts yet constantly worry about identity theft through those accounts. User credentials are an especially important aspect of an online user's security, and according to a recent survey on users' password habits conducted by Data Insider, 44% of people change their passwords only once a year or less. Although people find it extremely cumbersome to change passwords to protect their credentials, we see the opposite behavior when it comes to managing credentials across multiple websites: 61% of people admitted to reusing passwords across multiple websites because of the difficulty of remembering a separate password for each site. Reusing the same password can have catastrophic consequences; in the past, many celebrity accounts have been hacked because of the reuse of a leaked password. Whether it is the unbearable burden of password management or the reuse of the same password across websites, we need better ways to manage our online accounts.
There are ongoing efforts to improve the security of online accounts. Moving away from passwords is one key initiative, motivated by the problems of managing passwords from the perspectives of both users and online service providers. Biometrics are one promising password replacement, and many experts believe they will be pivotal. Biometrics, if well protected, are extremely hard to recreate, while relieving users of the burden of remembering credentials. However, biometrics have one key drawback: they are hard to revoke because of their immutable nature. Combining biometrics with mutable data sources addresses this issue, and several techniques that do so have demonstrated robustness against various cyber-attacks.
For a biometric to be handled as a credential that secures an online account, we need to construct cryptographic primitives from the biometric input. Doing so enables formal proofs and stronger assurances for the protocols and designs that use the biometric as a credential.
One promising cryptographic technique that produces meaningful security primitives is the “fuzzy extractor.” A fuzzy extractor is a method to extract a fixed cryptographic key from a user’s biometric readings even when those readings are close but not identical to each other. This key can be used for user authentication or for encrypting data stored in a cloud. Let us explain the fuzzy extractor, shown in Figure 1, in more detail.
The fuzzy extractor consists of two key functions: “Generator” and “Reproducer.” The generator takes a biometric as input and produces a secret key together with public helper data, which is essential for key recovery. The reproducer reconstructs the secret key from a (possibly noisy) biometric and the public helper data. The fuzzy extractor ensures that only the user whose biometric generated the public helper data can reproduce the correct secret key, while no one can recover the biometric or the secret key from the public helper data alone.
Figure 1. Fuzzy extractor with 2 sub-algorithms
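To make the two functions concrete, here is a minimal Python sketch of the classic code-offset construction, using a simple repetition code as the error-correcting building block. The names `gen`/`rep` and all parameters are our own illustration of the generic idea, not the exact scheme discussed in this article:

```python
import hashlib
import secrets

R = 5           # repetition factor: each block tolerates up to 2 bit-flips
K = 16          # number of random message bits behind the secret key

def ecc_encode(bits):
    # repetition code: repeat each message bit R times
    return [b for b in bits for _ in range(R)]

def ecc_decode(bits):
    # majority vote over each block of R bits
    return [int(sum(bits[i * R:(i + 1) * R]) > R // 2)
            for i in range(len(bits) // R)]

def gen(w):
    """Generator: biometric bits w -> (secret key, public helper data)."""
    msg = [secrets.randbelow(2) for _ in range(K)]
    c = ecc_encode(msg)                              # random codeword
    helper = [wi ^ ci for wi, ci in zip(w, c)]       # code-offset: w XOR c
    key = hashlib.sha256(bytes(msg)).hexdigest()     # hash of codeword = key
    return key, helper

def rep(w_noisy, helper):
    """Reproducer: noisy biometric + helper -> secret key (if close enough)."""
    c_noisy = [wi ^ hi for wi, hi in zip(w_noisy, helper)]
    msg = ecc_decode(c_noisy)                        # absorb the bit-flips
    return hashlib.sha256(bytes(msg)).hexdigest()
```

With a biometric of `K * R = 80` bits, up to two flipped bits in any block of five still reproduce the same key, while the helper data is the template masked by a random codeword.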
There are additional security objectives that an end-to-end fuzzy extractor system should meet. The most significant is the data sovereignty of the user's biometric; said differently, the user should have direct control over his or her biometric. Handling the data securely also requires secure management of the cryptographic key. Especially in a multi-device environment, a fuzzy extractor is the most promising candidate with respect to the following requirements: 1) secure multi-device access, 2) access limited to a specific user, and 3) a publicly storable template. Figure 2 shows the application of a fuzzy extractor to a cloud environment. In the upload phase, a user generates a secret key and public helper data from a biometric (such as a face). The user then encrypts personal data, such as images, videos, and documents, with this key. The encrypted personal data is uploaded to the cloud along with the public helper data. In the download phase, the encrypted personal data and the public helper data are downloaded onto a device. Using the user's biometric captured on that device together with the public helper data, the secret key is reproduced and the personal data is recovered.
Figure 2. Application of fuzzy extractor to secure cloud system
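The upload/download flow can be sketched in a few lines. The stream cipher below is illustration-only (a SHA-256 counter keystream) so that the example stays dependency-free; a real deployment would use an authenticated cipher such as AES-GCM, and `gen`/`rep` stand in for the fuzzy extractor's two functions from Figure 1:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    # illustration-only keystream; real systems should use AES-GCM instead
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def xor_crypt(key: bytes, data: bytes) -> bytes:
    # XOR with the keystream; applying it twice restores the plaintext
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Upload phase (pseudocode around the cipher):
#   key, helper = gen(face_template)                  # Generator
#   cloud = {"blob": xor_crypt(key, photo), "helper": helper}
# Download phase:
#   key = rep(noisy_face_template, cloud["helper"])   # Reproducer
#   photo = xor_crypt(key, cloud["blob"])
```

The cloud only ever stores the ciphertext and the public helper data, so neither the biometric nor the key leaves the user's devices.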
With the end-to-end fuzzy extractor system described above, a user's personal data can be securely protected and revealed only on biometrically authenticated and authorized devices. Now, let's talk about some of the challenges in designing a fuzzy extractor.
The main challenge in putting together a fuzzy extractor is applying privacy protection techniques to templates while minimizing the loss of matching accuracy. Thus, we describe the details of template composition. If the templates are bit strings, the promising candidates are binary error correction codes such as Reed-Solomon, BCH (Bose–Chaudhuri–Hocquenghem), and Hamming codes, which are used for controlling errors in binary data sent over unreliable and noisy communication channels. An original template w is mapped onto a random codeword, and the codeword's hash value becomes the secret key. During reproduction, the codeword is recovered by running error correction on a noisy template w′, and the secret key is then derived by hashing the result. There are several variants for bit-string templates, and implementations have shown practicality on biometrics such as the iris.
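As a concrete example of the binary building block, here is a textbook Hamming(7,4) encoder/decoder; it corrects any single bit-flip per 7-bit block, which is the kind of error correction applied to a noisy binary template (a classroom sketch, not the specific code used in any deployed system):

```python
def hamming74_encode(d):
    # d: 4 data bits -> 7-bit codeword with parity bits at positions 1, 2, 4
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def hamming74_decode(c):
    # the syndrome is the 1-indexed position of a single flipped bit (0 = none)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    c = list(c)
    if pos:
        c[pos - 1] ^= 1          # correct the flipped bit
    return [c[2], c[4], c[5], c[6]]
```

A noisy template offset by such a codeword can therefore absorb one bit-flip per block before the derived key changes.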
On the other hand, there is no error correction algorithm for real-valued templates, such as deep learning–based face templates. To devise a fuzzy extractor for a real-valued vector, one may convert the real-valued templates into binary templates. However, this loses discriminative information from the original template and degrades matching performance. To improve both matching performance and security, several studies have proposed CNN-based approaches that minimize intra-user variability and maximize inter-user variability using neural networks. These works essentially require one or more face images of the target user when training the neural network, a requirement that is impractical in some applications, such as dynamic “Generation” systems. Furthermore, we should consider the privacy of the public helper data. Because attacks on the public helper data may leak the secret key or the biometric, we should analyze several mathematical attacks, including brute-force attacks.
Figure 3. Binarization strategy to apply error correction code to real-valued templates
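To see why binarization hurts, compare the fine-grained cosine similarity of two real-valued templates with the Hamming distance of their sign-binarized versions. This is our own toy experiment with synthetic vectors, not a result from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
unit = lambda x: x / np.linalg.norm(x)

t = unit(rng.normal(size=128))                 # enrolled real-valued template
probe = unit(t + 0.03 * rng.normal(size=128))  # noisy reading, same user

cos_real = float(t @ probe)                    # fine-grained similarity

# sign-based binarization keeps only one bit per coordinate
b_t = (t > 0).astype(int)
b_probe = (probe > 0).astype(int)
hamming = int(np.sum(b_t != b_probe))          # coordinates near 0 flip sign

print(f"cosine similarity: {cos_real:.3f}, Hamming distance: {hamming}/128")
```

Coordinates near zero flip their sign under even tiny noise, so the binary representation is noisier than the real-valued one and the matcher loses discriminative power.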
We designed a novel real-valued error-correcting code (ECC) that retains the advantages of state-of-the-art biometric recognition, which measures the closeness of two templates by an angular distance metric (cosine similarity). Here, we explain the idea briefly using Figure 4. For an original template t on the unit n-sphere Sⁿ⁻¹, our construction generates a real-valued codeword c on Sⁿ⁻¹ and a linear transformation P mapping t to c, which has the form of an n-by-n matrix. The hashed value H(c) is the secret key, and P is the public helper data. If we choose P to be an orthogonal matrix, then it is an isometry that preserves inner products, so any noisy template t′ at distance d from t is transformed by P into a vector at the same distance from c. Thus, the transformation imposes no additional performance degradation.
Figure 4. How the error-correcting code for real-valued template works
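The isometry property is easy to check numerically. In this sketch of ours, we build P as a Householder reflection, which is one simple way to obtain an orthogonal matrix sending t to c (the actual construction in the paper may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
unit = lambda x: x / np.linalg.norm(x)

t = unit(rng.normal(size=n))     # original template on S^{n-1}
c = unit(rng.normal(size=n))     # random real-valued codeword on S^{n-1}

# Householder reflection: orthogonal P with P @ t == c (assumes t != c)
v = t - c
P = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)

t_noisy = unit(t + 0.05 * rng.normal(size=n))   # noisy probe template

# P is an isometry: the angle between t and t' equals the angle
# between c and P @ t', so matching accuracy is unchanged
assert np.allclose(P @ t, c)
assert np.allclose(t @ t_noisy, c @ (P @ t_noisy))
```

Since only P and H(c) are published, anyone can apply P to a probe, but recovering t or c from P alone is the hard problem the construction's security rests on.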
To clarify the design rationale of an error-correcting code for real-valued templates, we list the requirements that the ECC should satisfy. 1) (Discriminative): All codewords are well spread over Sⁿ⁻¹, so that any two codewords are sufficiently far from each other. 2) (Efficiently Decodable): There exists an efficient algorithm, Decode, that takes any vector in Sⁿ⁻¹ as input and returns the codeword with the shortest angular distance from the input vector.
3) (Template-Protecting): The number of codewords is sufficiently large to prevent brute-force attacks, e.g., 2⁸⁰ for 80-bit security. We devised a new ECC satisfying all three requirements above; for curious readers, we leave the details of the concrete construction to the full paper.
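The Decode requirement can be pictured with a brute-force stand-in: return the codeword maximizing cosine similarity with the input. A real construction must achieve this without enumerating ~2⁸⁰ codewords, so the tiny random codebook below is purely for intuition and entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
unit_rows = lambda m: m / np.linalg.norm(m, axis=1, keepdims=True)

# toy codebook: 16 random unit vectors in R^8 (a real code needs ~2^80
# codewords, with structure that allows decoding without enumeration)
codebook = unit_rows(rng.normal(size=(16, 8)))

def decode(x):
    # nearest codeword by angular distance == maximum cosine similarity
    x = x / np.linalg.norm(x)
    return int(np.argmax(codebook @ x))

# a lightly perturbed codeword decodes back to itself
noisy = codebook[3] + 0.01 * rng.normal(size=8)
print(decode(noisy))
```

The (Discriminative) requirement is what makes this map well defined in the presence of noise: if codewords are far apart, a noisy codeword still has a unique nearest neighbor.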
We provide a new error-correcting code for real-valued templates, which removes the degradation of biometric matching performance. Unlike previous constructions based on binary error-correcting codes, ours fully enjoys the merits of the underlying recognition system. Furthermore, it is applicable to any fuzzy data whose matching metric is angular distance (cosine similarity), and we believe our construction provides high accuracy and strong privacy when using a cloud system. Details on how we designed and verified our fuzzy extractor can be found in the full paper.