Consistency Analysis of Data Usage Purposes in Mobile Apps
Published
ACM Symposium on Computer and Communications Security (CCS)
Abstract
While privacy laws and regulations require apps and services to disclose the purposes of their data collection to the users (i.e., why do they collect my data?), the data usage in an app's actual behavior does not always comply with the purposes stated in its privacy policy. Automated techniques have been proposed to analyze apps' privacy policies and their execution behavior, but they often overlooked the purposes of the apps' data collection, use and sharing. To mitigate this oversight, we propose PurPliance, an automated system that detects the inconsistencies between the data-usage purposes stated in a natural language privacy policy and those of the actual execution behavior of an Android app. PurPliance analyzes the predicate-argument structure of policy sentences and classifies the extracted purpose clauses into a taxonomy of data purposes. Purposes of actual data usage are inferred from network data traffic. We propose a formal model to represent and verify the data usage purposes in the extracted privacy statements and data flows to detect policy contradictions in a privacy policy and flow-to-policy inconsistencies between network data flows and privacy statements. Our evaluation results of end-to-end contradiction detection have shown PurPliance to improve detection precision from 19% to 95% and recall from 10% to 50% compared to a state-of-the-art method. Our analysis of 23.1k Android apps has also shown PurPliance to detect contradictions in 18.14% of privacy policies and flow-to-policy inconsistencies in 69.66% of apps, indicating the prevalence of inconsistencies of data practices in mobile apps.