Security & Privacy
Introduction: The need to improve transparency and consistency in mobile app privacy
Can you tell what personal data your mobile app collects and uses?
While many mobile applications persistently collect various data from users, it is generally unclear what information is collected and how this information is used. Privacy policies are intended to outline these details by disclosing in legal terms, the usage, management and collection of data. However, despite the original intent, it is largely unclear to the user in many instances, thus making it harder for the user to trust the application with his/her personal information. Some users may favor an application collecting certain personal information to offer better services, while others may be interested in targeted recommendations and offers personalized for the user.
Figure 1. Inconsistency between user expectation and reality about purposes of data collection.
The current state of privacy policies
Thus, it is our goal to use technology to automate this process so that we can help the consumer, developer, regulator and policy-maker better understand privacy policies and application behaviors.
Figure 2. PurPliance Overview
Figure 3. PurPliance system workflow.
Figure 4. Distribution of purpose classes in the privacy statements and data flows of mobile apps.
Constructing data flows from network data traffic
PurPliance also creates data flows from the results of data traffic analysis. A data flow contains information about the recipient, the data type and the data usage purpose. We re-implemented a prior work and made improvements to the original approach, from which data types and purposes can be inferred from data content, destination and app description. Data purposes of the prior work have been adapted to our purpose taxonomy that we described earlier. In our survey of data flows of 17.1k Android apps, device information is the most popular data type. Figure 5 shows the distribution of data types in applications’ data flows.
Figure 5. Distribution of data types in apps' data flows.
Finding contradictions between the statement and actual data flow
We use provable techniques to detect these contradictions made possible by PurPliance’s programmatic analysis. We constructed a rich set of relationship tables between different purposes and data types, to deal with the various subtleties in wording.
We conducted a survey of approximately 17k Android applications from Google Play Store. We found potential inconsistency between data flows and privacy statements in nearly 70% of these applications. This indicates prevalence of inconsistencies in mobile application data practices.
Below is the summary of numeric details for our experimental findings:
• Applications: 23k applications selected from Google Play Store
• 16.8k policies with 1.4M sentences; 1.7M network requests from 17.1k applications
• Policy contradiction: Found 29k potentially contradictory sentence pairs in 3.1k (18.14%) of the policies surveyed
• Flow-statement inconsistency: Found 95.1k (13.56%) potentially inconsistent data flows in 11.4k (69.66%) applications
Details about the purpose taxonomy and consistency analysis can be found in the full paper.
 Haojian Jin, Minyi Liu, Kevan Dodhia, Yuanchun Li, Gaurav Srivastava, Matthew Fredrikson, Yuvraj Agarwal, and Jason I. Hong. 2018. Why Are They Collecting My Data?: Inferring the Purposes of Network Traffic in Mobile Apps. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT ’18). doi: 10.1145/3287051