Stack Overflow Considered Harmful?
Written by Sue Gee   
Wednesday, 11 October 2017

What proportion of Android apps in the Play store include security-related code snippets copied directly from Stack Overflow? Does the copied code increase or decrease application security?

These questions are addressed in the paper "Stack Overflow Considered Harmful? The Impact of Copy&Paste on Android Application Security" by a team of researchers from the Fraunhofer Institute for Applied and Integrated Security and the Center for IT-Security, Privacy and Accountability at Saarland University, led by Felix Fischer.


The results of the team's analysis are alarming: 15.4% of the 1.3 million Android applications analyzed in the study contained security-related code snippets from Stack Overflow. Of these, 97.9% contained at least one insecure code snippet.

Using a machine-learning approach, Fischer and his colleagues first identified all Android-related posts on Stack Overflow and then extracted the security-related code snippets from them. Code snippets are easy to identify because they are enclosed in <code> tags; a snippet was considered security-related if it called one of the following Java security libraries:

  • Cryptography: Java Cryptography Architecture (JCA), Java Cryptography Extension (JCE)

  • Secure network communications: Java Secure Socket Extension (JSSE), Java Generic Security Service (JGSS), Simple Authentication and Security Layer (SASL)

  • Public key infrastructure: X.509 and Certificate Revocation Lists (CRL) in java.security.cert, Java certification path API, PKCS#11, OCSP 

  • Authentication and access control: Java Authentication and Authorization Service (JAAS)

Snippets that referred to BouncyCastle or SpongyCastle (a repackaging of BouncyCastle for Android) were also included, as were snippets using the Apache TLS/SSL package that is part of the HttpClient library, one of the most used libraries on GitHub, and snippets that referenced keyczar and jasypt, two security libraries specifically designed with usability in mind. Finally, the analysis searched for snippets that used GNU Crypto. Because such snippets are difficult to integrate into Android, this inconvenient alternative was included to provide a contrast.
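To make the selection criterion concrete, here is a minimal Java sketch of the two steps described above: pulling the <code> blocks out of a post's HTML and flagging those that touch one of the listed libraries. It is a simplified keyword filter, not the researchers' machine-learning approach. The package prefixes and the short list of class names are our own assumed mapping of the libraries to Java packages, and the names SnippetFilter, extractSnippets and isSecurityRelated are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical keyword filter (not the researchers' classifier): extracts
 * <code> blocks from a post's HTML and flags those that reference one of the
 * security libraries listed in the article.
 */
public class SnippetFilter {

    // Contents of each <code>...</code> element; DOTALL lets '.' span newlines.
    private static final Pattern CODE_TAG =
            Pattern.compile("<code>(.*?)</code>", Pattern.DOTALL);

    // Assumed mapping from the listed libraries to Java package prefixes.
    private static final String[] SECURITY_PACKAGES = {
        "java.security",            // JCA, including java.security.cert (X.509, CRL, cert path)
        "javax.crypto",             // JCE
        "javax.net.ssl",            // JSSE
        "org.ietf.jgss",            // JGSS
        "javax.security.sasl",      // SASL
        "javax.security.auth",      // JAAS
        "org.bouncycastle",         // BouncyCastle
        "org.spongycastle",         // SpongyCastle
        "org.apache.http.conn.ssl", // Apache HttpClient TLS/SSL package
        "org.keyczar",              // keyczar
        "org.jasypt",               // jasypt
        "gnu.crypto"                // GNU Crypto
    };

    // Snippets often omit imports, so also match a few well-known class names (assumed list).
    private static final Pattern SECURITY_CLASSES = Pattern.compile(
            "\\b(Cipher|SecretKeySpec|IvParameterSpec|MessageDigest|SecureRandom|KeyStore"
            + "|KeyPairGenerator|SSLContext|TrustManager|X509TrustManager|HostnameVerifier"
            + "|CertificateFactory)\\b");

    /** Returns the raw text of every <code> element (HTML entities are not decoded). */
    public static List<String> extractSnippets(String postHtml) {
        List<String> snippets = new ArrayList<>();
        Matcher m = CODE_TAG.matcher(postHtml);
        while (m.find()) {
            snippets.add(m.group(1));
        }
        return snippets;
    }

    /** True if the snippet mentions a listed package or one of the known class names. */
    public static boolean isSecurityRelated(String snippet) {
        for (String pkg : SECURITY_PACKAGES) {
            if (snippet.contains(pkg)) {
                return true;
            }
        }
        return SECURITY_CLASSES.matcher(snippet).find();
    }

    public static void main(String[] args) {
        String post = "<p>Try this:</p><pre><code>Cipher c = Cipher.getInstance(\"AES\");\n</code></pre>"
                + "<p>then check <code>list.size()</code></p>";
        for (String snippet : extractSnippets(post)) {
            System.out.println(isSecurityRelated(snippet) + " : " + snippet.trim());
        }
    }
}
```

A real pipeline would need to resolve which API a call actually belongs to; matching names as plain text, as here, will produce both false positives and false negatives.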

This table summarizes the cryptographic libraries and their features:


Having uncovered over 4,000 security-related code snippets, the researchers used supervised learning to train a support vector machine (SVM) to distinguish between secure and insecure snippets. The training data was labeled using a conservative scheme that classified only definitely vulnerable snippets as insecure, specifically:

  • Outdated algorithms, or static initialization vectors and keys, in symmetric cryptography

  • Weak RSA keys in asymmetric cryptography

  • Insecure random number generation

  • Insecure SSL/TLS implementations
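The static-key case and the random-number case are easiest to see side by side. The Java sketch below is our own illustration, not code taken from the paper or from Stack Overflow: insecureEncrypt shows a hard-coded key with a static IV, while encrypt and decrypt use a key from KeyGenerator, a fresh IV from SecureRandom for every message, and AES in GCM mode. The class and method names are hypothetical, and AES-GCM with a 128-bit key is simply one common safer choice, not the researchers' labeling rule.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Illustration only: an insecure pattern next to a safer alternative (names are hypothetical). */
public class CryptoPatterns {

    // INSECURE: key and IV are constants compiled into every copy of the app.
    private static final byte[] STATIC_KEY = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
    private static final byte[] STATIC_IV = new byte[16]; // all zeros

    public static byte[] insecureEncrypt(byte[] plaintext) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/CBC/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(STATIC_KEY, "AES"),
                new IvParameterSpec(STATIC_IV));
        return cipher.doFinal(plaintext);
    }

    // Safer: SecureRandom (not java.util.Random) supplies key material and IVs.
    private static final SecureRandom RNG = new SecureRandom();
    private static final int IV_BYTES = 12;   // recommended GCM nonce length
    private static final int TAG_BITS = 128;  // GCM authentication tag length

    public static SecretKey newKey() throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128, RNG);
        return generator.generateKey();
    }

    /** Encrypts with AES-GCM under a fresh random IV and returns IV || ciphertext. */
    public static byte[] encrypt(SecretKey key, byte[] plaintext) throws Exception {
        byte[] iv = new byte[IV_BYTES];
        RNG.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        byte[] out = new byte[IV_BYTES + ciphertext.length];
        System.arraycopy(iv, 0, out, 0, IV_BYTES); // the IV is not secret
        System.arraycopy(ciphertext, 0, out, IV_BYTES, ciphertext.length);
        return out;
    }

    /** Splits off the IV, then decrypts and verifies the GCM tag. */
    public static byte[] decrypt(SecretKey key, byte[] message) throws Exception {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
        return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    }

    public static void main(String[] args) throws Exception {
        byte[] msg = "attack at dawn".getBytes(StandardCharsets.UTF_8);

        // Static key and IV: the same plaintext always yields the same ciphertext.
        System.out.println(Arrays.equals(insecureEncrypt(msg), insecureEncrypt(msg)));

        SecretKey key = newKey();
        byte[] first = encrypt(key, msg);
        byte[] second = encrypt(key, msg);
        System.out.println(Arrays.equals(first, second)); // differ because each IV is fresh
        System.out.println(new String(decrypt(key, first), StandardCharsets.UTF_8));
    }
}
```

With a static key and IV, identical plaintexts produce identical ciphertexts, and anyone who extracts the key from one copy of the app can decrypt data produced by every other copy.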


Overall processing pipeline: (1) Code extraction; (2) Filtering; (3) Classification; (4) Program dependency graph generation; (5) Clone detection.

With security-related code snippets extracted from Stack Overflow, filtered and classified as secure or insecure (steps 1 to 3 of the processing pipeline), the next task was to detect those snippets in compiled Android applications from Google Play (steps 4 and 5).

Because snippets are provided as source code and are usually not complete programs, whereas Android applications are only available as high-level binaries (i.e. DEX files), the researchers had to find a way of converting incomplete snippets into the same intermediate representation as the apps. Once this was done, they based snippet detection on finding similar Program Dependency Graphs (PDGs).
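The idea of matching a snippet against an app via PDGs can be illustrated with a deliberately simplified Java sketch. This is not the researchers' algorithm: each PDG is reduced to a set of labeled edges (statement, dependency kind, statement), and a snippet counts as found in an app method when most of its edges also occur in that method's PDG. The edge encoding, the containment score, the 0.8 threshold and the class name PdgMatch are all assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical illustration of PDG-based clone detection (not the paper's
 * algorithm). A PDG is flattened to a set of edges "from -KIND-> to", where
 * nodes are normalised statement labels and KIND is DATA or CONTROL.
 */
public class PdgMatch {

    // Assumed threshold: share of snippet edges that must appear in the method.
    static final double THRESHOLD = 0.8;

    public static String edge(String from, String kind, String to) {
        return from + " -" + kind + "-> " + to;
    }

    /** Fraction of the snippet's edges that also occur in the method's PDG. */
    public static double containment(Set<String> snippet, Set<String> method) {
        if (snippet.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(snippet);
        shared.retainAll(method);
        return (double) shared.size() / snippet.size();
    }

    public static boolean containsClone(Set<String> snippet, Set<String> method) {
        return containment(snippet, method) >= THRESHOLD;
    }

    public static void main(String[] args) {
        // Snippet: get an SSLContext, initialise it with custom trust managers, use its socket factory.
        Set<String> snippet = Set.of(
                edge("SSLContext.getInstance", "DATA", "SSLContext.init"),
                edge("new TrustManager[]", "DATA", "SSLContext.init"),
                edge("SSLContext.init", "DATA", "getSocketFactory"));

        // App method containing the same dependencies plus extra code around them.
        Set<String> appMethod = new HashSet<>(snippet);
        appMethod.add(edge("getSocketFactory", "DATA", "setDefaultSSLSocketFactory"));
        appMethod.add(edge("if (debug)", "CONTROL", "Log.d"));

        // Unrelated method sharing only one dependency with the snippet.
        Set<String> otherMethod = Set.of(
                edge("SSLContext.getInstance", "DATA", "SSLContext.init"),
                edge("KeyStore.load", "DATA", "TrustManagerFactory.init"));

        System.out.println(containment(snippet, appMethod));     // 1.0
        System.out.println(containsClone(snippet, otherMethod)); // false (1 of 3 edges)
    }
}
```

Containment rather than a symmetric similarity score is used here because a pasted snippet is typically embedded in a larger method, so the app side of the comparison has many extra edges.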

For the final phase of the research, the snippet-detection toolchain was applied to 1,305,820 free Android apps on Google Play. Overall, copied and pasted snippets were detected in 200,672 (15.4%) apps, and the majority of these snippets came from questions rather than answers. At least one insecure snippet was found in 196,403 (15%) apps, and the top offending snippet, which uses an insecure custom TrustManager, was found in 180,388 (13.81%) apps.


The remaining insecure snippets were found in 43,941 (3.37%) distinct apps.
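The article describes the top offending snippet only as an insecure custom TrustManager. A common form of that anti-pattern, shown here as our own illustration rather than the snippet identified in the paper, is an X509TrustManager whose check methods are empty, so every certificate chain is accepted and TLS no longer authenticates the server. The Java sketch below contrasts it with the platform's default trust manager obtained from TrustManagerFactory. The class and helper names are hypothetical, and probing with an empty chain is just a convenient, deterministic way to show the difference on a desktop JVM.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

/** Contrasts a trust-all TrustManager with the platform default (illustrative names). */
public class TrustManagerDemo {

    // INSECURE anti-pattern: accepts any certificate chain, so a man-in-the-middle
    // presenting its own certificate goes unnoticed.
    public static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) { }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    /** Default trust manager, which validates chains against the system trust store. */
    public static X509TrustManager platformDefault() throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null); // null selects the default trust store
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return (X509TrustManager) tm;
            }
        }
        throw new IllegalStateException("no X509TrustManager available");
    }

    /** True if the trust manager accepts a server "chain" containing no certificates at all. */
    public static boolean acceptsEmptyChain(X509TrustManager tm) {
        try {
            tm.checkServerTrusted(new X509Certificate[0], "RSA");
            return true;
        } catch (Exception e) { // CertificateException or IllegalArgumentException
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("trust-all accepts empty chain: " + acceptsEmptyChain(TRUST_ALL));
        System.out.println("default accepts empty chain:   " + acceptsEmptyChain(platformDefault()));
    }
}
```

The usual fix is simply not to install a custom TrustManager at all: the default one already validates the server's chain against the system trust store.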

The researchers also found that 506,922 (38.82%) apps contained a secure snippet. The most frequent secure snippet was detected in 408,011 (31.24%) apps, while the remaining secure snippets were found in 73,839 (5.65%) apps. They report that, on average, an insecure snippet is found in 4,539.96 apps, whereas a secure snippet is found in 10,719.83 apps.
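The app-level percentages above are relative to the 1,305,820 apps analyzed, while the 97.9% figure quoted at the start is the share of the 200,672 affected apps that contain an insecure snippet (196,403 / 200,672). A small Java check using only the counts reported in the article; the class and helper names are ours:

```java
import java.util.Locale;

/** Recomputes the article's percentages from the reported counts (helper names are hypothetical). */
public class ShareCheck {

    static final int TOTAL_APPS = 1_305_820;

    /** part / whole as a percentage, rounded half-up to the given number of decimals. */
    public static String percent(int part, int whole, int decimals) {
        return String.format(Locale.ROOT, "%." + decimals + "f%%", 100.0 * part / whole);
    }

    /** Share of all analyzed apps, to two decimals. */
    public static String share(int apps) {
        return percent(apps, TOTAL_APPS, 2);
    }

    public static void main(String[] args) {
        System.out.println("apps with copied snippets:     " + share(200_672)); // 15.37%
        System.out.println("apps with an insecure snippet: " + share(196_403)); // 15.04%
        System.out.println("top insecure TrustManager:     " + share(180_388)); // 13.81%
        System.out.println("remaining insecure snippets:   " + share(43_941));  // 3.37%
        System.out.println("apps with a secure snippet:    " + share(506_922)); // 38.82%
        // Share of the 200,672 affected apps that contain an insecure snippet.
        System.out.println("insecure among affected apps:  " + percent(196_403, 200_672, 1)); // 97.9%
    }
}
```

Rounded to one decimal, the first figure gives the 15.4% quoted in the article.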

Summing up, they write:

So should Stack Overflow be considered harmful?
From classical risk evaluation perspective, the answer to this question depends on domain specific assets: A banking application with flawed cryptographic key initialization causes severe damage to the respective bank, even if the application has a relatively small user group. The same flaw in a set of popular gaming apps with very high download counts might not represent a major threat to the individual game developer studios, but has the potential to impact the Android ecosystem on a large scale. So depending on perspective, domain specific assets, and associated risks, the concrete threats posed by copying crowd-sourced code into applications must be evaluated individually.
Finally, we want to stress the benefits of including secure code snippets into real-world applications. We identified several secure code snippets in critical applications, which is of great good for the community.

 

The main findings of the paper were presented by Felix Fischer at the 38th IEEE Symposium on Security and Privacy, held in San Jose in May 2017.

 

More Information

Stack Overflow Considered Harmful? The Impact of Copy & Paste on Android Application Security (pdf on arxiv.org)

Resources for paper on Fraunhofer AISEC site 

Related Articles

Stack Overflow: A Code Laundering Platform?

Do You Have To Attribute Stack Overflow Code?

Stack Overflow Reveals Hiring Trends

Stack Overflow Helps You Tell Your Story To Land Your Next Job

Stack Overflow Developer Survey 2016

Devpost Diversifies Into Developer Jobs 

Stack Overflow Developer Characteristics

 
