Tabnine Adds Code Provenance And Attribution Checks
Written by Kay Ewbank   
Tuesday, 07 January 2025

Tabnine has added a feature intended to reduce the risk of IP infringement. The new Provenance and Attribution feature checks whether code suggested by AI code assistants reproduces code that carries copyright restrictions.

Tabnine is a code completion tool built on generative AI. It has AI-powered tools for code generation, testing, and code review, and supports 80 programming languages and frameworks.


The Tabnine team says that while state-of-the-art LLMs like Claude 3.5 Sonnet and GPT-4o have greatly improved the performance of generative AI applications, including AI code assistants, they have also increased the risk of including code that is copyright restricted.

The reason is that the code these LLMs are trained on is collected without taking into account restrictions on how it can be used. The data the models are trained on includes content from code repositories, some of which contain permissively licensed code, while others contain code that has restrictions on how it can be used (for example, code with copyleft licensing like GPL). Copyleft licensing grants some freedoms over copies of copyrighted works so long as the same rights are passed on to works derived from the original.

Because LLMs tend to replicate patterns from their training data, third-party models like Claude 3.5 Sonnet and GPT-4o can regenerate code that exists in their training dataset, including code with copyleft licensing. If you inadvertently accept such a suggestion, you introduce non-permissively licensed code into your codebase, which can result in IP infringement.

The Tabnine developers say that since copyright law on the use of AI-generated content is still unsettled, there's a need to minimize the chance of including restricted code while still benefiting from the performance gains that come from these models.

In recognition of this, Tabnine has announced Provenance and Attribution, a new feature that it says can drastically reduce the risk of IP infringement when using models like Anthropic's Claude, OpenAI's GPT-4o, and Cohere's Command R+ for software development. Tabnine now checks the code generated within its AI chat against the publicly visible code on GitHub, flags any matches it finds, and references the source repository and its license type. Developers can then use this information to review code suggestions and decide if they meet the organisation's specific requirements and policies.
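To make the idea concrete, here is a minimal Python sketch of how a match-and-flag check of this kind could work in principle. It is not Tabnine's implementation, which has not been described in that level of detail. The token-window comparison, the 0.6 threshold, the small set of license identifiers, the repository names, and every function and class name below are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass

# Illustrative, non-exhaustive set of permissive SPDX license identifiers.
# Anything outside this set is simply flagged for human review.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause"}


@dataclass(frozen=True)
class Match:
    repo: str           # repository the matching code came from
    license: str        # license identifier recorded for that repository
    overlap: float      # share of the suggestion's token windows found in the source
    needs_review: bool  # True when the license is not in PERMISSIVE


def shingles(code: str, k: int = 5) -> set:
    """Split code into crude tokens and return the set of k-token windows."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", code)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def check_provenance(suggestion: str, corpus, threshold: float = 0.6) -> list:
    """Compare a suggestion with (repo, license, source) entries and report matches.

    `corpus` stands in for an index of public code; a real service would
    query a far larger index rather than scan a list.
    """
    wanted = shingles(suggestion)
    if not wanted:
        return []
    matches = []
    for repo, license_id, source in corpus:
        overlap = len(wanted & shingles(source)) / len(wanted)
        if overlap >= threshold:
            matches.append(
                Match(repo, license_id, overlap, license_id not in PERMISSIVE)
            )
    return sorted(matches, key=lambda m: m.overlap, reverse=True)


# Hypothetical corpus: repository names and snippets are made up.
corpus = [
    ("example/gpl-utils", "GPL-3.0-only",
     "def clamp(value, low, high):\n    return max(low, min(value, high))\n"),
    ("example/mit-math", "MIT",
     "def add(a, b):\n    return a + b\n"),
]

suggestion = "def clamp(value, low, high): return max(low, min(value, high))"
for match in check_provenance(suggestion, corpus):
    print(match.repo, match.license, match.needs_review)
```

The sketch only reports the source and license of each match; as with the feature described above, the decision about whether to accept the code is left to the developer and the organisation's policies.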

In the past, Tabnine addressed this by offering a license-compliant model, Tabnine Protected 2, an LLM purpose-built for software development and trained exclusively on code that is permissively licensed. The new Provenance and Attribution feature offers an alternative for teams that are comfortable using a wider variety of models, as long as those models don't inject code with licensing restrictions.


More Information

Tabnine Website

Related Articles

AI Code Assistants

Developers Positive About Using AI Tools


Last Updated ( Tuesday, 07 January 2025 )