Tabnine Adds Code Provenance And Attribution Checks

Written by Kay Ewbank

Tuesday, 07 January 2025

Tabnine has added a feature intended to reduce the risk of IP infringement. The new Provenance and Attribution feature checks that code suggested by AI code assistants doesn't use code with copyright restrictions.

Tabnine is a code completion tool that uses generative AI for automatic code completion. It has AI-powered tools for code generation, testing, and code review, and supports 80 programming languages and frameworks.


The Tabnine team says that while state-of-the-art LLMs like Claude 3.5 Sonnet and GPT-4o have greatly improved the performance of generative AI applications, including AI code assistants, they have also increased the risk of including code that is subject to copyright restrictions.

The reason is that the code these LLMs are trained on was collected without regard to restrictions on how it can be used. The training data includes content from code repositories. Some of these contain permissively licensed code, while others contain code whose use is restricted (for example, code under a copyleft license such as the GPL). Copyleft licenses grant freedoms to copy, modify and distribute a copyrighted work on condition that the same rights are passed on to works derived from the original.

Because LLMs tend to replicate patterns from their training data, third-party models like Claude 3.5 Sonnet and GPT-4o can regenerate code that exists in their training dataset, including copyleft-licensed code. If you inadvertently accept such a suggestion, you introduce non-permissively licensed code into your codebase, which can result in IP infringement.

The Tabnine developers say that since copyright law on the use of AI-generated content is still unsettled, there's a need to minimize the chance of including restricted code while still benefiting from the performance gains these models bring.

In recognition of this, Tabnine has announced Provenance and Attribution, a new feature that can drastically reduce the risk of IP infringement when using models like Anthropic's Claude, OpenAI's GPT-4o, and Cohere's Command R+ for software development. Tabnine now checks the code generated within its AI chat against publicly visible code on GitHub, flags any matches it finds, and references the source repository and its license type. Developers can then use this information to review code suggestions and decide whether they meet their organisation's specific requirements and policies.
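Tabnine hasn't said how its matching works internally, so the following is only a rough illustration of the general idea, not Tabnine's implementation. It uses k-token "shingle" overlap, a common approach in code clone detection. Known public snippets are indexed by their overlapping token windows, each tagged with a source repository and license. A suggestion is then flagged against any source that shares a large enough fraction of its windows. Everything here is hypothetical and chosen for the example: the `ProvenanceIndex` and `Source` names, the window size `k=5`, the 0.5 match threshold, the `PERMISSIVE` license set, and the repository names.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical policy: licenses this sketch treats as permissive.
PERMISSIVE = {"MIT", "Apache-2.0", "BSD-3-Clause"}

# Identifiers, numbers, or any single non-space character; whitespace is ignored,
# so reformatting alone does not hide a match.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def shingles(code: str, k: int = 5) -> set:
    """Return the set of k-token windows ("shingles") in a piece of code."""
    tokens = TOKEN_RE.findall(code)
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


@dataclass(frozen=True)
class Source:
    repo: str     # hypothetical repository name
    license: str  # SPDX-style license identifier


class ProvenanceIndex:
    """Toy index mapping each shingle to the public sources that contain it."""

    def __init__(self, k: int = 5):
        self.k = k
        self.index = defaultdict(set)

    def add(self, code: str, source: Source) -> None:
        for s in shingles(code, self.k):
            self.index[s].add(source)

    def check(self, suggestion: str, threshold: float = 0.5) -> list:
        """Return (source, match_fraction, is_permissive) for each flagged source."""
        sh = shingles(suggestion, self.k)
        if not sh:
            return []
        hits = defaultdict(int)
        for s in sh:
            for src in self.index.get(s, ()):  # .get avoids inserting empty keys
                hits[src] += 1
        flagged = []
        for src, n in hits.items():
            frac = n / len(sh)
            if frac >= threshold:
                flagged.append((src, frac, src.license in PERMISSIVE))
        return sorted(flagged, key=lambda r: -r[1])


# Example: index two tiny "public" snippets, then check an AI suggestion.
idx = ProvenanceIndex()
idx.add("def add(a, b):\n    return a + b\n", Source("example/gpl-utils", "GPL-3.0"))
idx.add("def mul(x, y):\n    return x * y\n", Source("example/mit-utils", "MIT"))

for src, frac, ok in idx.check("def add(a,b): return a+b"):
    print(src.repo, src.license, f"{frac:.0%}", "permissive" if ok else "needs review")
```

Here the suggestion differs from the GPL snippet only in whitespace, so every shingle matches. The script prints `example/gpl-utils GPL-3.0 100% needs review`. A production system would need an index over a very large body of public code and a more robust normalization step, and a policy layer that reflects each organisation's own license rules.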

Previously, Tabnine addressed this by offering a license-compliant model, Tabnine Protected 2, an LLM purpose-built for software development and trained exclusively on permissively licensed code. The new Provenance and Attribution feature offers an alternative for teams that are comfortable using a wider variety of models, as long as those models don't inject unlicensed code into their projects.


More Information

Tabnine Website

AI Code Assistants

Developers Positive About Using AI Tools
