We Built A Software Engineer
Written by Mike James   
Wednesday, 20 March 2024

One of the most worrying things about being a programmer today is the threat from AI. It has gone so far that NVIDIA CEO Jensen Huang proclaims that you really shouldn't start training as a programmer because the writing is on the wall. So far the experience of AI programmers has been mixed, but what if you set out to build an LLM dedicated to programming and not just cashed in on those that are generally available?

This is what Cognition, a Peter Thiel-backed company, claims to have done, and the new engineer on the block is called Devin.

"Devin is an autonomous model that can plan, analyze, and execute complex code and software engineering tasks with a single prompt. It has its own command line, a code editor, and a separate web browser."

Compare this to the way that we use AI co-pilots at the moment. All of the current co-pilots are general purpose LLMs that have been trained on a corpus of data from the web. They might be able to help you with a program, but they can also answer more general questions and generally engage in conversation about most topics with you. Devin on the other hand is focused on coding. The details of how it has been trained or its architecture aren't clear from the announcement, but this is what it does:

When a prompt is entered, Devin goes into “Planner” mode, where a step-by-step guide explains how to tackle the problem. Its interface then presents several sections:

    • the first holds all the input prompts
    • the second is the command line
    • the third is its own code editor
    • the fourth is its own browser, which thoroughly analyzes resources to derive inferences
    • finally, it gives a visualization of the solution.

While we don't have much information on how it works, Cognition wants to convince us that it does work:

"We evaluated Devin on SWE-bench, a challenging benchmark that asks agents to resolve real-world GitHub issues found in open source projects like Django and scikit-learn.

"Devin correctly resolves 13.86%* of the issues end-to-end, far exceeding the previous state-of-the-art of 1.96%. Even when given the exact files to edit, the best previous models can only resolve 4.80% of issues."
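To put "far exceeding" into numbers, the quoted percentages can be compared directly. The following is only a back-of-the-envelope sketch: the function name `improvement_factor` is made up for illustration, and the only inputs are the three figures quoted by Cognition.

```python
def improvement_factor(new_rate: float, old_rate: float) -> float:
    """Return how many times larger new_rate is than old_rate."""
    return new_rate / old_rate


# Resolution rates (percent of SWE-bench issues) as quoted by Cognition.
devin_unassisted = 13.86
previous_unassisted = 1.96
previous_assisted = 4.80  # earlier models that were told which files to edit

# Like-for-like comparison: both systems work unassisted.
print(f"{improvement_factor(devin_unassisted, previous_unassisted):.1f}x")  # 7.1x

# Devin unassisted against earlier models that were given the files.
print(f"{improvement_factor(devin_unassisted, previous_assisted):.1f}x")  # 2.9x
```

So the claim amounts to roughly a sevenfold improvement on the unassisted task, although it still means that about 86% of the issues go unresolved.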

At the moment, trying Devin out is invitation-only and there isn't much insight into how it feels to use. This is to be expected. When the first co-pilots became widely available the press was all good, then it turned a little sour, and now it seems to have settled on the realistic conclusion that there are some good things about co-pilots but they aren't the whole deal.

Is Devin the whole deal?

Without more detail it is difficult to say, but a simple consideration of what we do suggests that to solve coding you need to solve the general AI problem. Restricting an AI programmer to just programming is therefore likely to produce something worse, not better. It is also clear that our jobs are not safe, but once there is a good replacement for a skilled and well-educated computer programmer/engineer, just about every other "thinking" job is going to be under threat from the same system.

Yes, we are doomed - but so is everyone else.

More Information

Introducing Devin, the first AI software engineer

Related Articles

AI Code Assistants
Amazon's AI Wake-Up - Free Code Assistant

Copilot Chat Improves Confidence and Enjoyment

GitHub Copilot Provides Productivity Boost

The Impact of Copilot Chat

Copilot Research Asks Who Will Clean Up The Mess

Codex - English To Code

GitHub Copilot Your Programming Pal

AI Helps Generate Buggy Code


Last Updated ( Wednesday, 20 March 2024 )