We Built A Software Engineer
Written by Mike James
Wednesday, 20 March 2024
One of the most worrying things about being a programmer today is the threat from AI. It has gone so far that NVIDIA CEO Jensen Huang proclaims that you really shouldn't start training as a programmer because the writing is on the wall. So far, experience with AI programmers has been mixed. But what if you set out to build an LLM dedicated to programming, rather than just cashing in on the general-purpose models that are already available? This is what Cognition, a Peter Thiel-backed company, claims to have done, and the new engineer on the block is called Devin.

"Devin is an autonomous model that can plan, analyze, and execute complex code and software engineering tasks with a single prompt. It has its own command line, a code editor, and a separate web browser."

Compare this to the way that we use AI co-pilots at the moment. All of the current co-pilots are general-purpose LLMs that have been trained on a corpus of data from the web. They might be able to help you with a program, but they can also answer more general questions and engage in conversation with you about most topics. Devin, on the other hand, is focused on coding. The announcement doesn't make clear how it has been trained or what its architecture is, but this is what it does:

When a prompt is entered, Devin goes into “Planner” mode, where a step-by-step guide explains how to tackle the problem.
While we don't have much information on how it works, Cognition wants to convince us that it does work:

"We evaluated Devin on SWE-bench, a challenging benchmark that asks agents to resolve real-world GitHub issues found in open source projects like Django and scikit-learn. Devin correctly resolves 13.86%* of the issues end-to-end, far exceeding the previous state-of-the-art of 1.96%. Even when given the exact files to edit, the best previous models can only resolve 4.80% of issues."

At the moment, access to Devin is by invitation only, and there is little insight into what it is actually like to use. This is to be expected. When the first co-pilots became widely available, the press coverage was all good; then it turned a little sour, and now it seems to have settled on the realistic conclusion that co-pilots have some good points but aren't the whole deal.

Is Devin the whole deal? Without more detail it is difficult to say, but a simple consideration of what we do as programmers suggests that to solve coding you need to solve the general AI problem, and that restricting an AI programmer to just programming is likely to produce something worse, not better.

It is also clear that our jobs are not safe, but once there is a good replacement for a skilled and well-educated computer programmer/engineer, just about every other "thinking" job is going to be under threat from the same system. Yes, we are doomed, but so is everyone else.

More Information

Introducing Devin, the first AI software engineer

Related Articles

AI Code Assistants
Copilot Chat Improves Confidence and Enjoyment
GitHub Copilot Provides Productivity Boost
Copilot Research Asks Who Will Clean Up The Mess
GitHub Copilot Your Programming Pal
Last Updated (Wednesday, 20 March 2024)