AI vs. Freelance Coders: Who Wins?
Source: theregister.com
Freelance coders can rest easy, for now. AI models are capable of performing many of the real-world coding tasks that companies outsource, but they aren't as effective as human coders. That's according to researchers at PeopleTec, who compared how four LLMs performed on freelance coding jobs a couple of months ago.
David Noever, chief scientist, and Forrest McKee, AI/ML data scientist, both at PeopleTec, detailed their project in a paper titled "Can AI Freelancers Compete? Benchmarking Earnings, Reliability, and Task Success at Scale."
The researchers used a Kaggle dataset of Freelancer.com jobs to create 1,115 programming and data analysis challenges that could be automatically evaluated. The programming tasks were assigned an average monetary value of $306 (median $250). The paper stated that completing every job would be worth about $1.6 million.
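The article does not reproduce the researchers' harness code, so the snippet below is only a minimal sketch of how a pay-weighted, automatically graded benchmark of this kind might be scored. The task format, the `solve` entry-point name, and the test data are all assumptions for illustration, not details from the paper.

```python
# Minimal sketch of a pay-weighted coding benchmark harness (illustrative only;
# task format, entry-point name, and values are assumptions, not the paper's).
from dataclasses import dataclass

@dataclass
class Task:
    prompt: str        # freelance-style job description
    tests: list        # (input, expected_output) pairs
    value_usd: float   # dollar value taken from the job listing

def solve_with_model(task: Task) -> str:
    """Stand-in for an LLM call that returns candidate Python source code."""
    raise NotImplementedError

def run_task(task: Task) -> bool:
    """A task only counts (and 'pays out') if every test passes."""
    try:
        source = solve_with_model(task)   # model writes the code
        namespace: dict = {}
        exec(source, namespace)           # execute it (sandbox this in practice)
        solve = namespace["solve"]        # assumed entry-point name
        return all(solve(x) == expected for x, expected in task.tests)
    except Exception:
        return False                      # any failure forfeits the task's value

def evaluate(tasks: list) -> tuple:
    """Return (fraction of tasks solved, theoretical dollars earned)."""
    solved = [t for t in tasks if run_task(t)]
    return len(solved) / len(tasks), sum(t.value_usd for t in solved)
```

Read this way, a model's "earnings" are simply the summed prices of the tasks whose generated code passes every test, which is how the dollar figures that follow should be interpreted.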
AI Model Performance
The team evaluated Claude 3.5 Haiku, GPT-4o-mini, Qwen 2.5, and Mistral 7B, with the first two being commercial models and the last two open source. The researchers estimate that a human software engineer could solve over 95 percent of the challenges. None of the models performed that well, but Claude came the closest.
"Claude 3.5 Haiku narrowly outperformed GPT-4o-mini, both in accuracy and in dollar earnings," the paper states. Claude was able to capture about $1.52 million in theoretical payments out of a possible $1.6 million. It solved 877 tasks with all tests passing, which is 78.7 percent of the benchmark.
GPT-4o-mini solved 862 tasks (77.3 percent). Qwen 2.5 was third, solving 764 tasks (68.5 percent), and Mistral 7B solved 474 tasks (42.5 percent).
SWE-Lancer Benchmark
Noever said the project was created in response to OpenAI's SWE-Lancer benchmark, published in February. He said that the benchmark had accumulated a million dollars' worth of software tasks, making it unlike any other benchmark they'd seen, and they wanted to make it more universal.
The models had less success on OpenAI's SWE-Lancer benchmark than on the researchers' benchmark, possibly because the problems in the OpenAI study were more difficult. The payouts in OpenAI's SWE-Lancer study, against a total work value of $1 million, came to $403,325 for Claude 3.5 Sonnet, $380,350 for o1, and $303,525 for GPT-4o.
On a subset of tasks in the OpenAI study, even the best-performing model fell well short. The OpenAI paper says that Claude 3.5 Sonnet earns $208,050 on the SWE-Lancer Diamond set and resolves 26.2 percent of IC SWE issues; however, the majority of its solutions are incorrect.
AI Assisting Freelancers
While AI models can't replace freelance coders yet, Noever said people are using them to help with freelance software engineering tasks. He believes complete automation is coming soon, possibly within months.
According to Noever, AI models are being used to generate and answer freelance job requirements, and to score the answers. It's AI all the way down, which he finds phenomenal to watch.
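Noever's "AI all the way down" description maps onto a simple three-step loop. The sketch below is purely illustrative: the `call_llm` wrapper, model names, and prompts are hypothetical placeholders, not anything described in the paper.

```python
# Illustrative generate -> answer -> score loop ("AI all the way down").
# call_llm, the model names, and the prompts are hypothetical placeholders.
def call_llm(model: str, prompt: str) -> str:
    """Hypothetical wrapper around whatever chat-completion API is in use."""
    raise NotImplementedError

def ai_all_the_way_down(topic: str) -> dict:
    # 1. One model drafts the freelance-style job requirement.
    job = call_llm("requirements-model", f"Write a freelance job posting for: {topic}")
    # 2. A second model attempts the work.
    answer = call_llm("worker-model", f"Complete this job:\n{job}")
    # 3. A third model grades the submission (LLM-as-judge).
    score = call_llm("judge-model", f"Job:\n{job}\n\nSubmission:\n{answer}\n\nScore 0-10:")
    return {"job": job, "answer": answer, "score": score}
```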
Open Source Model Limitations
One interesting finding from the study was that open source models falter at around 30 billion parameters. Noever believes that more infrastructure is needed to complete these tasks.