Monday, January 20, 2025

The Reality Check on AI Coding Agents: Lessons from Devin

A recent blog post from Answer.AI titled "Thoughts On A Month With Devin" has sparked intense discussion in the development community, particularly on Hacker News. Their detailed experiment with Devin, an AI coding agent, offers valuable insights into the current state of autonomous AI development tools.

The Promise vs. Reality

When Devin first appeared on the scene, it seemed to represent a breakthrough in AI coding assistance. The Answer.AI team was initially impressed by Devin's capabilities: pulling data between platforms, creating applications from scratch, and even navigating API documentation – all while maintaining natural communication through Slack. It appeared to be the autonomous coding assistant many had dreamed of.

However, their extensive testing revealed a more complex reality. Out of 20 real-world tasks, Devin succeeded in only three (a 15% success rate), with 14 outright failures and three inconclusive results. More concerning was the unpredictability – there seemed to be no reliable way to predict in advance which tasks would succeed and which would fail.

The Autonomy Paradox

Perhaps the most interesting insight from Answer.AI's experiment is how Devin's supposed strength – its autonomy – became its greatest weakness. As one Hacker News commenter, davedx, pointedly asked: "Why doesn't Devin have an 'ask for help' escape hatch when it gets stuck?" Another commenter, rsynnott, noted that Devin embodied "the worst stereotype" of a junior developer – one who won't admit when they're stuck.
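The "escape hatch" davedx describes is a simple control-flow idea: bound the agent's autonomous retries and escalate to a human instead of flailing indefinitely. Here is a minimal, hypothetical sketch of that pattern – the function names, `Result` type, and retry limit are illustrative assumptions, not part of Devin or any real agent API:

```python
from dataclasses import dataclass

@dataclass
class Result:
    success: bool
    output: str

MAX_ATTEMPTS = 3  # illustrative budget before escalating

def run_task(task, attempt_fn, ask_human_fn):
    """Try a task autonomously, but escalate to a human
    instead of spinning indefinitely when attempts keep failing."""
    for _ in range(MAX_ATTEMPTS):
        result = attempt_fn(task)
        if result.success:
            return result
    # Escape hatch: admit being stuck rather than flailing.
    return ask_human_fn(task, reason=f"failed after {MAX_ATTEMPTS} attempts")
```

The point of the pattern is exactly the one rsynnott makes: a tool that can say "I'm stuck" behaves like a good junior developer, while one that silently keeps trying behaves like the worst stereotype of one.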

The Tools vs. Agents Debate

The Hacker News discussion revealed a clear divide in approaches to AI coding assistance. As CGamesPlay pointed out, tools like GitHub Copilot (which they described as "better tab complete") and Aider (for more advanced edits) are proving more practical because they assist developers rather than try to replace them. This aligns with Answer.AI's conclusion that developer-guided tools are currently more effective than autonomous agents.

Current Sweet Spots for AI in Development

Despite the setbacks, both the original blog post and Hacker News comments helped identify where AI truly shines. As rbren, the creator of OpenHands, noted, about 20% of their commits come from AI agents handling routine tasks like fixing merge conflicts. Other commenters highlighted success with:

  1. Generating boilerplate code and repetitive patterns
  2. Assisting with specific, well-defined problems like complex SQL queries
  3. Supporting documentation and test writing
  4. Helping developers learn new technologies

The Future Outlook

The Hacker News discussion revealed interesting perspectives on AI's future in development. As commenter bufferoverflow suggested, LLMs might reach mid-level developer capabilities in 2–3 years and senior-level in 4–5 years. However, Zanfa countered that progress isn't linear, drawing parallels to self-driving cars. As npilk noted, the situation resembles AI image generation in 2022 – showing obvious flaws but with potential for rapid improvement.

The Human Element Remains Critical

Both the Answer.AI blog post and subsequent discussion emphasize that successful software development isn't just about writing code. As jboggan pointed out in the Hacker News thread, if humans can't learn to use the tool effectively and discern patterns of best practices, then it isn't really a useful tool. This highlights the continuing importance of human judgment and oversight.

Economic Implications

The Hacker News discussion raised important points about the economic impact of AI in development. While some commenters like the_af expressed concerns about job displacement and salary deflation, others like lolinder drew parallels to previous fears about offshoring, which didn't lead to the predicted negative outcomes. This debate reflects the broader uncertainty about AI's impact on the software development profession.

Conclusion

The Answer.AI team's experiment with Devin, and the subsequent Hacker News discussion, serve as both a reality check and a roadmap. While fully autonomous coding agents may not be ready for prime time, the experiment has helped clarify where AI can most effectively support development work. The future of software development likely lies not in replacement but in synergy – finding the sweet spot where AI amplifies human capabilities rather than attempting to supplant them entirely.

As we move forward, the focus should be on developing tools that maintain this balance, keeping developers in the driver's seat while leveraging AI's strengths in handling routine tasks and generating initial solutions. This approach promises to enhance productivity while maintaining the quality and reliability that professional software development demands.