Hacker News | while1's comments

We're building AI testing tools at QA.tech and this matches my experience. Great post. The hard part was never generating code. It's figuring out if what came out is actually correct. Our team runs multiple AI agents in parallel writing code and honestly we spend way more time on verification than generation at this point. The ratio keeps getting worse as the models get better at producing plausible-looking stuff.

The codebase growth numbers feel right to me. Even conservative 2x productivity gains break most review processes. We ended up having to build our own internal review bot that checks the AI output because human review just doesn't keep up. But it has to be narrow and specific, not another general model doing vibes-based review.


> way more time on verification than generation

Was generation a bottleneck previously? My experience has been that verification is always the slow part. Oftentimes it's quicker to do it myself than to try to provide the perfect context (via agents.md, skills, etc.) to the agent.

The things it's able to one-shot are also the code that would take me the least time to write.


We are using neo4j to power our agents at QA.tech.

Essentially, to make it behave more like a human, so that it learns and builds up an understanding of the pages it should test, we map interactions into a knowledge graph stored in Neo4j. The graph consists of Pages and the Actions on each page, as well as links to documentation sections and other relevant info, together with descriptions, metadata, and embeddings for search.

To make the agents better at planning and at understanding the context of a page, they can search the graph for relevant information and then expand through the graph for more context.

This works remarkably well. I think our agents (once they have interacted a bit with the page) are some of the best browser agents I have tested.

I would highly recommend this, but you need to put some effort into a nice ontology for the graph and into getting the tooling right for your use case. It's really not just plug and play. :)
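
To illustrate the search-then-expand pattern described above, here is a minimal sketch. It is not QA.tech's implementation: it uses a plain in-memory graph instead of Neo4j, hand-written toy vectors instead of real embeddings, and every name in it (Node, KnowledgeGraph, search, expand, the node ids) is hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """One graph node: a page, an action on a page, or a documentation section."""
    id: str
    kind: str                # "Page", "Action", or "Doc" (hypothetical labels)
    description: str
    embedding: list[float]   # toy vector; a real system would use an embedding model
    neighbors: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class KnowledgeGraph:
    """In-memory stand-in for the graph database (assumption, not Neo4j)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}

    def add(self, node: Node) -> None:
        self.nodes[node.id] = node

    def link(self, a: str, b: str) -> None:
        # Edges are stored as undirected so expansion can walk either way.
        self.nodes[a].neighbors.add(b)
        self.nodes[b].neighbors.add(a)

    def search(self, query: list[float], k: int = 1) -> list[Node]:
        """Return the k nodes whose embeddings are most similar to the query."""
        ranked = sorted(
            self.nodes.values(),
            key=lambda n: cosine(query, n.embedding),
            reverse=True,
        )
        return ranked[:k]

    def expand(self, start: str, hops: int = 1) -> list[Node]:
        """Return every node within `hops` edges of `start`, including `start`."""
        seen = {start}
        frontier = [start]
        for _ in range(hops):
            next_frontier = []
            for node_id in frontier:
                for neighbor in self.nodes[node_id].neighbors:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
            frontier = next_frontier
        return [self.nodes[node_id] for node_id in sorted(seen)]


graph = KnowledgeGraph()
graph.add(Node("login", "Page", "Login page", [1.0, 0.0, 0.0]))
graph.add(Node("submit", "Action", "Click the sign-in button", [0.9, 0.1, 0.0]))
graph.add(Node("auth-docs", "Doc", "Docs: authentication flow", [0.7, 0.0, 0.3]))
graph.add(Node("cart", "Page", "Shopping cart page", [0.0, 1.0, 0.0]))
graph.link("login", "submit")
graph.link("login", "auth-docs")

# Step 1: embedding search finds the most relevant entry point.
best = graph.search([1.0, 0.0, 0.0], k=1)[0]
# Step 2: graph expansion pulls in the connected actions and docs as context.
context = graph.expand(best.id, hops=1)
```

In this toy setup the search lands on the login page, and a one-hop expansion adds its action and its documentation section while leaving the unrelated cart page out. In a real graph database, the equivalent would presumably be a vector similarity lookup followed by a bounded traversal.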


Really cool tool! Great job!


This is such a great tool for quickly building something to show other people an idea you have. So cool!


thanks while1


Loving this! Very surprising that the LLMs of today are so bad at understanding interfaces but it also makes it a very interesting case for finetuning!


Neither does markdown. Just put it in a git repo.


The global environment is chosen here, because this is the will of God.


Cool! This looks really nice and definitely useful!


This looks awesome! I quite often run into the task of automating or porting spreadsheets into code. This is always a pain because it can be hard to visualize the flow in a chart. This tool would greatly simplify the task. Looks really sweet!


Same for me as well.

