I have and use both Claude Code and Gemini CLI, and still don't consider Gemini worth starting for coding except to review Claude's output in critical commits (on a security boundary, maybe broad refactors, etc.), though I try side-by-side every now and then just to see the state of things. I also use Gemini Pro in a security scanning harness to act as a second set of eyes, but Opus is better at finding security bugs than Gemini, so I don't know that it's accomplishing anything beyond just using Opus.
Gemini Pro 3.1 for agentic coding is still clumsy. It chews a lot, has a harder time with tools and interacting with the codebase. I haven't tried any 3.5 version, yet, though. The benchmarks look promising.
I'll note I like the Google models' prose better than any others at the moment, though. Even the small open models (Gemma 4 family) have excellent prose, relatively speaking, that doesn't stink of the LLMisms that I find so annoying about OpenAI (especially) and Anthropic models. So, I'll probably start using Gemini for writing API docs, even if all code is Claude.
I would argue that prose is just a prompt issue. GPT 5.5 outout is easier to control whan Gemini by prompting. Having better defaults does not make it necessarily better.
I would disagree. I think it'd take a lot of prompting to make GPT 5.5 not have the underlying personality of GPT, which I find awful. They have knobs in ChatGPT to choose a "professional" tone, which improves it somewhat, but even that is still the worst prose of any leading model.
My default AGENTS.md/CLAUDE.md/etc. is a few sentences from Strunk and White, to try to make all the models not suck at writing. It helps keep the models brief, but it doesn't actually make models with shitty prose have good prose. The relevant portion of my agents file is: "Omit needless words. Vigorous writing is concise. A sentence should contain no unnecessary words, a paragraph no unnecessary sentences, for the same reason that a drawing should have no unnecessary lines and a machine no unnecessary parts." Which might add up roughly the same as "be brief" in the weights, I don't know.
If you have a prompt that makes GPT a decent-to-good writer, I would like to see it.
Gemini produces decent-to-good prose without prompting, which improves if instructed to be concise. The other models, even the frontier models, do not have decent-to-good prose without prompting, and even with prompting, rarely elevate to what I would consider Good Enough. Part of this may be that GPT and Claude models get used a lot more heavily, and so I'm highly tuned into their idiosyncrasies. The heavy use of emojis, the click-bait headline style, etc. that they both use unprompted. All of that is repugnant to me, so anything that doesn't do all that by default, or at least not as aggressively, has a huge leg up.
Gemini models have consistently disregarded rules and gone their own way for me. They will finish a task and get it done frequently way above the scope that you gave it, but they take a million shortcuts to get there. e.g. deciding the linter isn't important and disabling the pre commit hook. coding features you didn't ask for.
It is the smartest for creative web stuff like HTML/CSS/JS.
But it has been very stubborn with following instructions like AGENTS.md.
And architecturally for large projects I tested, the code isn't on par with Opus 4.5+ and GPT 5.3+.
I would rather use DeepSeek 4 Flash on High (not max) than Gemini even if they had the same cost.
I currently use GPT 5.5 + DeepSeek 4 Flash.
BUT I didn't test Gemini 3.5 Flash yet. And it seems, from another comment in this post, that the Antigravity quota for is bricked for Google Pro plans which is the plan I have. So I don't have high hopes.
This intentionality in the application of AI is very confusing for folks because at first glance it seems like it should just work.
It seems to, even.
Whereas if you hand a router to someone with a flush trim but in it and ask them to clean up the edge of a table they will take one look at it and nope away from that dangerous spinning thing.
If they have the mind to give it a shot and despite a quality tool and bit they bite into the table and ruin the line (or something much worse) no one will be surprised—-they have no experience or recognition of what expertise is in woodworking.
But with AI, it is much more hazy what expertise is.
The methodology for quality results is changing each week and the articulation in personal tooling involved makes it challenging to adopt another “expert”’s workflows.
And most people can’t just spin up a furniture factory at their whim and call themselves a designer. AI gives everybody with the slightest gumption a fully-functional, “initially plausible crap” factory at their fingertips, so everybody with actual skills gets lost in a sea of useless garbage.
Luckily the same AI tools that are generating the content can be used to build better tailored discriminators for it as well. If I can define what it is I dislike about an essay, video, etc., and give it to an LLM, it can tell me whether a piece I present to it is worthwhile to consume according to my standards. This even applies to things the LLM can't generate things for that meet the same standard for its programmed/prompted discrimination.
Yeah works out great for HR departments and job hunters.
Assuming that the LLM search will be meaningfully better at cutting through bullshit than the generating model was at avoiding creating it is, charitably, dubious. Assuming that it won’t be every bit as gamable as Google results is as or more dubious.
It also enables a similar model to Facebook's insight into third party mobile app growth. The state could look for early growth trends in a given category or model type.
Then their org has the option to burnish or bury models that align with their goals.
Though I'm not OP, I will say that it seems like there are two brands that mostly have the market cornered.
Chamberlain/Liftmaster/MyQ is all the same company; they are a gross company that hates the idea of giving you control over your device. Zero LAN control story, Zero Homekit story, zero Home Assistant and no possibility of any of these.
Genie - whose "app" thing is called Aladdin Connect is the other one. There is a HA integration[1] for it, though it's cloud-dependent, no LAN story so again your ability to control it is subject the company's cloud servers being available, and to any future whim they may have. The Github for the plugin has issues reported, but no idea how widespread they are.
Looking at places like Home Depot it seems there's a brand called SkyLink[2] but it seems cheap in the bad way, and while it has its own "app" there seems to be no HA story whatsoever, so I assume kinda the worst of all.
Deeply uncomfortably, I would have to grudgingly acknowledge the practicality of buying from the gross Chamberlain, never using its MyQ BS, and connecting a RatGDO to it instead, which would give the best experience, even though giving them any business deeply offends me.
Please read my comment again - I mentioned RatGDO and literally own one. My point was that there's no possibility Chamberlain would ever give you any of these things in the built-in Wi-Fi-connected hardware you've already paid them for.
It is unfortunate to reward those weasels for their bad behavior by buying their device, even if one substitutes their own "brain" and never uses MyQ. But yeah, that may be the only practical option.
A sign of weariness in the rapid evolution of tooling, where people got off the train a stop too early?
A confusing overloaded acronym (cli) and term (skill) lacking the marketability / easy mind share of a unique acronym?
These all fail to establish a hearty reason to be.
The walking dead are still dead.
reply