I think the biggest winner of this might be Google. Virtually all the frontier AI labs use TPUs; the only one that doesn't is OpenAI, due to its exclusive deal with Microsoft. Given the newly launched Gen 8 TPU this month, it's likely OpenAI will contemplate using TPUs too.
Many labs use TPUs, but not exclusively. Most labs need more compute than they can get, and if there's TPU capacity, they'll adapt their systems to be able to run partially on TPUs.
AMD’s software experience is riddled with bugs, rendering out-of-the-box training with AMD impossible. We were hopeful that AMD could emerge as a strong competitor to NVIDIA in training workloads but, as of today, this is unfortunately not the case. The CUDA moat has yet to be crossed by AMD, due to AMD’s weaker-than-expected software Quality Assurance (QA) culture and its challenging out-of-the-box experience.
[snip]
> The only reason we have been able to get AMD performance within 75% of H100/H200 performance is because we have been supported by multiple teams at AMD in fixing numerous AMD software bugs. To get AMD to a usable state with somewhat reasonable performance, a giant ~60 command Dockerfile that builds dependencies from source, hand crafted by an AMD principal engineer, was specifically provided for us
[snip]
> AMD hipBLASLt/rocBLAS’s heuristic model picks the wrong algorithm for most shapes out of the box, which is why so much time-consuming tuning is required by the end user.
etc etc. The whole thing is worth reading.
I'm sure it has improved since then (and will continue to). I hear good things about the Lemonade team (although I think that is mostly inference?).
That’s insane. There should be a big team of people at AMD whose whole job is just to dogfood their stuff for training like this. Speaking of which, Amazon is in the same boat, I’m constantly surprised that Amazon is not treating improving Inferentia/Trainium software as an uber-priority. (I work at Amazon)
> “Are we afraid of our competitors? No, we’re completely unafraid of our competitors,” said Taylor. “For the most part, because—in the case of Nvidia—they don’t appear to care that much about VR. And in the case of the dollars spent on R&D, they seem to be very happy doing stuff in the car industry, and long may that continue—good luck to them.
Where's the scope for an L7 promo in "Fixed a bunch of tiny issues that were making it hard to use Trainium/Inferentia with PyTorch"?
Amazon's compensation strategy, in which you primarily get a raise years in the future for tricking your management chain into promoting you, is definitely bearing its rotten fruit.
Anecdotal but over several years with an AMD GPU in my desktop I've tried multiple times to do real AI work and given up every time with the AMD stack.
I'm running fine on my AMD 7800 XT 16 GB... Yes, memory is a bit limited, but apart from that I have found that it works great using Vulkan in LM Studio, for example.
ROCm works great too; the only issue I have had is that my machine froze a couple of times when it used 100% of the graphics and the OS had nothing left. Since moving to Vulkan I stopped getting these errors, apart from a little UI slowdown when I had 4 models loaded at the same time taking turns.
I'm also on an i7 6700 with 32 GB DDR4, so I'm sure that is causing more slowdowns than the graphics card.
Yet another reason to doubt claims that "software is solved".
Anthropic did retire an interview take-home assignment involving optimising inference on exotic hardware, because Claude could one shot a solution, but that was clearly a whiteboard hypothetical instead of a real system with warts, issues and nuance.
I'm doing inference on a free MI300X instance from AMD right now. Not sure if the software stack is just old or what, but here's what I've observed: it's stuck on an old version of vLLM, pre-Transformers 5 support. It lacks MoE support for Qwen3 models. oss-120b is faaaar slower than it should be.
Int8 quantization seems like it's almost supported, but not quite: speeds drop to a fraction of full-precision speed and the server seems to intermittently hang. Int4 quantization is not supported. FP8 quantization is not supported.
Again, maybe AMD is just being lazy with what they've provided, but it's not a great look.
Right now the fastest smart model I can run is full-precision Qwen3-32B. With 120 parallel requests (short context) I'm getting PP @ 4500 tokens/sec and TG @ 1300 tokens/sec.
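For reference, a minimal sketch of the kind of vLLM invocation involved (the model name, parallelism, and batching values are illustrative assumptions, not the exact setup; whether a given quantization mode works at all depends on the ROCm build of vLLM you're handed):

```python
# Hypothetical sketch: batched inference with vLLM's Python API on a single MI300X.
# Model name and settings are illustrative; swap in whatever the instance provides.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-32B",   # full precision; the quantized paths are where things break
    tensor_parallel_size=1,   # one MI300X has enough HBM for a 32B model in BF16
    max_num_seqs=120,         # roughly matches the 120 parallel short-context requests above
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Summarize pipeline parallelism in one paragraph."] * 120, params)
print(f"{len(outputs)} completions")
```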
From the papers I've read and the labs that I have worked in personally, I would say that most scientists developing deep learning solutions use CUDA for GPU acceleration.
Yeah, historically it’s been software that’s limited AMD here. Not surprised to hear that may still be the issue. NVidia’s biggest edge was really CUDA.
I don’t know what’s a chicken and what’s an egg here. But ROCm support is often missing or experimental even in very basic foundational libraries. They need someone else to double down on using their chips and just break the software support out of the limbo.
This is what I've heard on the "street". Building a CUDA-compatible stack for AMD's hardware requires highly-paid SWEs. It's a very niche field, and talent is hard to come by.
But AMD does not want to pay these specialized SWEs the market rate. Their existing SWEs would be up in arms saying, basically, "what are we, chopped liver??", or so the thinking goes.
So AMD is stuck with a shitty software stack which cannot compete with CUDA.
If I were making such decisions, I would just cull the number of existing SWEs down by 50%, and double the pay for remaining ones. And then go out and hire some top talent to build a good software stack.
Google is in a different position to others in that they're the only frontier lab with a cloud infra business. It obviously makes sense to sell GPUs on cloud infra as people want to rent them. In that respect Google buys a ton of GPUs to rent out.
What's unclear to me is how much Google uses GPUs for their own stuff. Yes Gemini runs on GPUs now, so that Google can sell Gemini on-prem boxes (recent release announced last week), but is any training or inference for Gemini really happening on GPUs? This is unclear to me. I'd have guessed not given that I thought TPUs were much cheaper to operate, but maybe I'm wrong.
Caveat, I work at Google, but not on anything to do with this. I'm only going on what's in the press for this stuff.
It mentions that Gemini can run on eight NVIDIA GPUs, but not which GPU and which Gemini model. Either way, this puts an upper bound of 288 * 8 = 2304 GB on the size of the Gemini model, which as far as I know has been a secret until now.
I have most likely outdated info; I left Google Research 4 years ago. Back then, available TPU instances were plentiful and GPUs scarce. Nobody wanted to mess with an immature, crashing compiler and very steep performance cliffs (performance was excellent only if you stayed within the guardrails, and going outside them was still supported and didn't even produce a warning, as it was so common in code).
But I believe most of it has changed for the better for TPUs.
And, almost by happenstance, Apple. It turns out they have a great platform for inference and torched almost nothing, comparatively, on Siri. The Apple/Gemini deal is interesting; Google continues to demonstrate their willingness to degrade their experience on Apple to try and force people to switch.
If you do the math (I did), in 2 years, open source models that you can run on a future MacBook Pro will be as capable as the frontier cloud models are today. Memory bandwidth is growing rapidly, as is the die area dedicated to the neural cores. And all the while, we have the silicon getting more power efficient and increasingly dense (as it always does). These hardware improvements are coming along as the open source models improve through research advancements. And while the cloud models will always be better (because they can make use of as much power as they want to - up in the cloud), what matters to most of us is whether a model can do a meaningful share of knowledge work for us. At the same time, energy consumption to run cloud infrastructure is out-pacing the creation of new energy supply, which is a problem not easily solved. I believe scarcity of energy will increasingly drive frontier labs toward power efficiency, which necessarily implies that the Pareto frontier of performance between cloud and local execution will narrow.
People had this "why you probably can't run a GPT-4 (or even GPT-3.5) class model on your MBP anytime soon" conversation before.
Today's LLMs are able to pack many more capabilities into fewer parameters compared to 2023. We might still be at the very rudimentary phase of this technology, where there are low-hanging efficiency gains to be had left and right. These models consume many orders of magnitude more energy than a human brain; all of this seems like room for improvement.
The right question: is there a law in information theory that fundamentally prevents a 70B model of any architecture from being as smart as Opus 4.7?
The OP said "as capable as the frontier cloud models are today" which might assume model improvements that do more with less. Opus 4.7/Gpt5.5 performance might be achievable with a fraction of the parameters.
Exactly. I also feel like being able to choose a model for the use case could be worth exploring. So instead of trying to squeeze all kinds of knowledge into a single model, even if it's MoE, just focus models on use cases. I bet you only need double-digit-billion-parameter models for that, with the same or even better performance.
Opus and Gpt are generic LLMs with knowledge on all sort of topics. For specific use cases you probably don't need all the parameters? Suppose you want to generate code with opencode, what part of the generic LLM is needed and what parts can be removed?
As far as I can tell Minimax M2.7 is better than anything available a year ago, but it runs on an ordinary PC. Will that continue? Not sure, but the trend has continued for the last two years and I don't know of any fundamental limits the models are approaching.
I wish more people were aware of this. I think so much of the current optimism is based on "it doesn't matter if companies are raising prices since I'm just going to run the model locally", which doesn't fly.
> A Opus 4.7/Gpt5.5 class model is 5 trillion parameters.
Or so they say.
If it's true then that just shows how far behind the cloud providers are lagging while wasting investor money.
(There's a huge amount of diminishing returns in increasing parameter counts and the intelligent AI company should be hard at work figuring out the optimal count without overfitting.)
Doing that will only be possible with something like better 3D NAND flash memory; it needs new hardware. People are already trying to bring that to market. I contemplated taking a compiler position at such a company.
HBF is a non-starter, it runs way too hot compared to DRAM (which only pays for refresh at idle) for the same memory traffic. Only helps for extremely sparse MoE models - probably sparser than we're seeing today.
> A Opus 4.7/Gpt5.5 class model is 5 trillion parameters[1].
You could run it on a cluster of nodes that each do some mix of fetching parameters from disk and caching them in RAM. Use pipeline parallelism to minimize network bandwidth requirements given the huge size. Then time to first token may be a bit slow, but sustained inference should achieve enough throughput for a single user. That's a costly setup of course, but it doesn't cost $900k.
True but a cluster built on pipeline parallelism can naturally stream from multiple SSD's in parallel. That probably makes offload somewhat more effective. And you also have RAM caching available as a natural possibility.
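For what it's worth, here's a rough back-of-envelope for that kind of setup. Every figure below (model size, active parameters per token, node count, per-node RAM and bandwidth) is an assumed illustrative value, not a measurement:

```python
# Back-of-envelope: single-user decode speed for a huge MoE model on a
# pipeline-parallel cluster that caches weights in RAM and streams the rest
# from striped NVMe. All numbers are assumptions for illustration only.

MODEL_GB          = 2_500   # ~5T params at 4-bit
ACTIVE_GB         = 50      # assumed weight bytes read per token (sparse MoE)
NODES             = 8       # pipeline stages
RAM_CACHE_GB_NODE = 256     # weights pinned in RAM on each node
RAM_BW_GBS        = 400     # per-node RAM bandwidth
SSD_BW_GBS        = 25      # per-node striped-NVMe read bandwidth

# With a single stream, stages work on the token one after another, so the
# relevant bandwidth is per-node; the cluster mainly buys you aggregated RAM.
ram_share = min(1.0, NODES * RAM_CACHE_GB_NODE / MODEL_GB)
ram_reads = ACTIVE_GB * ram_share
ssd_reads = ACTIVE_GB * (1.0 - ram_share)

sec_per_token = ram_reads / RAM_BW_GBS + ssd_reads / SSD_BW_GBS
print(f"RAM hit rate ~{ram_share:.0%}, ~{1.0 / sec_per_token:.1f} tok/s sustained")
```

With these assumed numbers it lands around 2 tok/s: slow, but in the "usable for a single patient user" range the comment above describes, and far below $900k of hardware.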
I did this calculation a bit ago and don't think frontier models are just a few MacBook Pro generations away. Yes numbers reliably go up in tech in general but in specific semiconductors & standards have long lead-times and published roadmaps, so we can have high confidence in what we're getting even in 3-4 years in terms of both transistor density and RAM speeds.
In mid-2028 we'll have N2E/N2P with around 15% greater transistor density than today's N3P, and by EOY2028 we'll likely have A14 with about a 35-40% density improvement.
Meanwhile, we'll be on LPDDR6 by that point, which takes M-series Pros from 307GB/s -> ~400GB/s, and Max's from 614GB/s -> ~800GB/s.
Model improvements obviously will help out, but on the raw hardware front these aren't in the ballpark for frontier model numbers. An H100 has 3TB/s memory bandwidth, fwiw
What do you need 3 TB/s memory bandwidth for in a single-user context? DeepSeek V4 pro (the latest near-SOTA model) has about 25 GB worth of active parameters (it uses an FP4 format for most layers), which gives 12 tok/s on a 307 GB/s platform with memory bandwidth as the current bottleneck, maybe a bit less than that if you consider KV cache reads. That's not quite great, but it's not terrible either for a pro-quality model. Of course that totally ignores RAM limits, which are the real issue at present: limited RAM forces you to fetch at least some fraction of params from storage, which, while relatively fast, is nowhere near as fast as RAM, so your real tok/s are far lower (about 2 for a broadly similar model on a top-end M5 Pro laptop).
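To make that arithmetic explicit (a tiny sketch; the KV-cache read figure is an assumption that varies with context length and model config):

```python
# Bandwidth-bound decode estimate for the numbers above.
active_weights_gb = 25.0   # active params at FP4, per the comment
kv_cache_gb       = 2.0    # assumed KV-cache bytes read per token at longer contexts
mem_bw_gbs        = 307.0  # M-series Pro-class memory bandwidth

tok_per_s = mem_bw_gbs / (active_weights_gb + kv_cache_gb)
print(f"~{tok_per_s:.1f} tok/s, assuming everything stays resident in RAM")  # ~11.4
```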
So long as you don't require deep search grounding like massive web indexes or document stores which are hard to reproduce locally. You can do local agentic things that get close or even do better depending on search strategy, but theoretically a massive cloud service with huge data stores at hand should be able to produce better results.
In practice unless you're doing some kind of deep research thing with the cloud, it'll try to optimize mostly for time and get you a good enough answer rather than spending an hour or two. An hour of cloud searching with huge data stores is not equivalent to an hour of local agentic searching, presumably.
I think that problem will improve a little in the coming years as we kind of create optimized data curation, but the information world will keep growing so the advantage will likely remain with centralized services as long as they offer their complete potential rather than a fraction.
Also, all the cloud models don't have to be the best frontier models, and you don't need to focus on hitting the benchmark of shrinking Opus 4.7 down to a single MBP to make significant improvements. If you get it so that an Opus 4.7 benchmark-compatible model can run in $250k of datacenter capex (and associated reduced opex for power+cooling) that'd be a massive cost improvement that makes the cloud models cheaper. And for most consumers that'll probably be good enough. You don't need to run on a $5k laptop to make a big difference.
Apple is basically in the same boat as AMD and Intel. They have a weak, raster-focused GPU architecture that doesn't scale to 100B+ inference workloads and especially struggles with large context prefill. TPUs smoke them on inference, and Nvidia hardware is far-and-away more efficient for training.
The GPUs are bottom-barrel for compute-focused industries. It is mobile-grade hardware that arguably can't even scale to prior Mac Pro workloads.
> The GPU is monstrously good. Depending on the workload, the M1 series GPU using 120W could beat an RTX 3090 using 420W.
You're just listing the TDP max of both chips. If you limit a 3090 to 120W then it would still run laps around an M1 Max in several workloads despite being an 8nm GPU versus a 5nm one.
> It is kind of sad Apple neglects helping developers optimize games for the M-series
Apple directly advocated for ports like Death Stranding, Cyberpunk 2077 and Resident Evil internally. Advocacy and optimization are not the issue, Apple's obsession over reinventing the wheel with Metal is what puts the Steam Deck ahead.
Edit (response to matthewmacleod):
> Bold of them to reinvent something that hadn't been invented yet.
Vulkan was not the first open graphics API, as most Mac developers will happily inform you.
I'm confused how anyone ever thought the NPU would be a good idea. The GPU is almost always underutilized on Mac and could do the brunt of the work for inference if it embraced GPGPU principles from the start. Creating a dedicated hardware block to alleviate a theoretical congestion issue is... bewildering. That goes for most NPUs I've seen.
Apple had the technology to scale down a GPGPU-focused architecture just like Nvidia did. They had the money to take that risk, and had the chip design chops to take a serious stab at it. On paper, they could have even extended it to iPhone-level edge silicon similar to what Nvidia did with the Jetson and Tegra SOCs.
I think they built the NPU with whatever models they needed to run on the iPhone in mind vs trying to build a general purpose chip, and then got lucky it was also useful for LLMs.
(Like “I want to do object detection for cutting people into stickers on device without blowing a hole in the battery, make me a chip for that”.)
I'm not sure even Apple thought that, given that they don't officially provide access to ANE internals under macOS (barring unsupported hacks). But if that was fixed, it could then be useful for improving the power efficiency of prefill, where the CPU/GPU hardware is quite weak (especially prior to the M5 Neural Accelerators).
I very recently ran the numbers on these GPUs for an upcoming blog post. The token generation performance is bad, but the prefill performance is _really_ bad.
For a Qwen 3.6 35B / 3B MoE, 4-bit quant:
- parsing a 4k prompt on an M4 MacBook Air takes 17 seconds before generating a single token.
- on an M4 Max Mac Studio it's faster at 2.3 seconds
- on an RTX 5090, it's 142ms.
RTX 5090 uses more power than an M4 Max Mac Studio but it's not 16x more power.
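One way to reproduce that kind of number yourself (a sketch using llama-cpp-python; the GGUF path and prompt sizing are placeholders, not the harness behind the figures above):

```python
# Time-to-first-token ~= prefill cost: generate exactly one token and time it.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="qwen3-moe-q4_k_m.gguf",  # placeholder path to a 4-bit quant
    n_ctx=8192,
    n_gpu_layers=-1,                     # offload all layers to the GPU / Metal
    verbose=False,
)

prompt = "lorem ipsum dolor sit amet " * 500   # roughly a 4k-token prompt

start = time.perf_counter()
llm(prompt, max_tokens=1)                      # one generated token isolates prefill
print(f"time to first token: {time.perf_counter() - start:.2f}s")
```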
Somehow, Apple has always been able to sell their stuff as Magic. Remember the megahertz myth? Apple hertzes and apple bytes are much better than PC hertzes and bytes because they are made by virgin elves during a full moon.
> Apple hertzes and apple bytes are much better than PC hertzes and bytes because they are made by virgin elves during a full moon.
The thing that Apple has always been excellent at is efficiency - even during the Intel era, MacBooks outclassed their Windows peers. Same CPU, same RAM, same disks, so it definitely wasn't the hardware; it was the software that allowed Apple to pull much more real-world performance out of the same clock cycles and power usage.
Windows itself, but especially third party drivers, are disastrous when it comes to code quality, and they are much much more generic (and thus inefficient) compared to Apple with its very small amount of different SKUs. Apple insisted on writing all drivers and IIRC even most of the firmware for embedded modules themselves to achieve that tight control... which was (in addition to the 2010-ish lead-free Soldergate) why they fired NVIDIA from making GPUs for Apple - NV didn't want to give Apple the specs any more to write drivers.
> NV didn't want to give Apple the specs any more to write drivers.
I think that's a valid demand, considering Nvidia's budding commitment to CUDA and other GPGPU paradigms. Apple, backing OpenCL, would have every reason to break Nvidia's code and ship half-baked drivers. They did it with AMD's GPUs later down the line, pretending like Vulkan couldn't be implemented so they could promote Metal.
Apple wouldn't have made GeForce more efficient with their own firmware, they would have installed a Sword of Damocles over Nvidia's head.
> They did it with AMD's GPUs later down the line, pretending like Vulkan couldn't be implemented so they could promote Metal.
It was even worse than that, they just stopped updating OpenGL for years before either Vulkan or Metal existed at all. Taking a Macbook and using bootcamp would instantly raise the GPU feature level by several generations just because Apple's GPU drivers were so fucking old & outdated.
On Geekbench 5, the M1 hits 483 FPS and the RTX 3090 hits 504 FPS.
There are other workloads where the M1 actually beats the 3090.
Apple does plenty of hyping but it's always cute when irrational haters like you put them down. The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.
What geekbench 5 fps are you talking about? Geekbench only has OpenCL and Vulkan scores for the 3090 as far as I can tell, and the M1 Ultra is less than half the OpenCL score of the 3090. And the M1 Ultra was significantly more expensive.
Find or link these workloads you think exist, please
> The M1 was (well, is) a marvel and absolutely smokes a 3090 in perf per watt.
The GTX 1660 also smokes the 3090 in perf per watt. Being more efficient while being dramatically slower is not exactly an achievement, it's pretty typical power consumption scaling in fact. Perf per watt is only meaningful if you're also able to match the perf itself. That's what actually made the M1 CPU notable. M-series GPUs (not just the M1, but even the latest) haven't managed to match or even come close to the perf, so being more efficient is not really any different than, say, Nvidia, AMD, or Intel mobile GPU offerings. Nice for laptops, insignificant otherwise
Here you go[0]. 'Aztek Ruins offscreen'. Although I misremembered the exact FPS, the 3090 is at 506 FPS.
Also note how the M1 Ultra is pushing 2/3 of the FPS of the 3090 despite 1/3 of the power budget and the game itself being poorly optimized for the M-series architecture.
And here[1] you have it smoking an Intel i9 12900K + RTX 3090. The difference doesn't look too impressive until you realize the power envelope for that build is 700-800W.
Also, the GTX 1660 (technically an RTX 2000 series, but whatever) is about 26% less efficient than a 3090[2].
> Being more efficient while being dramatically slower
That's my whole point and what you're refusing to see. The M1 is not dramatically slower than an i9 or 3090 despite having dramatically lower power use.
The proof for this will really start to come once Qualcomm and Mediatek have gotten a handle on their PC ARM chips and Valve decides they're good enough for a Steam Deck 2 or 3. You'll get to see 2-3x the battery life along a modest performance increase.
> Here you go[0]. 'Aztek Ruins offscreen'. Although I misremembered the exact FPS, the 3090 is at 506 FPS.
Oh, GFXBench not geekbench.
Realistically that 506 fps result is probably CPU-bottlenecked, not that Aztec Ruins is all that relevant. It's a very old benchmark, released in 2018, that was designed for mobile GPUs, so realistically it is using a 2010-ish GPU feature set.
If that's your use case, great. But it's not significant at all.
> And here[1] you have it smoking an Intel i9 12900K + RTX 3090.
Not using the GPU, so irrelevant. Also not using 700-800w
> Also, the GTX 1660 (technically an RTX 2000 series, but whatever) is about 26% less efficient than an 3090[2].
"bestvaluegpu" I've never heard of but holy AI slop nonsense batman. Taking 3dmark score and dividing it by TDP is easily one of the worst ways to compare possible.
Apple is in a much better boat than AMD or Intel. They have a gigantic warchest and can just snap up whoever looks like a leader coming out of the bubble burst.
It's becoming increasingly clear that there is no moat on models. The winners will be the ones who have existing products and ecosystems they can tie AI into. You will pay Adobe for credits because that will be the only AI that works in Photoshop; you will pay Microsoft because only theirs will work on your Microsoft cloud apps.
Open AI has nothing. Their tech will rapidly be devalued by free models the moment they stop lighting stacks of cash on fire.
I kind of agree with you at this point. When ChatGPT was rapidly gaining popularity I thought that they will eventually replace search (esp. for shopping), which would have given them a huge ad revenue. Maybe they could have even tried social networking e.g., to help you sort out the huge flow of information that today's social networks are and get to the important/rewarding/whatever posts. But now ChatGPT is kind of getting commoditized. I would even dare say that gemini feels to me a bit better now, so the search route for ChatGPT is clearly gone.
The parent post was arguing that they can do this now because they are lighting stacks of cash on fire. And once they stop doing that, their LLM lead will be gone in a hurry. They appear to not have a moat, like other more established players do.
Counterpoint: Apple's opportunity to invest in GPGPU architectures was ~2017, when Apple Silicon was in its design stages. Apple always knew they were surrendering a large market segment by deprecating CUDA and OpenCL; they just never knew how big the market segment would get. Their liquid cash is wasted mid-bubble, and waiting for it to pop is not a realistic timeline anymore.
Arguably, Apple isn't even in a boat right now. At least AMD and Intel both ship hardware that synergizes with CUDA - Apple jumped off that ship, their hardware doesn't even come up in infrastructure discussions where AMD, Intel and Nvidia are taken for granted.
They also degrade their own direct services with little warning or thought put into change management, so, to be fair, Apple may be getting the same quality of service as the rest of us.
I think that's just how Google is, by nature. They don't intentionally degrade their services. They just aren't a customer centric company. They run on numbers. As a corporate, it doesn't really encourage support and maintenance work either.
Indeed. I'm wondering if Apple's "miss the train" with AI ended up being a blessing for them. Not only in the Google deal but also there's a lot of people doing interesting stuff locally..
This exclusivity went away in Oct 2025 (except for 'API' workloads).
OpenAI has contracted to purchase an incremental $250B of Azure services, and Microsoft will no longer have a right of first refusal to be OpenAI’s compute provider.
I wish Google would launch Mac Mini-like devices running their consumer-grade TPUs for local inference. I get that they don't want it to eat into their GCP margins, but it would still get them into consumer desktops that Pixel Books could never penetrate (Chromebooks don't count and may likely become obsolete soon due to MacBook Neo).
> Microsoft will no longer pay a revenue share to OpenAI.
> Revenue share payments from OpenAI to Microsoft continue through 2030, independent of OpenAI’s technology progress, at the same percentage but subject to a total cap.
Had written a blog post on the same a few days back, if anyone's interested in reading (hardly a 5 minute read): Can Google Win the AI Hardware Race Through TPUs?
Because they are based on [the west coast of the US](https://en.wikipedia.org/wiki/American_frontier). DeepSeek, Z.ai, Moonshot, and Mistral are never called frontier because they aren't based in California.
Don't forget Elon, I am sure this news will come up in the upcoming OpenAI vs Elon Musk trial starting soon! I can't wait to hear all the discovery from this trial.
Some on this forum will be working for companies with conflicts of interest on the topic, and if an employee's words were construed to be the opinions of the company, that could be bad for that person.
I was once almost fired for saying a little too much in an HN comment about pentesting. Being dragged into an office and given a dressing-down for posting was quite traumatic.
The central issue (or so they claimed) was that people might misconstrue my comment as representing the company I was at.
So yeah, I don’t understand why people are making fun of this. It’s serious.
On the other hand, they were so uptight that I’m not sure “opinions are my own” would have prevented it. But it would have been at least some defense.
Good on you. I’m happy to hear you got out of that kind of environment. It’s soul-draining.
Also a relief to hear that other people had to deal with this nonsense. I was afraid the reaction would be “there’s no way that happened,” since at the time I could hardly believe it either.
it's like people are LARPing a Fortune company CEO when they're giving their hot takes on social media
reminds me of Trump ending his wild takes on social media with "thank you for your attention to this matter" - so out of place, it makes it really funny
> it's like people are LARPing a Fortune company CEO when they're giving their hot takes on social media
At least in large tech companies, they have mandatory social media training where they explicitly tell employees to use phrases like "my views are my own" to keep it clear whether they're speaking on behalf of their employer or not.
Why would they be speaking on behalf of their employer? That is what would need a disclaimer not the common case. Besides, he can put it one time in his profile, not over and over again in every comment like he does. There is no expectation that some random employee is a spokesperson for Google on tech message board comment threads. It's just a way to brag.
> Why would they be speaking on behalf of their employers?
Disclaimers aren’t there for folks who are thinking and acting rationally.
They are there for people who are thinking irrationally and/or manipulatively.
There are (relatively speaking) a lot of these people. They can chew up a lot of time and resources over what amounts to nothing.
Disclaimers like this can give a legal department the upper hand in cases like this
A few simple examples:
- There is a person I know who didn’t renew the contract of one of their reports. Pretty straightforward thing. The person whose contract was not renewed has been contesting this legally for over 10 years. The outcome is guaranteed to go against the person complaining, but they have time and money, so they tax the legal team of their former employer.
- There is a mid-sized organization that had a small legal team that had its plate full with regular business stuff. Despite settlements having NDAs, word got out that fairly light claims of sexual harassment and/or EEO complaints would yield relatively easy five-figure payments. Those complaints exploded, and some of the complaints were comical. For example, one manager represented a stance for the department to the C-suite that was 180 degrees opposite of what the group of three managers had agreed to prior. Lots of political capital and lots of time had to be used to clean up that mess. That person’s manager was accused of sex discrimination and age discrimination simply for asking the person why they did that (in a professional way, I might add). That person got a settlement, moved to a different department, and was effectively protected from administrative actions due to it being considered retaliation.
Sounds like the company in the latter example really screwed up, but how does that connect to disclaimers? Is it just an example of malicious behavior?
> Sounds like the company in the latter example really screwed up
Interesting. I think they made an unfortunate but sound decision based on their circumstances.
> but how does that connect to disclaimers?
It doesn’t directly.
> Is it just an example of malicious behavior?
Yes. It’s an example of how absolutely bat-shit crazy people can behave in ways that can tax a company’s legal team. Having folks use a disclaimer will almost certainly lighten some of this load in terms of defending against folks who weaponize online comments made by employees.
Exactly. There is no scenario where we should expect some random anon to be speaking for Google. When that is the case a disclaimer is warranted, not the common case of speaking for oneself. He can write it once in his profile if he's so worried about it, not every other comment like he does. It's just inflated self importance
I am not sure in what context Jensen said that. But Midjourney uses TPUs. Apple uses TPUs. There are no other frontier labs that use them, but Google + Anthropic is 2 out of 3 frontier labs, so.....
You could reasonably say that "A majority of frontier labs uses TPU to train and serve their model."
Mayhaps! But I think as far as Google, Anthropic[1] and Apple[2] go, they do use the TPUs for training. Ofc v4 and v5 (older generations of TPUs) were more specialized for search-related embedding workloads, and I could see people not using them for training.
[1]: We train and run Claude on a range of AI hardware—AWS Trainium, Google TPUs - April 6th, Anthropic on Google and Broadcom partnership
[2]: "[Apple foundation model]... builds on top of JAX and XLA, and allows us to train the models with high efficiency and scalability on various training hardware and cloud platforms, including TPUs and both cloud and on-premise GPUs" - Apple in 2024
He's been saying whatever is good for Nvidia for years now without any regard for truth or reason. He's one of the least trustworthy voices in the space.
Jensen hallucinates more than any llm, he just speaks without thinking all that much about what he says and he generalizes a lot. Trying to hold him accountable to imprecisions and gross simplifications is just going to frustrate whoever tries without changing one bit of his behavior.
This is the same guy who said OpenClaw was the most important software release ever. Statements like this make me question how technically competent these tech CEOs are
They're at the frontier of last year. They compete with Opus 4.5. They don't yet compete with current frontier models.
They'll presumably catch up; there is no monopoly on talent held by the US. And that's more true than ever now that the US is actively hostile to immigrants. Scientists who might have come to the US three years ago have little reason to do so now.
Since Gemini 3.1 Pro is considered to be at the frontier, and GLM 5.1 does better than it in coding benchmarks, it would be fair to say GLM 5.1 is a frontier model.
Nit: scientists have the same reasons to do so now, the same as ever. They just have additional reasons to not do so.
But even that distinction is only temporary, since we're determined to piss away any remaining research lead that draws people in.
Hopefully the next administration will work at actively reversing the damage, with incentives beyond just "we pinky-promise not to haul you at gunpoint to a concrete detention center and then deport you to Yemen".
> Hopefully the next administration will work at actively reversing the damage, with incentives beyond just "we pinky-promise not to haul you at gunpoint to a concrete detention center and then deport you to Yemen".
Won't be enough to undo the damage. The US would have to do a full about face, prosecute crimes of the current administration and enact serious core reforms to make it impossible for things to drastically change again in 4 years. Also known as, never going to happen because even the current opposition party doesn't actually want structural change. The world has seen how bad the US can get from a single election, and that isn't changing any time soon.
It's kind of hard to say this unless you go out of your way - the scaffolding for interacting with the raw model is a lot better now for many tasks. Is it that 4.7 is so much better than 4.5, or that Claude 1.119 is so well tuned to squeeze utility out of the LLM despite the hallucinations, lack of self-awareness, etc.? Certainly the current products are great, but I think it's hard to separate the two things: the raw model, and the agent workflow constraining the model towards utility.
That deal is a win-win for Google. If they develop a better coding model than Anthropic and beat them at coding, then they win. If they don’t, they still win by making a ton of money from Anthropic long term.
Well, it's a lose for Google if all the money disappears into thin air - but I agree that it's mostly upsides for them because of how (relatively) small the investment is for this much upside.
For agentic work, both Gemini 3.1 and Opus 4.6 passed the bar for me. I do prefer Opus because my SIs are tuned for that, and I don't want to rewrite them.
But ChatGPT models don't pass the bar. It seems to be trained to be conversational and role-playing. It "acts" like an agent, but it fails to keep the context to really complete the task. It's a bit tiring to always have to double check its work / results.
I find both Opus 4.6 and GPT-5.4 have weaknesses but tend to support each other. Someone described it to me jokingly as "Claude has ADHD and Codex is autistic." Claude is great at doing something until it gets done and will run for hours on a task without feedback, Codex is often the opposite: it will ask for feedback often and sometimes just stop in the middle of a task saying it's done with step 1 of 5. On the other hand, Codex is a diligent reviewer and will find even subtle bugs that Claude created in its big long-running "until its done" work mode.
Seems like the diagnoses are backwards, in this case. Claude usually stays on task no matter what, but lately Opus 4.6 is showing signs of overuse. I never used to get overload/internal server error messages, but I've seen about a half-dozen of them today alone. And it has been prone to blowing off subtasks that I'd have expected it to resolve.
The difference between physical and software controls is pretty straightforward. (Closed source) software controls are just asking politely. Physical controls are making it so.
Maybe, but corporate greed for data surveillance occasionally surpasses the needs of state actors so much that they'll voluntarily invent features on the same level.
Google/Apple could in the future announce a "safety" feature that periodically broadcasts AirTag-ish Bluetooth beacons even when in airplane mode or powered off.
Comes down to the model of trust as much as any threat model. I don't need to perceive or even conceive of a specific threat to want to safeguard my privacy, even aggressively if I so desire.
Airplane mode doesn't actually turn everything off, and it's not a guarantee like a faraday cage is. There are instances of devices still actually transmitting "stuff" in airplane mode, or having airplane mode silently disabled. A faraday cage is absolute, assuming correctly constructed, in that it guarantees zero signal can get in AND out. There's no ambiguity (again, assuming correct construction).
Wouldn’t turning the phone off accomplish the same thing?
All of the bags and boxes others have posted make the phone unusable, so turning it off seems just as well. Plus, if you leave the phone on, without airplane mode, in a faraday cage, it’s going to die rather quickly while it searches for a signal.
Believe it or not, no - modern phones are rarely genuinely “off.” iOS has some form of “find my” that still works, I’m not sure where android’s at with it, but “off” and “not transmitting” are genuinely not the same thing at this point.
Turning WiFi off in Settings and Bluetooth off in Settings does actually turn them off.
This change was the result of years of supporting people who meant to turn it off in one place or for one day, and were then confused when it didn't work at home or the next day.
So now airplane mode is a conditional off (the state machine across cellular, wifi, and bluetooth is pretty decent at doing a reasonable set of toggles), while the settings are off off.
The near field stuff seems to remain available; full shutdown turns that off too.
It used to turn them off on the iPhone, but not these days.
A lot of planes have WiFi, and people are also using Bluetooth headphones. So when talking about being a passenger on an airplane, this seems like a rather practical choice.
Not for my Android phone, at least not by default (Pixel 9a w/ GrapheneOS). It leaves Bluetooth and WiFi on in airplane mode. I doubt this is specific to GrapheneOS and it may say more about AOSP.
Not if you were connected to either when turning on airplane mode. Both can also be turned on manually in airplane mode. A physical block avoids any uncertainty or mistakes.
I've also tried Gemini 3 for Clues by Sam and it can do really well, have not seen it make a single mistake even for Hard and Tricky ones. Haven't run it on too many puzzles though.
And this is exactly why I only shop at Costco. While other retailers try to get me to buy more stuff, Costco tries to make sure I'm satisfied enough that I'll renew my yearly membership (their main profit source). The incentive structure aligns very well.
Buying in bulk is about having the ability to both afford next week’s food this week and have the means to store it. Not to mention the annual subscription.
Responding to a comment about dollar stores preying on the poor with, “that’s why I shop at Costco” is… a choice.
The fact that the strategic wedge with which a successful, relatively socially-positive business manages to sustain itself isn't universally accessible doesn't negate its value.
The Venn diagram between people who shop at dollar stores and people who shop at Costco isn't empty.
> the ability to both afford next week’s food this week
At minimum that's everyone on a normal paycheck, paid every two weeks. There are situations where someone couldn't get together a few days' pay at once, but that's a tiny fraction of situations.
And the means to store most food is a two foot square of space in a room somewhere. And then most of the rest fits in the empty fridge space you already have.
And there are deals there that can be useful for your wallet right away. This isn't something where you have to put up a ton of money for months before you benefit.
The biggest issue is probably that costco isn't easy to get to.
This is true, but it's a valuable - and damning - observation that this variation in business model, which seems to be both decent and profitable, is so rare.
Time to go and acquire necessary food stuff is not a luxury in any reasonable framing. What is the alternative, eating drive-thru every day or having Instacart deliver overpriced groceries?
I believe eating food from street vendors was the usual way for paupers until quite recently. Recall that it was common to rent a bed for a few hours and share it with someone who worked different shifts.
Indeed. And I say this as a Costco member. There are a lot of factors that make Costco memberships work. And a lot of people won't be able to get much benefit out of a Costco membership.
I say this as someone who admires their business model and how they treat customers & employees: your typical Costco experience is drive to the suburbs, spend $500 and load up your car with nice to have food products and discretionary purchases. Poorer people cannot do any of these things.
They don't require everyone to own a car. At the very least, they can run an efficient delivery service. And there's got to be a way to make a 3 hour rental or single taxi drive once a month much cheaper than owning a car.
>While other retailers try to get me to buy more stuff, Costco tries to make sure I'm satisfied enough that I'll renew my yearly membership (their main profit source). The incentive structure aligns very well.
This doesn't make any sense. Costco makes a profit on the goods sold as well. They have every incentive to sell you as much stuff as possible. That's why they also engage in the usual retail tactics to increase sales, like having the essentials all the way in the back of the store, and putting the high-margin items (electronics and jewelry) in the front. They might practice a cuddlier form of capitalism than Dollar General, but they're still a for-profit retail business.
I see you're not terribly familiar with Costco. Membership fees account for the vast majority of net operating income for Costco and they keep markups on items at no more than 14% over cost (15% for Kirkland brand).
So yes, Costco does make most of its profit by ensuring customers are happy and continue to renew their memberships every year.
>Membership fees account for the vast majority of net operating income for Costco
This is financially illiterate because you're mixing revenue ("membership fees") with profit ("net operating income"). While it might be tempting to assume that membership fees is pure profit for them, it's not, because people only buy memberships because they're useful for something (ie. shopping at their stores). Therefore you can't strip that out from the other costs associated with operating a chain of warehouses.
It’s kind of a meme; Costco’s profits are almost exactly the same as their total revenue from membership fees, which leads people to think that the warehouses run at zero margin and the fees are their only profit source. The fees certainly give them room to run the sales at extremely low margins (though large grocers like Kroger only have something like 3% margins), but it wouldn’t take a huge shift in purchasing patterns to change this coincidence. If all the people who don’t use their membership that much dropped them and those who use them were all large-scale buyers, they would have to increase their prices just to give themselves a bit of cushion.
>It seems to amount to a similar principle, that their business model depends on repeat customers, and would fail if they lost trust.
You think dollar general is making $37.9B (in 2023) of annual revenue from one-off customers? Unless you're operating a tourist trap, or some sort of business that people only need a few times in their lifetimes (eg. real estate agents), most businesses rely on repeat customers.
Dollar stores around here pop up in small towns, killing off any locally-owned competition, and are far enough away from the big chains to mean they can charge quite a bit more while offering terrible service.
No, I think they have other advantages that are less attractive to me.
I have money to buy in bulk, care about quality, and am willing to make longer trips to stock up. The membership is trivial relative to annual groceries.
I think the target market for dollar stores is the opposite.
Oh gee, aren't you cute. Costco's entire cost of doing business is covered purely by their sale of goods. The memberships are pure icing on the cake. It's not wrong to look at it this way.
I don't think you're wrong but I think you're being too quick to attack the gp. They're not wrong either. The point you brought up doesn't contradict theirs but adds nuance.
I'm all for nuance. It's also why I'm biased towards long-form media, as it's more likely to contain nuance, though that's not guaranteed. The GP's specific example of lectures is quite narrow and more likely to have depth. Which is the entire problem of short-form media: we live in a complex world where we can't distill everything into 1-2 minute segments. Hell, even a lecture series, which will be over 10 hours of content, is not enough to make one an expert on all but the most trivial of topics.
You're right that we need nuance but you're not right in arguing for it while demonstrating a lack of it. A major issue is we need to communicate, something we're becoming worse at. We should do our best to speak and write as clearly as possible but at the end of the day language is so imprecise that a listener or reader will be able to construct many, and even opposing, narratives. It is more important to be a good listener than a good speaker. I'd hope programmers, of all people, could understand this as we've invented overly pedantic languages with the explicit purpose of minimizing ambiguity[0]
VSCode is Electron-based which, yes, is based on Chromium. But the page you link to isn't about that; it's about using VSCode as a dev environment for working on Chromium, so I don't know why you linked it in this context.
Which came from "the KDE HTML Widget" AKA khtmlw. Wonder if that's the furthest we can go?
> if all that effort stayed inside the KDE ecosystem
Probably nowhere; people would rather not contribute to something that makes decisions they disagree with. Forking is beautiful, and I think it improves things more than it hurts. Think of all the things we wouldn't have if it wasn't for forking projects :)
On the other hand if that had stopped google from having a browser they push into total dominance with the help of sleazy methods, maybe that would have been better overall.
I still prefer an open source Chromium base vs a proprietary IE (or whatever else) web engine dominating.
(Fixing IE6 issues was no fun)
Also I do believe, the main reason chrome got dominance is simply because it got better from a technical POV.
I started webdev on FF with Firebug. But at some point Chrome just got faster, with superior dev tools. And their dev tools kept improving while FF stagnated and instead started and maintained unrelated social campaigns and otherwise engaged with shady tracking as well.
> I still prefer an open source Chromium base vs a proprietary IE (or whatever else) web engine dominating.
Okay but that's not the tradeoff I was suggesting for consideration. Ideally nothing would have dominated, but if something was going to win I don't think it would have been IE retaking all of firefox's ground. And while I liked Opera at the time, that takeover is even less likely.
> Also I do believe, the main reason chrome got dominance is simply because it got better from a technical POV.
Partly it was technical prowess. But google pushing it on their web pages and paying to put an "install chrome" checkbox into the installers of unrelated programs was a big factor in chrome not just spreading but taking over.
> And their dev tools kept improving while FF stagnated and instead started and maintained unrelated social campaigns and otherwise engaged with shady tracking as well.
Since when have you not touched Firefox or tried its dev tools?
I use FF for browsing, but every time I think of starting dev tools, maybe even just to have a look at some sites source code .. I quickly close them again and open chrome instead.
I wouldn't know where to start, to list all the things I miss in FF dev tools.
The only interesting thing they had for me, the 3D visualizer of the DOM tree, they dropped years ago.
We might not have had Mozilla/Phoenix/Firefox in the first place if so, either, which I'd like to think has been a net positive for the web since its inception. At least I remember being saved by Firefox when the options were pretty much Internet Explorer or Opera on a Windows machine.
> they push into total dominance with the help of sleazy methods
Ah, yes. The famously sleazy "automatic security updates" and "performance."
It is amazing how people forget what the internet was like before Chrome. You could choose between IE, Firefox, or (shudder) Opera. IE was awful, Opera was weird, and the only thing that Firefox did better than customization was crash.
Now everyone uses Chrome/WebKit, because it just works. Mozilla abandoning Servo is awful, but considering that Servo was indirectly funded by Google in the first place... well, it's really hard to look at what Google has done to browsing and say that we're worse off than we were before.
> Both are based on khtml. We could be living in a very different world if all that effort stayed inside the KDE ecosystem
How so?
Do you think thousands of googlers and apple engineers could be reasonably managed by some KDE opensource contributors? Or do you imagine google and apple would have taken over KDE? (Does anyone want that? Sounds horrible.)
I think they meant we wouldn't have had Safari, Chrome, Node, Electron, VSCode, Obsidian? Maybe no TypeScript or React either (before V8, JavaScript engines sucked). The world might have adopted more of Mozilla.
That's a bit misleading. It was based on WebCore, which Apple had forked from KHTML. However, Google found Apple's additions to be a drag, and I think very little of them (if anything at all, besides the KHTML foundation) survived "the great cleanup" and rewrite that became Blink. So actually WebKit was just a transitional phase that led to a dead end, and it is more accurate to say that Blink is based on KHTML.
I wouldn't say the Vietnamese alphabet is "transliteration". Vietnamese is one of the most, if not the most, tonal languages in the world. The same word, spoken with different tones, will convey different meanings.
The modern Vietnamese alphabet was developed in the 17th century (so it's not a transliteration), with tonal marks as a core feature. The written language is very phonetic. Within a region with a similar accent, if you hear a word, you can write it. And if you see a word, you can pronounce it.
The tonal marks are very important to the language. They allow for rich poetic rules that make Vietnamese poems fun and musical to read:
Yes, I had never looked into this and had assumed Vietnamese uses a Chinese-inspired writing system natively, like other languages in the region. Knowing that this is the only writing system immediately made sense of why this is necessary.
Ehm, like in Vietnam's neighbors Laos (ພາສາລາວ) and Cambodia (ខ្មែរ)? Sure Vietnamese used to (a long time ago) be written in its own version of the Chinese script, I'll give you that. But most languages in the region do not use a script derived from Chinese.
One explanation could be: many humans (including me) mistakenly think a seahorse emoji exists. My mind can even construct a picture of what it should look like, despite me also knowing it's very unlikely I've seen one myself.
I mean, it's not like emojis were always standardized. It is completely possible that there was an "emoji" or "emoticon" of a seahorse in a messaging application. I wouldn't be so quick to accept that your memory is incorrect.
Slack has a :seahorse: reacji, and is what I was picturing; I frequently try to use emoji that turn out to be reacji-exclusive (or reacji in the wrong workspace that I learn that way aren't Slack defaults) - I wonder if those insisting it exists are thinking of that.
Oh or Snapchat/TikTok/Instagram video/etc.? I think I've seen clips of whichever of those with overlaid stuff like seahorses.
Yeah, this seems more plausible to me. False memories and mass delusions are absolutely real, but if this is one, I'd like to know how it started and why it is so specific.
E.g. no one seems to be misremembering a sea cucumber emoji or anglerfish emoji - but there are other alleged emojis such as swordfish or bandit/bank robber, where people have the same reaction:
It would be interesting to see if LLM behavior is also similar. E.g. if you asked about an anglerfish emoji, would they straight-up tell you it doesn't exist, but for swordfish would start to spiral?
Would be interesting to read that proposal, as "usage level"[1] and "compatibility with existing systems"[2] are both factors that the emoji working group officially considers for new proposals.
So if the proposal includes one or both of those sections, that could shed some light on possible former usage in "proprietary" software.
Unfortunately, I don't see the actual proposal accessible anywhere.
This subreddit makes me so uneasy, so many people thinking that they remembered something and won't take "no this never happened" for an answer. Humans hallucinate like LLMs in fact! ;)
It makes me rather excited! Maybe there are some easy "memory illusion" tricks waiting out there somewhere to be discovered. (I am strongly pessimistic regarding the future of humanity overall, and I think we are all doomed (me, and everyone else). So I think someone playing a memory illusion on the radio would be rather neat, a new fact about us humans, and not something that I'm scared of.)
I meant more the denying reality aspect of the subreddit. There are some users there that go straight up into "someone must have altered the timeline" territory because they insist they are right.
I think you're conflating 2 different things: the USD and stocks from US companies.
- The USD is definitely losing value. That also means stocks from US companies would be cheaper from a foreigner's point of view.
- That means they represent a good investment opportunity as long as the fundamentals of those companies are not affected too much (e.g., AI companies not directly affected by worker raids, or by paying tariffs). Nothing is contradictory here.
> - The USD is definitely losing value. That also means stocks from US companies would be cheaper from a foreigner's point of view.
That's actually not quite right. You can only buy securities on US markets with US dollars. You'd have to buy dollars on the money market to make that trade. So to the extent that "cheaper dollars" are driving investment in dollar-valued securities, they're increasing the value of the dollar on the global market by the same amount.
All markets seek toward efficiency. The situation you posit would be subject to a money-printing arbitrage loop if it actually existed.