You are forgetting that with multiple high-quality open-weights models available, we are quickly reaching (or have already reached?) the point where using completely local models makes sense.
If the writer of the grandparent comment (the person who wrote about the secretary in the Philippines) is reading this, I would love it if you could run a simple experiment: instead of having them use SOTA models for the things they currently use AI for, have them use an open-source model (even a tiny-to-mid-size one) and see what happens.
> "My assistant in the phillipines has used it to substantially improve her communications, for instance."
So if they are using it for communications, honestly even a small-to-mid-size model would serve them well.
Please let me know how this experiment goes. I might write about it, and it's partly plain curiosity, but I would honestly be 99% certain that the differences are so negligible that using SOTA models, or even remotely hosted datacenter models, won't make much sense. Of course, we can say nothing without empirical evidence, which is why I'm asking you to test my hypothesis.
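For anyone who wants to try this, here is a minimal sketch of what the experiment could look like, assuming Ollama is serving a small model locally through its OpenAI-compatible API. The model name, prompt, and system instruction are illustrative placeholders, not anything the original commenter specified:

```python
# Sketch of the proposed experiment: run a typical communication task
# through a small local model served by Ollama's OpenAI-compatible API.
# Assumes `ollama serve` is running and `ollama pull llama3.2` was done;
# the model name and draft text below are illustrative placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

draft = "pls confirm meeting tmrw 3pm with client, also attach last invoice"
resp = client.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "system",
         "content": "Rewrite the user's note as a short, professional email."},
        {"role": "user", "content": draft},
    ],
)
print(resp.choices[0].message.content)
```

The same script would talk to a hosted provider by swapping `base_url` and the model name, which makes a before/after comparison trivial.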
> You are forgetting that with multiple high-quality open-weights models available, we are quickly reaching (or have already reached?) the point where using completely local models makes sense.
I'm not, since I'm a heavy user of local models myself, and even with the beast of a card I work with locally every day (RTX Pro 6000), the LLMs you can run locally are basically toy models compared to the hosted ones. If you haven't already, I think you need to try it yourself to see the difference. I didn't mention or address it because it's basically irrelevant in this context.
And besides that, how affordable are GPUs today in the developing world? What about electricity costs? Thermals? Frequent blackouts? And so on; there are many variables you seemingly haven't considered yet.
The best way of measuring the difference between hosted models and local models is to run your own private benchmarks against both and compare. I've been doing this for years, and local models are still nowhere near the hosted ones, sadly. I'm eager for that day to come, but it will take a while yet.
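To make that concrete, here is a rough sketch of such a private benchmark, assuming both sides speak the OpenAI-compatible chat API (Ollama locally, any hosted provider remotely). The endpoint URLs, model names, and prompts are placeholders you would swap for your own:

```python
# Minimal sketch of a private benchmark: send the same prompts to a
# local endpoint and a hosted endpoint, then eyeball (or score) the
# outputs. Model names, URLs, and prompts are placeholders.
from openai import OpenAI

endpoints = {
    "local":  OpenAI(base_url="http://localhost:11434/v1", api_key="ollama"),
    "hosted": OpenAI(),  # reads OPENAI_API_KEY from the environment
}
models = {"local": "llama3.2", "hosted": "gpt-4o-mini"}

# Keep these prompts private so they can't leak into training data.
prompts = [
    "Summarize this email thread in two sentences: ...",
    "Rewrite this note as a polite follow-up: ...",
]

for name, client in endpoints.items():
    for prompt in prompts:
        resp = client.chat.completions.create(
            model=models[name],
            messages=[{"role": "user", "content": prompt}],
        )
        print(f"[{name}] {prompt[:40]}...\n{resp.choices[0].message.content}\n")
```

The important part is keeping the prompt set private so neither side could have been tuned on it; scoring can be manual or via a judge model.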
I’ve got an M3 Max with 64 GB of RAM and can run more than just toy models, even if they are obviously weaker than hosted ones. Honestly, I think local LLMs are the future, and we are just going to use hosted ones until hardware catches up (and now they have something to catch up to!).
> Honestly, I think local LLMs are the future, and we are just going to use hosted
Same here, otherwise I wouldn't be investing in local hardware :) But I'd be lying if I said it's ready for that today. I don't think the hardware has that much catching up to do; it's the software that has a bunch of low-hanging fruit available for performance and resource usage, since every release seems to favor "time to paper" above all else.
There is a lot you can do on local hardware already, and you don’t have to worry about safeguards or token limits. There are lots of wild models, especially Chinese ones, that have real capabilities and aren’t just there for academic papers.
Again, put those under test with your private benchmarks, then compare the results with hosted models.
I'm not saying it's completely useless, or that it won't get better in the future. What I am saying is that even the top "weights available" models today really don't come close to today's SOTA. This is very clear when you have benchmarks that give hard, concrete numbers and aren't influenced by public benchmarking data.
> even the top "weights available" models today really don't come close to today's SOTA.
This is the statement I'm disagreeing with. They do come close; even if they are somewhat weaker, the gap is a fixed distance, and hosted models aren't an order of magnitude better. Hosted models are still better, just not incredibly so.
I 100% agree with your comment: yes, we should test the models on our own private benchmarks, and there is no denying that local has a long way to go.
I was just proposing that local feels like the most sustainable way for things to go, perhaps even through an OpenRouter-like API, but you can read my other comment on how I found their finances to be loss-making or zero-profit. So it's good while it lasts (on the AI bubble's dime) if someone needs it, but long term I feel its prices are going to rise, whereas local would remain stable. (Also worth mentioning that there is no free lunch, so I think the losses will be distributed to everybody in the form of a financial crisis caused by AI. I hope the impact of that crisis lessens, because at this point I am genuinely worried about it.)
Agreed. I understand that right now, using these bubble-money-subsidized services might make sense (see my other comments where I went down the rabbit hole on how most of these companies are losing money or breaking even while investing billions).
Although these prices aren't sustainable, the one path that makes sense to me is a transition to local models (which, yes, I know are less efficient), and in my opinion that's the inevitability if the bubble bursts. There are definitely some steal deals on hardware nowadays if one wishes to take advantage of them.
Also, you may have misunderstood me in this comment (if so, my apologies): what I was describing was the secretary use case, not a company using AI or selling AI-related services that needs 24/7 access.
One wouldn't have to worry about blackouts either, because if your secretary's house has lost power, let's be honest: AI won't magically turn the lights on.
Also, the chips in our devices are beasts. I am pretty sure that for basic communication tasks like the grandparent comment described, even the "toy" models, as you call them, would very likely be "good enough".
So yeah, a tiny model running locally (perhaps even on the phone) can solve her use case, so the moat for AI companies is close to zero (as expected).
This is what I was trying to say actually, thanks for responding.
That being said, the original point about Americans/Europeans does become a bit moot after this, because I don't think most people are against small models; what they hate is SOTA models running in AI-centric datacenters, which act as a tax on them by raising electricity rates and so on while taking jobs from them.
A tiny model, on the other hand, doesn't do any of that. I do feel the concerns of American people about AI datacenters are valid, though, so I hope something can be done about it in a timely manner that brings real help to the average American.