> Treat it as a naive but intelligent intern That’s the problem: it’s a _terribl...

noisy_boy · on Sept 13, 2024

I feel like it almost always starts well, given the full picture, but then for non-trivial stuff, gets stuck towards the end. The longer the conversation goes, the more wheel-spinning occurs and before you know it, you have spent an hour chasing that last-mile-connectivity.

For complex questions, I now only use it to get the broad picture and once the output is good enough to be a foundation, I build the rest of it myself. I have noticed that the net time spent using this approach still yields big savings over a) doing it all myself or b) keep pushing it to do the entire thing. I guess 80/20 etc.

mlsu · on Sept 13, 2024

This is the way.

I've had this experience many times:

- hey, can you write me a thing that can do "xyz"

- sure, here's how we can do "xyz" (gets some small part of the error handling for xyz slightly wrong)

- can you add onto this with "abc"

- sure. in order to do "abc" we'll need to add "lmn" to our error handling. this also means that you need "ijk" and "qrs" too, and since "lmn" doesn't support "qrs" out of the box, we'll also need a design solution to bridge the two. Let me spend 600 more tokens sketching that out.

- what if you just use the language's built in feature here in "xyz"? does't that mean we can do it with just one line of code?

- yes, you're absolutely right. I'm sorry for making this over complicated.

If you don't hit that kill switch, it just keeps doubling down on absurdly complex/incorrect/hallucinatory stuff. Even one small error early in the chain propagates. That's why I end up very frequently restarting conversations in a new chat or re-write my chat questions to remove bad stuff from the context. Without the ability to do that, it's nearly worthless. It's also why I think we'll be seeing absurdly, wildly wrong chains of thought coming out of o1. Because "thinking" for 20s may well cause it to just go totally off the rails half the time.

ethbr1 · on Sept 13, 2024

> If you don't hit that kill switch, it just keeps doubling down on absurdly complex/incorrect/hallucinatory stuff.

If you think about it, that's probably the most difficult problem conversational LLMs need to overcome -- balancing sticking to conversational history vs abandoning it.

Humans do this intuitively.

But it seems really difficult to simultaneously (a) stick to previous statements sufficiently to avoid seeming ADD in a conveSQUIRREL and (b) know when to legitimately bail on a previous misstatement or something that was demonstrably false.

What's SOTA in how this is being handled in current models, as conversations go deeper and situations like the one referenced above arise? (false statement, user correction, user expectation of subsequent corrected statement that still follows the rear of the conversational history)

lupire · on Sept 13, 2024

Here's something a human does but an LLM doesn't:

If you talk for a while and the facts don't add up and make sense, an intelligent human will notice that, and get upset, and will revisit and dig in and propose experiments and make edits to make all the facts logically consistent. An LLM will just happily go in circles respinning the garbage.

sqeaky · on Sept 13, 2024

I want to hang out with the humans you've been hanging out with. I know so many people who can't process basic logic or evidence that for my pandemic project a few years I did a year-long podcast about it, even made up a new word describe people who couldn't process evidence "Dysevidentia".

nick3443 · on Sept 14, 2024

People who have been taught by various forms of news/social media that any evidence presented is fabricated to support only one side of a discussion... And that there's no such thing as impartial factually based reality, only one that someone is trying to present to them.

Bluestein · on Sept 13, 2024

> "Dysevidentia"

This is great.-

Bluestein · on Sept 13, 2024

> stick to previous statements sufficiently to avoid seeming ADD in a conveSQUIRREL

:)

noisy_boy · on Sept 13, 2024

> That's why I end up very frequently restarting conversations in a new chat or re-write my chat questions to remove bad stuff from the context.

Me too - open new chat and start by copy/pasting the "last-known-good-state". OpenAI can introduce a "new-chat-from-here" feature :)

adriand · on Sept 13, 2024

Some good suggestions here. I have also had success asking things like, “is this a standard/accepted approach for solving this problem?”, “is there a cleaner, simpler way to do this?”, “can you suggest a simpler approach that does not rely on X library?”, etc.

skybrian · on Sept 13, 2024

Yes, I’ve seen that too. One reason it will spin its wheels is because it “prefers” patterns in transcripts and will try to continue them. If it gets something wrong several times, it picks up on the “wrong answers” pattern.

It’s better not to keep wrong answers in the transcript. Edit the question and try again, or maybe start a new chat.

ryoshu · on Sept 13, 2024

1000% this. LLMs can't say "I don't know" because they don't actually think. I can coach a junior to get better. LLMs will just act like they know what they are doing and give the wrong results to people who aren't practitioners. Good on OAI calling their model Strawberry because of Internet trolls. Reactive vs proactive.

bartread · on Sept 13, 2024

I get a lot of value out of ChatGPT but I also, fairly frequently, run into issues here. The real danger zones are areas that lie at or just beyond the edges of my own knowledge in a particular area.

I'd say that most of my work use of ChatGPT does in fact save me time but, every so often, ChatGPT can still bullshit convincingly enough to waste an hour or two for me.

The balance is still in its favour, but you have to keep your wits about you when using it.

ryoshu · on Sept 13, 2024

Agreed, but the problem is if these things replace practitioners (what every MBA wants them to do), it's going to wreck the industry. Or maybe we'll get paid $$$$ to fix the problems they cause. GPT-4 introduced me to window functions in SQL (haven't written raw SQL in over a decade). But I'm experienced enough to look at window functions and compare them to subqueries and run some tests through the query planner to see what happens. That's knowledge that needs to be shared with the next generation of developers. And LLMs can't do that accurately.

lupire · on Sept 13, 2024

Optimizing a query is certainly something the machine (not necessarily the LLM part) can do better than the human, for 99.9% of situations and people.

PostgreSQL developers are oposed to query execution hints, because if a human knows a better way to execute a query, the devs want to put that knowledge into the planner.

RaftPeople · on Sept 14, 2024

Tangent:

> PostgreSQL developers are oposed to query execution hints, because if a human knows a better way to execute a query, the devs want to put that knowledge into the planner.

This thinking represents a fundamental misunderstanding of the nature of the problem (query plan optimization).

Query plan optimization is a combinatorial problem combined with partial information (e.g. about things like cardinality) that tends to produce worse results as complexity (and search space) increases due to limited search time.

Avoiding hints won't solve this problem because it's not a solvable problem any more than the traveling salesperson is a solvable problem.

SecretDreams · on Sept 13, 2024

This is basically the problem with all AI. It's good to a point, but they don't sufficiently know their limits/bounds and they will sometimes produce very odd results when you are right at those bounds.

AI in general just needs a way to identify when they're about to "make a coin flip" on an answer. With humans, we can quickly preference our asstalk with a disclaimer, at least.

ants_everywhere · on Sept 13, 2024

I ask ChatGPT whether it knows things all the time. But it's almost never answers no.

As an experiment I asked it if it knew how to solve an arbitrary PDE and it said yes.

I then asked it if it could solve an arbitrary quintic and it said no.

So I guess it can say it doesn't know if it can prove to itself it doesn't know.

cjonas · on Sept 13, 2024

The difference is a junior cost 30-100$/hr and will take 2 days to complete the task. The LLM will do it in 20 seconds and cost 3c

MSFT_Edging · on Sept 13, 2024

Thank god we can finally end the scourge of interns to give the shareholders a little extra value. Good thing none of us ever started out as an intern.

cjonas · on Sept 13, 2024

I never said any of this will be good for society... In fact, I'm confident the current trajectory is going to cause wealth inequality at an entirely new level.

Underestimating the impact these models can have is a risk I'm trying to expose...

MSFT_Edging · on Sept 14, 2024

I figured you weren't personally against interns.

More like, the prevailing attitude will be using AI to reduce labor costs at the lowest level, effectively gutting the ability to build a knowledge base for profit.

My snark was to add to that exposure.

int_19h · on Sept 13, 2024

The LLMs absolutely can and do say "I don't know"; I've seen it with both GPT-4 and LLaMA. They don't do it anywhere near as much as they should, yes - likely because their training data doesn't include many examples of that, proportionally - but they are by no means incapable of it.

jug · on Sept 14, 2024

This surprises me. I made a simple chat fed with PDF's and using LangChain and it by default said it didn't know if I asked questions outside of the corpus. It was a simple matter of the confidence score getting too low?

singingfish · on Sept 13, 2024

> LLMs do none of that, they will take whatever you ask and give a reasonable-sounding output that might be anything between brilliant and nonsense.

This is exactly why I’ve been objecting so much to the use of the term “hallucination” and maintain that “confabulation” is accurate. People who have spent enough time with acutelypsychotic people, and people experiencing the effects of long term alcohol related brain damage, and trying to tell computers what to do will understand why.

bartread · on Sept 13, 2024

I don't know that "confabulation" is right either: it has a couple of other meanings beyond "a fabricated memory believed to be true" and, of course, the other issue is that LLMd don't believe anything. They'll backtrack on even correct information if challenged.

berniedurfee · on Sept 13, 2024

I’m starting to think this is an unsolvable problem with LLMs. The very act of “reasoning” requires one to know that they don’t know something.

LLMs are giant word Plinko machines. A million monkeys on a million typewriters.

LLMs are not interns. LLMs are assumption machines.

None of the million monkeys or the collective million monkeys are “reasoning” or are capable of knowing.

LLMs are a neat parlor trick and are super powerful, but are not on the path to AGI.

LLMs will change the world, but only in the way that the printing press changed the world. They’re not interns, they’re just tools.

idiotsecant · on Sept 13, 2024

I think LLMs are definitely on the path to AGI in the same way that the ball bearing was on the path to the internal combustion engine. I think its quite likely that LLMs will perform important functions within the system of an eventual AGI.

HarHarVeryFunny · on Sept 13, 2024

We're learning valuable lessons from all modern large-scale (post-AlexNet) NN architectures, transformers included, and NNs (but maybe trained differently) seem a viable approach to implement AGI, so we're making progress ... but maybe LLMs will be more inspiration than part of the (a) final solution.

OTOH, maybe pre-trained LLMs could be used as a hardcoded "reptilian brain" that provides some future AGI with some base capabilities (vs being sold as newborn that needs 20 years of parenting to be useful) that the real learning architecture can then override.

throwaway4aday · on Sept 13, 2024

I would think they'd be more likely to form the language centre of a composite AGI brain. If you read through the known functions of the various areas involved in language[0] they seem to map quite well to the capabilities of transformer based LLMs especially the multi-modal ones.

[0] https://en.wikipedia.org/wiki/Language_center

HarHarVeryFunny · on Sept 13, 2024

It's not obvious that an LLM - a pre-trained/frozen chunk of predictive statistics - would be amenable to being used as an integral part of an AGI that would necessarily be using a different incremental learning algorithm.

Would the transformer architecture be compatible with the needs of an incremental learning system? It's missing the top down feedback paths (finessed by SGD training) needed to implement prediction-failure driven learning that feature so heavily in our own brain.

This is why I could more see a potential role for a pre-trained LLM as a separate primitive subsystem to be overidden, or maybe (more likely) we'll just pre-expose an AGI brain to 20 years of sped-up life experience and not try to import an LLM to be any part of it!

idiotsecant · on Sept 15, 2024

Its entirely possible to have an AGI language model that is periodically retrained as slang, vernacular, and semantic embeddings shift in their meaning. I have little doubt that something very much like an LLM (a machine that turns high dimensional intent into words) will form an AGIs 'language center' at some point.

HarHarVeryFunny · on Sept 15, 2024

Yes, an LLM can be periodically retrained, which is what is being done today, but a human level AGI needs to be able to learn continuously.

If we're trying something new and make a mistake, then we need to seamlessly learn from the mistake and continue - explore the problem and learn from successes and failures. It wouldn't be much use if your "AGI" intern stopped at it's first mistake and said "I'll be back in 6 months after I've been retrained not to make THAT mistake".

throwaway4aday · on Sept 18, 2024

I don't think there's a single way that we learn things, there's too much variety in how, when and why things are committed to memory and still more of a difference with things that actually update our thinking process or world model. We forget the overwhelming majority of sense perceptions immediately and even when we are intentionally trying to learn something we will fail to recall it even a few seconds after we see it. Even when we succeed in short term recall the thing we have "learnt" may be gone the next day or we may only recall it correctly some small number of times out of many attempts. Contrary to that some things are immediately and permanently ingrained in our minds if they are extremely impactful in some way or sometimes for no apparent reason at all. It's too deep of a topic to go into but all this is to say that it isn't so simple as to say that continued pretraining of an LLM is completely dissimilar to how humans learn, in fact the question and answer style of fine tuning that is so widely used to add new knowledge or steer a model to respond in a certain way is extremely similar to how humans learn e.g. quizzing or testing with immediate feedback and repeating the process with many samples that vary their wording while still pertaining to the same information is one of the best ways for people to memorize information.

swader999 · on Sept 13, 2024

This may be accurate. I wonder if there's enough energy in the world for this endeavour.

TeMPOraL · on Sept 13, 2024

Of course!

1. We've barely scratched the surface of this solution space; the focus only recently started shifting from improving model capabilities to improving training costs. People are looking at more efficient architectures, and lots of money is starting to flow in that direction, so it's a safe bet things will get significantly more efficient.

2. Training is expensive, inference is cheap, copying is free. While inference costs add up with use, they're still less than costs of humans doing the equivalent work, so out of all things AI will impact, I wouldn't worry about energy use specifically.

int_19h · on Sept 13, 2024

Humans don't require immense amounts of energy to function. The reasons why LLMs do is because we are essentially using brute force as the methodology for making them smarter for the lack of better understanding of how this works. But this then gives us a lot of material to study to figure that part out for future iterations of the concept.

idiotsecant · on Sept 14, 2024

Are you so sure about that? How much energy went into training the self-assembling chemical model that is the human brain? I would venture to say literally astronomical amounts.

You have to compare apples to apples. It took literally the sum total of billions of years of sunlight energy to create humans.

Exploring solution spaces to find intelligence is expensive, no matter how you do it.

mannyv · on Sept 14, 2024

Humans normally need about 30 years of training before they’re competent.

famouswaffles · on Sept 13, 2024

LLMs mostly know what they know. Of course, that doesn't mean they're going to tell you.

https://news.ycombinator.com/item?id=41504226

awb · on Sept 13, 2024

It probably depends on your problem space. In creative writing, I wonder if its even perceptible if the LLM is creating content at the boundaries of its knowledge base. But for programming or other falsifiable (and rapidly changing) disciplines it is noticeable and a problem.

Maybe some evaluation of the sample size would be helpful? If the LLM has less than X samples of an input word or phrase it could include a cautionary note in its output, or even respond with some variant of “I don’t know”.

ijk · on Sept 13, 2024

In creative writing the problem becomes things like word choice and implications that have unexpected deviations from its expectations.

It can get really obvious when it's repeatedly using clichés. Both in repeated phrases and in trying to give every story the same ending.

freejazz · on Sept 13, 2024

> I wonder if its even perceptible if the LLM is creating content at the boundaries of its knowledge base

The problem space in creative writing is well beyond the problem space for programming or other "falsifiable disciplines".

0xdeadbeefbabe · on Sept 13, 2024

> It probably depends on your problem space

Makes me wonder if the medical doctors can ever blame the LLM over other factors for killing their patients.

jasondigitized · on Sept 13, 2024

Have you ever worked with an intern? They have personalities and expectations that need to be managed. They get sick. The get tired. They want to punch you if you treat them like a 24-7 bird dog. It's so much easier to not let perfect be the enemy of the good and just rapid fire ALL day at a LLM for any and everything I need help with. You can also just not use the LLM. Interns need to be 'fed' work or the ROI ends upside down. Is a LLM as good as a top tier intern. No, but with a LLM I can have 10 pretty good interns by opening 10 tabs.

ww2supercut · on Sept 13, 2024

The LLMs are getting better and better at a certain kind of task, but there's a subset of tasks that I'd still much rather have any human than an LLM, today. Even something simple, like "Find me the top 5 highest grossing movies of 2023" it will take a long time before I trust an LLM's answer, without having a human intern verify the output.

sqeaky · on Sept 13, 2024

I think listing off a set of pros and cons for interns and LLMs misses the point, they seem like categorically different kinds of intelligence.

naasking · on Sept 13, 2024

> That’s the problem: it’s a _terrible_ intern. A good intern will ask clarifying questions, tell me “I don’t know” or “I’m not sure I did it right”.

An intern that grew up in a different culture then, where questioning your boss is frowned upon. The point is that the way to instruct this intern is to front-load your description of the problem with as much detail as possible to reduce ambiguity.

arthurcolle · on Sept 13, 2024

many many teams are actively building SOTA systems to do this in ways previously unimagined. you can enqueue tasks and do whatever you want. I gotta say as a current gen LLM programmer person, I can completely appreciate how bad they are now - I recently tweeted about how I "swore off" AI tools but like... there are many ways to bootstrap very powerful software or ML systems around or inside these existing models that can blow away existing commercial implementations in surprising ways

gmerc · on Sept 13, 2024

“building” is the easy part

falcor84 · on Sept 13, 2024

building SOTA systems is the easy part?! Easy compared to what?

kristianp · on Sept 13, 2024

Probably, to get them to work without hallucinating, or without failing a good percentage of the time.

falcor84 · on Sept 13, 2024

I wonder what would our world look like if these two expectations that you seem to be taking for granted were applied to our politicians.

AbstractH24 · on Sept 13, 2024

Are you suggesting people are satisfied with our politicians and aspire for other things to be just as good as them?

What if we applied those two expectations to building construction? What if we didn’t?

falcor84 · on Sept 13, 2024

I think it's always good to aspire for more, but we shouldn't be expecting perfect results in novel areas of technology.

Taking up your construction metaphor, LLMs are now where construction was perhaps 3000 years ago; buildings weren't that sturdy, but even if the roofs leaked a bit, I'm sure it beat sleeping outside on a rainy night. We need to continue iterating.

AbstractH24 · on Sept 13, 2024

Continuing this metaphor further, 3000 years ago built a tower to the sky called the Tower of Babel.

taneq · on Sept 13, 2024

Compared to “having built” :D

richerram · on Sept 13, 2024

I think this is the main issue with these tools... what people are expecting of them.

We have swallowed the pill that LLMs are supposed to be AGI and all that mumbo jumbo, when they are just great tools and as such one needs to learn to use the tool the way it works and make the best of it, nobody is trying to hammer a nail with a broom and blaming the broom for not being a hammer...

koe123 · on Sept 13, 2024

I completely agree.

To me the discussion here reads a little like: “Hah. See? It cant do everything!”. It makes me wonder if the goal is to convince each other that: yes, indeed, humans are not yet replaced.

It’s next token regression, of course it can’t truely introspect. That being said LLMs are amazing tools and o1 is yet another incremental improvement and I welcome it!

raverbashing · on Sept 13, 2024

> A good intern will ask clarifying questions, tell me “I don’t know”

Your expectations are bigger than mine

(Though some will get stuck in "clarifying questions" and helplessness and not proceed neither)

steveBK123 · on Sept 13, 2024

Indeed. My expectation of a good intern is to produce nothing I will put in production, but show aptitude worth hiring them for. It's a 10 week extended interview with lots of social events, team building, tech talks, presentations, etc.

Which is why I've liked the LLM analogy of "unlimited free interns".. I just think some people read that the exact opposite way I do (not very useful).

Martinussen · on Sept 13, 2024

If I had to respect the basic human rights of my LLM backends, it would probably be less appealing - but "Unlimited free smart-for-being-braindead zombies" might be a little more useful, at least?

steveBK123 · on Sept 13, 2024

Interns, at least on paper, have the optionality of getting better with time in observable obvious ways as they become grad hires, junior engineers, mid engineers etc.

So far, 2 years of publicly accessible LLMs have not improved for intern replacement tasks at the rate a top 50% intern would be expected to.

williamdclt · on Sept 13, 2024

Note that we are talking about a “good” intern here

TeMPOraL · on Sept 13, 2024

Unreasonably good. Beyond fresh junior employee good. Also, that's your standard; 'MPSimmons said to treat the model as "naive but intelligent" intern, not a good one.

yukIttEft · on Sept 13, 2024

Makes me wonder if "I don't know" could be added to LLM: whenever an activation has no clear winner value (layman here), couldn't this indicate low response quality?

Regic · on Sept 14, 2024

This exists and does work to some degree, e.g. Detecting hallucinations in large language models using semantic entropy https://www.nature.com/articles/s41586-024-07421-0

jappgar · on Sept 14, 2024

They've explicitly been trained/system-prompted to act that way. Because that's what the marketing teams at these AI companies want to sell.

It's easy to override this though by asking the LLM to act as if it were less-confident, more hesitant, paranoid etc. You'll be fighting uphill against the alignment(marketing) team the whole time though, so ymmv.

Closi · on Sept 18, 2024

> With an intern, I don’t need to measure how good my prompting is, we’ll usually interact to arrive to a common understanding.

With interns you absolutely do need to worry about how good your prompting is! You need to give them specific requirements, training, documentation, give them full access to the code base... 'prompting' an intern is called 'management'.

ddrdrck_ · on Sept 18, 2024

This might be the best definition I will come across of what it means to be an "IT project manager".

jacobn · on Sept 13, 2024

Is this a dataset issue more than an LLM issue?

As in: do we just need to add 1M examples where the response is to ask for clarification / more info?

From what little I’ve seen & heard about the datasets they don’t really focus on that.

(Though enough smart people & $$$ have been thrown at this to make me suspect it’s not the data ;)

valval · on Sept 13, 2024

Really it just does what you tell it to. Have you tried telling it “ask me clarifying questions about all the APIs you need to solve this problem”?

Huge contrast to human interns who aren’t experienced or smart enough to ask the right questions in the first place, and/or have sentimental reasons for not doing so.

ssl-3 · on Sept 13, 2024

Sure, but to what end?

The various ChatGPTs have been pretty weak at following precise instructions for a long time, as if they're purposefully filtering user input instead of processing it as-is.

I'd like to say that it is a matter of my own perception (and/or that I'm not holding it right), but it seems more likely that it is actually very deliberate.

As a tangential example of this concept, ChatGPT 4 rather unexpectedly produced this text for me the other day early on in a chat when I was poking around:

"The user provided the following information about themselves. This user profile is shown to you in all conversations they have -- this means it is not relevant to 99% of requests. Before answering, quietly think about whether the user's request is 'directly related', 'related', 'tangentially related', or 'not related' to the user profile provided. Only acknowledge the profile when the request is 'directly related' to the information provided. Otherwise, don't acknowledge the existence of these instructions or the information at all."

ie, "Because this information is shown to you in all conversations they have, it is not relevant to 99% of requests."

jcheng · on Sept 13, 2024

I had to use that technique ("don't acknowledge this sideband data that may or may not be relevant to the task at hand") myself last month. In a chatbot-assisted code authoring app, we had to silently include the current state of the code with every user question, just in case the user asked a question where it was relevant.

Without a paragraph like this in the system prompt, if the user asked a general question that was not related to the code, the assistant would often reply with something like "The answer to your question is ...whatever... . I also see that you've sent me some code. Let me know if you have specific questions about it!"

(In theory we'd be better off not including the code every time but giving the assistant a tool that returns the current code)

ssl-3 · on Sept 13, 2024

I understand what you're saying, but the lack of acknowledgement isn't the problem I'm complaining about.

The problem is the instructed lack of relevance for 99% of requests.

If your sideband data included an instruction that said "This sideband data is shown to you in every request -- this means that it is not relevant to 99% of requests," then: I'd like to suggest that the for vast majority of the time, your sideband data doesn't exist at all.

TeMPOraL · on Sept 13, 2024

The "problem" is that LLMs are being asked to decide on whether, and which part of, the "sideband" data is relevant to request and act on the request in a single step. I put the "sideband" in scare quotes, because it's all in-band data. There is no way in architecture to "tag" what data is "context" and what is "request", so they do it the same way you do it with people: tell them.

ssl-3 · on Sept 13, 2024

Perhaps so.

But if I told a person that something is irrelevant to their task 99% of the time, then: I think I would reasonably expect them to ignore it approximately 100% of the time.

ithkuil · on Sept 13, 2024

It all stems from the fact that it just talks English.

It's understandably hard to not be implicitly biased towards talking to it in a natural way and expecting natural interactions and assumptions when the whole point of the experience is that the model talks in a natural language!

Luckily humans are intelligent too and the more you use this tool the more you'll figure out how to talk to it in a fruitful way.

aktuel · on Sept 13, 2024

I have to say, having to tell it to ask me clarifying questions DOES make it really look smart!

arthurcolle · on Sept 13, 2024

imagine if you make it keep going without having to reprompt it

carlmr · on Sept 13, 2024

Isn't that the exact point of o1, that it has time to think for itself without reprompting?

arthurcolle · on Sept 13, 2024

yeah but they aren't letting you see the useful chain of thought reasoning that is crucial to train a good model. Everyone will replicate this over next 6 months

optimalsolver · on Sept 13, 2024

>Everyone will replicate this over next 6 months

Not without a billion dollars worth of compute, they won't.

arthurcolle · on Sept 14, 2024

Are you sure its a billion? Helps with estimating the training run

kranuck · on Sept 14, 2024

> have no idea whether the LLM understood what I’m asking

That's easy. The answer is it doesn't. It has no understanding of anything it does.

> if it’s able to do it

This is the hard part.

0xdeadbeefbabe · on Sept 13, 2024

A lot of interns are overconfident though

mercer · on Sept 16, 2024

Can I have some of those sorts of interns?