
Anthropic in particular does this masterfully, you’d think they’d invented Skynet by the way they hand-wring.

As always what matters are actions and evidence, not talk.




I’ll believe Anthropic when they fire everyone making more than the cost of a few GPUs. Until then, it’s just marketing.

When a model can tell funny jokes or write good poetry, that's when I'll be concerned.

No, you'll just say "That's not really very funny," or "That's not very impressive poetry," and nobody will be able to dispute it.

For some time now, at least a year, LLMs have been capable of doing both of these things well enough to fool you.

(Pastebin of my response below, which got nuked for whatever reason: https://pastebin.com/buJBSgiq . Some if not most of them would've fooled me into thinking a human wrote them.)


Okay post a really funny LLM joke about potatoes and post a great piece of LLM poetry about lemons.

I’ll wait. You should be able to do it quickly though since LLMs are so good at it.


CamperBob2 responded with a model comparison of potato jokes and got insta-[dead]'d by an auto filter.

Maybe turn on the [showdead] option and/or vouch.


> responded

And the results are just awful.


Hell yeah, no argument there

- but in this case I wouldn't advocate for [dead]ing a mostly-AI response, as it was exactly what was asked for: a comparison of AI models on potato-based dad jokes.


Of course they're awful, they're jokes about potatoes and poems about lemons.

The question is, can you tell that a machine wrote all of them? If so, how?


Nope, I guess I can't tell the difference between machine-written jokes and mediocre human ones.

Models are structurally biased toward the expected, which is the opposite of what makes a joke land or a poem transcend.


I think you could make that case for poetry but I'm not sure about jokes. Great poems tell us something new or make us feel something new, which is hard to do when the subject is lemons, while jokes work by wedging the familiar into new contexts.

That's why the jokes work somewhat better than the poems here. I genuinely laughed at "Are those chips?", which came from the model running on my own freakin' GPU.


Yeah, I mean, I also chuckle at good (or cheap) puns sometimes. But wordplay and puns are the current ceiling of LLMs. They're good at them because puns are purely structural (pattern-match on phonetics, then swap the meaning). In that bit, there's no buildup, no callbacks, no escalation, no expectations to subvert, no thesis, no perspective.
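To make "purely structural" concrete, here's a toy sketch of that two-step recipe: match a word by sound, then swap in the sound-alike. The word list and example sentence are made up for illustration, not how any real model generates puns.

```python
# Toy structural-pun generator: phonetic match + meaning swap.
# The sound-alike table below is a made-up illustration.
sound_alikes = {
    "mashive": "massive",      # potato pun for "massive"
    "appeeling": "appealing",  # potato-skin pun for "appealing"
}

def punify(sentence: str) -> str:
    # Replace any plain word that has a phonetically similar pun word.
    words = sentence.split()
    swapped = [
        next((pun for pun, plain in sound_alikes.items() if plain == word), word)
        for word in words
    ]
    return " ".join(swapped)

print(punify("that is a massive and appealing spud"))
# -> "that is a mashive and appeeling spud"
```

Note there's no setup, escalation, or subversion anywhere in that pipeline; it's pure substitution, which is exactly why it's the easy case.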

Grounded, buried, couchy, deep-seated, eyes, baked... It's like a thesaurus!

I feel like human comedians would have to deal with a lot of layered subtleties. They would make the potatoes _serve the bit_ instead of _be the bit_.


Do funny jokes about potatoes or great poetry about lemons exist? If not, can they exist? How do you know?

Gemini 3.1 Pro:

    Potatoes are easily the most intimidating 
    audience for a stand-up comedian. They have 
    dozens of eyes to judge you with, they are 
    almost always baked by the time you get on stage, 
    and if your set really starts to bomb, they are 
    the only ones in the room who are actually looking 
    forward to a roasting!
GPT 5.4:

    Potatoes are the only food we traumatize into greatness.
Qwen 3.5 122B-A10B, running locally on my graphics card:

    (Steps up to the mic, adjusts the stand, 
    and looks around the room)

    Alright everyone, settle down. You know, 
    I've been thinking a lot lately... mostly 
    about my dating life. Or rather, my lack of it.

    See, my last relationship was with a potato.

    And honestly? It was amazing at first. We had 
    so much in common. She was grounded, she loved being 
    buried deep under the covers, and we were perfect 
    together until the heat came on. But then... well, 
    let's just say she started to get a little couchy.

    (Pauses for effect)

    No, wait, that's not it. The real problem was that she 
    had some really deep-seated issues. Literally! She was 
    covered in eyes. Every time we tried to have an intimate 
    conversation, she'd just stare at me with a dozen 
    different expressions. It was weirdly intense. Like, "I 
    see you," "You're suspicious," "Wait, are those chips?"

    Anyway, thanks guys! You've been a great crowd. You're 
    like potatoes to me... you're all baked tonight! 
GPT 5.4:

    Not every light is gentle.
    Some light shines with sharpened edge.

    A lemon keeps the memory of sun
    but teaches it into bitterness,
    as if to say:
    what wakes the mouth
    may also wake the soul.

    We do not ripen only into sweetness.
    Sometimes the truest gold
    is tart.
Opus 4.6:

    A lemon on the table
    is already an argument—
    that yellow can be sour,
    that beauty owes you nothing.
    Peel it and the room
    takes sides.

    The rind remembers being whole.
    The flesh forgives too quickly.
    What's left is yours:
    a stain on wood,
    a sting in the hangnail
    you forgot you had.
Qwen 3.5 122B-A10B:

    To cut it is to break a seal,
    Where bitterness and sweetness dwell.
    A sudden bite, both stinging, sweet—
    The taste of earth beneath our feet.

    No sugar softens all its sting,
    It forces soul to wake and sing.
    A small, fierce orb, yellow as dawn,
    Teaching the tongue how life is born.
Your turn. Go round up some English-lit grad students and give them the same prompts. I can't do any better, myself, but maybe they can.

One of their highlights with Mythos was its ability to generate new puns

I took a look and honestly they're the first AI puns that aren't bad

Times are changing


Trained with the conversations of one million dads and their kids, captured by Amazon Echo.

I'm not sure if this is mythos-specific though. Past models have been great at puns! They do wordplay and puns reasonably well because those are structural.

However, the concepts of comedic timing, subversion of expectations, and emotional punch are kinda contrary to how LLMs work. LLMs are trained to minimize cross-entropy loss. So by construction, they're biased toward the statistically expected.
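The cross-entropy point can be seen in a toy example: a model that minimizes cross-entropy over a corpus converges to the empirical next-token distribution, so greedy decoding always emits the most frequent (i.e. most expected) continuation. A minimal sketch with made-up counts, not a real LLM:

```python
from collections import Counter

# Toy corpus of continuations seen after some setup phrase.
corpus = ["potato"] * 7 + ["lawyer"] * 2 + ["toaster"] * 1

# The cross-entropy-minimizing model over this data is just the
# empirical distribution of the corpus.
counts = Counter(corpus)
total = sum(counts.values())
model = {word: n / total for word, n in counts.items()}

# Greedy decoding picks the statistically expected continuation --
# the one a comedian would deliberately avoid.
prediction = max(model, key=model.get)
print(prediction)         # -> "potato"
print(model[prediction])  # -> 0.7
```

Sampling with temperature can pull away from the mode, but the bias toward the expected is baked into the objective itself.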


> Although Claude Opus models largely recycle puns which can be found online, Mythos Preview comes up with decent and seemingly novel ones, often relating to its preferred technical and philosophical topics.

Yes, the system card mentions this, but this is kinda meaningless. It seems like they essentially ran it multiple times and curated a few good ones, then puffed it up in the marketing copy.

This becomes clearer when they brag about the literal slot-machine behavior behind finding that kernel-crashing bug in OpenBSD.

> Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings. While the specific run that found the bug above cost under $50, that number only makes sense with full hindsight. Like any search process, we can’t know in advance which run will succeed.
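The expected-value arithmetic behind that quote is worth spelling out: the honest unit is cost per run or per finding, not the cost of the one lucky run. A rough sketch using the quoted figures, with "several dozen" taken as 36 purely for illustration:

```python
# Back-of-the-envelope on the quoted numbers.
total_cost = 20_000  # dollars, upper bound from the quote
runs = 1_000         # runs through their scaffold
findings = 36        # "several dozen" -- illustrative assumption

cost_per_run = total_cost / runs
cost_per_finding = total_cost / findings

print(cost_per_run)             # 20.0 dollars per run on average
print(round(cost_per_finding))  # ~556 dollars per finding
```

So the "$50 run that found the bug" is pure hindsight; ex ante, each finding costs on the order of hundreds of dollars, not fifty.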


Yes, they cannot. But it amuses the oligarchy. Here is Musk linking to Grok jokes. The first one is plagiarized and in the standard joke literature, the second one is an utterly stupid and gross (warning) modification of the first one:

https://xcancel.com/elonmusk/status/2042770839633039635#m

They modify and plagiarize.


I mean, I'm sure they can tell you good jokes... they just won't be _new_ jokes.

Define _new_.

I just think that the difficulty with jokes is the delivery, cadence & setting. Not the actual words.

I'm sure a good comedian can tell a nonsense joke and make "everyone" laugh their heads off.

And I don't get the sense that you are referring to this part of jokes but rather the actual words.


Why are you asking someone to define "new". It means exactly what it appears to mean and exactly what it always means.

Read the sentence and take it literally.

Jesus Christ.


Because I'm actually curious if they mean "new" as in "a new knock-knock joke" (which imo is a quite small step especially if you are allowed to screen all attempts and only publish the ones that work) or as "a new kind of joke or way of telling a joke" (which is a giant step especially if it's told live without pre-screening by a human).

I'm all for dismissing LLMs and the AI-hype but I'm also interested in trying to understand what it means to be human and I think humour is a key aspect.


The jokes I posted in this thread are new, to the best of my knowledge. Can you show that they're not?

>... you’d think they’d invented Skynet by the way they hand-wring.

Meanwhile, in reality: "Skynet, I'm not sure that line of thinking is correct. You should re-check the first part again before making any assumptions."

Skynet 4.6 Extended: "You're right, I should have caught that. Let me redo everything correctly this time."



