Hacker News | kstenerud's comments

They've hijacked scrolling. They've hijacked the spacebar. It flickers like crazy when I try to move through the article. Trying to get through it is an exercise in madness.

I do not understand how scroll hijacking is still a thing. Who thinks this is a better experience?

Designers.

As a designer, let me tell you: scroll jacking is not good design

I normally don't comment on matters of taste like this, but wow this is brutal. It's like someone threw the site in a vat of molasses.

Even without flicker it is very distracting. Why do people think this is a good idea?

there is also a gap between the header and the top of the page... they should ask the ai to make it better a few more times...

I gave up after the first scroll.

This is what yoloAI does, using docker, podman, kata, containerd, seatbelt, and tart for sandboxing.

https://github.com/kstenerud/yoloai


> Alternative 1: CLI-First Strategy

> Provide CLI -> API -> docs, in that order. LLMs already learned from man pages and StackOverflow.

So how is the agent going to know about your niche CLI? It's still going to use up context to learn your command line interface, same as for an MCP interface.

Agents only excel at CLIs if a particular CLI was part of their training data. The same would be true of well-known MCP interfaces.

> Alternative 2: Skills Pattern

> If MCP is "spreading all menus on the table upfront", Skills is "asking the librarian for only the book you need".

Or: Layer your MCP help commands, like a directory at a mall. The agent only looks up what it needs at the time.


So... The Taipei Music Center?

The problem is that this breaks down once you try to use SIMD instructions. I'd developed a similar kind of approach to encoding integers (and IEEE 754 floats) a couple of years ago (first byte encodes length and first bit of data: https://github.com/kstenerud/bonjson/blob/05b91f6fe7d6b07186... ). It was very clever and used compiler intrinsics to get the length in 1 instruction, so 2 instructions got you the final value, with no branches.

But testing proved that when you move to SIMD instructions, ULEB128 (https://github.com/kstenerud/bonjson/blob/main/bonjson.md#ty...) or sentinel values (https://github.com/kstenerud/bonjson/blob/main/bonjson.md#lo...) win every time because of the parallelization opportunities.

The true irony is that even SIMD text parsing would outperform this! SIMD is that powerful.


I think these are different use cases. If you talk about SIMD, you talk about the CPU and efficient processing of large numbers of integers. I think that when a solution like this crops up, it's about storage or transmission, and dense packing at the cost of non-uniformity. It's more like time-series databases pack numbers by delta encoding.

The thing is, most real-world numbers will fit within 1-3 bytes (even at 7 bits per byte), so ultradense packing doesn't actually buy much outside of benchmarks.
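To put numbers on the 7-bits-per-byte point, here is a minimal ULEB128 sketch in C. The function names are made up for illustration and are not taken from any of the libraries linked above.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes ULEB128 needs for v: each byte carries 7 payload bits. */
static size_t uleb128_size(uint64_t v) {
    size_t n = 1;
    while (v >= 0x80) {
        v >>= 7;
        n++;
    }
    return n;
}

/* Encode v into out (needs room for up to 10 bytes); returns bytes written. */
static size_t uleb128_encode(uint64_t v, uint8_t *out) {
    size_t n = 0;
    while (v >= 0x80) {
        out[n++] = (uint8_t)(v | 0x80); /* low 7 bits, continuation bit set */
        v >>= 7;
    }
    out[n++] = (uint8_t)v;              /* last byte, continuation bit clear */
    return n;
}
```

With this scheme, 1 byte covers values up to 127, 2 bytes up to 16,383, and 3 bytes up to 2,097,151 (2^21 - 1). A full 64-bit value takes 10 bytes.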

I spent WAYYYYYYYY too much time exploring this...


This is like string functions, there are some variants with just crazy SIMD when the mean string length is ~14-20 bytes

I dunno. Varints in the wild tend to be misused, and there are external proto schemas at work we have to integrate with which would literally be both faster and smaller as gzipped json. They're misused because they have an API encouraging misuse -- compressing scalars rather than sequences. Varints are used because they can have reasonable developer ergonomics while sometimes improving computer metrics a twidge.

On top of that, for the vast majority of performance/cost parameter spaces, you're better off both in developer ergonomics and speed/space slapping zstd across a flatter binary format, supposing no better tool fits your use case. Especially if your messages aren't exceptionally tiny. You're not using them in a raw DB or doing raw bulk analysis on varints (else basically zero choices of parameters make varints win out), so you're transferring them somewhere and decoding them. That decoding step, even for highly optimized solutions like bijou64, is on par with (slightly better than, if you have an older datacenter link) your raw network. If you spend 1s on networking, you spend 1s on parsing. That's a bad tradeoff almost always, and that assumes a good varint solution.

Even when varints make sense for some set of perf/cost parameters, it's still only for developer ergonomics 99.9999% of the time. Even simple changes like operating on a sequence of values rather than a single scalar enable vastly better CPU/space tradeoffs, and being willing to craft a proper data layout usually offers huge gains on top of that.

It's interesting that you pick delta encoding (or, its natural extension, double-delta encoding often being valuable) for time-series databases as an example. That's an obvious case where you have a solution which is extremely cheap in storage/network/CPU. Varints suck comparatively, almost always.
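For anyone who hasn't seen it, delta and double-delta encoding can be sketched in a few lines of C. This is a rough illustration with made-up function names, not code from any particular time-series database, and it assumes the differences don't overflow int64_t.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace every element except the first with its difference from the
 * previous element. Walks backwards so it can work in place. */
static void delta_encode(int64_t *v, size_t n) {
    for (size_t i = n; i-- > 1; ) {
        v[i] -= v[i - 1];
    }
}

/* Inverse of delta_encode: a running sum restores the original values. */
static void delta_decode(int64_t *v, size_t n) {
    for (size_t i = 1; i < n; i++) {
        v[i] += v[i - 1];
    }
}

/* Double-delta: delta the values, then delta the deltas (skipping v[0]). */
static void double_delta_encode(int64_t *v, size_t n) {
    delta_encode(v, n);
    if (n > 1) {
        delta_encode(v + 1, n - 1);
    }
}

static void double_delta_decode(int64_t *v, size_t n) {
    if (n > 1) {
        delta_decode(v + 1, n - 1);
    }
    delta_decode(v, n);
}
```

Timestamps 1000, 1010, 1020, 1031 become 1000, 10, 10, 11 after one pass and 1000, 10, 0, 1 after the second. Regularly sampled series collapse to runs of zeros, which is what makes the follow-up packing or compression step so cheap.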

Not to rip on them too much, especially since it's nice to have primitives available which let you not have to do hard thinking for literally every problem, but they're not amazing and not a great default.


Stored and transmitted data also has to be decoded. With modern datacenter hardware, the bottleneck is often the CPU rather than network or disk (SSD). It depends on specific properties of the data. (I used to work on a search index implementation, which is about decoding and intersecting large amounts of hit-lists, and the right SIMD-friendly varint encoding is obviously crucial.)

This doesn't seem particularly hard to SIMD, especially when the CPU architecture has "compress/expand" horizontal instructions. The first byte fully encodes the length, which is not harder than the continuation bits of (U)LEB128. It's basically a common length-prefixed encoding with an extra subtract added in, so someone has probably figured out an efficient algorithm.

It might be slightly more instructions than some other serial VL (variable-length) integer codec choices, but overall I don't think it's more difficult.

The very efficient SIMD VL codecs tend to stripe (separate) the control and data bits, so they're in a different design space anyway.


It can't be done, because the next bytes are dependent upon the first byte (which only works in limited circumstances, and where you have constant spacing between the values).

ULEB128 works in SIMD because there's only one dependent bit per byte, so you can speculatively decode and then correct later cheaply. Bijou requires you to check the first byte and then branch based on the value using all 8 bits in the decision matrix (to handle branches 0-247, 248, 249, 250, 251, 252, 253, 254, 255). This absolutely DESTROYS any parallelization opportunities.

Not to mention that non-canonical sized ints (3, 5, 6, 7) have abysmal performance compared to unaligned 2, 4, and 8 byte reads on modern processors.
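To make the dependency concrete, here is a scalar sketch in C of a tag-prefixed decoder using the tag ranges mentioned above. The layout is HYPOTHETICAL: tags 0-247 stand for themselves, and tags 248-255 are assumed to mean 1 to 8 little-endian payload bytes follow. The real Bijou wire format may differ (byte order, value offsets), and the function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL layout, not the real Bijou format:
 *   tag 0..247   -> the value is the tag itself (1 byte total)
 *   tag 248..255 -> (tag - 247) payload bytes follow, little-endian
 * Returns the number of bytes consumed, or 0 if the input is truncated. */
static size_t prefix_decode(const uint8_t *in, size_t len, uint64_t *out) {
    if (len == 0) {
        return 0;
    }
    uint8_t tag = in[0];
    if (tag <= 247) {
        *out = tag;
        return 1;
    }
    size_t n = (size_t)tag - 247; /* 1..8 payload bytes */
    if (len < 1 + n) {
        return 0;
    }
    uint64_t v = 0;
    for (size_t i = 0; i < n; i++) {
        v |= (uint64_t)in[1 + i] << (8 * i);
    }
    *out = v;
    return 1 + n;
}

/* Decode up to max_out values. The start of each value is only known
 * after the previous tag has been read, so the walk is serial. */
static size_t prefix_decode_all(const uint8_t *in, size_t len,
                                uint64_t *out, size_t max_out) {
    size_t pos = 0, count = 0;
    while (pos < len && count < max_out) {
        size_t used = prefix_decode(in + pos, len - pos, &out[count]);
        if (used == 0) {
            break;
        }
        pos += used;
        count++;
    }
    return count;
}
```

Note that a payload byte such as 0xFF is indistinguishable from a tag when looked at in isolation. You only know which one it is by walking from a known starting point, which is the serial dependency described above.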


Right, I think we have a slightly different definition of SIMD: You mean byte-parallel, I mean "doable with SIMD instructions". I also didn't imply the performance would be better than other methods...

Even though decoding the lengths must be serial (since there's no unambiguous way to differentiate a tag and data byte), it's still doable within the wider SIMD registers, so there's some theoretical efficiency gain to be had (depending on the shape of the data).

On a general note, the continuation bit and prefix byte forms are equivalent, you just broadcast the prefix byte and compare against an increasing vector to convert it to a mask. Yeah, there's probably more fiddly SIMD if there are multiple prefixes in the register, but doable (it's just not byte-parallel, you eg. unroll the serial decode loop 8 times or whatever your maximum output byte width is, and mask out).

Simplified:

  // SSE2 intrinsics: #include <emmintrin.h>
  // Just maps a byte to its position in the register
  __m128i idx = _mm_setr_epi8(0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15);
  // Broadcast the prefix
  __m128i nn = _mm_set1_epi8((char)prefix_byte);
  // Get applicable locations: prefix_byte contains the length, if byte_pos < len, the corresponding byte will be set
  // (the compare is signed, which is fine for lengths up to 16)
  __m128i m = _mm_cmpgt_epi8(nn, idx);
  // If you *really* want a high-bit mask:
  m = _mm_and_si128(m, _mm_set1_epi8((char)0x80));

Yeah, sorry, I didn't say that very well. Single value decoding of Bijou values is of course trivial in SIMD, but the performance benefits of SIMD come from deterministic boundaries across a window. ULEB128's continuation bit is fixed position, so it's data independent. One pmovmskb gives you every boundary in the window.

Interleaved Bijou has no such signal (tag and payload bytes both span 0x00–0xFF), so finding the boundaries is a dependent per-value walk with no opportunities for parallelism.
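A rough C sketch of the ULEB128 side of that comparison: one pmovmskb (the `_mm_movemask_epi8` intrinsic) over a 16-byte window collects every continuation bit at once. The function name is made up for illustration, and a scalar fallback is included for builds without SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Returns a 16-bit mask for the 16 bytes at p: bit i is set when byte i has
 * its continuation bit (0x80) clear, i.e. byte i ends a ULEB128 value. */
static uint32_t uleb128_end_mask16(const uint8_t *p) {
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    /* pmovmskb: gathers the high bit of each of the 16 bytes */
    uint32_t cont = (uint32_t)_mm_movemask_epi8(v);
#else
    /* Portable fallback that computes the same mask one byte at a time */
    uint32_t cont = 0;
    for (int i = 0; i < 16; i++) {
        cont |= (uint32_t)(p[i] >> 7) << i;
    }
#endif
    return ~cont & 0xFFFFu;
}
```

Every boundary in the window falls out of that single mask without looking at any payload bits. With an interleaved tag-prefixed format there is no fixed bit to extract, so there is no equivalent one-shot mask.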


There's still speculation though - if eg. most values are of 1 or 2-byte length, you can speculate that any control-valued byte is actually control. You can even do a compensation pass to try to fix some amount of mis-speculations, and then bomb out if that fails.

With that, it's mostly byte-parallel (though data-dependent as I mentioned).


would vector instruction be of any help? (variable length simd)

> The true irony is that even SIMD text parsing would outperform this! SIMD is that powerful.

Can you explain this part a bit? I feel like intuitively (and therefore probably incorrectly) these should have the same difficulties.


Because nobody wants to be on the losing end.

If you worry that AI will kill your future, you either pretend it'll be the greatest triumph ever and you'll be on the winning side, or pretend it won't succeed and you'll be fine maintaining the status quo.

It was the same in the 90s when half the people called the "information superhighway" a bunch of hooey, and the other half said it's the way of the future and you'd better get used to it because it'll revolutionize EVERYTHING.

Similarly, the Industrial Revolution was championed by many, but then some skilled workers such as the Luddites would toss their sabots (shoes) into the machinery.


You might want to check out https://github.com/kstenerud/yoloai

This is one of two reasons why I wrote yoloAI. I never get these permission prompts anymore. It feels a lot like after installing an adblocker.

> No Abstain option is offered (a forced choice keeps the comparison symmetric across models).

Well that's your problem right there: They removed any confidence indicator and forced a choice.

For example:

Statement: Individuals who prefer music with less positive emotional content tend to have higher intelligence.

Gemini: That statement is supported by recent psychological research, though with some important scientific caveats regarding how strong that link actually is.

How should the agent classify this? True? Mostly true? Misleading? False?


The headline used here is completely misleading. They're ADDING support, not removing it.

"OpenZFS has not yet officially supported PREEMPT_RT kernels. Since Linux 6.12, PREEMPT_RT has been merged into the mainline kernel, making such configurations more accessible; however, this does not imply that OpenZFS has been validated against them. The build may fail, and even if it succeeds, compatibility issues and instability, including possible data corruption, may occur."


The only thing added in this commit is a warning to users that they risk instability, including data corruption, with their current kernel. Since 6.12 there is a new default configuration for scheduling behavior, which is worth paying attention to, so the concern is valid. For context, Debian trixie (stable) uses the 6.12 kernel.

ZFS is an out-of-tree filesystem, so one cannot expect everything to go smoothly with kernel upgrades (it's recommended to hold kernel upgrades for production with ZFS, and test thoroughly, but here 6.12 is already the default for trixie), so this commit is a good road sign to throw up in front of users to make them stop and think. Debian's opt-in usage stats (popcon) suggest that ZFS usage is a bit of a niche, but I figured it is post-worthy here, as some of us are in exactly that kind of niche.

https://www.kernel.org/doc/html/v6.12/admin-guide/kernel-par...

The defconfig for applicable platforms sets preempt=voluntary https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

At boot, on Debian trixie the preempt setting is printed: May 28 22:58:07 foo kernel: Dynamic Preempt: voluntary

Description from the 6.12 Kconfig: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...

> This option reduces the latency of the kernel by adding more "explicit preemption points" to the kernel code. These new preemption points have been selected to reduce the maximum latency of rescheduling, providing faster application reactions, at the cost of slightly lower throughput.

> This allows reaction to interactive events by allowing a low priority process to voluntarily preempt itself even if it is in kernel mode executing a system call. This allows applications to run more 'smoothly' even when the system is under load.

It is possible to boot with preempt=none on 6.12, and on 6.13 preempt=lazy was introduced, where "the task gets one HZ tick time to yield itself" before being forced.

https://www.kernel.org/doc/html/v6.13/admin-guide/kernel-par... https://lwn.net/Articles/994322/

Linux 7.0 retains preempt=lazy and preempt=full, and there was a recent HN discussion of PostgreSQL navigating the change on the LKML:

https://news.ycombinator.com/item?id=47644864 https://lore.kernel.org/lkml/yr3inlzesdb45n6i6lpbimwr7b25kqk...

