
ArcHound · 2026-06-07T19:59:12 1780862352

This is the sad conclusion of the next part. JA4 is a great supplement that can squeeze out some additional info, but a motivated attacker can evade it.

How motivated the noisy AI scrapers are is still an open question. Even a solution that cuts out 50 percent of the dumbest scraping attempts will provide much-needed relief to a struggling site.

mmarian · 2026-06-07T20:12:03 1780863123

I'm curious, which site struggles are you envisaging? In my experience, JA4 is used as a hammer for which the nail must be found; simpler solutions oftentimes work better.

ArcHound · 2026-06-07T20:24:03 1780863843

I think we agree that JA4 is situational. It really saved me when investigating a credential stuffing attack: random logins with a random chance of success, spread across many ASNs, all with the same fingerprint.
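As a rough illustration of that kind of investigation, here is a minimal sketch that groups login events by JA4 fingerprint and reports fingerprints seen from many different ASNs. The event fields (`ja4`, `asn`), the sample values, and the `min_asns` threshold are all assumptions for the example, not taken from the actual incident.

```python
from collections import defaultdict


def fingerprints_spanning_asns(events, min_asns=3):
    """Return {ja4: set_of_asns} for fingerprints seen from at least min_asns ASNs."""
    asns_by_fp = defaultdict(set)
    for ev in events:
        asns_by_fp[ev["ja4"]].add(ev["asn"])
    return {fp: asns for fp, asns in asns_by_fp.items() if len(asns) >= min_asns}


# Hypothetical login events; fingerprints and ASNs are made-up placeholders.
events = [
    {"ja4": "t13d1516h2_aaaaaaaaaaaa_bbbbbbbbbbbb", "asn": 64500},
    {"ja4": "t13d1516h2_aaaaaaaaaaaa_bbbbbbbbbbbb", "asn": 64501},
    {"ja4": "t13d1516h2_aaaaaaaaaaaa_bbbbbbbbbbbb", "asn": 64502},
    {"ja4": "t13d1715h2_cccccccccccc_dddddddddddd", "asn": 64500},
]

suspicious = fingerprints_spanning_asns(events)
for fp, asns in suspicious.items():
    print(fp, sorted(asns))
```

A fingerprint that shows up across many unrelated networks is only a hint, since popular browsers share fingerprints too; it becomes useful when combined with the failed-login pattern.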

From my experience, there are all kinds of levels of bots. Add them all together and they can produce a ridiculous load on a site (especially a fragile one that you have to secure anyway). So I look at the volume, trying to block anything stupid I can get away with.

It is a game of whack-a-mole. It can also cut the overall traffic down to a fraction of the original, which has tangible infra cost benefits.

And yes, captcha works better in a lot of cases. Fortunately I'm not selling JA4, I'm just curious.

And yes, IP rate limits and ASN checks work really well in plenty of cases. Side note: I've got a high-throughput, free, offline ASN checker too! https://blog.miloslavhomer.cz/asn-check/

mmarian · 2026-06-07T20:41:55 1780864915

I agree JA4 is situational; but the # of use cases is smaller than most people think. Like you said, Captcha works better; would've stopped the credential stuffing. Managed DDoS services (Cloudflare et al) + rate limits are better at DDoS.

Cool ASN project, but doesn't IPInfo already offer this for free: https://ipinfo.io/lite ?

ArcHound · 2026-06-08T06:27:39 1780900059

Back in the day I couldn't find a downloadable DB for offline checks, which is very much needed when looking at approx 10k different IPs. Even with an offline DB I might need to create this tree structure so that I can process the data fast.
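For readers wondering what such a tree can look like, here is a minimal sketch of a binary prefix trie doing longest-prefix match from IP to ASN. It is an illustration under assumptions (IPv4 only, dict-based nodes, made-up prefixes and ASNs), not the structure used in the linked project.

```python
import ipaddress


class PrefixTrie:
    """Binary trie over IPv4 prefixes for longest-prefix-match lookups."""

    def __init__(self):
        self.root = {}

    def insert(self, cidr, asn):
        net = ipaddress.ip_network(cidr)
        bits = int(net.network_address)
        node = self.root
        # Walk one bit per level, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["asn"] = asn

    def lookup(self, ip):
        bits = int(ipaddress.ip_address(ip))
        node = self.root
        best = node.get("asn")
        for i in range(32):
            bit = (bits >> (31 - i)) & 1
            node = node.get(bit)
            if node is None:
                break
            # Remember the most specific prefix seen so far.
            best = node.get("asn", best)
        return best


# Hypothetical prefixes (documentation ranges) and ASNs.
trie = PrefixTrie()
trie.insert("192.0.2.0/24", 64500)
trie.insert("192.0.2.128/25", 64501)

print(trie.lookup("192.0.2.5"))
print(trie.lookup("192.0.2.200"))
print(trie.lookup("198.51.100.1"))
```

Each lookup touches at most 32 nodes regardless of how many prefixes are loaded, which is what makes checking thousands of IPs against a full routing table cheap.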

mmarian · 2026-06-08T07:08:43 1780902523

Fair enough. Great work with the projects!

ArcHound · 2026-06-07T18:45:36 1780857936

Hello again! Yes it is. If you have an exotic client, I'm here for it :D

Bender · 2026-06-07T18:54:44 1780858484

Nice. I was more curious about the clients using HTTP/2: what percentage of those that spoof all the other headers a browser sends is JA4 detecting as bots? That is the missing piece in my blog write-up, as I don't do SSL fingerprinting. I am trying to see what percentage are getting through my very crude methods.

ArcHound · 2026-06-10T05:22:37 1781068957

ok, so I've parsed some logs. I do see the ALPNs pointing to http2, but I don't capture all of the headers. The only thing I capture is the user-agent, which is the major spoof anyway.
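For context, the ALPN hint is part of the first section of a JA4 fingerprint, so it can be read straight from the fingerprint string. Below is a minimal sketch of splitting that first section into its fields; the function name is made up and the sample fingerprint is only an illustrative string in the JA4 format.

```python
def parse_ja4_a(ja4):
    """Split the first ('a') section of a JA4 fingerprint into its fields."""
    a = ja4.split("_", 1)[0]
    if len(a) != 10:
        raise ValueError(f"unexpected JA4 prefix: {a!r}")
    return {
        "transport": a[0],                # 't' for TCP, 'q' for QUIC
        "tls_version": a[1:3],            # e.g. '13' for TLS 1.3
        "sni": a[3],                      # 'd' if SNI names a domain, 'i' otherwise
        "cipher_count": int(a[4:6]),      # number of cipher suites offered
        "extension_count": int(a[6:8]),   # number of extensions offered
        "alpn": a[8:10],                  # first and last char of the first ALPN value
    }


# Illustrative fingerprint string; 'h2' in the prefix means the client offered HTTP/2.
fields = parse_ja4_a("t13d1516h2_8daaf6152771_e5627efa2ab1")
print(fields["alpn"], fields["tls_version"], fields["cipher_count"])
```

A client that offers only `http/1.1` would show `h1` in that position, and a client with no ALPN extension shows `00`.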

Now, to differentiate between spoofed and non-spoofed header, I need to check the "valid" JA4 signature for a given browser and then proclaim that the rest of them are wrong. The "valid" JA4 signature can be observed, but I've found that sometimes browsers tweak their handshake a bit, so it's not 100% consistent.
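A minimal sketch of that check, assuming a hand-maintained table of observed fingerprints per browser family. The table contents, the substring matching on the user-agent, and the function name are all hypothetical; each family maps to a set because, as noted above, a browser's handshake is not always identical.

```python
def classify(user_agent, ja4, known):
    """Compare a request's JA4 against fingerprints observed for the claimed browser.

    `known` maps a user-agent substring to a set of fingerprints. Entries are
    checked in insertion order, so list 'Chrome' before 'Safari' because Chrome
    user-agents also contain the word 'Safari'.
    """
    for family, fingerprints in known.items():
        if family in user_agent:
            return "consistent" if ja4 in fingerprints else "suspect"
    return "unknown"


# Hypothetical table; these fingerprints are placeholders, not real browser values.
known = {
    "Firefox": {"t13d1715h2_aaaaaaaaaaaa_bbbbbbbbbbbb"},
    "Chrome": {"t13d1516h2_cccccccccccc_dddddddddddd"},
    "Safari": {"t13d2014h2_eeeeeeeeeeee_ffffffffffff"},
}

ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/149.0.7827.102 Safari/537.36"
print(classify(ua, "t13d1516h2_cccccccccccc_dddddddddddd", known))
print(classify(ua, "t13d0912h1_111111111111_222222222222", known))
print(classify("curl/8.0", "t13d0912h1_111111111111_222222222222", known))
```

A "suspect" result only means the fingerprint is not in the table yet, so an incomplete table will produce false positives for legitimate browser versions.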

The JA4 DB was recently taken down. I've requested full access, but got no response (as expected). There might be some issues in getting those valid headers for the browsers, as the hardware and software vary a lot (PC, Mac, Android, iPhone, in all kinds of versions and browsers).

I was hoping for a quick win to share, but it doesn't seem so, and I'll have to do it properly. That should be my next post on JA4.

As a quick note, approx 30% of traffic claims to use http2 and approx 60% of that traffic has a non-bot user-agent (you know, along the lines of "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/149.0.7827.102 Safari/537.36"). I suspect the majority of those are spoofed, as I know how many readers I have on my blog.

ArcHound · 2026-06-07T19:11:17 1780859477

I'll get back to you on this, I'll need to parse some logs. I should have at least ALPNs

ArcHound · 2026-06-01T09:30:01 1780306201

"Who solved the alignment problem for these superhumans?"

The gun, pointed to their head.

ArcHound · 2026-06-01T09:27:51 1780306071

The article had a great opportunity to at least reference the newest encyclical from the Pope.

With a bit of click bait, this might be the first step of "Butlerian Jihad" -> "AI Crusade" and a founding text for the Orange CATHOLIC Bible.

I saw a take that you can now cite religious reasons to refuse working on and with AI - if more people try this, I wonder how it'd play out.

Interesting times.

ArcHound · 2026-05-31T16:57:44 1780246664

In this article I take a look at the technical properties of Encrypted Client Hello, as well as some scenarios that are not really covered by the proposed threat model.

I argue that to get any tangible benefit you have to use the big providers, which places trust in entities that are behaving less trustworthily by the hour.

Bender · 2026-05-31T17:38:35 1780249115

This is a really good write-up. I can't really think of anything important to add.

I don't know if it is worth adding, but there is one small piece that can be kept at home rather than depending on Cloudflare or Google. By itself it is rather moot, but I will mention it anyway.

If you use Unbound DNS [0] at home as a DNS resolver, you can enable DoH, provided Unbound was compiled using --with-libnghttp2; that allows an HTTPS listener and enables ECH, which I tested / verified on [1]. I realize it's just one tiny piece of the puzzle, but it lets us take the logging of DNS queries away from the big providers. If people do not trust their home ISP, they can put Unbound on a VM or physical server somewhere else. I only mention this because I know some people run PiHole and other security distros on their WiFi or firewall hardware at home.

Documentation [2][3]

I am half tempted to put a DoH listener out there for anyone to experiment with and see what kind of abuse it gets.

[0] - https://nlnetlabs.nl/projects/unbound/about/

[1] - https://tls-ech.dev/

[2] - https://unbound.docs.nlnetlabs.nl/en/latest/topics/privacy/d...

[3] - https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound...

ArcHound · 2026-05-31T18:03:48 1780250628

Thank you for the kind words.

DoH is a critical enabler of ECH, and getting it right isn't easy - especially dodging all of the free services provided by the giants.

Bender · 2026-05-31T18:13:45 1780251225

In my unbound.conf it looks like this:

    # https://dohint.mydomain.tld/dns-query
    # lan interface
    interface: [x.x.x.x]@443
    # wifi interface
    interface: [x.x.x.x]@443
    https-port: 443
    http-query-buffer-size: 16m
    http-response-buffer-size: 16m
    http-max-streams: 420
    tls-service-key: "/etc/unbound/keys.d/unbound_server.key"
    tls-service-pem: "/etc/unbound/keys.d/unbound_server.pem"

Then in browsers / devices I set a custom DoH endpoint of https://dohint.mydomain.tld/dns-query and use the same key/cert I used in the past for DNS over TLS (DoT), which is still listening on TCP port 853.
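To show what a client actually sends to such an endpoint, here is a minimal sketch that builds an RFC 8484 GET URL: a wire-format DNS query, base64url-encoded without padding, in the `dns` parameter. It only constructs the URL and does not send it; the helper names are made up, and the endpoint is the placeholder from the comment above.

```python
import base64
import struct


def build_dns_query(name, qtype=1):
    """Build a wire-format DNS query (qtype 1 = A) with ID 0, as RFC 8484 suggests."""
    # Header: ID=0, flags=0x0100 (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", 0, 0x0100, 1, 0, 0, 0)
    # QNAME: length-prefixed labels terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # class 1 = IN
    return header + question


def doh_get_url(endpoint, name, qtype=1):
    """Return the DoH GET URL carrying the query in the 'dns' parameter."""
    payload = base64.urlsafe_b64encode(build_dns_query(name, qtype)).rstrip(b"=")
    return f"{endpoint}?dns={payload.decode('ascii')}"


url = doh_get_url("https://dohint.mydomain.tld/dns-query", "www.example.com")
print(url)
```

The request would also carry an `Accept: application/dns-message` header, and the response body is a wire-format DNS message rather than JSON.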

ArcHound · 2026-06-01T05:31:16 1780291876

Have you tried putting this behind a reverse proxy? This gives us a lot of features like rate-limiting and it should work well since it is https after all.

Bender · 2026-06-01T13:17:36 1780319856

I thought about putting a few instances behind HAProxy for public use. Not sure many people would use it.

Bender · 2026-06-01T23:35:56 1780356956

I put Unbound directly on the web to play with for now, having some quirks with haproxy. It has an hourly cron job that pre-caches the Cloudflare Top 20000 or so .com .net .org .is domains and some domains I use.

https://doh.nochan.net/dns-query

ArcHound · 2026-05-31T16:50:44 1780246244

Simply put, you'll need algebra, linear algebra, number theory. So a lot of math with various degrees of depth.

allthetime · 2026-05-31T19:56:22 1780257382

Do you have any recommendations for effective self-learning paths? I have murky old foundations in all three fields (took first year linear algebra and a variety of logic courses) so am not starting from nothing, but the few times I’ve tried to jump back in I always get a bit bogged down and can’t keep up with it.

notfed · 2026-06-01T07:52:26 1780300346

Gilbert Strang has an excellent free (IIRC) linear algebra course, accompanied by his excellent textbook. Khan Academy has some surprisingly good linear algebra lectures as well, if something isn't clicking.

ArcHound · 2026-05-31T07:00:58 1780210858

Oh this brings me back to my uni days. I suppose that since this is the basis of post-quantum crypto it is a good time to learn this.

Seems to me that these lattices and error-correcting codes are very close to each other, but for some reason they are discussed separately.

I'd wager that there will be some reductions between those problems - maybe I could dig more around that.

slwvx · 2026-06-01T01:30:29 1780277429

> Seems to me that these lattices and error-correcting codes are very close to each other...

Maybe see https://www.amazon.com/Lattice-Coding-Signals-Networks-Quant...

ArcHound · 2026-05-27T18:25:13 1779906313

Makes sense if you think about it: if all photons pass through you (invisible) then you can't capture them to get info (blind).

ArcHound · 2026-05-13T07:52:27 1778658747

Pretty well if you consider the "bio" label, which is a set of practices not using all of the tech. They can ask for and usually get higher prices for the products.

Granted, it's more about chemicals than tractors, but still quite close to the spirit of the comments. Bio approach sacrifices some tech advances.

ArcHound · 2026-04-28T11:04:42 1777374282

But, there are? I can host a repo on GitHub, Codeberg and self host it too. Then I need to watch over main to keep it consistent between those. After that's established, I can do updates from wherever. Link'em in the README.

embedding-shape · 2026-04-28T11:22:08 1777375328

There are distributed forges? Yes, git is distributed, but often everything around it isn't. The case parent is trying to make is that the rest ("federated forges") should also be distributed, not just git.

ArcHound · 2026-04-28T11:35:56 1777376156

Ok, gotcha. So there's a demand for the additional features that are not bundled within git to be federated somehow.

I'd say we have emails, mailing lists and bug trackers. Or maybe: what is the missing killer feature that needs federation?

embedding-shape · 2026-04-28T11:37:35 1777376255

> what is the missing killer feature that needs federation?

Issues, pull requests, collaboration/permissions/access, "starring"/"favoriting", etc.

I think ultimately the goal is that people can run their own forges, yet still collaborate on repositories hosted in other forges, leveraging your existing authentication so you no longer need to sign up individually for each forge.

nibbleyou · 2026-04-28T11:13:41 1777374821

There's also a tool to automatically push it to multiple repos: https://github.com/prashantsengar/GitEcho

Disclaimer: the author is a colleague of mine

Though to be fair, what the parent meant by federated forges is different than this approach.

pabs3 · 2026-04-28T12:24:28 1777379068

git itself can push to multiple URLs btw:

https://stackoverflow.com/questions/849308/how-can-i-pull-pu...