Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Why base 58? I've never even heard of base 58.


from the RFC:

> Base58 is designed with a number of usability characteristics in mind that Base64 does not consider. First, similar looking letters are omitted such as 0 (zero), O (capital o), I (capital i) and l (lower case L). Doing so eliminates the possibility of a human being mistaking similar characters for the wrong character. Second, the non-alphanumeric characters + (plus), = (equals), and / (slash) are omitted to make it possible to use Base58 values in all modern file systems and URL schemes without the need for further system-specific encoding schemes. Third, by using only alphanumeric characters, easy double-click or double tap selection is possible in modern computer interfaces. Fourth, social messaging systems do not line break on alphanumeric strings making it easier to e-mail or message Base58 values when debugging systems.


The linked README links a draft RFC [1], the introduction section of which contains an explanation.

[1] https://datatracker.ietf.org/doc/html/draft-msporny-base58#s...


Wow how did I never hear about this? I only read the first 2 paragraphs under Introduction, but it addresses all the qualms I have with base64 and its many variants.


The RFC doesn't mention an important drawback of base58, which is that the computational cost of encoding or decoding it is quadratic in the length of the input -- unlike base64, which is linear-time.

I assume the performance is acceptable when you're dealing with very small API tokens, but it would be totally unsuitable as a replacement for base64 in the context of, say, email attachments. Even using it for something like a PKI certificate would probably be asking for trouble.


I’m confused by this. Maybe I misunderstand the algorithm given in the draft linked above, but that doesn’t look O(n^2) at all. Intuitively it also doesn’t make sense as quadratic somehow implies that each character depends on all others (so for each of n chars you’d have do do something with all other n chars -> n^2)


It’s basically a naive algorithm of converting a number in base 256 (each byte is a digit) to base 58. The algorithm is super-linear and worst case quadratic due to all the carrying. Maybe average case quadratic too? Not sure.


Oops, I think my previous comment was inaccurate, and it's too late for me to edit it. It's only decoding that's quadratic, not encoding. Sorry for the confusion.

It's not really obvious (IMO) from the wording of the specification, but step 2 of the decoding algorithm is actually a nested loop.


Still not seeing it. As it's an encoding, not a compression, it cannot produce a large amount of output. At most I guess ceil(log2(256)/log2(58)). So encoding will produce a slightly larger output, while decoding will produce a slightly shorter output. Still all linear.


The output size is linear, but since each byte of output requires O(n) work to compute, the total time to decode an encoded string is O(n^2). Or to put it another way, the decoding throughput (in bytes per second) is inversely proportional to the total length of the string.


Odd. Certainly different to how I thought it works (equivalent to base64, just with divmod instead of bit-shifting). Thanks for fixing my intuition. Related: https://cs.stackexchange.com/q/21736


I usually just use base62 (ASCII letters/numbers). Visual inspection/comparison of a long random value seems unlikely, and when you get into 10+ random gibberish characters, you're just as likely to gloss over a transpositional error


Base62 is a great alternative! The libraries/alphabet around base58 are fairly common thanks to cryptocurrencies, but base62 libraries are generally available and easy to produce if needed.


Pretty sure that’s what they use for bitcoin addresses.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: