> the blog post titled Lock-freedom without garbage collection, which demonstrat...

gpderetta · on Jan 30, 2019

The issue is memory reclamation in node based data structures with multiple concurrent lock free writers. With GC it is a non issue. Otherwise you have resort to reference counting (or GC by another name), hazard pointers, RCU or similar.

Diggsey · on Jan 30, 2019

Yep, but even reference counting doesn't really help: the basic building block of many lock free algorithms is the ability to read/write a reference to/from a shared location atomically.

With a reference counted pointer, you cannot "read" it without incrementing the count, or else someone else might drop the count to zero and free the memory. You would need hardware support for atomically reading a pointer, dereferencing it, and then incrementing the value at that location.

GC (whether tracing or epoch based), hazard pointers or avoiding dynamic memory allocations in the first place (so all memory is pre-allocated, outside of the lock free parts) are the only solutions to this I'm aware of.

gpderetta · on Jan 30, 2019

True. IIRC there are ways [1] to implement try lock free updates of reference counted pointers but they are a bit exotic and not cheap.

[1] at the very least you can implement a lock free n-CAS operation given a plain CAS.

vardump · on Jan 30, 2019

> Otherwise you have resort to reference counting (or GC by another name), hazard pointers, RCU or similar.

Yeah. And atomic reference counting is expensive (20-80M contested atomic ops per second until all CPU cores are saturated), hazard pointers... are hard, and RCU can block.

gpderetta · on Jan 30, 2019

I guess you can think of hazard pointers as a degenerate mark and sweep concurrent GC, where the hazard pointers themselves are the roots and the scan goes only one level deep.

CyberDildonics · on Jan 30, 2019

If someone is trying to do concurrency by using 80 million contested atomic ops per second, they are likely doing just about everything wrong. The currency of concurrency is isolation from dependencies and synchronization. 80 million small synchronizations per second is the polar opposite of how to get good performance.

gpderetta · on Jan 30, 2019

But one issue with RC is that pure read operations might require refcount updates so even a read mostly data structure that it is expected to scale well with the number of readers will perform horribly as each reader acquires and release references.

vardump · on Jan 30, 2019

That was my point. The reason why atomic reference counting can be a bad idea.

CyberDildonics · on Jan 30, 2019

Anything can be a bad idea if it is abused and having 80 million individual overlapping reads of individual shared objects is total nonsense. That kind of synchronization on tiny bits of data would just indicate terrible design choices more than anything. Atomic reference counting can be extremely useful, simple, elegant and fast, but there is no single silver bullet to concurrency.

gpderetta · on Jan 31, 2019

Having an arbitrary large number of concurrent read operations and expect no or minimal contention is completely reasonable.

CyberDildonics · on Jan 31, 2019

That depends on the technique, the lifetime of the data and the lifetime of the memory that holds it.

If you want to make sure the data won't be touched and the memory won't be freed while you read it, reference counting can be a great technique.

If the memory allocation (including size) is already known to be stable but the data could change, the data can be read along with a time or version number that will let the reader make sure it didn't change while it was being read.

If the data can be read atomically, this isn't a problem. If the 'data' is just a pointer that is meant to transfer ownership to the reading thread this isn't a problem, etc.

The underlying principal here is that there are many different techniques and design trade-offs when it comes to concurrency and synchronization. Discounting one thing because it isn't a silver bullet is ridiculous, because there are not silver bullets. A system has to be architected as a whole.

CyberDildonics · on Jan 30, 2019

Garbage collection just shifts some of the responsibility, which means that garbage collector dictates part of the algorithm.

When there are 'lock free' data structures that are just flipping pointers with compare and exchange instructions, they are essentially ignoring the storage of memory all together and hoping the garbage collection takes care of it, while optimally the data structure would confront the storage of data and organization of access in an integrated way.

vardump · on Jan 30, 2019

> Garbage collection just shifts some of the responsibility, which means that garbage collector dictates part of the algorithm.

No. In this case it allows cheap, (mostly) unsynchronized deallocation. Otherwise you'll have to synchronize object deletion. Not only can that be tricky to get right (use-after-free hazard), but the required synchronization consumes a lot of CPU cycles. Ask any C++ programmer.

With (limited) GC, you only need to trace reachability to free the objects, avoiding almost all of the expensive synchronization operations.

You don't even need to use GC in general case, just for the data structures.

AboutTheWhisles · on Jan 30, 2019

> No. In this case it allows cheap, (mostly) unsynchronized deallocation.

There isn't anything about garbage collection that enables something that couldn't be done before. If a lock free data structure is holding other non-trivial data structures that have no notion of concurrency, you are already in a position where you have to take what you can get.

> synchronization consumes a lot of CPU cycles. Ask any C++ programmer.

I've done a large amount of lock free programming in C++ and the best scenario is lock free data structure that takes responsibility for all the data stored in it and isn't just a container of pointers is the best approach to control the amount and types of synchronization. Atomic reference counting with the last reading thread responsible for deletion is just fine as an approach.

Ultimately, having granular but heap allocated data structures in a 'lock free' container needs to come with proper expectations, since it is a poor way to scale concurrency. Granular synchronization can be fine if typical usage isn't going to see much overlap or if the whole thing won't see much use in the first place, but for concurrency to scale, larger chunks of data separated out by their dependencies needs to be used.

vardump · on Jan 30, 2019

> There isn't anything about garbage collection that enables something that couldn't be done before. ...

True, there's always another way. But you can say same about almost any subject. In my experience about lock-free (and wait-free) programming, safe and high performance deallocation of the objects is often a problem. (Especially in kernel mode when context switching is not an option.)

Whenever high performance is required and it's possible to avoid atomics (CAS, FAA, LL/SC, etc.), (false) cache line sharing or even memory operations altogether, I'll take it. Not to mention CAS (or equivalent) loops and mutexes...

AboutTheWhisles · on Jan 30, 2019

> True, there's always another way. But you can say same about almost any subject.

This thread was started because the article focuses on lock free programming with garbage collection when garbage collection is not any sort of fundamental factor in lock free programming.

> In my experience about lock-free (and wait-free) programming, safe and high performance deallocation of the objects is often a problem.

Concurrency bottlenecks are fundamentally about synchronization. The solution is frequently to synchronize less and that means doing more in between synchronizing. This implies allocating and deallocating more at one time.