Asami: A flexible graph store in Clojure (github.com/quoll)
126 points by tosh on Nov 18, 2022 | 21 comments


I just spent ten minutes looking through the repo. I like how the query language is similar enough to Datomic’s to have an easy learning curve. I wish I could push a magic button and have this in Common Lisp (where I usually just use SQLite as an easy-to-use data store).


An idea that pops up more and more often in my head is to create a SQLite wrapper library that basically lets you treat SQLite as a Datalog store.

I know way too little about Datalog, but it goes something like this:

1. create helper functions to create/alter/drop tables.

2. create helper functions for insert/select/update/delete queries, etc. Probably with the same syntax as in Datomic / Asami / etc.

3. each table's name would be a Datalog "attribute", and each table would have 2 columns: "id" and "value". "id" would be indexed; "value" would be a value or a foreign key to another table.

I guess writes would be slow if everything you insert into the DB is indexed, not sure what the other downsides would be.
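A minimal sketch of the scheme above, using Python's sqlite3: one table per Datalog attribute, each with an indexed entity "id" column and a "value" column. The helper names (`ensure_attr`, `assert_fact`, `values_of`) are made up for illustration.

```python
import sqlite3

def ensure_attr(conn, attr):
    # one table per attribute; "id" is the entity, "value" is the datum
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{attr}" (id INTEGER, value)')
    conn.execute(f'CREATE INDEX IF NOT EXISTS "ix_{attr}" ON "{attr}" (id)')

def assert_fact(conn, entity, attr, value):
    ensure_attr(conn, attr)
    conn.execute(f'INSERT INTO "{attr}" (id, value) VALUES (?, ?)',
                 (entity, value))

def values_of(conn, entity, attr):
    rows = conn.execute(f'SELECT value FROM "{attr}" WHERE id = ?', (entity,))
    return [v for (v,) in rows]

conn = sqlite3.connect(':memory:')
assert_fact(conn, 1, 'person/name', 'Alice')
assert_fact(conn, 1, 'person/age', 42)
print(values_of(conn, 1, 'person/name'))  # ['Alice']
```

A real library would of course also need an index on "value" for attribute-to-entity lookups, which is part of why writes get expensive.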


It's not like Datomic claims to be lightning fast; it's not. The real problem is if you want to support the historical "as_of()" queries, which are very difficult to do efficiently with only B-tree indexes. So you're either left maintaining your own indices as tables or externalizing them to something else. I think the latter would actually be a pretty OK option if your dataset fits entirely in memory (though... for historical queries it will grow much faster than a normal database).

Regardless, I have a partial attempted implementation of this on top of SQLite using Python: https://git.sr.ht/~chiefnoah/quark

No query engine, I didn't get that far.


Not sure what you mean here? as_of() is easy to do efficiently, so long as you have tree indexes (it doesn't really matter what type of tree, though fractal trees would not be good). But you need direct access to the tree, rather than having an index that's built using a tree, since then you can't build the phases.


Right, if you have direct access to the tree it's possible and not really inefficient at all (a modified B+Tree would be my first pick, but I digress). Part of my problem was I was trying to offload as much of the work to SQLite as possible. I think if I were to go back and try again, I would make it a bit further :)


I'm talking about Datalog, I mentioned Datomic purely in the context of syntax.

And to sum up what I know about Datalog, it's basically just this article (not reference material, but it got me interested in the concept), plus some light wikipedia-ing: https://www.instantdb.com/essays/datalogjs

Edit: I think you added the part with your git repo after I wrote this. Looks cool, could you add a license file to it?


Got it. Yeah, datalog is cool. I really like it (if that wasn't obvious from my attempt at implementing it). While not built on SQLite, this project has caught my attention recently: https://github.com/cozodb/cozo

It's built on RocksDB and has slightly different query syntax (supposedly to be more similar to Python's, as a primary target use case is within Jupyter notebooks). Check it out!


More precisely, Cozo currently has a RocksDB backend. The development version also has a SQLite backend (for easier compilation on mobile platforms), an in-memory backend, etc. It can now also run as a WebAssembly module (see https://cozodb.github.io/wasm-demo/)

With respect to the SQLite backend, Cozo uses SQLite purely as a KV store with range scan capabilities, with both keys and values stored as bytes. This actually results in faster queries than offloading some stuff to the SQLite engine for the (very few) cases we tested.
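A toy illustration of that pattern (not Cozo's actual code): SQLite used as a plain ordered KV store, with keys and values as opaque bytes and range scans riding on the primary-key B-tree (BLOBs compare in memcmp order).

```python
import sqlite3

# SQLite as an ordered KV store: the WITHOUT ROWID table is itself
# a B-tree keyed by "k", so ordered range scans come for free.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE kv (k BLOB PRIMARY KEY, v BLOB) WITHOUT ROWID')

def put(k, v):
    conn.execute('INSERT OR REPLACE INTO kv VALUES (?, ?)', (k, v))

def scan(lo, hi):
    # half-open range scan [lo, hi)
    return list(conn.execute(
        'SELECT k, v FROM kv WHERE k >= ? AND k < ? ORDER BY k', (lo, hi)))

put(b'a\x01', b'one')
put(b'a\x02', b'two')
put(b'b\x01', b'three')
print(scan(b'a', b'b'))  # [(b'a\x01', b'one'), (b'a\x02', b'two')]
```

The query engine then encodes tuples into byte keys itself and never asks SQLite to do anything smarter than a seek-and-scan.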

Disclaimer: I'm the author of Cozo.


Oh wow, I didn't catch that. Awesome work man, I'm a fan of Cozo and hope to make contributions in the future!


What you described can be implemented efficiently with B-trees. The problem is, the required operation is low-level and not usually exposed by the database engines. Basically you need to be able to walk the tree, up and down.

Say your table is `[data, timestamp]`, with data sorted ascendingly and timestamp sorted descendingly in the tree, and you want to scan for values for `as_of(T)`. Assume you have found `[data1, T1]` as a valid row. To get the next row, instead of the usual scanning, you seek to the value greater than `[data1, neg_inf]`. Now assume the value you get for the seek is `[data2, T2]`. If `T2 <= T`, then you get a match and can continue with the loop by seeking to `[data2, neg_inf]`, otherwise seek to `[data2, T]` and see what you get.

It is doable in SQLite, but you must use its VM directly, as you cannot do tree-walking with SQL.

Assume you have `M` keys and that for every key you have `N` timestamped data points. The above algorithm cuts down the as-of query time from linear complexity in `N` (`MN log(M)`) to logarithmic complexity in `N` (`M log(MN)`), which I think is the best we can do.
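A sketch of the seek-based scan described above, using Python's bisect over a sorted list of `(data, -timestamp)` pairs as a stand-in for direct tree seeks (negating the timestamp reproduces the descending timestamp order within each data key):

```python
import bisect

def as_of(index, T):
    """Skip-scan: for each data key, find the newest timestamp <= T.
    `index` is a sorted list of (data, -timestamp) pairs, standing in
    for a tree that supports seeking."""
    result = []
    i, n = 0, len(index)
    while i < n:
        data, neg_ts = index[i]
        if -neg_ts <= T:
            # match; seek past the remaining (older) entries for this key
            result.append((data, -neg_ts))
            i = bisect.bisect_right(index, (data, float('inf')), i)
        else:
            # too new; seek to the first entry for this key with ts <= T
            # (lands on the next key if no such entry exists)
            i = bisect.bisect_left(index, (data, -T), i)
    return result

# 'a' written at ts 1, 3, 5; 'b' written at ts 2, 6
index = sorted([('a', -5), ('a', -3), ('a', -1), ('b', -6), ('b', -2)])
print(as_of(index, 4))  # [('a', 3), ('b', 2)]
```

Each data key costs at most two seeks, which is where the `M log(MN)` bound comes from.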


If I were to implement it, I would have a custom B+tree with a double layer. In the "upper" half of the tree, the blocks are sorted by the "key". Once you get to the "leaf" nodes of the upper layer, you have a "lower layer" that contains blocks, all with the same key (which can then be omitted), sorted by the timestamp/tx instant. You get O(log(n) + log(m)) for historical queries, but O(log(n)) for "current" queries.
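A small sketch of that double-layer idea, with a dict standing in for the upper layer and a per-key timestamp-sorted list standing in for the lower blocks (the class and method names are made up):

```python
import bisect

class TwoLayerIndex:
    """Upper layer: map from data key to its lower layer.
    Lower layer: versions of that key, sorted by timestamp
    (two parallel lists, for binary search on timestamps)."""

    def __init__(self):
        self.upper = {}

    def put(self, key, ts, value):
        stamps, values = self.upper.setdefault(key, ([], []))
        i = bisect.bisect_right(stamps, ts)
        stamps.insert(i, ts)
        values.insert(i, value)

    def current(self, key):
        # "current" query: one upper-layer lookup, then the newest version
        stamps, values = self.upper.get(key, ([], []))
        return values[-1] if values else None

    def as_of(self, key, ts):
        # historical query: upper-layer lookup, then a binary search
        # for the newest version with timestamp <= ts
        stamps, values = self.upper.get(key, ([], []))
        i = bisect.bisect_right(stamps, ts)
        return values[i - 1] if i else None

idx = TwoLayerIndex()
idx.put('a', 1, 'v1')
idx.put('a', 3, 'v2')
print(idx.as_of('a', 2), idx.current('a'))  # v1 v2
```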

Or you can just index all modifications to all keys and have them ordered by key then tx, which will give you O(log(n)), but where your n is all values that have ever been set.


There is a (now unmaintained) project called Mentat [0] from Mozilla.

[0] https://github.com/mozilla/mentat


Did you see the Show HN post announcing cozo 10 days ago?

https://news.ycombinator.com/item?id=33518320

https://github.com/cozodb/cozo

Note: there was a fair amount of discussion about the license; the author eventually switched it to MPL2.


The main problem is that I don't have a lot of time for it right now, though I'm trying to do more.

I don't actually know Common Lisp, though it seems to work nicely when compiled to a binary via Graal, so maybe there are some possibilities of integration using that approach?


My first thought was a Truffle Common Lisp, so you could integrate via GraalVM.

https://github.com/charig/TCLisp

https://github.com/armedbear/abcl/issues/62


Do you happen to know of any good resources for implementing datalog? I’ve thought about writing something similar in Common Lisp several times, but it’s hard to find a good introduction here.


> schema-less

It's really good to see a scalable database that did schemas a bit like TypeScript does types: opt-in explicit schema, try to infer schemas when possible, and allow schema-less if I ask for it.

Hmm, I suppose elasticsearch does have this.


Hmmm, I wouldn't call it scalable. The storage is pluggable though, and it can be made to scale that way. But it holds a lot of data locally, and seems to do a reasonable job. "Loading" means indexing everything, so nothing is available until the index is done, and that can take a few minutes if you're trying to load GBs. However, it's usually very fast to query. I don't have an explicit schema yet, though I'm working on adding something like Datomic's schema as an option. (this is important for things like upsert, or just updating entity attributes instead of appending new attributes)


XTDB has a new version coming out which will support, as I understand it, more SQL friendly interaction for those who want it.


I don't understand why you would choose a JVM language to write a database in.


https://github.com/questdb/questdb is a good success story.




