How do you/Airbnb handle deeply linked features (2-hop+) that are also latency sensitive? Maybe I'm missing something, but I can't see how that works with the transformation DSL described in Chronon.
For our org, those are by far the most complicated to handle. Graph DBs scale poorly for this, while storing the state inside stream processing jobs is far too large/expensive. These features would also be built on top of API sources, which leads us to the unfortunate "log & wait" approach for our most important features.
In the API itself, you can express the chain links by specifying the source.
To be precise: a GroupBy (the aggregation primitive) can have a Join (the enrichment primitive) as its source. In other words, you can enrich first, then aggregate, and continue this chain indefinitely.
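To make the chaining concrete, here is a toy sketch in plain Python (not the actual Chronon API; the function and field names are illustrative): hop 1 is the Join step that enriches raw events with a dimension table, hop 2 is the GroupBy step that aggregates the enriched rows, and the output could itself feed another Join.

```python
from collections import defaultdict

def enrich(events, dim, key):
    """Join step: attach dimension attributes to each event."""
    return [{**e, **dim.get(e[key], {})} for e in events]

def aggregate(rows, group_key, value_key):
    """GroupBy step: sum value_key per group_key."""
    out = defaultdict(float)
    for r in rows:
        out[r[group_key]] += r[value_key]
    return dict(out)

# Example 2-hop feature: per-merchant-category spend.
events = [
    {"user": "u1", "merchant": "m1", "amount": 10.0},
    {"user": "u1", "merchant": "m2", "amount": 5.0},
]
merchants = {"m1": {"category": "food"}, "m2": {"category": "food"}}

enriched = enrich(events, merchants, "merchant")      # hop 1: Join
features = aggregate(enriched, "category", "amount")  # hop 2: GroupBy
# features == {"food": 15.0}
```

In the real system the join and the aggregation would each be a declared pipeline stage rather than an in-memory loop, but the shape of the chain is the same.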
> Graph DBs are kind of scaling poorly
That makes sense. Scaling these on the read side is much, much harder than pre-computing on the write side, which is what Chronon lets you do.
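A toy contrast of the two approaches (plain Python, names are illustrative, not Chronon's API): pre-computing on the write side means every event updates the final answer as it arrives, so the latency-sensitive read path degenerates to an O(1) key-value lookup instead of a multi-hop traversal at request time.

```python
class WriteSideCounter:
    """Pre-compute on write: each incoming event folds into the
    stored aggregate, so serving is a single KV lookup."""

    def __init__(self):
        self.totals = {}

    def on_event(self, key, amount):
        # Streaming write path: update the pre-computed value.
        self.totals[key] = self.totals.get(key, 0) + amount

    def fetch(self, key):
        # Latency-sensitive read path: O(1) lookup, no traversal.
        return self.totals.get(key, 0)

c = WriteSideCounter()
c.on_event("u1", 10)
c.on_event("u1", 5)
# c.fetch("u1") == 15
```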
Offline is pretty easy to get started with. It should take less than a week to set up for new use-cases across the company, and once offline is set up you can begin building training sets.
Online is a bit more involved: you need a month or more to test that your KV store scales against the read and write traffic coming from Chronon.
Bighead is the model training and inference platform.
Chronon is a full rewrite of Zipline with
1) a different underlying algorithm for time-travel to address scalability concerns.
2) a different serde and fetching strategy to address latency concerns.
I noticed Airflow is the backing orchestration service. Was there any consideration of other orchestration tools? I know Airbnb has at least two internally, but also that Airflow is still the predominant one for the data org.
I'm also curious how you went from a non-platformized approach to adopting this platform: what were the important insights for strategizing, prioritizing, and motivating teams to lift existing pipelines into the new thing? Open-ended question.
The two biggest pain points that motivated teams were:
- the inability to back-test new real-time features. People were forced to log-and-wait for months to create training sets; Chronon reduces this to hours or days.
- the difficulty of building the lambda system (batch pipeline, streaming pipeline, index, serving endpoint) for every feature group. In Chronon, you simply set a flag on your feature definition to spin up the lambda system.
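As a sketch of that flag, a feature definition might look like the following config fragment. The import paths and field names follow Chronon's open-source Python API as I understand it; treat the exact names, and the `purchases` source, as assumptions rather than a verified definition.

```python
from ai.chronon.group_by import GroupBy, Aggregation, Operation

purchase_features = GroupBy(
    sources=[purchases],  # assumed: a batch table paired with its streaming topic
    keys=["user_id"],
    aggregations=[
        Aggregation(input_column="amount", operation=Operation.SUM),
    ],
    # This single flag asks the platform to stand up the whole lambda
    # stack: batch upload, streaming job, KV index, serving endpoint.
    online=True,
)
```

Without the flag, the same definition only materializes offline tables for training-set generation.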