Seems like really cool tech. Such a bummer that it is not source-available. I might be in the minority here, but I would absolutely consider commercial services where the core tech is all released under something like an FSL with fully supported self-hosting. Otherwise, the lock-in vs. something like Kafka is hard to justify.
(Founder) We are happy for the S2 API to have alternate implementations, and we are considering open-sourcing an in-memory emulator ourselves. It is not a very complicated API. If you would prefer to stick with the Kafka API but benefit from features like S2's storage classes, a very large number of topics/partitions, or high throughput per partition, we are planning an open source Kafka compatibility layer that can be self-hosted, with features like client-side encryption so you can have even more peace of mind.
A Kafka-compatible API with S3 storage is something I would jump at; the savings over MSK would be huge.
If you had a (paid for) API that sat on top of an S3 API for on-prem, that would be fantastic as well.
Kafka is great, but the whole Java ecosystem, the lack of control over what goes into the topics, and having to coordinate the cluster through ZooKeeper make it a management PITA.
> we are considering an in-memory emulator to open source ourselves
I'd suggest a persistent emulator, using something like SQLite (one row per record). Even for local development, many applications need persistence. It would even be enough to run a single-node, low-throughput production server that doesn't need robust durability and availability, while still having enough overhead and limitations not to compete with your cloud offering.
What matters most, however, is staying as close as possible to your production system behavior-wise. So I'd try to share as much of the frontend code (e.g. the gRPC and REST handlers) as possible between the two.
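To make the SQLite idea concrete, here is a rough sketch of what "one row per record" could look like. Everything in it is an assumption made up for illustration: the class name `StreamEmulator`, the table schema, and the `append`/`read` methods are hypothetical and are not the S2 API.

```python
import sqlite3


class StreamEmulator:
    """Hypothetical append-only stream store backed by SQLite (not the S2 API)."""

    def __init__(self, path=":memory:"):
        # Pass a file path instead of ":memory:" to get persistence across restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            " stream TEXT NOT NULL,"
            " seq_num INTEGER NOT NULL,"
            " body BLOB NOT NULL,"
            " PRIMARY KEY (stream, seq_num))"
        )
        self.db.commit()

    def append(self, stream, bodies):
        """Append a batch of records; return the sequence number of the first one."""
        with self.db:  # commits on success, rolls back if an exception is raised
            (start,) = self.db.execute(
                "SELECT COALESCE(MAX(seq_num) + 1, 0) FROM records WHERE stream = ?",
                (stream,),
            ).fetchone()
            self.db.executemany(
                "INSERT INTO records (stream, seq_num, body) VALUES (?, ?, ?)",
                [(stream, start + i, body) for i, body in enumerate(bodies)],
            )
        return start

    def read(self, stream, start_seq=0, limit=100):
        """Return up to `limit` (seq_num, body) pairs starting at `start_seq`."""
        rows = self.db.execute(
            "SELECT seq_num, body FROM records"
            " WHERE stream = ? AND seq_num >= ?"
            " ORDER BY seq_num LIMIT ?",
            (stream, start_seq, limit),
        )
        return list(rows)
```

The shared gRPC/REST handlers would then call something like `append` and `read` on this store in the emulator and on the real backend in production, so only the storage layer differs.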
First-class Kafka compatibility could go a long way to making it a justifiable tech choice. When orgs go heavy on event streaming, that code gets _everywhere_, so a vendor off-ramp is needed.
(Founder) That makes sense. We would eventually host the Kafka layer too - and will be able to avoid a hop by inlining our edge service logic in there.