Roughly ten years ago, I started Gazette [0]. Gazette is in an architectural middle-ground between Kafka and WarpStream (and S2). It offers unbounded byte-oriented log streams which are backed by S3, but brokers use local scratch disks for initial replication / durability guarantees and to lower latency for appends and reads (p99 <5ms as opposed to >500ms), while guaranteeing all files make it to S3 with niceties like configurable target sizes / compression / latency bounds. Clients doing historical reads pull content directly from S3, and then switch to live tailing of very recent appends.
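To make the read path above concrete, here is a minimal sketch of a reader that serves historical bytes from persisted fragments and then switches to the broker's recent tail. This is not Gazette's actual API: `Fragment`, `Journal`, and `read_from` are hypothetical names, and the in-memory lists only stand in for files in S3 and a broker's local scratch disk.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    """A contiguous byte range of the log, starting at offset `begin`."""
    begin: int
    data: bytes

    @property
    def end(self) -> int:
        return self.begin + len(self.data)


@dataclass
class Journal:
    # Hypothetical stand-in for fragment files already persisted to S3,
    # kept sorted by begin offset.
    persisted: list = field(default_factory=list)
    # Hypothetical stand-in for very recent appends still held by the broker.
    tail: Fragment = field(default_factory=lambda: Fragment(0, b""))


def read_from(journal: Journal, offset: int):
    """Yield (source, offset, chunk) tuples starting at `offset`.

    Historical content comes from the persisted fragments ("store");
    once those are exhausted, the reader switches to the broker tail.
    """
    for frag in journal.persisted:
        if frag.end <= offset:
            continue  # fragment lies entirely before the requested offset
        if frag.begin > offset:
            break  # gap in persisted content; fall through to the broker
        yield ("store", offset, frag.data[offset - frag.begin:])
        offset = frag.end

    tail = journal.tail
    if tail.begin <= offset < tail.end:
        yield ("broker", offset, tail.data[offset - tail.begin:])


journal = Journal(
    persisted=[Fragment(0, b"aaaa"), Fragment(4, b"bbbb")],
    tail=Fragment(8, b"cc"),
)
chunks = list(read_from(journal, 2))
```

In a real system the "store" branch would be a direct object-store read by the client and the "broker" branch a streaming RPC that blocks for new appends; the sketch only shows the handoff between the two by offset.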
Gazette started as an internal tool in my previous startup (AdTech related). When forming our current business, we very briefly considered offering it as a raw service [1] before moving on to a holistic data movement platform that uses Gazette as an internal detail [2].
My feedback is: the market positioning for a service like this is extremely narrow. You basically have to make it API compatible with a thing that your target customer is already using so that trying it is zero friction (WarpStream nailed this), or you have to move further up the application stack and more directly address the problems your target customers are trying to solve (as we have). Good luck!
(S2 Founder) Congrats on the success with Estuary! You are not the first person to tell me there is no/tiny market for this. Clearly _you_ thought there was something to it, when you looked to HN for validation. We may do a lot more on top of S2, like offering Kafka compatibility, but the core primitive matters. I have wanted it. It gets reinvented in all kinds of contexts and reused sub-optimally in the form of systems that have lost their soul, and that was enough for me to have this conviction and become a founder.
ED: I appreciate where you are coming from, and understand the challenges ahead. Thank you for the advice.
The market is gobsmackingly huge, it's just the go-to-market entry points which are narrow.
In my opinion, the key is to find a value prop and positioning which lets prospects try your service while spending a minimum of their own risk capital / reputation points within their own org.
That makes it hard to go after core storage, because it's such a widely used, fundamental, and reliable part of most every company's infrastructure. You and I may agree that conventions of incremental files in S3 are a less-than-ideal primitive for representing streams, but plenty of companies are doing it this way just fine and don't feel that it's broken.
WarpStream, on the other hand, leaned into the perceived complexity of running Kafka and the share of users who wanted a Kafka solution with the operational profile of using S3. Internal champions can sell trying their service because the prospect's existing thing is already understood to be a pain in the butt.
For what it's worth, if I were entering the space anew today I'd be thinking carefully about the Iceberg standard and what I might be able to do with it.
[0]: https://gazette.readthedocs.io/en/latest/
[1]: https://news.ycombinator.com/item?id=21464300
[2]: https://estuary.dev