We are developing Milvus, an open-source, cloud-scalable vector database.

In our new cloud architecture, a messaging system serves as the central log sequence for the whole system. Eventually, we plan to support well-known messaging solutions such as Apache Kafka and Apache Pulsar.

According to an article by Confluent, Pulsar’s message consumption model is push-based, while Kafka’s is pull-based.

Our new cloud architecture is based on the actor model: all worker nodes run asynchronously, and the log sequence is what links them together. We felt that a push-based consumption model fits the actor model more naturally, since each node simply reacts to messages as they arrive, so we started our new implementation with Apache Pulsar. We did have some concerns about Pulsar, for example:

  • The project’s popularity
  • The Go SDK is buggy
  • The documentation is not as good as Kafka’s

We don’t think these points are showstoppers. However, some of our users see it differently. They keep asking when we will support Kafka, and some have said outright that they will not maintain Pulsar in their production environments, although they have not given us convincing reasons why.

So I would love to hear your thoughts: now that the new release has just reached GA, should we give Kafka support a much higher priority?