Multiple Requests at the same time (multi-threading concern)

I am doing a test to send multiple requests (e.g. 10 requests) at the same time for different entity ID to the application which is started with 1 replicas. I assume each request takes 2 seconds to process. The test behaviour shows that 10 requests are processed in sequence (i.e. taken 2 x 10 = 20 seconds) to complete. Why it is not 2 seconds for all 10 Requests at the same time? Does it mean that Kalix does not support multithreading?

Some questions to ponder:

  • Are you sure you’re sending them to different entities?
  • Did you make sure the entities are all created before running the test?
  • Is your timing “assumption” correct? What if it takes 20 seconds for each to complete?
  • Are you sure you’re sending them to different entities?
    • I used Jmeter to create 10 thread groups and send 10 requests to different entities. i.e. Each threads send 1 requests at the same time.
  • Did you make sure the entities are all created before running the test?
    • No. Why do we have to make sure the entities are all created before? What tricky mechanism inside Kalix?
  • Is your timing “assumption” correct? What if it takes 20 seconds for each to complete?
    • If take 20 seconds for each to complete, the behaviour in client is timeout, while the server side keeps processing the request one by one, so 10 requests will finally take 20 seconds x 10 requests = 200 seconds to complete all transactions.

my code is using shopping cart example to add thread.sleep to simulate the long processing inside the application.

e.g.

    @Override
    public Effect<Empty> addItem(ShoppingCartDomain.CartState currentState, ShoppingCartApi.AddLineItem addLineItem) {
        if (addLineItem.getQuantity() <= 0) {
            return effects().error("Quantity for item " + addLineItem.getProductId() + " must be greater than zero.");
        }
        ShoppingCartDomain.ItemAdded itemAddedEvent = ShoppingCartDomain.ItemAdded.newBuilder().setItem(
                ShoppingCartDomain.LineItem.newBuilder().setProductId(addLineItem.getProductId()).setName(addLineItem.getName()).setQuantity(addLineItem.getQuantity()).build())
                .build();
        try {
            System.out.println("before sleep = " + currentState.getItemsCount());
            TimeUnit.MILLISECONDS.sleep(2000);
            System.out.println("after sleep = " + currentState.getItemsCount());
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        return effects().emitEvent(itemAddedEvent).thenReply(__ -> Empty.getDefaultInstance());
    }

One more thing, the replicas is 1 at the moment. I dunno how it can be scaled up. So, I raised another ticket to ask how to adjust / scale the instance. ==> Which command to scale up and down the replicas

The time to create an entity and the time to process a message could be significantly different. I’m not saying that is always the case, but it is common for performance analyses to “warm up” a system before conducting the test and measuring results.

For that matter, if you’re using a JVM language to interface with Kalix, you also need to warm up the JVM in your service. The first time through the JVM is interpreting byte code but eventually, that gets compiled to native code.

If you’re on a JVM language, you might want to give this a try:

  • Run 100 messages through the system first
  • Then turn on the recording of latency for subsequent requests
    OR:
  • use a GraalVM-generated container image that has pre-compiled everything to native code

Thanks for your reply.

In my test, I am using JVM language. I used two days to prepare the test and verify the test and I believe I have done the similar things as you mentioned. i.e.

  • Run 100 messages through the system first
  • Then turn on the recording of latency for subsequent requests

Actually, in the begining, i just tested very few messages only and it has already given me a bad result.

FYI, I have also tested in the debug mode using eclipse IDE. I expect there should be multiple threads as seen in the debug mode and the log outcome should be like this:

before sleep from entity #1
before sleep from entity #2
before sleep from entity #3
...expect here to wait for 2000 ms...
after sleep from entity #1
after sleep from entity #2
after sleep from entity #3

but the outcome is in sequence, not in parallel pattern. And I cannot see there are multiple threads stopped at the target breakpoint.

before sleep from entity #1
...actual finding shows here waiting for 2000 ms...
after sleep from entity #1
before sleep from entity #2
...actual finding shows here waiting for 2000 ms...
after sleep from entity #2
before sleep from entity #3
...actual finding shows here waiting for 2000 ms...
after sleep from entity #3

Also, I removed the sleep statement and tried to run 5000 requests using 5 threads group and each thread group loops for 1000 messages, and the result is not 100% in HTTP 200 response. I can see a high percentage in HTTP 500 response. Anyway, it seems that the server is incapable to handle multiple requests. To be honest, such test is really in low volume yet. I cannot imagine what happen if I tuned up the TPS.

By the way, another question is the command to scaling up/down the replicas (or is it application instances?).

Hey faichun,

Kalix does support multiple requests in parallel for different entities. Moreover, Kalix is actually meant to allow one to build high-performance and scalable APIs so such property is actually a fundamental premise.

I’ve done a small experiment with vegeta just to show requests being processed in parallel. I’ve run 10 different HTTP requests (different entities) for 30 seconds with a 30 req/s rate. As shown below, the total 900 requests are processed without issues even when the min request time is ~100ms thus proving that requests are being served in parallel, otherwise the service would only be able to serve ~10 req/s. Note also that I’ve purposely run this test with a very small amount of req/s.

➜  vegeta attack -duration=30s -rate=30 --targets=test | vegeta report --type=text
Requests      [total, rate, throughput]         900, 30.03, 29.93
Duration      [total, attack, wait]             30.072s, 29.967s, 105.109ms
Latencies     [min, mean, 50, 90, 95, 99, max]  103.225ms, 120.246ms, 104.661ms, 149.099ms, 208.95ms, 449.684ms, 709.55ms
Bytes In      [total, mean]                     37980, 42.20
Bytes Out     [total, mean]                     0, 0.00
Success       [ratio]                           100.00%
Status Codes  [code:count]                      200:900

Regarding the results you’re seeing, they seem to hint that the requests might be being sent in sequence.
I would advise trying to start with a smaller subset just to make sure you have the right configuration in place.

To be honest, such test is really in low volume yet. I cannot imagine what happen if I tuned up the TPS.

Bear in mind that any service running under trial account is subject to a limit of 100 req/s. As for any real use case, the platform will allow you to scale and cope with a much much higher load.

Thank you for giving my some idea.

However, what is the different between vegeta and jmeter?

I haven’t tried vegeta. Instead, I tried to verify my jmeter script using a tiny spring boot REST API application which was running in my local machine. I just send three requests in parallel (far more lower than a limit of 100 req/s, but the request needs to take 5s per each). It worked properly with expected log pattern (processed in parallel) in my spring boot REST API application.

However, when I updated the URL to be Kalix server instance, the log pattern showed that the requests were processed in sequence. So, I am sure that the jmeter script worked fine.

# this is a spring boot app to simulate the endpoint in kalix
# and verify the jmeter script which is running in my local machine.
@RestController
public class StubController {
    @PostMapping("/cart/{id}/items/add")
    public Mono<String> getUserWithDelay(@PathVariable("id") String id, @RequestBody String data) {
        return Mono.just(go(id)).delayElement(Duration.ofMillis(6000)).doFinally(c -> {
                System.out.println("--- after = " + id);
            });
    }

    private String go(String id) {
        System.out.println("--- before = " + id);
        return "EXAMPLE";
    }
}

Does it mean that the scaling option is disabled in trial account? Then, is there another kalix documentation talking about scaling?

I am still trying to verify which part in my test has problem. It may be from

  • jmeter script - seems no issue.
  • coding in my kalix app - i just added thread sleep in the shopping cart demo app. nothing special
  • kalix server - might be trial account issue?
  • or, any other place

I believe Kalix is powerful. But I have no idea so far on my parallel requests testing outcome.

I am still evaluating the capability of Kalix. Now, still sucking on the parallel calls issue.

Here is the log printout.

When I trigger 2 requests at the same time, I can see the log pattern which shows that 2 requests are executed in SEQUENCE, NOT PARALLEL.

{"timestamp":"2022-08-24T01:54:35.012Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.domain.DealEventSourcedEntityProvider","message":"Constructing using DealEventSourcedEntityProvider.newRouter","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:35.014Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.domain.DealEventSourcedEntity","message":"DealEventSourcedEntity.constructor   ; hashcode=562907054, entityId=B004, id=DEMO, uuid=51a24c4b-8944-4f2b-91dd-5f0aef22aace, state=INITIAL","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:35.014Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.domain.DealEventSourcedEntityRouter","message":"Constructing DealEventSourcedEntityRouter(749500967); id=example.domain.DealEventSourcedEntity@218d47ae, entity.hashcode=562907054","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:35.016Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.domain.DealEventSourcedEntity","message":"DealEventSourcedEntity.emptyState    ; hashcode=562907054, entityId=B004, id=DEMO, uuid=51a24c4b-8944-4f2b-91dd-5f0aef22aace, state=INITIAL","context":"default","severity":"INFO"}
 ::: at router ::: Submit
{"timestamp":"2022-08-24T01:54:35.017Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.domain.DealEventSourcedEntity","message":"Sleep for 0 milliseconds in router(562907054).","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:35.017Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.domain.DealEventSourcedEntity","message":"DealEventSourcedEntity(562907054).handleCommand with dealId=B004, stateMachine_1=1454870505, id=DEMO, uuid=51a24c4b-8944-4f2b-91dd-5f0aef22aace","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:35.019Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.domain.DealEventSourcedEntity","message":"DealEventSourcedEntity(562907054).handleCommand with dealId=B004, stateMachine_2=1454870505, id=DEMO, uuid=51a24c4b-8944-4f2b-91dd-5f0aef22aace","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:35.020Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.action.DealSubmissionAction","message":"DealSubmissionAction(859494916).execute; id=DEMO, uuid=51a24c4b-8944-4f2b-91dd-5f0aef22aace","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:35.020Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.action.DealSubmissionAction","message":"Sleep for 6000 milliseconds in action(859494916).","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:41.039Z","thread":"kalix-akka.actor.default-dispatcher-7","logger":"example.domain.DealEventSourcedEntityRouter","message":"DealEventSourcedEntityRouter(749500967).handleEvent with dealId=B004 ","context":"default","severity":"INFO"}

{"timestamp":"2022-08-24T01:54:41.043Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.domain.DealEventSourcedEntityProvider","message":"Constructing using DealEventSourcedEntityProvider.newRouter","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:41.047Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.domain.DealEventSourcedEntity","message":"DealEventSourcedEntity.constructor   ; hashcode=193727518, entityId=B003, id=DEMO, uuid=ef27abf2-9165-4896-9e28-ec9c8ed5f232, state=INITIAL","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:41.047Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.domain.DealEventSourcedEntityRouter","message":"Constructing DealEventSourcedEntityRouter(41224664); id=example.domain.DealEventSourcedEntity@b8c0c1e, entity.hashcode=193727518","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:41.048Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.domain.DealEventSourcedEntity","message":"DealEventSourcedEntity.emptyState    ; hashcode=193727518, entityId=B003, id=DEMO, uuid=ef27abf2-9165-4896-9e28-ec9c8ed5f232, state=INITIAL","context":"default","severity":"INFO"}
 ::: at router ::: Submit
{"timestamp":"2022-08-24T01:54:41.049Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.domain.DealEventSourcedEntity","message":"Sleep for 0 milliseconds in router(193727518).","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:41.049Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.domain.DealEventSourcedEntity","message":"DealEventSourcedEntity(193727518).handleCommand with dealId=B003, stateMachine_1=330919301, id=DEMO, uuid=ef27abf2-9165-4896-9e28-ec9c8ed5f232","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:41.051Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.domain.DealEventSourcedEntity","message":"DealEventSourcedEntity(193727518).handleCommand with dealId=B003, stateMachine_2=330919301, id=DEMO, uuid=ef27abf2-9165-4896-9e28-ec9c8ed5f232","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:41.053Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.action.DealSubmissionAction","message":"DealSubmissionAction(859494916).execute; id=DEMO, uuid=ef27abf2-9165-4896-9e28-ec9c8ed5f232","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:41.053Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.action.DealSubmissionAction","message":"Sleep for 6000 milliseconds in action(859494916).","context":"default","severity":"INFO"}
{"timestamp":"2022-08-24T01:54:47.070Z","thread":"kalix-akka.actor.default-dispatcher-9","logger":"example.domain.DealEventSourcedEntityRouter","message":"DealEventSourcedEntityRouter(41224664).handleEvent with dealId=B003 ","context":"default","severity":"INFO"}

Could anyone help to check whether there is any bug in the dispatcher process?

Note that there is a significant difference in your Kalix entity doing Thread.sleep which blocks a thread for that full duration vs the async delay in the Spring app delayElement, which is non-blocking and returns the thread to do other work while delaying.

This could explain what is going on as thread starvation in the JVM of your Kalix service, where all available threads being blocked by sleep would postpone any further requests, to any component, to be handled.

Note that in general entities are expected to be relatively quick about validating incoming commands and persisting and or returning a result.

For an entity doing a heavy computation (like the 6s sleep you have there) you may need to tune the thread pool of the service a bit (I don’t think we have any docs for this currently), or compose an Action doing the heavy work (or async delay) with a call to an entity as actions support asynchronous results if you want to mimic your Spring REST API snippet more closely (Described in Actions as Controllers :: Kalix Documentation)

1 Like

Thanks for the information.

In my test, I can see that the log shows the thread name as “default-dispatcher-7” and “default-dispatcher-9” for two requests. Does it mean that both dispatcher instance are sharing the same thread (but the thread name is not same) for processing so that the 2 request execution is in sequential?

Or, for example, in debug mode, I can see kalix-akka.actor.default-dispatch-4 is on hold due to my breakpoint. Then, should other dispatchers (i.e. kalix-akka.actor.default-dispatch-3, 6, 7, 8, 9) be ready another request use? Which logic inside the Kalix to distribute the work (i.e. incoming request or listener) to dispatcher?
Screenshot 2022-08-26 114155

Thus, it seems that we should not put the long processing logic inside the router’s handleCommand and the entity operation because they may not be multiple threads supported.

At this moment, I guess the root cause of my scenario may be similar to this article:

The different numbers in the threads means they are different thread, the scenario I was talking about was related to that the default configuration assumes non blocking code and sizes the thread pool relatively small, to conserve resources, this means that a few parallel Thread.sleeps could block all threads in the pool and cause further request processing to be delayed.

Given that in your shared log we see different thread names in a sequential fashion, pool starvation should not be the case. Could you perhaps share the sources of the service you are testing so we can take a look?

My source code is the Java Quick Start Shopping Cart (found in Kalix). Then, I just add a thread sleep in the method itemAdded in the entity.

Client is simple, just POSTMAN, and two requests are triggered from two different laptops.

Thanks, I can repeat, I think there is a bug somewhere causing this, I have created an issue in the JVM SDK repo to track the bug: Calls to same entity type sequential · Issue #1078 · lightbend/kalix-jvm-sdk · GitHub

1 Like

Thank you so much! :sweat_smile:

That’s a good news. I can have a better evaluation report to the technical lead in my team. Hope that it can be fixed soon.

:pray:t2:

We have now released Kalix Java/Scala SDK 1.0.6 with a fix for this.

Note that my previous mentions of the default small threadpool, which can easily be starved by entities doing Thread.sleep, still likely will be an issue for your test service unless you tune up thread pool sizes.

Nice, Let me test on that.

By the way, may I know which parameter setting can tune up thread pool sizes?

Note that I wouldn’t really recommend it, if the Thread.sleep is there to simulate CPU bound work, actual work would compete for few resources with many threads, and if it is about blocking IO, calling a remote service through a blocking client for example, if possible calling it through an async client will be more efficient.

If you still have to, it would be the default dispatcher of the internal actor system of Kalix, it is configured by creating a src/main/resources/application.conf file with a config block like this:

kalix.system.akka.actor.default-dispatcher {
    executor = "thread-pool-executor"
    thread-pool-executor = {
        fixed-pool-size = 16
    }
}

Note that this is just one example config and not a recommendation for a specific config, more options and details about the default-dispatcher config can be found under akka.actor.default-dispatcher in the reference config in Akka docs here: Default configuration • Akka Documentation

1 Like

Thank you for your advise.

I have verified the app behavior using new version and now the outcome matched the typical multi-threading behavior.

The use of Thread.sleep is used in the testing because we discovered the abnormality in the load test (no thread.sleep inside yet, just a simple shopping cart example only). Anyway, it is grateful we could discover this in early development stage. I agree actual work would compete for few resources with many threads. In this ticket, it seems that the resources got only one thread for processing. For my curiosity, if using Thread.sleep to simulate CPU bound work is not a recommended approach, may I know which approach / tooling is suitable to simulate the issue and discover the potential bug about threading.

1 Like

Good, thanks for letting us know about the fairness/parallelism issue so we could fix it!

1 Like