kpturner
I have been using a SmartDataProvider to perform a task which basically does this:
-> Subscription with one data element (field)
-> sendMessage from client
-> Message received by MetadataAdapter and passed to data adapter
-> Response (of about 20000 bytes) returned to client using smartUpdate
Although the process is working, I have noticed that when the response is consistently the same I get a round trip of between 150 and 200ms. That seems like quite a lot, but it is acceptable. However, if the response is different each time (albeit still around 20000 bytes), the round trip is consistently about 120ms longer.
This seems to indicate that somewhere along the line some sort of check is being carried out on the data being returned, to see if it has changed. Is that possible? If so, is there any way I can avoid that check? I can see why this would be happening/necessary in a normal subscription where (for example) you only want the client to be bothered if the data changes, but in this instance I want the data regardless, and I don't want any overhead at all.
I am using DISTINCT on my subscription, but I haven't yet got my head around the difference between DISTINCT, MERGE, COMMAND etc - so this could be one thing I am doing incorrectly. I am also using a SmartDataProvider, and perhaps for this I need to use a "LessSmart"DataProvider.
Can you give me any idea what I could/should be doing differently in this case?
Many thanks.
Alessandro Alinone
My colleagues will give you an answer tomorrow, but I was curious to know whether you are performing the test locally. In that case, a 150ms round trip looks like a huge amount of time. Consider that, when not under heavy stress, the upper latency limit should be equal to <max_delay_millis> as configured in lightstreamer_conf.xml, which by default is 30ms.
kpturner
Hello Alessandro,
Your answer has me a bit worried that I have messed something up. I am not testing locally as such. The client is obviously running on my PC, and Lightstreamer is running on a very powerful IBMi partition (JDK 7). It is on the same subnet (so not over the internet).
I am effectively having to use Lightstreamer to emulate an XHR request for comparison purposes. I know this is not what it is usually meant for, but my benchmarking includes this test.
I am effectively creating a connection, a subscription (DISTINCT) and a listener on the subscription. I then send a querystring using sendMessage, and my data adapter passes this to an RPG program (via a data queue) that generates about 20000 bytes of static HTML with embedded script etc. This is passed back to the client using smartUpdate, and when onItemUpdate fires on my listener, some script uses jQuery to replace some elements of the DOM with the HTML received.
It works - so far, so good. However, if I cache the response in the data adapter, so that subsequent calls completely omit the RPG call and just send back the previous response, I still get a round trip averaging around 150-170ms. I am sending a tiny querystring of about 200 bytes and returning about 20000 bytes. Why does it take so long? My timer starts when sendMessage() is called and stops when onItemUpdate() fires - so the fiddling around with the response is omitted from the timer completely.
My metadata adapter and data adapter are fairly trivial. I could include the code if it helps, but I don't want to clutter up the thread yet unless you think it is necessary. I am hoping I have a big mistake there that explains it.
Even my simple echo server takes about 120ms to echo a message back.
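The timing logic described above can be sketched as follows. This is a minimal stand-alone sketch, not the actual test code: the Lightstreamer client object is passed in as "anything with a sendMessage() method" so the timer can be exercised on its own, and the single field name `response` is taken from the posts above.

```javascript
// Sketch of the round-trip timer described above. In the real test,
// `listener` is attached to a DISTINCT subscription on the Lightstreamer
// client; here the client is just any object exposing sendMessage().
var t0 = 0;
var lastRoundTripMs = -1;

// Listener for the subscription: the timer stops as soon as onItemUpdate
// fires, so the jQuery/DOM work done afterwards with
// update.getValue("response") is excluded from the measurement.
var listener = {
  onItemUpdate: function (update) {
    lastRoundTripMs = Date.now() - t0;
  }
};

// The timer starts when sendMessage() is called with the ~200-byte querystring.
function sendTimedMessage(client, queryString) {
  t0 = Date.now();
  client.sendMessage(queryString);
}
```

With the real client, `lastRoundTripMs` covers the whole sendMessage → onItemUpdate round trip, including any batching delay applied on the server by <max_delay_millis>.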
Mone
hi,
Yes, <max_delay_millis> has an impact on the minimum delay, as it represents the longest delay that the Server is allowed to apply to outgoing updates in order to collect more updates into the same packet.
As for the problem you're facing, I would start by verifying that the delay is not inside the server. If you have the LightstreamerLogger.requests log category set to INFO, and the LightstreamerLogger.subscriptions and LightstreamerLogger.push log categories set to DEBUG, you can trace when the message reaches the server, when the response is sent back from the adapter to the server, and when the response is written out to the client. Any remaining delay is spent on the network and in the client.
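For reference, in lightstreamer_log_conf.xml those three categories would be set like this (syntax as shown later in the thread for the default 5.x configuration; surrounding layout depends on your file):

```xml
<!-- trace each message request as it reaches the Server -->
<logger name="LightstreamerLogger.requests" level="INFO"/>
<!-- at DEBUG level, events coming from the Data Adapters are dumped -->
<logger name="LightstreamerLogger.subscriptions" level="DEBUG"/>
<!-- at DEBUG level, all socket writes are dumped -->
<logger name="LightstreamerLogger.push" level="DEBUG"/>
```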
As for the client part, I just discovered that there is a sort of bug in the client, triggered when the client is executed from file:///, that slows the execution of internal tasks. Do you load your test pages from file:///?
I'm also introducing another speed-up in the client and I'm going to send you via private message a download link that points to the fixed & improved client.
HTH
kpturner
I have the client fix thanks. Now I will modify my logging and see what I can fathom out on the simple echo server. I guess a round trip of 150-170ms for a simple echo is not expected?
I am not quite clear on the max_delay_millis config option despite your explanation. If I have expected longish roundtrips of 300ms (for example), should I change this value to be higher than that? What are the implications of making it too high (or too low)?
kpturner
I forgot to answer your question - no, I am not using file://
I set the logging as instructed in lightstreamer_log_conf.xml, but it made no difference whatsoever to the output from the server. The log file for my adapters seems to behave OK, though (I can switch between INFO, DEBUG etc. successfully with my adapters). Have I changed the logging in the wrong place?
<!-- logging of Data Adapters interactions -->
<!-- at DEBUG level, events coming from the Data Adapters are dumped -->
<logger name="LightstreamerLogger.subscriptions" level="DEBUG"/>
<!-- logging of socket write activity -->
<!-- at DEBUG level, all socket writes are dumped -->
<logger name="LightstreamerLogger.push" level="DEBUG"/>
EDIT: Ignore that - I am looking at the console output! Let me check the other logs.
kpturner
OK - so I have a request/response cycle that took 313ms from the "sendMessage()" to the "onItemUpdate()" on the listener.
The log is shown below. It seems to indicate that the request was received at 10:19:32,710 and the push took place at 10:19:32,832, i.e. 122ms inside the server. That leaves 191ms on the network and in the client.
Other than the length of time on the network/client, something else puzzles me: the output during this test is just a cached global string. My data adapter is doing virtually zero work other than echoing back 20000 bytes. So what accounts for the 122ms here? It seems to be related to the size of the response. Is it the execution of the smartUpdate, or the logging itself?
Log
Pragma: no-cache
Expires: Thu, 1 Jan 1970 00:00:00 GMT
Date: Thu, 13 Dec 2012 10:19:27 GMT
Content-Length: 86
Access-Control-Allow-Origin: http://rnsdev:8081
Access-Control-Allow-Credentials: true
setPhase(7543);start('Sb60d90daf0c2c74T1553581', null, 19000, 50000);loop(0);end( );
13-Dec-12 10:19:32,710|INFO |LightstreamerLogger.requests |SERVER POOLED THREAD 8 |Serving request: /lightstreamer/msg.js --> LS_message=<--querystring here--> on "Lightstreamer HTTP Server" from 10.2.0.167:59712
13-Dec-12 10:19:32,711|INFO |LightstreamerLogger.requests |SERVER POOLED THREAD 8 |Sending message to session: S2e472b1dd69c6329T1553581 on "Lightstreamer HTTP Server" from 10.2.0.167:59712
13-Dec-12 10:19:32,712|DEBUG|LightstreamerLogger.push |PUMP POOLED THREAD 8 |RELEASING DATA --> HTTP/1.1 200 OK
Server: Lightstreamer/5.0 build 1576 (Lightstreamer Push Server - www.lightstreamer.com) Vivace edition
Content-Type: text/javascript; charset=iso-8859-1
Cache-Control: no-store
Cache-Control: no-cache
Pragma: no-cache
Expires: Thu, 1 Jan 1970 00:00:00 GMT
Date: Thu, 13 Dec 2012 10:19:32 GMT
Content-Length: 119
Access-Control-Allow-Origin: http://rnsdev:8081
Access-Control-Allow-Credentials: true
setPhase(3703);start('S2e472b1dd69c6329T1553581', null, 19000, 50000);c(4,3,"UNORDERED_MESSAGES",0,0);loop(0);end( );
13-Dec-12 10:19:32,713|DEBUG|LightstreamerLogger.subscriptions|Thread-38 |INCOMING DATA for RPC.CXRMCJF6ZP.S2e472b1dd69c6329T1553581 --> {response=Content-type: text/html
<-----Approximately 20000 bytes of data from application logged here --->
}
13-Dec-12 10:19:32,831|INFO |LightstreamerLogger.requests |SERVER POOLED THREAD 6 |Polling (42) to session: S2e472b1dd69c6329T1553581 on "Lightstreamer HTTP Server" from 10.2.0.167:59693
13-Dec-12 10:19:32,832|DEBUG|LightstreamerLogger.push |PUMP POOLED THREAD 8 |RELEASING DATA --> HTTP/1.1 200 OK
Server: Lightstreamer/5.0 build 1576 (Lightstreamer Push Server - www.lightstreamer.com) Vivace edition
Content-Type: text/javascript; charset=iso-8859-1
Cache-Control: no-store
Cache-Control: no-cache
Pragma: no-cache
Expires: Thu, 1 Jan 1970 00:00:00 GMT
Date: Thu, 13 Dec 2012 10:19:32 GMT
Content-Length: 96
Access-Control-Allow-Origin: http://rnsdev:8081
Access-Control-Allow-Credentials: true
setPhase(3705);start('S2e472b1dd69c6329T1553581', null, 19000, 50000);d(6,1,1);loop(0);end( );
Dario Crivelli
The <max_delay_millis> setting allows batching of updates in a single packet.
The batching involves updates for different items and, depending on the kind of subscription, also subsequent updates for the same item.
The delay caused by this batching operation depends on the whole sequence of updates;
you can only expect that the delays will be distributed between 0 and the <max_delay_millis> value chosen.
So, for a test like yours, you should set it to 0, so it will not alter the flow.
In the final application, depending on the expected throughput on your sessions, you may decide to set it higher, to reduce the Server and network overload at the expense of some additional delay on the updates.
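Concretely, for a test like this, that means changing one element in lightstreamer_conf.xml (placement within the file is indicative):

```xml
<!-- Longest delay the Server may apply to outgoing updates in order to
     batch more updates into the same packet. 0 disables batching for
     minimum latency; the default is 30 ms. -->
<max_delay_millis>0</max_delay_millis>
```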
kpturner
Dario Crivelli The <max_delay_millis> setting allows batching of updates in a single packet. [...]
Dario
Is the setting global, or can it be set at adapter level? If I use Lightstreamer to replicate XHR transactions (as in the above example) I may well want to set it to zero, but for other, more standard uses of Lightstreamer (like ordinary feeds, chat/IM etc.) I would not want it set to zero.
Also, is this setting value a side issue to the strange delay I am seeing, or a probable cause of it?
Thanks
Kevin
kpturner
Here is a second test, using the exact same code, but where the cached response is a trivial size. The round trip took 74ms, and the log is shown below. The time spent between the request and the response is a mere 32ms in comparison - although still in excess of <max_delay_millis>.
So it does seem to point to some issue regarding the size of the response... or you may say it is expected (although 20000 bytes does not strike me as a massive amount of data).
13-Dec-12 10:46:34,998|INFO |LightstreamerLogger.requests |SERVER POOLED THREAD 7 |Serving request: msg -> LS_message=<--querystring-->on "Lightstreamer HTTP Server" from 10.2.0.167:60056
13-Dec-12 10:46:35,002|INFO |LightstreamerLogger.requests |SERVER POOLED THREAD 7 |Sending message to session: S72cd3ecfd5e822e0T4604590 on "Lightstreamer HTTP Server" from 10.2.0.167:60056
13-Dec-12 10:46:35,010|DEBUG|LightstreamerLogger.subscriptions|Thread-36 |INCOMING DATA for RPC.CXRMCJF6ZP.S72cd3ecfd5e822e0T4604590 --> {response=Trivial response}
13-Dec-12 10:46:35,020|DEBUG|LightstreamerLogger.push |PUMP POOLED THREAD 8 |RELEASING DATA --> d(3,1,'Trivial response');
13-Dec-12 10:46:40,030|DEBUG|LightstreamerLogger.push |PUMP POOLED THREAD 8 |RELEASING DATA --> p();
13-Dec-12 10:46:44,324|DEBUG|LightstreamerLogger.push |PUMP POOLED THREAD 8 |RELEASING DATA --> HTTP/1.1 200 OK
Dario Crivelli
The log shows that the involved session is in polling, which obviously may alter the times further.
Was it an unlucky case or is it possible that your test environment does not support streaming?
Please check that first.
The establishing of a polling session due to the Stream-Sense mechanism should be visible in the previous part of the log.
Can you check whether, even in your testing environment, some proxy or similar intermediate layer is involved?
UPDATE: the above only refers to the 12:39 post.
kpturner
Dario Crivelli The log shows that the involved session is in polling, which obviously may alter the times further. [...]
Sorry our messages are crossing. The previous part of the log is shown below.
Streaming is supported (according to the status updates on my connector) and there is no proxy or intermediate layer. I am testing on a PC to an IBMi server on the same internal subnet.
13-Dec-12 10:19:08,138|INFO |LightstreamerLogger.requests |SERVER POOLED THREAD 6 |Polling (11) to session: Sb60d90daf0c2c74T1553581 on "Lightstreamer HTTP Server" from 10.2.0.167:59696
13-Dec-12 10:19:20,118|DEBUG|LightstreamerLogger.push |PUMP POOLED THREAD 8 |RELEASING DATA --> HTTP/1.1 200 OK
Server: Lightstreamer/5.0 build 1576 (Lightstreamer Push Server - www.lightstreamer.com) Vivace edition
Content-Type: text/javascript; charset=iso-8859-1
Cache-Control: no-store
Cache-Control: no-cache
Pragma: no-cache
Expires: Thu, 1 Jan 1970 00:00:00 GMT
Date: Thu, 13 Dec 2012 10:19:20 GMT
Content-Length: 87
Access-Control-Allow-Origin: http://rnsdev:8081
Access-Control-Allow-Credentials: true
setPhase(3701);start('S2e472b1dd69c6329T1553581', null, 19000, 50000);loop(0);end( );
13-Dec-12 10:19:27,169|DEBUG|LightstreamerLogger.push |PUMP POOLED THREAD 8 |RELEASING DATA --> HTTP/1.1 200 OK
Server: Lightstreamer/5.0 build 1576 (Lightstreamer Push Server - www.lightstreamer.com) Vivace edition
Content-Type: text/javascript; charset=iso-8859-1
Cache-Control: no-store
Cache-Control: no-cache
Pragma: no-cache
Expires: Thu, 1 Jan 1970 00:00:00 GMT
Date: Thu, 13 Dec 2012 10:19:27 GMT
Content-Length: 86
Access-Control-Allow-Origin: http://rnsdev:8081
Access-Control-Allow-Credentials: true
setPhase(7543);start('Sb60d90daf0c2c74T1553581', null, 19000, 50000);loop(0);end( );
Dario Crivelli
The log in the 1:00 post shows a streaming session, so your environment is adequate;
in the previous case there may have been some side effect.
So, the "big message" test was in polling, whereas the "small message" test was in streaming and this explains the difference;
I suppose this has happened by pure chance, as the session is established before any message is sent.
Please retry the "big message" case and ensure in advance that the session is established in streaming.
If in doubt, you can detect that by looking at the monitor console, assuming that the test session is the only one running.
I also suggest getting rid of the <max_delay_millis> (i.e. setting it to 0).
Note that the effect of <max_delay_millis> on a single update is not predictable, because it depends on the preceding updates for the same session.
Unfortunately, the setting is general and cannot be associated to the Adapter Set.
kpturner
A third test, back with the large response again, and I only see 3ms on the server, but a total round trip of 180ms. So that shoots a big hole in the theory regarding the size of the response.
The polling looks like it could be a different session. When does polling start? Is it something that happens when a client disconnects unexpectedly, or something that happens when a client loses connection to the server temporarily? I had to stop and restart my server to introduce changes, and I had other test clients hanging around.
13-Dec-12 11:27:45,821|INFO |LightstreamerLogger.requests |SERVER POOLED THREAD 6 |Serving request: msg -> LS_message=<--querystring--> on "Lightstreamer HTTP Server" from 10.2.0.167:60493
13-Dec-12 11:27:45,822|INFO |LightstreamerLogger.requests |SERVER POOLED THREAD 6 |Sending message to session: Sf1c5cddd71b4dc23T2354509 on "Lightstreamer HTTP Server" from 10.2.0.167:60493
13-Dec-12 11:27:45,823|DEBUG|LightstreamerLogger.subscriptions|Thread-44 |INCOMING DATA for RPC.CXRMCJF6ZP.Sf1c5cddd71b4dc23T2354509 --> {response=<--20000 byte response-->
}
13-Dec-12 11:27:45,824|DEBUG|LightstreamerLogger.push |PUMP POOLED THREAD 8 |RELEASING DATA --> d(1,1,1);
Dario Crivelli
Please ensure that the client is using the fixed version of the library sent to you by Mone.
About the polling: there are a few cases in which the client switches to polling.
Usually, it is chosen at session startup, if the attempt to setup a streaming session fails.
This allows the client to adapt to different environments;
as long as the client is in the same environment, you should not expect it to choose differently on each run, unless there are temporary network issues. In that case, a temporary disconnection could also give rise to polling.
Client slowness, on the other hand, may also cause the client to switch to polling.
What happened to you is not clear, but it may have been a side effect due to your testing activity.
If you see polling occur again, please provide us with the whole log, since Server startup, for a check.
kpturner
Dario Crivelli Please ensure that the client is using the fixed version of the library sent to you by Mone.
Yes, it is the fixed version, I think. It says Version 6.0 Build 1585.
I think the polling thing was just a symptom of my testing. I have shaved a bit of time off the round-trip now on the server side also.
To be honest, I am having doubts about the validity of trying to compare emulated XHR requests over WebSockets with real XHR requests over HTTP. Although we would definitely use something like Lightstreamer for data feeds, IM/chat and replacing long polling, I am not sure we would use it for ad hoc requests for data like this.
Let's take jqGrid as an example. Your demo has it updating via a feed, which is fine for WebSockets. However, a normal jqGrid would load into the DOM and then make a request for the first page of data. When the user pages up, it requests another page of data, and so on. This request is typically a normal XHR for JSON from the server. Would there be any reason to do this via a request/response WebSocket transaction instead? My experience of testing so far seems to indicate that there is nothing to be gained here, because I am using the Lightstreamer client/server for something it was not intended for.
What would your view be on this?
Alessandro Alinone
Yes, generally speaking, separation of concerns is always a good architectural pattern. Lightstreamer has been designed and optimized for asynchronous (pub/sub) communication, while Web servers have evolved for synchronous (request/response) communication. Usually the two servers sit alongside each other in full synergy, each one taking care of the aspects it is specialized for.
That being said, there can be cases where you might want to leverage Lightstreamer even for request/response. Perhaps because you want to keep all your server-side logic inside a single adapter, or because you need a low-latency link to send messages from the client to the server. With XHR, you might stumble into connection setup latency (especially high with SSL), while with Lightstreamer the link is kept always open, reducing round-trip latency.