yoadsn
Hello,
We recently upgraded to server version 4.0.1 build 1513.1.3 from version 3.5 build 1428.4 (I know, it took us too long).
Since then we have been seeing a strange problem. Our web-based application (client version 5.0 build 1446.3) waits for the EOS command before loading the rest of the data from other streams (multi-level push).
The initial stream is in COMMAND mode, unfiltered. The data adapter is implemented in .NET and is quite simple.
The problem is that, on very rare occasions, the client does not load past the initial stream, as if EOS was never sent from LS. Debugging our JS code shows that the ADD commands arrive but the assigned EOS callback is not called. The UPD commands that should arrive after the EOS also keep coming.
As I said, this happens very rarely, perhaps no more than once every 1-2 weeks, but when it happens it can be observed on any browser, on any machine that tries to load the web application.
Already-connected clients are not affected, since all streams appear to continue pushing just fine (they are all COMMAND mode with snapshot, though). Monitoring the server at the time shows nothing special: load is not necessarily at its highest, and aside from outstanding NIOQueueWaitTimes, which can be observed all the time, all is normal.
The JMX interface did not reveal anything important while the problem occurred. The client with the problem appeared to be subscribed to the item in question, but the other cumulative counters did not show any data about snapshots being sent... maybe I missed something.
I have 3 suspects:
1. The JS code we have written has some exception being thrown in the EOS callback - I could not confirm that yet and could not find a scenario in which that would happen, but I do not rule it out.
2. My data adapter may do something funny in the snapshot creation method - but the problem does not seem to happen on the first subscriber, only while some clients are already subscribed (so the snapshot creation method is probably not called at the time of the problem... I could not confirm that yet).
Anyway, I tried doing funny things on purpose to reproduce the problem (sending garbage event sequences, throwing in the middle of building the snapshot without terminating it, sending deletes with the snapshot, and so on) - LS always sent the EOS command eventually.
3. Lightstreamer may have some bug that causes the EOS command to be dropped in some extreme situation - I can see this in the change log for version 3.6 build 1463 (I remind you, this version is still newer than what we had):
"Removed the limitations posed by <max_buffer_size> for events waiting to be sent to the client, if they belong to a COMMAND mode item snapshot.
Fixed a bug that affected items requested in COMMAND mode, with snapshot and unfiltered dispatching. In that case, the end-of-snapshot notification was not sent to the clients."
which indicates something was done around that area...
I actually cannot believe LS has a bug of that kind, since it is known to be the most stable part of our system :Smile_Ab: but still, I must consider that option as well.
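Suspect 1 could be checked by wrapping the EOS callback so that any exception it throws is logged instead of being lost. This is only a minimal sketch: `guarded` and its parameters are hypothetical names, not part of the Lightstreamer client API, and it assumes the callback is a plain function that you register yourself.

```javascript
// Hypothetical helper (not a Lightstreamer API): wraps a callback so that
// an exception thrown inside it is reported instead of silently swallowed.
function guarded(name, callback, onError) {
  return function () {
    try {
      // Preserve `this` and all arguments of the original call.
      return callback.apply(this, arguments);
    } catch (e) {
      if (typeof onError === "function") {
        onError(name, e);
      } else if (typeof console !== "undefined") {
        console.error("Exception in callback '" + name + "':", e);
      }
      // Returns undefined after reporting the error.
    }
  };
}
```

Registering `guarded("EOS", myEosHandler)` in place of `myEosHandler` (a hypothetical handler name) would show whether the callback is invoked and fails partway through, or is never invoked at all.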
How does the problem go away?
- Well, we don't actually know.
- Restarting LS, as far as I have seen, may make it go away for a while, but it returns after a few clients subscribe to the stream (not sure): some of them get EOS, the rest do not.
- Disconnecting some clients may have done the trick (restarting LS can do that...).
- I am not sure, but I think it may go away after 10-20 minutes if I do nothing; I did not confirm that yet.
- Restarting the data adapter service causes LS to restart, so I could not tell whether that alone would have solved it.
- Strangely enough, stopping the web server (IIS in our case) usually makes it go away, but I cannot imagine what that would have to do with it. Maybe it causes some clients to disconnect, as stated above.
It has happened only 3 times so far, on two different servers, but we cannot continue working on any server until we find the solution. We have seen it on Vivace and Moderato.
Since I cannot reproduce the problem, I cannot test many situations, and this is why I am consulting you experts.
Thanks,
Yoad.
Dario Crivelli
If you can observe, on the data flow coming out of LS Server, that no EOS event is sent before some DELETE or UPDATE event (and the item was subscribed to with the snapshot flag specified), then this would be a Server bug; certainly not a Data Adapter issue.
If you were able to reproduce the issue on one client while recording the Server output, we could help you analyze the output to ascertain whether or not the above happened.
To this purpose, you could either take a full network capture (if feasible) or take the Server log with the "LightstreamerLogger.pump" logger set at DEBUG level;
then you should identify the involved client by IP address and/or port numbers.
Note that with the latter method (i.e. taking the Server log), if no inconsistency were found, we would still be left with some doubts about the final writing phase.
For the moment, you can test the isSnapshot flag on the received updates and detect the first update which does not belong to the snapshot.
This would leave unmanaged only the case in which no update comes after the snapshot for a significantly long time.
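The workaround above could be sketched roughly as follows. This is a minimal sketch, not Lightstreamer client code: `makeSnapshotEndDetector` and its handler names are hypothetical, and it assumes your update callback can pass in the isSnapshot flag of each received update. The idle timeout is an added assumption, meant to cover the case in which no update follows the snapshot for a long time.

```javascript
// Hypothetical helper (not a Lightstreamer API): reports the end of the
// snapshot exactly once, triggered by whichever of these comes first:
//   - the regular EOS notification,
//   - the first update whose isSnapshot flag is false,
//   - idleMs milliseconds with no update at all (assumed fallback).
function makeSnapshotEndDetector(onSnapshotEnd, idleMs) {
  var done = false;
  var timer = null;

  function finish(reason) {
    if (done) return;            // never report twice
    done = true;
    if (timer !== null) clearTimeout(timer);
    onSnapshotEnd(reason);
  }

  function armIdleTimer() {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(function () { finish("idle-timeout"); }, idleMs);
  }

  armIdleTimer();

  return {
    // Call this from the normal EOS callback.
    onEndOfSnapshot: function () { finish("eos"); },
    // Call this for every received update, passing its isSnapshot flag.
    onUpdate: function (isSnapshot) {
      if (done) return;
      if (isSnapshot) {
        armIdleTimer();          // still inside the snapshot, keep waiting
      } else {
        finish("first-live-update");
      }
    }
  };
}
```

The `reason` argument is there only for diagnostics: logging it would show whether the real EOS notification or one of the fallbacks ended the snapshot phase on a given client.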
yoadsn
Hey,
We have found the cause of this problem - it was a JavaScript problem, and the EOS command did arrive as expected.
I always knew LS is flawless :Smile_Ab:
Thanks for all the help,
Yoad.