I see the first "Current delay" message at 10:32:08,036 in the log.
What did you notice before? The sockets in CLOSE_WAIT?
That is possible: the "Current delay" messages are logged only for delays longer than 10 seconds, but slightly shorter delays could still leave some sockets in CLOSE_WAIT for a few seconds
(this is more likely with some old Client SDK versions, as in your case).
Anyway, if the Server reports delays in the internal thread pool, this is actually where you should focus.
And, as said, it is probably the notifyUser invocations that clog the system.
The fact that the issue appears only in one of the two Server instances may or may not be significant.
I provided a possible explanation, assuming a different number of concurrent invocations of notifyUser in the two cases.
This has not been disproved.
There are two aspects:
- The number of JavaScript clients connecting concurrently may be higher than in the Java case.
Only a log for the Java case could clarify.
- The JavaScript client may be more aggressive in retrying after a delay, causing a short delay to grow.
This depends on the two SDK versions and it is quite likely.
You didn't report the version of the Java client SDK in use, but since the Server is 5.1.1, the version should be 2.5.
So, I confirm that the JavaScript clients perform automatic retries, whereas the Java clients don't (unless you implemented such retries in custom code).
So, for now, our suggestion is to focus on notifyUser, rather than on the differences between the two installations.
And, as said, you could gather important clues by taking a thread dump of the Server JVM after you see the first occurrences of the "Current delay" message; we can help you analyze the dump.
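A dump is normally taken from outside the process, for instance with the JDK's jstack tool against the Server process id. If that is not practical, the following is only a rough sketch of how a dump could be produced from code running inside the same JVM (for example from your own adapter code), using the standard java.lang.management API. The class and method names (ThreadDumpSketch, dumpAllThreads, countThreadsIn) are hypothetical and are not part of any Lightstreamer API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the Server or of the adapter interfaces.
class ThreadDumpSketch {

    /** Returns a textual dump of all live threads in the current JVM. */
    static String dumpAllThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false, false: skip locked monitor/synchronizer details to keep the dump cheap
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    /** Counts the threads that currently have a method with the given name on their stack. */
    static int countThreadsIn(String methodName) {
        int count = 0;
        for (StackTraceElement[] stack : Thread.getAllStackTraces().values()) {
            for (StackTraceElement frame : stack) {
                if (frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }
}
```

Calling countThreadsIn("notifyUser") when the first "Current delay" message shows up would give a quick indication of how many threads are inside notifyUser at that moment; the full text from dumpAllThreads() is what you would send us.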
If you find any bottleneck in notifyUser, you can make it more efficient.
Otherwise, we can revise the configuration of the thread pools.
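To check whether notifyUser is a bottleneck, one option is to measure how long each invocation takes and how many run at the same time. The following is a minimal sketch, assuming you can wrap the existing body of your notifyUser implementation in a call to a helper; CallTimer and its methods are hypothetical names introduced here for illustration only.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: tracks duration and concurrency of the wrapped calls.
class CallTimer {
    private final AtomicInteger inFlight = new AtomicInteger();
    private final AtomicInteger peakInFlight = new AtomicInteger();
    private final AtomicLong maxNanos = new AtomicLong();
    private final AtomicLong totalCalls = new AtomicLong();

    /** Runs the body, recording its duration and the concurrency level at entry. */
    <T> T time(Callable<T> body) throws Exception {
        int now = inFlight.incrementAndGet();
        updateMax(peakInFlight, now);
        long start = System.nanoTime();
        try {
            return body.call();
        } finally {
            // Recorded even when the body throws
            updateMax(maxNanos, System.nanoTime() - start);
            totalCalls.incrementAndGet();
            inFlight.decrementAndGet();
        }
    }

    int peakConcurrency() { return peakInFlight.get(); }
    long maxMillis() { return maxNanos.get() / 1000000L; }
    long calls() { return totalCalls.get(); }

    // Lock-free "store the maximum" using compare-and-set loops
    private static void updateMax(AtomicInteger target, int candidate) {
        int current;
        while (candidate > (current = target.get())) {
            if (target.compareAndSet(current, candidate)) break;
        }
    }

    private static void updateMax(AtomicLong target, long candidate) {
        long current;
        while (candidate > (current = target.get())) {
            if (target.compareAndSet(current, candidate)) break;
        }
    }
}
```

Logging peakConcurrency() and maxMillis() periodically on both installations would also show whether the number of concurrent notifyUser invocations really differs between the JavaScript and the Java case.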