johnffoley
Hi,
We have 5 instances of Lightstreamer running on a server (each using a different network interface).
During performance testing we have started receiving a "no more processes" error, which locks up the server.
I increased the number of file descriptors and reran the test, but this was not the cause:
descriptors 131072
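For reference, this is roughly how we can confirm at runtime that descriptors are not the bottleneck. A minimal sketch only, relying on the com.sun.management extension, which may not be present on every JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Prints the current and maximum file descriptor counts of this JVM process.
// Relies on the com.sun.management extension (Sun/Oracle JVM on Unix).
public class FdCheck {
    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unixOs =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            System.out.println("Open file descriptors: " + unixOs.getOpenFileDescriptorCount());
            System.out.println("Max file descriptors:  " + unixOs.getMaxFileDescriptorCount());
        } else {
            System.out.println("UnixOperatingSystemMXBean not available on this JVM");
        }
    }
}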
Memory and CPU on the server are fine, and we can replicate the problem outside of market hours, which suggests the issue is not update-driven.
We can connect 5K users in total when smaller numbers of symbols are requested, but when we increased the number of symbols per connection (20 securities and 10 news stories) we only reached 2K users in total across the 5 instances before encountering this issue.
Have you seen this problem before? If not, could you offer any advice as to where we should look next for the root cause?
Here is an example monitor text log entry close to breaking point:
13:23:14 < INFO> LightstreamerMonitorText a - Total threads = 5873, Total memory = 448135168 (free = 295004696), Connections = 405 (max = 1131), New connections = [+0, -0], Active sessions = 405, New sessions = [+0, -0], In pool threads = 148, Active threads = 0, Inactive threads= 148, Queued events = 0, NIO write queue = 0, NIO write queue wait = 0, NIO active write selectors = 64, NIO total selectors = 192, Subscribed items = 6123, New events per sec. = 386, Filtered events per sec. = 386, Bytes sent = 45048993, Bandwidth in use = 536.07 (max = 638.17, allocated = 0), Bandwidth per client = 1.32, Extra sleep = 0, Notify delay = 0, Delay evaluation = 0
Here is some output from /proc/[PID]/status:
Name: java
State: S (sleeping)
SleepAVG: 29%
Tgid: 30280
Pid: 30280
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 2048
Groups: 8002
VmPeak: 7868344 kB
VmSize: 7832352 kB
VmLck: 0 kB
VmHWM: 807816 kB
VmRSS: 794932 kB
VmData: 7732844 kB
VmStk: 88 kB
VmExe: 36 kB
VmLib: 69968 kB
VmPTE: 14688 kB
StaBrk: 4958a000 kB
Brk: 49a99000 kB
StaStk: 7fffe7e18310 kB
Threads: 6541
SigQ: 0/1064959
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000002
SigCgt: 1000000181005ccd
CapInh: 0000000000000000
CapPrm: 00000000fffffeff
CapEff: 00000000fffffeff
Cpus_allowed: 00000000,00000000,00000000,00000000,00000000,0000ffff,ffffffff,ffffffff
Mems_allowed: 00000000,0000000f
Alessandro Alinone
Hi John,
The log is very clear. There is evidently a big issue with your Data Adapter.
I see that the total threads are between 5800 and 6500. Consider that in your case Lightstreamer Server creates fewer than 400 threads (I guess your machine has 64 cores, right?). This means that your Data Adapter creates more than 6000 threads! On Linux each thread also counts against the per-user process limit, which is most likely what triggers the "no more processes" error you are seeing.
You might want to take a thread dump to see the identity of all those threads and confirm that the problem lies in the Adapter.
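For example, jstack <pid> against the running JVM will list every thread by name. As an alternative, here is a minimal sketch (not Lightstreamer-specific, and the prefix-grouping heuristic is just an illustration) that counts live threads by name prefix from inside the JVM, e.g. hooked into a diagnostics hook of your Adapter:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Groups all live threads by the leading part of their name, so that a few
// thousand "SomeAdapterWorker-1234" style threads show up as one big bucket.
public class ThreadCensus {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        ThreadInfo[] infos = threads.dumpAllThreads(false, false);
        Map<String, Integer> byPrefix = new TreeMap<String, Integer>();
        for (ThreadInfo info : infos) {
            // Strip trailing digits and separators to collapse numbered pool threads.
            String prefix = info.getThreadName().replaceAll("[-#]?\\d+$", "");
            Integer count = byPrefix.get(prefix);
            byPrefix.put(prefix, count == null ? 1 : count + 1);
        }
        System.out.println("Total live threads: " + threads.getThreadCount());
        for (Map.Entry<String, Integer> entry : byPrefix.entrySet()) {
            System.out.println(entry.getValue() + "\t" + entry.getKey());
        }
    }
}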
I also see huge memory consumption: roughly 8 GB of virtual memory (VmSize) used by the JVM process, while only about 450 MB are allocated to the Java heap. So it seems that your Data Adapter creates a lot of threads and consumes a lot of memory outside the heap; note that each thread's stack is allocated off-heap (typically around 1 MB by default), so thousands of threads alone can account for several GB.
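As a rough cross-check, here is a minimal sketch comparing heap usage with an estimate of thread-stack memory; the 1 MB per-thread stack size is only an assumption, the real value is whatever -Xss is set to:

import java.lang.management.ManagementFactory;

// Rough comparison of heap usage vs. an estimate of thread-stack memory,
// assuming a default 1 MB stack per thread (override via -Xss).
public class MemorySplit {
    private static final long ASSUMED_STACK_BYTES = 1024L * 1024L; // assumption, check -Xss

    public static void main(String[] args) {
        long heapUsed = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed();
        int threadCount = ManagementFactory.getThreadMXBean().getThreadCount();
        long estimatedStacks = threadCount * ASSUMED_STACK_BYTES;
        System.out.println("Heap used (MB):              " + heapUsed / (1024 * 1024));
        System.out.println("Live threads:                " + threadCount);
        System.out.println("Estimated stack memory (MB): " + estimatedStacks / (1024 * 1024));
    }
}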
Furthermore, all the indicators of Lightstreamer Server performance are great. There are no queued events, the thread pools are completely idle, no delays are reported.
From a general perspective, I confirm that the traffic is so light that a monster machine like the one you are using should probably handle millions of users, not hundreds...
I hope that helps in nailing down your issue.
Cheers,
Alessandro