Memory leak in FolderObserver?

The memory usage of the openHAB 2 process on my Raspberry Pi (openHABian) is constantly increasing. Once there is no memory left, the process is killed by the OOM killer. I’ve created a memory dump and analyzed it with the “Eclipse Memory Analyzer”. The leak suspects report flags “org.eclipse.xtext.util.OnChangeEvictingCache$CacheAdapter” as the leaking class. This class is used by the WatchQueueReader service in the FolderObserver. Is there a way to disable the FolderObserver so I could check whether this stops the memory leak?
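One way to experiment would be stopping the containing OSGi bundle from the Karaf console; a rough sketch (assuming FolderObserver lives in one of the model bundles - stopping that bundle would also stop loading of items/rules/things files from disk):

# open the openHAB console (default port 8101)
ssh -p 8101 openhab@localhost

# inside the console: find the bundle that contains the FolderObserver
bundle:list | grep -i model

# stop it by its numeric ID (first column); bundle:start <ID> reverts this
bundle:stop <ID>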

Here is a screenshot from the leak suspect report:

29 instances of that class do not really sound that bad.
Can you confirm that this number keeps increasing, even without any changes to the files in the file system?
If you have a screenshot showing 1000 instances while your memory is close to exhaustion, that would be a more convincing argument for a resource leak :slight_smile:

Hi Kai, sadly at the moment I only have a memory snapshot from an openHAB that has been running for a few days. It takes almost a month until the memory of my Raspberry Pi is filled and the process is killed. I will try to take a snapshot shortly before the memory is completely filled.

Hi Kai, I’ve monitored the RAM usage for some time now and I’m not sure if the way I analyzed the memory dump is correct. My openHAB instance is now at 711 MB total memory according to Webmin. (I think that includes the swap memory.) But if I acquire a memory dump, the file size of the dump is only 185 MB. Am I doing something wrong, or is it normal that the dump is smaller than the actual memory usage?

One week later and my OH process is at over 800 MB. Not long now before the OS will kill the process again. Any idea how to get a proper memory dump of the whole process?

I’d suggest trying out jmap to take the heap dump and jvisualvm for analysing it, see https://stackoverflow.com/questions/15130956/how-to-analyse-the-heap-dump-using-jmap-in-java.
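For reference, taking the dump could look roughly like this (a sketch; <PID> is the openHAB process ID, the file path is just an example, and it should run as the same user as the openHAB process - assumed here to be openhab):

# find the PID of the openHAB JVM
jps -l

# write a binary heap dump that can then be opened in jvisualvm or MAT
sudo -u openhab jmap -dump:format=b,file=/tmp/openhab-heap.hprof <PID>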

Thanks for the hint. I just used VisualVM and I also created a histogram with jmap. Here are the top classes from the histogram:

 num     #instances         #bytes  class name

1: 366331 29644944 [C
2: 56971 26951128 [I
3: 19534 10504840 [B
4: 4114 9599240 [Ljava.nio.ByteBuffer;
5: 84781 6815184 [Ljava.lang.Object;
6: 85741 5487424 java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask
7: 337647 5402352 java.lang.String
8: 165799 3979176 java.util.HashMap$Node
9: 38375 3132656 [Ljava.util.HashMap$Node;
10: 132490 2119840 java.util.concurrent.LinkedBlockingQueue$Node
11: 58670 1877440 java.util.LinkedHashMap$Entry
12: 19006 1843304 java.lang.Class
13: 74976 1799424 java.util.ArrayList
14: 4434 1667184 java.lang.Thread
15: 39517 1580680 java.util.HashMap
16: 37369 1494760 java.util.WeakHashMap$Entry
17: 49878 1327256 [Ljava.lang.String;
18: 8067 1161360 [[I
19: 35223 1127136 org.eclipse.emf.ecore.util.EObjectContainmentEList
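(For reference, a histogram like the one above comes from something like the following - a sketch, assuming openHAB runs as the openhab user. The bracketed class names [C, [I and [B stand for char[], int[] and byte[] arrays.)

# class histogram of the heap, sorted by total bytes; keep the top entries
sudo -u openhab jmap -histo <PID> | head -n 25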

Most of the space seems to be used by char arrays. (But I still don’t get why these 30 MB result in 800 MB of RAM usage.)

I also took a look at the strings using VisualVM. I am no expert in analyzing memory snapshots, but my interpretation would be that there are a lot of strings that are not referenced anymore but aren’t garbage collected either. Here is a screenshot of VisualVM:

But as I said, I’m no expert in analyzing memory snapshots. Would it be possible to upload a memory snapshot somewhere for you to take a look at?

I am no expert either and I cannot promise to find time soon for a deeper analysis myself :frowning:. Nonetheless, feel free to share the dump with me and I will see what I can do.

As you used MAT before, you should be able to load this dump into MAT as well - maybe it now gives you another “leak suspect” report based on it?

Ok thanks, I will send you the link via private message.

I have now also set up a JMX connection to the openHAB instance. This provides some additional tools to analyze the memory. For instance, I can now see which thread uses how much memory.

But I think most features for analyzing the memory would be available if I could run VisualVM directly on the hardware where openHAB runs. When debugging over JMX, the only tool to analyze the memory is the “Sampler”. But if the application runs locally, there is an additional module for “Profiling”.

I did a short test with some local Java processes and the “Profiling” tool, and I was able to right-click on the char type and display which classes are using the most char instances. This would be perfect for analyzing my memory problem. But sadly my openHAB runs on an openHABian without a GUI. Or is it maybe possible to enable an X GUI on openHABian?

Thanks. Having a brief look at it, there isn’t really anything worrying inside - merely 70 MB of live objects, the rest should be collectable through GC. Did you try to manually trigger a GC through jVisualVM? What heap size is shown immediately after this?
Regarding getting a GUI on openHABian, I cannot really give you advice. Maybe @ThomDietrich can comment. But having remote JMX access should be a good start as well. Afaik, jVisualVM can use a JMX remote connection, could you try that?
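In case it helps, the standard JVM properties for remote JMX look roughly like this (a sketch: the port is arbitrary, the exact place to set them on openHABian - e.g. EXTRA_JAVA_OPTS in /etc/default/openhab2 - is an assumption, and this variant disables authentication and SSL, so only use it on a trusted network):

# example properties for remote JMX (no auth/SSL - trusted LAN only!)
EXTRA_JAVA_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.local.only=false \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Djava.rmi.server.hostname=<IP-of-the-Pi>"

jVisualVM can then attach via “Add JMX Connection” using <IP-of-the-Pi>:9010.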

You’d have to keep in mind that there is a fundamental difference between the Java “heap” and the memory a JVM process occupies in the OS. Basically, what the OS reports is how much memory the JVM process has reserved from the OS. The JVM uses this memory for many purposes: running itself, running the GC, loading classes, calling native shared libraries, and also keeping Java objects in memory (in the heap).

Now, if you acquire a heap dump, only those portions of the heap are written out which actually contain Java objects. It would be a perfectly valid case that the JVM lets the heap grow (i.e. the space it has reserved for it at the OS level) but actually doesn’t need all of it, so it keeps it more or less sitting there, mostly empty. The OS of course doesn’t know this. But if you look at the numbers in VisualVM, you will see a huge heap of which only a fraction is used. The JVM is allowed to grow the heap up to the size given by the -Xmx parameter. If this amount is full and cannot be garbage-collected anymore (because of a Java memory “leak”), then it would start to fail, throwing OutOfMemory errors, and you would also see a huge heap dump. Which obviously is not the case here, right?

Of course there could be all kinds of other memory leaks, e.g. in some native shared library or within the JVM implementation itself. However, as you are using a standard openHABian image, this sounds pretty unlikely. I’d first try and check whether the JVM was allowed to grow beyond what is physically available.
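A quick way to check that, sketched below (assuming the java on the PATH is the same JVM openHAB uses):

# show the maximum heap size (in bytes) this JVM would allow itself by default
java -XX:+PrintFlagsFinal -version | grep -i maxheapsize

If that value comes out larger than what the Pi can realistically spare, an explicit -Xmx in the JVM options would bound it.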

Could you gather some more stats?
(<PID> is the process ID of openHAB, which either ps aux or jps will tell you)

# process statistics
cat /proc/<PID>/status

# physical memory situation
free -h

# garbage collector statistics
jstat -gccapacity <PID>
jstat -gcutil <PID>

# JVM arguments
jcmd <PID> VM.command_line
jcmd <PID> VM.flags

# some more internal details (not sure though if that works remotely)
jmap -heap <PID>

Hi Kai, I tried to trigger a garbage collection from the command line using this command:

jcmd <PID> GC.run

But that didn’t change anything.

Hi Simon, if my openHAB process lives long enough to use almost all memory on the RPi, it seems to be killed by the OS. (I found hints regarding this in the messages log.) I don’t remember finding a memory dump or OutOfMemory errors from Java the last time this happened.
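(For reference, the kernel’s OOM killer usually leaves a trace that can be checked roughly like this - a sketch, log file names vary between systems:)

# look for OOM-killer activity in the kernel ring buffer
dmesg | grep -i -E "oom|killed process"

# or in the persistent logs
grep -i "killed process" /var/log/messages /var/log/syslog 2>/dev/null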

I also ran your commands, but as I said earlier, I had to restart openHAB to enable the JMX debugging. So I’m currently back to 50% memory usage. But maybe the data helps anyway. Here is the output:

# process statistics
cat /proc/1161/status

Name:   java
Umask:  0022
State:  S (sleeping)
Tgid:   1161
Ngid:   0
Pid:    1161
PPid:   626
TracerPid:      0
Uid:    109     109     109     109
Gid:    114     114     114     114
FDSize: 512
Groups: 5 20 44 50 100 114 997 998
NStgid: 1161
NSpid:  1161
NSpgid: 626
NSsid:  626
VmPeak:   554580 kB
VmSize:   552044 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    371412 kB
VmRSS:    367576 kB
RssAnon:          359296 kB
RssFile:            8280 kB
RssShmem:              0 kB
VmData:   479668 kB
VmStk:       132 kB
VmExe:         4 kB
VmLib:      9456 kB
VmPTE:       494 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        222
SigQ:   0/7695
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000010000000
SigCgt: 2000000181005ccf
CapInh: 0000000000000000
CapPrm: 0000000000003400
CapEff: 0000000000003400
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
Seccomp:        0
Cpus_allowed:   f
Cpus_allowed_list:      0-3
Mems_allowed:   1
Mems_allowed_list:      0
voluntary_ctxt_switches:        1
nonvoluntary_ctxt_switches:     9

# physical memory situation
free -h

             total       used       free     shared    buffers     cached
Mem:          970M       867M       103M       3.0M       159M       227M
-/+ buffers/cache:       480M       490M
Swap:          99M        10M        89M


# garbage collector statistics
jstat -gccapacity 1161
 NGCMN    NGCMX     NGC     S0C   S1C       EC      OGCMN      OGCMX       OGC         OC       MCMN     MCMX      MC     CCSMN    CCSMX     CCSC    YGC    FGC
  5440.0  83264.0  76992.0 7680.0 7680.0  61632.0    10944.0   166592.0   152840.0   152840.0      0.0  60720.0  59824.0      0.0      0.0      0.0   3427     9

jstat -gcutil 1161
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  1.99   0.00  62.48  95.76  92.65      -   3428   71.883     9    7.379   79.262

# JVM arguments
jcmd 1161 VM.command_line
1161:
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:208)
        at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:147)
        at sun.tools.jcmd.JCmd.main(JCmd.java:131)

jcmd 1161 VM.flags
1161:
com.sun.tools.attach.AttachNotSupportedException: Unable to open socket file: target process not responding or HotSpot VM not loaded
        at sun.tools.attach.LinuxVirtualMachine.<init>(LinuxVirtualMachine.java:106)
        at sun.tools.attach.LinuxAttachProvider.attachVirtualMachine(LinuxAttachProvider.java:63)
        at com.sun.tools.attach.VirtualMachine.attach(VirtualMachine.java:208)
        at sun.tools.jcmd.JCmd.executeCommandForPid(JCmd.java:147)
        at sun.tools.jcmd.JCmd.main(JCmd.java:131)

# some more internal details (not sure though if that works remotely)
jmap -heap 1161
Usage:
    jmap -histo <pid>
      (to connect to running process and print histogram of java object heap
    jmap -dump:<dump-options> <pid>
      (to connect to running process and dump java heap)

    dump-options:
      format=b     binary default
      file=<file>  dump heap to <file>

    Example:       jmap -dump:format=b,file=heap.bin <pid>

The jcmd commands sadly did not work. What do I have to enable to get them working?
Also, my jmap command only seems to have options for histo and dump.
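(One common cause of this AttachNotSupportedException is that jcmd runs as a different OS user than the target JVM - the attach mechanism only works for the same user. A sketch, assuming openHAB runs as the openhab user:)

# run jcmd as the same user as the openHAB process
sudo -u openhab jcmd <PID> VM.flags
sudo -u openhab jcmd <PID> VM.command_line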

Okay, that’s not really helping for now. Let’s wait until it is almost critical again and then have another look.

Hey guys,
I did not go through the whole thread, but to answer the question regarding a GUI on openHABian: you just have to execute one command. Read about it here: http://docs.openhab.org/installation/openhabian.html#headache

Do I have to add something more to the JAVA_OPTS to get jcmd working? I would like to sort this out now, because changing it means restarting the process again.

Hi Kai, my OH2 instance has now been running for 10 days and memory usage is at 676.20 MB. The virtual memory of my Raspberry Pi is again at 99%, and physical memory is at 67% and increasing.

I now connected via VisualVM directly to the JMX port and found out that all three “ESH-thingHandler” threads are using a lot more memory than after the restart. Here is a screenshot shortly after the restart:

And here is the screenshot from now:

Running a manual garbage collection did not make any difference.

Any ideas what could cause this issue?

This looks pretty weird. The thingHandler threads are actually from a shared thread pool. I would expect that once the jobs submitted to the pool are done, all their memory would be released. Probably some binding is doing nasty things in its jobs then. Which bindings are you using? I see Z-Wave on your list, but what else do you run?

@sjka Any advice on how to get useful info out of @TheNetStriker’s system?

Hi Kai, I guess that only openHAB 2 bindings are relevant. These are the ones I’m using:

Kodi
Z-Wave
Systeminfo
Onkyo
Exec
Samsung TV
HdmiCEC (one of my own, for code please see here)
ZigBee (not used at the moment but installed)
Hue

So there are quite a lot of suspects. Any idea how to find out which one causes the problem? Would maybe stopping them release the memory?

That indeed looks like trouble within a binding’s thing handler. And as your Java heap was pretty small and looked unsuspicious last time, I would guess a binding is spilling native memory somehow (e.g. by not properly closing threads/processes/whatever…).
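A few low-tech checks that can make such a native leak visible over time (a sketch; run them repeatedly and compare the numbers):

# open file descriptors held by the openHAB process
sudo ls /proc/<PID>/fd | wc -l

# native thread count
grep Threads /proc/<PID>/status

# child processes spawned by openHAB (e.g. leftover exec calls)
ps --ppid <PID>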

As Kai already said: the thread returns to the pool as soon as the call into the binding returns. However, if some memory gets allocated during the execution within this thread pool, it also gets accounted to this thread. It doesn’t mean the thread holds the memory itself; it could be e.g. a member variable within the ThingHandler which keeps a process open or anything like that.

Any idea how to find out which one causes the problem?

With the limited tools on the RPi, I think the easiest way would be to exclude them from your setup one by one for a certain time, i.e. restart openHAB without a certain binding (or stop the binding right after the restart) and observe the system for a while. Now that you know what to look for, you don’t necessarily need to wait until the system breaks down - just long enough to see whether there is such a strong tendency…
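Stopping a binding without uninstalling it can be done from the Karaf console, roughly like this (a sketch):

# open the openHAB console (default port 8101)
ssh -p 8101 openhab@localhost

# inside the console: list the installed binding bundles
bundle:list | grep -i binding

# stop one by its numeric ID, observe the memory for a while, then start it again
bundle:stop <ID>
bundle:start <ID>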

Would maybe stopping them release the memory?

That really depends on how the memory is leaked, i.e. whether it is one or many Java objects that keep such references and whether these objects can be garbage collected or not. So until we know the exact mechanism by which this happens, it is hard to tell whether the memory will really be freed. Nevertheless, I’d say it’s worth a try: stop the bindings one after another and observe the memory - there is a chance that it helps and gets you a step further.