java.io.IOException: Too many open files

Here are the requested outputs:

http://kulturnetz-drensteinfurt.de/Medien/lsof_n.txt
http://kulturnetz-drensteinfurt.de/Medien/lsof_p.txt

I’m having a hard time opening them, and they also look really small. Can you verify that the process really uses only 10 MB?
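
In case it helps, a quick way to cross-check the memory footprint and open-file count might look like this (only a sketch; the process lookup by name is an assumption, adjust it to how openHAB is started on your box):

# Sketch: find the openHAB Java process, report its resident memory and open file descriptors
PID=$(pgrep -f openhab | head -n 1)
ps -o pid,rss,cmd -p "$PID"        # resident memory (RSS) in KB
ls /proc/"$PID"/fd | wc -l         # number of currently open file descriptors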

I got an exception while generating the heap dump:

Dumping heap to heap_03.bin ...
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at sun.tools.jmap.JMap.runTool(JMap.java:201)
        at sun.tools.jmap.JMap.main(JMap.java:130)
Caused by: java.lang.InternalError: Metadata does not appear to be polymorphic
        at sun.jvm.hotspot.types.basic.BasicTypeDataBase.findDynamicTypeForAddress(BasicTypeDataBase.java:278)
        at sun.jvm.hotspot.runtime.VirtualBaseConstructor.instantiateWrapperFor(VirtualBaseConstructor.java:102)
        at sun.jvm.hotspot.oops.Metadata.instantiateWrapperFor(Metadata.java:68)
        at sun.jvm.hotspot.memory.DictionaryEntry.klass(DictionaryEntry.java:71)
        at sun.jvm.hotspot.memory.Dictionary.classesDo(Dictionary.java:66)
        at sun.jvm.hotspot.memory.SystemDictionary.classesDo(SystemDictionary.java:190)
        at sun.jvm.hotspot.memory.SystemDictionary.allClassesDo(SystemDictionary.java:183)
        at sun.jvm.hotspot.utilities.HeapHprofBinWriter.writeClasses(HeapHprofBinWriter.java:954)
        at sun.jvm.hotspot.utilities.HeapHprofBinWriter.write(HeapHprofBinWriter.java:427)
        at sun.jvm.hotspot.tools.HeapDumper.run(HeapDumper.java:62)
        at sun.jvm.hotspot.tools.Tool.startInternal(Tool.java:260)
        at sun.jvm.hotspot.tools.Tool.start(Tool.java:223)
        at sun.jvm.hotspot.tools.Tool.execute(Tool.java:118)
        at sun.jvm.hotspot.tools.HeapDumper.main(HeapDumper.java:83)

First result on Google:

:slight_smile:

Check the user that runs openHAB vs. the user you are trying to create the dump with.
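
In other words, jmap usually has to run as the same user that owns the openHAB process. A minimal sketch, assuming openHAB runs as the user openhab (the user name, process lookup and output path are assumptions):

# Find the openHAB process and check which user owns it
PID=$(pgrep -f openhab | head -n 1)
ps -o user= -p "$PID"

# Create the heap dump as that same user instead of root
sudo -u openhab jmap -dump:format=b,file=/tmp/heap_03.bin "$PID"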

Here is a snapshot of what happened during a ‘Chrome refresh’ on the Pi 3. The system is losing control of processes and I/O:

pi@MeteringPiRed:~/bin $ sudo ./startPerformanceMonitoring.sh
13:10:27 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  917767 0 38124 899040 13354 0 3210 0 0 0
13:10:37 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  917787 0 38139 903092 13356 0 3210 0 0 0
13:10:47 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  917979 0 38167 906956 13357 0 3210 0 0 0
13:10:58 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  917995 0 38181 911011 13357 0 3210 0 0 0
13:10:08 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  918351 0 38229 914761 13364 0 3215 0 0 0
13:10:18 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  918362 0 38257 918942 13364 0 3215 0 0 0
13:10:29 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  918405 0 38294 922963 13365 0 3217 0 0 0
13:10:39 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  918428 0 38323 927005 13367 0 3217 0 0 0
13:10:50 --- CLOSE_WAIT: 154 --- CPU_STAT: cpu  921434 0 38434 928022 13367 0 3226 0 0 0
13:10:00 --- CLOSE_WAIT: 316 --- CPU_STAT: cpu  924840 0 38534 928621 13367 0 3244 0 0 0
13:10:10 --- CLOSE_WAIT: 511 --- CPU_STAT: cpu  928567 0 38665 928999 13367 0 3264 0 0 0
13:10:21 --- CLOSE_WAIT: 657 --- CPU_STAT: cpu  932107 0 38779 929488 13367 0 3280 0 0 0
13:10:31 --- CLOSE_WAIT: 869 --- CPU_STAT: cpu  935709 0 38881 930004 13367 0 3300 0 0 0
13:10:42 --- CLOSE_WAIT: 934 --- CPU_STAT: cpu  939342 0 38980 930467 13367 0 3319 0 0 0
13:10:53 --- CLOSE_WAIT: 826 --- CPU_STAT: cpu  943116 0 39041 930885 13367 0 3320 0 0 0
13:10:03 --- CLOSE_WAIT: 721 --- CPU_STAT: cpu  946881 0 39144 931312 13367 0 3323 0 0 0
13:10:14 --- CLOSE_WAIT: 605 --- CPU_STAT: cpu  950653 0 39221 931733 13367 0 3325 0 0 0
13:10:25 --- CLOSE_WAIT: 493 --- CPU_STAT: cpu  954365 0 39293 932175 13368 0 3328 0 0 0
13:10:35 --- CLOSE_WAIT: 386 --- CPU_STAT: cpu  958126 0 39357 932601 13368 0 3330 0 0 0
13:10:46 --- CLOSE_WAIT: 283 --- CPU_STAT: cpu  961826 0 39418 933061 13368 0 3335 0 0 0
13:10:56 --- CLOSE_WAIT: 172 --- CPU_STAT: cpu  965586 0 39467 933498 13368 0 3336 0 0 0
13:10:07 --- CLOSE_WAIT: 79 --- CPU_STAT: cpu  969328 0 39549 933872 13368 0 3340 0 0 0
13:10:18 --- CLOSE_WAIT: 2 --- CPU_STAT: cpu  973039 0 39619 934281 13368 0 3343 0 0 0
13:10:28 --- CLOSE_WAIT: 206 --- CPU_STAT: cpu  976940 0 39720 934473 13368 0 3370 0 0 0
13:10:39 --- CLOSE_WAIT: 376 --- CPU_STAT: cpu  980638 0 39801 934909 13368 0 3385 0 0 0
13:10:49 --- CLOSE_WAIT: 594 --- CPU_STAT: cpu  984336 0 39899 935341 13368 0 3410 0 0 0
13:10:00 --- CLOSE_WAIT: 811 --- CPU_STAT: cpu  987988 0 40007 935814 13368 0 3431 0 0 0
13:10:11 --- CLOSE_WAIT: 1031 --- CPU_STAT: cpu  991954 0 40148 935983 13368 0 3457 0 0 0
13:10:21 --- CLOSE_WAIT: 1232 --- CPU_STAT: cpu  997565 0 40283 936520 13368 0 3493 0 0 0
13:10:37 --- CLOSE_WAIT: 1556 --- CPU_STAT: cpu  1001307 0 40402 936993 13368 0 3525 0 0 0
13:10:48 --- CLOSE_WAIT: 1769 --- CPU_STAT: cpu  1004952 0 40574 937417 13368 0 3574 0 0 0
13:10:59 --- CLOSE_WAIT: 1980 --- CPU_STAT: cpu  1008984 0 40739 937594 13369 0 3609 0 0 0
13:10:10 --- CLOSE_WAIT: 2198 --- CPU_STAT: cpu  1012765 0 40902 937995 13369 0 3644 0 0 0
13:10:21 --- CLOSE_WAIT: 2428 --- CPU_STAT: cpu  1016923 0 41053 938171 13369 0 3665 0 0 0
13:10:32 --- CLOSE_WAIT: 2662 --- CPU_STAT: cpu  1020760 0 41213 938628 13370 0 3686 0 0 0
13:10:44 --- CLOSE_WAIT: 2812 --- CPU_STAT: cpu  1024514 0 41353 939093 13370 0 3707 0 0 0
13:10:54 --- CLOSE_WAIT: 2703 --- CPU_STAT: cpu  1028728 0 41490 939249 13370 0 3717 0 0 0
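
For anyone reading along, the kind of loop such a monitoring script might run looks roughly like this (only a sketch of the idea, not the actual startPerformanceMonitoring.sh):

#!/bin/bash
# Sketch: report CLOSE_WAIT sockets and the aggregate CPU counters every ~10 seconds
while true; do
    TS=$(date +%H:%M:%S)
    CW=$(netstat -tan | grep -c CLOSE_WAIT)   # TCP sockets stuck in CLOSE_WAIT
    CPU=$(grep '^cpu ' /proc/stat)            # aggregate CPU line from /proc/stat
    echo "$TS --- CLOSE_WAIT: $CW --- CPU_STAT: $CPU"
    sleep 10
done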

Were you able to create the heap dump using the user under which openHAB is running?

Here is the ‘complete’ dump

http://kulturnetz-drensteinfurt.de/Medien/heap_03.bin.zip

Yes, that one works; it has 2751 connections ‘open’.

  1. Can you enable some debug logging (see the example after this list):

http://docs.openhab.org/administration/logging.html

  2. Can you check in the Chrome developer tools whether the browser really fired that many requests? The Network tab is the relevant one; just check which requests are being fired at the server. Most likely some requests fail, get retried somehow, and fail again, at least that is what I hope :slight_smile:
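
For point 1, raising the log level can be done from the openHAB (Karaf) console, roughly like this; the logger name below is only an assumption, the linked docs list the relevant ones:

# Open the openHAB console (openHAB 2 defaults shown; see the logging docs for credentials)
ssh -p 8101 openhab@localhost

# Inside the console: raise a logger to DEBUG and follow the log
log:set DEBUG org.eclipse.smarthome.ui     # logger name is an assumption, pick the one you need
log:tail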

Maybe also a Wireshark capture of the traffic between the HTTP client and the host?

Great

And if you also make a thread dump, that might give some more details. If you are quick enough, I might be able to take a look at them on the train back home :slight_smile:
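
A thread dump could be taken with jstack, again run as the user that owns the openHAB process (user name, process lookup and output path are assumptions):

# Write a thread dump of the openHAB process to a file
PID=$(pgrep -f openhab | head -n 1)
sudo -u openhab jstack -l "$PID" > /tmp/threaddump_$(date +%s).txt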

OK, here they are.
Wireshark captures from two systems: Linux Kubuntu 16.04 32-bit and Windows 7 Professional 64-bit.

http://kulturnetz-drensteinfurt.de/Medien/firefox_linux.pcap
http://kulturnetz-drensteinfurt.de/Medien/firefox_windows.pcap

Please apply this as a filter in Wireshark:

ip.dst == 192.168.0.100 && http

You will see immediately:

  • under Linux: the refresh works perfectly; the complete data of the page is requested each time the refresh button is pressed.
  • under Windows: after the first refresh the browser generates tons of chart requests and brings the service down.

under Linux: the refresh works perfectly; the complete data of the page is requested each time the refresh button is pressed.

Does it render the graphs correctly, or are there no (correct) graphs? Because the log shows:

2017-10-09 11:02:00.113 [WARN ] [thome.ui.internal.chart.ChartServlet] - Illegal argument in chart: No Persistence service found.
2017-10-09 11:02:00.086 [WARN ] [thome.ui.internal.chart.ChartServlet] - Illegal argument in chart: No Persistence service found.
2017-10-09 11:02:00.073 [WARN ] [thome.ui.internal.chart.ChartServlet] - Illegal argument in chart: No Persistence service found.

From my point of view this is a follow-on error.

The first refresh under Windows crashes the service. No chart is refreshed.

Before I refresh, the view in Firefox/Chrome updates according to the configuration of the openHAB service.

Can you try to resolve “No Persistence service found.”, for example by installing the RRD persistence add-on?

If that resolves it, I might be able to set up a server at home that can reproduce this.
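
For reference, enabling it could look roughly like this on openHAB 2 (file locations are the defaults; the strategy below is only an example, adjust it to your items):

In conf/services/addons.cfg (or install rrd4j via the Paper UI):

persistence = rrd4j

In conf/persistence/rrd4j.persist (example strategy):

Strategies {
    everyMinute : "0 * * * * ?"
    default = everyChange
}
Items {
    * : strategy = everyMinute, restoreOnStartup
}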

The Zero W is a little bit slower to answer, but it works anyway:

pi@MeteringPiZeroW:~/bin $ sudo  ./startPerformanceMonitoring.sh
15:10:51 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  588714 0 54926 1229134 17284 0 10779 0 0 0
15:10:01 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  588976 0 54966 1229904 17288 0 10780 0 0 0
15:10:12 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  589324 0 55004 1230559 17291 0 10782 0 0 0
15:10:23 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  589352 0 55036 1231538 17291 0 10783 0 0 0
15:10:33 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  589987 0 55080 1231914 17301 0 10788 0 0 0
15:10:44 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  590166 0 55113 1232742 17303 0 10790 0 0 0
15:10:54 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  590229 0 55145 1233686 17304 0 10790 0 0 0

That service is already installed :wink:
That’s where the chart data is written.

Then I fear that you are demanding too much of the poor Pi: every update of a data item will lead to a refresh of the chart, and it seems it just cannot keep up the pace. I see 18 requests in the first second, and it only seems to be increasing. Maybe you have to render/refresh your charts in a different way (see the sketch below).
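
One possible way to refresh differently, assuming the charts come from a sitemap, is to give the Chart element an explicit, longer refresh interval; the item name and period below are placeholders:

// Sitemap sketch: refresh this chart at most once per minute (refresh is in milliseconds)
Chart item=Power_Consumption period=D refresh=60000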

If you close the browser, does it then normalize after some time? Are the connections closed?

It is obviously linked to the use of a Windows browser like Firefox or Chrome. As long as I use a browser on a Linux system, it seems to be OK. When I open the view from Windows, the number of hanging connections increases immediately due to the high number of requests from the Windows browser:

16:10:09 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2494333 0 93288 3263097 24973 0 11154 0 0 0
16:10:20 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2496464 0 93338 3264981 24977 0 11164 0 0 0
16:10:30 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2497074 0 93364 3268464 24984 0 11168 0 0 0
16:10:40 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2497183 0 93382 3272438 24994 0 11171 0 0 0
16:10:51 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2499070 0 93427 3274581 25004 0 11181 0 0 0
16:10:01 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2501079 0 93484 3276583 25021 0 11186 0 0 0
16:10:11 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2503287 0 93564 3278361 25031 0 11195 0 0 0
16:10:22 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2504508 0 93602 3281214 25041 0 11204 0 0 0
16:10:32 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2506503 0 93652 3283259 25051 0 11214 0 0 0
16:10:42 --- CLOSE_WAIT: 98 --- CPU_STAT: cpu  2509080 0 93745 3284745 25054 0 11234 0 0 0
16:10:53 --- CLOSE_WAIT: 359 --- CPU_STAT: cpu  2512920 0 93894 3284936 25054 0 11264 0 0 0
16:10:03 --- CLOSE_WAIT: 603 --- CPU_STAT: cpu  2516567 0 94047 3285297 25054 0 11288 0 0 0
16:10:14 --- CLOSE_WAIT: 876 --- CPU_STAT: cpu  2520228 0 94183 3285696 25054 0 11309 0 0 0
16:10:24 --- CLOSE_WAIT: 1042 --- CPU_STAT: cpu  2523920 0 94291 3286108 25054 0 11335 0 0 0
16:10:35 --- CLOSE_WAIT: 990 --- CPU_STAT: cpu  2529506 0 94405 3286617 25065 0 11354 0 0 0
16:10:51 --- CLOSE_WAIT: 925 --- CPU_STAT: cpu  2533408 0 94501 3286850 25093 0 11359 0 0 0
16:10:01 --- CLOSE_WAIT: 802 --- CPU_STAT: cpu  2537180 0 94610 3287274 25093 0 11363 0 0 0
16:10:12 --- CLOSE_WAIT: 671 --- CPU_STAT: cpu  2540926 0 94704 3287656 25103 0 11366 0 0 0
16:10:23 --- CLOSE_WAIT: 537 --- CPU_STAT: cpu  2544659 0 94771 3287933 25261 0 11368 0 0 0
16:10:33 --- CLOSE_WAIT: 415 --- CPU_STAT: cpu  2548419 0 94837 3288345 25261 0 11370 0 0 0
16:10:44 --- CLOSE_WAIT: 289 --- CPU_STAT: cpu  2552204 0 94902 3288770 25261 0 11370 0 0 0
16:10:55 --- CLOSE_WAIT: 152 --- CPU_STAT: cpu  2555920 0 94978 3289194 25275 0 11375 0 0 0
16:10:05 --- CLOSE_WAIT: 18 --- CPU_STAT: cpu  2559799 0 95082 3289386 25278 0 11387 0 0 0
16:10:16 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2561755 0 95126 3290790 25942 0 11394 0 0 0
16:10:26 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562647 0 95164 3293971 25952 0 11401 0 0 0
16:10:36 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562664 0 95185 3298056 25955 0 11402 0 0 0
16:10:47 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562693 0 95200 3302193 25958 0 11402 0 0 0
16:10:57 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562713 0 95221 3306219 26036 0 11403 0 0 0
16:10:08 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562978 0 95252 3310016 26061 0 11403 0 0 0

As soon as I close the Windows browser, the connections are closed and the system comes back to normal operation.
The Linux view keeps working.

Maybe this is a topic for a new thread, ‘high number of connections after refresh in Windows web browser’?

Okay, that is great, so the bug is mainly on the browser side. I fear that refreshing on updated items is by design. I had similar but much smaller problems on my Raspberry Pi with my smart meter: I got so much data that it could not keep up anymore, but it never came to this breaking point.

What you could maybe do is link statically to the chart, or switch to another charting solution like InfluxDB + Grafana for persistence and graphing (a sketch follows below).
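
A sketch of the static-link idea, embedding the chart servlet output or a rendered Grafana panel directly in the sitemap; all URLs, item names and panel IDs are placeholders:

// Link to the chart servlet directly via an Image element instead of a Chart element
Image url="http://192.168.0.100:8080/chart?items=Power_Consumption&period=D" refresh=60000

// Or embed a rendered Grafana panel (InfluxDB + Grafana setup assumed)
Webview url="http://192.168.0.100:3000/dashboard-solo/db/power?panelId=1" height=10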

@rlkoshak, do you have any other suggestions?