java.io.IOException: Too many open files


(Jürgen Blümer) #25

Here is a snap shot of what happened during a ‘Chrome Refresh’ on the PI3. The system is loosing the control of processes and IO

pi@MeteringPiRed:~/bin $ sudo ./startPerformanceMonitoring.sh
13:10:27 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  917767 0 38124 899040 13354 0 3210 0 0 0
13:10:37 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  917787 0 38139 903092 13356 0 3210 0 0 0
13:10:47 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  917979 0 38167 906956 13357 0 3210 0 0 0
13:10:58 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  917995 0 38181 911011 13357 0 3210 0 0 0
13:10:08 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  918351 0 38229 914761 13364 0 3215 0 0 0
13:10:18 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  918362 0 38257 918942 13364 0 3215 0 0 0
13:10:29 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  918405 0 38294 922963 13365 0 3217 0 0 0
13:10:39 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  918428 0 38323 927005 13367 0 3217 0 0 0
13:10:50 --- CLOSE_WAIT: 154 --- CPU_STAT: cpu  921434 0 38434 928022 13367 0 3226 0 0 0
13:10:00 --- CLOSE_WAIT: 316 --- CPU_STAT: cpu  924840 0 38534 928621 13367 0 3244 0 0 0
13:10:10 --- CLOSE_WAIT: 511 --- CPU_STAT: cpu  928567 0 38665 928999 13367 0 3264 0 0 0
13:10:21 --- CLOSE_WAIT: 657 --- CPU_STAT: cpu  932107 0 38779 929488 13367 0 3280 0 0 0
13:10:31 --- CLOSE_WAIT: 869 --- CPU_STAT: cpu  935709 0 38881 930004 13367 0 3300 0 0 0
13:10:42 --- CLOSE_WAIT: 934 --- CPU_STAT: cpu  939342 0 38980 930467 13367 0 3319 0 0 0
13:10:53 --- CLOSE_WAIT: 826 --- CPU_STAT: cpu  943116 0 39041 930885 13367 0 3320 0 0 0
13:10:03 --- CLOSE_WAIT: 721 --- CPU_STAT: cpu  946881 0 39144 931312 13367 0 3323 0 0 0
13:10:14 --- CLOSE_WAIT: 605 --- CPU_STAT: cpu  950653 0 39221 931733 13367 0 3325 0 0 0
13:10:25 --- CLOSE_WAIT: 493 --- CPU_STAT: cpu  954365 0 39293 932175 13368 0 3328 0 0 0
13:10:35 --- CLOSE_WAIT: 386 --- CPU_STAT: cpu  958126 0 39357 932601 13368 0 3330 0 0 0
13:10:46 --- CLOSE_WAIT: 283 --- CPU_STAT: cpu  961826 0 39418 933061 13368 0 3335 0 0 0
13:10:56 --- CLOSE_WAIT: 172 --- CPU_STAT: cpu  965586 0 39467 933498 13368 0 3336 0 0 0
13:10:07 --- CLOSE_WAIT: 79 --- CPU_STAT: cpu  969328 0 39549 933872 13368 0 3340 0 0 0
13:10:18 --- CLOSE_WAIT: 2 --- CPU_STAT: cpu  973039 0 39619 934281 13368 0 3343 0 0 0
13:10:28 --- CLOSE_WAIT: 206 --- CPU_STAT: cpu  976940 0 39720 934473 13368 0 3370 0 0 0
13:10:39 --- CLOSE_WAIT: 376 --- CPU_STAT: cpu  980638 0 39801 934909 13368 0 3385 0 0 0
13:10:49 --- CLOSE_WAIT: 594 --- CPU_STAT: cpu  984336 0 39899 935341 13368 0 3410 0 0 0
13:10:00 --- CLOSE_WAIT: 811 --- CPU_STAT: cpu  987988 0 40007 935814 13368 0 3431 0 0 0
13:10:11 --- CLOSE_WAIT: 1031 --- CPU_STAT: cpu  991954 0 40148 935983 13368 0 3457 0 0 0
13:10:21 --- CLOSE_WAIT: 1232 --- CPU_STAT: cpu  997565 0 40283 936520 13368 0 3493 0 0 0
13:10:37 --- CLOSE_WAIT: 1556 --- CPU_STAT: cpu  1001307 0 40402 936993 13368 0 3525 0 0 0
13:10:48 --- CLOSE_WAIT: 1769 --- CPU_STAT: cpu  1004952 0 40574 937417 13368 0 3574 0 0 0
13:10:59 --- CLOSE_WAIT: 1980 --- CPU_STAT: cpu  1008984 0 40739 937594 13369 0 3609 0 0 0
13:10:10 --- CLOSE_WAIT: 2198 --- CPU_STAT: cpu  1012765 0 40902 937995 13369 0 3644 0 0 0
13:10:21 --- CLOSE_WAIT: 2428 --- CPU_STAT: cpu  1016923 0 41053 938171 13369 0 3665 0 0 0
13:10:32 --- CLOSE_WAIT: 2662 --- CPU_STAT: cpu  1020760 0 41213 938628 13370 0 3686 0 0 0
13:10:44 --- CLOSE_WAIT: 2812 --- CPU_STAT: cpu  1024514 0 41353 939093 13370 0 3707 0 0 0
13:10:54 --- CLOSE_WAIT: 2703 --- CPU_STAT: cpu  1028728 0 41490 939249 13370 0 3717 0 0 0

(Martin van Wingerden) #26

Were you able to create the heap-dump using the user under which openHAB is running?


(Jürgen Blümer) #27

Here is the ‘complete’ dump

http://kulturnetz-drensteinfurt.de/Medien/heap_03.bin.zip


(Martin van Wingerden) #28

Yes that one works, it has 2751 connections ‘open’/

  1. can you enable some debug logging:

http://docs.openhab.org/administration/logging.html

  1. Can you check the Chrome developer tools whether the browser server really fired so many requests, especially the network tab is relevant, just check what requests are being fired at the server, most likely requests fail and are somehow retried and fail again, at least that is what I hope :slight_smile:

(Jürgen Blümer) #29

maybe a wireshark file for the traffic between http client and host?


(Martin van Wingerden) #30

Great


(Martin van Wingerden) #31

And if you also make a thread dump that might also give some details, if you are in time I might be able to took a look at them in train back home :slight_smile:


(Jürgen Blümer) #32

OK here they are.
Wireshark from two systems - linux kubuntu 16.4 32 bit and windows 7 professional 64 bit

http://kulturnetz-drensteinfurt.de/Medien/firefox_linux.pcap
http://kulturnetz-drensteinfurt.de/Medien/firefox_windows.pcap

Please apply as a filter in wireshark

ip.dst == 192.168.0.100 && http

You will see immediately:

  • under Linux: the refresh works perfectly - the complete data of the page is requested each time the refresh button is pressed.
  • under Windows: after the first refresh the Browser generates tons of chart requests and destroys the service

(Martin van Wingerden) #33

under Linux: the refresh works perfectly - the complete data of the page is requested each time the refresh button is pressed.

Does it render graphs correctly or are there no (correct) graphs? Because the logging shows:

2017-10-09 11:02:00.113 [WARN ] [thome.ui.internal.chart.ChartServlet] - Illegal argument in chart: No Persistence service found.
2017-10-09 11:02:00.086 [WARN ] [thome.ui.internal.chart.ChartServlet] - Illegal argument in chart: No Persistence service found.
2017-10-09 11:02:00.073 [WARN ] [thome.ui.internal.chart.ChartServlet] - Illegal argument in chart: No Persistence service found.

(Jürgen Blümer) #34

From my point of view this is a subsequent error.

The first refresh under windows crashes the service. No chart is refreshed.

Before I refresh, the view in Firefox / Chrome updates according to the configuration of the openhab service.


(Martin van Wingerden) #35

Can you try to resolve No Persistence service found. for example by installing RRD persistence addon

If that resolves it I might be able to setup a server at home which can reproduce this.


(Jürgen Blümer) #36

The ZeroW is a little bit slower in answering - but it works anyway:

pi@MeteringPiZeroW:~/bin $ sudo  ./startPerformanceMonitoring.sh
15:10:51 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  588714 0 54926 1229134 17284 0 10779 0 0 0
15:10:01 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  588976 0 54966 1229904 17288 0 10780 0 0 0
15:10:12 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  589324 0 55004 1230559 17291 0 10782 0 0 0
15:10:23 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  589352 0 55036 1231538 17291 0 10783 0 0 0
15:10:33 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  589987 0 55080 1231914 17301 0 10788 0 0 0
15:10:44 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  590166 0 55113 1232742 17303 0 10790 0 0 0
15:10:54 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  590229 0 55145 1233686 17304 0 10790 0 0 0

(Jürgen Blümer) #37

This service is already installed :wink:
That’s where the chart data is written to.


(Martin van Wingerden) #38

Then I fear that you are demanding too much of the poor Pi’s every update of data item will lead to a refresh of chart, it seems that it just cannot keep up the pace. I see 18 requests in first second. And it only seems to be increasing, maybe you have to render / refresh your charts in a different way.

If you close the browser does it then normalize after some time? Will the connections be closed?


(Jürgen Blümer) #39

It is obviously linked to the usage of a windows browser like firefox or chrome. As long as I use a browser from a linux system it seems to be OK. When I start the view from Winwows, the number of hanging connections increases immediately due to the high number of requests from the windows browser:

16:10:09 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2494333 0 93288 3263097 24973 0 11154 0 0 0
16:10:20 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2496464 0 93338 3264981 24977 0 11164 0 0 0
16:10:30 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2497074 0 93364 3268464 24984 0 11168 0 0 0
16:10:40 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2497183 0 93382 3272438 24994 0 11171 0 0 0
16:10:51 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2499070 0 93427 3274581 25004 0 11181 0 0 0
16:10:01 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2501079 0 93484 3276583 25021 0 11186 0 0 0
16:10:11 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2503287 0 93564 3278361 25031 0 11195 0 0 0
16:10:22 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2504508 0 93602 3281214 25041 0 11204 0 0 0
16:10:32 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2506503 0 93652 3283259 25051 0 11214 0 0 0
16:10:42 --- CLOSE_WAIT: 98 --- CPU_STAT: cpu  2509080 0 93745 3284745 25054 0 11234 0 0 0
16:10:53 --- CLOSE_WAIT: 359 --- CPU_STAT: cpu  2512920 0 93894 3284936 25054 0 11264 0 0 0
16:10:03 --- CLOSE_WAIT: 603 --- CPU_STAT: cpu  2516567 0 94047 3285297 25054 0 11288 0 0 0
16:10:14 --- CLOSE_WAIT: 876 --- CPU_STAT: cpu  2520228 0 94183 3285696 25054 0 11309 0 0 0
16:10:24 --- CLOSE_WAIT: 1042 --- CPU_STAT: cpu  2523920 0 94291 3286108 25054 0 11335 0 0 0
16:10:35 --- CLOSE_WAIT: 990 --- CPU_STAT: cpu  2529506 0 94405 3286617 25065 0 11354 0 0 0
16:10:51 --- CLOSE_WAIT: 925 --- CPU_STAT: cpu  2533408 0 94501 3286850 25093 0 11359 0 0 0
16:10:01 --- CLOSE_WAIT: 802 --- CPU_STAT: cpu  2537180 0 94610 3287274 25093 0 11363 0 0 0
16:10:12 --- CLOSE_WAIT: 671 --- CPU_STAT: cpu  2540926 0 94704 3287656 25103 0 11366 0 0 0
16:10:23 --- CLOSE_WAIT: 537 --- CPU_STAT: cpu  2544659 0 94771 3287933 25261 0 11368 0 0 0
16:10:33 --- CLOSE_WAIT: 415 --- CPU_STAT: cpu  2548419 0 94837 3288345 25261 0 11370 0 0 0
16:10:44 --- CLOSE_WAIT: 289 --- CPU_STAT: cpu  2552204 0 94902 3288770 25261 0 11370 0 0 0
16:10:55 --- CLOSE_WAIT: 152 --- CPU_STAT: cpu  2555920 0 94978 3289194 25275 0 11375 0 0 0
16:10:05 --- CLOSE_WAIT: 18 --- CPU_STAT: cpu  2559799 0 95082 3289386 25278 0 11387 0 0 0
16:10:16 --- CLOSE_WAIT: 1 --- CPU_STAT: cpu  2561755 0 95126 3290790 25942 0 11394 0 0 0
16:10:26 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562647 0 95164 3293971 25952 0 11401 0 0 0
16:10:36 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562664 0 95185 3298056 25955 0 11402 0 0 0
16:10:47 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562693 0 95200 3302193 25958 0 11402 0 0 0
16:10:57 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562713 0 95221 3306219 26036 0 11403 0 0 0
16:10:08 --- CLOSE_WAIT: 0 --- CPU_STAT: cpu  2562978 0 95252 3310016 26061 0 11403 0 0 0

As soon as I close the windows browser the connections are closed and the systems comes back to normal operation.
The Linux view is still working.

Maybe this is a topic for a new thread ‘hight amount of connections after refresh in windows web browser’?


(Martin van Wingerden) #40

Okay, that is great so the bug is mainly on the browser side. I fear that it is by design to refresh on updated items. I had similar but way smaller problems on my raspberry pi with my smart meter, I got so much data that it could not keep up anymore, but it did not come to this breaking point.

What you could maybe do is linking statically to the chart or switching to another charting solution like InfluxDB+Grafana persistence and graphing

@rlkoshak do you have any other suggestions.


(Rich Koshak) #41

That is what I would suggest. I think the important part is to link to the prerendered chart so all the work goes on in the Grafana server and the browser has just the one request and one image to download each refresh period.


(Jürgen Blümer) #42

The problem does not occur because of the refresh period. The service breaks down as soon as I refresh manually the view inside Chrome / Firefox. As long as the browser performs the automatic reload nothing strange happens. Everything seems to work.

Things change because of manual interaction of the user.

Anyway - tomorrow I will check the proposed combination of DB an chart service on my raspberries.


(Jürgen Blümer) #43

This morning I tested the ‘too many client connections’ problem with a 64 bit Windows 8.1 pro. I was not able to reproduce it. The Pi3 serves faster than the PiZeroW (surprise surprise), but I got a response on every refresh browser view action.

Only the Windows 7 pro System causes the trouble with the connections.


(Martin van Wingerden) #44

Interesting but bizarre…


Grafana stops working?