Duplicate Data InfluxDB

Hello,

I write some weather data from my Netatmo rain sensor to my InfluxDB.
I set up the persistence rule to write with the everyDay strategy only.

everyDay     : "0 0 0 * * ?"

Netatmo_Rain_24hours : strategy = everyDay, restoreOnStartup
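For completeness, here is a sketch of my persistence config (the file name and block layout are assumed to follow the standard `.persist` format):

```
Strategies {
    everyDay : "0 0 0 * * ?"
}

Items {
    Netatmo_Rain_24hours : strategy = everyDay, restoreOnStartup
}
```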

When I retrieve the data, I was wondering why the value is always double the original.

When I checked the InfluxDB table with Grafana, I saw that the same value is always stored twice.

|2020-03-07 00:00:00|7.30|
|2020-03-06 23:59:00|7.30|

Any clue what I did wrong?

Many thanks :slight_smile:
Stefan

Those are 1 minute apart, on different days.

Yes, I know.
But this doesn’t make sense, as the strategy is to save only once per day at midnight.
Why is the same value always stored at 23:59 and again one minute later at 0:00?

Perhaps it calculated daylight savings time at midnight instead of 2 am.

That’s really interesting, isn’t it? :crazy_face:
Out of curiosity, can you show us a bit more with entries actually on the same date?
It feels like some artefact of the clock rolling over at midnight.

I’d suggest a possible workaround of setting everyDay to 00:01.

Hello @rossko57 - thanks for your reply.

I took some screenshots of the item in Grafana.

It seems the issue only started to occur on 16 Feb 2020. Earlier records are correct, as you can see in the screenshot below. I removed ‘0’ values in this output:

When I fill the empty records with ‘0’, it looks good for the old records (before Feb 16) - the entry is stored in the database only at 0:00:00.

When I check recent entries, I see they are duplicated at 23:59:00 and 00:00:00; everything else is ‘0’.

As you can see, the values are always duplicated:

I will change the strategy for this item to

0 23 59 * * ?

I suppose this should store only at 23:59:00:

I’ll let you know whether this solves the problem.

BTW: Is it expected to see the ‘0’ values every minute, even though the record is only stored once per day?
Is this just Grafana behavior?
I would expect to see entries only every 24 hours, not every minute.

Many thanks
Stefan

Edit:
I’m testing this setup now - at 23:58:00, in order to identify the change more easily :slight_smile:

 everyDayAt2358 : "0 23 58 * * ?" 
 
 Netatmo_Rain_24hours : strategy = everyDayAt2358, restoreOnStartup

Since I changed the strategy, nothing is logged anymore :frowning:
Is the name I used invalid?

Found the problem in my strategy.

Should be

everyDayAt2358 : "0 58 23 * * ?"

I suppose.
I mixed up minutes and hours :frowning:
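
For reference, openHAB persistence strategies use Quartz cron expressions, which order the fields seconds-first:

```
second  minute  hour  day-of-month  month  day-of-week
  0       58     23        *          *         ?        (every day at 23:58:00)
```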

1 Like

The problem still persists, even with the updated cron time.

The value is still written twice - at 23:58:00 as per the strategy, but also one minute earlier at 23:57:00.

Any clue what could suddenly trigger this behavior?
Is it possible to trace/debug this somehow?

Thanks for your help.

Stefan

Small suggestion: query this directly in InfluxDB on the command line to see what is stored there. You are checking through Grafana, which should presumably show the same data, but it would be good to look at InfluxDB directly to rule out any issue in Grafana.
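For example, something along these lines (the database name `openhab_db` is an assumption - check your persistence config; openHAB’s InfluxDB persistence stores each item as its own measurement):

```
influx -database openhab_db -precision rfc3339 \
       -execute 'SELECT * FROM "Netatmo_Rain_24hours" ORDER BY time DESC LIMIT 10'
```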

Also, every InfluxDB write is likely logged in /var/log/syslog (assuming you are using a typical Linux platform). See if the extraneous write is logged there.

1 Like

You can do this with the openHAB REST API as well.

Hello @jswim788, @rossko57,

I only used Grafana to verify the data.
I found that the 24-hour value I retrieved in openHAB for the item
Netatmo_Rain_24hours was always double the value of the item Netatmo_Rain_1hour over the same period of time.

The line below gets the correct value. The strategy for this item is to write the data every hour.
rain_last_24hrs = Netatmo_Rain_1hour.sumSince(now.minusHours(24))

When I retrieve the data for the 24-hour item with the openHAB REST API, I get the following 2 datapoints. I think this confirms the data from Grafana.

{
  "name": "Netatmo_Rain_24hours",
  "datapoints": "2",
  "data": [
    {
      "time": 1584053877967,
      "state": "3.299999952316284"
    },
    {
      "time": 1584053880967,
      "state": "3.299999952316284"
    }
  ]
}
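
Just to make the pattern explicit, here is a small check (a hypothetical helper, not part of openHAB) that flags consecutive datapoints with an identical state recorded within a minute of each other:

```python
from datetime import datetime, timezone

# Datapoints as returned by the openHAB REST API (times are epoch milliseconds)
data = [
    {"time": 1584053877967, "state": "3.299999952316284"},
    {"time": 1584053880967, "state": "3.299999952316284"},
]

def find_near_duplicates(points, max_gap_ms=60_000):
    """Return pairs of consecutive points that share the same state
    and were recorded less than max_gap_ms apart."""
    dups = []
    for a, b in zip(points, points[1:]):
        if a["state"] == b["state"] and (b["time"] - a["time"]) < max_gap_ms:
            dups.append((a, b))
    return dups

dups = find_near_duplicates(data)
for a, b in dups:
    gap_s = (b["time"] - a["time"]) / 1000
    when = datetime.fromtimestamp(a["time"] / 1000, tz=timezone.utc)
    print(f"duplicate state {a['state']} at {when.isoformat()} (+{gap_s:.0f}s)")
```

The two points above come out as a duplicate pair only three seconds apart.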

The following is the statistic with 1-second steps.

Below are the records of the 1-hour item:

{
  "name": "Netatmo_Rain_1hour",
  "datapoints": "26",
  "data": [
    {
      "time": 1583967600533,
      "state": "0.0"
    },
    {
      "time": 1583971200580,
      "state": "0.0"
    },
    {
      "time": 1583974800581,
      "state": "0.0"
    },
    {
      "time": 1583978400581,
      "state": "0.0"
    },
    {
      "time": 1583982000581,
      "state": "0.11100000143051147"
    },
    {
      "time": 1583985599960,
      "state": "0.0"
    },
    {
      "time": 1583985600961,
      "state": "0.0"
    },
    {
      "time": 1583989200961,
      "state": "0.0"
    },
    {
      "time": 1583992800961,
      "state": "0.11100000143051147"
    },
    {
      "time": 1583996400961,
      "state": "0.3330000042915344"
    },
    {
      "time": 1584000000961,
      "state": "0.11100000143051147"
    },
    {
      "time": 1584003600963,
      "state": "2.3239998817443848"
    },
    {
      "time": 1584007200964,
      "state": "0.0"
    },
    {
      "time": 1584010800964,
      "state": "0.0"
    },
    {
      "time": 1584014400964,
      "state": "0.0"
    },
    {
      "time": 1584018000965,
      "state": "0.0"
    },
    {
      "time": 1584021599824,
      "state": "0.0"
    },
    {
      "time": 1584021600824,
      "state": "0.0"
    },
    {
      "time": 1584025200026,
      "state": "0.0"
    },
    {
      "time": 1584028800026,
      "state": "0.0"
    },
    {
      "time": 1584032400026,
      "state": "0.0"
    },
    {
      "time": 1584036000027,
      "state": "0.0"
    },
    {
      "time": 1584039600027,
      "state": "0.0"
    },
    {
      "time": 1584043200027,
      "state": "0.0"
    },
    {
      "time": 1584046800028,
      "state": "0.0"
    },
    {
      "time": 1584050400028,
      "state": "0.0"
    }
  ]
}

I’m still struggling with this issue. I renamed the item for testing but still see duplicates in the records, with a 1-second time difference.
Today I discovered this also happens randomly for the 1-hour item statistic. (see picture)
I really have no clue what triggers this strange behavior, nor how I can debug it.
Both openHAB and InfluxDB are running in Docker containers.

Many thanks for your support :slight_smile:

Stefan

If you are running on Linux, every time there is a write to the InfluxDB, there is a corresponding entry in the /var/log/syslog (assuming that hasn’t been turned off). Look there at these times, and look in your OH events.log. See if anything odd is happening then. Maybe you can post a snip of these 2 log files around the time one of these happens.

I pondered what it would look like if you somehow had two instances of persistence add-on running.

Still would make no sense - with a persist config for 23:58, records for
Thursday, March 12, 2020 10:57:57.967 PM
Thursday, March 12, 2020 10:58:00.967 PM
where would the early record come from?
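
As a quick Python sanity check, rendering those epoch-millisecond timestamps from the REST output in UTC shows they are only three seconds apart:

```python
from datetime import datetime, timezone

# The two epoch-millisecond timestamps from the REST output above
for ms in (1584053877967, 1584053880967):
    print(datetime.fromtimestamp(ms / 1000, tz=timezone.utc).isoformat(timespec="milliseconds"))
# → 2020-03-12T22:57:57.967+00:00
# → 2020-03-12T22:58:00.967+00:00
```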

For confirmation, influx is running on same box as OH? i.e. should have same system time.
From above, we can see there is a timezone thing at work; OH and influx are on different timezones, but the persistence service can surely sort that out.

The “early” records are a huge clue, but I don’t know what to make of them :crazy_face:

I’m running everything on the same QNAP NAS, inside Docker containers.
The clock shouldn’t be the issue, I guess?! What is the best way to check this? The difference between the two entries is only 1 second - not sure how to verify this.

I suppose the best approach is to trace this in openHAB and InfluxDB.
Can I TRACE the entries for the persistence service somehow?
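
I assume something like this from the Karaf console would do it (the logger name is my guess for the InfluxDB persistence add-on - adjust if your version differs):

```
openhab> log:set TRACE org.openhab.persistence.influxdb
openhab> log:tail
```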

[~] # docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
9096faea0541        mthenw/frontail          "/frontail/docker-..."   45 hours ago        Up 45 hours         0.0.0.0:9001->9001/tcp                           frontail_openhab
fc94f5b11a4a        openhab/openhab:2.5.4    "/entrypoint.sh go..."   2 days ago          Up 2 days                                                            openhab_250
1c3c374b87ac        influxdb:latest          "/entrypoint.sh -c..."   4 weeks ago         Up 2 days           0.0.0.0:8083->8083/tcp, 0.0.0.0:8086->8086/tcp   influxdb
d6a180481a30        grafana/grafana:latest   "/run.sh"                4 months ago        Up 2 days           0.0.0.0:3000->3000/tcp                           grafana
[~] #

I was trying to get the log file from inside the InfluxDB container, but I cannot find it - seems that doesn’t work well :frowning:

root@1c3c374b87ac:/var/log# ls -l
total 356
-rw-r--r-- 1 root     root        326 Sep 12  2019 alternatives.log
drwxr-xr-x 1 root     root       4096 Sep 12  2019 apt
-rw-rw---- 1 root     utmp          0 Sep 10  2019 btmp
-rw-r--r-- 1 root     root      24428 Sep 13  2019 dpkg.log
-rw-r--r-- 1 root     root      32000 Sep 13  2019 faillog
drwxr-xr-x 2 influxdb influxdb   4096 Aug 21  2019 influxdb
-rw-rw-r-- 1 root     utmp     292000 Sep 13  2019 lastlog
-rw-rw-r-- 1 root     utmp          0 Sep 10  2019 wtmp
root@1c3c374b87ac:/var/log#  
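
If I understand Docker correctly, the official images log to stdout/stderr rather than to a file inside the container, so the host-side `docker logs` command should be the place to look (container names taken from the `docker ps` output above):

```
docker logs --since 24h influxdb
docker logs --since 24h openhab_250
```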

This is how I start InfluxDB:

#!/bin/sh

INFLUX_DIR_DB=$(readlink -f ./data)
INFLUX_DIR_CONF=$(readlink -f ./config)
INFLUXDB_VER='latest'
INFLUXDB_NAME='influxdb'

# Check that the directories actually exist (readlink -f prints a path
# even for a missing last component, so test with -d instead of -z)
if [ ! -d "$INFLUX_DIR_DB" ]; then
        echo "Database directory does not exist!"
        exit 1
fi

if [ ! -d "$INFLUX_DIR_CONF" ]; then
        echo "Config directory does not exist!"
        exit 1
fi

echo "Database directory: $INFLUX_DIR_DB"
echo "Config directory: $INFLUX_DIR_CONF"

docker stop "$INFLUXDB_NAME"
docker rm "$INFLUXDB_NAME"

docker run -it \
        --name "$INFLUXDB_NAME" \
        -v "$INFLUX_DIR_DB":/var/lib/influxdb \
        -v /etc/localtime:/etc/localtime:ro \
        -d \
        --restart=always \
        -p 8086:8086 \
        -p 8083:8083 \
        -v "$INFLUX_DIR_CONF"/influxdb.conf:/etc/influxdb/influxdb.conf:ro \
        influxdb:"$INFLUXDB_VER" -config /etc/influxdb/influxdb.conf

Dunno, but I doubt it. From what you say, openHAB and InfluxDB must share the same system clock.
That makes an “extra” record being written - or rather, getting timestamped - a few seconds before the expected time really odd.

I’m making an assumption that we would expect a cron persist like
everyDayAt2358 : "0 58 23 * * ?"
to trigger a write record at (or after) second 00, as specified in the cron.

Yes, let’s find out if there is a double write request (there surely must be one, somewhere!?)

1 Like