Aggregate and export persistence in intervals

I persist most of my items in a MySQL database, including all sensor data. Most sensors don’t report at a fixed interval; the timestamps are fuzzy, and sometimes the readings come in rapid succession.

So my use case is the following:

  1. aggregate the sensors’ data
  2. define SUM(), AVG(), … for a given sensor (temperature would be AVG, consumption SUM, …)
  3. use the aggregated data
even “compact” old data that way in the item’s persistence

I know, that’s not really an OH3-core question, but rather a MySQL or scripting question.
What I do right now in a Python script:

  1. define start and end date
  2. SELECT the existing timestamps within an interval
  3. INSERT SUM or AVG of that interval in a new table
  4. repeat until end date is reached

but:

  • slow as hell
  • I tried it with my smart meter persistence, and sometimes I get negative consumption values?

Is there a more elegant way to do that, e.g. with pure MySQL functions? Or is there already a “compact/aggregate” function in OH3 for this that I didn’t see?
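To illustrate what I mean by “pure MySQL”: something along these lines is roughly what I was hoping exists. This is a sketch only, assuming the usual openHAB table layout with a time and a value column; the mysql.connector usage, the credentials, and the target table consumption_1 are made up for the example:

import mysql.connector

# Sketch only: one-statement bucketing in MySQL, executed from Python.
# Credentials and the target table name are placeholders.
con = mysql.connector.connect(user="openhab", password="secret", database="openHAB")
cur = con.cursor()

# One row per 1-minute bucket. For a monotonically increasing meter
# reading, MAX(value) - MIN(value) per bucket approximates the consumption;
# for a temperature-like sensor, AVG(value) would be used instead.
cur.execute("""
    INSERT INTO consumption_1 (time, value)
    SELECT FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(time) / 60) * 60) AS bucket,
           MAX(value) - MIN(value)
    FROM item0303
    GROUP BY bucket
""")
con.commit()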

No, that is a DB task.
What you can do is calculate the average yourself.
Introduce ‘average’ virtual items, update them from cron every minute with <original item>.averageSince(now.minusMinutes(1)), then remove persistence of the original items.
This is also quite useful when you want to act on a time-averaged sensor value rather than a single value.
Think of a lux sensor you use to close/reopen your blinds: on a cloudy day you don’t want it to act upon every change.
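Not openHAB code, but just to spell out the arithmetic: averageSince computes a time-weighted mean over the persisted samples, each value weighted by how long it was the current reading. A minimal sketch with made-up lux values (how openHAB weights the samples internally may differ in detail):

from datetime import datetime, timedelta

def time_weighted_average(samples, since, now):
    # samples: list of (timestamp, value), sorted ascending by timestamp
    window = [(t, v) for t, v in samples if since <= t <= now]
    if not window:
        return None
    total = 0.0
    for i, (t, v) in enumerate(window):
        # each value counts until the next sample arrives (or until now)
        t_next = window[i + 1][0] if i + 1 < len(window) else now
        total += v * (t_next - t).total_seconds()
    span = (now - window[0][0]).total_seconds()
    return total / span if span > 0 else window[-1][1]

now = datetime.now()
samples = [(now - timedelta(seconds=55), 800),
           (now - timedelta(seconds=30), 200),
           (now - timedelta(seconds=10), 750)]
print(time_weighted_average(samples, now - timedelta(minutes=1), now))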

Or consider moving to rrd4j; it offers aggregation by default (you can apply some tweaking to the intervals).


The most immediate use case is to have aggregated 1-minute intervals of my smart meter data for dimensioning my planned photovoltaic system. :wink:
So I’m dependent on the existing data, but for the future, the proxy item with an average would be worth considering. So I’ll have to dig deeper into MySQL functions…

?? You must not dimension your PV based on consumption; that’ll lead to suboptimal results
(the best possible advice is: put as many panels onto your roof as possible).
For battery sizing it may be OK, but you don’t need 1-minute intervals for that.

For the future, why not get my EMS? It uses exactly that: proxy items to average PV yield, consumption, etc. And it does all the optimization computations in real time, and much more.

Yes, it’s for calculating the battery size. And of course I’ll put as many modules up there as possible! :wink:

Yes, that’s at the top of the list after ordering the PV system! :wink:

I’ve even put some modules on the northern side of my roof. It’s borderline but worth considering.

I’ll do that. A 28° slope allows for a reasonable yield there as well. And I already have solar thermal modules on the south side, so there’s not THAT much space left there anyway.

You can’t remove persistence on the original Items or it will break the call to averageSince(). In this case you could save the original Items to RRD4J, which won’t grow too big and lets you calculate the average like this, and only save the virtual Items to MySQL. But then, as you say, RRD4J offers aggregation by default, so maybe it’s worth moving over entirely.

If this is a one-time operation, does that really matter? Once you’ve processed all your data and created the new tables, you’re done and never have to run the tool again, or at least only rarely.

I can’t help with the negative values except to say: look in your data for negative entries. Either the data contains them, or the arithmetic is broken in some way.

You’re right, for a one-off the slowness is acceptable. What makes me wonder: my script relies on the smart meter read-outs, and I don’t think my provider would like me to have negative consumption (without me having PV right now! :wink:)…

But perhaps I have some logic error here (Python):

import time
from datetime import datetime

# cur, cur2 and con2 are assumed to be MySQL cursors and a connection
# created elsewhere (e.g. via mysql.connector); that setup is not shown here.

start = (2020, 8, 1, 0, 0, 0, 0, 0, 0)
end = (2021, 8, 1, 0, 0, 0, 0, 0, 0)
reading = 50839.0817  # meter reading at the start date
intervalMinutes = 1

start_date = int(time.mktime(start))
end_date = int(time.mktime(end))
current_date = start_date

while current_date < end_date:
    # fetch the first persisted reading strictly after the interval start
    # (assumes one exists; fetchone() returns None past the last sample)
    current_datetime = str(datetime.fromtimestamp(current_date))
    queryreading = ('SELECT * FROM openHAB.item0303 WHERE time > STR_TO_DATE("' + current_datetime + '", "%Y-%m-%d %H:%i:%s.%f") ORDER BY time ASC LIMIT 1')
    print(queryreading)

    cur.execute(queryreading)
    rowsreading = cur.fetchone()
    current_reading = rowsreading[1]  # column 1 holds the value

    # consumption in this interval = current reading minus previous reading
    consumption = current_reading - reading
    reading = current_reading

    insertconsumption = ('INSERT INTO `openHAB`.`consumption_' + str(intervalMinutes) + '` (`time`, `value`) VALUES (STR_TO_DATE("' + str(datetime.fromtimestamp(current_date)) + '", "%Y-%m-%d %H:%i:%s.%f"), "' + str(consumption) + '")')
    print(insertconsumption)

    cur2.execute(insertconsumption)
    con2.commit()

    # advance to the next interval
    current_date = current_date + (intervalMinutes * 60)
From what I can see, there should not be any negative consumption values inserted, yet the overall consumption over one year is correct… :man_shrugging:
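To double-check on the DB side, something like this (reusing the cursor from the script) should list any negative rows that actually made it into the new table:

# list any negative consumption rows that were inserted
cur.execute('SELECT time, value FROM openHAB.consumption_1 WHERE value < 0 ORDER BY time')
for row in cur.fetchall():
    print(row)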

But this assumes that whatever API or scraping you are doing to get the read-outs is error-free and never reports negative numbers. Looking to see whether you actually have negative numbers in the DB can show whether at least part of the problem is the source data, or whether it can only be a logic error.

OK, so this isn’t just summing up the values but subtracting. That changes things and makes it more likely that there is a logic error that stems from an assumption about the readings.

current_reading is initialized to the second column of the row returned by the SQL query (index 0, the first column, is the timestamp). Then you subtract 50839.0817 for every interval between the current date and the end date.

What’s 50839.0817? Is the assumption that current_reading > (number of intervals between start time and end time * 50839.0817) correct for all end_dates? The negative values seem to indicate the answer is no: at least some of the time that assumption is false, resulting in negative numbers.

start = (2020, 8, 1, 0, 0, 0, 0, 0, 0)
=> the reading on that date (2020-08-01) was:
reading = 50839.0817

so, from that point on, it should go like this:

  1. query current_reading as of the current_date timestamp
  2. then subtract reading from current_reading => that should be the consumption within the intervalMinutes => write that away
  3. give reading the value of current_reading
  4. move current_date to the next interval
  5. repeat 1-4 until we reach the end datetime.

but, of course:

I use the “optical” read-out from my smart meter, which goes via USB to my OH3, which uses the “smartmeter” binding to update my item0303! :wink:
It could be, however unlikely, that there is some non-linear read-out and a later timestamped value of item0303 is in fact lower than an earlier one…?
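If my MySQL is version 8 or newer, the LAG() window function should reveal such rows directly. A sketch, again reusing the cursor from the script above:

# find every reading of item0303 that is lower than the reading before it
cur.execute("""
    SELECT time, value, prev_value FROM (
        SELECT time, value,
               LAG(value) OVER (ORDER BY time) AS prev_value
        FROM openHAB.item0303
    ) AS t
    WHERE value < prev_value
""")
for row in cur.fetchall():
    print(row)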