OH3: how to persist data to an external relational database

Hello everyone.
Just wanted to share the results of my investigation and my current approach to persisting data to an external database (MySQL specifically) from OH3. I'm an absolute beginner, so please take what I say about OH3 with a critical spirit.

What is all this for?
Once your data are stored in a relational database, you can - if you are handy with SQL - define the relationships between the data, so you can re-aggregate and query them more quickly.
In any case, you can extract the data and work on them in common spreadsheet tools like MS Excel.

The problem
In short, I need to persist data from a sensor I made (ESP8266), which broadcasts data through an MQTT broker. openHAB connects to the broker and displays the data. Nice.

  • What I wish I could do
    I wish I could store the data, after charting, in a MySQL database, as I need them stored accurately to further develop my project (in the future I plan to use readings of luminosity, humidity, temperature and soil humidity to find the ideal conditions for growing different plants in my orchard, by controlling shading, watering and the like)
  • What I tried
    I tried setting up MySQL persistence in OH3. I did not know exactly what I would end up with, so I just tried. There were various issues, but I fixed them
  • What was not satisfying
    I think that natively there is only partial coverage of my “wish I could”. The limitations are:
    1. When a sensor board broadcasts several readings at the same time, openHAB seems to always treat the data individually -I still have to investigate whether attaching metadata can help fix this, but in short the practical effect is that the JDBC MySQL persistence creates a separate table for each Item.
      Below, for instance, are the three tables created when persisting the three readings from my sensor board

      This is not OK for me for some reasons:
      a) you lose the link between the data. Three readings with the same original timestamp are treated as (potentially - see the following item) three unrelated readings. This is messy when you want to analyze the data, using the timestamp to identify a set of readings taken “at the same time”
      b) openHAB’s optimization - to compress data, I bet - posts data to these tables only on value changes. That means that if your equipment sends 3 sensor readings every second, and one of the three is constant while another is ever-changing, you’ll get 1 row (potentially 0) in one table and 10 in the other.
      c) memory use in the database. In MySQL every DATETIME uses 5 bytes and a DECIMAL(3,1) uses 2 bytes, so storing the original packet of 1 timestamp and three readings in a proper form needs 5+2+2+2 = 11 bytes. With OH persistence - which creates tables with DATETIME and DOUBLE fields - it takes (5+4)*3 = 27 bytes (neglecting the optimization mentioned above). Things obviously get worse if your Thing has more sensors and broadcasts even more readings at the same time. Moreover, DOUBLE is an approximate type while DECIMAL is exact (this makes analysis easier to read; I’m aware that the sensor readings themselves are obviously approximate :slight_smile: ).
      d) OH’s optimization is good but brings data loss. Two examples: first, if a reading is not broadcast while it stays constant, you cannot simply compute an average with a database function, as a value that held for 10 minutes gets the same weight as a one-off glitch. Second, you don’t know whether silence from OH means the data is constant or simply that your sensor is dead and no longer broadcasting. Not good.
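To illustrate the averaging problem: with change-only storage, a plain AVG(value) over-weights short glitches, and you would need a duration-weighted average instead. A sketch, assuming MySQL 8 window functions and a placeholder table name (item_temperature) with the usual time/value columns:

```sql
-- Weight each stored value by how long it stayed current,
-- i.e. until the next stored change. Table/column names are assumptions.
SELECT SUM(value * secs) / SUM(secs) AS weighted_avg
FROM (
  SELECT value,
         TIMESTAMPDIFF(SECOND, time, LEAD(time) OVER (ORDER BY time)) AS secs
  FROM item_temperature
) AS spans
WHERE secs IS NOT NULL;  -- the last row has no following change yet
```

This still cannot distinguish "value held constant" from "sensor went silent", which is the second problem above.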

OK, enough for the foreword.

My current setup to fix this
a) have the sensor broadcast a JSON packet through MQTT and create an Item in OH to get it
{ "time": "2020-12-21 14:53:37.", "temperature": 13.6, "humidity": 66.2, "senstemp": 12.7 }
b) optionally use JSONPATH to extract the individual values and use them to feed the Items showing the three readings in the OH UI
c) OH will persist the data in a table similar to this

don’t get confused by the time value - more on that in item (e). It should be an actual timestamp from OH

d) set up a new table with an optimized structure, something like
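The screenshot didn’t make it here, but as a sketch the optimized table could look like this (column names are my assumption, mirroring the JSON fields and the DECIMAL(3,1) sizing discussed above):

```sql
-- One row per transmission: 5 + 2 + 2 + 2 = 11 bytes of data per packet
CREATE TABLE data_tsdc (
  date     DATETIME NOT NULL,   -- sensor timestamp from the JSON payload
  temp     DECIMAL(3,1),        -- temperature
  hum      DECIMAL(3,1),        -- humidity
  tempsens DECIMAL(3,1),        -- senstemp
  PRIMARY KEY (date)
);
```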

e) add a BEFORE INSERT trigger to the table created by the JDBC MySQL persistence (oh_item_table below is a placeholder for the table name OH actually created; run this with a changed DELIMITER in the mysql client), that is:
CREATE TRIGGER oh_to_optimized BEFORE INSERT ON oh_item_table
FOR EACH ROW
BEGIN
  INSERT INTO data_tsdc SET date=NEW.value->>"$.time", temp=NEW.value->>"$.temperature", tempsens=NEW.value->>"$.senstemp";
  SET NEW.time='2000-01-01 00:00:00';
END;

basically what I am doing here is - when a new data point is pushed to the table by OH, I

  1. pick the values and push them to the optimized table
  2. reset the timestamp to a fixed point in time, here “2000-01-01 00:00:00”. Since the timestamp is the key of the persistence table, each new entry overwrites the previous one, so OH actually stores just one row in this table.

Result is what I expected, that is:
optimized table

OH transit table

The perks of this are:

  • I can read directly in the DB when the last transmission from the equipment happened
  • if any error occurs while pushing data to the optimized table, the unimported value will sit there and won’t get lost. It can be examined at any time and recovered by the DB admin.
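For instance, the last transmission can be checked with queries like these (table names are the same assumptions as above; the sensor’s own timestamp travels inside the JSON payload):

```sql
-- Last time the equipment was heard from, via the optimized table
SELECT MAX(date) AS last_seen FROM data_tsdc;

-- ...or straight from the one-row transit table that OH keeps updating
SELECT value->>"$.time" AS last_seen FROM oh_item_table;
```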

Hope this helps some other newbie.


  • I’m looking into this Design Pattern: Group Based Persistence to restrict MySQL persistence to this bundle only, as the rest is no longer useful
  • I’m trying to figure out whether I can add a field to the OH table to identify the equipment, so that I can easily check at database level (by extracting the last timestamp from the data) when each piece of equipment transmitting sensor data was last seen. It can also be done by adding an equipment ID to the JSON string, though.
  • It would be nice to improve the JDBC persistence with some of this. Not sure where to start, though. I haven’t even found the code of the persistence in the Git repo.

Maybe just a typo but you have it backwards. It’s stored to a database first and the charts are generated from there.

openHAB is very self-centered when it comes to persistence. It saves data for its own purposes (analysis is not one of its purposes) in its own ways, and that usually means per Item. If you want something else, you need to do that outside of openHAB. openHAB is a home automation platform, not a data collection and analysis platform.

Perhaps by default. But you can define, on an Item-by-Item basis if you want, exactly when and how openHAB stores data. See the docs (https://www.openhab.org/docs/configuration/persistence.html). You can save periodically (e.g. everyMinute), on change, on update, or on command.
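For reference, a .persist file along these lines gives that per-Item control (the strategies come from the openHAB persistence docs; the Item names are made up):

```
// jdbc.persist - example only, Item names are placeholders
Strategies {
    everyMinute : "0 * * * * ?"
    default = everyChange
}

Items {
    SensorJson : strategy = everyUpdate
    Temperature, Humidity : strategy = everyChange, everyMinute
}
```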

DOUBLE more closely represents the value as it’s stored in memory in openHAB before it gets stored anyway. And again, persistence is designed for openHAB’s use.

As I said, you have full control over this.

Or you can do this with an Item and the timestamp profile.
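As a sketch of that suggestion (the channel UID is made up), a DateTime Item with the timestamp-update profile records when another channel last reported:

```
// items file; the profile stamps this Item with the time of every update
DateTime Sensor_LastSeen "Last seen [%1$tY-%1$tm-%1$td %1$tH:%1$tM]"
    { channel="mqtt:topic:broker:sensor:temperature" [profile="system:timestamp-update"] }
```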

One important thing to understand is that openHAB’s interface to databases is, by necessity, the lowest common denominator between SQL (JDBC), NoSQL (e.g. MongoDB), time series (e.g. InfluxDB), and round-robin databases (e.g. rrd4j). And openHAB’s purpose with Persistence is to save the states of Items to:

  • generate charts of Items for display
  • restore the state of Items to their most recently saved value on startup
  • use the historic state of an Item (or aggregation thereof) in Rules.

Combining Items into a table with multiple columns isn’t part of that, because the concept of tables and columns doesn’t even exist in all the supported databases. Adding metadata and extra timestamps and such isn’t supported because openHAB doesn’t need it. So anything like what you’ve done (thank you for posting it, by the way) will necessarily have to be done outside of openHAB. If you want something more friendly for analysis outside of openHAB, you should either bypass openHAB and save to the database directly, or do something like you have here and create something outside of openHAB to restructure the data as desired.

The rest of the issues are not actually issues as you have full ability to define which Items are saved under what conditions.

Well, after, before - not really important for me - I just need data storage and charting in OH

Well, I expected that using the JDBC MySQL persistence was aimed at having the data persisted in a usable form in a standard DB. If data can be persisted only as OH desires, I don’t get it - what’s the point of bothering with a specific persistence?

That’s quite a subjective term, “usable”. It persists the data in a form that’s usable to itself. It persists it in a form that is usable by a large number of external tools for analysis. It may not be “usable” for your specific use case, but that doesn’t negate all the rest.

Because different people have different requirements and needs. Some are able to use the data just as openHAB stores it. Some may have tools and/or preferences for different types of databases.

Ultimately, it’s because at some point a developer decided they needed to support database X and they spent their time coding up that support and submitted it to the project.

OK I hope readers understand what I mean.
Usable the same way relational databases are usable. You could also dump the individual data points into a folder as text files - that’s a database in broad terms, but not a database usable for data analysis and processing, as it lacks the relations between data in the logical and time dimensions.
Not here for arguments, though. If you have suggestions to change the title of the topic, please share.

No, it’s a good tutorial. If I’d change anything, I’d include MySQL/MariaDB in the title, since it’s only applicable to those databases. Other SQL databases have different ways to set up triggers, and this tutorial doesn’t even make sense for all the databases that OH 3 supports.

I’ve never argued against the tutorial itself. I’m just trying to explain why OH persistence works the way it does, and to correct the parts that imply you can’t configure how often the data is stored, which covers a lot of your concerns.

I changed the title to reference relational databases. MySQL is not really important (it was just the only relational database I knew in the list when I had to pick one) - SSRV, MySQL, etc. have their own dialect quirks, but it takes nothing to translate using internet resources.