"Garbage Collection" on MySQL persistence: deleting old Item states

Intro
If you’re using OpenHAB2’s persistence for some reason or another, you will sooner or later come to one question: why do I save all those old states, which I don’t Need for. My OH2 was running for nearly a year now and I persist nearly all items and the size of the DB growing by the day. So I wrote basically a stored procedure in MySQL to come up with a garbage collection. Why directly in the database? I don’t use other persistences and I don’t want to mess around with connections, the right script languae, cron jobs on one Server or another - so I decided to put the garbage collector, where my data resides.

Persistence in MySQL
Basically I assume, everybody is aware of how MySQL-persistance in OH2 is handled. In short:

  • you get a table “Items” within your database (configured in mysql.cfg)
  • for every Item you add (and your .persist strategy meets!), you will get a new row in Items and a new table Itemxxx
  • the row in Items will get a ItemID (Integer) and the new row will have the Name Item+ItemID

What to do to delete old unwanted data

  1. add column to distinguish “garbage” input from wanted input, which shouldn’t been deleted
  2. create an stored procedure
  3. activate MySQL Event Scheduler
  4. delete garbage entries daily

ad 1)

ALTER TABLE `openHAB2`.`Items` 
ADD COLUMN `ItemGarbage` INT(11) NOT NULL DEFAULT 0 AFTER `ItemName`;

this adds a column “ItemGarbage” to your Items-Repository in the MySQL database. To be on the “safe” side, the default value for this column is “0”, so if OH2 adds another column to your MySQL via the strategy in mysql.persist it won’t be deleted. You have to manually change the value to > 0 to activate the daily garbage collection! :wink:
As a goodie (see 2), you can choose your days, you want the garbage collector to delete old data:
1: just one day
2: two days
… and so on.

just browse through your “Items”-table an mark those Items you don’t need persisted with a timeframe in days.

Hint: my database for oh2 reads “openHAB2”, of yours differ, just change that name in the SQL (also down there in the stored procedure)

ad 2)
I prefer to write a stored procedure, because you can execute it for testing purposes better than written all SQLs within the event scheduler itself:

DELIMITER $$
CREATE PROCEDURE `openHAB2`.`DeleteOldItemValues`()
BEGIN
# declare variables
DECLARE var_table, var_interval VARCHAR(20);
DECLARE var_tableId, var_done, var_garbage INT DEFAULT 0;

# This builds an cursor on all ItemIds, which are marked as "garbage"
DECLARE myCursor CURSOR FOR SELECT ItemId, ItemGarbage FROM Items WHERE ItemGarbage > 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET var_done = 1;
OPEN myCursor;

# go through all ItemIds and delete old values 
read_loop: LOOP
	FETCH myCursor INTO var_tableId, var_garbage;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
	SET var_interval = concat(var_garbage, ' DAY'); # Set the interval for SQL-Statement in days, e.g. ItemGarbage == 1 => SQL-WHERE: ' WHERE TIME < (NOW() - INTERVAL 1 DAY) '
	SET var_table = concat('Item', var_tableID);

	# small workaround as variables won't just fit in as table names - so we have to concat the table_name and then we can PREPARE and EXECUTE the statement
	SET @sql_text = concat('DELETE FROM ', var_table, ' WHERE TIME < (NOW() - INTERVAL ', var_interval, ')');
	PREPARE stmt FROM @sql_text;
	EXECUTE stmt;
	DEALLOCATE PREPARE stmt;
END LOOP;

# close everything and go to sleep 
CLOSE myCursor;
END$$

Basically this procedure will find all Item-Tables with an entry other than 0 and will then delete all values older than the days, you specified in the garbage - column.

ad 3)
now it’s time to activate the MySQL event scheduler_

SET GLOBAL event_scheduler = ON;

with this SQL statement, the MySQL Event Scheduler is activated (something like a cronjob, but within the database itself)

ad 3)
lastly you have to decide, when the scheduler should run your procedure, in that case, it’ll run at 4:00 in the morning everyday. (you could leave out the “STARTS”-part, then MySQL will propably start the procedure at midnight I guess.

CREATE EVENT cleanup_event ON SCHEDULE EVERY 1 DAY STARTS '2017-09-21 04:00:00' DO call DeleteOldItemValues();

that’s it, the database will now delete everything you flagged with a INT higher than 0, saving you some space on your DB-Server over the years…

16 Likes

I few notes / comments which might help some other MySQL newbies like me.

My table is called Openhab2.items and not Openhab2.Items.

Furthermore, my item-tables are listed as item0001 in stead of item1.
I’ve changed the type into int(4) unsigned zerofill in the database. Now the itemID shows 0001.
I’ve done the same in your code; change

DECLARE var_tableId, var_done, var_garbage INT DEFAULT 0;
to
DECLARE var_tableId, var_done, var_garbage INT(4) zerofill DEFAULT 0;

and it all works.

That’s odd! I’m not aware of a change in the MySQL-persistence. Do you use the actual mysql one or do you use the jdbc-mysql?

I use the jdbc-mysql (combined with MariaDB on a Synology).

ok. it seems, the jdbc-mysql handles the tables differently! i use plan mysql.
Thanks for your help!

Hi All!
First of all, very nice Procedure @binderth, I was about doing the exact same before I found your post.

I have one addition to this:
In some cases, you want to delete the trash but keep the latest record even if it’s more than X days old.
So, that openHAB will be able to restore those values after a restart.
To do so, I just select the max(Time) value of each table into a variable and exclude this one in the WHERE clause of the deletion.

These are only a few lines of code:

SET @latest_time = concat('SELECT max(Time) INTO @latest_var FROM ', var_table);
PREPARE psmt FROM @latest_time;
EXECUTE psmt;
DEALLOCATE PREPARE psmt;

SET @sql_text = concat('DELETE FROM ', var_table, ' WHERE TIME < (NOW() - INTERVAL ', var_interval, ') and Time <> \'', @latest_var, '\'');

To use the snipped, just replace the “SET @sql_text…” line with those four lines.

One thing to mention: it does not support completely empty tables because the max() value would then be NULL.

Regards,
Felix

1 Like

Great addition!
I prevent this from happening with forcing the persistence to have at least a value “everyday” (in my case at midnight) - so I have one in case it didn’t change over the day.

How should I change the script to use item-name as my tables in stead of item_suffix names?

More than a year later but I’m trying the procedure like you are mentioning but no rows are deleted. I edited one thing for testing purposes, instead of INTERVAL DAY, I made it INTERVAL MINUTE.

read_loop: LOOP
	FETCH myCursor INTO var_tableId, var_garbage;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
	SET var_interval = concat(var_garbage, ' MINUTE');

I also edited this bit with the zerofill on your suggestion:

BEGIN
DECLARE var_table, var_interval VARCHAR(20);
DECLARE var_tableId, var_done, var_garbage INT(4) zerofill DEFAULT 0;

The bit I don’t understand is this:

Furthermore, my item-tables are listed as item0001 in stead of item1 .
I’ve changed the type into int(4) unsigned zerofill in the database. Now the itemID shows 0001.

I changed something but I don’t know if it’s the correct change.

Can you or somebody else point me in the right direction? Thanks!

As a Newbie: Where do I have to save that skript under ad 2) ??? I’m able to change all mysql Code to my Version, but not familiar with scripts like this?

Looks like a very great example, but for Beginners would be great to add some more information like: Save skript under /… or enter this Skript via phpmyAdmin at Navigation point xxx :wink:

Out of Date: New Version later in Thread

Based on examples in this Thread I put all hints and further usefull things together:

If you want to cleanup Old Item-Data all code can be send directly in SQL-Database. For jdbc-mysql-persistence all Commands have to be send in SQL (for example in SQL Window in phpmyadmin). has to be changed to your exact table name, you can see in phpmyadmin.

Following Examples are tested with OH 2.5 SNAPSHOT 1680:

1) Alter Table for Items to set maximal history of data

ALTER TABLE `OPENHAB2-TABLE`.`items` 
ADD COLUMN `deleteafterdays` INT(11) NOT NULL DEFAULT -1 AFTER `itemname`;

Explanation: Adds a new column where you can enter after how many days the data will be deleted. -1 means: No data will deleted, 0 means all Data except newest value will be deleted, values from 1 to x means all values older than x days except newest value will be deleted.

2) Create procedure can be scheduled directly in SQL-Database

DELIMITER $$
CREATE PROCEDURE `OPENHAB2-TABLE`.`DeleteOldItemValues`()
BEGIN
# declare variables
DECLARE var_interval VARCHAR(20);
DECLARE var_table, var_itemname VARCHAR(500);
DECLARE var_tableId INT(4) zerofill DEFAULT 0;
DECLARE var_done, var_garbage, var_noEntry INT DEFAULT 0;
# This builds an cursor on all ItemIds, which are marked as "garbage"
DECLARE myCursor CURSOR FOR SELECT ItemId, itemname, deleteafterdays FROM items WHERE deleteafterdays >= 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET var_done = 1;
OPEN myCursor;
# go through all ItemIds and delete old values
read_loop: LOOP
	FETCH myCursor INTO var_tableId, var_itemname, var_garbage;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
	SET var_interval = concat(var_garbage, ' DAY');
    SET var_table = concat(lower(var_itemname), '_', var_tableID);
	
    SET @latest_time = concat('SELECT max(time) INTO @latest_var FROM ', var_table);
    PREPARE psmt FROM @latest_time;
    EXECUTE psmt;
    DEALLOCATE PREPARE psmt;
# small workaround as variables won't just fit in as table names - so we have to concat the table_name and then we can PREPARE and EXECUTE the statement
    IF @latest_var is not null THEN
        SET @sql_text = concat('DELETE FROM ', var_table, ' WHERE time < (NOW() - INTERVAL ', var_interval, ') and time <>  "', @latest_var, '"');
        PREPARE stmt FROM @sql_text;
	    EXECUTE stmt;
	    DEALLOCATE PREPARE stmt;
    END IF;
END LOOP;

CLOSE myCursor;
END$$

Explanation: This SQL-Script is stored in Database and can be used by SQL-Scheduler / EventHandler. If you want to change this script later, you can find it in phpmyadmin at database OPENHAB2-TABLE under procedures (To see after at least on procedure is stored in database).

3) Activate Event-Handler in SQL-Database

SET GLOBAL event_scheduler = ON;

Explanation: Turns on event handler in sql-Database. For more information just google for mysql event handler

4) Create “Cron”-Job for calling cleanup-event

CREATE EVENT cleanup_event ON SCHEDULE EVERY 1 DAY STARTS '2018-05-02 00:00:00' DO call DeleteOldItemValues();

Explanation: Starts Procedure to delete old Data every midnight. If you want to change this CRON later, you can find it in phpmyadmin at database OPENHAB2-TABLE under events (To see after at least on event is activated database).

5) Find configured and unconfigured delete-times with amount of stored values

SELECT A.itemid, A.itemname, A.deleteafterdays, B.TABLE_ROWS from openhab2.items as A, `information_schema`.`TABLES` AS B WHERE concat(A.itemname,'_',LPAD(A.ItemID,4,'0')) = B.TABLE_NAME COLLATE utf8_general_ci AND B.`TABLE_SCHEMA` LIKE 'openhab2' ORDER BY TABLE_ROWS DESC

Explanation: Requests a list with item id, item name, configured value to delete old data and ammount of stored data to priority times to delete in dependence of used storage

Changes done to original Script

  • Changed column names to actual names (uppercase, lowercase,…)
  • ‘Speaking name’ for column to configure days after which data is deleted
  • Changed script to get right data-table-names for item-data, now with lowercased itemname, underscore and id with zerofill
  • Changed length of itemnames, now similar to data-table
  • Using -1 for unconfigured delete-times, so Option available to delete everything except last value
  • integrated and corrected code of emilfelix (missing double-quotes) to except newest value equal of age
  • UPDATE 07.05.2019 - Changed Procedure to prevent error if DeleteData configured and Data-Table is empty
  • UPDATE 11.07.2019 - Added SQL-Command for phpmyadmin to find configured values in dependency of stored data
  • UPDATE 10.11.2020 - Add Out of Date Hint
    Open wishes
  • -2 for unconfigured deleting-times, so that -1 could be used all data including last value
3 Likes

Updated Procedure to prevent Errors if Data-Table is empty and no Max-Value found

I am using this approach:

  1. Create a stored procedure (a tip from here):
DELIMITER //
DROP PROCEDURE IF EXISTS `openhab_old_records_cleaning` //

CREATE PROCEDURE openhab_old_records_cleaning (IN db_name VARCHAR(100))


BEGIN
DECLARE finish INT DEFAULT 0;
DECLARE tab VARCHAR(100);
DECLARE cmnd VARCHAR(200);
DECLARE cur_tables CURSOR FOR SELECT table_name FROM information_schema.tables WHERE table_schema = db_name AND table_name REGEXP "^Item[0-9]{1,3}$" AND table_type = 'base table';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finish = 1;
OPEN cur_tables;
MY_LOOP:LOOP
FETCH cur_tables INTO tab;
IF finish = 1 THEN
LEAVE MY_LOOP;
END IF;

SET @cmnd = CONCAT('DELETE FROM ', tab,' WHERE Time < (NOW() - INTERVAL 30 DAY)');
PREPARE stmt FROM @cmnd;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END LOOP;
CLOSE cur_tables;
END; //

DELIMITER ;
  1. Check if event_scheduler process is running

SHOW PROCESSLIST;

if not:

Enable scheduler on the server level (my.cnf/mariadb.cnf):

[mysqld]
event_scheduler = ON
  1. Create a scheduled event:

     DELIMITER //
     DROP EVENT IF EXISTS clean_openhab //
     CREATE DEFINER = 'root'@'localhost' EVENT clean_openhab
         ON SCHEDULE
     		EVERY 1 DAY 
     	STARTS '2019-10-01 01:0:00.000000'
     	ON COMPLETION PRESERVE
     	ENABLE	
         DO
     		CALL openhab_old_records_cleaning('openHAB');//
    
     DELIMITER ;
    

Great tips here!
One more thing to add is, that - at least in my setup - I have quite a lot consecutive data rows with the same value. So if someone could add something like deleting all but the first and maybe last entry of these sequences - also depending on some configuration - would be really appreciated.

Your approach deletes all older than 30 days if I see right. That’s totally different to be able to configure each item individually when to delete old data :wink:

Part One: Deleting Old Values

Based on examples in this Thread I put all hints and further usefull things together:

If you want to cleanup Old Item-Data all code can be send directly in SQL-Database. For jdbc-mysql-persistence all Commands have to be send in SQL (for example in SQL Window in phpmyadmin). has to be changed to your exact table name, you can see in phpmyadmin.

Following Examples are tested with OH 2.5 SNAPSHOT 1680:

1.1) Alter Table for Items to set maximal history of data as last column

ALTER TABLE `OPENHAB2-TABLE`.`items` 
ADD COLUMN `deleteafterdays` INT(11) NOT NULL DEFAULT -1;

Explanation: Adds a new column where you can enter after how many days the data will be deleted. -1 means: No data will deleted, 0 means all Data except newest value will be deleted, values from 1 to x means all values older than x days except newest value will be deleted.

1.2) Create procedure can be scheduled directly in SQL-Database

DELIMITER $$
CREATE PROCEDURE `OPENHAB2-TABLE`.`DeleteOldItemValues`()
BEGIN
# declare variables
DECLARE var_interval VARCHAR(20);
DECLARE var_table, var_itemname VARCHAR(500);
DECLARE var_tableId INT(4) zerofill DEFAULT 0;
DECLARE var_done, var_garbage, var_noEntry INT DEFAULT 0;
# This builds an cursor on all ItemIds, which are marked as "garbage"
DECLARE myCursor CURSOR FOR SELECT ItemId, itemname, deleteafterdays FROM items WHERE deleteafterdays >= 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET var_done = 1;
OPEN myCursor;
# go through all ItemIds and delete old values
read_loop: LOOP
	FETCH myCursor INTO var_tableId, var_itemname, var_garbage;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
	SET var_interval = concat(var_garbage, ' DAY');
    SET var_table = concat(lower(var_itemname), '_', var_tableID);
	
    SET @latest_time = concat('SELECT max(time) INTO @latest_var FROM ', var_table);
    PREPARE psmt FROM @latest_time;
    EXECUTE psmt;
    DEALLOCATE PREPARE psmt;
# small workaround as variables won't just fit in as table names - so we have to concat the table_name and then we can PREPARE and EXECUTE the statement
    IF @latest_var is not null THEN
        SET @sql_text = concat('DELETE FROM ', var_table, ' WHERE time < (NOW() - INTERVAL ', var_interval, ') and time <>  "', @latest_var, '"');
        PREPARE stmt FROM @sql_text;
	    EXECUTE stmt;
	    DEALLOCATE PREPARE stmt;
    END IF;
END LOOP;

CLOSE myCursor;
END$$

Explanation: This SQL-Script is stored in Database and can be used by SQL-Scheduler / EventHandler. If you want to change this script later, you can find it in phpmyadmin at database OPENHAB2-TABLE under procedures (To see after at least on procedure is stored in database).

1.3) Activate Event-Handler in SQL-Database

SET GLOBAL event_scheduler = ON;

Explanation: Turns on event handler in sql-Database. For more information just google for mysql event handler

1.4) Create “Cron”-Job for calling cleanup-OldValues

CREATE EVENT cleanup_OldValues ON SCHEDULE EVERY 1 DAY STARTS '2018-05-02 00:00:00' DO call DeleteOldItemValues();

Explanation: Starts Procedure to delete old Data every midnight. If you want to change this CRON later, you can find it in phpmyadmin at database OPENHAB2-TABLE under events (To see after at least on event is activated database).

1.5) Find configured and unconfigured delete-times with amount of stored values

SELECT A.itemid, A.itemname, A.deleteafterdays, B.TABLE_ROWS from openhab2.items as A, `information_schema`.`TABLES` AS B WHERE concat(A.itemname,'_',LPAD(A.ItemID,4,'0')) = B.TABLE_NAME COLLATE utf8_general_ci AND B.`TABLE_SCHEMA` LIKE 'openhab2' ORDER BY TABLE_ROWS DESC

Explanation: Requests a list with item id, item name, configured value to delete old data and ammount of stored data to priority times to delete in dependence of used storage

Changes done to original Script

  • Changed column names to actual names (uppercase, lowercase,…)
  • ‘Speaking name’ for column to configure days after which data is deleted
  • Changed script to get right data-table-names for item-data, now with lowercased itemname, underscore and id with zerofill
  • Changed length of itemnames, now similar to data-table
  • Using -1 for unconfigured delete-times, so Option available to delete everything except last value
  • integrated and corrected code of emilfelix (missing double-quotes) to except newest value equal of age
  • UPDATE 07.05.2019 - Changed Procedure to prevent error if DeleteData configured and Data-Table is empty
  • UPDATE 11.07.2019 - Added SQL-Command for phpmyadmin to find configured values in dependency of stored data
  • UPDATE 10.11.2020 - Just minor change of cleanup-event name
  • UPDATE 15.11.2020 - editorial changes

Part Two: Deleting tables for inactive items

ADDED 10.11.2020 Please read Part One first, explainations already given there are not repeated.

If you want to cleanup old tables from unuses / now inactive Items, this could be with a mix of SQL and some minor changes in openhab2 (item-configuration and one new rule)

Be careful to avoid data loss or openhab2 didn’t work anymore. All scripts tested on my own openhab2 V2.5 Release System with jdbcSQL Persistence

2.1) Alter Table for Items to set maximal inactive time until tables and item is deleted as last column

ALTER TABLE `OPENHAB2-TABLE`.`items` 
ADD COLUMN `inactivecycles` INT NOT NULL DEFAULT 0 

Explanation: Adds a new column where the cycles (in my case monthly) are stored, while the item is inactive.

2.2) Create procedure can be scheduled directly in SQL-Database

DELIMITER $$
CREATE PROCEDURE `OPENHAB2-TABLE`.`DeleteOldItems`()
BEGIN
# declare variables
DECLARE var_table, var_itemName VARCHAR(500);
DECLARE var_itemID INT(4) zerofill DEFAULT 0;
DECLARE var_done, var_inactivecycles, var_noEntry INT DEFAULT 0;
# This builds an cursor on all ItemIds, which are marked as "garbage"
DECLARE myCursor CURSOR FOR SELECT ItemId, itemname, inactivecycles FROM items WHERE inactivecycles > 6;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET var_done = 1;
OPEN myCursor;
# go through all ItemIds and delete old ValueTable and then Item from Items_table
read_loop: LOOP
	FETCH myCursor INTO var_itemID, var_itemName, var_inactivecycles;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
    SET var_table = concat(lower(var_itemName), '_', var_itemID);
	
# Drops table for inactive item:
    SET @sql_text = concat('DROP TABLE ', var_table);
    PREPARE stmt FROM @sql_text;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

# Deletes entry for inactive Item from items-table
    SET @sql_text = concat('DELETE FROM items WHERE items.ItemId = ', var_itemID);
    PREPARE stmt FROM @sql_text;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;

END LOOP;

CLOSE myCursor;
END$$

Explanation: In my case the inactivecycles must be higher then 6 (month). You can change in 4th line starting with DECLARE to change it.

2.3) Activate Event-Handler in SQL-Database (only if not activated with Part One 1.3)

SET GLOBAL event_scheduler = ON;

Explanation: Look at 1.3

2.4) Create “Cron”-Job for calling cleanup-OldItems

CREATE EVENT cleanup_OldItems ON SCHEDULE EVERY 1 DAY STARTS '2020-11-10 23:55:00' DO call DeleteOldItems();

Explanation: Look at 1.4

2.5) Add a group for all items in openhab2-configuration.
Example for groups.items:

Group gAll

Explanation: This group all items has to be in is used to lookup if item is active in following rule.

2.6) Add all items and groups to this group

Examples:

String Itemname "Label" <icon> (gAll)
Group Groupname "Label" <icon> (gAll)

Work carefully at this step: Missing Item in this group can cause data loss and openhab2 problems at restart. Be sure to add group at every item and group!

2.7) Add rule, which raises inactive cycle periodically and mark active items with -1 in field inactive cycles

Rule SQLmaintenance.rules:

var mysql_host = "localhost"
var mysql_user = "MYSQL-USER-OPENHAB"
var mysql_pw = "MYSQL-USER-PASSWORD"
var mysql_db = "openhab2"

rule "Mark inactive Items in SQL"
when
// Monthly every 9th at 23:45
    Time cron "0 45 23 9 * ? *"
then
    logInfo("SQL-Maintenance", "Increase cycle item is inactive for all items in SQL")
    var String sql_query = "/bin/sh@@-c@@mysql -h " + mysql_host + " -u " + mysql_user + " -p" + mysql_pw + " -e \"UPDATE items set inactivecycles = inactivecycles + '1';\" -D " + mysql_db
    var String mysql = executeCommandLine(sql_query, 5000)

    logInfo("SQL-Maintenance", "Mark openhab2-items as active (-1) in SQL now")
    gAll?.members.forEach   [activeItem|
        sql_query = "/bin/sh@@-c@@mysql -h " + mysql_host + " -u " + mysql_user + " -p" + mysql_pw + " -e \"UPDATE items set inactivecycles = '-1' WHERE itemname = '" + activeItem.name + "';\" -D " + mysql_db
        mysql = executeCommandLine(sql_query, 5000)
    ]
end

Explanation: First Part increases every time the rule fires the field inactivecycles. Second Part do a loop through all items and groups in gAll and overwrite inactivecycles with -1 (-1 means active item)

Be sure to check the file rights for the rule because of listed mysql-Credentials. Should be openhab:openhab 640 to avoid reading through others.

2.8) Check inactive items to be sure all active are in group gAll

If you didn’t change cycle you now have 6 month time to check the table items in mysql. All Items with value > 0 will be deleted and their tables locked if the entry reaches 7. If you want to hurry up: Just change rule and fire it minutly two or three times and set it back to month. then you can check immediatly.

To check look items with inactivecycles > 0 and do a search over your item configuration for the item name.

  • UPDATE 15.11.2020 - editorial changes, deleted double explanation to Part One

Part Three: Deleting unneeded values

ADDED 15.11.2020 Please read Part One first, explainations already given there are not repeated.

Example:
You historicize values at changes and minutly to get clear changes in graphs. If you have a) 1, b) 1, c) 1, d) 1, e) 2, f) 2, g) 3, so all would look same with only a) 1, d) 1, e) 2, f) 2, g) 3. The following deletes double values in mysql-table except first and last one with same value.

Because of performance-reasons the procedure will only cleanup part of the table every time it runs, starting with newest data, maximal the first 200 values. So prefer set the trigger to every hour. If new data is cleared, old deleted, then after some time your data will be fine.

Be careful to avoid data loss or openhab2 didn’t work anymore. All scripts tested on my own openhab2 V2.5 Release System with jdbcSQL Persistence

Additional information: after one week of running, amount of data rows for openhab2 went down from 9.7m to 4.2m, in storage from 360MB to 140 MB.

3.1) Alter Table for Items to configure this function

ALTER TABLE `OPENHAB2-TABLE`.`items` 
ADD COLUMN `deletedouble` INT NOT NULL DEFAULT 0 

Explanation: Adds a new column where you can configure which tables should be considered (value manually has to be set to 1).

3.2) Create procedure can be scheduled directly in SQL-Database

DELIMITER $$
CREATE PROCEDURE `OPENHAB2-TABLE`.`DeleteDoubleValues`()
BEGIN
# declare variables
DECLARE var_table, var_itemname VARCHAR(500);
DECLARE var_tableId INT(4) zerofill DEFAULT 0;
DECLARE var_done INT DEFAULT 0;
# This builds an cursor on all Item-Tables
DECLARE myCursor CURSOR FOR SELECT ItemId, itemname FROM items WHERE deletedouble=1;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET var_done = 1;
OPEN myCursor;

# go through all ItemIds and delete double and unneeded values (previousValue and next value are the same)
read_loop: LOOP
	FETCH myCursor INTO var_tableId, var_itemname;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
    SET var_table = concat(lower(var_itemname), '_', var_tableID);
	select concat('DEBUG - ', 'Bearbeite Tabelle: ', var_table) AS 'Debug:';
    SET @del_sql = concat('DELETE del FROM ', var_table, ' AS del JOIN (SELECT * FROM (((SELECT ROW_NUMBER() OVER(ORDER BY time DESC) AS rowA, time as timeA, value as valueA from ', var_table, ' LIMIT 200) as A JOIN (SELECT ROW_NUMBER() OVER(ORDER BY time DESC) AS rowB, value as valueB from ', var_table, ' LIMIT 200) as B ON A.rowA=B.rowB+1 ) JOIN (SELECT ROW_NUMBER() OVER(ORDER BY time DESC) AS rowC, value as valueC from ', var_table, ' LIMIT 200) as C ON A.rowA=C.rowC-1 ) where A.valueA=B.valueB and A.valueA=C.valueC) as indel ON del.time=indel.timeA');
    PREPARE psmt FROM @del_sql;
    EXECUTE psmt;
    DEALLOCATE PREPARE psmt;
END LOOP;

CLOSE myCursor;
END$$

Explanation: For performance reasons only first 200 entries are considered. If you want to do a bigger block each run, you have to change LIMIT 200 three times in the Procedure. But be carefull, raise this will increase mysql exponential. You can test increase after some time if all existing tables are optimized.

3.3) Activate Event-Handler in SQL-Database (only if not activated with Part One 1.3 or 2.3)

SET GLOBAL event_scheduler = ON;

Explanation: See 1.3

3.5) Create “Cron”-Job for calling cleanup-OldItems

CREATE EVENT cleanup_OldItems ON SCHEDULE EVERY 1 HOUR STARTS '2020-11-10 23:55:00' DO call DeleteDoubleValues();

3.6) Optional: Check running time of procedure

Following commands could be used to check runtime (Bash and mysql):

bash:#/ mysql -u USER -p
mysql> connect openhab2;
mysql> call DeleteDoubleValues();

Explanation: Logs in to mysql, then connect to openhab-Database. Then it calls the procedure and you have an incrementing value of seconds you can see the full runtime. User and Database name must be changed to yours.

  • UPDATE 15.11.2020 - Added information about check the running time
  • UPDATE 19.11.2020 - Added Feedback

Hope you like it, hints, failures or wishes just per personell message to avoid tutorial will get lost in thread

Open wishes

  • Waiting for further interesting ideas to keep sql perfomant and cleaned up :slight_smile: Just write per personally message.
5 Likes

Version for openhab3.jdbc.persistence in new thread under tutorials and examples with updated syntax in DSL rules:

3 Likes

Hey Jens,
I use your Procedere with success to keep my database clean. My SQL knowledge is not that much so I am not able to add some features :-).
Is there an “easy” possibility to keep the last two values after the cleaning instead of only the last?

Thanks in advantage and greetings
Andy

The easiest way would be to raise deleteafter days e.g. to 30 (days), so all values last 30 days will stay. But thats not the same as you wanted.

May I ask why you specially want to have the last 2 values?

Hey, thanks for the fast reply.
I use five upnp players with some kind of “Auto Play” function when they appear in the network after I switch them on. Therefore I store a numeric value which represents the station selected (0=OFF, 1=Statio1, and so on.).
So the last value is 0 when I switch them of. I need the second value which represents the latest Station which was selected before it was switched off. Sometimes the Players are not used for some weeks and after the cleaning process the last Station is gone.
Hope my description is understandable :slight_smile:
Greetings Andy