"Garbage Collection" on MySQL persistence: deleting old Item states

Tags: #<Tag:0x00007f617e081df8> #<Tag:0x00007f617e081c40>

Intro
If you’re using OpenHAB2’s persistence for some reason or another, you will sooner or later come to one question: why do I save all those old states, which I don’t Need for. My OH2 was running for nearly a year now and I persist nearly all items and the size of the DB growing by the day. So I wrote basically a stored procedure in MySQL to come up with a garbage collection. Why directly in the database? I don’t use other persistences and I don’t want to mess around with connections, the right script languae, cron jobs on one Server or another - so I decided to put the garbage collector, where my data resides.

Persistence in MySQL
Basically I assume, everybody is aware of how MySQL-persistance in OH2 is handled. In short:

  • you get a table “Items” within your database (configured in mysql.cfg)
  • for every Item you add (and your .persist strategy meets!), you will get a new row in Items and a new table Itemxxx
  • the row in Items will get a ItemID (Integer) and the new row will have the Name Item+ItemID

What to do to delete old unwanted data

  1. add column to distinguish “garbage” input from wanted input, which shouldn’t been deleted
  2. create an stored procedure
  3. activate MySQL Event Scheduler
  4. delete garbage entries daily

ad 1)

ALTER TABLE `openHAB2`.`Items` 
ADD COLUMN `ItemGarbage` INT(11) NOT NULL DEFAULT 0 AFTER `ItemName`;

this adds a column “ItemGarbage” to your Items-Repository in the MySQL database. To be on the “safe” side, the default value for this column is “0”, so if OH2 adds another column to your MySQL via the strategy in mysql.persist it won’t be deleted. You have to manually change the value to > 0 to activate the daily garbage collection! :wink:
As a goodie (see 2), you can choose your days, you want the garbage collector to delete old data:
1: just one day
2: two days
… and so on.

just browse through your “Items”-table an mark those Items you don’t need persisted with a timeframe in days.

Hint: my database for oh2 reads “openHAB2”, of yours differ, just change that name in the SQL (also down there in the stored procedure)

ad 2)
I prefer to write a stored procedure, because you can execute it for testing purposes better than written all SQLs within the event scheduler itself:

DELIMITER $$
CREATE PROCEDURE `openHAB2`.`DeleteOldItemValues`()
BEGIN
# declare variables
DECLARE var_table, var_interval VARCHAR(20);
DECLARE var_tableId, var_done, var_garbage INT DEFAULT 0;

# This builds an cursor on all ItemIds, which are marked as "garbage"
DECLARE myCursor CURSOR FOR SELECT ItemId, ItemGarbage FROM Items WHERE ItemGarbage > 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET var_done = 1;
OPEN myCursor;

# go through all ItemIds and delete old values 
read_loop: LOOP
	FETCH myCursor INTO var_tableId, var_garbage;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
	SET var_interval = concat(var_garbage, ' DAY'); # Set the interval for SQL-Statement in days, e.g. ItemGarbage == 1 => SQL-WHERE: ' WHERE TIME < (NOW() - INTERVAL 1 DAY) '
	SET var_table = concat('Item', var_tableID);

	# small workaround as variables won't just fit in as table names - so we have to concat the table_name and then we can PREPARE and EXECUTE the statement
	SET @sql_text = concat('DELETE FROM ', var_table, ' WHERE TIME < (NOW() - INTERVAL ', var_interval, ')');
	PREPARE stmt FROM @sql_text;
	EXECUTE stmt;
	DEALLOCATE PREPARE stmt;
END LOOP;

# close everything and go to sleep 
CLOSE myCursor;
END$$

Basically this procedure will find all Item-Tables with an entry other than 0 and will then delete all values older than the days, you specified in the garbage - column.

ad 3)
now it’s time to activate the MySQL event scheduler_

SET GLOBAL event_scheduler = ON;

with this SQL statement, the MySQL Event Scheduler is activated (something like a cronjob, but within the database itself)

ad 3)
lastly you have to decide, when the scheduler should run your procedure, in that case, it’ll run at 4:00 in the morning everyday. (you could leave out the “STARTS”-part, then MySQL will propably start the procedure at midnight I guess.

CREATE EVENT cleanup_event ON SCHEDULE EVERY 1 DAY STARTS '2017-09-21 04:00:00' DO call DeleteOldItemValues();

that’s it, the database will now delete everything you flagged with a INT higher than 0, saving you some space on your DB-Server over the years…

14 Likes

I few notes / comments which might help some other MySQL newbies like me.

My table is called Openhab2.items and not Openhab2.Items.

Furthermore, my item-tables are listed as item0001 in stead of item1.
I’ve changed the type into int(4) unsigned zerofill in the database. Now the itemID shows 0001.
I’ve done the same in your code; change

DECLARE var_tableId, var_done, var_garbage INT DEFAULT 0;
to
DECLARE var_tableId, var_done, var_garbage INT(4) zerofill DEFAULT 0;

and it all works.

That’s odd! I’m not aware of a change in the MySQL-persistence. Do you use the actual mysql one or do you use the jdbc-mysql?

I use the jdbc-mysql (combined with MariaDB on a Synology).

ok. it seems, the jdbc-mysql handles the tables differently! i use plan mysql.
Thanks for your help!

Hi All!
First of all, very nice Procedure @binderth, I was about doing the exact same before I found your post.

I have one addition to this:
In some cases, you want to delete the trash but keep the latest record even if it’s more than X days old.
So, that openHAB will be able to restore those values after a restart.
To do so, I just select the max(Time) value of each table into a variable and exclude this one in the WHERE clause of the deletion.

These are only a few lines of code:

SET @latest_time = concat('SELECT max(Time) INTO @latest_var FROM ', var_table);
PREPARE psmt FROM @latest_time;
EXECUTE psmt;
DEALLOCATE PREPARE psmt;

SET @sql_text = concat('DELETE FROM ', var_table, ' WHERE TIME < (NOW() - INTERVAL ', var_interval, ') and Time <> \'', @latest_var, '\'');

To use the snipped, just replace the “SET @sql_text…” line with those four lines.

One thing to mention: it does not support completely empty tables because the max() value would then be NULL.

Regards,
Felix

1 Like

Great addition!
I prevent this from happening with forcing the persistence to have at least a value “everyday” (in my case at midnight) - so I have one in case it didn’t change over the day.

How should I change the script to use item-name as my tables in stead of item_suffix names?

More than a year later but I’m trying the procedure like you are mentioning but no rows are deleted. I edited one thing for testing purposes, instead of INTERVAL DAY, I made it INTERVAL MINUTE.

read_loop: LOOP
	FETCH myCursor INTO var_tableId, var_garbage;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
	SET var_interval = concat(var_garbage, ' MINUTE');

I also edited this bit with the zerofill on your suggestion:

BEGIN
DECLARE var_table, var_interval VARCHAR(20);
DECLARE var_tableId, var_done, var_garbage INT(4) zerofill DEFAULT 0;

The bit I don’t understand is this:

Furthermore, my item-tables are listed as item0001 in stead of item1 .
I’ve changed the type into int(4) unsigned zerofill in the database. Now the itemID shows 0001.

I changed something but I don’t know if it’s the correct change.

Can you or somebody else point me in the right direction? Thanks!

As a Newbie: Where do I have to save that skript under ad 2) ??? I’m able to change all mysql Code to my Version, but not familiar with scripts like this?

Looks like a very great example, but for Beginners would be great to add some more information like: Save skript under /… or enter this Skript via phpmyAdmin at Navigation point xxx :wink:

Based on examples in this Thread I put all hints and further usefull things together:

If you want to cleanup Old Item-Data all code can be send directly in SQL-Database. For jdbc-mysql-persistence all Commands have to be send in SQL (for example in SQL Window in phpmyadmin). has to be changed to your exact table name, you can see in phpmyadmin.

Following Examples are tested with OH 2.5 SNAPSHOT 1680:

1) Alter Table for Items to set maximal history of data

ALTER TABLE `OPENHAB2-TABLE`.`items` 
ADD COLUMN `deleteafterdays` INT(11) NOT NULL DEFAULT -1 AFTER `itemname`;

Explanation: Adds a new column where you can enter after how many days the data will be deleted. -1 means: No data will deleted, 0 means all Data except newest value will be deleted, values from 1 to x means all values older than x days except newest value will be deleted.

2) Create procedure can be scheduled directly in SQL-Database

DELIMITER $$
CREATE PROCEDURE `OPENHAB2-TABLE`.`DeleteOldItemValues`()
BEGIN
# declare variables
DECLARE var_interval VARCHAR(20);
DECLARE var_table, var_itemname VARCHAR(500);
DECLARE var_tableId INT(4) zerofill DEFAULT 0;
DECLARE var_done, var_garbage, var_noEntry INT DEFAULT 0;
# This builds an cursor on all ItemIds, which are marked as "garbage"
DECLARE myCursor CURSOR FOR SELECT ItemId, itemname, deleteafterdays FROM items WHERE deleteafterdays >= 0;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET var_done = 1;
OPEN myCursor;
# go through all ItemIds and delete old values
read_loop: LOOP
	FETCH myCursor INTO var_tableId, var_itemname, var_garbage;
    IF var_done THEN
      LEAVE read_loop;
    END IF;
	SET var_interval = concat(var_garbage, ' DAY');
    SET var_table = concat(lower(var_itemname), '_', var_tableID);
	
    SET @latest_time = concat('SELECT max(time) INTO @latest_var FROM ', var_table);
    PREPARE psmt FROM @latest_time;
    EXECUTE psmt;
    DEALLOCATE PREPARE psmt;
# small workaround as variables won't just fit in as table names - so we have to concat the table_name and then we can PREPARE and EXECUTE the statement
    IF @latest_var is not null THEN
        SET @sql_text = concat('DELETE FROM ', var_table, ' WHERE time < (NOW() - INTERVAL ', var_interval, ') and time <>  "', @latest_var, '"');
        PREPARE stmt FROM @sql_text;
	    EXECUTE stmt;
	    DEALLOCATE PREPARE stmt;
    END IF;
END LOOP;

CLOSE myCursor;
END$$

Explanation: This SQL-Script is stored in Database and can be used by SQL-Scheduler / EventHandler. If you want to change this script later, you can find it in phpmyadmin at database OPENHAB2-TABLE under procedures (To see after at least on procedure is stored in database).

3) Activate Event-Handler in SQL-Database

SET GLOBAL event_scheduler = ON;

Explanation: Turns on event handler in sql-Database. For more information just google for mysql event handler

4) Create “Cron”-Job for calling cleanup-event

CREATE EVENT cleanup_event ON SCHEDULE EVERY 1 DAY STARTS '2018-05-02 00:00:00' DO call DeleteOldItemValues();

Explanation: Starts Procedure to delete old Data every midnight. If you want to change this CRON later, you can find it in phpmyadmin at database OPENHAB2-TABLE under events (To see after at least on event is activated database).

5) Find configured and unconfigured delete-times with amount of stored values

SELECT A.itemid, A.itemname, A.deleteafterdays, B.TABLE_ROWS from openhab2.items as A, `information_schema`.`TABLES` AS B WHERE concat(A.itemname,'_',LPAD(A.ItemID,4,'0')) = B.TABLE_NAME COLLATE utf8_general_ci AND B.`TABLE_SCHEMA` LIKE 'openhab2' ORDER BY TABLE_ROWS DESC

Explanation: Requests a list with item id, item name, configured value to delete old data and ammount of stored data to priority times to delete in dependence of used storage

Changes done to original Script

  • Changed column names to actual names (uppercase, lowercase,…)
  • ‘Speaking name’ for column to configure days after which data is deleted
  • Changed script to get right data-table-names for item-data, now with lowercased itemname, underscore and id with zerofill
  • Changed length of itemnames, now similar to data-table
  • Using -1 for unconfigured delete-times, so Option available to delete everything except last value
  • integrated and corrected code of emilfelix (missing double-quotes) to except newest value equal of age
  • UPDATE 07.05.2019 - Changed Procedure to prevent error if DeleteData configured and Data-Table is empty
  • UPDATE 11.07.2019 - Added SQL-Command for phpmyadmin to find configured values in dependency of stored data

Open wishes

  • -2 for unconfigured deleting-times, so that -1 could be used all data including last value
3 Likes

Updated Procedure to prevent Errors if Data-Table is empty and no Max-Value found

I am using this approach:

  1. Create a stored procedure (a tip from here):
DELIMITER //
DROP PROCEDURE IF EXISTS `openhab_old_records_cleaning` //

CREATE PROCEDURE openhab_old_records_cleaning (IN db_name VARCHAR(100))


BEGIN
DECLARE finish INT DEFAULT 0;
DECLARE tab VARCHAR(100);
DECLARE cmnd VARCHAR(200);
DECLARE cur_tables CURSOR FOR SELECT table_name FROM information_schema.tables WHERE table_schema = db_name AND table_name REGEXP "^Item[0-9]{1,3}$" AND table_type = 'base table';
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finish = 1;
OPEN cur_tables;
MY_LOOP:LOOP
FETCH cur_tables INTO tab;
IF finish = 1 THEN
LEAVE MY_LOOP;
END IF;

SET @cmnd = CONCAT('DELETE FROM ', tab,' WHERE Time < (NOW() - INTERVAL 30 DAY)');
PREPARE stmt FROM @cmnd;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END LOOP;
CLOSE cur_tables;
END; //

DELIMITER ;
  1. Check if event_scheduler process is running

SHOW PROCESSLIST;

if not:

Enable scheduler on the server level (my.cnf/mariadb.cnf):

[mysqld]
event_scheduler = ON
  1. Create a scheduled event:

     DELIMITER //
     DROP EVENT IF EXISTS clean_openhab //
     CREATE DEFINER = 'root'@'localhost' EVENT clean_openhab
         ON SCHEDULE
     		EVERY 1 DAY 
     	STARTS '2019-10-01 01:0:00.000000'
     	ON COMPLETION PRESERVE
     	ENABLE	
         DO
     		CALL openhab_old_records_cleaning('openHAB');//
    
     DELIMITER ;