ZRAM status

This article is to track and document the current status of the zram feature in openHABian.
I’ll update it if there’s significant news.

What is zram and how does it work?

The zram config utility for swap, directories and logs is an OS enhancement for IoT / maker projects that reduces block wear on SD, NAND and eMMC media caused by write operations such as logging and persistence in openHAB. Writes go to compressed RAM instead and are written out to disk extremely infrequently, so working directories run at near-RAM speed while compression keeps the precious memory footprint small; the achievable compression ratio depends on the algorithm chosen.

It uses a table in /etc/ztab where any combination and number of zram drives can be defined. This branch uses an OverlayFS mount on top of zram so that a syncFromDisk on start is not needed. This should allow for quicker boots and larger directories, since no complete directory copy is needed: the on-disk directory serves as the lower mount in the OverlayFS.

In the openHABian implementation, swap, /var/log/* and /var/lib/openhab2/* are moved into zram.
There’s a log in /usr/local/share/zram-config/log/.
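
For illustration, a /etc/ztab matching this setup could look roughly like the following. The column layout follows a typical zram-config table, but the algorithm, sizes and bind directory names here are assumptions; the comments in your own /etc/ztab are authoritative:

    # swap  alg  mem_limit  disk_size  swap_priority  page-cluster  swappiness
    swap    lzo  200M       600M       75             0             90
    # dir   alg  mem_limit  disk_size  target_dir         bind_dir
    dir     lzo  150M       500M       /var/lib/openhab2  /openhab2.bind
    # log   alg  mem_limit  disk_size  target_dir  bind_dir   oldlog_dir
    log     lzo  50M        150M       /var/log    /log.bind  /opt/zram/oldlog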

On proper service shutdown (zram-config stop), zram’ed directories (OverlayFS) will a) be synced to the ‘lower’ filesystem (the same dirname on the / filesystem located on the default boot medium)
and b) have their ‘upper’ filesystem (the part in memory) lazily unmounted.
Lazy unmounting means there can still be processes holding open files on the directory. That’s required so that system processes using dirs such as /var/log can run (and keep running).
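
To see for yourself why a lazy unmount is needed, you can list the processes that still hold files open under such a directory (lsof +D recurses into it; the exact output varies per system):

    sudo lsof +D /var/log | head    # daemons like rsyslog typically show up here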

Do’s and Don’ts

  • DON’T USE the reboot and halt COMMANDS. reboot/halt on some Unices (plural of “UNIX”), including openHABian/Raspbian jessie(?) and older, may skip the steps to properly run down your machine’s services, including zram, resulting in all (recently changed) files being lost.
    buster (the Raspbian release that the latest openHABian 1.5 image is built on) is known to work, but that only applies to boxes installed using this image or properly upgraded to buster, which does not happen automatically if your box was installed before this was available…
    There are reports of older OS versions failing here, although appearances can be misleading: even a proper shutdown kills networking before eventually syncing zram to disk, so you might believe it skips the shutdown scripts although it does not. You can only see that if you have a console attached.
    By the way, shutdown -r is the proper command to use.
    If you want to be on the safe side: stop openHAB using sudo systemctl stop openhab2, wait for it to finish, and then manually stop zram: sudo zram-config stop (see the sketch after this item).
    You must use a proper procedure to shutdown/reboot your server to have it sync files to disk - otherwise they get lost.
    That’s /var/log and /var/lib/openhab2, so essentially logs and persistence data.
    Short of undiscovered bugs, the tool/script zram-config will take care of this sync when called with the stop parameter. Note there’s a large number of alternative commands/options to reboot, and how they work differs within Raspbian, let alone among UNIX implementations.
    We neither know nor could document all of them. If you don’t find “your” way of rebooting mentioned as known to work, then it possibly isn’t safe.
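
    A minimal sketch of that safe manual sequence:

      sudo systemctl stop openhab2    # stop openHAB first and wait for it to finish
      sudo zram-config stop           # sync zram'ed directories back to disk
      sudo shutdown -r now            # proper reboot (or 'sudo shutdown -h now' to halt)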

  • DON’T USE zram-config sync (even if it is available in your installed version) unless you know what you’re doing and are testing on purpose.
    It is known to NOT work in a number of situations and may cause data loss.

  • you MUST NOT switch off your openHABian server unless you have properly shut it down.
    While this has been a requirement unrelated to zram essentially ever since UNIX has existed, it’s amazing how many people still do this today.
    Put it on a UPS to safeguard it from power outages.

  • Make use of a backup solution such as Amanda to have daily backups of your zram’ed directories.

  • you shouldn’t use zram unless you’re on an SBC (a small, “single board” computer such as a Raspberry Pi) that runs its OS off a medium based on flash memory such as an SD card or (!) a USB stick.
    Yes, USB sticks are no better or safer than internal SD cards - another good reason for zram.

  • don’t run zram on machines that have LESS than 1 GB of RAM, such as a Pi Zero W.

  • don’t run it on non-SBCs or on SBCs modified to run off “safe” media such as SSD or HDD.
    Well, you can, but there’s just a negligible benefit in doing so, so the cost/risk-to-benefit ratio is bad.

  • If (and only if) you have an SBC system with MORE than 1 GB of RAM, such as a Raspi 4 with 2 or 4 GB or an Odroid C2 with 2 GB, you may increase the amount of RAM assigned to zram in /etc/ztab, starting with /var/lib/openhab2 (see the sketch after this item).
    That’ll help you stay away from the ‘cache’ issue mentioned below. Don’t worry, zram will not occupy that full maximum amount unless it really needs to.
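
    For illustration, such an edit could look like this (the original values and bind_dir name are assumptions; only the mem_limit and disk_size columns change):

      # before: dir  lzo  150M  500M  /var/lib/openhab2  /openhab2.bind
      dir  lzo  300M  800M  /var/lib/openhab2  /openhab2.bind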

  • you may change the number and size of directories to cache by changing entries in /etc/ztab.
    For example, if you consider it too risky to run all of /var/lib/openhab2 off zram because you want, say, the jsondb not to be on there, you can put a comment sign in front of that line or replace it with 2 lines for /var/lib/openhab2/tmp and …/cache (see the sketch after this item). But be aware that on the downside this will increase the write load on your SD and thus the likelihood of getting hit by wearout.
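
    A sketch of that replacement (the sizes and bind_dir names here are made up for illustration):

      #dir  lzo  150M  500M  /var/lib/openhab2        /openhab2.bind
      dir   lzo  50M   150M  /var/lib/openhab2/tmp    /openhab2-tmp.bind
      dir   lzo  100M  300M  /var/lib/openhab2/cache  /openhab2-cache.bind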

  • there are statistics available: the file /sys/block/zram<id>/mm_stat has all the info you need. The stat file represents the device’s mm statistics. It consists of a single line of text and contains the following stats, separated by whitespace:

  orig_data_size   uncompressed size of data stored in this disk.
                   This excludes same-element-filled pages (same_pages) since
                   no memory is allocated for them.
                   Unit: bytes
  compr_data_size  compressed size of data stored in this disk
  mem_used_total   the amount of memory allocated for this disk. This
                   includes allocator fragmentation and metadata overhead,
                   allocated for this disk. So, allocator space efficiency
                   can be calculated using compr_data_size and this statistic.
                   Unit: bytes
  mem_limit        the maximum amount of memory zram can use to store
                   the compressed data
  mem_used_max     the maximum amount of memory zram has consumed to
                   store the data
  same_pages       the number of same-element-filled pages written to this disk.
                   No memory is allocated for such pages.

For the complete set of statistics see the kernel docs.
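
For example, to compute the effective compression ratio from the first two fields (assuming your device is /dev/zram0; the guard avoids dividing by zero on an empty device):

    awk '$2 > 0 { printf "orig: %d B, compressed: %d B, ratio: %.2f\n", $1, $2, $1/$2 }' /sys/block/zram0/mm_stat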

Known issues

You might run into a misbehavior. Please let me know when you do, and include the relevant details: have you done anything lately that could have been a trigger? Edited a .items or .rules file? Added a thing or binding that might come with a problem? Changed the openHAB version (incl. milestones, snapshots)?

  • after systemctl stop zram-config or /usr/local/bin/zram-config stop, a zram device /dev/zramX might persist (X is a number). You can try removing it via zramctl -r /dev/zramX. If that fails, reboot to get rid of it, possibly running systemctl disable openhab2 beforehand so openHAB doesn’t start automatically after boot.
  • Running openHAB2 off zram’ed directories (which is what we want, as it is the whole point of this feature) means the ‘cache’ of Karaf, which OH2 is built on, lives in zram, too. Depending on your OH config, that generates 200+ MB of changed data that zram needs to hold in RAM right from the OH start. Unfortunately the Karaf ‘cache’ size cannot be controlled to limit use to a specified maximum amount of RAM. There’s no option to cache some parts while dropping others (so the name ‘cache’ is somewhat misleading here).
    The zram RAM size assignments are defined in /etc/ztab, and they’re a tradeoff between what OH uses (usually ~500-600 MB on ARM, varying with your OH config) and what Karaf needs for its cache (~200-250 MB, also depending on OH config). You’ll quickly notice there’s not much headroom on a typical SBC with 1 GB of RAM.
    That means that if you start OH from an empty/cleared cache, it’ll use those 200+ MB in RAM/zram right away, and any change in operations will come on top.
    The (hopefully safe) method to work around this is to:
  1. shut down openHAB2: sudo systemctl stop openhab2

  2. delete the cache: sudo rm -rf /var/lib/openhab2/tmp/* /var/lib/openhab2/cache/* /opt/zram/openhab2.bind/tmp/* /opt/zram/openhab2.bind/cache/*

  3. start openHAB2 and let it complete initialization of items, rules etc.: sudo systemctl start openhab2

  4. shut down openHAB2 again: sudo systemctl stop openhab2

  5. stop zram to make it sync to disk (notably the Karaf-generated files in /var/lib/openhab2/cache and …/tmp): sudo /usr/local/bin/zram-config stop

  6. start zram again: sudo /usr/local/bin/zram-config start

  7. start openHAB2 again: sudo systemctl start openhab2

    Steps 4-7 should also execute if you (see above) properly reboot (shutdown -r) your machine, but again, if you want to be on the safe side, do it manually. A sketch bundling all steps follows below.
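
Bundled into one hypothetical helper script (the name clean-oh-cache.sh is made up; paths are the openHABian defaults used in the steps above):

    #!/bin/bash
    # clean-oh-cache.sh - sketch of steps 1-7 above
    sudo systemctl stop openhab2
    sudo rm -rf /var/lib/openhab2/tmp/* /var/lib/openhab2/cache/* \
                /opt/zram/openhab2.bind/tmp/* /opt/zram/openhab2.bind/cache/*
    sudo systemctl start openhab2
    read -p 'Wait until openHAB has fully initialized items and rules, then press Enter ' _
    sudo systemctl stop openhab2
    sudo /usr/local/bin/zram-config stop     # syncs the freshly built cache/tmp to disk
    sudo /usr/local/bin/zram-config start
    sudo systemctl start openhab2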

  • zram-config sync is a feature to allow for “online sync” that is being worked on, but there’s no ETA.
    For now it is said to fail in a number of situations. I encourage everyone to help with testing, but be aware that’s at your own risk of data loss. Drop me a note if you’re volunteering.

  • you might see ‘permission denied’ messages saying the system cannot write to disk in either of these directories or in files below them because they’re read-only.
    This has not been reported by users but occasionally showed up during testing. There are a lot of potential reasons for this, and it is still subject to investigation which of them is the most likely/most frequent cause.
    Zram never syncs to disk unless you properly shut it down (and to do so, the zram directories need to be unmounted).
    Just like on disk, the amount of RAM that zram needs to have available must be larger than the amount of data to store in there, but it’s a little more complicated than that because the data is also compressed with a varying compression factor, so the exact amount of raw data that fits in is unknown.
    The amount of memory zram has available is static (defined in /etc/ztab).
    Now if something writes more changed data to zram’ed directories than fits into them, ‘permission denied’ is how the system protects itself against even more severe impacts.
    You can try deleting zram’ed data, but usually you would need to (and the safer method clearly is to) shut down your system. A way to check how full the devices are is sketched below.
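
    To check how close each device is to its caps (DISKSIZE corresponds to disk_size in /etc/ztab; the column names are those of util-linux zramctl):

      sudo zramctl --output NAME,ALGORITHM,DISKSIZE,DATA,COMPR,TOTAL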

@mstormi Markus,

My system with zram is misbehaving. I suspect I have corrupted the zram configuration. I tried to re-install zram using openhabian-config but it can no longer find your zram repository on GitHub. Does this mean that this BETA trial is not moving forward at this time? If your repository is no longer available, should the openhabian-config entry be deleted as well?

I have disabled zram (systemctl disable zram-config) for now. Is there anything I need to do to remove the directories/mounts? Any other cleanup?

At this point, utilities don’t seem to be able to write any logs. I have run the “fix permissions” entry in openhabian-config. I’ve had to disable logging for mosquitto for it to function properly. Something is seriously hosed and I need to resolve it. Any guidance is greatly appreciated.

Regards.

Mike

No. I’m waiting for the original author to transfer his repo; he’s slow to answer, unfortunately. Meanwhile you can install from his, see this thread.

You might also be able to fix your install. Unless you deleted binaries or directories, I believe all you need to do is fix /etc/ztab (if you changed it), re-enable and reboot. The original is at /opt/openhabian/includes/ztab. I’d suggest increasing the disksize for /var/lib/openhab2 to 600+ MB while you’re at it. Sketched out below.
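
    sudo cp /opt/openhabian/includes/ztab /etc/ztab   # restore the stock table
    sudo systemctl enable zram-config
    sudo shutdown -r now                              # proper reboot, per the Do's and Don'ts

(This assumes the zram-config binaries and service unit are still installed; edit disk_size in /etc/ztab before rebooting if you want the larger size.)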

No, you need to stop it and possibly reboot. See if df and zramctl show any remainders (see below).
Anything else I’m afraid is likely not a zram issue.
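
For example:

    df -h | grep -i zram    # any zram'ed mounts left over?
    zramctl                 # any /dev/zramX devices left over?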

Thanks Markus. I’ll make note of that thread for future reference.

I have decided to revisit ZRAM at a later date. Right now the Chromecast binding (and others) are in conflict with the REST Docs (Jackson library incompatibility). I am planning to migrate to MQTT2 and leveraging the REST Docs to clone my Things more easily (than hundreds of PaperUI clicks). All this to say that I’ll revisit ZRAM when I revisit MQTT2 (hopefully M4 resolves the Jackson library issues).

Mike

P.S.

I’m not sure what happened with Mosquitto. I had to uninstall it and reinstall it to resolve the issue. My openHAB deployment is stable again… I hope :wink: