UTF-8 encoding in JSR223

I’m having trouble getting Jython/JSR223 to correctly output my UTF-8 encoded umlauts:

This Python code
battery_check.log.info("Batteriestände werden überprüft")

in an (of course) UTF-8 encoded file results in
Batteriestände werden überprüft

Any idea what I’m missing?
FWIW, I’m using openHAB 2.5M6, Jython 2.7.1, and Visual Studio Code 1.40.2.

Would this work:

battery_check.log.info(u"Batteriestände werden überprüft")

The u prefix forces a unicode string.
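As a rough illustration of why the prefix matters, here is a minimal sketch that reproduces the garbled text outside openHAB. It assumes the UTF-8 bytes of the message are being read as Latin-1 somewhere on the way to the log; that is a guess at the cause, not something confirmed in this thread, and the variable names are made up for the example.

```python
# -*- coding: utf-8 -*-
# Sketch only: no openHAB logger involved, just the string handling.

text = u"Batteriestände werden überprüft"  # unicode literal, as suggested above

# The bytes a UTF-8 encoded source file stores for that text.
utf8_bytes = text.encode("utf-8")

# Assumption: something downstream decodes those bytes as Latin-1.
garbled = utf8_bytes.decode("latin-1")

# Decoding with the matching codec returns the original text.
restored = utf8_bytes.decode("utf-8")

print(garbled != text)   # the Latin-1 reading no longer matches
print(restored == text)  # the UTF-8 reading does
```

Each umlaut is two bytes in UTF-8, which is why every ä or ü shows up as two characters in the broken output.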

Do you have: -Dfile.encoding="UTF-8" in your JAVA_EXTRA_OPTS? If not, does adding it help?

Yes, I added that after posting (and of course restarted OH/Java), but unfortunately it did not change anything. Is it really “UTF-8”? I have read “utf8” somewhere, too.

Not sure, but Windows has it set to UTF-8 by default in setenv.bat (without the quotes, if that makes any difference), as a result of a fix for Windows users who were hitting the same issue.

Yes! Thanks. Now how do I make that the default?

In Python 2, the default encoding is ASCII, but you can declare the encoding for the file with a comment on its first or second line, as described in PEP 263 (Defining Python Source Code Encodings):

# -*- coding: utf-8 -*-

I played with this before when helping @RRoe migrate lucid back into the openhab Scripters repo, which you’ll see in ideAlarm. I am getting an error using this right now, but I’m not sure if that is due to my version of Jython (2.7.0) or an issue with my Jython bundle. BTW, I have gotten errors using 2.7.1 and 2.7.2b2 due to this bug. In Python 3 (look for GraalVM in OH3), the default is UTF-8.

When you are using special characters, you need a unicode string. I’ll add an entry for unicode strings in the But How Do I section of the HL docs. This will probably be one of the trickiest bits for noobs to understand when using scripted automation. In English you don’t run into this often, except for QuantityTypes, e.g.:

items["Temperature_Item"] > QuantityType(u"55 °F")

When writing your rules, if a variable could possibly contain special characters, like Item names, Item labels, QuantityTypes, etc., and it is used in a string, use unicode!
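To make that advice concrete, here is a small sketch of a helper that normalizes a value to unicode before it goes into a string. The name to_unicode and the sample label are hypothetical and not part of openHAB or the helper libraries. It also assumes incoming byte strings are UTF-8 encoded.

```python
# -*- coding: utf-8 -*-
import sys

# Python 2 / Jython 2.7 has a separate unicode type; Python 3 uses str.
if sys.version_info[0] < 3:
    text_type = unicode  # only evaluated on Python 2
else:
    text_type = str


def to_unicode(value, encoding="utf-8"):
    """Hypothetical helper: return value as a unicode string."""
    if isinstance(value, text_type):
        return value                   # already unicode, nothing to do
    if isinstance(value, bytes):
        return value.decode(encoding)  # byte string: decode it explicitly
    return text_type(value)            # numbers and other objects


# A label as it might arrive as raw UTF-8 bytes (made-up example value).
label = to_unicode(b"K\xc3\xbcche")

# Build the message from unicode pieces only.
message = u"{}: {} °C".format(label, 21.5)
```

Keeping every piece unicode avoids the implicit ASCII decoding that Python 2 attempts when a byte string and a unicode string are mixed.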
