General translation directives

DanielMalmgren · December 9, 2020, 1:08pm

Hi.
I thought I’d heed the call from @ysc to help with translation; I’m kinda good at Swedish.

One first question: As a general rule, should openHAB-specific terms like “Item”, “Thing” etc. be translated, or is it better to stick to the English terms? Of course there are Swedish words for these, but my general feeling is that it will be confusing for the user if they are translated.

ysc · December 9, 2020, 1:14pm

@DanielMalmgren https://crowdin.com/project/openhab-webui/discussions/10 just answered that very question here
You have language-specific discussion rooms on Crowdin btw!

Nadahar · July 4, 2022, 4:51am

I have just completed the translation to Norwegian for every openHAB project I could find on Crowdin that has Norwegian defined as a target language, except for the add-ons project, which is huge and where the quality of the strings seemed “questionable” in several places I looked. In the add-ons I only translated a selection of bindings that I use myself. I have no idea what many of them even do, so translating them would be very hard in addition to very time-consuming.

I am the owner of another project that is on Crowdin, and a manager for yet another, so I know Crowdin reasonably well. I don’t know how you have solved things internally or what the “rules” are for languages to be included with releases. Even though some 40% or so was already translated to Norwegian, I can’t remember having seen any Norwegian words anywhere in openHAB so far, so I assume there’s a certain threshold that must be met. If the threshold requires that the strings are approved in addition to being translated, I hope you have an active and able proofreader for Norwegian because there are probably around 10000 new words translated in Norwegian across the sub-projects. So, my primary goal with this post is to make sure that the time I have put into this isn’t worthless: that the translations are used in the next release.

My secondary goal is to try to call for a bit better discipline among those who write the original (English) strings. I’ve seen many strange things there. Some sub-projects have a LOT of duplicate strings, and while they are generally quick to fill in using TM suggestions, it gives you as a translator a feeling that this isn’t really something anybody cares much about. I know from my own experience how much it helps to make things a little bit easier for the translators: provide some context and, not least, source strings that are actually correct and have a clear meaning. Every second saved by not caring to write a proper sentence in English is paid for in multiples by each and every translator, and that can be very demotivating if you find it to be a rule rather than an exception.

Capitalization in particular is really crazy in many places. So many words are capitalized for seemingly no good reason that I sometimes suspected they were written by a German speaker who constantly “forgot” that nouns aren’t capitalized in most languages, including English. There’s also a lack of consistency. I read a “warning” that Items, Things, Sitemap etc. should be capitalized and NOT be translated because that had been decided, even though I’d argue that it’s not a good decision, at least not for Norwegian. I get the idea that you want them to be named consistently, but there’s another, much better solution for that. If you define these words (and other phrases that should be used consistently too) in the glossary per language, Crowdin will do a good job at reminding the translators to use the same terms everywhere. The reason I think leaving the terms untranslated doesn’t work well is the different forms: the endings you have to use depending on case, tense, plural form etc. simply don’t “fit” or exist for the English words. Because of that, the whole thing feels very “ad hoc” and unprofessional when you have to try to improvise as best you can to get something sensible out of it.

Anyway, what I was starting to say was about the inconsistency. Even though it is mandated that we should NOT translate these words and that they should be capitalized, in 30-40% of the cases they are NOT capitalized in English. This is demotivating because it feels like you’re being asked to do something that the authors don’t bother to do themselves, but it also means that Crowdin’s “information message” regarding these rules isn’t shown (because it is tagged to the capitalized version only). Punctuation is also highly variable, and I understand that there are many different authors involved, but some minimum standard could be ensured during code review.

A few strings simply don’t make any sense, at least to me, so I couldn’t translate them as I had no idea what they were supposed to say. Quite a few words that shouldn’t be translated like name or String.format() format strings are also included for translation even though they very much should not be. And that’s my last point: I saw a few comments made by translators, some quite old, and I made quite a few myself. But not once did I see an answer to any of the issues or requests for context. If you want to have a thriving community of translators, you must ensure that when there are questions or problems, they are addressed in a timely fashion. Remember that it’s very unlikely that the translator that wrote the comment is the only one facing that problem. Most people just don’t see the point in adding a lot of “me too” comments beneath, so by ignoring these you might cause frustration for a lot of different people. It all adds up, so be aware that some translators can just “suddenly disappear”. This can happen for a multitude of reasons obviously, but my experience is that if you manage to make them feel that they do something useful and that the work they put into it is actually used, they are much more likely to stick around.

DanielMalmgren · July 4, 2022, 7:08am

I wish I could say that @Nadahar is wrong, but unfortunately I have to agree with most of what is said above.

One thing I might mention is the possibility for a developer to put in a description for a string. Sometimes it’s impossible to know from just a short string what it’s used for, but if the developer has put in a short note about it, it’s much easier.

Also, these problems don’t only scare away translators, they of course affect the end product, which makes it look unprofessional. If I start using a program that’s badly translated it doesn’t really matter how good it performs, it looks amateurish anyway…

wborn · July 4, 2022, 7:20am

Hi @Nadahar! Thanks for helping out with the translations.

Like anything in openHAB, it’s an ongoing effort which can always be improved upon.

If you know where to find it, some of the issues you mention are documented:

When translations are merged: Managing translations
How to use or not use caps: Formatting Labels and Descriptions

There is a check static-code-analysis#360 which we can enable to make texts more consistent.

The String.format patterns are there because the pattern used for formatting of date/time is locale specific. Typically you’ll only find patterns for date/time but it could for instance also make sense to add patterns which involve currencies.
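To illustrate why such a pattern can be locale-specific, here is a minimal sketch. The class name, the `PATTERNS` map and the two patterns are assumptions made up for this example; they are not taken from any openHAB resource file. It only shows how the same `String.format` call gives a different date layout once the pattern itself is "translated".

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.Map;

class LocalizedDatePattern {
    // Hypothetical per-locale patterns, standing in for translated resource strings.
    static final Map<String, String> PATTERNS = Map.of(
            "en-US", "%1$tm/%1$td/%1$tY",   // month/day/year
            "nb-NO", "%1$td.%1$tm.%1$tY");  // day.month.year

    // Formats the date with the pattern registered for the given language tag.
    static String format(String languageTag, LocalDate date) {
        String pattern = PATTERNS.get(languageTag);
        return String.format(Locale.forLanguageTag(languageTag), pattern, date);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2022, 7, 4);
        System.out.println(format("en-US", date)); // 07/04/2022
        System.out.println(format("nb-NO", date)); // 04.07.2022
    }
}
```

In a case like this the "translation" consists of reordering the specifiers and changing the separators, which is presumably why such strings end up on Crowdin even though they contain no words.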

If there aren’t many proofreaders for a language, strings are also approved after some sanity checks to get the translations going! Even if you’re not a native speaker, you can still check if they look like actual translations (chars match language alphabet) and it is not just copy/pasted English.

Nadahar · July 4, 2022, 11:32am

Yeah, I was a bit afraid of sounding “too harsh”, which wasn’t my intention. As I’m both a developer and a Norwegian translator in other projects, I know this from both sides and I won’t claim that the strings I myself create are always “perfect”. But since I know both sides, I think I try to care a little bit more than many when I create them. My post is meant as constructive criticism, to hopefully give the “ongoing effort” a small “push” in the right direction.

Thanks, as with anything in openHAB it can be somewhat overwhelming to find just that piece of information that you’re looking for. When it comes to capitalization I am generally familiar with the “capitalization rules” for labels, titles etc. What I was trying to describe is the lack of consistency of how this is applied, but also that there’s a lot of capitalized words in “longer texts” that aren’t labels/titles/names.

Without digging into the details of that check, I can’t evaluate to what degree I think it would make a difference. It could very well be helpful, but in general I’ve found that analyzing “human language texts” automatically usually has a limited impact. I think that a bit of extra attention from the humans involved might be the best solution.

I was probably unclear on this. I know why they are there and I use them quite a lot myself when creating translatable strings. What I’m trying to address is the fact that strings that consist only of format patterns, and that don’t have any natural language in them, are submitted. They obviously have no business on Crowdin; they should not be translated.
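A check for this kind of string could be sketched roughly as follows. This is only an illustration, assuming `java.util.Formatter`-style specifiers; the class and method names are made up, and it is not an existing openHAB or Crowdin check. It strips all format specifiers and reports whether any letters are left for a translator to work with.

```java
import java.util.regex.Pattern;

class FormatOnlyCheck {
    // Matches java.util.Formatter specifiers such as %s, %1$tY, %-10.2f or %%.
    private static final Pattern SPECIFIER = Pattern.compile(
            "%(\\d+\\$)?[-#+ 0,(<]*(\\d+)?(\\.\\d+)?[tT]?[a-zA-Z%]");

    // Returns true when only specifiers, digits, whitespace and punctuation remain.
    static boolean isFormatOnly(String source) {
        String remainder = SPECIFIER.matcher(source).replaceAll("");
        return remainder.codePoints().noneMatch(Character::isLetter);
    }

    public static void main(String[] args) {
        System.out.println(isFormatOnly("%1$tY-%1$tm-%1$td")); // true
        System.out.println(isFormatOnly("%s: %s"));            // true
        System.out.println(isFormatOnly("Last updated: %s"));  // false
    }
}
```

A hit does not prove that the string is untranslatable (a date pattern may legitimately differ per locale), so the result is better treated as a list of candidates for a human to review than as an automatic filter.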

I employ a different strategy in the projects I manage. As the manager I can approve translations in all languages, but I rarely do in languages that I do not understand. I can evaluate Norwegian, Swedish, Danish and to some extent German, but then it pretty much stops. Many other European languages I might be able to evaluate somewhat if I’m “lucky” with the string content, but there’s a lot of languages that I won’t touch at all (Arabic, Hebrew, Farsi, Chinese, Japanese, Korean, Filipino, Russian, Ukrainian etc.).

Instead I use a different strategy: I try to make sure to have at least one native proofreader for each language. I look at contributions and if they are in a reasonable quantity and look to be of a decent quality, I contact the translators via Crowdin PM and have a short conversation about their intentions, whether they are willing to stick around etc. I usually manage to find “decent” proofreaders without too much effort.

In addition I include non-approved translations. My reasoning is the following: Crowdin has a voting system where translators can vote translations up or down. The highest voted translation will be used unless one is approved, in which case that will always be used. If you limit yourself to only approved translations, you “disable” this functionality completely. I only want quality translations to be approved, so I need a native speaker to make that decision. A less-than-perfect translation is usually better than no translation, in which case the English string will be presented to the users instead.
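The selection order described above can be written down as a small sketch. The `Candidate` class and the `pick` method are hypothetical and only model the rule as stated in this post; they are not Crowdin’s actual data model or export logic.

```java
import java.util.List;

class TranslationPicker {
    // Hypothetical model of one suggested translation for a source string.
    static final class Candidate {
        final String text;
        final int votes;
        final boolean approved;

        Candidate(String text, int votes, boolean approved) {
            this.text = text;
            this.votes = votes;
            this.approved = approved;
        }
    }

    // An approved translation always wins; otherwise the highest voted one;
    // with no candidates at all, the English source string is used.
    static String pick(String english, List<Candidate> candidates) {
        Candidate best = null;
        for (Candidate c : candidates) {
            if (c.approved) {
                return c.text;
            }
            if (best == null || c.votes > best.votes) {
                best = c;
            }
        }
        return best != null ? best.text : english;
    }

    public static void main(String[] args) {
        List<Candidate> unapproved = List.of(
                new Candidate("Spar", 1, false),
                new Candidate("Lagre", 4, false));
        System.out.println(pick("Save", unapproved)); // Lagre
        System.out.println(pick("Save", List.of()));  // Save
    }
}
```

Exporting only approved translations corresponds to dropping the vote-based branch, so every string without an approval falls straight through to English.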

If the worry is pure “spam”, that people fill in “bullshit” to somehow sabotage, I’ve yet to experience this. Also, my proofreader for that language will quickly pick that up and delete those (proofreaders can delete translations). I can also “ban” such users manually should it be necessary. Copy/pasted English almost falls into the same category, but: A person that’s there to sabotage probably will do “worse” than just pasting the English text. A person that’s just lazy won’t volunteer to translate in the first place, so you generally don’t find them among translators. The one exception was when one of the cryptocurrencies had some program to “help” open source software by awarding people with coins for “helping” in various ways, where the number of approved Crowdin translations was one of the criteria. That was rough. I constantly got PMs from people asking for proofreader permissions, and a lot of really bad translations were submitted. But this was a one-time thing that lasted for a month or so, and I haven’t seen anything like it since. Also, the result of just pasting the English text isn’t that catastrophic. The English strings are what will be used where translations are missing anyway, so to the user that will make no difference. You obviously don’t want such “translations” to be approved though, but instead be subject to the “voting rules” and preferably be deleted by the proofreader, so that they don’t “count” as translations, making you think you have better coverage than you do.

wborn · July 5, 2022, 7:22am

Feel free to come up with PRs to fix any incorrect usage of caps.

IIRC @cweitkamp initially configured the Crowdin config for core/add-ons with something that seemed reasonable and so far there has not been a compelling reason to make any changes to it.

It might make sense to change it if the number of translations for languages without a proofreader becomes overwhelming such that the managers cannot keep up with it.

It would also be nice if Crowdin had an option to only require accepted translations for languages that have active proofreaders.

Also note that some repos like openhab-webui use a different config that is more like what you use… though I don’t think it results in more translations or the translations being merged faster.

Nadahar · July 5, 2022, 1:57pm

I would actually have done that on several occasions already (not for translation strings though) were it not for the DCO that requires you to post your full name and e-mail online. Unfortunately, it effectively blocks me, and many others I’m sure, from contributing.

Sure, I’m just trying to give some feedback after spending some intense 24 hours or so doing little else than making openHAB translations. I’m not saying it’s terrible, but I see some potential improvements that I think will increase both the quantity and quality of translations.

My decisions for how to do things aren’t about getting translations in as fast as possible. They’re about getting the highest quality translations. I always prefer approved translations, and Crowdin will always use the approved translation if one exists. I just don’t want “inferior” translations to be approved, so I want a native speaker with some dedication to do the approvals. If that can’t be achieved, I think Crowdin’s vote system has a better chance to achieve that than managers that don’t really know the language approving “what they think looks OK”.

wborn · July 5, 2022, 2:12pm

If it is just about changing caps/typos, that would fall into the small patch exception category:

There are several exceptions to the signing requirement. Currently these are:

Your patch fixes spelling or grammar errors.

Your patch is a single line change to documentation.

Nadahar · July 5, 2022, 2:14pm

OK, I wasn’t aware of that. I’ve just generally had to accept that I can’t contribute, and have long since removed the “openHAB version of Eclipse” etc.

Nadahar · July 5, 2022, 7:31pm

I just made this:

github.com/openhab/openhab-android: Remove untranslatable string from strings.xml
openhab:main ← Nadahar:Fix · opened 07:27PM - 05 Jul 22 UTC by Nadahar · +3 -9

There is one string in `strings.xml` where there's nothing to translate. I see that you have added some explanatory context, but I argue that it doesn't really help a translator much in this case. What exactly should a translator do? Replace the colon? It's very hard for a translator to know what the consequences of that would be, and I don't know if there's a language where such a change would even be viable. I see it instead as an "obstacle" that confuses translators. Not a big deal by itself, but many small streams... I have never seen Kotlin code before, and have absolutely no idea about its syntax. I do know Java well though, and this change was simple enough that I gave it a shot. I managed to get it running on an emulator in Android Studio, although I didn't figure out how to trigger using that exact string. At least it's compiling/running. I made this as a consequence of [this post](https://community.openhab.org/t/general-translation-directives/109812/9), but I see now that even this tiny change exceeds the [tiny patch exception](https://www.openhab.org/docs/developer/contributing.html#small-patch-exception), so it probably will never be merged even if you happen to agree with the change itself. I'll let it stay for a little while anyway, since I've already done it, so that one of you can use it as a "guide" to make the same change if you so wish.

However, it’s apparent to me that even this tiny change goes beyond the “small patch exception”. So, while this was a waste of time, it serves as a good example of the unreasonableness of the whole DCO arrangement. I’d like to see that lawsuit that I could win by claiming copyright over the code with such a change…

hmerk · July 5, 2022, 7:45pm

No, absolutely not.
Yes, checks show that DCO failed, but this is because it cannot be configured with exceptions.
Maintainers can and will override the failed DCO check in such cases.
So don’t worry, it was no waste of time.