Jenkins build problems

Kai · April 1, 2017, 5:39pm

All,

Our builds have been broken since yesterday (see https://openhab.ci.cloudbees.com/job/openHAB2-Bundles/ and https://openhab.ci.cloudbees.com/job/openHAB1-Addons/) - I honestly have no clue what the error message is trying to tell me or why this is happening all of a sudden.

If any of you has an idea and can help, please let me know - I would highly appreciate it!

Cheers,
Kai

MHerbst · April 1, 2017, 6:50pm

Hi Kai,

looks really strange: I have compared it with the last successful build and can’t see an obvious reason for the problem.

Some time ago I had a strange build problem that was caused by a damaged jar file in the local Maven repository. I removed the affected subdirectory from the Maven repository to trigger a new download, and this helped. Maybe it is a similar problem. You could try removing the directory /home/jenkins/.m2/repository/org/apache/karaf.

You could also try running Maven with the options -X -e to get more detailed information.
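For readers who want to try the repository cleanup described above, here is a minimal Python 3 sketch. It relies only on the standard Maven local-repository layout, in which a groupId such as org.apache.karaf lives under <repo>/org/apache/karaf; the function name purge_group is hypothetical and not part of any Maven or Jenkins tooling.

```python
import shutil
from pathlib import Path


def purge_group(repo_root: Path, group_id: str) -> Path:
    """Delete all cached artifacts of one Maven groupId from a local repository.

    Maven stores a groupId as nested directories, e.g. org.apache.karaf
    -> <repo_root>/org/apache/karaf. Removing that directory forces Maven
    to download those artifacts again on the next build.
    """
    group_dir = repo_root.joinpath(*group_id.split("."))
    if group_dir.is_dir():
        # Remove the whole subtree; other groups are left untouched.
        shutil.rmtree(group_dir)
    return group_dir
```

For the case above, the call would be purge_group(Path("/home/jenkins/.m2/repository"), "org.apache.karaf"), followed by a normal Maven run (with -X -e added if more detail is needed).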

Martin

Kai · April 2, 2017, 4:14pm

Thanks for the suggestions; unfortunately, neither really seems to help much, see https://openhab.ci.cloudbees.com/job/openHAB2-Bundles/351/consoleFull.

MHerbst · April 2, 2017, 5:11pm

Perhaps I have come a bit closer to the cause of the problem: the missing class is org.sonatype.aether.RepositorySystem. This class is part of “aether-util”. If you now look at the classpath, you can see that it contains two versions of this library:

18:11:42.641  |  [ERROR] urls[3] = file:/home/jenkins/.m2/repository/org/sonatype/aether/aether-util/1.11/aether-util-1.11.jar
18:11:42.641  |  [ERROR] urls[4] = file:/home/jenkins/.m2/repository/org/eclipse/aether/aether-util/0.9.0.M2/aether-util-0.9.0.M2.jar

The log file even shows references to aether-util 1.7.

Is it possible that one of the POM files was changed after the last successful build and that this change is now causing the problem?
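Spotting such duplicates by eye is error-prone in a long -X log. The following Python 3 sketch is one way to automate it, assuming the classpath is printed as urls[N] = file:...jar lines like the two quoted above and that the jars sit in a standard local-repository layout. The function name find_duplicate_artifacts is hypothetical.

```python
import re
from collections import defaultdict

# Matches classpath entries as printed in the Jenkins log, e.g.
# "[ERROR] urls[3] = file:/home/jenkins/.m2/repository/.../x-1.0.jar"
URL_LINE = re.compile(r"urls\[\d+\] = file:(\S+\.jar)")


def find_duplicate_artifacts(log_text: str, repo_marker: str = "/repository/") -> dict:
    """Return artifactIds that appear on the classpath with more than one group/version.

    Assumes the standard local-repository layout
    <repo>/<group path>/<artifactId>/<version>/<artifactId>-<version>.jar.
    """
    seen = defaultdict(set)
    for match in URL_LINE.finditer(log_text):
        path = match.group(1)
        if repo_marker not in path:
            continue  # not a jar from the local repository, skip it
        parts = path.split(repo_marker, 1)[1].split("/")
        if len(parts) < 4:
            continue  # too short to hold group/artifact/version/file
        group_id = ".".join(parts[:-3])
        artifact_id, version = parts[-3], parts[-2]
        seen[artifact_id].add(f"{group_id}:{version}")
    # Keep only artifactIds seen under more than one group:version pair.
    return {a: sorted(v) for a, v in seen.items() if len(v) > 1}
```

Fed the two urls[3]/urls[4] lines quoted above, it returns {'aether-util': ['org.eclipse.aether:0.9.0.M2', 'org.sonatype.aether:1.11']}, i.e. the same artifactId coming from two different groups.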

Kai · April 2, 2017, 6:51pm

My local build does not load aether-util-0.9.0.M2.jar, only 1.11 and 1.7. So yes, maybe this is the issue - but I have no clue why this (pretty old) version is loaded on Cloudbees.

> Is it possible that one of the POM files was changed after the last successful build and that this change is now causing the problem?

No, not really - in particular, it started failing for openhab1 and openhab2 at the same time.

MHerbst · April 2, 2017, 7:13pm

It is really confusing. The Sonatype project page on GitHub shows this message:

> DEPRECATED: This project moved to Eclipse, please follow the link below to find the new sources.
> Archived Projects | The Eclipse Foundation

The last version published by Sonatype was 1.13. In the Eclipse repo you can find versions from 0.9.0.M1 to 1.0.2. But I am not really sure whether the Eclipse and the Sonatype packages contain the same classes …
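Whether two jars actually provide the same classes can be checked directly, since a jar is just a zip archive. A minimal Python 3 sketch follows; the helper names are hypothetical, and the jar paths you pass in would be the ones from the local repository (for example the two aether-util jars listed earlier in this thread).

```python
import zipfile


def jar_contains_class(jar_path: str, class_name: str) -> bool:
    """Check whether a jar contains the given fully qualified class.

    A class such as org.sonatype.aether.RepositorySystem is stored in the
    archive as the entry org/sonatype/aether/RepositorySystem.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()


def class_packages(jar_path: str) -> set:
    """Return the set of Java packages that have .class files in the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return {
            # "org/x/Foo.class" -> "org.x"
            name.rsplit("/", 1)[0].replace("/", ".")
            for name in jar.namelist()
            if name.endswith(".class") and "/" in name
        }
```

For instance, jar_contains_class(path, "org.sonatype.aether.RepositorySystem") tells you whether a given jar provides the missing class, and intersecting the results of class_packages for two jars shows whether they share any packages at all.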

> No, not really - in particular, it started failing for openhab1 and openhab2 at the same time.

Maybe the problem is caused by a third-party component that has changed its POM and now contains a wrong reference.

I checked the log of the last successful build and compared it with that of the first failed build, but I can’t see any relevant differences.
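A rough way to automate such a log comparison is to diff the set of jar file names that each console log mentions. The Python 3 sketch below assumes only that jar names appear somewhere in the saved logs (Maven's exact download and classpath line formats vary between versions, so the pattern is deliberately loose); the function names are hypothetical. It will not catch differences that are not reflected in jar names.

```python
import re

# Any token ending in ".jar"; deliberately loose so it matches jar names in
# download lines, classpath dumps and URLs alike.
JAR_NAME = re.compile(r"[\w.\-]+\.jar")


def jars_in_log(log_text: str) -> set:
    """Collect the jar file names mentioned anywhere in a console log."""
    return set(JAR_NAME.findall(log_text))


def diff_logs(good_log: str, bad_log: str) -> tuple:
    """Return (jars only in the failing log, jars only in the good log)."""
    good, bad = jars_in_log(good_log), jars_in_log(bad_log)
    return sorted(bad - good), sorted(good - bad)
```

Run it on the saved console output of the last good build and the first failing one; the first list shows jars that only the failing build references, the second those that disappeared.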

I don’t know whether it is possible for you to run a test build with a newer Karaf version, at least for the failing build step.

Kai · April 2, 2017, 7:21pm

It is not the Maven build itself; something must be wrong with the Cloudbees build plan.
As a test, I just successfully ran https://openhab.ci.cloudbees.com/job/openHAB-Core/ - it does exactly the same thing that https://openhab.ci.cloudbees.com/job/openHAB2-Bundles/ is failing on…

Kai · April 2, 2017, 9:14pm

Just spent 3 hours on this crap, giving up now

chris · April 2, 2017, 10:02pm

Maybe it’s worth speaking to Cloudbees support - I recall a problem about a year ago where the builds broke and we couldn’t work out why. I spoke to CB support and they admitted there was a problem with a recent upgrade and they reverted it for us after I asked. Maybe it’s a similar issue…

bob_dickenson · April 2, 2017, 11:53pm

My advice – sleep on it. It is amazing how much a little rest can do to clarify things. I cannot begin to count the number of times in my career where the next morning (or even during a REM state) some heretofore opaque but now obvious solution appears. Seriously, sleep on it.

MHerbst · April 3, 2017, 6:17am

Did you try to delete the job’s workspace before executing the job?

If this doesn’t work, it is probably best to contact support.

Kai · April 3, 2017, 9:39am

Ok,

I slept - still no good idea or success
I contacted Cloudbees support - no changes were made on their end and they don’t have any advice
I deleted the workspace and also created a fully new build plan - same effect

Will try to come up with new ideas - any suggestions welcome

noctarius · April 3, 2017, 10:32am

Based on Hazelcast’s experience with Cloudbees, they often have issues with DNS and other systems. Sometimes just restarting the underlying VM helps, but I would recommend (if not already enabled) cleaning the repo on each build and re-downloading all the artifacts.

Kai · April 3, 2017, 3:38pm

Tried with a clean local Maven repo many times, didn’t help either…

noctarius · April 3, 2017, 3:52pm

Hmm, did Cloudbees restart the VM?

Kai · April 3, 2017, 4:59pm

Partial success: By running the very old Maven 3.0.5, the PR builds actually work again: https://openhab.ci.cloudbees.com/job/PR-openHAB2-Addons/4369/

This does not help us for https://openhab.ci.cloudbees.com/job/openHAB2-Bundles/ though, since we require at least Maven 3.1 on that one…

And I still have no clue why this downgrade solves the issue…

noctarius · April 3, 2017, 5:01pm

I would guess a transitive dependency was updated and something is broken in it.

MARZIMA · April 4, 2017, 8:25am

@Kai,
it could also be something “twisted” on the actual build machine at Cloudbees. My experience is that when things like this happen, they have changed configs or settings of the Jenkins build host.
Do we have SSH access? If not, can we ask them to assign a “new” machine to test this?
Just some ideas…

MHerbst · April 4, 2017, 4:57pm

We could also try to execute it on a Jenkins installation that is not hosted on Cloudbees (if this is possible). I have a Jenkins installation (on a VServer) where I could try it.
If I am not wrong, it should be possible to execute this job without any preceding jobs. I might need some login data (Artifactory, Travis), or I would remove these steps from the job.

Kai · April 4, 2017, 9:16pm

A different instance won’t really help us.
Meanwhile, Cloudbees support has managed to set up a copied build plan (https://openhab.ci.cloudbees.com/job/test-zd47405/) that suddenly succeeds after many reconfigurations. Unfortunately, nobody can see any difference from our real build plan, and the problem remains on other plans like the distro build…
I am now working on cleaning up and restructuring the POM and dependency setup of the projects, in the hope that I can then identify and isolate the root cause more clearly. Will keep you posted.
At least, the PR builds work for now, so this takes away some time pressure.