Jenkins build problems

Did you try to delete the job’s workspace before executing the job?

If this doesn’t work, it is probably best to contact support.

Ok,

  • I slept - still no good idea or success
  • I contacted Cloudbees support - no changes were done on their end and they don’t have any advice
  • I deleted the workspace and also created a fully new build plan - same effect

Will try to have new ideas - any suggestions welcome :frowning:

Coming from the Hazelcast experience with Cloudbees, they often have issues with DNS and other systems. Sometimes just restarting the underlying VM helps, but I would recommend (if not yet enabled) cleaning the repo on each build and re-downloading all the artifacts.

Tried with a clean local Maven repo many times, didn’t help either…

mh :frowning: Did Cloudbees restart the VM?

Partial success: By running the very old Maven 3.0.5, the PR builds actually work again: https://openhab.ci.cloudbees.com/job/PR-openHAB2-Addons/4369/

This does not help us for https://openhab.ci.cloudbees.com/job/openHAB2-Bundles/ though, since we require at least Maven 3.1 on that one…

And I still have no clue why this downgrade solves that issue…

I would guess a transitive dependency update and something is broken with that.

@Kai,
it could also be something “twisted” on the actual build machine at Cloudbees. My experience is that when things like this happen, they have changed configs or settings of the Jenkins build host.
Do we have ssh access? If not, can we ask them to assign a “new” machine to test this maybe?
Just some ideas…


We could also try to execute it on a Jenkins installation that is not hosted on Cloudbees (if this is possible). I have got a Jenkins installation (on a VServer) where I could try it.
If I am not wrong, it should be possible to execute this job without any other preceding jobs. Maybe I would need some login data (Artifactory, Travis), or I would remove these steps from the job.

A different instance won’t really help us.
Meanwhile, Cloudbees support has managed to set up a copied build plan (https://openhab.ci.cloudbees.com/job/test-zd47405/) that after many re-configurations suddenly succeeds. Unfortunately, nobody sees any difference to our real build plan and the problem remains on other plans like the distro build…
I am now working on cleaning/restructuring the pom&dependency setup of the projects in the hope that I can then more clearly identify and isolate the root cause. Will keep you posted.
At least, the PR builds work for now, so this takes away some time pressure.

Did you already try it with the check disabled, or even with the dependency on the checks removed completely? That could just be a PR.

A mvn dependency tree reveals some small conflicts but none look really interesting:

[11:52:21] : [Step 1/4] [INFO] | | | - (org.slf4j:slf4j-api:jar:1.6.4:provided - omitted for conflict with 1.7.12)
[11:52:21] : [Step 1/4] [INFO] | | - (jline:jline:jar:2.7:provided - omitted for conflict with 2.14.1)
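In case somebody wants to pull these conflict entries out of a full build log instead of scrolling through it, here is a minimal sketch. It assumes the log contains verbose dependency tree output (e.g. from `mvn dependency:tree -Dverbose`) in the same shape as the two lines above; the function name `find_conflicts` is made up for this example.

```python
import re

# Matches "(group:artifact:type:version:scope - omitted for conflict with X)"
# entries as they appear in the two log lines quoted above.
CONFLICT = re.compile(
    r"\((?P<group>[^:()\s]+):(?P<artifact>[^:]+):(?P<type>[^:]+)"
    r":(?P<version>[^:]+):(?P<scope>[^\s:]+)"
    r" - omitted for conflict with (?P<winner>[^)]+)\)"
)


def find_conflicts(log_text):
    """Return (group:artifact, omitted_version, selected_version) tuples."""
    return [
        (f"{m['group']}:{m['artifact']}", m["version"], m["winner"])
        for m in CONFLICT.finditer(log_text)
    ]


sample = """\
[11:52:21] : [Step 1/4] [INFO] | | | - (org.slf4j:slf4j-api:jar:1.6.4:provided - omitted for conflict with 1.7.12)
[11:52:21] : [Step 1/4] [INFO] | | - (jline:jline:jar:2.7:provided - omitted for conflict with 2.14.1)
"""

for name, omitted, selected in find_conflicts(sample):
    print(f"{name}: {omitted} omitted, {selected} selected")
```

Running the same function over the logs of a working and a failing build would show whether the two resolve any artifact differently.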

Some code was just merged to improve the compatibility of the static-code-analysis project with older Maven versions.

Tried it with the static code analysis tool completely being removed from the pom.xml - no difference.

This plan for openhab-core just succeeded: https://openhab.ci.cloudbees.com/job/openHAB-Core/
It also runs the Karaf verification (which fails on the other builds).

Did that and spent many hours for it. All repos should now be nicely decoupled from each other. Unfortunately there is no improvement on Jenkins :frowning:

Just released a new version of the static analysis tool with this change, but it does not make a difference (as expected after the above observation that it fails even without that tool).

I removed our plan and renamed the new build plan and changed it to deploy the artifacts AFTER a successful build to Artifactory (this change cannot possibly have any relevance) - executed it after the renaming and it swiftly failed as well :frowning:

  • Loaded Jenkins config from disk - no effect
  • Restarted Jenkins - no effect

Again very much out of ideas :rotating_light:
Sorry for that…

Do we fetch the data from the same artifactory as we publish it to? Can it be that we publish (corrupt) distributables which can not be loaded back in properly?

While running on my buildserver it fetches this one remotely:

[19:06:49][Step 1/3] Downloading: https://repo.eclipse.org/content/repositories/snapshots/org/openhab/io/org.openhab.io.transport.feed/2.1.0-SNAPSHOT/maven-metadata.xml
[19:06:49][Step 1/3] Downloading: JFrog
[19:06:51][Step 1/3] Downloaded: JFrog (2 KB at 1.0 KB/sec)

I have compared the log files and found a difference in the maven parameters:

The successful build used these parameters:

Executing Maven:  -B -f /scratch/jenkins/workspace/test-zd47405/pom.xml -U clean install -DskipChecks=true -P verify-features

In the failed jobs, the profile “verify-features” was missing:

Executing Maven:  -B -f /scratch/jenkins/workspace/openHAB2-Bundles/pom.xml -U clean install -DskipChecks=true
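This kind of comparison can also be scripted, so that it is easy to repeat for the other plans. A rough sketch, assuming the logs contain an "Executing Maven:" line like the two above; `maven_args`, `diff_args` and the `VALUE_OPTIONS` set are names invented for this example, and only the options seen in these two lines are handled.

```python
import shlex

# Options whose value is the following token. Only the ones that occur in the
# two log lines above are listed; extend as needed (assumption, not exhaustive).
VALUE_OPTIONS = {"-f", "-P"}


def maven_args(log_line):
    """Split an 'Executing Maven:' log line into a list of arguments,
    keeping options such as -P together with their value."""
    tokens = shlex.split(log_line.partition("Executing Maven:")[2])
    args, i = [], 0
    while i < len(tokens):
        if tokens[i] in VALUE_OPTIONS and i + 1 < len(tokens):
            args.append(f"{tokens[i]} {tokens[i + 1]}")
            i += 2
        else:
            args.append(tokens[i])
            i += 1
    return args


def diff_args(good_line, bad_line):
    """Return (only_in_good, only_in_bad), ignoring the -f pom path,
    which differs per workspace anyway."""
    def relevant(line):
        return {a for a in maven_args(line) if not a.startswith("-f ")}

    good, bad = relevant(good_line), relevant(bad_line)
    return sorted(good - bad), sorted(bad - good)


good = "Executing Maven:  -B -f /scratch/jenkins/workspace/test-zd47405/pom.xml -U clean install -DskipChecks=true -P verify-features"
bad = "Executing Maven:  -B -f /scratch/jenkins/workspace/openHAB2-Bundles/pom.xml -U clean install -DskipChecks=true"

only_good, only_bad = diff_args(good, bad)
print("only in successful build:", only_good)
print("only in failed build:", only_bad)
```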

Another difference was this message that only appeared in the failed builds (several times):

[INFO] Using Groovy-Eclipse compiler to compile both Java and Groovy files

Can you change the job to create a verbose log of the Karaf feature generation using e.g.

(cd openhab-core/features/openhab-core; mvn -X org.apache.karaf.tooling:karaf-maven-plugin:4.0.8:features-generate-descriptor)

Hm, AFAIK it is part of “aether-api”

@Kai did you also check if Jenkins is not using some kind of mixed up Maven version? I can reproduce exactly the same Exception when I copy /lib/aether-*.jar from a Maven 3.0.5 to a Maven 3.3.3 /lib installation folder.
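If somebody has shell access to the build machine, a quick check for such a mix-up could look like the sketch below. It only groups the `aether-*.jar` file names in a Maven `lib` folder by version; the helper names and the sample file names are made up for illustration, and the file name pattern is an assumption about how these jars are named.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme: aether-<module>-<version>.jar
JAR = re.compile(r"^(aether-[a-z-]+?)-(\d[\w.]*)\.jar$")


def aether_versions(jar_names):
    """Group aether module names by version, e.g. {"1.13.1": {"aether-api"}}."""
    versions = defaultdict(set)
    for name in jar_names:
        match = JAR.match(name)
        if match:
            versions[match.group(2)].add(match.group(1))
    return dict(versions)


def check_maven_lib(lib_dir):
    """Inspect the aether jars in a Maven lib folder (path supplied by the caller)."""
    return aether_versions(p.name for p in Path(lib_dir).glob("aether-*.jar"))


# Illustrative file names only, not taken from the actual build machine.
example = aether_versions([
    "aether-api-1.13.1.jar",
    "aether-impl-1.0.2.v20150114.jar",
    "maven-core-3.3.3.jar",
])
if len(example) > 1:
    print("mixed aether versions found:", sorted(example))
```

More than one version in the result would hint at jars from different Maven installations sharing one `lib` folder.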

Hm, AFAIK it is part of “aether-api”

From looking at these jars, I can confirm that.

Additionally, there is a “switch” in Karaf’s DependencyHelperFactory which uses RepositorySystem either from the sonatype namespace or from the eclipse one. From looking at the code, it seems as if it decides for the sonatype one because Plexus thinks it is available, but it is actually not on the classpath.

I don’t really get why it is not reproducible outside of Cloudbees. There must be something in the way Maven is started there which leads to this difference. Nevertheless, I created a build with debug logging and exception stacktraces (i.e. -e -X active) here. Unfortunately, it fails too early to give us (me?) any additional useful info, and the effectively used aether versions cannot be seen either.

Any further ideas anybody?