Description
A user reported an NPE on startup here:
http://mail-archives.apache.org/mod_mbox/oozie-user/201507.mbox/%3cCALBGZ8oZ0GZ+hf76nQYKxiATHH5g2gbQ_0sQ78uQv_=r4Hct=Q@mail.gmail.com%3e
I did some digging, and the problem is that Oozie is trying to load the ShareLib, but the FileSystem class variable is null because the ShareLibService wasn't able to create it on init. That would normally cause Oozie to fail on startup, but the default value of oozie.service.ShareLibService.fail.fast.on.startup is false, so the failure gets ignored.
The code in question is this:
    try {
        fs = FileSystem.get(has.createJobConf(uri.getAuthority()));
        //cache action key sharelib conf list
        cacheActionKeySharelibConfList();
        updateLauncherLib();
        updateShareLib();
    }
    catch (Throwable e) {
        if (failOnfailure) {
            LOG.error("Sharelib initialization fails", e);
            throw new ServiceException(ErrorCode.E0104, getClass().getName(), "Sharelib initialization fails. ", e);
        }
        else {
            // We don't want to actually fail init by throwing an Exception, so only create the ServiceException and
            // log it
            ServiceException se = new ServiceException(ErrorCode.E0104, getClass().getName(),
                    "Not able to cache sharelib. An Admin needs to install the sharelib with oozie-setup.sh and issue the "
                            + "'oozie admin' CLI command to update the sharelib", e);
            LOG.error(se);
        }
    }
where failOnfailure is false by default. So, fs ends up being null, and if anything later tries to use it, you get an NPE.
I think we should do two things here:
- Creating the FileSystem should be in a different try-catch so that the failOnfailure doesn't affect it. The original intention of that behavior was to ignore ShareLib failures, not Hadoop failures.
- We should improve the default Hadoop configuration (i.e. oozie.service.HadoopAccessorService.hadoop.configurations). This has been a problem for a while now: out of the box, Oozie doesn't work even for a local pseudo-cluster because of this config's default. If that's not possible, we need to make it more obvious that users must configure this before doing anything.
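A rough sketch of the first proposal, to show the intended control flow rather than an actual patch. Everything here is a hypothetical stand-in: ShareLibInitSketch, Fs, FsFactory and ShareLibLoader are made-up names that replace ShareLibService, Hadoop's FileSystem and the sharelib caching calls, and IllegalStateException stands in for ServiceException. The point is only that FileSystem creation gets its own try-catch that always fails init, while the fail-fast flag governs just the sharelib step.

```java
import java.io.IOException;

/** Hypothetical stand-in for ShareLibService; not the real Oozie class. */
class ShareLibInitSketch {

    /** Stand-in for org.apache.hadoop.fs.FileSystem (assumption for this sketch). */
    interface Fs { }

    /** Stand-in for the code that creates the FileSystem; may fail. */
    interface FsFactory {
        Fs create() throws IOException;
    }

    /** Stand-in for the sharelib caching/update steps; may fail. */
    interface ShareLibLoader {
        void load(Fs fs) throws IOException;
    }

    private final boolean failOnFailure;
    private Fs fs;
    private boolean shareLibLoaded;

    ShareLibInitSketch(boolean failOnFailure) {
        this.failOnFailure = failOnFailure;
    }

    void init(FsFactory factory, ShareLibLoader loader) {
        // Step 1: create the FileSystem in its own try-catch.
        // A failure here is a Hadoop problem, so it always fails init,
        // regardless of the fail-fast flag.
        try {
            fs = factory.create();
        } catch (Exception e) {
            throw new IllegalStateException("Could not create FileSystem", e);
        }

        // Step 2: only sharelib failures are subject to the fail-fast flag.
        try {
            loader.load(fs);
            shareLibLoaded = true;
        } catch (Exception e) {
            if (failOnFailure) {
                throw new IllegalStateException("Sharelib initialization fails", e);
            }
            // Log and continue; fs is guaranteed non-null at this point.
            System.err.println("Not able to cache sharelib: " + e.getMessage());
        }
    }

    Fs getFs() {
        return fs;
    }

    boolean isShareLibLoaded() {
        return shareLibLoaded;
    }
}
```

With this shape, a missing sharelib with the flag set to false still leaves a usable fs behind, while an unreachable or misconfigured Hadoop fails startup immediately instead of surfacing later as an NPE.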
Issue Links
- relates to OOZIE-1877 Setting to fail oozie server startup in case of sharelib misconfiguration (Closed)