Details
Type: Bug
Status: Open
Priority: Critical
Resolution: Unresolved
Affects Version/s: 3.0.0-alpha-1
Fix Version/s: None
Component/s: None
Description
We're running into an issue with the Spark integration when using Hadoop 2.7.2. The problem is this line of code from {{HBaseContext.scala}}:
ugi.setAuthenticationMethod(AuthenticationMethod.PROXY)
I'm not an expert, but I think that code is wrong. If we wanted to create a proxy user, we'd need to use {{UserGroupInformation.createProxyUser(...)}}, which would also set the realUser and so on. Also: I don't think it makes sense to create a proxy user on the client side? Chances are good that the user we're authenticating as doesn't even have proxy privileges, since those are usually only granted to servers.
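For comparison, here is a minimal sketch of the difference, using only the Hadoop UGI API (this is not HBase code, and the proxied principal name is made up for illustration):

{code:scala}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod

object ProxyUgiSketch {
  def main(args: Array[String]): Unit = {
    // What HBaseContext does today: mark the current UGI as PROXY.
    // Only the flag changes; no realUser is ever attached.
    val current = UserGroupInformation.getCurrentUser
    current.setAuthenticationMethod(AuthenticationMethod.PROXY)
    println(current.getRealUser) // null

    // What a real proxy-user setup looks like: createProxyUser links the
    // proxied principal to the authenticating (real) user.
    val real  = UserGroupInformation.getCurrentUser
    val proxy = UserGroupInformation.createProxyUser("someProxiedUser", real)
    println(proxy.getAuthenticationMethod) // PROXY
    println(proxy.getRealUser)             // the real UGI, not null
  }
}
{code}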
We've tried to trace where this line of code came from in Git, but it goes back to a code drop in Ted's original repo.
The error we're seeing actually occurs when (in a Spark job) we access HDFS, because {{KMSClientProvider}} has code like this:
actualUgi =
    (UserGroupInformation.getCurrentUser().getAuthenticationMethod() ==
        UserGroupInformation.AuthenticationMethod.PROXY)
        ? UserGroupInformation.getCurrentUser().getRealUser()
        : UserGroupInformation.getCurrentUser();
But we never set the realUser, so {{actualUgi}} ends up null, which later leads to a NullPointerException.
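To make the failure mode concrete, here is a hedged sketch of what that check sees after HBaseContext has flipped the flag (again just the UGI API, not the actual KMS code path):

{code:scala}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod

object KmsNullRealUserSketch {
  def main(args: Array[String]): Unit = {
    // Reproduce the state HBaseContext leaves behind: the current UGI is
    // flagged as PROXY but was never built via createProxyUser.
    UserGroupInformation.getCurrentUser
      .setAuthenticationMethod(AuthenticationMethod.PROXY)

    // The same decision KMSClientProvider makes when picking the UGI to use.
    val actualUgi =
      if (UserGroupInformation.getCurrentUser.getAuthenticationMethod ==
          AuthenticationMethod.PROXY)
        UserGroupInformation.getCurrentUser.getRealUser // null: no realUser was set
      else
        UserGroupInformation.getCurrentUser

    println(actualUgi) // null -> the next call on actualUgi throws an NPE
  }
}
{code}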
I think the proper fix is to simply remove that line, since I have no idea what its intention is. I can provide a patch, but I'd like to get input first. Maybe I'm mistaken?