Description
Running the HBaseContext examples available at https://github.com/apache/hbase/tree/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext in sequence fails.
For example, first run HBaseBulkPutExample https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseBulkPutExample.scala
and immediately afterwards run HBaseBulkGetExample https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseBulkGetExample.scala
The second run throws a NullPointerException.
In the API https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkGetExample.scala we have:
def hbaseBulkGet(hc: HBaseContext, tableName: TableName, batchSize: Int, f: (T) ⇒ Get): RDD[(ImmutableBytesWritable, Result)]
Implicit method that gives easy access to HBaseContext's bulk get. This will return a new RDD. Think about it as a RDD map function. In that every RDD value will get a new value out of HBase. That new value will populate the newly generated RDD.
hc: The hbaseContext object to identify which HBase cluster connection to use
tableName: The tableName that the put will be sent to
batchSize: How many gets to execute in a single batch
f: The function that will turn the RDD values in HBase Get objects
returns: A resulting RDD with type R objects
So it seems the function f passed to hbaseBulkGet should be rewritten as a Scala partial function that handles the case where the Result is null.
One possible fix would be to wrap the conversion in an if that checks Result.isEmpty(), so that the cells are only read when the Result is not empty.
The API for Result.listCells expressly says it can return null if there are no results.
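A minimal sketch of such a guard, assuming the NullPointerException comes from iterating the list returned by Result.listCells() when a row has no cells. The object NullSafeCells and the helper cellsOrEmpty are hypothetical names, not part of hbase-spark, and the helper is written against a plain java.util.List so that it runs without HBase on the classpath:

```scala
object NullSafeCells {
  // Hypothetical helper (not part of hbase-spark): converts a possibly-null
  // java.util.List, such as the one Result.listCells() is documented to
  // return, into a Scala List that is safe to iterate.
  def cellsOrEmpty[A](cells: java.util.List[A]): List[A] =
    if (cells == null) Nil // no cells: return an empty list instead of failing
    else {
      val builder = List.newBuilder[A]
      val it = cells.iterator()
      while (it.hasNext) builder += it.next()
      builder.result()
    }
}
```

In the example's conversion function the call would look like NullSafeCells.cellsOrEmpty(result.listCells()), so that an empty Result contributes no cells instead of throwing.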