Spark / SPARK-3807

Spark SQL does not work for tables created using a custom SerDe


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.1.0
    • Fix Version/s: 1.1.1, 1.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark SQL crashes when selecting from a table that uses a custom SerDe.

      Example:
      ----------------

      CREATE EXTERNAL TABLE table_name PARTITIONED BY (a int)
      ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer'
      WITH SERDEPROPERTIES (
        'serialization.format' = 'org.apache.thrift.protocol.TBinaryProtocol',
        'serialization.class' = 'ser_class'
      )
      STORED AS SEQUENCEFILE;

      The following exception is thrown when running a query like 'select * from table_name limit 1':

      ERROR CliDriver: org.apache.hadoop.hive.serde2.SerDeException: java.lang.NullPointerException
      at org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer.initialize(ThriftDeserializer.java:68)
      at org.apache.hadoop.hive.ql.plan.TableDesc.getDeserializer(TableDesc.java:80)
      at org.apache.spark.sql.hive.execution.HiveTableScan.addColumnMetadataToConf(HiveTableScan.scala:86)
      at org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:100)
      at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
      at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:188)
      at org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:364)
      at org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:184)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
      at org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:280)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
      at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
      at org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
      at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:402)
      at org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:400)
      at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:406)
      at org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:406)
      at org.apache.spark.sql.hive.HiveContext$QueryExecution.stringResult(HiveContext.scala:406)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:59)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:291)
      at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:413)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:226)
      at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
      at java.lang.reflect.Method.invoke(Unknown Source)
      at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)
      at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
      at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
      Caused by: java.lang.NullPointerException
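
      The NullPointerException at ThriftDeserializer.initialize is consistent with the
      deserializer looking up its SerDe properties ('serialization.class',
      'serialization.format') in the Properties it receives and getting null back, i.e. the
      table-level SERDEPROPERTIES are not being propagated to the deserializer by the table
      scan. Below is a minimal sketch of that failing call, assuming only the class names
      visible in the trace; the "full" properties mirror the DDL above, and 'ser_class' is
      the placeholder from the DDL (a real Thrift-generated class would be needed for
      initialization to actually succeed):

      import java.util.Properties
      import org.apache.hadoop.hive.conf.HiveConf
      import org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer

      object SerdeInitSketch extends App {
        val conf = new HiveConf()

        // Properties without the table-level SERDEPROPERTIES: the lookup of
        // "serialization.class" returns null and initialize fails with a
        // SerDeException caused by a NullPointerException, as in the trace above.
        val missing = new Properties()
        try new ThriftDeserializer().initialize(conf, missing)
        catch { case e: Exception => println(s"fails as reported: $e") }

        // With the SERDEPROPERTIES from the DDL propagated, the same call gets
        // past the null lookup; it still needs the Thrift class named by
        // "serialization.class" to be on the classpath.
        val full = new Properties()
        full.setProperty("serialization.class", "ser_class")
        full.setProperty("serialization.format",
          "org.apache.thrift.protocol.TBinaryProtocol")
        try new ThriftDeserializer().initialize(conf, full)
        catch { case e: Exception => println(s"would succeed with a real Thrift class: $e") }
      }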

      After this issue was fixed, a second problem surfaced: when individual columns of the table were referenced in a query, Spark SQL could not resolve those references, as sketched below.
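
      A hedged illustration of that second symptom against the table defined above (the
      HiveContext API shown is the Spark 1.1-era one, and 'a' is the partition column from
      the DDL; the exact failing column is not named in the report):

      import org.apache.spark.SparkContext
      import org.apache.spark.sql.hive.HiveContext

      object ColumnResolutionSketch extends App {
        val sc = new SparkContext("local", "SPARK-3807-repro")
        val hive = new HiveContext(sc)

        // Works once the NullPointerException above is fixed.
        hive.sql("select * from table_name limit 1").collect()

        // Fails to resolve the column reference against the SerDe-backed
        // schema, even though 'a' is declared in the DDL.
        hive.sql("select a from table_name limit 1").collect()
      }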

          People

            Assignee: Unassigned
            Reporter: chirag aggarwal (chiragaggarwal)
            Votes: 0
            Watchers: 4
