Details
Description
Encountered a compatibility issue while upgrading Spark from 2.4 to 3.x (Scala is also upgraded, from 2.11 to 2.12).
The Java code below used to work with Spark 2.4, but after migrating to 3.x it fails with the error shown below. I have done my own research but couldn't find a solution or any related information.
Code.java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.linalg.DenseVector;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public void test() {
    final SparkSession spark = SparkSession.builder()
            .appName("Test")
            .getOrCreate();

    DenseClass denseFactor1 = new DenseClass(new DenseVector(new double[]{0.13, 0.24}));
    DenseClass denseFactor2 = new DenseClass(new DenseVector(new double[]{0.24, 0.32}));
    final List<DenseClass> inputsNew = Arrays.asList(denseFactor1, denseFactor2);

    // Fails here in 3.x with the AnalysisException below
    final Dataset<DenseClass> denseVectorDf =
            spark.createDataset(inputsNew, Encoders.bean(DenseClass.class));
    denseVectorDf.printSchema();
}

public static class DenseClass implements Serializable {
    private org.apache.spark.ml.linalg.DenseVector denseVector;

    // no-arg constructor plus getter/setter are required by Encoders.bean
    public DenseClass() {}
    public DenseClass(DenseVector denseVector) { this.denseVector = denseVector; }
    public DenseVector getDenseVector() { return denseVector; }
    public void setDenseVector(DenseVector denseVector) { this.denseVector = denseVector; }
}
The error occurs while creating the dataset denseVectorDf.
Error
org.apache.spark.sql.AnalysisException: Cannot up cast `denseVector` from struct<> to struct<type:tinyint,size:int,indices:array<int>,values:array<double>>.
The type path of the target object is:
- field (class: "org.apache.spark.ml.linalg.DenseVector", name: "denseVector")
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object
I have tried using a plain double field instead of the DenseVector and it works just fine; the failure only occurs when the DenseVector field is encoded with Encoders.bean.
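One possible workaround (a sketch, not a confirmed fix for every 3.x release) is to sidestep bean introspection entirely and use a Kryo-based encoder, which serializes the whole object as a single binary column instead of mapping DenseVector to a struct. The class name KryoWorkaround and the local[1] master are illustrative only:

```java
import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.ml.linalg.DenseVector;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class KryoWorkaround {
    public static class DenseClass implements Serializable {
        private DenseVector denseVector;

        public DenseClass() {}
        public DenseClass(DenseVector denseVector) { this.denseVector = denseVector; }
        public DenseVector getDenseVector() { return denseVector; }
        public void setDenseVector(DenseVector denseVector) { this.denseVector = denseVector; }
    }

    public static void main(String[] args) {
        // local master so the sketch runs standalone; adjust for your cluster
        SparkSession spark = SparkSession.builder()
                .appName("KryoWorkaround")
                .master("local[1]")
                .getOrCreate();

        List<DenseClass> inputs = Arrays.asList(
                new DenseClass(new DenseVector(new double[]{0.13, 0.24})),
                new DenseClass(new DenseVector(new double[]{0.24, 0.32})));

        // Encoders.kryo(...) avoids the bean-introspection path that fails on
        // DenseVector in 3.x; the trade-off is an opaque binary schema, so the
        // vector contents are not queryable as columns.
        Dataset<DenseClass> ds = spark.createDataset(inputs, Encoders.kryo(DenseClass.class));
        ds.printSchema();
        spark.stop();
    }
}
```

The trade-off: the dataset's schema is a single binary "value" column, so column-level operations on the vector are lost; it is only suitable when the data is consumed back as typed objects.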
StackOverflow link for the issue: https://stackoverflow.com/questions/73313660/error-while-creating-dataset-in-java-spark-3-x-using-encoders-bean-with-dense-ve