  1. Apache Hudi
  2. HUDI-7349

Spark Structured Streaming didn't work after upgrade from Hudi 0.11 to 0.13


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.13.0
    • Fix Version/s: None
    • Component/s: spark, spark-sql
    • Labels: None

    Description

      We have a Spark Structured Streaming job that writes data in Hudi format. After upgrading from Hudi 0.11.0 to Hudi 0.13.0, the streaming app no longer writes data to the existing Hudi table. The app starts successfully and triggers a listing job, but it never triggers any other job (write, compaction, cleaning, etc.). There are no errors in the Spark UI or in the stdout/stderr logs. When the same application writes to a new S3 location (a new Hudi table), everything works fine. We use append output mode and a processing-time trigger of 30 seconds.

      Here are the Hudi configurations used (some values are redacted as xxx):

      'hoodie.datasource.write.table.type': 'MERGE_ON_READ',
      'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.CustomKeyGenerator',
      'hoodie.datasource.write.precombine.field': 'xxx',
      'hoodie.datasource.write.partitionpath.field': 'xxx:SIMPLE',
      'hoodie.embed.timeline.server': False,
      'hoodie.index.type': 'BLOOM',
      'hoodie.parquet.compression.codec': 'snappy',
      'hoodie.clean.async': True,
      'hoodie.clean.max.commits': 5,
      'hoodie.parquet.max.file.size': 125829120,
      'hoodie.parquet.small.file.limit': 104857600,
      'hoodie.parquet.block.size': 125829120,
      'hoodie.metadata.enable': True,
      'hoodie.metadata.validate': True,
      'hoodie.datasource.write.hive_style_partitioning': True,
      'hoodie.datasource.hive_sync.support_timestamp': True,
      'hoodie.datasource.hive_sync.jdbcurl': "xxx",
      'hoodie.datasource.hive_sync.username': 'xxx',
      'hoodie.datasource.hive_sync.password': 'xxx',
      'hoodie.datasource.hive_sync.partition_fields': 'xxx',
      'hoodie.datasource.hive_sync.enable': True,
      'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.MultiPartKeysValueExtractor',
      'hoodie.avro.schema.external.transformation': True,
      'hoodie.avro.schema.validate': True,
      'hoodie.table.name': 'xxx',
      'hoodie.datasource.write.table.name': 'xxx',
      'hoodie.datasource.write.recordkey.field': 'xxx',
      'hoodie.datasource.hive_sync.database': 'xxx',
      'hoodie.datasource.hive_sync.table': 'xxx',
      'hoodie.datasource.write.operation': 'upsert'
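
      For reference, a minimal sketch of how a job of this shape might hand such options to a streaming writer, using the append output mode and 30-second trigger described above. This is not the actual job code: the helper names (to_spark_options, start_hudi_stream), the base_path and checkpoint_path parameters, and the reduced example_conf are hypothetical and only illustrate the setup.

```python
from typing import Any, Dict


def to_spark_options(conf: Dict[str, Any]) -> Dict[str, str]:
    """Render Python values as strings, the form in which writer options are passed."""
    rendered = {}
    for key, value in conf.items():
        if isinstance(value, bool):
            # str(True) would give "True"; render booleans in lowercase instead.
            rendered[key] = str(value).lower()
        else:
            rendered[key] = str(value)
    return rendered


def start_hudi_stream(stream_df, hudi_conf, base_path, checkpoint_path):
    """Start a streaming write to a Hudi table.

    stream_df is assumed to be a streaming DataFrame; base_path and
    checkpoint_path are hypothetical locations supplied by the caller.
    """
    return (
        stream_df.writeStream
        .format("hudi")
        .options(**to_spark_options(hudi_conf))
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")                  # append output mode, as in the report
        .trigger(processingTime="30 seconds")  # 30-second processing-time trigger
        .start(base_path)
    )


# A few of the settings listed above; redacted values stay as 'xxx'.
example_conf = {
    "hoodie.table.name": "xxx",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.metadata.enable": True,
    "hoodie.clean.max.commits": 5,
}

print(to_spark_options(example_conf))
```

      Keeping the option rendering separate from the writer call makes it easy to print the exact strings being passed when comparing a run against the existing table with a run against a new location.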

          People

            Assignee: Unassigned
            Reporter: Haitham Eltaweel