Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      SPARK-24598 updated the documentation to state that our addition follows Java semantics rather than SQL semantics. To follow the SQL standard, however, we should instead throw an exception when an overflow occurs.
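      As a hedged illustration (not part of the original report), the sketch below shows the Java-style wraparound the description refers to; the session setup and object name are illustrative only:

{code:scala}
// Minimal sketch: with the pre-3.0 behavior, Java-style integer addition
// wraps around silently instead of raising an error as the SQL standard requires.
import org.apache.spark.sql.SparkSession

object OverflowWraparound {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("overflow-wraparound")
      .getOrCreate()

    // 2147483647 is Int.MaxValue; with Java semantics the sum wraps
    // to -2147483648 rather than failing.
    spark.sql("SELECT CAST(2147483647 AS INT) + CAST(1 AS INT) AS total").show()

    spark.stop()
  }
}
{code}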

        Activity

          apachespark Apache Spark added a comment -

          User 'mgaido91' has created a pull request for this issue:
          https://github.com/apache/spark/pull/21599

          rxin Reynold Xin added a comment -

          The no-exception behavior is by design. Imagine you have an ETL job that runs for hours, and then it suddenly throws an exception because one row overflows ...
          mgaido Marco Gaido added a comment -

          rxin I see that. But the reasons for this are:

          • the current behavior is against the SQL standard;
          • having incorrect results in some use cases is worse than having a job failure;
          • introducing a config that lets users choose the behavior allows them to pick the best option for their specific use cases. Moreover, I can also see people having the overflow check turned on in dev environments, so that they can catch issues early, and turned off in prod to avoid wasting hours as you mentioned (see the sketch after this comment).
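          A hedged sketch of the dev-on/prod-off toggle described above. It uses spark.sql.ansi.enabled, the switch that ultimately shipped in Spark 3.0; the exact config name discussed while this patch was under review may have differed:

{code:scala}
import org.apache.spark.sql.SparkSession

object OverflowToggle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("overflow-toggle")
      .getOrCreate()

    // Dev: fail fast so overflows are caught during testing.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    try {
      spark.sql("SELECT CAST(2147483647 AS INT) + 1 AS total").show()
    } catch {
      // The ArithmeticException may surface wrapped in a SparkException,
      // so catch broadly for the demo.
      case e: Exception => println(s"Overflow caught in dev: ${e.getMessage}")
    }

    // Prod: keep the legacy Java-style wraparound so one bad row does not
    // abort a job that has been running for hours.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT CAST(2147483647 AS INT) + 1 AS total").show() // -2147483648

    spark.stop()
  }
}
{code}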
          cloud_fan Wenchen Fan added a comment -

          Issue resolved by pull request 21599
          https://github.com/apache/spark/pull/21599

          apachespark Apache Spark added a comment -

          User 'luluorta' has created a pull request for this issue:
          https://github.com/apache/spark/pull/30585


          People

            Assignee: mgaido Marco Gaido
            Reporter: mgaido Marco Gaido
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: