Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: SQL
    • Labels: None

    Description

      SPARK-24598 updated the documentation to state that our addition follows Java semantics rather than SQL semantics. To follow the SQL standard, however, we should instead throw an exception when an overflow occurs.
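      As a hedged illustration (not part of the original report), the sketch below shows the Java-style wraparound the description refers to; the session setup and object name are illustrative only:

{code:scala}
// Minimal sketch: with the pre-3.0 behavior, Java-style integer addition
// wraps around silently instead of raising an error as the SQL standard requires.
import org.apache.spark.sql.SparkSession

object OverflowWraparound {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("overflow-wraparound")
      .getOrCreate()

    // 2147483647 is Int.MaxValue; with Java semantics the sum wraps
    // to -2147483648 rather than failing.
    spark.sql("SELECT CAST(2147483647 AS INT) + CAST(1 AS INT) AS total").show()

    spark.stop()
  }
}
{code}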

        Activity

          apachespark Apache Spark added a comment -

          User 'mgaido91' has created a pull request for this issue:
          https://github.com/apache/spark/pull/21599

          rxin Reynold Xin added a comment -

          The no-exception behavior is by design. Imagine you have an ETL job that runs for hours, and then it suddenly throws an exception because one row overflows ...
          mgaido Marco Gaido added a comment -

          rxin I see that. But the reasons for this are:

          • the current behavior is against the SQL standard;
          • having incorrect results in some use cases is worse than having a job failure;
          • introducing a config that lets users choose the behavior allows them to pick the best option for their specific use cases. Moreover, I can also see people having the overflow check turned on in dev environments, so that they can catch issues early, and turned off in prod to avoid wasting hours as you mentioned (see the sketch after this comment).
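          A hedged sketch of the dev-on/prod-off toggle described above. It uses spark.sql.ansi.enabled, the switch that ultimately shipped in Spark 3.0; the exact config name discussed while this patch was under review may have differed:

{code:scala}
import org.apache.spark.sql.SparkSession

object OverflowToggle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("overflow-toggle")
      .getOrCreate()

    // Dev: fail fast so overflows are caught during testing.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    try {
      spark.sql("SELECT CAST(2147483647 AS INT) + 1 AS total").show()
    } catch {
      // The ArithmeticException may surface wrapped in a SparkException,
      // so catch broadly for the demo.
      case e: Exception => println(s"Overflow caught in dev: ${e.getMessage}")
    }

    // Prod: keep the legacy Java-style wraparound so one bad row does not
    // abort a job that has been running for hours.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    spark.sql("SELECT CAST(2147483647 AS INT) + 1 AS total").show() // -2147483648

    spark.stop()
  }
}
{code}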
          cloud_fan Wenchen Fan added a comment -

          Issue resolved by pull request 21599
          https://github.com/apache/spark/pull/21599

          apachespark Apache Spark added a comment -

          User 'luluorta' has created a pull request for this issue:
          https://github.com/apache/spark/pull/30585


          People

            Assignee: mgaido Marco Gaido
            Reporter: mgaido Marco Gaido
            Votes: 0
            Watchers: 6

            Dates

              Created:
              Updated:
              Resolved: