Description
When the first child of a Union has duplicate columns, as in select a, a from t1 union select a, b from t2, Spark uses only the first column to aggregate the results, which makes the results incorrect. This behavior is inconsistent with other engines such as PostgreSQL and MySQL. We could alias the attributes of the first child of the Union to resolve this, although one could argue that this is a feature of Spark SQL.
Sample query:
SELECT
  a,
  a
FROM VALUES (1, 1), (1, 2) AS t1(a, b)
UNION
SELECT
  a,
  b
FROM VALUES (1, 1), (1, 2) AS t2(a, b)
Result from Spark:
(1,1)
Result from PostgreSQL and MySQL:
(1,1)
(1,2)
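The difference between the two results can be illustrated with a small Python sketch. It is a simplified model of the sample query, not Spark's actual implementation: the helper names union_distinct and union_first_column_only are hypothetical, and keeping the first row seen per key is an assumption made only for illustration.

```python
# Simplified model of the sample query; not Spark's actual implementation.
t1 = [(1, 1), (1, 2)]  # rows of t1(a, b)
t2 = [(1, 1), (1, 2)]  # rows of t2(a, b)

left = [(a, a) for a, b in t1]   # SELECT a, a FROM t1
right = [(a, b) for a, b in t2]  # SELECT a, b FROM t2
combined = left + right


def union_distinct(rows):
    """Standard UNION semantics: deduplicate on the whole row."""
    return sorted(set(rows))


def union_first_column_only(rows):
    """Model of the reported behavior: deduplicate on the first column only.

    Assumption: the first row seen for each key is the one that is kept.
    """
    seen = {}
    for row in rows:
        seen.setdefault(row[0], row)
    return sorted(seen.values())


print(union_distinct(combined))           # [(1, 1), (1, 2)]
print(union_first_column_only(combined))  # [(1, 1)]
```

Deduplicating on the whole row reproduces the PostgreSQL and MySQL result, while deduplicating on the first column alone collapses everything into the single row reported above.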