Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 1.5.0, 1.6.0
Fix Version/s: None
Component/s: None
Environment: Linux Redhat; tested in cluster (hdfs) and embedded mode
Description
Running a simple "select true, true, true from table" does not output true,true,true on every generated row.
Steps to reproduce:
Generate a simple CSV file:
for i in {1..1000000}; do echo "Allo"; done > /users/fmethot/test.csv
Open a fresh Drill CLI session.
To make validation easier, switch the output format to CSV:
alter session set `store.format`='csv'
Create a table like this:
create table TEST_OUT as (select true,true,true,true from dfs.`/users/fmethot/test.csv`)
Check the content of the files written for TEST_OUT (the CTAS output, not the input /users/fmethot/test.csv).
You will find false values in there!
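A quick way to check the output is to count the rows that contain "false". This is only a sketch: the function name count_false_rows is made up for illustration, and it assumes the CTAS output is a directory of files ending in .csv, whose location depends on the workspace the session writes to.

```shell
# Count rows containing "false" in the CSV files written for a CTAS table.
# $1 is the table's output directory (hypothetical; it depends on the workspace).
count_false_rows() {
  # grep -c prints 0 and exits non-zero when nothing matches, so mask the status.
  find "$1" -type f -name '*.csv' -exec cat {} + | grep -c 'false' || true
}
```

Called with the TEST_OUT directory as its only argument, it should print 0 on a correct run; any other number means some rows contain false.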
If you generate another table in the same session, the same way, chances are the values will be fine (all true). We can only reproduce this on the first CTAS run.
We came to test this select pattern after we realized that our custom boolean UDF (as well as ones provided in Drill, like "ilike") was not producing consistent, deterministic results: the same input was generating seemingly random boolean output. We hope that fixing this ticket will also fix our issue with boolean UDFs.