Details
- Sub-task
- Status: Resolved
- Minor
- Resolution: Fixed
- Impala 2.0.1
Description
The Thrift fetch request specifies the number of rows the client would like, but the Impala server may return fewer even though more results are available.
For example, with the default row batch size (batch_size) of 1024, if the client requests 1023 rows, the first response contains 1023 rows but the second response contains only 1 row. This is because the server internally reads one row batch (1024 rows), returns the requested count (1023), and caches the remaining row; the next fetch is then served from the cache alone.
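The behavior described above can be modeled with a small simulation. This is only a sketch of the description in this report, not the server code; the function name simulate_fetches and its parameters are made up for illustration.

```python
def simulate_fetches(total_rows, batch_size, fetch_size):
    """Return the row count of each fetch response, following the
    behavior described in this report (a model, not Impala's code)."""
    responses = []
    remaining = total_rows  # rows the query has not produced yet
    cached = 0              # rows left over from the last internal batch
    while remaining > 0 or cached > 0:
        if cached > 0:
            # Leftover rows are served on their own; no new batch is read.
            n = min(cached, fetch_size)
            cached -= n
        else:
            # Read one internal row batch and return at most fetch_size rows.
            batch = min(batch_size, remaining)
            remaining -= batch
            n = min(batch, fetch_size)
            cached = batch - n
        responses.append(n)
    return responses


if __name__ == "__main__":
    # Batch size 1024, client asks for 1023 rows per fetch.
    print(simulate_fetches(4096, 1024, 1023))
```

Under this model, a fetch size one below the batch size alternates between a nearly full response and a single-row response, which matches the pattern reported here.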
In general, the end user should set both the row batch size and the Thrift request size. In practice, the query writer who sets the batch size and the driver author or programmer who sets the fetch size are often different people.
One case already works fine: setting the batch size to less than the Thrift request size. In that case every Thrift response contains exactly one batch.
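Until the server honors the requested size, a client that needs full-sized chunks could keep fetching until it has accumulated enough rows. The following is a rough sketch under assumptions: fetch_fn is a hypothetical callable that takes a maximum row count and returns a list of rows, and an empty list is assumed to mean the result set is exhausted. It is not part of impyla or any driver API.

```python
def fetch_up_to(fetch_fn, wanted):
    """Accumulate rows across short responses until `wanted` rows are
    collected or the source is exhausted.

    fetch_fn(max_rows) is a hypothetical callable returning a list of at
    most max_rows rows; an empty list is assumed to signal end of results.
    """
    rows = []
    while len(rows) < wanted:
        chunk = fetch_fn(wanted - len(rows))
        if not chunk:
            break  # assumed end of results
        rows.extend(chunk)
    return rows
```

The cost is one extra round trip for each short response, so matching the batch size to the fetch size where possible is still preferable.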
Code example:
dev@localhost:~/impyla$ git diff
diff --git a/impala/_rpc/hiveserver2.py b/impala/_rpc/hiveserver2.py
index 6139002..31fdab7 100644
--- a/impala/_rpc/hiveserver2.py
+++ b/impala/_rpc/hiveserver2.py
@@ -265,6 +265,7 @@ def fetch_results(service, operation_handle, hs2_protocol_version, schema=None,
     req = TFetchResultsReq(operationHandle=operation_handle,
                            orientation=orientation,
                            maxRows=max_rows)
+    print("req: " + str(max_rows))
     resp = service.FetchResults(req)
     err_if_rpc_not_ok(resp)
 
@@ -273,6 +274,7 @@ def fetch_results(service, operation_handle, hs2_protocol_version, schema=None,
                  for (i, col) in enumerate(resp.results.columns)]
         num_cols = len(tcols)
         num_rows = len(tcols[0].values)
+        print("rec: " + str(num_rows))
         rows = []
         for i in xrange(num_rows):
             row = []

dev@localhost:~/impyla$ cat test.py
from impala.dbapi import connect
conn = connect()
cur = conn.cursor()
cur.set_arraysize(1024)
cur.execute("set batch_size=1025")
cur.execute("select * from tpch.lineitem")
while True:
    rows = cur.fetchmany()
    if not rows:
        break
cur.close()
conn.close()

dev@localhost:~/impyla$ python test.py | head
Failed to import pandas
req: 1024
rec: 1024
req: 1024
rec: 1
req: 1024
rec: 1024
req: 1024
rec: 1
req: 1024
Issue Links
- is duplicated by
  - IMPALA-1790 FetchResults() sometimes returns very few resuts (Resolved)
  - IMPALA-3015 Thrift buffer size not honored when retrieving data from Impala (Resolved)
- is related to
  - IMPALA-4268 Rework coordinator buffering to buffer more data (Resolved)
- relates to
  - IMPALA-8819 BufferedPlanRootSink should handle non-default fetch sizes (Resolved)