[DRILL-5661] CSV reader created, holds onto two buffers per file with headers - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 1.10.0
Fix Version/s: Future
Component/s: Storage - Text & CSV
Labels: None

Description

~~DRILL-5273~~ fixed a problem in the "compliant" (CSV) record reader that could cause Drill to exhaust memory. Each reader allocated two direct memory buffers but did not free them until the end of the fragment. A scan of 1000 files therefore made 1000 pairs of allocations, even though only a single pair was active at any time.

As it turns out, ~~DRILL-5273~~ missed a second pair created when reading column headers:

    private String[] extractHeader() throws SchemaChangeException, IOException, ExecutionSetupException {
        ...
        TextInput hInput = new TextInput(settings, hStream, oContext.getManagedBuffer(READ_BUFFER), 0, split.getLength());
        this.reader = new TextReader(settings, hInput, hOutput, oContext.getManagedBuffer(WHITE_SPACE_BUFFER));

If a query uses CSV column headers, it is subject to the same memory exhaustion seen earlier for `columns`-style queries. (Before ~~DRILL-5273~~, queries with column headers were twice as exposed to memory exhaustion, since each file held both a header pair and a data pair.)

The solution is simply to reuse the existing buffers: use them first for the header line, then again for the data lines. There is no need for a second set of buffers.
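The effect of reusing buffers can be illustrated with a small simulation. This is only a sketch, not Drill code: `ManagedBufferContext`, `BufferReuseSketch`, `parse`, and the buffer sizes are hypothetical stand-ins, and the sketch assumes the data-line buffers are allocated once per scan while the header pass asks the context for a fresh pair for every file. It uses heap `ByteBuffer`s rather than direct memory so it stays cheap to run.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator context: buffers it hands out
// stay allocated until the whole fragment finishes.
class ManagedBufferContext {
    private final List<ByteBuffer> held = new ArrayList<>();

    ByteBuffer getManagedBuffer(int size) {
        ByteBuffer buf = ByteBuffer.allocate(size); // heap memory keeps the sketch cheap
        held.add(buf);
        return buf;
    }

    int heldCount() {
        return held.size();
    }
}

class BufferReuseSketch {
    static final int READ_BUFFER = 1024;      // illustrative sizes only
    static final int WHITE_SPACE_BUFFER = 64;

    // Stand-in for parsing: resets and touches both buffers, as a reader would.
    static void parse(ByteBuffer readBuf, ByteBuffer whitespaceBuf) {
        readBuf.clear();
        whitespaceBuf.clear();
        readBuf.put((byte) 'x');
        whitespaceBuf.put((byte) ' ');
    }

    // Assumed problem pattern: every file's header pass allocates its own pair.
    static int scanWithHeaderBuffers(int fileCount) {
        ManagedBufferContext ctx = new ManagedBufferContext();
        ByteBuffer readBuf = ctx.getManagedBuffer(READ_BUFFER);
        ByteBuffer wsBuf = ctx.getManagedBuffer(WHITE_SPACE_BUFFER);
        for (int i = 0; i < fileCount; i++) {
            // Header line: a fresh pair that is held until the fragment ends.
            parse(ctx.getManagedBuffer(READ_BUFFER), ctx.getManagedBuffer(WHITE_SPACE_BUFFER));
            // Data lines: the shared pair.
            parse(readBuf, wsBuf);
        }
        return ctx.heldCount();
    }

    // Proposed pattern: the same pair serves the header line, then the data lines.
    static int scanWithReuse(int fileCount) {
        ManagedBufferContext ctx = new ManagedBufferContext();
        ByteBuffer readBuf = ctx.getManagedBuffer(READ_BUFFER);
        ByteBuffer wsBuf = ctx.getManagedBuffer(WHITE_SPACE_BUFFER);
        for (int i = 0; i < fileCount; i++) {
            parse(readBuf, wsBuf); // header line
            parse(readBuf, wsBuf); // data lines
        }
        return ctx.heldCount();
    }

    public static void main(String[] args) {
        System.out.println(scanWithHeaderBuffers(1000)); // prints 2002
        System.out.println(scanWithReuse(1000));         // prints 2
    }
}
```

Under these assumptions, the number of buffers held at the end of the fragment grows linearly with the file count in the first pattern and stays constant in the second.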

People

Assignee: Paul Rogers

Reporter: Paul Rogers

Dates

Created: 06/Jul/17 05:08

Updated: 06/Jul/17 05:48