Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version: 8.0.1
Fix Version: None
Description
I created a file in the Arrow IPC file format with the script given in the PyArrow cookbook here: https://arrow.apache.org/cookbook/py/io.html#saving-arrow-arrays-to-disk
In a Node.js application, this file can be read as follows:
const r = await RecordBatchReader.from(fs.createReadStream(filePath));
await r.open();
for (let i = 0; i < r.numRecordBatches; i++) {
  const rb = await r.readRecordBatch(i);
  if (rb !== null) {
    console.log(rb.numRows);
  }
}
However, this approach loads the whole file into memory (is that a bug?), which does not scale to large files.
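For what it's worth, here is a minimal sketch of an incremental read using the reader's async iterator. It assumes the data is written in the Arrow IPC stream format (e.g. with pyarrow.ipc.new_stream; the file name "data.arrows" is illustrative). The file format produced by the cookbook script keeps its record-batch offsets in a footer at the end of the file, which may be why a forward-only stream ends up buffered:

import * as fs from "fs";
import { RecordBatchReader } from "apache-arrow";

// Hypothetical stream-format file written with pyarrow.ipc.new_stream.
const reader = await RecordBatchReader.from(fs.createReadStream("data.arrows"));
for await (const batch of reader) {
  // Only the current record batch needs to be materialized here.
  console.log(batch.numRows);
}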
To work around this scalability issue, I tried to load the data with fetch as described in the README.md. Both:
import { tableFromIPC } from "apache-arrow";
const table = await tableFromIPC(fetch(filePath));
console.table([...table]);
and
const r = await RecordBatchReader.from(await fetch(filePath));
await r.open();
fail with the error:
Uncaught (in promise) Error: Expected to read 1329865020 metadata bytes, but only read 1123.
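Incidentally, 1329865020 is 0x4F44213C, which is the little-endian int32 reading of the ASCII bytes "<!DO", so the fetch may be returning an HTML document rather than the Arrow file. A minimal diagnostic sketch, assuming filePath is an HTTP URL (an Arrow IPC file begins with the magic bytes "ARROW1"):

// Hypothetical check of what the server actually returns for filePath.
const response = await fetch(filePath);
console.log(response.status, response.headers.get("content-type"));
const { value } = await response.body.getReader().read();
// An Arrow IPC file should start with the "ARROW1" magic bytes.
console.log(new TextDecoder().decode(value.slice(0, 8)));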