  1. Apache Arrow
  2. ARROW-14453

[C data interface] Clarify that buffers must only be accessed past the offset

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Format
    • Labels: None

    Description

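      I would like to propose that we clarify in the C data interface specification that buffers may only be accessed at or past the array's offset, where the offset is translated into a position in each buffer using the pointer arithmetic appropriate to that buffer's element width.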

      E.g. given a primitive array with an offset of 10 and a buffer starting at pointer position `p`, consumers must not access any of the 10 element positions [p, p+1, ..., p+9] that precede the offset.
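
      As a rough illustration of the rule above, here is a minimal consumer-side sketch. `struct MiniArray` and `mini_get_u8` are hypothetical names, and the struct is a trimmed stand-in that keeps only the `length`, `offset` and `buffers` fields of the real `struct ArrowArray`; it assumes a uint8 values buffer in `buffers[1]`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for struct ArrowArray (assumption:
 * only the fields needed for this example are kept). */
struct MiniArray {
    int64_t length;       /* number of logical elements */
    int64_t offset;       /* logical elements to skip in each buffer */
    const void** buffers; /* buffers[0] = validity (may be NULL), buffers[1] = values */
};

/* Read logical element i of a uint8 array.
 * The only position touched is values[offset + i], so nothing before
 * the offset is ever dereferenced. */
static uint8_t mini_get_u8(const struct MiniArray* a, int64_t i) {
    assert(i >= 0 && i < a->length);
    const uint8_t* values = (const uint8_t*)a->buffers[1];
    return values[a->offset + i];
}
```

      With an offset of 10 and a length of 2, this reader only ever touches positions p+10 and p+11 of the values buffer.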

      Without this condition, it is not possible to export a primitive array that combines a sliced values buffer with a validity bitmap and a non-zero offset, because the array carries a single offset that applies to all of its buffers.

      E.g. consider an array with an offset of 10 whose values buffer holds 12 u8s and has been sliced by 4. For the array to be exported correctly, we need to shift the exported buffer pointer by -6 (4 - 10), so that the consumer, after skipping the first 10 bytes, only "sees" the bytes at positions 4, 5, 6, etc. of the original pointer.
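
      The arithmetic in this example can be sketched as follows. The helper names `exported_start` and `consumer_position` are hypothetical, and the sketch deliberately works with integer positions relative to the original allocation rather than real pointers, since forming a pointer before the start of an allocation is not something plain C pointer arithmetic guarantees.

```c
#include <assert.h>
#include <stdint.h>

/* Position, relative to the original allocation, at which the exported
 * buffer must appear to start so that the array offset lands on the slice. */
static int64_t exported_start(int64_t slice_start, int64_t array_offset) {
    return slice_start - array_offset;
}

/* Position, relative to the original allocation, that a consumer reads
 * for logical element i when it honours the array offset. */
static int64_t consumer_position(int64_t exported, int64_t array_offset, int64_t i) {
    assert(i >= 0);
    return exported + array_offset + i;
}
```

      For a slice of 4 and an offset of 10, `exported_start(4, 10)` is -6, and `consumer_position(-6, 10, i)` is 4 + i, which matches the positions 4, 5, 6, etc. described above. The six positions before the original pointer are never read, provided the consumer stays at or past the offset.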

      Note that this approach (slicing a buffer and building an array from it) only works for buffers whose elements are addressed in whole bytes. For a boolean array, whose values are bit-packed, it is currently not possible to "slice" the buffer unless the slice is a multiple of 8 slots, since the C data interface has no mechanism to share independent offsets per buffer.
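
      To make the multiple-of-8 restriction concrete, here is a small sketch with hypothetical helpers `bit_get` and `slice_as_byte_advance`. It assumes the least-significant-bit-first numbering that Arrow uses for bit-packed buffers: a slice can be expressed by moving the byte pointer only when the number of skipped slots is a whole number of bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Read slot i of a bit-packed buffer (least-significant bit first). */
static int bit_get(const uint8_t* bits, int64_t i) {
    assert(i >= 0);
    return (bits[i >> 3] >> (i & 7)) & 1;
}

/* Return 1 and store the byte advance in *bytes if skipping `slots` bits
 * can be expressed by moving the byte pointer alone; return 0 otherwise,
 * leaving *bytes untouched. */
static int slice_as_byte_advance(int64_t slots, int64_t* bytes) {
    assert(slots >= 0);
    if (slots % 8 != 0) {
        return 0; /* would need a bit-level offset the pointer cannot carry */
    }
    *bytes = slots / 8;
    return 1;
}
```

      A slice of 16 slots maps to a 2-byte advance, whereas a slice of 4 slots cannot be represented by the pointer alone and would need its own bit offset.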

          People

            Assignee: Unassigned
            Reporter: Jorge Leitão (jorgecarleitao)