Apache Hudi / HUDI-5982

BucketIdentifier fails when the user's primary key values contain commas


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 0.12.0
    • Fix Version/s: None
    • Component/s: index
    • Labels: None

    Description

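      When a Hudi table uses a composite primary key together with the bucket index, BucketIdentifier parses the record key by splitting it on commas and then splitting each fragment on colons. If any primary key value itself contains a comma, the first split cuts that value into extra fragments that are not valid field:value pairs, and computing the bucket hash keys throws an exception (for example, an ArrayIndexOutOfBoundsException when a fragment contains no colon).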

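      // BucketIdentifier.java (excerpt)
      import java.util.Arrays;
      import java.util.List;
      import java.util.Map;
      import java.util.stream.Collectors;

      private static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
        // recordKey looks like "f1:v1,f2:v2"; a comma inside a value is split on as well.
        Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
            .map(p -> p.split(":"))
            // A fragment with no ':' (e.g. "b" from "name:a,b") has no p[1],
            // so the value mapper throws ArrayIndexOutOfBoundsException.
            .collect(Collectors.toMap(p -> p[0], p -> p[1]));
        return indexKeyFields.stream()
            .map(recordKeyPairs::get).collect(Collectors.toList());
      }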
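      The failure can be reproduced outside Hudi with a small standalone program. This is a sketch, not a Hudi test: it copies the method quoted above into a hypothetical `BucketKeyRepro` class (made public for the demo) and assumes the composite record key has the form `field1:value1,field2:value2`, as the `split(",")` / `split(":")` logic implies. The field names `id` and `name` are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical standalone class; the method body is copied verbatim from the excerpt above.
public class BucketKeyRepro {

  public static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
    Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
        .map(p -> p.split(":"))
        .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    return indexKeyFields.stream()
        .map(recordKeyPairs::get).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<String> indexFields = Arrays.asList("id", "name");

    // No commas inside the values: parses as expected and prints [1, alice].
    System.out.println(getHashKeysUsingIndexFields("id:1,name:alice", indexFields));

    // The value of "name" is "Smith,John", so split(",") yields
    // ["id:1", "name:Smith", "John"]; "John".split(":") has length 1,
    // and p[1] throws ArrayIndexOutOfBoundsException.
    try {
      getHashKeysUsingIndexFields("id:1,name:Smith,John", indexFields);
      System.out.println("unexpectedly succeeded");
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("failed: " + e);
    }
  }
}
```

      The exact exception message printed in the second case depends on the JDK version, but the exception type is the same.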


          People

            Assignee: Unassigned
            Reporter: Wally Tang (tangshangwen)
            Votes: 0
            Watchers: 2
