  1. Jackrabbit Oak
  2. OAK-2284

Better locality for blobs collections over sharding

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • 1.1.2
    • None
    • blob, mongomk

    Description

      Currently, when using Oak with MongoMK for blob storage, the chunks of a single binary stream can easily end up scattered across the shards.

      This is not ideal, since reading each individual file then generates a large number of scatter-gather queries across the shards.

      To allow better locality, I propose adding another field called _anchor.
      This anchor field would be generated from the timestamp at which storage of the file begins, with the time components written in inverse order:

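      import java.text.SimpleDateFormat;
      import java.util.Date;

      // Pattern order: milliseconds (SSS), seconds (ss), minutes (mm), hour of day (HH)
      SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
      // Format the time at which storage of the file begins
      String a = sdf.format(new Date());
      // Store the parsed integer of this value for more storage efficiency
      int _anchor = Integer.parseInt(a);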
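
      The snippet above can be wrapped into a small runnable sketch. The class name AnchorExample and the helper anchorFor are hypothetical, not part of Oak. The time zone is pinned to UTC only so that the example is reproducible; the proposal itself does not say which zone to use.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class AnchorExample {
    // Hypothetical helper: derives the anchor from the time at which
    // storage of a file begins. SimpleDateFormat is not thread-safe,
    // so a new instance is created per call.
    static int anchorFor(Date start, TimeZone tz) {
        SimpleDateFormat sdf = new SimpleDateFormat("SSSssmmHH");
        sdf.setTimeZone(tz);
        // At most 9 digits (999595923 at the largest), so the value fits in an int.
        return Integer.parseInt(sdf.format(start));
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        // 1970-01-01 10:20:30.456 UTC, expressed in epoch milliseconds
        Date start = new Date(((10L * 60 + 20) * 60 + 30) * 1000 + 456);
        System.out.println(anchorFor(start, utc)); // prints 456302010
    }
}
```

      Because the anchor is computed once, when storage of the file begins, every chunk of that file would carry the same value. Leading zeros are lost when the string is parsed, so a start time with fewer than 100 milliseconds yields a shorter number.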
      

      This new _anchor field should be part of the shard key, which also requires it to be indexed alongside _id.
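
      To illustrate why a compound key on (_anchor, _id) should help, here is a minimal sketch in plain Java that sorts chunk keys the way such a key would order them. It assumes range-based sharding, where neighbouring keys tend to live in the same range. The class ShardKeyOrder, the ChunkKey type and the chunk ids are all made up for illustration; no MongoDB API is involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ShardKeyOrder {
    // Hypothetical stand-in for a blob chunk document: only the two key fields.
    static final class ChunkKey {
        final int anchor;
        final String id;
        ChunkKey(int anchor, String id) { this.anchor = anchor; this.id = id; }
    }

    // Orders keys like a compound key on (_anchor, _id): anchor first, then id.
    static final Comparator<ChunkKey> COMPOUND =
        Comparator.<ChunkKey>comparingInt(k -> k.anchor).thenComparing(k -> k.id);

    // Returns the chunk ids in compound-key order.
    static List<String> sortedIds(List<ChunkKey> keys) {
        List<ChunkKey> copy = new ArrayList<>(keys);
        copy.sort(COMPOUND);
        List<String> ids = new ArrayList<>();
        for (ChunkKey k : copy) {
            ids.add(k.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<ChunkKey> keys = new ArrayList<>();
        keys.add(new ChunkKey(456302010, "f3")); // file A
        keys.add(new ChunkKey(120450911, "0a")); // file B
        keys.add(new ChunkKey(456302010, "1c")); // file A
        keys.add(new ChunkKey(120450911, "e7")); // file B
        System.out.println(sortedIds(keys)); // prints [0a, e7, 1c, f3]
    }
}
```

      Ordered by _id alone, the chunks of the two files would interleave (0a, 1c, e7, f3). With the anchor as the leading key component, the chunks of each file sit next to each other.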

      A pull request is in the making!

      N.

          People

            Assignee: Unassigned
            Reporter: Norberto Leite (nleite)
            Votes: 0
            Watchers: 1

            Dates

              Created:
              Updated:

              Time Tracking

                Original Estimate: 96h
                Remaining Estimate: 96h
                Time Spent: Not Specified