  1. Ignite
  2. IGNITE-10418

Implement lightweight profiling of message processing

Details

    • New Feature
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Docs Required

    Description

      There is currently no way to identify bottlenecks without extensive profiling on the server and client side (JFR recordings, sampling profilers, regular thread dumps, etc.), which is not always possible. Even when profiling data is available, it does not always help to pinpoint certain types of bottlenecks, for example, contention on a single key or partition.

      Lightweight message profiling will make it possible to track the execution of each message, collect execution statistics in executors for each grid node and for all nodes, and collect histograms of waiting and execution time for each message type.
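      As an illustration of what tracking a single message could involve, here is a minimal sketch, assuming each message handler is wrapped in a Runnable before being submitted to an executor. ProfiledTask and its Listener callback are hypothetical names introduced for this example and are not existing Ignite classes.

```java
/** Hypothetical wrapper (not an Ignite class): measures queue wait and processing time of one message. */
final class ProfiledTask implements Runnable {
    /** Hypothetical callback that receives the measured durations in nanoseconds. */
    interface Listener {
        void onFinished(String msgType, long waitNanos, long procNanos);
    }

    private final String msgType;
    private final Runnable delegate;
    private final Listener lsnr;

    /** Captured at construction, which is assumed to happen right before the task is enqueued. */
    private final long enqueueNanos = System.nanoTime();

    ProfiledTask(String msgType, Runnable delegate, Listener lsnr) {
        this.msgType = msgType;
        this.delegate = delegate;
        this.lsnr = lsnr;
    }

    @Override public void run() {
        long startNanos = System.nanoTime();

        try {
            delegate.run();
        }
        finally {
            long endNanos = System.nanoTime();

            // Wait time: enqueue -> execution start. Processing time: execution start -> end.
            lsnr.onFinished(msgType, startNanos - enqueueNanos, endNanos - startNanos);
        }
    }
}
```

      The overhead per message in this sketch is two extra System.nanoTime() calls and one callback invocation, which is the kind of cost that keeps such profiling lightweight compared to a sampling profiler.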

      We need to implement:

      1. Histogram metrics for message execution time, queue waiting time, and queue size at the moments of enqueueing and execution start, broken down by message type.
      2. Dumping of messages whose execution or waiting time exceeds a threshold, e.g.:
        Slow message: *enqueueTs*=2018-11-27 15:10:22.241, *waitTime*=0.048, *procTime*=305.186, *messageId*=3a3064a9, *queueSzBefore*=0, *headMessageId*=null, *queueSzAfter*=0, *message*=GridNearTxFinishRequest [miniId=1, mvccSnapshot=null, super=GridDistributedTxFinishRequest [topVer=AffinityTopologyVersion [topVer=4, minorTopVer=0], futId=199a3155761-f379f312-ad4b-4181-acc5-0aacb3391f07, threadId=296, commitVer=null, invalidate=false, commit=true, baseVer=null, txSize=0, sys=false, plc=2, subjId=dda703a0-69ee-47cf-9b9a-bf3dc9309feb, taskNameHash=0, flags=32, syncMode=FULL_SYNC, txState=IgniteTxStateImpl [activeCacheIds=[644280847], recovery=false, mvccEnabled=false, txMap=HashSet [IgniteTxEntry [key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true], cacheId=644280847, txKey=IgniteTxKey [key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true], cacheId=644280847], val=[op=READ, val=null], prevVal=[op=NOOP, val=null], oldVal=[op=NOOP, val=null], entryProcessorsCol=null, ttl=-1, conflictExpireTime=-1, conflictVer=null, explicitVer=null, dhtVer=null, filters=CacheEntryPredicate[] [], filtersPassed=false, filtersSet=false, entry=GridCacheMapEntry [key=KeyCacheObjectImpl [part=8, val=8, hasValBytes=true], val=null, ver=GridCacheVersion [topVer=0, order=0, nodeOrder=0], hash=8, extras=GridCacheObsoleteEntryExtras [obsoleteVer=GridCacheVersion [topVer=2147483647, order=0, nodeOrder=0]], flags=2]GridDistributedCacheEntry [super=]GridDhtCacheEntry [rdrs=ReaderId[] [], part=8, super=], prepared=0, locked=false, nodeId=null, locMapped=false, expiryPlc=null, transferExpiryPlc=false, flags=0, partUpdateCntr=0, serReadVer=null, xidVer=GridCacheVersion
      3. JMX tools and a command-line interface to retrieve these metrics and print a statistics view.
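      The first two requirements above could be prototyped roughly as follows. This is only a sketch under stated assumptions: MessageStats is a hypothetical class (not part of Ignite), the bucket bounds and the slow-message threshold are arbitrary illustrative values, and times are assumed to be in milliseconds.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLongArray;

/** Hypothetical sketch (not an Ignite class): per-message-type histograms plus a slow-message check. */
final class MessageStats {
    /** Upper bucket bounds in milliseconds (assumed values); one extra bucket holds everything above. */
    private final long[] boundsMs;

    /** Assumed threshold in milliseconds above which a message is reported as slow. */
    private final long slowThresholdMs;

    /** Processing-time histogram per message type. */
    private final Map<String, AtomicLongArray> procTimeHist = new ConcurrentHashMap<>();

    MessageStats(long[] boundsMs, long slowThresholdMs) {
        this.boundsMs = boundsMs.clone();
        this.slowThresholdMs = slowThresholdMs;
    }

    /** Records one processed message and returns true if it should be dumped as slow. */
    boolean record(String msgType, long waitMs, long procMs) {
        AtomicLongArray buckets =
            procTimeHist.computeIfAbsent(msgType, k -> new AtomicLongArray(boundsMs.length + 1));

        buckets.incrementAndGet(bucketIndex(procMs));

        return waitMs > slowThresholdMs || procMs > slowThresholdMs;
    }

    /** Index of the first bucket whose upper bound is not below the value, or the overflow bucket. */
    int bucketIndex(long valueMs) {
        for (int i = 0; i < boundsMs.length; i++) {
            if (valueMs <= boundsMs[i])
                return i;
        }

        return boundsMs.length;
    }

    /** Copy of the bucket counters for a message type (all zeros if the type was never seen). */
    long[] snapshot(String msgType) {
        AtomicLongArray buckets = procTimeHist.get(msgType);
        long[] res = new long[boundsMs.length + 1];

        if (buckets != null) {
            for (int i = 0; i < res.length; i++)
                res[i] = buckets.get(i);
        }

        return res;
    }
}
```

      A caller would log the message in the "Slow message" format when record() returns true. A snapshot like the one returned here is also the kind of value that the JMX and command-line tools from the third requirement could expose, although how that is wired up is left open.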

            People

              Assignee: Unassigned
              Reporter: Alexey Scherbakov (ascherbakov)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated: