Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
None
Description
Debug logging in DirectChannel shows the IDs of a message's receivers and the message's toString(), but it is very difficult to work out which thread on the receiving end is supposed to process that message.
Here's an example of what we currently have:
[debug 2021/02/01 16:15:17.492 PST persistgemfire8_host1_8586 <vm_9_thr_25_persist8_host1_8586> tid=0x4f0] Sending (DLockRequestProcessor.DLockResponseMessage responding GRANT; serviceName=__PDX(version 4); objectName=PDX_LOCK; responseCode=0; keyIfFailed=null; leaseExpireTime=9223372036854775807; processorId=509; lockId=509) to 1 peers ([rs-GEM-3166-PL1535a2i32xlarge-hydra-client-36(persistgemfire9_host1_8517:8517)<ec><v51>:41005]) via tcp/ip
This tells you nothing about the receiver except its ID. On the receiving side, the thread that would handle that message in this run is:
persistgemfire9_host1_8517 <P2P message reader for rs-GEM-3166-PL1535a2i32xlarge-hydra-client-36(persistgemfire8_host1_8586:8586)<ec><v51>:41006 unshared ordered uid=1036 dom #1 local port=47207 remote port=42068> tid=0x51
Note the uid=1036 in the thread name: that is the uniqueId of the sending Connection. If you knew the uniqueId of the sending Connection, you could search the receiver's logs or stack traces and easily locate the thread that should receive this DLockResponseMessage. Currently this is much harder than it needs to be because the DirectChannel "Sending" log message doesn't include the uniqueId of the Connections it uses to send the message.
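To illustrate the lookup described above, here is a minimal Python sketch that pulls the sender ID and uid out of a reader thread name and filters a log or thread dump for a given uid. The regular expression is inferred from the single thread name quoted in this ticket, and the function names are hypothetical; other connection types or versions may format the name differently.

```python
import re

# Pattern inferred from the one thread name quoted in this ticket (assumption).
# It also tolerates extra words before "uid=", such as "sender uid=1036".
READER_NAME = re.compile(
    r"P2P message reader for (?P<sender>\S+) .*?\buid=(?P<uid>\d+)"
)


def parse_reader_thread(text):
    """Return {'sender': ..., 'uid': ...} for a reader thread name, else None."""
    m = READER_NAME.search(text)
    if m is None:
        return None
    return {"sender": m.group("sender"), "uid": int(m.group("uid"))}


def find_reader_threads(lines, uid):
    """Return the lines that name a P2P reader thread with the given uid."""
    found = []
    for line in lines:
        info = parse_reader_thread(line)
        if info is not None and info["uid"] == uid:
            found.append(line)
    return found
```

For example, find_reader_threads(open("receiver.log"), 1036) would list every line whose thread name carries uid=1036, where receiver.log is a placeholder file name.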
Let's change that log message to include the uniqueId of each outgoing Connection. Maybe something like this:
Sending (message.toString()) to 1 peers (peer ID), uid=1036 via tcp/ip
and on the receiving side we could be clearer about what the uid in the thread's name means:
persistgemfire9_host1_8517 <P2P message reader for rs-GEM-3166-PL1535a2i32xlarge-hydra-client-36(persistgemfire8_host1_8586:8586)<ec><v51>:41006 unshared ordered sender uid=1036 dom #1 local port=47207 remote port=42068> tid=0x51
or something like that.
With that change, we can look at the Sending message and know that the receiving thread will have uid=1036 in its name. Knowing this, it ought to be possible to write a program or script that traces a message and its consequences from one node to another.
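A rough sketch of such a tracing script follows. It assumes the Sending line uses exactly the format proposed in this ticket ("... to 1 peers (...), uid=1036 via tcp/ip") with a single uid; the format actually implemented, and the format for several Connections, may differ. The names SENDING, READER and trace are hypothetical.

```python
import re

# Assumed format, taken from the proposal in this ticket (single uid only).
SENDING = re.compile(
    r"Sending \((?P<msg>.*)\) to \d+ peers \(.*\), uid=(?P<uid>\d+) via "
)
# Reader thread name as quoted in this ticket; "sender uid=" also matches.
READER = re.compile(r"P2P message reader for .*?\buid=(?P<uid>\d+)")


def reader_uid(line):
    """Return the uid in a reader thread name on this line, or None."""
    m = READER.search(line)
    return int(m.group("uid")) if m else None


def trace(sender_lines, receiver_lines):
    """Pair each Sending line with the receiver lines whose thread has its uid.

    Returns a list of (message_text, uid, matching_receiver_lines) tuples.
    """
    receiver_lines = list(receiver_lines)
    results = []
    for line in sender_lines:
        m = SENDING.search(line)
        if m is None:
            continue  # not a Sending line in the assumed format
        uid = int(m.group("uid"))
        matches = [r for r in receiver_lines if reader_uid(r) == uid]
        results.append((m.group("msg"), uid, matches))
    return results
```

Matching on uid alone is a simplification: this sketch does not check that the sender ID in the reader thread name belongs to the member that logged the Sending line, which a real tool would probably want to verify.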