Description
We see this periodically under high traffic load. ATS crashes with more than 7000 frames on the stack. The bulk of the frames are repetitions of the following sequence:
#117 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0) at ../iocore/eventsystem/I_Continuation.h:150
#118 0x000000000064c05d in Http2ClientSession::state_start_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at Http2ClientSession.cc:451
#119 0x000000000064b0af in Http2ClientSession::main_event_handler (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at Http2ClientSession.cc:292
#120 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0) at ../iocore/eventsystem/I_Continuation.h:150
#121 0x000000000064c386 in Http2ClientSession::state_complete_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at Http2ClientSession.cc:483
#122 0x000000000064b0af in Http2ClientSession::main_event_handler (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at Http2ClientSession.cc:292
#123 0x00000000005159c8 in Continuation::handleEvent (this=0x2b0bdd101b90, event=100, data=0x2b0bad0c7cf0) at ../iocore/eventsystem/I_Continuation.h:150
#124 0x000000000064c05d in Http2ClientSession::state_start_frame_read (this=0x2b0bdd101b90, event=100, edata=0x2b0bad0c7cf0) at Http2ClientSession.cc:451
We had cherry-picked the fix for TS-4209 to correctly enforce the concurrent stream limit. But in the latest crash of this type, it looks like we are pulling small items from cache, so each stream lives and dies on the stack. The concurrent active connection count never reaches the limit, so the limit does nothing to stop the recursion.
I am going to try changing the state_start_frame_read/state_complete_frame_read logic from recursing handlers to a loop.
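To illustrate the idea, here is a minimal, self-contained sketch of the two shapes. It is not the ATS code: FrameSession, Frame, make_frames and the depth counter are hypothetical stand-ins, and the real handlers go through Continuation::handleEvent and main_event_handler on every hop, so the actual stack grows by about three frames per handler call rather than one.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical stand-in for one buffered HTTP/2 frame.
struct Frame {
  std::size_t payload_len;
};

// Hypothetical stand-in for a session; NOT the real Http2ClientSession.
class FrameSession {
public:
  explicit FrameSession(std::deque<Frame> frames) : pending_(std::move(frames)) {}

  // Recursive shape: "start" calls "complete", which calls "start" again
  // while buffered frames remain, so depth grows with the frame count.
  void start_frame_read_recursive(std::size_t depth = 1) {
    note_depth(depth);
    if (pending_.empty()) {
      return; // nothing buffered; the real code would wait for the next read event
    }
    complete_frame_read_recursive(depth + 1);
  }

  void complete_frame_read_recursive(std::size_t depth) {
    note_depth(depth);
    assert(!pending_.empty());
    pending_.pop_front(); // frame fully consumed
    ++frames_done_;
    start_frame_read_recursive(depth + 1); // re-enter for the next frame
  }

  // Loop shape: the same start/complete steps run inside one stack frame.
  void read_frames_loop() {
    note_depth(1);
    while (!pending_.empty()) {
      // "start" step: a frame header is available.
      // "complete" step: consume the frame's payload.
      pending_.pop_front();
      ++frames_done_;
    }
  }

  std::size_t frames_done() const { return frames_done_; }
  std::size_t max_depth() const { return max_depth_; }

private:
  void note_depth(std::size_t depth) {
    if (depth > max_depth_) {
      max_depth_ = depth;
    }
  }

  std::deque<Frame> pending_;
  std::size_t frames_done_ = 0;
  std::size_t max_depth_ = 0;
};

// Hypothetical helper: build n small frames, as when serving small cached objects.
inline std::deque<Frame> make_frames(std::size_t n) {
  return std::deque<Frame>(n, Frame{16});
}
```

With n buffered frames, the recursive shape reaches a depth of 2n + 1 handler calls, while the loop processes the same frames at a constant depth of 1.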