Details
Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Description
I think we can (1) make grid configuration significantly easier and (2) speed up failure detection.
Here are the discovery SPI configuration properties responsible for failure detection:
- reconnectCount
- sockTimeout
- networkTimeout
- ackTimeout
- maxAckTimeout
- heartbeatFrequency
- maxMissedHeartbeats
The same goes for the communication SPI:
- reconnectCount
- maxConnTimeout
- connTimeout
So, we have 10 or even more properties.
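To illustrate why these properties are hard to reason about together, here is a rough sketch of how a worst-case detection delay might be estimated from a few of them. The model is an assumption made for illustration only: the heartbeat path is taken as heartbeatFrequency times maxMissedHeartbeats, and the ack path is assumed to double the wait on every reconnect attempt up to maxAckTimeout. The class and method names are hypothetical and are not part of either SPI.

```java
// Hypothetical helper, not part of Ignite: estimates detection delays
// under a simplified, assumed model of the properties listed above.
final class DetectionDelayEstimate {

    // Heartbeat path (assumed): a node is suspected after this many
    // milliseconds without a heartbeat.
    static long heartbeatDelay(long heartbeatFrequency, int maxMissedHeartbeats) {
        return heartbeatFrequency * maxMissedHeartbeats;
    }

    // Ack path (assumed): every reconnect attempt waits for the current
    // ack timeout, which doubles after each attempt and is capped at
    // maxAckTimeout. Returns the total wait over all attempts.
    static long ackDelay(long ackTimeout, long maxAckTimeout, int reconnectCount) {
        long total = 0;
        long current = Math.min(ackTimeout, maxAckTimeout);
        for (int i = 0; i < reconnectCount; i++) {
            total += current;
            current = Math.min(current * 2, maxAckTimeout);
        }
        return total;
    }

    public static void main(String[] args) {
        // Example values chosen for illustration, not taken from the SPI defaults.
        System.out.println("heartbeat path, ms: " + heartbeatDelay(2000, 3));
        System.out.println("ack path, ms: " + ackDelay(5000, 8000, 3));
    }
}
```

Even in this simplified model the effective delay depends on five values at once, which is the tuning burden described here.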
We introduced them to address the half-open socket problem (which is fairly common in cloud environments) and GC pauses on cluster nodes: ack timeouts can be increased so that paused nodes are not kicked out of the topology.
By setting values for these properties I am effectively setting a timeout for failure detection. Why do we need so many parameters instead of a single one on IgniteConfiguration, such as nodeResponseThreshold (or failureDetectionThreshold; can anyone propose a better name)?
All other parameters would be calculated automatically. (I think the user could still set some of them for full control over the situation; we need to decide whether this is needed.)
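A minimal sketch of what "calculated automatically" could look like, assuming a single threshold in milliseconds from which the other values are derived, with an optional explicit override. The derivation rule (an equal budget per reconnect attempt, split between socket and ack timeouts, and a heartbeat every quarter of the threshold) is invented for illustration, and the class name FailureDetectionSettings is hypothetical.

```java
// Hypothetical sketch, not Ignite code: derives several timeouts from
// one failure detection threshold. The split ratios are assumptions.
final class FailureDetectionSettings {
    final long sockTimeout;
    final long ackTimeout;
    final long heartbeatFrequency;

    private FailureDetectionSettings(long sockTimeout, long ackTimeout, long heartbeatFrequency) {
        this.sockTimeout = sockTimeout;
        this.ackTimeout = ackTimeout;
        this.heartbeatFrequency = heartbeatFrequency;
    }

    // ackTimeoutOverride may be null; when set, it replaces the derived
    // ack timeout so the user keeps full control over that value.
    static FailureDetectionSettings derive(long failureDetectionThreshold,
                                           int reconnectCount,
                                           Long ackTimeoutOverride) {
        if (failureDetectionThreshold <= 0 || reconnectCount <= 0)
            throw new IllegalArgumentException("threshold and reconnectCount must be positive");

        // Assumed rule: every reconnect attempt gets an equal share of the threshold.
        long perAttempt = failureDetectionThreshold / reconnectCount;
        long sock = perAttempt / 2;
        long ack = ackTimeoutOverride != null ? ackTimeoutOverride : perAttempt - sock;
        // Assumed rule: send a heartbeat every quarter of the threshold.
        long heartbeat = failureDetectionThreshold / 4;

        return new FailureDetectionSettings(sock, ack, heartbeat);
    }
}
```

With this shape, a call such as FailureDetectionSettings.derive(10000, 2, null) yields all three values from one number, while passing a non-null override pins ackTimeout and leaves the rest derived.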
Attachments
Issue Links
- is related to IGNITE-7704 Document IgniteConfiguration, TcpDiscoverySpi, TcpCommunicationSpi timeouts and their relations (Open)