[KUDU-1586] If a single op is larger than consensus_max_batch_size_bytes, consensus gets stuck - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Blocker
Resolution: Fixed
Affects Version/s: 0.10.0
Fix Version/s: 1.0.0
Component/s: consensus
Labels: None
Target Version/s: 1.0.0

Description

I noticed on a cluster test that a leader was spinning with log messages like:

I0829 14:17:31.870786 22184 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
I0829 14:17:31.873234 6186 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
I0829 14:17:31.875713 22184 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)
I0829 14:17:31.878078 6186 log_cache.cc:307] T e7cacfdb22744496a6d5d66227a69823 P 5d15962d2f2445b1ba15b93ead4fb31b: Successfully read 1 ops from disk (866604..866604)

After investigation, it seems that this op was larger than 1 MB (the default value of consensus_max_batch_size_bytes), which caused the leader to spin in this tight loop without making any progress.
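
As a rough illustration of how an oversized op can stall a size-limited batching loop, here is a minimal Python sketch. It is not taken from Kudu's source: the `Op` class, both `build_batch_*` functions, and `replicate` are hypothetical names, and the 2 MB size given to op 866604 is an assumption (the report only says the op was larger than 1 MB). The strict batcher refuses any op that would push the batch over the limit, so a single op larger than the limit yields an empty batch every round and the position never advances. The second batcher always admits the first op and only then enforces the limit.

```python
from dataclasses import dataclass

MAX_BATCH_SIZE_BYTES = 1024 * 1024  # the 1 MB default mentioned in the report


@dataclass
class Op:
    index: int
    size_bytes: int


def build_batch_strict(pending, max_bytes=MAX_BATCH_SIZE_BYTES):
    """Hypothetical batcher: never exceeds max_bytes, even for the first op."""
    batch, total = [], 0
    for op in pending:
        if total + op.size_bytes > max_bytes:
            break
        batch.append(op)
        total += op.size_bytes
    return batch


def build_batch_at_least_one(pending, max_bytes=MAX_BATCH_SIZE_BYTES):
    """Hypothetical batcher: always admits the first op, then enforces the limit."""
    batch, total = [], 0
    for op in pending:
        if batch and total + op.size_bytes > max_bytes:
            break
        batch.append(op)
        total += op.size_bytes
    return batch


def replicate(ops, build_batch, max_rounds=5):
    """Send batches until every op is sent or max_rounds is reached.

    Returns (ops_sent, rounds_used). The round cap stands in for the
    unbounded tight loop seen in the log output.
    """
    next_pos, rounds = 0, 0
    while next_pos < len(ops) and rounds < max_rounds:
        batch = build_batch(ops[next_pos:])
        next_pos += len(batch)  # an empty batch leaves next_pos unchanged
        rounds += 1
    return next_pos, rounds


# Op sizes are assumed for illustration; only the index 866604 comes from the log.
ops = [Op(866603, 4096), Op(866604, 2 * 1024 * 1024), Op(866605, 4096)]

print(replicate(ops, build_batch_strict))        # (1, 5): stuck at op 866604
print(replicate(ops, build_batch_at_least_one))  # (3, 3): all ops sent
```

With the strict batcher the loop uses every allowed round while sending only the first op, which mirrors the repeated reads of 866604..866604 above. Admitting at least one op per batch is one common way to guarantee forward progress; whether the actual fix took this form is not stated in this report.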

People

Assignee: Todd Lipcon

Reporter: Todd Lipcon

Votes: 0

Watchers: 3

Dates

Created: 30/Aug/16 05:05

Updated: 10/Dec/21 19:40

Resolved: 30/Aug/16 18:14