Details
Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Description
Original discussion from the user mailing list: http://mail-archives.apache.org/mod_mbox/couchdb-user/201401.mbox/%3cD14F971A540B974BB75ADC55F00F34CA69A356DA@SEX1.getback.ad2008r2.corp%3e
Digest:
During database compaction, the process fails at about 50% with the following error: http://pastebin.com/qeaZNHMj (CouchDB 1.2.0, Windows Server 2008 R2 Enterprise).
After upgrading both the server and CouchDB, the error is still the same: http://pastebin.com/feJWu7bN (CouchDB 1.5.0, Ubuntu 12.04.3 LTS (GNU/Linux 3.8.0-33-generic x86_64)).
There was one prior attempt at compaction that failed because of insufficient disk space: http://pastebin.com/S1URXN0p
After this initial failure, I've made sure that there's sufficient disk space for the .compact file.
The .compact file was always removed before trying compaction again.
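A disk-space check of the kind described above can be sketched as follows. This is only an illustration: the helper name `has_room_for_compaction` and the `headroom` parameter are hypothetical, and the rule of thumb that the .compact file may need up to the size of the database file is an assumption, not a documented CouchDB guarantee.

```python
import os
import shutil


def has_room_for_compaction(db_path, headroom=1.0):
    """Return True if the volume holding db_path has at least
    headroom * size(db_path) bytes free.

    Assumption: the .compact file is written next to the database file
    and, in the worst case, grows to roughly the size of that file.
    """
    db_size = os.path.getsize(db_path)
    directory = os.path.dirname(os.path.abspath(db_path))
    free = shutil.disk_usage(directory).free
    return free >= db_size * headroom
```

Calling it with the path to the .couch file before each compaction attempt (and a `headroom` above 1.0 for some margin) would rule out a repeat of the out-of-space failure.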
At the request of Robert Samuel Newson, I've also tried with an empty .compact file - the results were the same: http://pastebin.com/MJCgGM8C.
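For reference, each compaction attempt can be scripted roughly as below. This is a minimal sketch, assuming the server accepts unauthenticated requests (add credentials otherwise). It uses the `POST /{db}/_compact` endpoint and the `compact_running` field of `GET /{db}`; the helper names `http_json` and `compact_and_wait` are hypothetical.

```python
import json
import time
import urllib.request


def http_json(method, url):
    """Send a request with a JSON content type and return the parsed body."""
    req = urllib.request.Request(
        url,
        data=b"" if method == "POST" else None,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def compact_and_wait(base_url, db, call=http_json, poll_seconds=5.0,
                     sleep=time.sleep):
    """Start compaction, then poll the database info until
    compact_running is no longer true. Returns the last info document."""
    call("POST", f"{base_url}/{db}/_compact")
    while True:
        info = call("GET", f"{base_url}/{db}")
        if not info.get("compact_running"):
            return info
        sleep(poll_seconds)
```

Note that `compact_running` also drops back to false when compaction crashes, as it does here, so the server log still has to be checked to tell success from failure.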
Our I/O subsystem consists of several RAID 5 arrays. The admins claim that they've been running error-free since inception. We have yet to run a parity check, since that would require taking the array offline, and I'd rather not do that without exhausting other options.
Config files from the 1.2.0/Windows server (since that's where the fault must have occurred):
default.ini: http://pastebin.com/kUz0qyNk
local.ini: http://pastebin.com/srZUMwzB
Other than the default delayed_commits set to true, there are no options that could affect fsync()ing and such.
I've run:
curl "localhost:5984/ecrepo/_changes?include_docs=true"
curl "localhost:5984/ecrepo/_all_docs?include_docs=true"
and both calls succeeded, which would suggest that a faulty piece of data (incorrect checksum/length) is at fault somewhere.
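To make "succeeded" a little more rigorous than a clean curl exit, the `_all_docs` response can be parsed and checked for completeness. The sketch below assumes the standard response shape (`total_rows`, `rows`, and a `doc` member per row when `include_docs=true` is used). The helper names are hypothetical, and the whole response is loaded into memory, which may not be practical for a large database.

```python
import json
import urllib.request


def check_all_docs(payload):
    """Return a list of problems found in a parsed
    _all_docs?include_docs=true response (empty list if none)."""
    problems = []
    rows = payload.get("rows", [])
    total = payload.get("total_rows")
    if total is not None and total != len(rows):
        problems.append(
            f"total_rows is {total} but {len(rows)} rows were returned"
        )
    for row in rows:
        if row.get("doc") is None:
            problems.append(f"no document body for id {row.get('id')!r}")
    return problems


def fetch_all_docs(base_url="http://localhost:5984", db="ecrepo"):
    """Fetch and parse the full _all_docs listing with document bodies."""
    url = f"{base_url}/{db}/_all_docs?include_docs=true"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Running `check_all_docs(fetch_all_docs())` and getting an empty list would confirm that every current document body is readable, which points the search toward data that these requests never touch.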