Details
- Improvement
- Status: Closed
- Minor
- Resolution: Fixed
- 3.5.5
Description
This is a minor enhancement request: do not fail session initiation when DNS cannot resolve the hostname of one of the servers in the ZooKeeper ensemble.
The ZooKeeper client resolves all the hostnames in the ensemble while establishing the session.
In a Kubernetes environment with CoreDNS, a pod's hostname entry is removed from CoreDNS while the pod restarts. We can tune the CoreDNS settings to delay the removal of the entry, but we don't want to leave a race in which the ZooKeeper client tries to establish a session and fails because DNS is temporarily unable to resolve one hostname. As long as at least one server in the ensemble is resolvable, should the client avoid failing session establishment with a hard error and instead try to connect to one of the other servers?
See the snippet below, where resolve_hosts() fails with ZSYSTEMERROR as soon as getaddrinfo() fails for a host.
    if ((rc = getaddrinfo(host, port_spec, &hints, &res0)) != 0) {
        //bug in getaddrinfo implementation when it returns
        //EAI_BADFLAGS or EAI_ADDRFAMILY with AF_UNSPEC and
        // ai_flags as AI_ADDRCONFIG
    #ifdef AI_ADDRCONFIG
        if ((hints.ai_flags == AI_ADDRCONFIG) &&
    // ZOOKEEPER-1323 EAI_NODATA and EAI_ADDRFAMILY are deprecated in FreeBSD.
    #ifdef EAI_ADDRFAMILY
            ((rc ==EAI_BADFLAGS) || (rc == EAI_ADDRFAMILY))) {
    #else
            (rc == EAI_BADFLAGS)) {
    #endif
            //reset ai_flags to null
            hints.ai_flags = 0;
            //retry getaddrinfo
            rc = getaddrinfo(host, port_spec, &hints, &res0);
        }
    #endif
        if (rc != 0) {
            errno = getaddrinfo_errno(rc);
    #ifdef _WIN32
            LOG_ERROR(LOGCALLBACK(zh), "Win32 message: %s\n", gai_strerror(rc));
    #elif __linux__ && __GNUC__
            LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", gai_strerror(rc));
    #else
            LOG_ERROR(LOGCALLBACK(zh), "getaddrinfo: %s\n", strerror(errno));
    #endif
            rc=ZSYSTEMERROR;
            goto fail;
        }
    }
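To make the request concrete, here is a rough sketch of the behavior being asked for. It is not a patch against resolve_hosts(), and it is POSIX-only. The helper names count_resolvable() and resolve_ensemble() and the ai_flags parameter are made up for illustration and do not exist in the ZooKeeper C client. The idea is to log and skip a host that fails to resolve, and to report an error only when no host in the ensemble resolves.

```c
#define _POSIX_C_SOURCE 200112L  /* expose getaddrinfo() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper (not part of the ZooKeeper C client).
 * Tries to resolve every host and returns how many resolved.
 * A host that fails to resolve is logged and skipped instead of
 * aborting the whole loop.
 */
static int count_resolvable(const char *const *hosts, size_t n,
                            const char *port, int ai_flags)
{
    struct addrinfo hints;
    size_t i;
    int resolved = 0;

    assert(hosts != NULL && port != NULL);

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = ai_flags;

    for (i = 0; i < n; i++) {
        struct addrinfo *res = NULL;
        int rc = getaddrinfo(hosts[i], port, &hints, &res);
        if (rc != 0) {
            /* Soft failure: remember nothing for this host, move on. */
            fprintf(stderr, "skipping %s: %s\n", hosts[i], gai_strerror(rc));
            continue;
        }
        resolved++;
        freeaddrinfo(res);  /* a real client would keep these addresses */
    }
    return resolved;
}

/*
 * Hypothetical wrapper: returns 0 when at least one host resolved,
 * -1 only when none did (the case where a hard error is still justified).
 */
static int resolve_ensemble(const char *const *hosts, size_t n,
                            const char *port, int ai_flags)
{
    return count_resolvable(hosts, n, port, ai_flags) > 0 ? 0 : -1;
}
```

Passing 0 for ai_flags gives a normal DNS lookup. Passing AI_NUMERICHOST makes any non-numeric name fail immediately without touching DNS, which is a convenient way to exercise the skip path: with the hosts "127.0.0.1" and "zk-1.invalid", resolve_ensemble() returns 0 because one address is still usable.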