Details
- Type: Bug
- Status: Resolved
- Priority: Blocker
- Resolution: Fixed
- Version: 1.4.0
Description
HDDS-8178 added support for multiple sub-CA certificates in the trust chain. In the StorageContainerManager (SCM) constructor, if security is enabled and hdds.grpc.tls.enabled is true, SCM loads the server KeyStoresFactory:
if (conf.isSecurityEnabled() && conf.isGrpcTlsEnabled()) {
  KeyStoresFactory serverKeyFactory =
      certificateClient.getServerKeyStoresFactory();
  // ...
}
This in turn calls ReloadingX509KeyManager.loadKeyManager, which loads the entire trust chain:
private X509ExtendedKeyManager loadKeyManager(CertificateClient caClient)
    throws GeneralSecurityException, IOException {
  PrivateKey privateKey = caClient.getPrivateKey();
  List<X509Certificate> newCertList = caClient.getTrustChain();
  // ...
Loading the trust chain calls listCA(), which calls updateCAList(), which in turn makes a network call (listCACertificate) to the SCMSecurityProtocolServer:
public List<String> updateCAList() throws IOException {
  pemEncodedCACertsLock.lock();
  try {
    pemEncodedCACerts = getScmSecureClient().listCACertificate();
    // ...
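To make the hidden dependency concrete, the following minimal Java sketch models the lookup chain getTrustChain() -> listCA() -> updateCAList() with hypothetical classes (SecurityProtocolClient, TrustChainLookupSketch, TrustChainLookupDemo). Only the method names come from the excerpts and stack trace in this issue; the fetch-once-then-cache behavior is an assumption made for illustration. What it shows is that a call which reads like a local getter performs a remote listCACertificate() call the first time it runs.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the SCM security protocol client. Only the method
// name listCACertificate() is taken from the excerpt and stack trace.
interface SecurityProtocolClient {
  List<String> listCACertificate() throws IOException;
}

// Hypothetical, simplified model of getTrustChain() -> listCA() -> updateCAList().
// Assumption: the CA list is fetched on first use and cached afterwards.
class TrustChainLookupSketch {
  private final SecurityProtocolClient client;
  private final ReentrantLock pemEncodedCACertsLock = new ReentrantLock();
  private List<String> pemEncodedCACerts; // null until the first remote fetch

  TrustChainLookupSketch(SecurityProtocolClient client) {
    this.client = client;
  }

  // Reads like a local getter, but the first call goes over the network.
  List<String> getTrustChain() throws IOException {
    return listCA();
  }

  List<String> listCA() throws IOException {
    pemEncodedCACertsLock.lock();
    try {
      if (pemEncodedCACerts == null) {
        updateCAList(); // remote call on first use
      }
      return pemEncodedCACerts;
    } finally {
      pemEncodedCACertsLock.unlock();
    }
  }

  List<String> updateCAList() throws IOException {
    pemEncodedCACertsLock.lock(); // ReentrantLock, so nesting inside listCA() is safe
    try {
      pemEncodedCACerts = Collections.unmodifiableList(
          new ArrayList<>(client.listCACertificate()));
      return pemEncodedCACerts;
    } finally {
      pemEncodedCACertsLock.unlock();
    }
  }
}

class TrustChainLookupDemo {
  static int remoteCalls = 0;

  public static void main(String[] args) throws IOException {
    TrustChainLookupSketch certClient = new TrustChainLookupSketch(() -> {
      remoteCalls++; // stands in for an RPC to SCMSecurityProtocolServer
      return List.of("root-ca", "sub-ca");
    });
    certClient.getTrustChain();
    certClient.getTrustChain();
    System.out.println("remote calls: " + remoteCalls); // remote calls: 1
  }
}
```

Because nothing in the getter's signature reveals the RPC, calling it from a constructor silently makes construction depend on a running server.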
All of this happens inside the StorageContainerManager constructor. However, the SCM services, including the SCMSecurityProtocolServer, are started only after the constructor has returned and scm.start() is called. SCM therefore sends a request to its own security protocol server before that server is running, which produces repeated "Connection refused" messages during SCM startup, such as:
10:45:45.506 AM INFO SCMRatisServerImpl starting Raft server for scm:7b4b7153-eb02-443b-b8f9-3b146931674c
10:45:47.563 AM INFO RetryInvocationHandler com.google.protobuf.ServiceException: java.net.ConnectException: Call From <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 after 1 failover attempts. Trying to failover after sleeping for 2000ms.
10:45:49.565 AM INFO RetryInvocationHandler com.google.protobuf.ServiceException: java.net.ConnectException: Call From <HOSTNAME>/<IP> to <HOSTNAME>:9961 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking $Proxy11.submitRequest over nodeId=node1,nodeAddress=<HOSTNAME>/<IP>:9961 after 2 failover attempts. Trying to failover after sleeping for 2000ms.
(repeated)
Stack trace:
java.net.ConnectException: Connection refused
    at java.base/sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at java.base/sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:777)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:205)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:586)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:730)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:843)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:430)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1681)
    at org.apache.hadoop.ipc.Client.call(Client.java:1506)
    at org.apache.hadoop.ipc.Client.call(Client.java:1459)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
    at jdk.internal.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:431)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
    at com.sun.proxy.$Proxy14.submitRequest(Unknown Source)
    at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.submitRequest(SCMSecurityProtocolClientSideTranslatorPB.java:102)
    at org.apache.hadoop.hdds.protocolPB.SCMSecurityProtocolClientSideTranslatorPB.listCACertificate(SCMSecurityProtocolClientSideTranslatorPB.java:374)
    at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.updateCAList(DefaultCertificateClient.java:933)
    at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.listCA(DefaultCertificateClient.java:921)
    at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getTrustChain(DefaultCertificateClient.java:410)
    at org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.loadKeyManager(ReloadingX509KeyManager.java:204)
    at org.apache.hadoop.hdds.security.ssl.ReloadingX509KeyManager.<init>(ReloadingX509KeyManager.java:85)
    at org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.createKeyManagers(PemFileBasedKeyStoresFactory.java:83)
    at org.apache.hadoop.hdds.security.ssl.PemFileBasedKeyStoresFactory.init(PemFileBasedKeyStoresFactory.java:104)
    at org.apache.hadoop.hdds.security.x509.keys.SecurityUtil.getServerKeyStoresFactory(SecurityUtil.java:103)
    at org.apache.hadoop.hdds.security.x509.certificate.client.DefaultCertificateClient.getServerKeyStoresFactory(DefaultCertificateClient.java:948)
    at org.apache.hadoop.hdds.scm.ha.HASecurityUtils.createSCMRatisTLSConfig(HASecurityUtils.java:345)
    at org.apache.hadoop.hdds.scm.ha.SCMRatisServerImpl.<init>(SCMRatisServerImpl.java:109)
    at org.apache.hadoop.hdds.scm.ha.SCMHAManagerImpl.<init>(SCMHAManagerImpl.java:97)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.initializeSystemManagers(StorageContainerManager.java:646)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.<init>(StorageContainerManager.java:400)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:597)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManager.createSCM(StorageContainerManager.java:609)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter$SCMStarterHelper.start(StorageContainerManagerStarter.java:171)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.startScm(StorageContainerManagerStarter.java:145)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:74)
    at org.apache.hadoop.hdds.scm.server.StorageContainerManagerStarter.call(StorageContainerManagerStarter.java:48)
    at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
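The ordering problem can be reproduced in miniature. In this sketch, FakeSecurityServer, EagerScm, DeferredScm and StartupOrderDemo are hypothetical stand-ins, not Ozone classes. EagerScm fetches the CA list in its constructor, as described above, and fails with a ConnectException because the server has not been started yet; DeferredScm moves the lookup into start(), after the server is running. Deferring the lookup is only one possible approach, assumed here for illustration, and is not necessarily the change that resolved this issue.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.List;

// Hypothetical in-process stand-in for SCMSecurityProtocolServer: it only
// answers once start() has been called, mimicking "Connection refused".
class FakeSecurityServer {
  private volatile boolean running = false;

  void start() {
    running = true;
  }

  List<String> listCACertificate() throws IOException {
    if (!running) {
      throw new ConnectException("Connection refused");
    }
    return List.of("root-ca", "sub-ca");
  }
}

// Models the reported behaviour: the trust chain is fetched in the constructor,
// before start() brings up the server it depends on.
class EagerScm {
  final List<String> trustChain;

  EagerScm(FakeSecurityServer server) throws IOException {
    this.trustChain = server.listCACertificate(); // fails: server not started yet
  }
}

// Hypothetical alternative (an assumption, not the committed fix):
// construct without remote calls and fetch the trust chain in start().
class DeferredScm {
  private final FakeSecurityServer server;
  private List<String> trustChain; // populated in start()

  DeferredScm(FakeSecurityServer server) {
    this.server = server; // no remote calls during construction
  }

  void start() throws IOException {
    server.start();                               // bring up services first
    this.trustChain = server.listCACertificate(); // now succeeds
  }

  List<String> trustChain() {
    return trustChain;
  }
}

class StartupOrderDemo {
  public static void main(String[] args) throws IOException {
    try {
      new EagerScm(new FakeSecurityServer());
    } catch (ConnectException e) {
      System.out.println("eager: " + e.getMessage()); // eager: Connection refused
    }
    DeferredScm scm = new DeferredScm(new FakeSecurityServer());
    scm.start();
    System.out.println("deferred: " + scm.trustChain()); // deferred: [root-ca, sub-ca]
  }
}
```

In the real SCM the retry handler keeps re-sending the request (the "after N failover attempts" lines in the log), so the eager variant shows up as a startup stall with repeated warnings rather than an immediate failure.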
Issue Links
- blocks: HDDS-7391 Automated live rotation of CA certificates in a cluster with established trust (Resolved)
- is duplicated by: HDDS-9410 Upgrade stall from 1.3 to current code when gRPC TLS is enabled. (Resolved)
- is related to: HDDS-9410 Upgrade stall from 1.3 to current code when gRPC TLS is enabled. (Resolved)
- links to