Details
- Type: Bug
- Status: Open
- Priority: Major
- Resolution: Unresolved
- Affects Version: Impala 4.1.0
- Labels: ghx-label-9
Description
When the files of an Iceberg table are dropped, a subsequent DROP TABLE results in an error, while the table still shows up in SHOW TABLES.
Here are the steps to repro:
1) Run from Impala-shell
DROP DATABASE IF EXISTS `drop_incomplete_table2` CASCADE;
CREATE DATABASE `drop_incomplete_table2`;
CREATE TABLE drop_incomplete_table2.iceberg_tbl (i int) stored as iceberg;
INSERT INTO drop_incomplete_table2.iceberg_tbl VALUES (1), (2), (3);
2) Drop the folder of the table with hdfs dfs
hdfs dfs -rm -r hdfs://localhost:20500/test-warehouse/drop_incomplete_table2.db/iceberg_tbl
3) Try to drop the table from Impala-shell
DROP TABLE drop_incomplete_table2.iceberg_tbl;
This results in the following error:
ERROR: NotFoundException: Failed to open input stream for file: hdfs://localhost:20500/test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
CAUSED BY: FileNotFoundException: File does not exist: /test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:87)
  at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:77)
  at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:159)
  at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:2040)
  at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:737)
  at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:454)
  at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:533)
  at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:989)
  at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:917)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
  at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2894)
CAUSED BY: RemoteException: File does not exist: /test-warehouse/drop_incomplete_table2.db/iceberg_tbl/metadata/00001-e2568132-d74d-44c2-9b7f-8838453e5944.metadata.json
  (identical stack trace as above)
Meanwhile the table is still listed in the SHOW TABLES output, even after an INVALIDATE METADATA.
Note: for the repro it is important to execute some SQL against the newly created table so that Impala loads it. Here an INSERT INTO was used, but e.g. an ALTER TABLE would also work. Apparently, when the table is "incomplete" (its state right after CREATE TABLE), DROP TABLE works fine, but it fails once the table is loaded.
The suspicious part of code is in StmtMetadataLoader.loadTables() and getMissingTables() where there is a distinction between loaded and Incomplete tables.
https://github.com/apache/impala/blob/2f74e956aa10db5af6a7cdc47e2ad42f63d5030f/fe/src/main/java/org/apache/impala/analysis/StmtMetadataLoader.java#L196
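The loaded-vs-incomplete distinction can be illustrated with a minimal sketch. This is not Impala's actual code; the class and field names below are hypothetical stand-ins for Impala's catalog table objects, showing only the idea that getMissingTables() treats tables whose metadata is already loaded differently from incomplete ones:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of StmtMetadataLoader.getMissingTables():
// only tables whose metadata is NOT yet loaded are reported as
// missing and scheduled for a metadata load. Names are hypothetical.
public class MissingTablesSketch {
    static class Table {
        final String name;
        final boolean loaded;  // false => "incomplete" (state right after CREATE TABLE)
        Table(String name, boolean loaded) {
            this.name = name;
            this.loaded = loaded;
        }
    }

    // Partition step: incomplete tables go on the to-load list; already
    // loaded tables are skipped, so a later DROP TABLE touches the
    // (possibly deleted) files of the loaded table's metadata.
    static List<String> getMissingTables(List<Table> tables) {
        List<String> missing = new ArrayList<>();
        for (Table t : tables) {
            if (!t.loaded) missing.add(t.name);
        }
        return missing;
    }

    public static void main(String[] args) {
        List<Table> tables = List.of(
            new Table("iceberg_tbl", true),   // loaded: DROP TABLE reads metadata from HDFS
            new Table("fresh_tbl", false));   // incomplete: no HDFS access needed
        System.out.println(getMissingTables(tables));  // prints [fresh_tbl]
    }
}
```

Under this reading, a loaded table's DROP TABLE path ends up opening the metadata.json on HDFS, which is exactly what fails after the files were removed, matching the repro above.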
Note 2: the issue is quite similar to https://issues.apache.org/jira/browse/IMPALA-11502, but here the repro steps and the error are somewhat different.
Issue Links
- is a clone of: IMPALA-11502 Dropping files of Iceberg table in HadoopCatalog will cause DROP TABLE to fail (Open)
- is related to: IMPALA-11502 Dropping files of Iceberg table in HadoopCatalog will cause DROP TABLE to fail (Open)
- is related to: IMPALA-11330 Handle missing Iceberg data/metadata gracefully (Resolved)