Details
Description
The S3 test bucket (landsat-pds) used in hadoop-aws tests of S3 Select and large file reads is no longer publicly accessible:
java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended Request ID: O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
Because
- HADOOP-18830 has cut S3 Select, all we need in 3.4.1+ is a large file for some reading tests; changing the default value disables the S3 Select tests on older releases
- if fs.s3a.scale.test.csvfile is set to " " then the other tests which need it will be skipped
Proposed
- locate a new large file under the (requester pays) s3a://usgs-landsat/ bucket; all releases with HADOOP-18168 can use this (a configuration sketch follows this list)
- update the 3.4.1 source to use this; document it
- do something similar for 3.3.9, and maybe even cut S3 Select there too
- document how to use it on older releases with requester-pays support
- document how to completely disable it on older releases (a sketch follows the auth-keys example below)
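For releases with the HADOOP-14661 requester-pays support, a minimal sketch of the per-bucket settings a replacement file under s3a://usgs-landsat/ might need. The object key below is a placeholder (no actual file has been chosen yet), and us-west-2 is assumed to be that bucket's region:
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <!-- placeholder key: an actual large file under usgs-landsat still has to be picked -->
  <value>s3a://usgs-landsat/path/to/large-file</value>
  <description>file used in scale tests</description>
</property>
<property>
  <name>fs.s3a.bucket.usgs-landsat.requester.pays.enabled</name>
  <value>true</value>
  <description>usgs-landsat is a requester-pays bucket; needs HADOOP-14661</description>
</property>
<property>
  <name>fs.s3a.bucket.usgs-landsat.endpoint.region</name>
  <value>us-west-2</value>
  <description>assumed region of the usgs-landsat bucket</description>
</property>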
How to fix (most) landsat test failures on older releases
Add this to your auth-keys.xml file. Expect some failures in a few tests with hard-coded references to the bucket (assumed role delegation tokens).
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <value>s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz</value>
  <description>file used in scale tests</description>
</property>
<property>
  <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
  <value>us-east-1</value>
</property>
<property>
  <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
  <value>false</value>
  <description>Don't try to purge uploads in the read-only bucket, as it will only create log noise.</description>
</property>
<property>
  <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
  <value>0</value>
  <description>Let's postpone existence checks to the first IO operation</description>
</property>
<property>
  <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
  <value>false</value>
  <description>Do not add the referrer header</description>
</property>
<property>
  <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
  <value>128k</value>
  <description>Use a small prefetch size so tests fetch multiple blocks</description>
</property>
<property>
  <name>fs.s3a.select.enabled</name>
  <value>false</value>
</property>
Some delegation token tests will still fail; these have hard-coded references to the old bucket. Do not worry about these:
[ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[0] » AccessDenied s3a://la...
[ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[1] » AccessDenied s3a://la...
[ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[2] » AccessDenied s3a://la...
[ERROR] ITestRoleDelegationInFilesystem>ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->ITestSessionDelegationInFilesystem.readLandsatMetadata:614 » AccessDenied
[ERROR] ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->readLandsatMetadata:614 » AccessDenied
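To disable these tests completely on an older release instead (as proposed above), setting fs.s3a.scale.test.csvfile to a single space should make the tests which need an external file skip themselves; a minimal sketch, noting that the space matters because an empty value is treated as "use the default":
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <value> </value>
  <description>a single space: skip the tests which need a public external file</description>
</property>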
Issue Links
- causes
  - HADOOP-19146 noaa-cors-pds bucket access with global endpoint fails (Resolved)
- fixes
  - HADOOP-17784 hadoop-aws landsat-pds test bucket will be deleted after Jul 1, 2021 (Resolved)
- is related to
  - SPARK-36024 Switch the datasource example due to the depreciation of the dataset (Open)
- relates to
  - HADOOP-18194 Public dataset class for S3A integration tests (Open)
  - HADOOP-14661 S3A to support Requester Pays Buckets (Resolved)