Description
A. Current state
1. The datanode host, volume and bucket must be defined in the defaultFS (e.g. o3://datanode:9864/test/bucket1)
2. The root of the file system points to the bucket (e.g. 'dfs -ls /' lists all the keys in bucket1)
It works very well, but there are some limitations.
B. Problem one
The current code doesn't support fully qualified locations. For example, 'dfs -ls o3://datanode:9864/test/bucket1/dir1' does not work.
C. Problem two
I tried to fix the previous problem, but it's not trivial. The biggest obstacle is the Path.makeQualified call, which transforms an unqualified URL into a qualified one. It is part of Path.java, so it is common to all the Hadoop file systems.
In the current implementation it qualifies a URL by keeping the scheme (e.g. o3://) and authority (e.g. datanode:9864) from the defaultFS and using the relative path as the end of the qualified URL. For example:
makeQualified(defaultUri=o3://datanode:9864/test/bucket1, path=dir1/file) returns o3://datanode:9864/dir1/file, which is obviously wrong (the correct result would be o3://datanode:9864/test/bucket1/dir1/file). I tried a workaround with a custom makeQualified in the Ozone code. It worked from the command line but not with Spark, which uses the Hadoop API and the original makeQualified.
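The behaviour described above can be sketched with plain java.net.URI. This is only a simplified model of the qualification step, not the actual Path.makeQualified code: the class name QualifySketch, the qualify helper and the "/" working directory are assumptions made for illustration. It shows that the path part of the default URI (/test/bucket1) is dropped, because only the scheme and the authority are taken from it.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Simplified model of the qualification step (hypothetical helper, not Hadoop code).
final class QualifySketch {

    // Keeps scheme and authority from defaultUri; the path comes only from
    // workingDir + path, so any path inside defaultUri is ignored.
    static URI qualify(URI defaultUri, String workingDir, String path) {
        String absolute = path.startsWith("/")
                ? path
                : (workingDir.endsWith("/") ? workingDir : workingDir + "/") + path;
        try {
            return new URI(defaultUri.getScheme(), defaultUri.getAuthority(),
                    absolute, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI defaultUri = URI.create("o3://datanode:9864/test/bucket1");
        // Prints o3://datanode:9864/dir1/file (volume and bucket are lost).
        System.out.println(qualify(defaultUri, "/", "dir1/file"));
    }
}
```

Any caller that goes through the common qualification logic (as Spark does) ends up with a URL of this shape, which is why a file-system-specific override is not enough.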
D. Solution
We should support makeQualified calls, so we can use any path in the defaultFS.
I propose to use a simplified URL scheme: o3://bucket.volume/
This is similar to the s3a format where the pattern is s3a://bucket.region/
We don't need to set the hostname of the datanode (or ksm in case of service discovery) in the URL, but it would be configurable with additional Hadoop configuration values such as fs.o3.bucket.bucketname.volumename.address=http://datanode:9864 (this is how s3a works today, as far as I know).
We also need to define restrictions for the volume names (in our case they must no longer contain a dot).
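A minimal sketch of how the proposed format could be resolved, assuming the authority is split at the last dot (so the volume name cannot contain a dot) and the address is looked up under a per-bucket key. The class O3AddressSketch, its methods and the plain Map standing in for the Hadoop configuration are all hypothetical; nothing here is existing Ozone code.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the proposed o3://bucket.volume/ format.
final class O3AddressSketch {

    // Returns {bucket, volume}; splits at the last dot, so the volume
    // name must not contain a dot (assumption of this sketch).
    static String[] bucketAndVolume(URI uri) {
        String authority = uri.getAuthority();
        int dot = authority == null ? -1 : authority.lastIndexOf('.');
        if (dot <= 0 || dot == authority.length() - 1) {
            throw new IllegalArgumentException("Expected o3://bucket.volume/: " + uri);
        }
        return new String[] {authority.substring(0, dot), authority.substring(dot + 1)};
    }

    // Builds the per-bucket configuration key proposed in the text.
    static String addressKey(URI uri) {
        String[] bv = bucketAndVolume(uri);
        return "fs.o3.bucket." + bv[0] + "." + bv[1] + ".address";
    }

    // Looks up the datanode address, falling back to a default if unset.
    static String resolveAddress(URI uri, Map<String, String> conf, String fallback) {
        return conf.getOrDefault(addressKey(uri), fallback);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.o3.bucket.bucket1.test.address", "http://datanode:9864");
        URI uri = URI.create("o3://bucket1.test/dir1/file");
        System.out.println(addressKey(uri));
        System.out.println(resolveAddress(uri, conf, null));
    }
}
```

Because the bucket and volume live in the authority, they survive any qualification that keeps the scheme and authority of the defaultFS.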
PS: some Spark output
2018-02-03 18:43:04 WARN Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2018-02-03 18:43:05 INFO Client:54 - Uploading resource file:/tmp/spark-03119be0-9c3d-440c-8e9f-48c692412ab5/__spark_libs__2440448967844904444.zip -> o3://datanode:9864/user/hadoop/.sparkStaging/application_1517611085375_0001/_spark_libs_2440448967844904444.zip
My defaultFS was o3://datanode:9864/test/bucket1, but Spark qualified the name of the home directory without the /test/bucket1 prefix.