Details
Type: Improvement
Status: Open
Priority: Minor
Resolution: Unresolved
Description
While testing the security plugin at my company, I noticed that running a "select * from table" and reading the table's path on HDFS directly produce the same plan, except that the raw path read exposes only the path URI, and this case is not handled by the PrivilegesBuilder class. I wrote an internal patch for this module at my company to address the issue by adding the following case to the buildQuery function:
case l: LogicalRelation =>
  if (l.catalogTable.nonEmpty) {
    mergeProjection(l.catalogTable.get)
  } else if (l.relation.isInstanceOf[HadoopFsRelation]) {
    for (path <- l.relation.asInstanceOf[HadoopFsRelation].location.rootPaths) {
      privilegeObjects += new SparkPrivilegeObject(
        SparkPrivilegeObjectType.DFS_URI, path.toString, path.toString)
    }
  }
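For illustration, here is a minimal spark-shell sketch of the difference this branch is meant to cover, using a hypothetical table db.events and a hypothetical warehouse path (the names are only examples): the table-based read carries a catalogTable, while the path-based read exposes only the rootPaths of its HadoopFsRelation.

import org.apache.spark.sql.execution.datasources.LogicalRelation

// hypothetical table and path, assuming a running spark-shell session
val fromTable = spark.sql("select * from db.events")
val fromPath  = spark.read.parquet("hdfs://namenode/warehouse/db.db/events")

val tableRel = fromTable.queryExecution.optimizedPlan.collect { case l: LogicalRelation => l }.head
val pathRel  = fromPath.queryExecution.optimizedPlan.collect { case l: LogicalRelation => l }.head

println(tableRel.catalogTable.isDefined) // true: the table identity is available for authorization
println(pathRel.catalogTable.isDefined)  // false: only the location's rootPaths identify the data read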
The patch also adds the following case to the buildCommand function:
case i: InsertIntoHadoopFsRelationCommand =>
  i.catalogTable foreach { t =>
    addTableOrViewLevelObjs(
      t.identifier,
      outputObjs,
      i.partitionColumns.map(_.name),
      t.schema.fieldNames)
  }
  if (i.catalogTable.isEmpty) {
    outputObjs += new SparkPrivilegeObject(
      SparkPrivilegeObjectType.DFS_URI, i.outputPath.toString, i.outputPath.toString)
  }
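For the write side, a short sketch of the case this second branch covers, again with a hypothetical staging path: writing a DataFrame straight to a filesystem location involves no metastore table, so the planned InsertIntoHadoopFsRelationCommand has an empty catalogTable and only its outputPath identifies what is being written.

// a minimal spark-shell sketch, assuming a hypothetical staging path
val df = spark.range(10).toDF("id")

// this write is planned as an InsertIntoHadoopFsRelationCommand with catalogTable == None,
// so under the patch above it would conceptually produce:
//   new SparkPrivilegeObject(SparkPrivilegeObjectType.DFS_URI,
//     "hdfs://namenode/tmp/staging/ids", "hdfs://namenode/tmp/staging/ids")
df.write.mode("overwrite").parquet("hdfs://namenode/tmp/staging/ids")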
I understand that this project proposes Hive authorization rather than HDFS authorization, but even so, people in the Spark ecosystem often write temporary files without metastore tables, and those reads and writes should also pass through authorization.
I am creating this issue to ask the maintainers whether this is relevant and within the scope of the Security module, so that I can provide a patch for it.