Details
Type: Task
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.3.3, 3.5.0, 3.3.4
None
None
Environment: Spark 3.3.3 deployed on K8s
Description
We are trying to run a Spark application whose dependency files and main PySpark script are both hosted on a secured web server.
We are looking for a way to pass the dependencies and the PySpark script to spark-submit from that web server when it requires credentials.
Deploying the Spark application from the web server to the K8s cluster without a username and password worked. With a username and password embedded in the URL, it fails with: "Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:password@domain.com/application/pysparkjob.py"
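For context, a `username:password@` prefix is only the userinfo part of a URL; an HTTP client has to turn it into an `Authorization: Basic ...` header itself, and a 401 is what a server returns when that header is missing or rejected. The sketch below is illustrative only and is not Spark code: `split_credentials` and `basic_auth_header` are hypothetical helper names, and whether Spark's downloader performs this conversion for `--jars`, `--files`, `--py-files` or the main script is an assumption to verify, not something this report confirms.

```python
import base64
from urllib.parse import unquote, urlsplit


def split_credentials(url):
    """Return (url_without_userinfo, username, password) for the given URL.

    Hypothetical helper: username/password are None when the URL has no userinfo.
    """
    parts = urlsplit(url)
    username = unquote(parts.username) if parts.username is not None else None
    password = unquote(parts.password) if parts.password is not None else None
    # Everything after the last "@" in the authority is host[:port].
    host = parts.netloc.rpartition("@")[2]
    return parts._replace(netloc=host).geturl(), username, password


def basic_auth_header(username, password):
    """Build the HTTP Basic header: base64("username:password")."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    url = "https://username:password@domain.com/application/pysparkjob.py"
    clean_url, user, pw = split_credentials(url)
    print(clean_url)
    print(basic_auth_header(user, pw))
```

A client that sends the request to the original URL without adding such a header is effectively making an unauthenticated request, which would be consistent with the 401 reported here.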
Working options on spark-submit:
spark-submit ......
--repositories https://username:password@domain.com/repo1/repo \
--jars https://domain.com/jars/runtime.jar \
--files https://domain.com/files/query.sql \
--py-files https://domain.com/pythonlib/pythonlib.zip \
https://domain.com/app1/pysparkapp.py
Note: only the --repositories option works with a username and password in the URL.
spark-submit using HTTPS URLs with username/password, which does not work:
spark-submit ......
--jars https://username:password@domain.com/jars/runtime.jar \
--files https://username:password@domain.com/files/query.sql \
--py-files https://username:password@domain.com/pythonlib/pythonlib.zip \
https://username:password@domain.com/app1/pysparkapp.py
Error:
25/01/23 09:19:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 401 for URL: https://username:password@domain.com/repository/spark-artifacts/pysparkdemo/1.0/pysparkdemo-1.0.tgz
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:2000)
at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1589)
at java.base/sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:224)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:809)
at org.apache.spark.util.DependencyUtils$.downloadFile(DependencyUtils.scala:264)
at org.apache.spark.util.DependencyUtils$.$anonfun$downloadFileList$2(DependencyUtils.scala:233)
at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38)
at scala.collection.TraversableLike.map(TraversableLike.scala:286)
at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
at scala.collection.AbstractTraversable.map(Traversable.scala:108)
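One possible workaround, sketched here under stated assumptions rather than as a confirmed fix, is to download each protected artifact on the submitting machine with an explicit Basic auth header and then hand spark-submit the local copies. `stage_dependency` is a hypothetical helper, and its `urlopen` parameter exists only so the download step can be swapped out (for example in a test without network access).

```python
import base64
import os
import shutil
import urllib.request
from urllib.parse import urlsplit


def stage_dependency(url, username, password, dest_dir, urlopen=urllib.request.urlopen):
    """Download `url` into `dest_dir` using HTTP Basic auth; return the local path.

    `url` must not contain embedded credentials; they are passed separately.
    """
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})

    os.makedirs(dest_dir, exist_ok=True)
    # Keep the remote file name, e.g. ".../files/query.sql" -> "query.sql".
    local_path = os.path.join(dest_dir, os.path.basename(urlsplit(url).path))

    # Stream the response body to disk instead of holding it in memory.
    with urlopen(request) as response, open(local_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return local_path


if __name__ == "__main__":
    # Placeholder host and credentials taken from the report; replace before use.
    path = stage_dependency(
        "https://domain.com/files/query.sql", "username", "password", "/tmp/spark-staging"
    )
    print(path)
```

The returned paths could then be passed to `--jars`, `--files`, `--py-files` and as the main script. Whether local paths are usable depends on the deploy mode and cluster configuration; on Kubernetes in cluster mode, files from the submission client generally need an upload location the pods can reach, so this should be checked against the Spark on Kubernetes documentation for the version in use.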