Apache Gobblin / GOBBLIN-91

No AbstractFileSystem for scheme: null (EMR 4.7.2, Hadoop 2.7.2)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      The instructions [here](http://gobblin.readthedocs.io/en/latest/user-guide/FAQs/#how-do-i-fix-unsupportedfilesystemexception-no-abstractfilesystem-for-scheme-null) have not resolved this issue.

      We're trying to run Gobblin on AWS EMR 4.7.2, which ships Amazon's build of Hadoop 2.7.2. (Hadoop 2.6 seems to be the popular choice for Gobblin, but the only EMR release offering 2.6 is deprecated and not recommended due to stability issues.)

      Some notes:

      • We're pulling from Kafka, using a custom schema/serde framework for Avro, and publishing to S3.
      • Our Gobblin checkout is pinned at 0.7.0 because of dependency constraints related to our serde and S3.
      • We build Gobblin with Gradle against Hadoop 2.7.2. Our serde lives in a Clojure project whose build output is copied into the Gobblin lib dir, along with some AWS/S3 SDK libraries (a sketch of the build step follows this list).
      • Everything functions in standalone mode on a single EC2 instance.
      • We've tried running Gobblin in MR mode on EMR with and without the Hadoop jars in the Gobblin lib directory, and the EMR Hadoop bin and classpath dirs are being recognized.
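
      For reference, here is a rough sketch of the build step mentioned above. The Gradle property names are assumptions based on the Gobblin build docs and have not been verified against the 0.7.0 checkout:

      ```
      # Sketch only: build the 0.7.0 checkout against Hadoop 2.7.2, skipping tests.
      # The -PuseHadoop2 / -PhadoopVersion property names are assumed, not verified.
      ./gradlew clean build -PuseHadoop2 -PhadoopVersion=2.7.2 -x test

      # Our serde jar (built separately) and the AWS/S3 SDK jars are then copied
      # into the dist lib dir we deploy on the EMR master.
      cp gobblin-marshmallow-0.1.0-SNAPSHOT-standalone.jar /home/hadoop/gobblin-dist/lib/
      ```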

      Config details:
      The following script sets test environment variables, which are then referenced from the MapReduce job config:

      ```
      #!/bin/bash

      # Sets test environment variables for schwartz-gobblin

      # Set Hadoop home
      export HADOOP_BIN_DIR=/usr/bin
      export HADOOP_CLASSPATH=/usr/lib/hadoop

      # Use the following for obtaining IAM role keys
      eval $(./sts-assume-get-keys arn:aws:iam::xxxxxxx:role/xxxxxxxxx)

      # Needed by Gobblin
      export GOBBLIN_WORK_DIR=hdfs://<host:8020>/gobblin/work
      export GOBBLIN_JOB_CONFIG_DIR=/home/hadoop/gobblin-dist/jobs
      export ZOOKEEPER_CONNECT=<zookeeper1,zookeeper2,zookeeper3>

      # Test specific
      export SCHWARTZ_GOBBLIN_FINAL_DIR=s3a://<bucket>/gobblin-test
      export SCHWARTZ_GOBBLIN_FINAL_TABLE=data
      export SCHWARTZ_GOBBLIN_STATE_STORE_URI=s3a://<bucket>/gobblin-test/state-store
      export SCHWARTZ_GOBBLIN_STATE_STORE_DIR=s3a://<bucket>/gobblin-test/state-store
      export SCHWARTZ_GOBBLIN_PUBLISHER_URI=s3a://<bucket>/gobblin-test/data
      export SCHWARTZ_GOBBLIN_JOB_LENGTH=1
      ```
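
      As a sanity check on the EMR master, the effective default filesystem and these paths can be inspected with standard Hadoop CLI calls (a hypothetical check, not part of our job setup; on EMR the Hadoop config dir is typically /etc/hadoop/conf):

      ```
      # Hypothetical sanity checks on the EMR master node.
      hdfs getconf -confKey fs.defaultFS   # should print the hdfs://<host:8020> default FS
      hdfs dfs -ls "$GOBBLIN_WORK_DIR"     # confirm the Gobblin work dir resolves and exists
      echo "$HADOOP_CLASSPATH"             # note: points at the jar dir, not at /etc/hadoop/conf
      ```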

      gobblin-mapreduce.properties then looks like this:

      ```
      ###############################################################################
      # Gobblin MapReduce configurations
      ###############################################################################

      # Source parameters
      kafka.brokers=<kafka1,kafka2,kafka3>
      source.class=gobblin-marsh.MarshGobblinSource
      extract.namespace=gobblin-marsh

      # Writer and publisher parameters
      data.publisher.type=gobblin.publisher.TimePartitionedDataPublisher
      data.publisher.final.dir=${env:SCHWARTZ_GOBBLIN_FINAL_DIR}
      writer.builder.class=gobblin.writer.AvroDataWriterBuilder
      writer.partitioner.class=gobblin.CustomTimePartitioner
      writer.file.path.type=default
      writer.file.path=${env:SCHWARTZ_GOBBLIN_FINAL_TABLE}
      writer.destination.type=HDFS
      writer.output.format=AVRO
      writer.codec.type=SNAPPY
      writer.staging.dir=${env:GOBBLIN_WORK_DIR}/task-staging
      writer.output.dir=${env:GOBBLIN_WORK_DIR}/task-output
      data.publisher.replace.final.dir=false

      # File system parameters
      fs.uri=hdfs://<namenodehost>:8020
      writer.fs.uri=${fs.uri}
      state.store.fs.uri=s3a://<bucket>/gobblin-test/state-store

      # S3 parameters
      state.store.dir=s3a://<bucket>/gobblin-test/state-store
      data.publisher.fs.uri=${env:SCHWARTZ_GOBBLIN_PUBLISHER_URI}
      fs.s3a.access.key=${env:aws_access_key_id}
      fs.s3a.secret.key=${env:aws_secret_access_key}
      fs.s3a.buffer.dir=<bufferdir>

      # Gobblin execution parameters
      taskexecutor.threadpool.size=10
      taskretry.threadpool.coresize=4
      taskretry.threadpool.maxsize=2
      jobconf.dir=${env:GOBBLIN_JOB_CONFIG_DIR}

      # Where to start; how long to run each task
      bootstrap.with.offset=earliest
      extract.limit.enabled=true
      extract.limit.type=time
      extract.limit.time.limit=${env:SCHWARTZ_GOBBLIN_JOB_LENGTH}
      extract.limit.time.limit.timeunit=minutes

      # Directory where error files from the quality checkers are stored
      qualitychecker.row.err.file=${fs.uri}/gobblin/err

      # Directory where job locks are stored
      job.lock.dir=${env:GOBBLIN_WORK_DIR}/locks

      # Directory where metrics log files are stored
      metrics.log.dir=${env:GOBBLIN_WORK_DIR}/metrics

      # Interval of task state reporting in milliseconds
      task.status.reportintervalinms=5000

      # MapReduce properties
      mr.job.root.dir=${fs.uri}/gobblin/work/working
      ```
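
      Given that the failure is a "scheme: null" lookup at submit time, one workaround we are considering (untested, and assuming Gobblin 0.7.0 copies job properties into the Hadoop Configuration it submits with, which we have not verified) is to force scheme-qualified values for the relevant Hadoop keys directly in the job config:

      ```
      # Untested sketch: force scheme-qualified Hadoop defaults in the job config.
      # Assumes these properties are propagated into the submit-time Configuration.
      echo 'fs.defaultFS=hdfs://<namenodehost>:8020' >> gobblin-mapreduce.properties
      echo 'yarn.app.mapreduce.am.staging-dir=hdfs://<namenodehost>:8020/tmp/hadoop-yarn/staging' >> gobblin-mapreduce.properties
      ```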

      The job output (marshmallow is our serde) is:

      ```
      SLF4J: Class path contains multiple SLF4J bindings.
      SLF4J: Found binding in [jar:file:/home/hadoop/gobblin-dist/lib/gobblin-marshmallow-0.1.0-SNAPSHOT-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/home/hadoop/gobblin-dist/lib/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: Found binding in [jar:file:/usr/lib/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
      SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
      SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
      Exception in thread "main" gobblin.runtime.JobException: Job job_schwartzTest004S3v2_1469633575428 failed
      at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:363)
      at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:87)
      at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:64)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:110)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      ```

      The gobblin-current.log contains the following:

      ```
      ...
      2016-07-27 15:33:01 UTC INFO [main] org.apache.hadoop.yarn.client.RMProxy 92 - Connecting to ResourceManager at xxxxxxxxxxxxxxx:8032
      2016-07-27 15:33:02 UTC INFO [main] org.apache.hadoop.mapreduce.lib.input.FileInputFormat 287 - Total input paths to process : 1
      2016-07-27 15:33:02 UTC INFO [main] org.apache.hadoop.mapreduce.JobSubmitter 396 - number of splits:100
      2016-07-27 15:33:02 UTC INFO [main] org.apache.hadoop.mapreduce.JobSubmitter 479 - Submitting tokens for job: job_1469481831759_0035
      2016-07-27 15:33:02 UTC WARN [main] org.apache.hadoop.security.UserGroupInformation 1551 - PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: null
      2016-07-27 15:33:02 UTC INFO [main] org.apache.hadoop.mapreduce.JobSubmitter 441 - Cleaning up the staging area /tmp/hadoop-yarn/staging/hadoop/.staging/job_1469481831759_0035
      2016-07-27 15:33:02 UTC WARN [main] org.apache.hadoop.security.UserGroupInformation 1551 - PriviledgedActionException as:hadoop (auth:SIMPLE) cause:org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: null
      2016-07-27 15:33:02 UTC INFO [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService 98 - Stopping the TaskStateCollectorService
      2016-07-27 15:33:02 UTC WARN [TaskStateCollectorService STOPPING] gobblin.runtime.TaskStateCollectorService 119 - Output task state path hdfs://<host:8020>/gobblin/work/working/schwartzTest004S3v2/output/job_schwartzTest004S3v2_1469633575428 does not exist
      2016-07-27 15:33:02 UTC INFO [main] gobblin.runtime.mapreduce.MRJobLauncher 464 - Deleted working directory hdfs://<host:8020>/gobblin/work/working/schwartzTest004S3v2
      2016-07-27 15:33:02 UTC ERROR [main] gobblin.runtime.AbstractJobLauncher 321 - Failed to launch and run job job_schwartzTest004S3v2_1469633575428: org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: null
      org.apache.hadoop.fs.UnsupportedFileSystemException: No AbstractFileSystem for scheme: null
      at org.apache.hadoop.fs.AbstractFileSystem.createFileSystem(AbstractFileSystem.java:152)
      at org.apache.hadoop.fs.AbstractFileSystem.get(AbstractFileSystem.java:240)
      at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:332)
      at org.apache.hadoop.fs.FileContext$2.run(FileContext.java:329)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      at org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:329)
      at org.apache.hadoop.fs.FileContext.getFileContext(FileContext.java:443)
      at org.apache.hadoop.mapred.YARNRunner.createApplicationSubmissionContext(YARNRunner.java:360)
      at org.apache.hadoop.mapred.YARNRunner.submitJob(YARNRunner.java:285)
      at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:432)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1285)
      at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1282)
      at java.security.AccessController.doPrivileged(Native Method)
      at javax.security.auth.Subject.doAs(Subject.java:415)
      at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
      at org.apache.hadoop.mapreduce.Job.submit(Job.java:1282)
      at gobblin.runtime.mapreduce.MRJobLauncher.runWorkUnits(MRJobLauncher.java:198)
      at gobblin.runtime.AbstractJobLauncher.launchJob(AbstractJobLauncher.java:296)
      at gobblin.runtime.mapreduce.CliMRJobLauncher.launchJob(CliMRJobLauncher.java:87)
      at gobblin.runtime.mapreduce.CliMRJobLauncher.run(CliMRJobLauncher.java:64)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
      at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
      at gobblin.runtime.mapreduce.CliMRJobLauncher.main(CliMRJobLauncher.java:110)
      at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
      at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
      at java.lang.reflect.Method.invoke(Method.java:606)
      at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
      2016-07-27 15:33:02 UTC INFO [main] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@7eee074d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
      2016-07-27 15:33:02 UTC INFO [main] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@7eee074d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
      2016-07-27 15:33:02 UTC INFO [main] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@531f7b83[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
      2016-07-27 15:33:02 UTC INFO [main] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@531f7b83[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
      2016-07-27 15:33:02 UTC INFO [main] gobblin.util.ExecutorsUtils 125 - Attempting to shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@5103fa5c[Shutting down, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 1]
      2016-07-27 15:33:02 UTC INFO [main] gobblin.util.ExecutorsUtils 144 - Successfully shutdown ExecutorService: java.util.concurrent.ThreadPoolExecutor@5103fa5c[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 1]
      2016-07-27 15:33:02 UTC INFO [main] gobblin.runtime.app.ServiceBasedAppLauncher 162 - Shutting down the application
      2016-07-27 15:33:02 UTC WARN [Thread-6] gobblin.runtime.app.ServiceBasedAppLauncher 157 - ApplicationLauncher has already stopped
      ```
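
      Our reading of the trace (which may be off) is that YARNRunner builds a FileContext for the job staging directory at submit time; the staging path shown in the log (/tmp/hadoop-yarn/staging/hadoop/.staging/...) has no scheme, and the "scheme: null" failure suggests the submit-time Configuration does not carry a scheme-qualified default filesystem either. A quick, hypothetical check that the Hadoop config is actually visible to the Gobblin launcher:

      ```
      # Hypothetical check: confirm fs.defaultFS is defined where EMR keeps its
      # Hadoop config (typically /etc/hadoop/conf on EMR 4.x).
      grep -A1 'fs.defaultFS' /etc/hadoop/conf/core-site.xml

      # And confirm the launch classpath includes that conf dir; our current
      # HADOOP_CLASSPATH points at /usr/lib/hadoop (the jar dir), not the conf dir.
      echo "$HADOOP_CLASSPATH" | tr ':' '\n'
      ```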

      Any help is greatly appreciated.

      Github Url : https://github.com/linkedin/gobblin/issues/1162
      Github Reporter : hilljb
      Github Created At : 2016-07-27T16:26:15Z
      Github Updated At : 2017-01-12T05:01:55Z

      Comments


      maiyatanglxn wrote on 2016-12-29T06:59:29Z: I have the same problem as you. Have you solved it?

      Github Url : https://github.com/linkedin/gobblin/issues/1162#issuecomment-269589954

          People

            Assignee: Unassigned
            Reporter: Abhishek Tiwari (abti)