KYLIN-4299: Issue with building real-time segment cache into HBase when using S3 as working dir


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: v3.0.0-alpha2
    • Fix Version/s: v3.1.0
    • Component/s: Real-time Streaming
    • Labels: None

    Description

      We have an issue with using S3 as the working directory for Kylin when using real-time streaming. The reason we want to do this is to keep no state in HDFS, so that the runtime environment running Kylin becomes stateless.
      We already have our HBase data on S3, but there is also persistent data under kylin.env.hdfs-working-dir (cube dictionaries), so we need that on S3 as well in order to be able to fail over to a new cluster without rebuilding all cubes; the setup is roughly as sketched below.
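
      For context, kylin.env.hdfs-working-dir is pointed at S3 in kylin.properties, roughly as below; the bucket and prefix are taken from the path in the stack trace further down and are placeholders:

      kylin.env.hdfs-working-dir=s3://kylin-XXXXX/kylin-dev/hdfs-rootdir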

      We are using the real-time streaming feature in Kylin, which persists segment caches hourly, and an MR job merges those hourly segments into HBase. In these MR jobs we get the following exception:

      Error: java.lang.IllegalArgumentException: Wrong FS: s3://kylin-XXXXX/kylin-dev/hdfs-rootdir/kylin_metadata/stream/tops_jaywalks/20191206010000_20191206020000/1/1, expected: hdfs://ip-24-0-3-243.us-west-2.compute.internal:8020
          at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:669)
          at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:214)
          at org.apache.hadoop.hdfs.DistributedFileSystem.listStatusInternal(DistributedFileSystem.java:897)
          at org.apache.hadoop.hdfs.DistributedFileSystem.access$600(DistributedFileSystem.java:114)
          at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:964)
          at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:961)
          at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
          at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:971)
          at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1551)
          at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1577)
          at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1625)
          at org.apache.hadoop.fs.FileSystem$4.<init>(FileSystem.java:1808)
          at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1807)
          at org.apache.hadoop.fs.FileSystem.listLocatedStatus(FileSystem.java:1785)
          at org.apache.hadoop.fs.FileSystem$6.<init>(FileSystem.java:1887)
          at org.apache.hadoop.fs.FileSystem.listFiles(FileSystem.java:1885)
          at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.checkPath(ColumnarFilesReader.java:46)
          at org.apache.kylin.engine.mr.streaming.ColumnarFilesReader.<init>(ColumnarFilesReader.java:41)
          at org.apache.kylin.engine.mr.streaming.DictsReader.<init>(DictsReader.java:43)
          at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.init(ColumnarSplitDictReader.java:65)
          at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictReader.<init>(ColumnarSplitDictReader.java:52)
          at org.apache.kylin.engine.mr.streaming.ColumnarSplitDictInputFormat.createRecordReader(ColumnarSplitDictInputFormat.java:32)
          at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:524)
          at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
          at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
          at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
          at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:169)
          at org.apache.kylin.engine.mr.common.MapReduceExecutable.doWork(MapReduceExecutable.java:173)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
          at org.apache.kylin.job.execution.DefaultChainedExecutable.doWork(DefaultChainedExecutable.java:71)
          at org.apache.kylin.job.execution.AbstractExecutable.execute(AbstractExecutable.java:179)
          at org.apache.kylin.job.impl.threadpool.DefaultScheduler$JobRunner.run(DefaultScheduler.java:114)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
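
      The "Wrong FS" error means a FileSystem handle obtained for the cluster's default filesystem (HDFS here) is asked to list an s3:// path. Below is a minimal sketch of the pattern that triggers it and of the usual remedy, resolving the FileSystem from the path itself; the class name and the S3 path are hypothetical, and this is not the actual Kylin patch:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class WrongFsSketch {
          public static void main(String[] args) throws Exception {
              Configuration conf = new Configuration();
              // Hypothetical segment-cache directory on S3, similar to the path in the trace above.
              Path segmentDir = new Path("s3://example-bucket/kylin/stream/cube/segment/1/1");

              // Problematic pattern: FileSystem.get(conf) returns the *default* filesystem
              // (hdfs://... on this cluster), so listing an s3:// path fails with
              // "IllegalArgumentException: Wrong FS ... expected: hdfs://...".
              FileSystem defaultFs = FileSystem.get(conf);
              // defaultFs.listFiles(segmentDir, true);   // would throw as in the stack trace

              // Usual remedy: resolve the FileSystem from the path's own scheme, which
              // yields an S3 client for s3:// paths and HDFS for hdfs:// paths.
              FileSystem pathFs = segmentDir.getFileSystem(conf);
              pathFs.listFiles(segmentDir, true);
          }
      }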
      


          People

            Assignee: hit_lacus Xiaoxiang Yu
            Reporter: ainagy Andras Istvan Nagy
            Votes: 0
            Watchers: 3
