Uploaded image for project: 'Flink'
  1. Flink
  2. FLINK-8473

JarListHandler may fail with NPE if directory is deleted

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.4.0, 1.5.0
    • 1.4.1, 1.5.0
    • Runtime / Web Frontend
    • None

    Description

      The JarListHandler is responsible for listing all jars that have been uploaded via the webUI. Uploaded jars are stored in a temporary directory, and the handler simply iterates over the contents of this directory to create the listing.

       

      File[] list = jarDir.listFiles(new FilenameFilter() {
         @Override
         public boolean accept(File dir, String name) {
            return name.endsWith(".jar");
         }
      });
      
      // last modified ascending order
      Arrays.sort(list, (f1, f2) -> Long.compare(f2.lastModified(), f1.lastModified()));

      This code will fail with a NullPointerException if the directory was deleted (like as part of some automatic cleanup). File#listFiles returns null in this case causing Arrays.sort to throw the NPE.

      Attachments

        Issue Links

          Activity

            githubbot ASF GitHub Bot added a comment -

            GitHub user zentol opened a pull request:

            https://github.com/apache/flink/pull/5331

            FLINK-8473[webUI] Improve error behavior of JarListHandler

              1. What is the purpose of the change

            This PR modifies the `JarListHandler` to not crash when the jar storage directory has been deleted. This caused the job-submission-page in the webUI to be completely blank.

            We now log a warning (to make debugging easier) and continue on as if the directory was empty (so that the page isn't blank).

              1. Verifying this change

            Manually verified and not covered by tests, like the rest of the handler.

              1. Does this pull request potentially affect one of the following parts:
            • Dependencies (does it add or upgrade a dependency): (no)
            • The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no)
            • The serializers: (no)
            • The runtime per-record code paths (performance sensitive): (no)
            • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
            • The S3 file system connector: (no)
              1. Documentation
            • Does this pull request introduce a new feature? (no)
            • If yes, how is the feature documented? (not applicable)

            You can merge this pull request into a Git repository by running:

            $ git pull https://github.com/zentol/flink 8473

            Alternatively you can review and apply these changes as the patch at:

            https://github.com/apache/flink/pull/5331.patch

            To close this pull request, make a commit to your master/trunk branch
            with (at least) the following in the commit message:

            This closes #5331


            commit c660166ce1818b08f7cd44da5cc504ed73d20177
            Author: zentol <chesnay@...>
            Date: 2018-01-22T12:29:34Z

            FLINK-8473[webUI] Improve error behavior of JarListHandler


            githubbot ASF GitHub Bot added a comment - GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/5331 FLINK-8473 [webUI] Improve error behavior of JarListHandler What is the purpose of the change This PR modifies the `JarListHandler` to not crash when the jar storage directory has been deleted. This caused the job-submission-page in the webUI to be completely blank. We now log a warning (to make debugging easier) and continue on as if the directory was empty (so that the page isn't blank). Verifying this change Manually verified and not covered by tests, like the rest of the handler. Does this pull request potentially affect one of the following parts: Dependencies (does it add or upgrade a dependency): (no) The public API, i.e., is any changed class annotated with `@Public(Evolving)`: (no) The serializers: (no) The runtime per-record code paths (performance sensitive): (no) Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no) The S3 file system connector: (no) Documentation Does this pull request introduce a new feature? (no) If yes, how is the feature documented? (not applicable) You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 8473 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/5331.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5331 commit c660166ce1818b08f7cd44da5cc504ed73d20177 Author: zentol <chesnay@...> Date: 2018-01-22T12:29:34Z FLINK-8473 [webUI] Improve error behavior of JarListHandler
            githubbot ASF GitHub Bot added a comment -

            Github user uce commented on a diff in the pull request:

            https://github.com/apache/flink/pull/5331#discussion_r162932050

            — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java —
            @@ -145,6 +155,7 @@ public boolean accept(File dir, String name)

            { return writer.toString(); }

            catch (Exception e) {
            + LOG.warn("Failed to fetch jar list.", e);
            — End diff –

            Aren't failed completions logged anyways?

            githubbot ASF GitHub Bot added a comment - Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/5331#discussion_r162932050 — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java — @@ -145,6 +155,7 @@ public boolean accept(File dir, String name) { return writer.toString(); } catch (Exception e) { + LOG.warn("Failed to fetch jar list.", e); — End diff – Aren't failed completions logged anyways?
            githubbot ASF GitHub Bot added a comment -

            Github user zentol commented on a diff in the pull request:

            https://github.com/apache/flink/pull/5331#discussion_r162934645

            — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java —
            @@ -145,6 +155,7 @@ public boolean accept(File dir, String name)

            { return writer.toString(); }

            catch (Exception e) {
            + LOG.warn("Failed to fetch jar list.", e);
            — End diff –

            not in legacy handlers

            githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/5331#discussion_r162934645 — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java — @@ -145,6 +155,7 @@ public boolean accept(File dir, String name) { return writer.toString(); } catch (Exception e) { + LOG.warn("Failed to fetch jar list.", e); — End diff – not in legacy handlers
            githubbot ASF GitHub Bot added a comment -

            Github user zentol commented on a diff in the pull request:

            https://github.com/apache/flink/pull/5331#discussion_r162934893

            — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java —
            @@ -145,6 +155,7 @@ public boolean accept(File dir, String name)

            { return writer.toString(); }

            catch (Exception e) {
            + LOG.warn("Failed to fetch jar list.", e);
            — End diff –

            scratch that, should be logged, I'll double check

            githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/5331#discussion_r162934893 — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java — @@ -145,6 +155,7 @@ public boolean accept(File dir, String name) { return writer.toString(); } catch (Exception e) { + LOG.warn("Failed to fetch jar list.", e); — End diff – scratch that, should be logged, I'll double check
            githubbot ASF GitHub Bot added a comment -

            Github user uce commented on a diff in the pull request:

            https://github.com/apache/flink/pull/5331#discussion_r162935682

            — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java —
            @@ -77,6 +82,11 @@ public boolean accept(File dir, String name) {
            }
            });

            + if (list == null) {
            + LOG.warn("Jar storage directory {} has been deleted.", jarDir.getAbsolutePath());
            — End diff –

            Somehow this comment got lost before: Should we make the error message more explicit and say that it was `deleted externally` (e.g. not by Flink)?

            githubbot ASF GitHub Bot added a comment - Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/5331#discussion_r162935682 — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java — @@ -77,6 +82,11 @@ public boolean accept(File dir, String name) { } }); + if (list == null) { + LOG.warn("Jar storage directory {} has been deleted.", jarDir.getAbsolutePath()); — End diff – Somehow this comment got lost before: Should we make the error message more explicit and say that it was `deleted externally` (e.g. not by Flink)?
            githubbot ASF GitHub Bot added a comment -

            Github user zentol commented on a diff in the pull request:

            https://github.com/apache/flink/pull/5331#discussion_r162936018

            — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java —
            @@ -77,6 +82,11 @@ public boolean accept(File dir, String name) {
            }
            });

            + if (list == null) {
            + LOG.warn("Jar storage directory {} has been deleted.", jarDir.getAbsolutePath());
            — End diff –

            yes thats a good idea

            githubbot ASF GitHub Bot added a comment - Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/5331#discussion_r162936018 — Diff: flink-runtime-web/src/main/java/org/apache/flink/runtime/webmonitor/handlers/JarListHandler.java — @@ -77,6 +82,11 @@ public boolean accept(File dir, String name) { } }); + if (list == null) { + LOG.warn("Jar storage directory {} has been deleted.", jarDir.getAbsolutePath()); — End diff – yes thats a good idea
            githubbot ASF GitHub Bot added a comment -

            Github user zentol commented on the issue:

            https://github.com/apache/flink/pull/5331

            @uce Could you take another look?

            githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/5331 @uce Could you take another look?
            githubbot ASF GitHub Bot added a comment -

            Github user uce commented on the issue:

            https://github.com/apache/flink/pull/5331

            đź‘Ť to merge.

            githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/5331 đź‘Ť to merge.
            githubbot ASF GitHub Bot added a comment -

            Github user StephanEwen commented on the issue:

            https://github.com/apache/flink/pull/5331

            Will this allow new upload of jars? Would we need an `mkdirs` to re-create that directory?

            githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/5331 Will this allow new upload of jars? Would we need an `mkdirs` to re-create that directory?
            githubbot ASF GitHub Bot added a comment -

            Github user zentol commented on the issue:

            https://github.com/apache/flink/pull/5331

            I haven't tried that out yet. I couldn't quite figure out where the upload happens (maybe completely handled by netty?).

            githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/5331 I haven't tried that out yet. I couldn't quite figure out where the upload happens (maybe completely handled by netty?).
            githubbot ASF GitHub Bot added a comment -

            Github user uce commented on the issue:

            https://github.com/apache/flink/pull/5331

            I just tried it, re-uploading after deleting the directory *does not work*. Good catch Stephan.

            @zentol: I found `HttpRequestHandler` which handles the uploads. The handler assumes that the directory exists and the logic for creating the temp directory is in `WebRuntimeMonitor`. We should consolidate this in a single place (if no one else needs this directory).

            githubbot ASF GitHub Bot added a comment - Github user uce commented on the issue: https://github.com/apache/flink/pull/5331 I just tried it, re-uploading after deleting the directory * does not work *. Good catch Stephan. @zentol: I found `HttpRequestHandler` which handles the uploads. The handler assumes that the directory exists and the logic for creating the temp directory is in `WebRuntimeMonitor`. We should consolidate this in a single place (if no one else needs this directory).
            githubbot ASF GitHub Bot added a comment -

            Github user zentol commented on the issue:

            https://github.com/apache/flink/pull/5331

            I've updated the PR. All handlers that access the jar storage directory now check whether it exists and regenerate it if necessary with appropriate logging messages. The logging and directory creation is done via static methods in the WebRuntimeMonitor.

            githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/5331 I've updated the PR. All handlers that access the jar storage directory now check whether it exists and regenerate it if necessary with appropriate logging messages. The logging and directory creation is done via static methods in the WebRuntimeMonitor.
            githubbot ASF GitHub Bot added a comment -

            Github user StephanEwen commented on the issue:

            https://github.com/apache/flink/pull/5331

            Thanks, looks good!

            Please merge into `master` and `release-1.4`...

            githubbot ASF GitHub Bot added a comment - Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/5331 Thanks, looks good! Please merge into `master` and `release-1.4`...
            githubbot ASF GitHub Bot added a comment -

            Github user zentol commented on the issue:

            https://github.com/apache/flink/pull/5331

            merging.

            githubbot ASF GitHub Bot added a comment - Github user zentol commented on the issue: https://github.com/apache/flink/pull/5331 merging.

            master: 8fdea6093a55c33732ae869b82552371b8142c2a

            chesnay Chesnay Schepler added a comment - master: 8fdea6093a55c33732ae869b82552371b8142c2a
            githubbot ASF GitHub Bot added a comment -

            Github user asfgit closed the pull request at:

            https://github.com/apache/flink/pull/5331

            githubbot ASF GitHub Bot added a comment - Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/5331

            1.4: 20be204b96edd5c92683013a4c5af9ea4096acca

            chesnay Chesnay Schepler added a comment - 1.4: 20be204b96edd5c92683013a4c5af9ea4096acca

            People

              chesnay Chesnay Schepler
              chesnay Chesnay Schepler
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: