  1. Apache NiFi
  2. NIFI-13077

On-demand Extension Provider


Details

    • Epic
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • Core Framework
    • None
    • On-demand Extension Provider
    • To Do

    Description

      We currently have the concept of ExternalResourceProvider, with two implementations (HDFS and NiFi Registry) that can be configured to list and download all NARs made available in those locations. When configured, these implementations start with NiFi and download ALL of the available NARs; a background thread then checks every five minutes for new NARs to download.

      The proposal here is a similar concept focused on extensions / components. Instead of running a background thread and downloading all of the components, it would plug into the ExtensionBuilder. When a component cannot be instantiated with locally available components (when loading a flow definition), then, instead of just creating a ghost component, the Extension Providers would be queried with the specific coordinates. If a provider makes the component available, the NAR would be downloaded, along with its required dependencies if the NAR depends on another NAR.
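
      The lookup described above could be sketched roughly as follows. This is only an illustration under assumptions: the Coordinates record, the InMemoryProvider stand-in, the resolve method and the parentOf function (which answers "which NAR does this NAR depend on?") are hypothetical names, not existing NiFi classes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

final class OnDemandResolutionSketch {

    // Hypothetical bundle coordinates: group, artifact id, version.
    record Coordinates(String group, String artifact, String version) { }

    // The two methods named in the proposal.
    interface ExtensionProvider {
        boolean isAvailableExtension(Coordinates coordinates);
        void downloadExtension(Coordinates coordinates);
    }

    // In-memory stand-in for a real provider; it only records what was "downloaded".
    static final class InMemoryProvider implements ExtensionProvider {
        private final Set<Coordinates> remote;
        final List<Coordinates> downloaded = new ArrayList<>();

        InMemoryProvider(Coordinates... remote) { this.remote = Set.of(remote); }

        @Override public boolean isAvailableExtension(Coordinates c) { return remote.contains(c); }
        @Override public void downloadExtension(Coordinates c) { downloaded.add(c); }
    }

    // Returns the NARs that were downloaded, dependencies first.
    // An empty list means nothing was downloaded and the caller keeps the ghost component.
    static List<Coordinates> resolve(Coordinates requested,
                                     Predicate<Coordinates> isLocal,
                                     List<ExtensionProvider> providers,
                                     Function<Coordinates, Optional<Coordinates>> parentOf) {
        // Walk the dependency chain until we reach a NAR that is already local.
        // push() adds to the head, so the deepest dependency ends up first.
        Deque<Coordinates> chain = new ArrayDeque<>();
        Optional<Coordinates> current = Optional.of(requested);
        while (current.isPresent() && !isLocal.test(current.get()) && !chain.contains(current.get())) {
            chain.push(current.get());
            current = parentOf.apply(current.get());
        }

        // Find a provider for every missing NAR before downloading anything.
        List<ExtensionProvider> chosen = new ArrayList<>();
        for (Coordinates c : chain) {
            Optional<ExtensionProvider> provider =
                    providers.stream().filter(p -> p.isAvailableExtension(c)).findFirst();
            if (provider.isEmpty()) {
                return List.of();
            }
            chosen.add(provider.get());
        }

        List<Coordinates> ordered = List.copyOf(chain);
        for (int i = 0; i < ordered.size(); i++) {
            chosen.get(i).downloadExtension(ordered.get(i));
        }
        return ordered;
    }
}
```

      Checking every NAR in the chain before downloading is a design choice of this sketch, so that a missing dependency does not leave a half-resolved bundle on disk.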

      This approach already exists in the Kafka Connect NiFi plugin with the class ExtensionClientDefinition. By adopting this approach in NiFi, it would be much easier to ship a much smaller version of NiFi and have NiFi download the required components based on the flows that are being instantiated / deployed.

      Downloading the NAR would not be a blocking operation: we would still create a ghost component, but once the NAR(s) have been downloaded and the components loaded, the flows would be fully operational.
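
      One possible shape for this non-blocking behaviour, as a sketch only: the component starts as a ghost and flips to loaded (or failed) when the background work completes. ComponentHandle, State and the downloadAndLoad Runnable are hypothetical names used for illustration; how the framework would actually swap a ghost for the real component is not covered here.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicReference;

final class NonBlockingLoadSketch {

    enum State { GHOST, LOADED, FAILED }

    // Hypothetical handle returned to the flow right away, before any download finishes.
    static final class ComponentHandle {
        private final AtomicReference<State> state = new AtomicReference<>(State.GHOST);
        private final CompletableFuture<State> completion;

        ComponentHandle(Runnable downloadAndLoad, Executor executor) {
            // Run the download + NAR loading off the calling thread, then record the outcome.
            this.completion = CompletableFuture.runAsync(downloadAndLoad, executor)
                    .handle((ignored, error) -> {
                        State next = (error == null) ? State.LOADED : State.FAILED;
                        state.set(next);
                        return next;
                    });
        }

        State state() { return state.get(); }

        // Completes once the background work has finished, successfully or not.
        CompletableFuture<State> completion() { return completion; }
    }
}
```

      The intermediate GHOST state is also where a "still downloading" indication could be surfaced to the user.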

      It might be possible to show something similar to what is done for the Python extensions, where we indicate that the component is still in the process of downloading third-party dependencies.

      While this is a great opportunity to reduce the size of the NiFi binary (and associated container image), it would not be great from a user perspective when designing flows because all of the NARs removed from the default image would no longer be visible in the list of available components when adding, for example, a processor to the canvas.

      Longer term, we could imagine the extension providers also implementing a listing API, so that the list of available components would show both the components available locally and the components available through the extension providers. The listing could add another column to indicate the source of each component.

      This information is exposed for the Extension Bundles in the NiFi Registry. We also have the NiFi API version that was used to build the components, so we could use it to list only the components that should be compatible from an API standpoint (same major version, with a lower or equal API version).
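
      The compatibility rule in parentheses could be expressed as below. This is a sketch under assumptions: versions are taken to be plain major.minor.patch strings, any qualifier such as -SNAPSHOT is ignored, and "lower or equal" is read as relative to the API version of the running NiFi. The class and method names are hypothetical.

```java
import java.util.Arrays;

final class ApiCompatibilitySketch {

    // Parses "2.1.0" or "2.1.0-SNAPSHOT" into {2, 1, 0}; missing parts default to 0.
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] numbers = new int[3];
        for (int i = 0; i < numbers.length && i < parts.length; i++) {
            numbers[i] = Integer.parseInt(parts[i]);
        }
        return numbers;
    }

    // True when the component was built against the same major API version
    // and an API version lower than or equal to the running one.
    static boolean isCompatible(String runtimeApiVersion, String builtAgainstApiVersion) {
        int[] runtime = parse(runtimeApiVersion);
        int[] built = parse(builtAgainstApiVersion);
        if (runtime[0] != built[0]) {
            return false;
        }
        // Lexicographic comparison of {major, minor, patch}.
        return Arrays.compare(built, runtime) <= 0;
    }
}
```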

      The immediate goal though would be to introduce the concept of ExtensionProvider with the following APIs:

      boolean isAvailableExtension(Coordinates)
      void downloadExtension(Coordinates)
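
      Written out as Java, the two methods could look like the sketch below. The Coordinates record (group, artifact, version) and the InMemoryProvider stand-in are assumptions added for illustration; only the two method names come from the proposal.

```java
import java.util.HashSet;
import java.util.Set;

final class ExtensionProviderSketch {

    // Hypothetical coordinates of a NAR bundle.
    record Coordinates(String group, String artifact, String version) { }

    // The proposed API.
    interface ExtensionProvider {
        boolean isAvailableExtension(Coordinates coordinates);
        void downloadExtension(Coordinates coordinates);
    }

    // In-memory stand-in for a real backend; no network or file access happens here.
    static final class InMemoryProvider implements ExtensionProvider {
        private final Set<Coordinates> remote;
        private final Set<Coordinates> downloaded = new HashSet<>();

        InMemoryProvider(Set<Coordinates> remote) {
            this.remote = Set.copyOf(remote);
        }

        @Override
        public boolean isAvailableExtension(Coordinates coordinates) {
            return remote.contains(coordinates);
        }

        @Override
        public void downloadExtension(Coordinates coordinates) {
            // Failing with an unchecked exception is a choice of this sketch.
            if (!remote.contains(coordinates)) {
                throw new IllegalStateException("Not available: " + coordinates);
            }
            downloaded.add(coordinates);
        }

        Set<Coordinates> downloaded() {
            return Set.copyOf(downloaded);
        }
    }
}
```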
      

      Longer term we could also consider something like:

      List<Extensions> listExtensions()

      But we would need to figure out how a NAR can provide information about the components it contains. The NiFi Registry provides this information, but that would not be the case for a Maven-based implementation, for example.

      In nifi.properties we would have something looking like:

      nifi.nar.extension.provider.<identifier>.<property-name>

      We would then loop through all of the configured providers to find the appropriate NAR to download, based on the coordinates provided in the flow definition that is being instantiated (from flow.json.gz, from an uploaded JSON flow definition, or when checking out a flow from a registry client).
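
      Reading the proposed property scheme could look like the sketch below, which groups properties by provider identifier. The prefix comes from the pattern above; the class name, method name and the example property names (implementation, url) used in the usage note are hypothetical.

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

final class ProviderPropertiesSketch {

    static final String PREFIX = "nifi.nar.extension.provider.";

    // Groups "nifi.nar.extension.provider.<identifier>.<property-name>" entries
    // into identifier -> (property-name -> value). Other keys are ignored.
    static Map<String, Map<String, String>> groupByProvider(Properties properties) {
        Map<String, Map<String, String>> grouped = new TreeMap<>();
        for (String key : properties.stringPropertyNames()) {
            if (!key.startsWith(PREFIX)) {
                continue;
            }
            String remainder = key.substring(PREFIX.length());
            int dot = remainder.indexOf('.');
            // Skip keys with an empty identifier or an empty property name.
            if (dot <= 0 || dot == remainder.length() - 1) {
                continue;
            }
            String identifier = remainder.substring(0, dot);
            String propertyName = remainder.substring(dot + 1);
            grouped.computeIfAbsent(identifier, id -> new TreeMap<>())
                   .put(propertyName, properties.getProperty(key));
        }
        return grouped;
    }
}
```

      For example, nifi.nar.extension.provider.registry.url=http://localhost:18080 would end up under the identifier registry with the property name url. The TreeMap makes the iteration order alphabetical by identifier, which is an assumption of this sketch; the order in which providers should actually be queried is left open.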

            People

              Assignee: Unassigned
              Reporter: Pierre Villard (pvillard)