Spark / SPARK-3164

Store DecisionTree Split.categories as Set

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0
    • Component/s: ML
    • Labels: None

    Description

      Improvement: computation

      For categorical features with many categories, it could be more efficient to store Split.categories as a Set rather than a List, as it is currently stored. A Set should scale better, since a membership lookup takes O(log n) time or better instead of the linear scan a List requires. Benchmarks would be needed to confirm that Sets do not add too much overhead compared with Lists.
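
      The trade-off described above can be sketched in a few lines. This is only an illustration, not Spark's implementation: the classes ListSplit and SetSplit and the method goes_left are hypothetical names, and the sketch is written in Python rather than Spark's Scala. Note also that Python's frozenset is hash-based, so lookups take expected constant time rather than the log n mentioned in the issue.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ListSplit:
    """Hypothetical split that keeps its left-branch categories in a list."""
    feature: int
    categories: List[float]

    def goes_left(self, value: float) -> bool:
        # Linear scan: cost grows with the number of categories.
        return value in self.categories


@dataclass(frozen=True)
class SetSplit:
    """Hypothetical split that keeps its left-branch categories in a set."""
    feature: int
    categories: FrozenSet[float]

    def goes_left(self, value: float) -> bool:
        # Hash lookup: expected constant time regardless of set size.
        return value in self.categories


# Assumed example data: 500 categories (the even ids below 1000) go left.
categories = [float(c) for c in range(0, 1000, 2)]
list_split = ListSplit(feature=0, categories=categories)
set_split = SetSplit(feature=0, categories=frozenset(categories))
```

      Both variants give the same routing decision for every value; only the lookup cost differs. For a split with just a handful of categories, the List scan may well be as fast as the Set lookup, which is why the issue asks for measurements before switching.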

            People

              Assignee: Joseph K. Bradley (josephkb)
              Reporter: Joseph K. Bradley (josephkb)
              Votes: 0
              Watchers: 4