Spark / SPARK-10524

Decision tree binary classification with ordered categorical features: incorrect centroid

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.0, 1.6.0
    • Fix Version/s: 1.6.1, 2.0.0
    • Component/s: ML, MLlib
    • Labels: None

    Description

      In DecisionTree and RandomForest binary classification with ordered categorical features, we order the categories' bins by the hard prediction (the predicted label), but we should order them by the soft prediction (the predicted class probability).
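      To make the difference concrete, here is a minimal sketch in plain Python (not Spark's Scala code). The per-category label counts and the helper names `hard_centroid`, `soft_centroid`, and `order_categories` are made up for illustration. It assumes the hard prediction is the majority label and the soft prediction is the fraction of label-1 rows in the category.

```python
# Hypothetical per-category label counts: category -> (n_label0, n_label1).
category_counts = {
    "red":   (51, 49),
    "green": (100, 0),
    "blue":  (0, 100),
    "gray":  (49, 51),
}


def hard_centroid(n0, n1):
    """Predicted class label: 1.0 if label 1 is the majority, else 0.0."""
    return 1.0 if n1 > n0 else 0.0


def soft_centroid(n0, n1):
    """Predicted probability of label 1 (fraction of label-1 rows)."""
    return n1 / (n0 + n1)


def order_categories(counts, centroid):
    """Sort category names by centroid; ties keep input order (stable sort)."""
    return sorted(counts, key=lambda cat: centroid(*counts[cat]))


hard_order = order_categories(category_counts, hard_centroid)
soft_order = order_categories(category_counts, soft_centroid)

print(hard_order)  # ['red', 'green', 'blue', 'gray']
print(soft_order)  # ['green', 'red', 'gray', 'blue']
```

      With the hard prediction, the centroid takes only the values 0.0 and 1.0, so "red" (49% label 1) and "green" (0% label 1) tie and their relative order is arbitrary. The soft prediction separates them.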

      Here are the 2 places in mllib and ml:

      The PR that fixes this should include a unit test that isolates the issue, ideally by calling binsToBestSplit directly.
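      The shape of such a test can be sketched outside Spark. The following is a standalone Python analogue, not a call to binsToBestSplit. The label counts, the choice of Gini impurity, and all function names are assumptions for illustration. It checks that prefix splits over the soft-prediction ordering reach the best gain found by exhaustive search, while prefix splits over the hard-prediction ordering fall short on this data.

```python
import math
from itertools import combinations

# Hypothetical label counts per category index: (n_label0, n_label1).
COUNTS = [(51, 49), (100, 0), (0, 100), (49, 51)]


def gini(n0, n1):
    """Gini impurity of a node holding n0 label-0 and n1 label-1 rows."""
    n = n0 + n1
    if n == 0:
        return 0.0
    p = n1 / n
    return 2.0 * p * (1.0 - p)


def gain(left):
    """Gini gain of sending the categories in `left` to the left child."""
    t0 = sum(n0 for n0, _ in COUNTS)
    t1 = sum(n1 for _, n1 in COUNTS)
    l0 = sum(COUNTS[c][0] for c in left)
    l1 = sum(COUNTS[c][1] for c in left)
    r0, r1 = t0 - l0, t1 - l1
    weighted = ((l0 + l1) * gini(l0, l1) + (r0 + r1) * gini(r0, r1)) / (t0 + t1)
    return gini(t0, t1) - weighted


def best_ordered_gain(centroid):
    """Best gain over prefix splits of the categories sorted by `centroid`."""
    order = sorted(range(len(COUNTS)), key=lambda c: centroid(*COUNTS[c]))
    return max(gain(order[:k]) for k in range(1, len(order)))


def best_exhaustive_gain():
    """Best gain over every non-trivial subset of categories."""
    cats = range(len(COUNTS))
    return max(gain(subset)
               for k in range(1, len(COUNTS))
               for subset in combinations(cats, k))


def hard(n0, n1):
    """Majority label (assumed meaning of 'hard prediction')."""
    return 1.0 if n1 > n0 else 0.0


def soft(n0, n1):
    """Fraction of label-1 rows (assumed meaning of 'soft prediction')."""
    return n1 / (n0 + n1)


def test_soft_ordering_finds_best_split():
    best = best_exhaustive_gain()
    assert math.isclose(best_ordered_gain(soft), best)
    assert best_ordered_gain(hard) < best


test_soft_ordering_finds_best_split()
```

      The counts are chosen so that two categories share the same hard prediction but have very different label-1 fractions. That tie is what lets the hard ordering miss the best split.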

          People

            Assignee: L. C. Hsieh (viirya)
            Reporter: Joseph K. Bradley (josephkb)
            Shepherd: Joseph K. Bradley
            Votes: 0
            Watchers: 2