  Spark > SPARK-6116 DataFrame API improvement umbrella ticket (Spark 1.5) > SPARK-8573

For PySpark's DataFrame API, we need to throw exceptions when users try to use and/or/not


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 1.3.0
    • Fix Version/s: 1.4.1, 1.5.0
    • Component/s: PySpark, SQL
    • Labels: None

    Description

      In PySpark's DataFrame API, we have

      # `and`, `or`, `not` cannot be overloaded in Python,
      # so use bitwise operators as boolean operators
      __and__ = _bin_op('and')
      __or__ = _bin_op('or')
      __invert__ = _func_op('not')
      __rand__ = _bin_op("and")
      __ror__ = _bin_op("or")
      

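      Since the keywords themselves cannot be overloaded, the supported way to combine Column conditions is through these bitwise operators. A minimal sketch of the intended usage (the column name comes from the example below; the ranges are just illustrative):

      df = sqlContext.range(1, 10)
      # Wrap each comparison in parentheses: `&`, `|` and `~` bind more
      # tightly than the comparison operators.
      df.filter((df.id > 5) & (df.id < 10))
      df.filter((df.id > 5) | (df.id < 3))
      df.filter(~(df.id > 5))
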
      Right now, users can still use the Python keywords and, or, and not with Column expressions, which can cause very confusing behavior: the expression silently evaluates to something other than the intended combined condition. We need to throw an error when users try to use them and tell them the right way to write the expression (one possible approach is sketched after the example below).

      For example,

      df = sqlContext.range(1, 10)
      df.id > 5 or df.id < 10
      Out[30]: Column<(id > 5)>
      df.id > 5 and df.id < 10
      Out[31]: Column<(id < 10)>
      
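      In both cases Python's short-circuit rules kick in: a Column object is always truthy, so or returns its left operand and and returns its right operand, and the user silently gets a single, unintended condition. One way to surface an error instead (a minimal sketch using a stand-in class, not necessarily the change that resolved this ticket) is to make truth-value conversion of a Column raise:

      class Column(object):
          """Simplified stand-in for pyspark.sql.Column, for illustration only."""

          def __nonzero__(self):  # Python 2 truth-value hook
              raise ValueError("Cannot convert column into bool: please use '&' for 'and', "
                               "'|' for 'or', '~' for 'not' when building DataFrame boolean "
                               "expressions.")

          __bool__ = __nonzero__  # Python 3 calls __bool__

      With this in place, an expression like Column() and Column() raises immediately and points the user at &, |, and ~ instead of silently returning the right-hand operand.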

    People

      Assignee: Davies Liu (davies)
      Reporter: Yin Huai (yhuai)
      Votes: 0
      Watchers: 3

    Dates

      Created:
      Updated:
      Resolved: