
[SPARK-6356][SQL] Support the ROLLUP/CUBE/GROUPING SETS in SQLContext #5080

Closed

Conversation

watermen
Contributor

Support for the syntax below (also supported by HiveContext):

GROUP BY expression list WITH ROLLUP
GROUP BY expression list WITH CUBE
GROUP BY expression list GROUPING SETS(expression list2)

#5045
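These clauses expand a GROUP BY column list into a collection of grouping sets, with the missing columns rendered as NULL in the output rows. A small Python sketch (illustrative only, not Spark code; the data and helper names are made up) of that expansion:

```python
from itertools import combinations
from collections import defaultdict

def rollup_sets(cols):
    # ROLLUP(a, b) -> grouping sets [(a, b), (a,), ()]
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    # CUBE(a, b) -> every subset of the columns
    return [subset for r in range(len(cols), -1, -1)
            for subset in combinations(cols, r)]

def aggregate(rows, grouping_sets, value_key):
    # For each grouping set, sum value_key grouped by those columns;
    # columns absent from a set appear as None (like SQL's NULL).
    out = []
    for gset in grouping_sets:
        sums = defaultdict(int)
        for row in rows:
            sums[tuple(row[c] for c in gset)] += row[value_key]
        for key, total in sorted(sums.items()):
            rendered = {c: None for c in rows[0] if c != value_key}
            rendered.update(dict(zip(gset, key)))
            rendered[value_key] = total
            out.append(rendered)
    return out

rows = [
    {"year": 2014, "quarter": "Q1", "sales": 10},
    {"year": 2014, "quarter": "Q2", "sales": 20},
    {"year": 2015, "quarter": "Q1", "sales": 30},
]
# GROUP BY year, quarter WITH ROLLUP: per-(year, quarter) rows,
# per-year subtotals, and a grand total.
print(aggregate(rows, rollup_sets(["year", "quarter"]), "sales"))
```

GROUPING SETS is the general form; ROLLUP and CUBE are just shorthands for particular set collections.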

/cc @yhuai

@yhuai
Contributor

yhuai commented Mar 18, 2015

ok to test

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28767 has finished for PR 5080 at commit c3a3785.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 18, 2015

Test build #28778 has finished for PR 5080 at commit 5d8e845.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Mar 23, 2015

Test build #28987 has finished for PR 5080 at commit 238a782.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

| rep1sep(expression, ",")
)

private def calGroupingSets(child: LogicalPlan, aggregations: Seq[NamedExpression],
Contributor

Each argument per line.

Contributor

groupByExprs should always be valid (not None) currently, let's say groupByExprs: Seq[Expression] ?

Contributor Author

That's because I originally wanted to stay compatible with the syntax GROUP BY GROUPING SETS(expression list) from #5045. I'll change it to groupByExprs: Seq[Expression] in this patch; thanks for reviewing the code.
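What a helper like calGroupingSets has to compute is, for each grouping set, which of the GROUP BY expressions are nulled out. A hypothetical Python sketch of that bookkeeping (the bitmask convention here is an illustrative assumption, not Spark's exact encoding):

```python
def grouping_set_bitmasks(group_by_exprs, grouping_sets):
    # For each grouping set, build a bitmask over the GROUP BY expressions:
    # bit i set means expression i is absent (nulled out) in that set.
    masks = []
    for gset in grouping_sets:
        mask = 0
        for i, expr in enumerate(group_by_exprs):
            if expr not in gset:
                mask |= 1 << i
        masks.append(mask)
    return masks

# ROLLUP(a, b) == GROUPING SETS((a, b), (a), ())
print(grouping_set_bitmasks(["a", "b"], [("a", "b"), ("a",), ()]))  # [0, 2, 3]
```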

@chenghao-intel
Contributor

LGTM in general, except some small issues.

@SparkQA

SparkQA commented Mar 24, 2015

Test build #29046 has finished for PR 5080 at commit 3ba3fc2.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@watermen
Contributor Author

@yhuai @marmbrus Any more comment on this?

@marmbrus
Contributor

My biggest comment is that we probably should avoid adding new keywords to the SQLParser. While this might be a nice feature to have, you can already do it through HiveQL and you are going to break existing code that uses column names SETS, WITH, GROUPING, CUBE, etc. As a result I think we should close this issue.

@ash211
Contributor

ash211 commented Apr 22, 2015

I would find the ROLLUP feature useful for some of my workflows. If we're trying to keep the SQLParser forward compatible indefinitely, what are your plans for achieving SQL92 (and later) compatibility in Spark SQL? The work with dialects happening at #4015 ?

@watermen
Contributor Author

@marmbrus I see in sql-programming-guide: "If these dependencies are not a problem for your application then using HiveContext is recommended for the 1.3 release of Spark. Future releases will focus on bringing SQLContext up to feature parity with a HiveContext." Given that, why shouldn't we add new keywords to the SQLParser? I also think it's better for users to avoid column names like SETS and WITH, which are keywords in SQL. Can you give me more info? Many thanks.

@marmbrus
Contributor

@ash211, exactly. I think the best tactic is to create a new SQL92 parser with something more robust than Scala parser combinators.

@watermen, do other dialects actually disallow those words as identifiers? I think part of the problem here is that scala parser combinators are very rigid and our keywords can never be used as identifiers even where it would not cause ambiguity.

That said, if there is a lot of demand for this, we can add it. I just want everyone to be aware that there is a cost to compatibility for every addition (and users have complained in the past).
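The compatibility cost described above can be seen with a toy parser (hypothetical, far simpler than Spark's SQLParser): once a word is reserved, it disappears from the identifier space, and previously valid queries stop parsing.

```python
# Toy illustration (not Spark's parser): reserved words before and after
# hypothetically adding the ROLLUP/CUBE/GROUPING SETS keywords.
RESERVED_BEFORE = {"SELECT", "FROM", "GROUP", "BY", "WITH"}
RESERVED_AFTER = RESERVED_BEFORE | {"ROLLUP", "CUBE", "GROUPING", "SETS"}

def parse_select_list(sql, reserved):
    # Accepts "SELECT col1, col2 FROM t"; identifiers must not be reserved.
    tokens = sql.replace(",", " , ").split()
    assert tokens[0].upper() == "SELECT"
    cols = []
    i = 1
    while tokens[i].upper() != "FROM":
        if tokens[i] != ",":
            if tokens[i].upper() in reserved:
                raise SyntaxError(f"unexpected keyword {tokens[i]!r}")
            cols.append(tokens[i])
        i += 1
    return cols

print(parse_select_list("SELECT cube, total FROM facts", RESERVED_BEFORE))  # ['cube', 'total']
# The same query breaks once CUBE becomes a keyword:
try:
    parse_select_list("SELECT cube, total FROM facts", RESERVED_AFTER)
except SyntaxError as e:
    print(e)
```

A parser that only treats keywords as reserved where an identifier would be ambiguous (as marmbrus suggests scala parser combinators cannot easily do) would avoid this breakage.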

@AmplabJenkins

Can one of the admins verify this patch?

@asfgit asfgit closed this in 8dee274 Apr 29, 2015