Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- 0.14.0, 1.0.0, 1.1.0, 1.2.0
- None
Description
Queries that use JOIN and GROUP BY produce multiple output rows with the same key when hive.auto.convert.join=false and hive.optimize.reducededuplication=true. This interaction between the two configuration parameters is unexpected. At the very least it should be well documented, and it should likely be considered a bug.
e.g.
hive> set hive.auto.convert.join = false;
hive> set hive.optimize.reducededuplication = true;
hive> SELECT foo.id, count(*) as factor
> FROM foo
> JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
> JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
> JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
> WHERE foo.order != 'blah' AND foo.id = 'XYZ'
> GROUP BY foo.id;
XYZ 79
XYZ 74
XYZ 297
XYZ 66
hive> set hive.auto.convert.join = true;
hive> set hive.optimize.reducededuplication = true;
hive> SELECT foo.id, count(*) as factor
> FROM foo
> JOIN bar ON (foo.id = bar.id and foo.line_id = bar.line_id)
> JOIN split ON (foo.id = split.id and foo.line_id = split.line_id)
> JOIN forecast ON (foo.id = forecast.id AND foo.line_id = forecast.line_id)
> WHERE foo.order != 'blah' AND foo.id = 'XYZ'
> GROUP BY foo.id;
XYZ 516
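Note that the four rows returned with hive.auto.convert.join=false add up to the single row returned with hive.auto.convert.join=true (79 + 74 + 297 + 66 = 516). That is consistent with per-key partial counts being emitted without a final merge, although this reading is an assumption and is not confirmed anywhere in this report. A minimal Python sketch of the check, using only the figures printed above (the helper name merge_by_key is illustrative):

```python
from collections import defaultdict

# Rows as printed by the first run (hive.auto.convert.join=false).
rows_join_conversion_off = [("XYZ", 79), ("XYZ", 74), ("XYZ", 297), ("XYZ", 66)]
# Row as printed by the second run (hive.auto.convert.join=true).
rows_join_conversion_on = [("XYZ", 516)]


def merge_by_key(rows):
    """Sum the counts per key, as a final GROUP BY merge would (assumed behavior)."""
    totals = defaultdict(int)
    for key, count in rows:
        totals[key] += count
    return dict(totals)


merged = merge_by_key(rows_join_conversion_off)
expected = merge_by_key(rows_join_conversion_on)

# True when the duplicate-key rows are partial counts of the correct total.
print(merged == expected)
```

The same comparison can be used as a quick sanity check on other affected queries: if the duplicate-key rows sum to the value obtained with the other setting, the rows are likely unmerged partial aggregates rather than unrelated wrong values.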