Created 12-16-2016 10:34 AM
Created 12-17-2016 01:16 AM
As you already know, Hive does not support hierarchical query constructs such as Oracle's CONNECT BY. Unfortunately, this is the general situation across similar tools in the Hadoop ecosystem.
A self-join works if you know the number of levels in advance, but the query gets quite ugly; see the sketch below.
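For illustration, a minimal sketch of the fixed-depth approach, assuming a hypothetical employees table with id and parent_id columns (one extra join per level):

-- Walk three known levels of a hypothetical employees(id, parent_id) table.
-- Each additional level of the hierarchy requires another LEFT JOIN.
SELECT e1.id, e2.id AS parent, e3.id AS grandparent
FROM employees e1
LEFT JOIN employees e2 ON e1.parent_id = e2.id
LEFT JOIN employees e3 ON e2.parent_id = e3.id;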
If you need hierarchical queries against databases that don't support recursive subquery factoring, one very low-tech alternative is to encode the hierarchy into a separate key column. Of course, this only works if you control the table update process and can rewrite the key whenever a parent changes; an example follows.
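For example, a sketch assuming a hypothetical path column that the loader maintains with the full chain of ancestor ids; subtree queries then reduce to a prefix match:

-- Hypothetical path-enumeration key: each row stores its ancestor chain,
-- e.g. the node with id 9 under 4 under 1 stores path = '/1/4/9/'.
-- All descendants of node 4 then become a simple prefix match:
SELECT id, path
FROM employees
WHERE path LIKE '/1/4/%';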
Another option is to export the hierarchical data and import it into an RDBMS that supports CONNECT BY.
***
If this response helped, please vote/accept the best answer.
Created 12-19-2016 01:22 PM
I have come across a blog that implements Scala code to generate hierarchical data in Hive using a UDTF. I found the source code below, but I am not sure how to execute or implement it.
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF
import org.apache.hadoop.hive.serde2.objectinspector.{ObjectInspector, ObjectInspectorFactory, PrimitiveObjectInspector, StructObjectInspector}
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory

class ExpandTree2UDTF extends GenericUDTF {
  var inputOIs: Array[PrimitiveObjectInspector] = null
  val tree: collection.mutable.Map[String, Option[String]] = collection.mutable.Map()

  // Declare the output schema: one (id, ancestor, level) row per ancestor, all strings.
  override def initialize(args: Array[ObjectInspector]): StructObjectInspector = {
    inputOIs = args.map { _.asInstanceOf[PrimitiveObjectInspector] }
    val fieldNames = java.util.Arrays.asList("id", "ancestor", "level")
    val fieldOI = PrimitiveObjectInspectorFactory.javaStringObjectInspector.asInstanceOf[ObjectInspector]
    val fieldOIs = java.util.Arrays.asList(fieldOI, fieldOI, fieldOI)
    ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs)
  }

  // Buffer every (id, parent) edge; a NULL parent marks a root node.
  override def process(record: Array[Object]): Unit = {
    val id = inputOIs(0).getPrimitiveJavaObject(record(0)).asInstanceOf[String]
    val parent = Option(inputOIs(1).getPrimitiveJavaObject(record(1)).asInstanceOf[String])
    tree += (id -> parent)
  }

  // After all rows are buffered, emit each node's full ancestor chain,
  // memoizing chains so shared ancestors are computed only once.
  // Assumes every parent referenced also appears as an id.
  override def close(): Unit = {
    val expandTree = collection.mutable.Map[String, List[String]]()
    def calculateAncestors(id: String): List[String] =
      tree(id) match {
        case Some(parent) => id :: getAncestors(parent)
        case None         => List(id)
      }
    def getAncestors(id: String): List[String] =
      expandTree.getOrElseUpdate(id, calculateAncestors(id))
    tree.keys.foreach { id =>
      getAncestors(id).zipWithIndex.foreach {
        // Convert level to a string to match the declared all-string schema.
        case (ancestor, level) => forward(Array(id, ancestor, level.toString))
      }
    }
  }
}
Created 12-20-2016 03:19 PM
Of course, you can use a custom UDF, but those are not part of Hive core and performance is not guaranteed, especially for such an expensive operation on a big data set. There is a reason this is not part of Hive: iterative and recursive problems are not well suited to MapReduce because tasks do not share state or coordinate with one another.
If you still want to go down this path, in a few words: you would build the jar and deploy it to Hive's auxiliary libraries folder or to HDFS, then create a permanent or temporary function which you can invoke in your SQL. Follow the steps described here: https://dzone.com/articles/writing-custom-hive-udf-andudaf. Also look at this: https://community.hortonworks.com/articles/39980/creating-a-hive-udf-in-java.html.
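As a minimal sketch, the Hive session side would look something like this; the jar path, function name, and table/column names are assumptions, not part of the blog code:

-- Register the jar and the UDTF class from the Scala code above
-- (jar path, function name, and table/column names are hypothetical).
ADD JAR hdfs:///user/hive/lib/expand-tree.jar;
CREATE TEMPORARY FUNCTION expand_tree AS 'ExpandTree2UDTF';

-- Emit one (id, ancestor, level) row for every ancestor of every node.
SELECT expand_tree(id, parent_id) AS (id, ancestor, level)
FROM employees;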
You would follow similar steps with the code you found, even though it is Scala. I am not aware of similar code implemented in Java, but it likely exists.
Created 12-24-2016 04:47 AM
That is the way to go
Created 12-21-2016 12:51 PM
If Constantin's awesome answer helped you, please accept the answer to close this thread; otherwise, provide your solution or follow-up questions for more clarity.