question Re: How to implement "connect By" of ORACLE in Hive ? OR create Hierarchie in HIVE . in Archives of Support Questions (Read Only)

How to implement "connect By" of ORACLE in Hive ? OR create Hierarchie in HIVE .

sampyyy — Fri, 16 Dec 2016 18:34:20 GMT

Re: How to implement "connect By" of ORACLE in Hive ? OR create Hierarchie in HIVE .

cstanca — Sat, 17 Dec 2016 09:16:44 GMT

As you already know, Hive does not support sub-queries such as connect by. Bad news, this is a general situation with similar tools in Hadoop ecosystem.

Join works if you know the number of levels and the query is quite ugly.

If you need hierarchical queries against databases that don't support Recursive Subquery Factoring, one very low-tech alternative is to encode the hierarchy into a separate key. Of course this will only work if you can control the table update process and rewrite the key following parent updates.

Your option is to take the hierarchical data, import it onto an RDBMS suited for connect by.

***

If response helped, please vote/accept best answer.

Re: How to implement "connect By" of ORACLE in Hive ? OR create Hierarchie in HIVE .

sampyyy — Mon, 19 Dec 2016 21:22:04 GMT

@Constantin Stanca

I have come across a certain blog which implements a scala code to generate the Hierarchal data In hive using UDTF.

But I have come acrss this below source code. But not sure How to execute or implement this.

class ExpandTree2UDTF extends GenericUDTF { var inputOIs: Array[PrimitiveObjectInspector] = null val tree: collection.mutable.Map[String,Option[String]] = collection.mutable.Map() override def initialize(args: Array[ObjectInspector]): StructObjectInspector = { inputOIs = args.map{_.asInstanceOf[PrimitiveObjectInspector]} val fieldNames = java.util.Arrays.asList("id", "ancestor", "level") val fieldOI = primitive.PrimitiveObjectInspectorFactory.javaStringObjectInspector.asInstanceOf[ObjectInspector] val fieldOIs = java.util.Arrays.asList(fieldOI, fieldOI, fieldOI) ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs); } def process(record: Array[Object]) { val id = inputOIs(0).getPrimitiveJavaObject(record(0)).asInstanceOf[String] val parent = Option(inputOIs(1).getPrimitiveJavaObject(record(1)).asInstanceOf[String]) tree += ( id -> parent ) } def close { val expandTree = collection.mutable.Map[String,List[String]]() def calculateAncestors(id: String): List[String] = tree(id) match { case Some(parent) => id :: getAncestors(parent) ; case None => List(id) } def getAncestors(id: String) = expandTree.getOrElseUpdate(id, calculateAncestors(id)) tree.keys.foreach{ id => getAncestors(id).zipWithIndex.foreach{ case(ancestor,level) => forward(Array(id, ancestor, level)) } } } }

Re: How to implement "connect By" of ORACLE in Hive ? OR create Hierarchie in HIVE .

cstanca — Tue, 20 Dec 2016 23:19:30 GMT

@Sampat Budankayala

Of course, you can use custom UDF, but those are not part of Hive core and performance is not guaranteed, especially for such expensive operation on a big data set. There is a reason that is not part of it. Iterative and recursive problems not well suited for map reduce because tasks do not share state or coordinate with each other.

If you still want to go this path, in a few words, you would build the jar and deploy it to Hive auxiliary libraries folder or HDFS, then create a permanent or temporary function which you can invoke it in your SQL. Follow the steps described here: https://dzone.com/articles/writing-custom-hive-udf-andudaf. Look also at this: https://community.hortonworks.com/articles/39980/creating-a-hive-udf-in-java.html.

You would follow similar steps with the code you found, even is Scala. I am not aware of similar code implemented in Java, but it must be.

Re: How to implement "connect By" of ORACLE in Hive ? OR create Hierarchie in HIVE .

aervits — Wed, 21 Dec 2016 20:51:14 GMT

If Constantin's awesome answer helped you, please accept the answer to close this thread, otherwise provide your solution or follow up questions for more clarity

Re: How to implement "connect By" of ORACLE in Hive ? OR create Hierarchie in HIVE .

TimothySpann — Sat, 24 Dec 2016 12:47:29 GMT

That is the way to go