You can definitely use Hive, but you are not using the easy button. "Best practice" is an abused term in our industry. I'd say a best practice for customer A may not be a best practice for customer B. It all comes down to cluster size, hardware config, and use case; those determine what the "best practice" is for your specific scenario. If you want to transform data, the entire industry is moving to Spark. Spark is nice since it offers multiple APIs for the same dataset. I recommend you open another HCC question if you are looking for a "best practice" for a specific use case. For what you have identified, I recommend NiFi.
Hi Michael, I'm still trying to figure out where to find that log (is there a specific folder?).
I've tried searching for the Query ID that Hive shows once I execute a query, but couldn't find it.
Query ID = root_20161205190741_fb2a555d-1633-404d-9128-0c3696d2d56a

So far, I've only found the following exception (through the CLI) after the execution fails:

Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1480931794353_0004_1_00, diagnostics=[Task failed, taskId=task_1480931794353_0004_1_00_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space

Any suggestions? Regards, and thank you Michael for helping me.
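For what it's worth, the YARN application id needed to pull the full logs can be read straight out of the vertex/task ids in that error message. A minimal sketch, assuming YARN log aggregation is enabled on the cluster (the grep pattern is just an illustration):

```shell
# Tez vertex/task ids embed the YARN application id:
#   vertex_1480931794353_0004_1_00  ->  application_1480931794353_0004
vertex_id="vertex_1480931794353_0004_1_00"
app_id="application_$(echo "$vertex_id" | cut -d_ -f2-3)"
echo "$app_id"   # prints: application_1480931794353_0004

# With log aggregation enabled, this fetches the container logs where
# the full OutOfMemoryError stack trace should appear:
# yarn logs -applicationId "$app_id" | grep -B2 -A10 OutOfMemoryError
```

The `yarn logs` command only works after the application has finished and its logs have been aggregated; otherwise the container logs live under the NodeManager's local log directories.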