Member since 06-23-2016 | 13 Posts | 2 Kudos Received | 0 Solutions
09-07-2017
08:32 PM
@Eugene Koifman That helped reduce spilled_rows from 11 billion to 5 billion. I was under the impression that inserting data into a partition is faster with a distribute by, so this was useful. I have also heard that compressing the intermediate files helps reduce spilled_rows. Is that correct?
set mapreduce.map.output.compress=true;
set mapreduce.output.fileoutputformat.compress=true;
Is there anything else we can do to optimize the query?
07-11-2017
03:03 AM
@Gaurav Mallikarjuna In the example above, note that I used another method to connect to HiveServer2: the HiveServer2 node and its port number, like this:
$ beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin
The admin user is for my sample only. In your case, if your transport mode is binary and the cluster is not Kerberized:
$ beeline -u "jdbc:hive2://<hiveserver2-hostname>:10000/" -n <username>
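As a rough sketch of the connection string described here, the snippet below assembles the JDBC URL from a host and port and prints the resulting beeline command instead of running it, so it works without a cluster. The helper name hs2_url is hypothetical and not part of beeline; the default port of 10000 is taken from the example in this post, and the sketch assumes binary transport on a cluster without Kerberos.

```shell
#!/usr/bin/env bash
# hs2_url is a hypothetical helper (not a beeline feature): it builds a
# HiveServer2 JDBC URL for binary transport on a non-Kerberized cluster.
hs2_url() {
  local host="$1"
  local port="${2:-10000}"   # falls back to 10000, the port used in this post
  printf 'jdbc:hive2://%s:%s/' "$host" "$port"
}

# Print the command rather than executing it, so no cluster is needed.
url="$(hs2_url dkhdp261c6.openstacklocal)"
echo "beeline -u \"$url\" -n admin"
```

Replace the host name and the admin user with your own HiveServer2 host and username before running the printed command.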