About gauravmallik12

gauravmallik12 · ‎09-07-2017

@Eugene Koifman That helped reduce the spilled_rows from 11 billion to 5 billion. I was under the impression that inserting data into a partition is faster with a distribute by, so this was useful. Also, I heard that compressing the intermediate files helps reduce the spilled_rows. Is that correct?

set mapreduce.map.output.compress = true
set mapreduce.output.fileoutputformat.compress = true

Or is there anything else we can do to optimize the query?

dkozlowski · ‎07-11-2017

@Gaurav Mallikarjuna In the above example you can see that I used another method to connect to hiveserver2: the hiveserver2 node plus its port number, like

$ beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin

Using admin is for my sample only. In your case, if your transport mode is binary and the cluster is not kerberized:

$ beeline -u "jdbc:hive2://<hiveserver2-hostname>:10000/" -n <username>
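As a rough sketch of the same connection pattern, the small shell helpers below assemble the JDBC URL and print the resulting beeline command instead of running it. The function names `hs2_jdbc_url` and `beeline_cmd` are made up for illustration, and the default port of 10000 is simply the one used in the post; this assumes binary transport mode on a non-kerberized cluster.

```shell
#!/bin/sh
# Hypothetical helper (not part of beeline): builds a HiveServer2 JDBC URL.
# Assumes binary transport mode and no Kerberos, as described in the post.
hs2_jdbc_url() {
    host="$1"
    port="${2:-10000}"   # falls back to 10000, the port used in the post
    printf 'jdbc:hive2://%s:%s/' "$host" "$port"
}

# Hypothetical helper: prints the beeline command line without executing it.
# Arguments: hostname, username, optional port.
beeline_cmd() {
    printf 'beeline -u "%s" -n %s\n' "$(hs2_jdbc_url "$1" "$3")" "$2"
}

# Sample values taken from the post; replace with your own host and user.
beeline_cmd dkhdp261c6.openstacklocal admin
```

Printing the command first makes it easy to check the quoting of the URL before actually connecting.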

Last Visited	‎09-07-2017 08:32 PM

Member Since	‎06-23-2016 04:39 AM
Posts	13
Kudos received	2

Cloudera Community

Re: How to reduce 'SPILLED_RECORDS' in Hive with T...

Re: Beeline -u "JDBC:hive2://url..." -n username ...