Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13322 | 02-20-2018 12:33 PM
 | 1499 | 02-19-2018 05:12 AM
 | 1858 | 12-28-2017 06:13 AM
 | 7135 | 09-28-2017 09:25 AM
 | 12156 | 09-25-2017 11:19 AM
11-06-2017
07:57 AM
@Team Spark It seems the target table has 2 columns whereas your insert query selects 3 columns. Check the number of columns in the SELECT clause; once it matches the target, it should work fine. It is not a problem with dynamic partitioning. A sketch of a matching insert is below.
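For illustration only, a minimal sketch of a dynamic-partition insert where the column counts line up; the table and column names here are placeholders, not from your query:

insert into table target_db.target_tbl partition (load_date)
select col1, col2, load_date   -- data columns first, the dynamic partition column last
from source_db.source_tbl;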
10-30-2017
12:13 PM
@Saurabh This sometimes happens because of the limit on the number of lines displayed in the CLI. Try redirecting the output to a file: hive -e "show create table sample_db.i0001_ivo_hdr;" > ddl.txt
10-25-2017
10:06 AM
@Biswajit Chakraborty One way of handling this is to reduce the number of records flowing from Spark to Hive at a time. Apply a filter condition and split the load into multiple smaller inserts (see the sketch below); that should reduce the number of records held in memory. Also, when you insert from Spark into Hive, there is a high chance that the amount of data moving through shuffles is very large. If possible, attach the complete logs. Hope it helps!!
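As a rough illustration of splitting one large insert into smaller filtered inserts (the table, column names and filter values are placeholders, not from your job):

insert into table target_db.target_tbl partition (load_date)
select col1, col2, load_date from source_db.source_tbl where load_date <= '2017-06-30';

insert into table target_db.target_tbl partition (load_date)
select col1, col2, load_date from source_db.source_tbl where load_date > '2017-06-30';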
10-17-2017
12:30 PM
@Rush You have to correct the sqoop syntax. The mapper parameter has to be specified before the target directory. Correct the order of the arguments (a sketch is below) and re-trigger the sqoop command; it should then work fine. Hope it helps!!
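A rough sketch with the mapper count placed before the target directory; the connection string, credentials and paths are placeholders, not taken from your command:

sqoop import \
  --connect jdbc:mysql://dbhost:3306/salesdb \
  --username sqoop_user -P \
  --table orders \
  -m 4 \
  --target-dir /user/rush/orders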
10-16-2017
01:08 PM
@Rashi Jain To improve the performance of a sqoop import, increase the number of mappers based on the load the source can handle and the number of records being ingested into HDFS. Also, in --split-by, try to use the primary key (or another column that uniquely identifies records) so that the records are split evenly across the mappers and the ingestion runs faster. A sketch is below. Hope it helps!!
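A minimal sketch, assuming a hypothetical source table with a numeric primary key order_id; all names and values here are placeholders:

sqoop import \
  --connect jdbc:mysql://dbhost:3306/salesdb \
  --username sqoop_user -P \
  --table orders \
  --split-by order_id \
  --num-mappers 8 \
  --target-dir /user/rashi/orders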
10-14-2017
10:10 AM
@viswanath When you run the insert statement for a static partition, a lock is obtained on the folder created for that static partition. Your select query, however, runs over the entire table, i.e. the entire folder created for the table, which includes all partitions, including the one you are overwriting. At that point it comes down to simple file handling: while a file is being written, it is locked, and the same applies here. A sketch of the scenario is below. Hope it helps!!
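A rough sketch of the situation, with placeholder table and column names (sales_db.orders and load_date are not from your query):

insert overwrite table sales_db.orders partition (load_date = '2017-10-01')
select order_id, amount
from sales_db.orders;          -- the select reads every partition, including the one being overwritten
-- the locks held during the statement can be inspected with:
show locks sales_db.orders;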
10-11-2017
09:19 AM
Hi @Saravanan Ramaraj Technically you can't compare an RDBMS with Hive, at least for now. One way of improving things is to capture the table statistics in the Hive table properties, which can give a noticeable performance improvement. If you want to analyze at the column level, you may have to run: Analyze table dbname.tblname partition (partition_column) compute statistics for columns column_name; By collecting column statistics you can see better performance (a sketch is below). Hive works well for large-scale computation, which is what it is specifically designed for, but comparing it with an RDBMS is like comparing apples with oranges. Hope it helps!
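A minimal sketch, assuming a hypothetical partitioned table; dbname.tblname and load_date are placeholders:

analyze table dbname.tblname partition (load_date) compute statistics;               -- table/partition level stats
analyze table dbname.tblname partition (load_date) compute statistics for columns;   -- column level stats
describe formatted dbname.tblname;                                                   -- collected stats appear in the table/partition parameters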
09-28-2017
09:25 AM
1 Kudo
Hi @Gayathri Devi here is a sample nested case expression which can be used in Hive:

Select case
when hour(split(split(hbid,"#")[1],"_")[1]) == 0 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","0-2")
when hour(split(split(hbid,"#")[1],"_")[1]) == 1 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","0-2")
when hour(split(split(hbid,"#")[1],"_")[1]) == 2 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","2-4")
when hour(split(split(hbid,"#")[1],"_")[1]) == 3 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","2-4")
else 'NA'
end as column1
from table_name;

If it helps then please accept it as the best answer! Happy Hadooping!!
09-27-2017
12:53 PM
1 Kudo
@Gayathri Devi You have missed the end keyword in the statement. This should work: case when hour(split(split(hbid,"#")[1],"_")[1]) == 0 then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0],"/","0-2") else 'NA' end Hope it helps!!
09-26-2017
11:04 AM
1 Kudo
@Gayathri Devi I couldn't think of any built-in function in Hive to handle this scenario directly. Another way of doing it is something like: select from_unixtime(unix_timestamp(current_timestamp) + 7200); Here 7200 is the number of seconds in 2 hours, as mentioned in your question; you can alter it based on your need. Instead of the hardcoded value you can pass a variable, or row_number() over () * 3600 can be used to generate the offsets sequentially (a sketch is below). Hope it helps!!
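A minimal sketch of the row_number approach, assuming a hypothetical table some_table; each row gets a timestamp one hour later than the previous row:

select from_unixtime(unix_timestamp(current_timestamp) + (row_number() over () * 3600)) as shifted_ts
from some_table;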