Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 13380 | 02-20-2018 12:33 PM |
| | 1514 | 02-19-2018 05:12 AM |
| | 1864 | 12-28-2017 06:13 AM |
| | 7150 | 09-28-2017 09:25 AM |
| | 12190 | 09-25-2017 11:19 AM |
11-06-2017
07:57 AM
@Team Spark It seems the target table has 2 columns whereas your insert query supplies 3. Check the number of columns in the SELECT clause; then it should work fine. It is not a problem with dynamic partitioning.
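As a minimal sketch (table and column names here are hypothetical), the SELECT list has to line up with the target's columns, with the dynamic partition column coming last:

-- Hypothetical names; the SELECT must return exactly the target's columns,
-- non-partition columns first and the dynamic partition column last.
INSERT OVERWRITE TABLE target_db.target_tbl PARTITION (load_date)
SELECT col1,       -- first target column
       col2,       -- second target column
       load_date   -- dynamic partition column
FROM source_db.source_tbl;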
10-30-2017
12:13 PM
@Saurabh This happens sometimes because of the limit on the number of lines the CLI displays. Try this: hive -e "show create table sample_db.i0001_ivo_hdr;" > ddl.txt
10-25-2017
10:06 AM
@Biswajit Chakraborty One way of doing this is to reduce the number of records flowing from Spark to Hive. Use a filter condition to cut the record count and load the data through multiple smaller inserts; that should reduce the number of records held in memory at once. Also, when you insert from Spark into Hive there is a good chance that the amount of data moved during shuffles is very high. If possible, attach the complete logs. Hope it helps!!
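As a rough sketch of the multiple-insert idea (table, column and date values here are hypothetical; split on whatever filter suits your data):

-- Load the data in two (or more) filtered batches instead of one large insert.
INSERT INTO TABLE target_db.target_tbl
SELECT * FROM staging_tbl WHERE event_date < '2017-07-01';

INSERT INTO TABLE target_db.target_tbl
SELECT * FROM staging_tbl WHERE event_date >= '2017-07-01';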
10-17-2017
12:30 PM
@Rush You have to correct the Sqoop syntax: the mapper parameter has to be specified before the target directory. Please correct the syntax and re-trigger the Sqoop command; it should work fine. Hope it helps!!
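A sketch of that ordering, with hypothetical connection details (replace the host, database, table and path with yours):

# -m (mappers) is specified before --target-dir
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sourcedb \
  --username dbuser -P \
  --table customers \
  -m 4 \
  --target-dir /user/rush/customers_stage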
10-16-2017
01:08 PM
@Rashi Jain To improve the performance of a Sqoop import, increase the number of mappers based on the source load and the number of records being ingested into HDFS. Also, for --split-by try to use the primary key, through which the unique records can be identified, so that the rows are split evenly across the mappers and the ingestion runs faster. Hope it helps!!
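For example (connection details, table and column names are hypothetical), something along these lines:

# More mappers plus a split on the primary key so each mapper reads a distinct key range.
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
  --username dbuser -P \
  --table ORDERS \
  --split-by ORDER_ID \
  --num-mappers 8 \
  --target-dir /data/raw/orders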
10-14-2017
10:10 AM
@viswanath When you run an insert statement for a static partition, the lock is obtained on the folder that will be created for that static partition. Your select query, however, runs over the entire table, i.e. over the whole folder created for the table, which includes all partitions, including the one you are overwriting. At that point it comes down to simple file handling: while a file is being written it is locked, and the same applies here. Hope it helps!!
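To make the scenario concrete, here is a hypothetical example of the pattern described (overwriting one static partition while selecting from the same table); SHOW LOCKS can be used to inspect the locks while it runs:

-- Hypothetical table: the partition being written is locked, while the SELECT
-- scans the whole table, which includes that same partition.
INSERT OVERWRITE TABLE sales PARTITION (load_date = '2017-10-13')
SELECT id, amount FROM sales;

SHOW LOCKS sales;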
10-11-2017
09:19 AM
Hi @Saravanan Ramaraj Technically you can't compare an RDBMS with Hive, at least for now. One way of improving things is to capture the table statistics in the Hive metastore, through which you can see an improvement in performance. If you want to do analysis at the column level, you may have to run ANALYZE TABLE dbname.tblname PARTITION (partition_col) COMPUTE STATISTICS FOR COLUMNS column_name; With column statistics in place you can experience better performance. Hive works well for large-scale computation, which is what it is specifically designed for, but comparing it with an RDBMS is like comparing apples with oranges. Hope it helps!
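For instance, with hypothetical database, table, partition and column values, gathering both levels of statistics looks roughly like this:

-- Table/partition level statistics
ANALYZE TABLE sales_db.orders PARTITION (load_date='2017-10-01') COMPUTE STATISTICS;

-- Column level statistics for the partition
ANALYZE TABLE sales_db.orders PARTITION (load_date='2017-10-01') COMPUTE STATISTICS FOR COLUMNS;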
09-28-2017
09:25 AM
1 Kudo
Hi @Gayathri Devi Here is a sample nested CASE which can be used in Hive:

SELECT CASE
  WHEN hour(split(split(hbid,"#")[1],"_")[1]) == 0 THEN concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0], "/", "0-2")
  WHEN hour(split(split(hbid,"#")[1],"_")[1]) == 1 THEN concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0], "/", "0-2")
  WHEN hour(split(split(hbid,"#")[1],"_")[1]) == 2 THEN concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0], "/", "2-4")
  WHEN hour(split(split(hbid,"#")[1],"_")[1]) == 3 THEN concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0], "/", "2-4")
  ELSE 'NA'
END AS column1
FROM table_name;

If it helps then please accept it as the best answer! Happy Hadooping!!
09-27-2017
12:53 PM
1 Kudo
@Gayathri Devi You have missed the END in the statement. This should work:

case when hour(split(split(hbid,"#")[1],"_")[1]) == 0
then concat(split(split(split(hbid,"#")[1],"_")[1]," ")[0], "/", "0-2")
else 'NA' end

Hope it helps!!
09-26-2017
11:04 AM
1 Kudo
@Gayathri Devi I couldn't think of any built-in function in Hive to handle this scenario. The other way of doing it is something like the below:

select from_unixtime(unix_timestamp(current_timestamp) + 7200);

7200 is the number of seconds in the 2 hours mentioned in your question; you can alter it based on your need. Instead of the hardcoded value you can pass a variable, or row_number() over () * 3600 can be used to generate the offsets sequentially. Hope it helps!!
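A sketch of the row_number() idea, with a hypothetical source table and ordering column:

-- Each row gets a timestamp shifted by one additional hour (3600 seconds).
select from_unixtime(unix_timestamp(current_timestamp)
       + row_number() over (order by some_id) * 3600) as shifted_ts
from some_table;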