Member since 05-02-2017
360 Posts, 65 Kudos Received, 22 Solutions
My Accepted Solutions
Views | Posted
---|---
13105 | 02-20-2018 12:33 PM
1450 | 02-19-2018 05:12 AM
1803 | 12-28-2017 06:13 AM
7011 | 09-28-2017 09:25 AM
11946 | 09-25-2017 11:19 AM
12-12-2018 04:29 PM
It's a good approach, but the only disadvantage I can find is that it needs multiple hops to reach the desired result. Instead of performing joins, we can apply a windowing function to achieve the same result in a single hop, assuming you have a unique-value (key) column and a last-modified-date column in your scenario.
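A minimal sketch of the single-hop windowing approach, assuming a hypothetical table `records` with a key column `id` and a `last_modified` column. It uses Python's built-in sqlite3 as a stand-in for Hive (window functions need SQLite 3.25 or newer); the same `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...)` pattern is available in HiveQL.

```python
import sqlite3


def latest_per_key(rows):
    """Keep only the most recent row per id with ROW_NUMBER(), in a single pass."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, val TEXT, last_modified TEXT)")
    conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, val, last_modified
        FROM (
            SELECT id, val, last_modified,
                   -- rank rows within each id, newest first
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY last_modified DESC) AS rn
            FROM records
        ) ranked
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    data = [
        (1, "old", "2018-12-01"),
        (1, "new", "2018-12-10"),
        (2, "only", "2018-12-05"),
    ]
    print(latest_per_key(data))
```

No self-join is needed: the ranking and the filter happen over one scan of the table.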
12-11-2018 07:26 AM
Hi @harsha vardhan Could you explain a bit more about that? Yes, you can override the queue whenever you want, but it also depends on user/group access. If the user is assigned to specific groups and those groups have not been given privileges on any other queue, it will not be possible until proper access is granted to those groups. If you do have access to multiple queues, you can pass the queue name as a parameter to the Sqoop job, and whenever the queue name has to change, you can handle that with a combination of shell and Sqoop.
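A minimal sketch of passing the queue as a parameter, assuming a MapReduce-based Sqoop 1 job where the standard Hadoop property `mapreduce.job.queuename` selects the YARN queue. The connection string, table, and target directory are hypothetical. The post suggests shell + Sqoop; Python is used here only to show the argument order (generic `-D` options must come before the tool-specific arguments).

```python
import shlex


def build_sqoop_import(queue, connect, table, target_dir):
    """Return an argv list for a Sqoop import that runs on the given YARN queue."""
    return [
        "sqoop", "import",
        # Generic Hadoop -D options must precede the tool-specific arguments.
        f"-Dmapreduce.job.queuename={queue}",
        "--connect", connect,
        "--table", table,
        "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    # Hypothetical values; only the queue name changes between runs.
    cmd = build_sqoop_import("etl_queue", "jdbc:mysql://dbhost/sales",
                             "orders", "/data/raw/orders")
    # Print a shell-safe command line; subprocess.run(cmd, check=True) would execute it.
    print(shlex.join(cmd))
```

If the user's groups lack access to the requested queue, YARN will still reject the submission, so the access point above applies regardless of how the parameter is passed.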
04-23-2018 06:35 AM
Hi @Swaapnika Guntaka When you delete data from HDFS, it is moved to the Trash (provided trash is enabled, i.e. fs.trash.interval is greater than 0). However, the trash is flushed out at a regular interval, and once that happens there is no way to recover the data unless you have a DR (disaster recovery) copy in place, which is typically only set up for production environments. Hope it helps!!
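While the data is still in the trash, it can be moved back. A minimal sketch, assuming the default HDFS trash layout where a deleted file is kept under `/user/<user>/.Trash/Current/` followed by its original absolute path. Files that have already been checkpointed sit under a timestamped directory inside `.Trash` instead, which this sketch does not handle; the user and path are hypothetical.

```python
import posixpath


def trash_location(user, original_path):
    """Where a deleted file sits in the HDFS trash before the next checkpoint."""
    if not original_path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    # The original absolute path is recreated under the user's .Trash/Current directory.
    return posixpath.join(f"/user/{user}/.Trash/Current", original_path.lstrip("/"))


def restore_command(user, original_path):
    """argv that moves the file back; run it with subprocess.run(cmd, check=True)."""
    return ["hdfs", "dfs", "-mv", trash_location(user, original_path), original_path]


if __name__ == "__main__":
    print(" ".join(restore_command("alice", "/data/sales/part-0000")))
```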
04-12-2018 07:11 AM
Hi @johny gate
The query below works, but it's kind of dirty:
select * from a
left join
(select *, lag(col3) over (partition by col1 order by col2) as lag_val from a) tblb
on tblb.col1 = a.col1 and a.col2 = tblb.lag_val
Hope it helps!!
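To see what `lag_val` holds and what the left join returns, here is a small sketch that runs the query above against SQLite via Python's sqlite3 (a stand-in for Hive; `LAG` needs SQLite 3.25+). The sample rows are hypothetical, and the `ORDER BY` is added only to make the output order deterministic.

```python
import sqlite3

QUERY = """
SELECT *
FROM a
LEFT JOIN (
    SELECT *, LAG(col3) OVER (PARTITION BY col1 ORDER BY col2) AS lag_val
    FROM a
) tblb
  ON tblb.col1 = a.col1 AND a.col2 = tblb.lag_val
ORDER BY a.col1, a.col2  -- added only for deterministic output
"""


def run(rows):
    """Load rows into table a and return the result of the LAG self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE a (col1 TEXT, col2 INTEGER, col3 INTEGER)")
    conn.executemany("INSERT INTO a VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run([("k", 1, 2), ("k", 2, 3), ("k", 3, 4)]):
        print(row)
```

Each `tblb` row carries the previous row's `col3` (NULL for the first row in a partition), and rows of `a` with no matching `lag_val` come back with NULLs on the right-hand side because of the left join.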
04-06-2018 12:48 PM
Hi @Subramaniam Ramasubramanian You would have to start by looking into the executor failures. You said this job was working fine earlier and you only recently started facing this issue. In that case, I believe the maximum number of executor failures was set to 10 and the job used to stay under that limit, but now the number of executor failures has started going above 10. Executor failures may also be caused by resource unavailability, so you should check the cluster's resource/memory availability at the time your job runs as well. Hope it helps!
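For reference, in Spark on YARN the threshold mentioned above is controlled by `spark.yarn.max.executor.failures`. A minimal sketch of where it is set, assuming a hypothetical application file `my_job.py` and a hypothetical executor memory of `4g`; Python is used only to assemble the `spark-submit` arguments.

```python
import shlex


def build_spark_submit(app, max_executor_failures, executor_memory="4g"):
    """argv for a spark-submit on YARN with an explicit executor-failure threshold."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--executor-memory", executor_memory,
        # Upper bound on executor failures tolerated before the YARN application is failed.
        "--conf", f"spark.yarn.max.executor.failures={max_executor_failures}",
        app,
    ]


if __name__ == "__main__":
    print(shlex.join(build_spark_submit("my_job.py", max_executor_failures=10)))
```

Raising the value only buys headroom; if executors are failing because of memory or resource shortages, the root cause still has to be addressed.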
03-16-2018 06:16 AM
1 Kudo
@Timothy Spann If open source is the priority, I would go with Hive using MERGE. Although I haven't tried MERGE with huge volumes, I believe it would perform decently.
03-15-2018 05:36 AM
@Timothy Spann I would go with either Attunity or some utility/framework that can be modified depending on the use case. These kinds of frameworks reduce time and effort, and multiple tables can be processed in parallel with little effort.
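A minimal sketch of the "framework" idea: per-table settings live in configuration, and one generic routine is run for every table in parallel. The table list and `ingest_table` are hypothetical placeholders (not Attunity's API); in practice the routine would launch the real Sqoop or Hive step.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-table configuration; a real framework would load this from a file.
TABLES = [
    {"name": "orders", "mode": "incremental", "check_column": "updated_at"},
    {"name": "customers", "mode": "full"},
    {"name": "products", "mode": "full"},
]


def ingest_table(cfg):
    """Placeholder for the real per-table load; returns a description of what it would do."""
    if cfg["mode"] == "incremental":
        return f"{cfg['name']}: incremental on {cfg['check_column']}"
    return f"{cfg['name']}: full load"


def run_all(tables, max_workers=4):
    """Run ingest_table for every configured table concurrently."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ingest_table, cfg): cfg["name"] for cfg in tables}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
    return results


if __name__ == "__main__":
    for name, status in sorted(run_all(TABLES).items()):
        print(name, "->", status)
```

Adding a table then means adding one config entry rather than writing a new job.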
02-21-2018 05:20 AM
Hi @hippagun It won't work. Even though the table is ORC, Hive maps the columns based on the column definitions you specified during table creation, so re-creating the table alone won't fix it. There are two options you can try now:
1) Create another external table with the additional columns. Write a simple query to load the records from the old table into the new one, specifying NULL for the newly added columns. Once that is done, drop the old table and use the new table going forward. This approach is suitable for ORC.
2) Alternatively, if the table's schema changes frequently, it's better to go with an Avro table, since schema changes can be handled easily. You have to follow the step above only the first time; whenever the schema changes in the future, you only need to alter the schema file and nothing else. You can refer to this Link to understand how schema changes are handled in Avro files. Hope it helps!!
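A minimal sketch of the copy pattern in option 1, using Python's sqlite3 so it can be run anywhere. The table and column names (`sales`, `sales_v2`, `region`, `channel`) are hypothetical; in Hive the new table would be created as an external table `STORED AS ORC`, but the `INSERT ... SELECT` with NULLs for the new columns has the same shape.

```python
import sqlite3


def migrate_with_new_columns(conn):
    """Copy old rows into a wider table, NULL-filling the new columns, then drop the old table."""
    conn.executescript("""
        CREATE TABLE sales_v2 (id INTEGER, amount REAL, region TEXT, channel TEXT);
        INSERT INTO sales_v2 (id, amount, region, channel)
            SELECT id, amount, NULL, NULL FROM sales;
        DROP TABLE sales;
    """)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
    migrate_with_new_columns(conn)
    print(conn.execute("SELECT * FROM sales_v2 ORDER BY id").fetchall())
```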
02-20-2018 12:33 PM
Hi @Ravikiran Dasari If this is for knowledge purposes, what I'm going to give adds no more information than the previous answers. But if you are looking for something work-related, this answer might help a bit. Have a file watcher that looks for files matching a particular pattern, which have to be FTP'ed to the desired location. Once a file arrives, you can move it to the HDFS server. This can be accomplished with a simple shell script that requires only basic shell knowledge. It can also be done with either a push or a pull approach. If you have other downstream jobs that have to run once the file arrives in HDFS, I would recommend the pull approach, so that you can run any other Hadoop/Hive/Pig/Spark jobs on the HDFS server. Hope it helps!!
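A minimal polling file-watcher sketch, assuming a hypothetical local landing directory and a filename pattern such as `sales_*.csv`. It only prints the `hdfs dfs -put` command for each new file instead of running it; the post suggests a shell script, and Python is used here just to show the logic.

```python
import fnmatch
import os
import time


def find_new_files(landing_dir, pattern, seen):
    """Return files in landing_dir matching pattern that have not been handled yet."""
    new = []
    for name in sorted(os.listdir(landing_dir)):
        path = os.path.join(landing_dir, name)
        if os.path.isfile(path) and fnmatch.fnmatch(name, pattern) and path not in seen:
            new.append(path)
    return new


def hdfs_put_command(local_path, hdfs_dir):
    """argv for copying one file into HDFS; run with subprocess.run(cmd, check=True)."""
    return ["hdfs", "dfs", "-put", local_path, hdfs_dir]


def watch(landing_dir, pattern, hdfs_dir, poll_seconds=30, max_polls=None):
    """Poll landing_dir and print the HDFS put command for each newly arrived file."""
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in find_new_files(landing_dir, pattern, seen):
            print("would run:", " ".join(hdfs_put_command(path, hdfs_dir)))
            seen.add(path)
        polls += 1
        time.sleep(poll_seconds)
```

One design caveat: a file can be picked up while it is still being transferred, so real watchers often wait for a trigger/"done" file or a stable file size before moving it.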
02-19-2018 05:12 AM
Hi @Lanic When you submit a job, it's YARN that provides information about the available resources, while the driver gets the HDFS data locations needed to execute the job from the NameNode. Based on that, the available resources closest to the data are chosen as the places where the tasks will run. In other words, it's the NameNode that provides the HDFS data-location information, which is then used when requesting resources from YARN. Once all the jobs are completed, their status is reported back and the corresponding metastore is brought in sync. Hope it helps!!
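A simplified sketch of the "closest available resource" idea: given the hosts holding a block (as reported by the NameNode) and the nodes with free capacity (as reported by YARN), prefer a node holding the data, then a node on the same rack, then any free node. Host and rack names are hypothetical, the level names are borrowed from Spark's locality levels, and real schedulers also use delay scheduling and queue policies that this ignores.

```python
def choose_node(block_hosts, free_nodes, rack_of):
    """Pick a node for a task: node-local first, then rack-local, then any free node."""
    # 1) A free node that already stores a replica of the block.
    for host in block_hosts:
        if host in free_nodes:
            return host, "NODE_LOCAL"
    # 2) A free node on the same rack as one of the replicas.
    block_racks = {rack_of[h] for h in block_hosts if h in rack_of}
    for node in sorted(free_nodes):
        if rack_of.get(node) in block_racks:
            return node, "RACK_LOCAL"
    # 3) Any free node; otherwise the task has to wait for resources.
    if free_nodes:
        return sorted(free_nodes)[0], "ANY"
    return None, "WAIT"


if __name__ == "__main__":
    racks = {"dn1": "r1", "dn2": "r1", "dn3": "r2", "dn4": "r2"}
    print(choose_node(["dn1", "dn3"], {"dn3", "dn4"}, racks))
    print(choose_node(["dn1"], {"dn2", "dn4"}, racks))
```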