Member since: 12-06-2017
Posts: 15
Kudos Received: 3
Solutions: 0
09-26-2018
07:05 AM
Hi @Srikanth, you can try the approach below. I created a sample table in Hive and ran the following query to get the expected result:
create table test1(store INT, items STRING);
insert into table test1 values(22, '1001 abc, 1002 pqr, 1003 tuv');
insert into table test1 values(33, '1004 def, 1005 xyz');
select store, split(item, ' ')[0] as item_id, split(item, ' ')[1] as item_name from test1 lateral view explode(split(items, ', ')) vExplodeTbl as item;
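For reference, on the sample rows above that query should produce one output row per item, along these lines (store, item_id, item_name):
22  1001  abc
22  1002  pqr
22  1003  tuv
33  1004  def
33  1005  xyz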
04-05-2018
10:04 AM
I have a large 5 GB file that has detailed information about employees, and I also have a small 2 MB file that has only employee names. I want to extract the employee names from the smaller file and use those names to do the analysis on the larger file. How can I do this in MapReduce?
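A common pattern for this kind of big-file/small-file combination is a map-side join: ship the 2 MB names file to every mapper through the distributed cache and filter or enrich the 5 GB file in the map phase, with no reduce phase at all. The sketch below is only illustrative; the paths, file name "names.txt", and the tab-delimited layout with the name in the first column are assumptions, not details from this thread.

// Map-side join sketch (hypothetical paths and record layout).
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class EmployeeJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Set<String> names = new HashSet<>();

    @Override
    protected void setup(Context context) throws IOException {
        // "names.txt" is the symlink created by the "#names.txt" fragment in the driver below
        try (BufferedReader reader = new BufferedReader(new FileReader("names.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                names.add(line.trim());
            }
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Assumption: the 5 GB file is tab-delimited with the employee name in the first column
        String[] fields = value.toString().split("\t", -1);
        if (names.contains(fields[0])) {
            context.write(new Text(fields[0]), value);   // keep only matching employees
        }
    }
}

// Driver side (only the relevant lines):
//   Job job = Job.getInstance(conf, "employee map-side join");
//   job.addCacheFile(new URI("/user/hdp/employee_names.txt#names.txt"));
//   job.setNumReduceTasks(0);   // filtering happens entirely in the map phase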
Labels:
- Apache Hadoop
03-06-2018
05:50 PM
@Shu How is the number of mappers/reducers for a given query decided at runtime? Does it depend on how many join, group by, or order by clauses are used in the query? If yes, then please let me know how many mappers and reducers are launched for the below query. select name, count(*) as cnt from test group by name order by name;
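As a rough illustration (a sketch assuming Hive running on MapReduce with default settings, not an answer taken from this thread): the mapper count roughly follows the number of input splits, the reducer count is estimated from the input size, and a global ORDER BY runs in a single reducer, so this query typically compiles into two MR stages.

-- Settings that influence the counts (the values shown are only examples):
set hive.exec.reducers.bytes.per.reducer=268435456;  -- roughly 256 MB of input per reducer
set hive.exec.reducers.max=1009;                     -- upper bound on the estimated reducer count
-- set mapreduce.job.reduces=4;                      -- or force an explicit reducer count

-- EXPLAIN shows the actual plan: stage 1 does the GROUP BY (mappers over the splits, N reducers),
-- stage 2 does the global ORDER BY (a single reducer).
explain select name, count(*) as cnt from test group by name order by name;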
02-22-2018
04:34 AM
@Jordan Moore Thanks for the update! We are also working in the same fashion you described, but I thought that other companies might be following agile/scrum methodologies for Hadoop development. Also, I have one more question: how are stand-up meetings and client interactions handled in big data projects?
02-21-2018
04:38 AM
Hi, I'm trying to implement a Hadoop project and I'm researching how the SDLC workflow applies to a Hadoop project. Thanks,
Labels:
- Apache Hadoop
12-14-2017
12:46 PM
@Jordan Moore Thanks for the suggestion. Can you please let me know how logs from different servers are collected in real-time projects? If you know of any link, please share it.
12-13-2017
12:50 PM
1 Kudo
Thanks for the clarification, @Shu.
12-12-2017
10:50 AM
1 Kudo
Hi, I want to know how Flume is useful for streaming log files in real time. I have practiced importing files through the 'exec' source, but I want to know which sources are used for Flume streaming in real-time projects. Please help me clear up this doubt. Thanks,
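As a rough illustration only (the agent name, file paths, and HDFS locations below are made up, not taken from this thread): real-time setups often replace the exec source with a TAILDIR or spooldir source feeding an HDFS sink through a channel, and logs from many servers are usually fanned in through avro sinks/sources into a collector agent. A minimal single-agent properties file might look like this:

# flume-agent.conf -- illustrative sketch, all names and paths are hypothetical
agent1.sources  = logsrc
agent1.channels = memch
agent1.sinks    = hdfssink

# Tail an application log as new lines arrive (TAILDIR source, Flume 1.7+)
agent1.sources.logsrc.type = TAILDIR
agent1.sources.logsrc.filegroups = f1
agent1.sources.logsrc.filegroups.f1 = /var/log/app/app.log

# Buffer events in memory between source and sink
agent1.channels.memch.type = memory
agent1.channels.memch.capacity = 10000

# Land the events in HDFS, partitioned by day
agent1.sinks.hdfssink.type = hdfs
agent1.sinks.hdfssink.hdfs.path = /flume/applogs/%Y-%m-%d
agent1.sinks.hdfssink.hdfs.fileType = DataStream
agent1.sinks.hdfssink.hdfs.useLocalTimeStamp = true

# Wire the pieces together
agent1.sources.logsrc.channels = memch
agent1.sinks.hdfssink.channel = memch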
Labels:
- Apache Flume
- Apache Hadoop
12-12-2017
10:43 AM
1 Kudo
@Shu I have one doubt: if we change the contents of a file, will this affect the metadata stored on the NameNode? What happens if we keep appending data to the same file on a daily basis? Also, if we append large files, will this reduce performance? Do you recommend appending data to the existing file or creating a new file? Thanks,
12-11-2017
10:31 AM
Hi, I have a text file stored in HDFS and I want to append some rows to it. How can I complete this task? Thanks,
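For illustration (a minimal sketch, assuming a hypothetical file at /data/input/sample.txt and a Hadoop 2.x cluster where append is enabled, which is the default): the FileSystem.append() API adds bytes to the end of an existing file; from the shell, hdfs dfs -appendToFile localrows.txt /data/input/sample.txt does the same thing.

// Minimal HDFS append sketch (the path is hypothetical).
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppend {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml from the classpath
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.append(new Path("/data/input/sample.txt"))) {
            // Write the new rows onto the end of the existing file
            out.write("new row 1\nnew row 2\n".getBytes(StandardCharsets.UTF_8));
        }
    }
}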
Labels:
- Apache Hadoop