Member since: 05-16-2016
Posts: 270
Kudos Received: 18
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1745 | 07-23-2016 11:36 AM |
| | 3108 | 07-23-2016 11:35 AM |
| | 1580 | 06-05-2016 10:41 AM |
| | 1169 | 06-05-2016 10:37 AM |
10-16-2016 03:17 PM
That helps. What about the footer? Yes, headers and footers are static. @grajagopal
10-16-2016 03:00 PM
1 Kudo
I have a CSV file that looks like this:
```
Report Name: XYZ
Report Time: 11/11/1111
Time Zone: (GMT+05:30) i
Last Completed
Last Completed Available Hour:
Report Aggregation: Daily
Report Filter:
Potential Incomplete Data: true
Rows: 1
GregorianDate
AccountId
AccountName
Clicks
Impressions
Ctr
AverageCpc
Spend
10/15/2016
1234556
ABC
©2016 Microsoft Corporation. All rights reserved.
```
I need the header and footer stripped so that only the actual data rows and column names remain in the file. How do I do that in Pig? The file needs to be mapped to a Hive table, so it cannot stay in its current form.
Labels:
- Apache Pig
10-11-2016 03:03 PM
It asks for Oozie and Hive to be installed on master nodes. Why is that? Should they not be installed on edge/client nodes?
10-11-2016 01:46 PM
I read http://stackoverflow.com/questions/8456141/in-a-hadoop-cluster-should-hive-be-installed-on-all-nodes and it says Hive should be installed on client machines. Alright. But in Ambari I install the HCatalog and Hive services, right? Where should these services live? What does it mean to install Hive on a client machine if the service is installed on the cluster too? Or does that mean I do not need to install the Hive service in Ambari at all? From what I understand, to add a client machine, only the /etc/hosts file is edited to declare it as a host, and that's it. Or do I add it as a host in Ambari and also move the Hive service to the client machine? In that case, should Pig be on the client too? What about Oozie and Ranger? Where should those be installed?
10-11-2016 09:16 AM
1 Kudo
I have: 1. Hive, 2. Pig, 3. ZooKeeper, 4. HDFS, 5. Hue, 6. Oozie, 7. Sqoop, 8. YARN, 9. Ranger. Currently all of these are deployed on the same host, and I would now like to add more hosts. But I have a few doubts about production:
1. Does a node mean a physical server, i.e., no VMs?
2. How many servers would I need to add to have a healthy cluster?
3. Which of the services above should be co-located?
4. What should the distribution look like? Pig is used relatively little, but Sqoop, Hive, Oozie, and Hue are used most of the time, and of course Ranger for the authorization part.
Which of these services should be moved to new hosts? Which should be co-located? Which should have a server entirely dedicated to them? I am new to this and would appreciate specifications for setting up a multi-node cluster.
10-10-2016 10:07 AM
My date looks like this: 2016-09-13T06:03:51Z. I need to convert it to dd-mm-YYYY hh:mm:ss format. How do I do it? I tried `from_unixtime(unix_timestamp(created_at, 'dd-MM-yyyy HH-mm-ss'))`, but it did not work and gives null.
Labels:
- Apache Hive
10-04-2016 09:02 AM
1 Kudo
I have a table `old_data` and a table `new_data`. I want to merge these tables so that `old_data` gets updated with `new_data`:
1. Rows in `old_data` stay there.
2. New rows in `new_data` get added to `old_data`.
3. The unique key is `id`, so rows whose `id` appears in `new_data` should update the existing ones in `old_data`.
I think it is possible using a join, so I can do something like `INSERT OVERWRITE old_data SELECT ...`. Example:

Table a:

| id | count |
|---|---|
| 1 | 2 |
| 2 | 19 |
| 3 | 4 |

Table b:

| id | count |
|---|---|
| 2 | 22 |
| 5 | 7 |

I need a SELECT statement that gives me:

| id | count |
|---|---|
| 1 | 2 |
| 2 | 22 |
| 3 | 4 |
| 5 | 7 |
Labels:
- Apache Hive
09-15-2016 12:29 PM
2 Kudos
I am aware of Flume and Kafka, but those are event-driven tools. I don't need event-driven or real-time ingestion; I may just schedule the import once a day. What data ingestion tools are available for importing data from APIs into HDFS? I am not using HBase either, but Hive.
I have used the `R` language for this for quite a while, but I am looking for a more robust, perhaps Hadoop-native, solution.
Labels:
- Apache Hadoop