Member since: 05-16-2016
Posts: 270
Kudos Received: 18
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1745 | 07-23-2016 11:36 AM |
| | 3108 | 07-23-2016 11:35 AM |
| | 1580 | 06-05-2016 10:41 AM |
| | 1169 | 06-05-2016 10:37 AM |
10-16-2016 03:17 PM
That helps. What about the footer? Yes, headers and footers are static. @grajagopal
10-16-2016 03:00 PM
1 Kudo
I have a CSV file that looks like this:
```
Report Name: XYZ
Report Time: 11/11/1111
Time Zone: (GMT+05:30) i
Last Completed
Last Completed Available Hour:
Report Aggregation: Daily
Report Filter:
Potential Incomplete Data: true
Rows: 1
GregorianDate
AccountId
AccountName
Clicks
Impressions
Ctr
AverageCpc
Spend
10/15/2016
1234556
ABC
©2016 Microsoft Corporation. All rights reserved.
```
I need the header and footer stripped so that only the actual data rows and column names remain in the file. How do I do that in Pig? The file needs to be mapped to a Hive table, so it cannot stay in its current form.
Labels:
- Apache Pig
10-11-2016 03:03 PM
It asks for Oozie and Hive to be installed on master nodes. Why is that? Should they not be installed on edge/client nodes?
10-11-2016 01:46 PM
I read http://stackoverflow.com/questions/8456141/in-a-hadoop-cluster-should-hive-be-installed-on-all-nodes and it says Hive should be installed on client machines. Alright. But in Ambari I install the HCatalog and Hive services, right? Where should these services live? What does it mean to install Hive on a client machine if the service is installed on the cluster too? Or does that mean I do not need to install the Hive service in Ambari at all? From what I understand, to add a client machine, only the /etc/hosts file is edited to declare it as a host, and that's it. Or do I add it as a host in Ambari and also move the Hive service to the client machine? In that case, should Pig be on the client too? What about Oozie and Ranger? Where should those be installed?
10-11-2016 09:16 AM
1 Kudo
I have: 1. Hive, 2. Pig, 3. ZooKeeper, 4. HDFS, 5. Hue, 6. Oozie, 7. Sqoop, 8. YARN, 9. Ranger. Currently all of these are deployed on the same host, and I would now like to add more hosts. But I have a few doubts about production:
1. Does a node mean a physical server, i.e., no VMs?
2. How many servers would I need to add to have a healthy cluster?
3. Which of the services above should be co-located?
4. What should the distribution look like? Pig is used relatively little, but Sqoop, Hive, Oozie, and Hue are used most of the time, and of course Ranger for the authorization part.
Which of these services should be moved to new hosts? Which should be co-located? Which should have a server entirely dedicated to them? I am new to this and would appreciate specifications for setting up a multi-node cluster.
10-10-2016 10:07 AM
My date looks like this: 2016-09-13T06:03:51Z. I need to convert it to dd-mm-YYYY hh:mm:ss format. How do I do it? I tried `from_unixtime(unix_timestamp(created_at, 'dd-MM-yyyy HH-mm-ss'))`, but it did not work and gives null.
Labels:
- Apache Hive
10-04-2016 09:02 AM
1 Kudo
I have a table `old_data` and a table `new_data`. I want to merge these tables so that `old_data` gets updated with `new_data`:
1. Rows in `old_data` stay there.
2. New rows in `new_data` get added to `old_data`.
3. The unique key is `id`, so rows whose `id` appears in `new_data` should update the existing ones in `old_data`.
I think it is possible using a join, so I can do something like `INSERT OVERWRITE old_data SELECT ...`. Example:

Table a:

| id | count |
|---|---|
| 1 | 2 |
| 2 | 19 |
| 3 | 4 |

Table b:

| id | count |
|---|---|
| 2 | 22 |
| 5 | 7 |

I need a SELECT statement that gives me:

| id | count |
|---|---|
| 1 | 2 |
| 2 | 22 |
| 3 | 4 |
| 5 | 7 |
Labels:
- Apache Hive
09-15-2016 12:29 PM
2 Kudos
I am aware of Flume and Kafka, but those are event-driven tools. I don't need event-driven or real-time ingestion; I may just schedule the import once a day. What data ingestion tools are available for importing data from APIs into HDFS? I am not using HBase either, but Hive.
I have used the `R` language for this for quite a while, but I am looking for a more robust, perhaps Hadoop-native, solution.
Labels:
- Apache Hadoop