Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 13344 | 02-20-2018 12:33 PM |
|  | 1501 | 02-19-2018 05:12 AM |
|  | 1859 | 12-28-2017 06:13 AM |
|  | 7136 | 09-28-2017 09:25 AM |
|  | 12164 | 09-25-2017 11:19 AM |
03-11-2017
09:30 AM
1 Kudo
I have created a table: create table data (id int, name string, address struct<city:string, state:string>) row format delimited fields terminated by ',' collection items terminated by ',' stored as textfile; Then I insert a record with: insert into table data select 1, 'Bala', named_struct('city', 'Tampa', 'state', 'FL') from anothertable limit 1; Is this the correct way to insert a record into a Hive table? When I do this, address.state is inserted as null.
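For readability, here is the same DDL and insert laid out as a minimal sketch; the table, column, and source-table names are taken from the question, and the identical ',' field and collection delimiters are kept exactly as posted:

```sql
-- Minimal sketch of the statements described in the question.
-- Note that the field delimiter and the collection-items delimiter are both
-- ',' here, exactly as in the original post.
CREATE TABLE data (
  id INT,
  name STRING,
  address STRUCT<city:STRING, state:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;

INSERT INTO TABLE data
SELECT 1, 'Bala', named_struct('city', 'Tampa', 'state', 'FL')
FROM anothertable
LIMIT 1;
```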
Labels:
- Apache Hadoop
03-08-2017
08:50 AM
I have a table with 3 columns (customer_id, amount, quantity) which has 1 billion records. My query is: select customer_id, sum(amount) from my_table group by customer_id. I want to understand how the mappers and reducers pick up the data.
I assume each mapper emits the key-value pairs for the rows in its input split, the shuffle then groups all records with the same key together, and the reducers perform the sum and deliver the result. Please correct me if I'm wrong here. Also, when a reducer performs the sum, does it work only on the key-value pairs that were fed to it from the map phase? And if a single key has a huge volume of values that one reducer can't handle, can that work be split across multiple reducers?
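For reference, the way Hive maps this aggregation onto map, shuffle, and reduce stages can be inspected with EXPLAIN (a minimal sketch; my_table and the column names are taken from the question):

```sql
-- Minimal sketch: inspect the plan Hive builds for this GROUP BY.
-- With default settings the plan usually shows a map-side partial aggregation,
-- a shuffle keyed on customer_id, and a final reduce-side aggregation.
EXPLAIN
SELECT customer_id, SUM(amount)
FROM my_table
GROUP BY customer_id;
```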
Labels:
- Apache Hadoop
- Apache Hive
03-06-2017
11:41 AM
Thanks Predrag Minovic. One more question: is there a way to check which queries were executed, through Ambari or an admin console?
03-04-2017
01:38 PM
I'm new to Pig scripting, so pardon me if my question is lame. I know that we can define a data type for each field in Pig while loading it from a file. But is there a way to define the data type after taking a subset of it? Example: data = LOAD 'mydata.csv' USING PigStorage(',') AS (col1:int, col2:int); subsetdata = FOREACH data GENERATE col1; Here I need to define col1 as int. Is there a way to do that?
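A minimal Pig Latin sketch of the scenario described, assuming mydata.csv holds two comma-separated integer columns; the explicit (int) cast in the FOREACH is one way to (re)declare the type on the projected field:

```pig
-- Minimal sketch, assuming mydata.csv has two comma-separated integer columns.
data = LOAD 'mydata.csv' USING PigStorage(',') AS (col1:int, col2:int);

-- The schema declared at LOAD time carries through the projection, and an
-- explicit cast can also (re)declare the type on the generated field.
subsetdata = FOREACH data GENERATE (int) col1 AS col1;

DESCRIBE subsetdata;  -- expected to report col1: int
```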
Labels:
- Apache Hadoop
- Apache Pig
03-03-2017
11:27 AM
Thanks Venkata Naga Balarama Murthy Pelluri. That gave me the clear understanding of the functionality I was hoping for. I still have a few other questions: if my file is one TB in size and a single key has a very large number of values, will one reducer be loaded heavily in that situation? Is there a way that this work can be split across reducers?
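For context, when the aggregation runs through Hive there is a setting aimed at exactly this kind of key skew (a minimal sketch; my_table and the columns are placeholders, and whether this helps depends on the query and data):

```sql
-- Minimal sketch: with this Hive setting, a skewed GROUP BY is planned as two
-- jobs. The first spreads values of a hot key across reducers for partial
-- aggregation; the second job produces the final per-key result.
SET hive.groupby.skewindata=true;

SELECT customer_id, SUM(amount)
FROM my_table
GROUP BY customer_id;
```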
03-03-2017
09:21 AM
On what basis is a key-value pair generated? How is the number of mappers and the associated reducers decided?
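For reference, a few of the knobs that commonly influence these numbers when the job runs through Hive on MapReduce (a minimal sketch; the values are only illustrative):

```sql
-- Minimal sketch: settings that commonly drive mapper and reducer counts
-- for Hive on MapReduce. The values shown are only illustrative.
SET mapreduce.input.fileinputformat.split.maxsize=256000000;  -- caps the input split size, which drives the mapper count
SET hive.exec.reducers.bytes.per.reducer=256000000;           -- data volume Hive targets per reducer
SET mapreduce.job.reduces=10;                                  -- or pin the reducer count explicitly
```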
Labels:
- Apache Hadoop
- Apache Hive
03-03-2017
09:06 AM
Is there any way to find out how long a Hive query takes to complete?
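For reference, the Hive CLI and Beeline print the elapsed time after each statement, and with Tez as the execution engine a summary can be printed as well (a minimal sketch; my_table is a placeholder):

```sql
-- Minimal sketch: the Hive CLI and Beeline report "Time taken" after each
-- statement. With Tez as the execution engine, the flag below also prints an
-- execution summary when the query finishes.
SET hive.tez.exec.print.summary=true;
SELECT COUNT(*) FROM my_table;  -- my_table is a placeholder
```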
Labels:
- Apache Hadoop
- Apache Hive
02-14-2017
09:27 PM
Thanks @Binu Mathew. I would like to know one additional thing about point 5. I understand that ORC will read only the specified columns if we name them in the select clause. But how does a text file work? If I specify column names in the select clause, will it read only the specified columns, or will it read the entire record and display only the selected columns?
02-14-2017
09:25 AM
2 Kudos
What are the differences between MR mode and Tez? Why does Tez run faster than MR? Why do a few queries fail in Tez but execute successfully in MR mode? I know the questions I ask have multiple answers, but I just wanted to know all the possible scenarios. Thanks.
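For reference, the execution engine can be switched per session with a single Hive setting, which makes it straightforward to run the same query under both engines when comparing plans and behaviour (a minimal sketch; my_table is a placeholder):

```sql
-- Minimal sketch: run the same statement under both engines and compare the
-- plans, runtimes, and any failures.
SET hive.execution.engine=mr;
EXPLAIN SELECT COUNT(*) FROM my_table;

SET hive.execution.engine=tez;
EXPLAIN SELECT COUNT(*) FROM my_table;
```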
Labels:
- Apache Hadoop
- Apache Tez