Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 13344 | 02-20-2018 12:33 PM |
|  | 1501 | 02-19-2018 05:12 AM |
|  | 1859 | 12-28-2017 06:13 AM |
|  | 7136 | 09-28-2017 09:25 AM |
|  | 12164 | 09-25-2017 11:19 AM |
03-11-2017
09:30 AM
1 Kudo
I have created a table: create table data (id int, name string, address struct<city:string, state:string>) row format delimited fields terminated by ',' collection items terminated by ',' stored as textfile; Then I insert a record with: insert into table data select 1, 'Bala', named_struct('city', 'Tampa', 'state', 'FL') from anothertable limit 1; Is this the correct way to insert a record into a Hive table? When I do this, address.state is inserted as null.
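For readability, here is the same DDL and insert laid out as a minimal sketch; the table, column, and source-table names are taken from the question, and the identical ',' field and collection delimiters are kept exactly as posted:

```sql
-- Minimal sketch of the statements described in the question.
-- Note that the field delimiter and the collection-items delimiter are both
-- ',' here, exactly as in the original post.
CREATE TABLE data (
  id INT,
  name STRING,
  address STRUCT<city:STRING, state:STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;

INSERT INTO TABLE data
SELECT 1, 'Bala', named_struct('city', 'Tampa', 'state', 'FL')
FROM anothertable
LIMIT 1;
```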
Labels:
- Apache Hadoop
03-08-2017
08:50 AM
I have a table with 3 columns (customer_id, amount, quantity) which has 1 billion records. My query is: select customer_id, sum(amount) from my_table group by customer_id. I want to understand how the mappers and reducers pick up the data.
I assume each mapper emits the key-value pairs for the rows in its input split, the shuffle then groups all records with the same key together, and the reducers perform the sum and deliver the result. Please correct me if I'm wrong here. Also, when a reducer performs the sum, does it work only on the key-value pairs that were fed to it from the map phase? And if a single key has a huge volume of values that one reducer can't handle, can that work be split across multiple reducers?
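For reference, the way Hive maps this aggregation onto map, shuffle, and reduce stages can be inspected with EXPLAIN (a minimal sketch; my_table and the column names are taken from the question):

```sql
-- Minimal sketch: inspect the plan Hive builds for this GROUP BY.
-- With default settings the plan usually shows a map-side partial aggregation,
-- a shuffle keyed on customer_id, and a final reduce-side aggregation.
EXPLAIN
SELECT customer_id, SUM(amount)
FROM my_table
GROUP BY customer_id;
```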
Labels:
- Apache Hadoop
- Apache Hive
03-06-2017
11:41 AM
Thanks Predrag Minovic. One more question: is there a way to check which queries were executed, through Ambari or an admin console?
03-04-2017
01:38 PM
I'm new to Pig scripting, so pardon me if my question is lame. I know that we can define a data type for each field in Pig while loading it from a file. But is there a way to define the data type after taking a subset of it? Example: data = LOAD 'mydata.csv' USING PigStorage(',') AS (col1:int, col2:int); subsetdata = FOREACH data GENERATE col1; Here I need to define col1 as int. Is there a way to do that?
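A minimal Pig Latin sketch of the scenario described, assuming mydata.csv holds two comma-separated integer columns; the explicit (int) cast in the FOREACH is one way to (re)declare the type on the projected field:

```pig
-- Minimal sketch, assuming mydata.csv has two comma-separated integer columns.
data = LOAD 'mydata.csv' USING PigStorage(',') AS (col1:int, col2:int);

-- The schema declared at LOAD time carries through the projection, and an
-- explicit cast can also (re)declare the type on the generated field.
subsetdata = FOREACH data GENERATE (int) col1 AS col1;

DESCRIBE subsetdata;  -- expected to report col1: int
```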
Labels:
- Apache Hadoop
- Apache Pig
03-03-2017
11:27 AM
Thanks Venkata Naga Balarama Murthy Pelluri. That gave me the clear understanding of the functionality I was hoping for. I still have a few other questions: if my file is one TB in size and a single key has a very large number of values, will one reducer be loaded heavily in that situation? Is there a way that this work can be split across reducers?
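For context, when the aggregation runs through Hive there is a setting aimed at exactly this kind of key skew (a minimal sketch; my_table and the columns are placeholders, and whether this helps depends on the query and data):

```sql
-- Minimal sketch: with this Hive setting, a skewed GROUP BY is planned as two
-- jobs. The first spreads values of a hot key across reducers for partial
-- aggregation; the second job produces the final per-key result.
SET hive.groupby.skewindata=true;

SELECT customer_id, SUM(amount)
FROM my_table
GROUP BY customer_id;
```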
03-03-2017
09:21 AM
On what basis is a key-value pair generated? How is the number of mappers and the associated reducers decided?
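For reference, a few of the knobs that commonly influence these numbers when the job runs through Hive on MapReduce (a minimal sketch; the values are only illustrative):

```sql
-- Minimal sketch: settings that commonly drive mapper and reducer counts
-- for Hive on MapReduce. The values shown are only illustrative.
SET mapreduce.input.fileinputformat.split.maxsize=256000000;  -- caps the input split size, which drives the mapper count
SET hive.exec.reducers.bytes.per.reducer=256000000;           -- data volume Hive targets per reducer
SET mapreduce.job.reduces=10;                                  -- or pin the reducer count explicitly
```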
Labels:
- Apache Hadoop
- Apache Hive
03-03-2017
09:06 AM
Is there any way to find out how long a Hive query takes to complete?
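For reference, the Hive CLI and Beeline print the elapsed time after each statement, and with Tez as the execution engine a summary can be printed as well (a minimal sketch; my_table is a placeholder):

```sql
-- Minimal sketch: the Hive CLI and Beeline report "Time taken" after each
-- statement. With Tez as the execution engine, the flag below also prints an
-- execution summary when the query finishes.
SET hive.tez.exec.print.summary=true;
SELECT COUNT(*) FROM my_table;  -- my_table is a placeholder
```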
Labels:
- Apache Hadoop
- Apache Hive
02-14-2017
09:27 PM
Thanks @Binu Mathew. I would like to know one additional thing about point 5. I understand that ORC will read only the specified columns if we name them in the select clause. But how does a text file work? If I specify column names in the select clause, will it read only the specified columns, or will it read the entire record and display only the selected columns?
02-14-2017
09:25 AM
2 Kudos
What are the differences between MR mode and Tez? Why does Tez run faster than MR? Why do a few queries fail in Tez but execute successfully in MR mode? I know the questions I ask have multiple answers, but I just wanted to know all the possible scenarios. Thanks.
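For reference, the execution engine can be switched per session with a single Hive setting, which makes it straightforward to run the same query under both engines when comparing plans and behaviour (a minimal sketch; my_table is a placeholder):

```sql
-- Minimal sketch: run the same statement under both engines and compare the
-- plans, runtimes, and any failures.
SET hive.execution.engine=mr;
EXPLAIN SELECT COUNT(*) FROM my_table;

SET hive.execution.engine=tez;
EXPLAIN SELECT COUNT(*) FROM my_table;
```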
Labels:
- Apache Hadoop
- Apache Tez