Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13482 | 02-20-2018 12:33 PM
 | 1531 | 02-19-2018 05:12 AM
 | 1888 | 12-28-2017 06:13 AM
 | 7187 | 09-28-2017 09:25 AM
 | 12229 | 09-25-2017 11:19 AM
06-08-2017
07:43 AM
Thanks @Jay SenSharma. Will the worker node also be available as a dedicated node, like the edge node? Also, when a job is executed, will all the intermediate staging data be stored on the worker node? And how does the worker node access data from the data node? Forgive me if these are basic questions; I'm trying to understand worker nodes.
06-08-2017
06:57 AM
1 Kudo
What are worker nodes and edge nodes? Why do we use these nodes, and what is their role? What role do they play when a job is executed?
Labels:
- Apache Hadoop
05-23-2017
05:39 PM
@Prabhat Ratnala Ideally it should work, but what is the error message? I suspect the values substituted into the parameters are not being assigned properly. Try running the script with set -x so that it is easier to debug. If it still throws an error after that, paste the error message here; it would help in solving your issue.
05-17-2017
10:34 AM
Hi @saravanan p, one way of doing it is to modify the job to run with a single reducer so that the output is a single file. Use this property to set the reducer count to one: set mapred.reduce.tasks=1; By default, the number of files written into a Hive table depends on the input file size and the number of map and reduce tasks. The split size, and therefore the number of map tasks, is max(mapreduce.input.fileinputformat.split.minsize, min(mapreduce.input.fileinputformat.split.maxsize, dfs.block.size)). If you have reducers running, you should also look at hive.exec.max.created.files, mapred.reduce.tasks, and hive.exec.reducers.bytes.per.reducer. Hope it helps!
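A minimal sketch of that single-reducer approach, assuming hypothetical source and target tables src_tbl and tgt_tbl (placeholder names, not from the original thread):

```sql
-- Force a single reducer so the insert writes one output file.
SET mapred.reduce.tasks=1;

-- src_tbl and tgt_tbl are placeholder names; the GROUP BY forces a
-- reduce phase (a purely map-side insert would still write one file
-- per mapper, regardless of the reducer setting).
INSERT OVERWRITE TABLE tgt_tbl
SELECT col1, COUNT(*)
FROM src_tbl
GROUP BY col1;
```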
05-15-2017
05:01 PM
Hi @Andres Urrego, are you able to see any other files starting with part-m-* in that folder? Those are the text files you imported from MySQL.
05-14-2017
01:43 PM
Hi @Satish S, in the output I can see "(,fname,mname,lname,age,gender,address,city,state,)"; I believe this is the header row of the file. The reason e_id and zip are not present in the output is that you declared e_id and zip as int, which will not accept character data; that is why they are not displayed. PigStorage does not know whether the first row is a header, so if you do not handle it explicitly, the header row is treated as data rather than as a file header. Hope it helps.
05-12-2017
04:18 PM
Thanks a ton @mqureshi.
05-12-2017
03:44 PM
@Mohit Varshney I understand that when I ingest data through Sqoop, the number of mappers decides the number of files written into the Hive table. I just wanted to understand the same behavior when loading data between Hive tables. Thanks
05-12-2017
03:42 PM
Thanks @mqureshi. Just wanted to confirm that my understanding is correct. Consider an input file of 300 MB, dfs block size = 256 MB, mapreduce.input.fileinputformat.split.maxsize = 256 MB, and mapreduce.input.fileinputformat.split.minsize = 128 MB. If a Hive command is triggered to load from one Hive table to another, then 2 map tasks will be launched, and after loading the table I should see 2 files, since only 2 map tasks ran.

Adding one more question on top of that: what would be the maximum size of a file stored in a Hive table? I believe it should be equal to the size handled by each MapReduce task; please correct me if I'm wrong. I have loaded the Hive table, and the number of files underneath the table is 6; of those 6, one file is 10 MB. I then loaded one more set of files into the same table, which created 2 more files, one of which is 20 MB. So now there are 2 small files (10 MB and 20 MB), each stored in its own block and wasting almost 100 MB of block space. Is there a way they can be clubbed together and stored in one block? Thanks in advance for helping me understand this.
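A quick worked version of that split-size math with the numbers from the question (the SET lines are only for illustration; values are in bytes):

```sql
-- split size = max(minsize, min(maxsize, dfs.block.size))
--            = max(128 MB, min(256 MB, 256 MB)) = 256 MB
-- A 300 MB input therefore yields 2 splits -> 2 map tasks,
-- so a map-only load leaves 2 output files.
SET mapreduce.input.fileinputformat.split.maxsize=268435456;  -- 256 MB
SET mapreduce.input.fileinputformat.split.minsize=134217728;  -- 128 MB
```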
05-12-2017
08:35 AM
Alternatively, analyze table tbl_name compute statistics; will give you the number of files, the number of records, and the table size. Hope it helps.
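As a small illustration (tbl_name is a placeholder; the collected stats show up under Table Parameters of the formatted description):

```sql
-- Gather table-level statistics for tbl_name (placeholder name).
ANALYZE TABLE tbl_name COMPUTE STATISTICS;

-- numFiles, numRows, totalSize, and rawDataSize then appear in the
-- Table Parameters section of the output.
DESCRIBE FORMATTED tbl_name;
```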