Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 13482 | 02-20-2018 12:33 PM
 | 1531 | 02-19-2018 05:12 AM
 | 1888 | 12-28-2017 06:13 AM
 | 7187 | 09-28-2017 09:25 AM
 | 12229 | 09-25-2017 11:19 AM
06-08-2017
07:43 AM
Thanks @Jay SenSharma. Will the worker node also be available as a dedicated node, like the edge node? Also, when a job is executed, will all the intermediate staging data be stored on the worker node? And how does the worker node access data from the data node? Forgive me if these are basic questions; I'm trying to understand worker nodes.
06-08-2017
06:57 AM
1 Kudo
What are worker nodes and edge nodes? Why do we use these nodes, and what is their role? What role do they play when a job is executed?
Labels:
- Apache Hadoop
05-23-2017
05:39 PM
@Prabhat Ratnala Ideally it should work, but what is the error message? I suspect the values substituted into the parameters are not being assigned properly. Try running the script with set -x so that it is easier to debug. If it still throws an error after that, paste the error message here; it would help in solving your issue.
05-17-2017
10:34 AM
Hi @saravanan p, one way of doing it is to modify the job to run with a single reducer so that the output is a single file. Use this property to set the reducer count to one: set mapred.reduce.tasks=1; By default, the number of files written into a Hive table depends on the input file size and the number of map and reduce tasks. The split size, and therefore the number of map tasks, is max(mapreduce.input.fileinputformat.split.minsize, min(mapreduce.input.fileinputformat.split.maxsize, dfs.block.size)). If you have reducers running, you should also look at hive.exec.max.created.files, mapred.reduce.tasks, and hive.exec.reducers.bytes.per.reducer. Hope it helps!
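A minimal sketch of that single-reducer approach, assuming hypothetical source and target tables src_tbl and tgt_tbl (placeholder names, not from the original thread):

```sql
-- Force a single reducer so the insert writes one output file.
SET mapred.reduce.tasks=1;

-- src_tbl and tgt_tbl are placeholder names; the GROUP BY forces a
-- reduce phase (a purely map-side insert would still write one file
-- per mapper, regardless of the reducer setting).
INSERT OVERWRITE TABLE tgt_tbl
SELECT col1, COUNT(*)
FROM src_tbl
GROUP BY col1;
```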
05-15-2017
05:01 PM
Hi @Andres Urrego, are you able to see any other files starting with part-m-* in that folder? Those are the text files you imported from MySQL.
05-14-2017
01:43 PM
Hi @Satish S, in the output I can see "(,fname,mname,lname,age,gender,address,city,state,)"; I believe this is the header row of the file. The reason e_id and zip are not present in the output is that you declared e_id and zip as int, which will not accept character data; that is why they are not displayed. PigStorage does not know whether the first row is a header, so if you do not handle it explicitly, the header row is treated as data rather than as a file header. Hope it helps.
05-12-2017
04:18 PM
Thanks a ton @mqureshi.
05-12-2017
03:44 PM
@Mohit Varshney I understand that when I ingest data through Sqoop, the number of mappers decides the number of files written into the Hive table. I just wanted to understand the same behavior when loading data between Hive tables. Thanks
05-12-2017
03:42 PM
Thanks @mqureshi. Just wanted to confirm that my understanding is correct. Consider an input file of 300 MB, dfs block size = 256 MB, mapreduce.input.fileinputformat.split.maxsize = 256 MB, and mapreduce.input.fileinputformat.split.minsize = 128 MB. If a Hive command is triggered to load from one Hive table to another, then 2 map tasks will be launched, and after loading the table I should see 2 files, since only 2 map tasks ran.

Adding one more question on top of that: what would be the maximum size of a file stored in a Hive table? I believe it should be equal to the size handled by each MapReduce task; please correct me if I'm wrong. I have loaded the Hive table, and the number of files underneath the table is 6; of those 6, one file is 10 MB. I then loaded one more set of files into the same table, which created 2 more files, one of which is 20 MB. So now there are 2 small files (10 MB and 20 MB), each stored in its own block and wasting almost 100 MB of block space. Is there a way they can be clubbed together and stored in one block? Thanks in advance for helping me understand this.
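A quick worked version of that split-size math with the numbers from the question (the SET lines are only for illustration; values are in bytes):

```sql
-- split size = max(minsize, min(maxsize, dfs.block.size))
--            = max(128 MB, min(256 MB, 256 MB)) = 256 MB
-- A 300 MB input therefore yields 2 splits -> 2 map tasks,
-- so a map-only load leaves 2 output files.
SET mapreduce.input.fileinputformat.split.maxsize=268435456;  -- 256 MB
SET mapreduce.input.fileinputformat.split.minsize=134217728;  -- 128 MB
```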
05-12-2017
08:35 AM
Alternatively, analyze table tbl_name compute statistics; will give you the number of files, the number of records, and the table size. Hope it helps.
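As a small illustration (tbl_name is a placeholder; the collected stats show up under Table Parameters of the formatted description):

```sql
-- Gather table-level statistics for tbl_name (placeholder name).
ANALYZE TABLE tbl_name COMPUTE STATISTICS;

-- numFiles, numRows, totalSize, and rawDataSize then appear in the
-- Table Parameters section of the output.
DESCRIBE FORMATTED tbl_name;
```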