Member since: 08-24-2017
Posts: 24
Kudos Received: 2
Solutions: 0
02-17-2020
07:28 AM
1 Kudo
Hi,
I just want to know whether we can integrate Hive tables with Delta Lake.
If yes, then how?
Do Delta tables support all the features of Hive?
Does Cloudera support Delta Lake?
Regards,
Satya
... View more
Tags: delta lake, Hive
02-10-2020
07:29 AM
Hi,
Can anyone advise whether implementing Hadoop across two different data centers on the same network will impact performance?
We are distributing the master nodes and data nodes across two data centers to reduce downtime.
Since both data centers are on the same network, will this affect performance or not?
Satya
02-10-2020
03:33 AM
Hi Team,
We are using HDP 3.1.0 and Spark 2.
Is there any way to identify, on the cluster side, whether a particular Spark job is using Datasets or DataFrames?
Regards,
Satya
11-23-2018
06:48 AM
Hi, We are trying to install R in our production cluster, which has no internet access. R is working locally: we installed it on one of our nodes using Anaconda and it works fine, but we don't know how to install SparkR, and we could not find a package for it. We downloaded some basic R packages from https://repo.continuum.io/pkgs/r/linux-64/. Could you please point me to an installation guide for SparkR on an offline cluster, and explain how to integrate it with Zeppelin? Regards, Satya
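A minimal offline sketch, assuming SparkR is bundled with an HDP-style Spark 2 client install (the paths here, including the Anaconda location, are illustrative and need adjusting for your layout):

export SPARK_HOME=/usr/hdp/current/spark2-client
# Check whether the SparkR R package already ships with the Spark client
ls $SPARK_HOME/R/lib/SparkR
# Put the offline Anaconda R on the PATH so Spark can find R and Rscript
export PATH=/opt/anaconda2/bin:$PATH
# Start an interactive SparkR shell on YARN to verify the setup
$SPARK_HOME/bin/sparkR --master yarn

If that works, Zeppelin's Spark interpreter should be able to use the same SPARK_HOME; the R executable just has to be visible on the Zeppelin host and on the NodeManager hosts.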
03-02-2018
04:16 AM
@Pranay Vyas So if we have 400 GB of YARN memory but only 10 CPU cores, can we run 400 containers simultaneously in the cluster with yarn.scheduler.minimum-allocation-mb = 1 GB?
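As a rough worked example with the numbers above (which limit wins depends on the resource calculator the scheduler is configured with):
- Memory-only DefaultResourceCalculator: 400 GB / 1 GB minimum allocation = up to 400 containers.
- DominantResourceCalculator (memory and vcores): min(400 GB / 1 GB, 10 vcores / 1 vcore per container) = 10 containers.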
02-17-2018
09:38 AM
Can we run more containers than the number of available CPU cores? If yes, how can we achieve it? As per my understanding, the maximum number of containers is bounded by the values configured in YARN, i.e. yarn.scheduler.minimum-allocation and yarn.scheduler.maximum-allocation. But suppose my cluster has 100 CPU cores (vcores) in total and I want to run 500 containers simultaneously. How can I achieve that?
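One common way, as a sketch (assuming the Capacity Scheduler and no CPU cgroup enforcement; the values are illustrative):

# yarn-site.xml: advertise more vcores per NodeManager than the node physically has
yarn.nodemanager.resource.cpu-vcores = 50
# capacity-scheduler.xml: schedule on memory only, so vcores never become the bottleneck
yarn.scheduler.capacity.resource-calculator = org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator

With the memory-only calculator the container count is bounded by total memory / minimum-allocation-mb, so 500 containers can run on 100 physical cores; CPU-heavy containers will simply time-slice.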
11-28-2017
05:43 AM
@aengineer Hi, Thanks for your response, but I don't have a GUI (Ambari). However, this is an HA cluster, so please let me know the proper steps. Can we do it as a rolling restart, logging in to each data node one by one? Regards, Satya Gaurav
11-27-2017
08:15 AM
Hi Team, I got the below error in the GC log:
Full GC (Allocation Failure)
CMS-concurrent-mark-start
CMS-concurrent-abortable-preclean-start
Current heap utilization is below.
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 12884901888 (12288.0MB)
NewSize = 1610612736 (1536.0MB)
MaxNewSize = 1610612736 (1536.0MB)
OldSize = 11274289152 (10752.0MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 1449590784 (1382.4375MB)
used = 1449590784 (1382.4375MB)
free = 0 (0.0MB)
100.0% used
Eden Space: (new object space)
capacity = 1288568832 (1228.875MB)
used = 1288568832 (1228.875MB)
free = 0 (0.0MB)
100.0% used
From Space:
capacity = 161021952 (153.5625MB)
used = 161021952 (153.5625MB)
free = 0 (0.0MB)
100.0% used
To Space:
capacity = 161021952 (153.5625MB)
used = 0 (0.0MB)
free = 161021952 (153.5625MB)
0.0% used
concurrent mark-sweep generation:
capacity = 11274289152 (10752.0MB)
used = 11274289152 (10752.0MB)
free = 0 (0.0MB)
100.0% used

So my questions are:
1. We need to increase the heap size for the data nodes from the command line (there is no GUI); our cluster is running Hadoop version hadoop-common 2.4.0.2.1.2.0-402.
2. If we change the parameter in hadoop-env.sh on the name node, will it propagate to all nodes, or do we have to change it manually on every data node?
3. Do we need downtime, or can we change it without stopping any services, just by running sh hadoop-env.sh?
4. If we need to stop services, please let us know which services have to be stopped.
Kindly give me the proper steps to do this from the command line, and also how to verify it afterwards. Regards, Satya Gaurav
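A command-line sketch, assuming hadoop-env.sh lives under /etc/hadoop/conf and the same edit is pushed to every data node (the 16 GB value and the script path are illustrative):

# In /etc/hadoop/conf/hadoop-env.sh on each data node; the file is read only at daemon start-up,
# so running "sh hadoop-env.sh" by itself changes nothing
export HADOOP_DATANODE_OPTS="-Xms16384m -Xmx16384m ${HADOOP_DATANODE_OPTS}"

# Then restart the DataNodes one at a time (rolling), as the hdfs user
/usr/lib/hadoop/sbin/hadoop-daemon.sh stop datanode
/usr/lib/hadoop/sbin/hadoop-daemon.sh start datanode

# Verify the new heap on the restarted process
jmap -heap $(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode)

Editing hadoop-env.sh on the name node does not propagate anywhere by itself; the change has to reach every data node, either manually or via whatever config-distribution mechanism you use.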
08-24-2017
06:15 AM
Hi Team, We are enabling Kerberos for the Lily HBase indexer in Cloudera 5.9, but we are confused about one of the steps, the jaas.conf file: do we need to change the parameters in this file or not? The hbase.keytab has already been generated by the GUI, and Cloudera uses the keytabs under /var/run/cloudera-scm-agent/process, so do we need to give the keytab file location or not? If yes, how do we edit this file, and on which server (the HBase Master)? Can we edit it through the GUI instead of the command line? If so, please give the steps.

Current configuration:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=false        <-- after enabling Kerberos, should we give the path of the keytab, or should it work with the ticket cache?
useTicketCache=true;
};

As per the Cloudera doc:
Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
useTicketCache=false
keyTab="/etc/hbase/conf/hbase.keytab"
principal="hbase/fully.qualified.domain.name@<YOUR-REALM>";
}
03-28-2017
02:24 PM
@Benjamin Leonhardi Why is sorting written before shuffling? I think sorting always happens after the shuffle. There is already a combiner to combine (sort) the output on a single node; I think once all the intermediate data has been collected by the shuffle, sorting is used to merge it into one single input, which is then consumed by the reducer.
03-02-2017
03:46 PM
Hi, I want to know whether Hive has any option for incremental backup, similar to MySQL replication, so that we don't have to repeat the manual steps (creating a base table and an incremental table and then reconciling them) again and again.
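On the ingest side, a saved Sqoop job is one way to avoid repeating the manual steps; a minimal sketch (the connection string, table, directory and column names are all illustrative):

# Create a saved job; it remembers its own --last-value between runs
sqoop job --create emp_incr -- import \
  --connect jdbc:mysql://dbhost/sales --table emp \
  --target-dir /user/hive/warehouse/emp_stage \
  --incremental lastmodified --check-column modified_date --merge-key id

# Each execution pulls only rows changed since the previous run and merges them into the target directory
sqoop job --exec emp_incr

An external Hive table defined over the target directory should then always see the merged result.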
02-28-2017
06:35 AM
@Artem Ervits Hi Artem, Thanks for your reply. I did the same thing and I was able to get the data back. The most surprising thing: I created 2-3 tables, even with different schemas, and they all show the same data that was in the old table; for the extra columns they show NULL. So every time we create an external table, should we give it a different directory path?
02-27-2017
01:43 PM
Suppose I have dropped an external table (EMP) that was stored at /user/hive/satya/. As we know, when we drop an external table the metadata is deleted but the actual data remains. So my question is: how can we restore the external table (EMP) and get the data back? Could anyone give me the steps needed to recover it?
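A minimal sketch, assuming you still know (or can recover) the original DDL; the column list and row format here are illustrative:

# Re-create the external table over the surviving directory; the existing files are picked up as-is
hive -e "
CREATE EXTERNAL TABLE EMP (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hive/satya/';
"

# Only needed if the table was partitioned: re-register the partitions
hive -e "MSCK REPAIR TABLE EMP;"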
Tags: Data Processing, Hive
02-22-2017
01:47 PM
Suppose we want to run 1000 map tasks. Do we need 1000 containers, or can the map tasks run with fewer than 1000 containers?
02-22-2017
01:45 PM
1. How do multiple reducers write their output? Can multiple reducers write to a single output file, or do we have to write an intermediate reducer to do so? I just want to know how we can get a single output file from multiple reducers.
2. Can a map task and a reduce task run in the same container, or can more than one map or reduce task run in the same container? If yes, how? How are containers assigned for map and reduce tasks?
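For question 1, each reducer always writes its own part-r-NNNNN file; a common way to end up with one file is to merge them after the job (paths are illustrative):

# Merge all part files from the job's output directory into a single local file
hadoop fs -getmerge /user/satya/output merged.txt
# Optionally push the merged file back into HDFS
hadoop fs -put merged.txt /user/satya/output-merged/

The alternative is to configure the job with a single reducer, at the cost of losing parallelism in the reduce phase.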
02-21-2017
07:11 AM
@mqureshi, Thanks a lot for your explanation. I am a little bit confused about the logic for the number of map tasks and reduce tasks, and about resource management in Hadoop. As you wrote, the number of reducers can be set by mapreduce.job.reduces; but if we ask for a larger number of reducers, the job will still run, and the ResourceManager will check resource availability; if the resources are available, the job will run with the requested number of reducers. Am I correct? So is this configuration parameter just a recommendation for YARN, with the ResourceManager making the final decision based on the available resources? The thing that worries me most is how programmers decide how many reducers they need to process a file. Do they have to calculate it every time before submitting a job?
02-21-2017
05:33 AM
Hi, I know the number of map tasks is basically determined by the number of input files and the number of input splits of those files. So if we want to process 200 files of the same block size (or larger), we need 200 map tasks, and for 1000 files we need 1000 map tasks. How do we set the number of reducers for these files, apart from setNumReduceTasks() or the mapreduce.job.reduces configuration? Is there any algorithm or logic, like a hash key, to derive the number of reducers? Secondly, I want to know how the number of containers and the required resources are requested by the ApplicationMaster from the ResourceManager. Suppose a NodeManager has 2 GB of RAM available and we submit a job that needs 3 GB; how will the job run, or will it not run at all? If you can give the exact flow, with the logic from map task to reduce task through to container assignment, it would be really helpful.
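There is no automatic formula in the framework itself; the number of reducers is whatever the job asks for. A sketch of setting it per job (assuming the driver uses ToolRunner so -D properties are picked up; jar, class and paths are illustrative):

hadoop jar myjob.jar com.example.MyDriver -Dmapreduce.job.reduces=20 /input /output

A common rule of thumb from the MapReduce tutorial is roughly 0.95x or 1.75x (number of nodes x containers per node); higher-level engines such as Hive instead derive the count from data volume (e.g. hive.exec.reducers.bytes.per.reducer).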
02-17-2017
02:18 PM
Hi Artem, Thanks a lot for your explanation. So ideally, when we want to run Spark on a YARN cluster, we don't need to configure a Spark master?
02-17-2017
09:37 AM
I want to know whether we can run a job without a Spark master server or not. Since we can integrate Spark with the YARN ResourceManager, what is the use of the Spark master in that case? Could you give the exact flow of job submission with a Spark master versus with YARN?
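A sketch of the two submission modes (jar, class and host names are illustrative):

# Standalone mode: the Spark Master/Worker daemons do the scheduling
spark-submit --master spark://master-host:7077 --class com.example.MyApp myapp.jar

# YARN mode: no Spark Master is involved; the ResourceManager launches an ApplicationMaster,
# which then requests executor containers
spark-submit --master yarn --deploy-mode cluster --class com.example.MyApp myapp.jar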
02-10-2017
11:19 AM
@abokor, Hi, Thanks! My question was: suppose there is a file named A.txt that was created 10 years ago and has not changed in those 10 years. Now a client wants to access that file. Where will it get the metadata (block locations, permissions, name, etc.) from? Only from RAM (since the NN stores all metadata in RAM), or from somewhere else?
02-10-2017
06:17 AM
@abokor, Q1: I have seen that 2 fsimages always exist on the SN; it stores two old fsimages for backup and rollback purposes. But I don't know why it stores so many edits logs. If you check the edits log location you will find a large number of edits log files, and I don't know what all of them are used for.
02-10-2017
06:11 AM
Hi Abokor, Thanks a lot for your reply, but I still have one query. As per your explanation, the edits log and fsimage store only the changes since the last checkpoint. So if a user wants information about a file or data that is 10 years old, how will it get the metadata for that particular file? Will it read the metadata (block locations, etc.) only from RAM, or does it use some other process?
02-08-2017
01:21 PM
1 Kudo
I am confused about the actual use of the fsimage; can anyone explain the queries below? It would be really helpful. In many articles it is written that the fsimage and edits log are used only at NameNode restart. But suppose the NameNode has been running for the last 10 years: does it keep all the metadata changes (including block locations) of those 10 years in RAM only? Won't that be a size or performance issue for the NameNode? Or does the NameNode store the metadata according to the new fsimage? How does the NameNode use the new fsimage it receives from the SN?
02-08-2017
01:07 PM
I am confused about the actual use of the fsimage and edits log; can anyone explain the queries below? It would be really helpful.
Q1. Why are there so many fsimage and edits log files on the NameNode? What are they all used for, if checkpointing is already scheduled?
Q2. What happens to the old fsimage and edits logs after checkpointing? When the SN (secondary NameNode) sends a new fsimage to the NameNode, what is that new fsimage used for? Does the NameNode store the block locations according to the new fsimage (whatever the block names in the new fsimage are)?
Q3. If the NameNode has been running for the last 4 years, does it store all the changes and metadata in RAM, or is there some logic to store old block locations somewhere else?
Q4. Does the NameNode update the metadata in RAM every second, or is there a fixed time period?
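For looking at these files directly, the offline viewers help; a sketch (the directory and file names are illustrative):

# Dump a checkpoint image and an edits segment to readable XML
hdfs oiv -p XML -i /hadoop/hdfs/namenode/current/fsimage_0000000000000042 -o fsimage.xml
hdfs oev -p XML -i /hadoop/hdfs/namenode/current/edits_0000000000000001-0000000000000042 -o edits.xml

How often a new fsimage is produced, and how many old images and edits segments are retained, is governed by dfs.namenode.checkpoint.period, dfs.namenode.checkpoint.txns, dfs.namenode.num.checkpoints.retained and dfs.namenode.num.extra.edits.retained.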