Member since: 03-25-2017
Posts: 47
Kudos Received: 0
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1844 | 10-08-2018 06:21 PM
 | 1745 | 09-17-2018 11:33 PM
05-12-2020
11:59 AM
Hi, I am exploring Atlas. I have gone through the Apache Atlas documentation, but it does not cover the internals of Atlas or its sequence flows. Is there any document that can be referred to in order to learn all about Atlas, similar to how we use the Hadoop Definitive Guide to learn about Hadoop? It would be very helpful if anyone could share a lead on the best document to refer to for learning Atlas.
10-14-2018
09:01 AM
Hi, you may try the workarounds below:
1) Generally, the operations team creates a client system and allows access to the production cluster from there, rather than giving access to a datanode. So if it is just a client, you can use the previous solution.
2) If you really want to read data from cluster 1 while on cluster 2, you can try using the NameNode IP instead of the nameservice:
hdfs dfs -ls hdfs://namenode-ip:port/
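For illustration, a minimal sketch of option 2, using a hypothetical remote NameNode address (192.0.2.10) and the usual CDH NameNode RPC port 8020:
# list a directory on the remote cluster by addressing its NameNode directly
hdfs dfs -ls hdfs://192.0.2.10:8020/data
# copy a file from the remote cluster into the local cluster's default filesystem
hdfs dfs -cp hdfs://192.0.2.10:8020/data/sample.txt /landing/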
10-08-2018
06:21 PM
Thanks @bgooley, I solved this by upgrading the OS and Kerberos versions. It works fine for me now. Thanks for your help.
10-05-2018
12:57 PM
For me as well, kinit works and ZooKeeper and the NameNode start, but the DataNode fails to connect to the NameNode and then the whole cluster comes down.
10-05-2018
07:18 AM
I have done something similar myself. The steps you mentioned above look fine.
10-05-2018
07:09 AM
After enabling Kerberos, the DataNode started failing to connect to the NameNode.
Errors in the DataNode log:
WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs/hdp-3.com@CDH.HDP (auth:KERBEROS) cause:java.io.IOException: Couldn't setup connection for hdfs/hdp-3.com@CDH.HDP to hdp-1.com/192.1.1.1:8022
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Problem connecting to server: hdp-1.com/192.1.1.1:8022
WARN org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hdfs/hdp-3.com@CDH.HDP (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Ticket expired (32) - PROCESS_TGS)]
WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool ID needed, but service not yet registered with NN, trace: java.lang.Exception
/etc/krb5.conf:
[libdefaults]
default_realm = CDH.HDP
dns_lookup_kdc = false
dns_lookup_realm = false
ticket_lifetime = 86400
renew_lifetime = 604800
forwardable = true
default_tgs_enctypes = des-cbc-crc aes des-cbc-md5 arcfour-hmac rc4
default_tkt_enctypes = des-cbc-crc aes des-cbc-md5 arcfour-hmac rc4
permitted_enctypes = des-cbc-crc aes des-cbc-md5 arcfour-hmac rc4
udp_preference_limit = 1
kdc_timeout = 10000
[realms]
CDH.HDP = {
kdc = hdp-2.com
admin_server = hdp-2.com
default_domain = cdh.hdp
}
[domain_realm]
cdh.hdp = CDH.HDP
kdc.conf:
[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88
[realms]
CDH.HDP = {
#master_key_type = aes256-cts
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal camellia256-cts:normal camellia128-cts:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
Please help to resolve this.
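As a quick check (a hedged sketch; the keytab path below is only a placeholder for wherever the DataNode's keytab actually lives), you can verify that the DataNode principal can still obtain tickets and inspect the encryption types in use:
kinit -kt /path/to/hdfs.keytab hdfs/hdp-3.com@CDH.HDP   # obtain a fresh TGT with the DataNode's keytab (path is hypothetical)
klist -e                                                # list cached tickets along with their encryption types
kvno hdfs/hdp-1.com@CDH.HDP                             # request a service ticket for the NameNode's principal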
09-18-2018
12:04 AM
I see the below error in the log: java.lang.OutOfMemoryError: Java heap space. So I would like to know how much heap memory you have allocated right now. Can you try increasing the heap size of the DataNode?
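For example, a minimal sketch of raising the DataNode heap outside of Cloudera Manager, assuming a Hadoop 2.x style hadoop-env.sh (in Cloudera Manager the equivalent is the DataNode Java heap size setting):
# hadoop-env.sh: give the DataNode a 4 GB heap (the value is only an example; size it to your block count)
export HADOOP_DATANODE_OPTS="-Xmx4g $HADOOP_DATANODE_OPTS"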
09-17-2018
11:55 PM
Decommission one DataNode at a time so that its data gets replicated to other nodes, and then go for the next one. Make sure there is no under-replicated data in the NameNode UI before you decommission another DataNode. Please accept this as the solution if it resolves your issue.
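For example, besides the NameNode UI, these commands report under-replicated blocks:
hdfs fsck / | grep -i "under replicated"    # per-file report plus the summary line
hdfs dfsadmin -report | grep -i "under"     # cluster-wide count of under-replicated blocks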
09-17-2018
11:49 PM
I suggest you create two Linux user accounts, for cluster1 and cluster2 respectively, and configure each user's .bashrc. For example:
1) Create two user accounts: produser (prod) and druser (dr).
2) Create two HDFS config directories: /mnt/hadoopprod/conf and /mnt/hadoopdr/conf.
3) Configure the Hadoop configuration directory for each user in its ~/.bashrc file (a sketch is below).
4) Switch user and use the corresponding cluster 🙂
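A minimal sketch of the per-user ~/.bashrc entries, assuming the two config directories above:
# ~produser/.bashrc -- hdfs/yarn client commands pick up the prod cluster's configs
export HADOOP_CONF_DIR=/mnt/hadoopprod/conf
# ~druser/.bashrc -- hdfs/yarn client commands pick up the DR cluster's configs
export HADOOP_CONF_DIR=/mnt/hadoopdr/conf
Switching users then effectively switches clusters, because the Hadoop client reads whichever HADOOP_CONF_DIR is exported.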
09-17-2018
11:33 PM
Hi, thanks for your response and help. Every time I make changes in the configs, the configurations are re-deployed, which was deleting my topology script. So I pushed my script to the /mnt/topology/ directory and also tweaked the script a bit. It looks like this now:
topology.sh
#!/bin/bash
while [ $# -gt 0 ]; do
  nodearg=$1 #get the first argument
  for line in `cat /mnt/topology/topology.data`; do #read a line from the topology.data file
    node=$(echo $line|awk -F ',' '{print $1}') #parse the data and get the hostname to compare
    result=""
    if [ $nodearg = $node ]; then #compare the hostname in the file with the argument
      result=$(echo $line|awk -F ',' '{print $2}') #parse the line again to retrieve the rack details for the host
      break;
    else
      result="/default/rack-0"
    fi
  done
  shift
  echo $result
done
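Because nodearg is now re-read inside the while loop and shift advances through the arguments, the script returns one rack per argument even when the NameNode batches several hosts into a single invocation, for example:
$ ./topology.sh hdp-1.hdp.com 19.1.0.14
/default/rack-1
/default/rack-2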
09-17-2018
09:23 AM
So, in that case it will satisfy the first if condition. Do you know how Hadoop invokes the topology script? I mean, the parameters it passes along with the script file.
09-17-2018
05:30 AM
Hi,
I have written my own topology script and made the required configuration in Cloudera Manager > HDFS > Configuration > net.topology.script.file.name. But the rack topology is not updated, and I can see an ERROR in the NameNode log: "script /etc/hadoop/conf/topology.sh returned 0 values when 1 were expected.". Please help to resolve the issue.
topology.sh
#!/bin/bash
nodearg=$1 #get the first argument
while [ $# -gt 0 ]; do
for line in `cat topology.data`; do #read line from topology.data file
node=$(echo $line|awk -F ',' '{print $1}') #parse the data and get the hostname to compare
result=""
if [ $nodearg = $node ]; then #compare the hostname in the file with the argument
result=$(echo $line|awk -F ',' '{print $2}') #parse the line again to retrieve the rack details for the host
break;
else
result="/default/rack-0"
fi
done
shift
echo $result
done
topology.data
hdp-1.hdp.com,/default/rack-1
hdp-2.hdp.com,/default/rack-2
hdp-3.hdp.com,/default/rack-3
19.1.0.13,/default/rack-1
19.1.0.14,/default/rack-2
19.1.0.15,/default/rack-3
Output:
$ ./topology.sh hdp-1.hdp.com
/default/rack-1
$ ./topology.sh 19.1.0.14
/default/rack-2
Thanks and regards
Sidharth
02-21-2018
03:51 PM
Thanks a lot. So what approach and configuration would you propose when the requirement is not to expose the direct Kafka broker IPs to external users who produce/consume from the Kerberized Kafka cluster? Thanks, Sidharth
02-20-2018
11:08 AM
Thanks for your response. I have a much better understanding now. However, I would like to mention that I have created a principal like "centos@CLOUDERA.COM", and it works when I connect to the Kerberized broker directly without nginx. But the same does not work with nginx. Thanks, Sidharth
02-19-2018
09:10 PM
Thank you! I now have two Kafka clusters, one with Kerberos and one without. I tried three approaches:
Approach 1 - Produce and consume data with the Kerberized Kafka cluster and without the nginx proxy: works fine; I am able to consume and publish data in the Kafka topics.
Approach 2 - Produce and consume data with the Kerberized Kafka cluster and with the nginx proxy: does not work; I get a timeout error in the producer and the consumer is unable to get data from the Kafka topics.
Approach 3 - Produce and consume data without Kerberos and with/without the nginx proxy: works fine; I am able to consume and publish data from/to the Kafka topics.
For security reasons we cannot proceed with a non-Kerberos setup, so we are stuck with the nginx issue. Please help; my Kafka version is 0.10.1.
02-19-2018
10:23 AM
Also, the configuration provided is deprecated, so could you please help me with a working one for the latest stable version?
02-19-2018
10:09 AM
Thank you for your help. In my case it is as if I am connecting to the same server. For example, running a Kafka producer from server1.abc.com to a broker running on the same node server1.abc.com without any proxy works fine and I can produce; but if I try the same through the proxy it does not work and times out, even though I can see the proxy server's entry in the targeted broker's log and also in tcpdump.
02-17-2018
04:31 AM
Can anyone please reply?
02-16-2018
11:37 AM
Hi, I have a use case where I want my producers/consumers to connect to my Kerberized Kafka cluster/brokers only through an nginx server, both to hide the broker IPs and to perform load balancing. I have set up the Kerberized Kafka cluster and am able to consume and produce data without nginx. But when I try to consume/produce data from/to Kafka through nginx, it fails with a timeout error. After enabling DEBUG in the Kafka server.log I can see the nginx server IP, but I could not find any error that would help me identify the cause of the failure. Warm regards, Sidharth
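For reference, a minimal sketch of the nginx side (a stream/TCP proxy defined at the top level of nginx.conf; the broker hostnames are hypothetical). Note that even with a working TCP proxy, Kafka clients reconnect to whatever host:port the brokers advertise, so the brokers' advertised.listeners (and, with Kerberos, the hostnames used for the SASL/GSSAPI service principals) must also resolve through the proxy:
stream {
    upstream kafka_brokers {
        server broker1.example.com:9092;   # hypothetical broker addresses
        server broker2.example.com:9092;
    }
    server {
        listen 9092;                       # clients connect to the nginx host on this port
        proxy_pass kafka_brokers;
    }
}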
07-05-2017
12:26 PM
My questions are: why does it require a repository location when I select packages, and why does it try to install all the components again when I have already installed the required ones?
07-05-2017
11:23 AM
Hi, thanks. I resolved those issues. I still want to know the answers to my questions. Regards, Sidharth
07-04-2017
12:54 PM
I have enabled Kerberos and installed Informatica on AIX. Now I am trying to connect Informatica to Impala using a JDBC connection, but it is not able to read the ticket from the cache.
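For reference, a hedged sketch of a Kerberized Impala JDBC URL for the Cloudera Impala JDBC driver (hostname, port, and realm below are placeholders; AuthMech=1 selects Kerberos authentication):
jdbc:impala://impalad-host.example.com:21050;AuthMech=1;KrbRealm=EXAMPLE.COM;KrbHostFQDN=impalad-host.example.com;KrbServiceName=impala
The JVM running the driver also needs a valid ticket; when it should read the ticket from the credential cache, -Djavax.security.auth.useSubjectCredsOnly=false is commonly required so GSSAPI falls back to the cache instead of a JAAS subject.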
07-04-2017
08:32 AM
Thanks. The requirement is that there must be a guarantee that every single event is processed, at an input rate of 1000+ events per second. It can also happen that we sometimes get a single event in a second and then a sudden increase to 1000+ events per second. So what would you say about the NiFi -> Kafka -> Storm combination? We cannot use Spark Streaming in this case because, even after implementing checkpointing, if Spark Streaming is processing a batch and the whole job fails mid-processing, the data that was being processed may be lost. So we chose Kafka to guarantee that no data loss occurs. Thanks, Sidharth
07-03-2017
07:56 AM
Also, I would like to know: if I install Storm on my existing Cloudera cluster, will I be able to monitor it, and how?
07-03-2017
07:00 AM
Hi, yes, as you said Spark Streaming is micro-batch processing, so we cannot call it true real-time processing. Just as an example, if I am hosting an application with millions of users logging in and I have to trace intruders or some unauthorized activity on the go and stop it, we cannot rely on batch or even micro-batch processing; it should be complete per-event real-time processing. So, is there any component provided by Cloudera that can do real-time processing like Storm or Heron? In the past I had an experience where we had Flume running smoothly in production to store raw protobuf data into HBase and then process it by running a MapReduce job. As the job was taking a long time to complete, the stakeholders decided to go with Spark. In the first attempt, the developers added processing logic to transform the raw protobuf data into a readable form and then store it into HDFS as Parquet files using Spark Streaming. We applied multiple suggestions and tuning attributes to make it run, but we were never able to survive production-like back pressure, even after assigning it three times more memory than Flume. In the second attempt, the transformation logic was removed and we tried only to store the raw protobuf data into Parquet files, but it still did not perform like Flume and always had pending batches in the queue, due to which it was failing every time, and we finally had to give up on Spark due to its inability to handle the back pressure. Thanks, Sidharth
07-03-2017
06:09 AM
Thanks for your response. But Spark Streaming is still batch processing: it pulls data in batches and executes them. And Spark Streaming still has issues and is not as stable as Flume for production. Please correct me if I am wrong.
07-03-2017
05:58 AM
Hi, I want to know why Cloudera does not have Storm as part of CDH, whereas Hortonworks has it. Does Cloudera have any other real-time processing component which can replace it? Thanks, Sidharth
07-02-2017
09:09 AM
Hi,
I have a requirement where all transactional data is ingested into Hadoop in real time and, before the data is stored in Hadoop, it is processed to validate it. If the data fails the validation process, it is not stored in Hadoop. The validation process also makes use of historical data that is already stored in Hadoop. I am thinking of a NiFi --> Kafka --> Storm model for real-time processing and then storing into HBase. Can you suggest any better model for this use case? I would also like to know the best open-source reporting tools available.
Any suggestions will be a great help for me.
Warm Regards Sidharth Kumar
05-29-2017
01:49 AM
Thanks for your help. I am unable to find that document. Can you kindly provide any link to download it? Regards, Sidharth