Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1953 | 07-09-2019 12:53 AM |
| | 11791 | 06-23-2019 08:37 PM |
| | 9080 | 06-18-2019 11:28 PM |
| | 10035 | 05-23-2019 08:46 PM |
| | 4445 | 05-20-2019 01:14 AM |
11-09-2015
08:20 AM
Thanks. That was my issue. The nodes are balanced in terms of DFS Used%, even though the raw byte counts vary.
11-08-2015
11:51 PM
If by 'hard to analyse' you mean hard to parse/process the output, you can also consider using the Java API to fetch block location info: http://archive.cloudera.com/cdh5/cdh/5/hadoop/api/org/apache/hadoop/fs/FileSystem.html#getFileBlockLocations(org.apache.hadoop.fs.Path,%20long,%20long)
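For illustration, a minimal sketch of that call, assuming a hypothetical file path; it asks for the block locations covering the whole file and prints the hosts holding each block:

```java
// Sketch: list block locations for one HDFS file via the Java API.
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/example/data.txt"); // hypothetical path
    FileStatus status = fs.getFileStatus(file);
    // Ask for locations for the whole file: offset 0 through its full length.
    BlockLocation[] blocks = fs.getFileBlockLocations(file, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset=" + block.getOffset()
          + " length=" + block.getLength()
          + " hosts=" + Arrays.toString(block.getHosts()));
    }
  }
}
```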
11-08-2015
07:27 AM
2 Kudos
Your understanding seems right, but note that none of the 'splitting' is automatic. In its simplest form, federation is a way to have multiple distinct NameNodes powered by a common set of DataNodes. Effectively, it's running and managing two or more *separate* namespaces on top of the same storage space. If you deploy two federated NameNodes, say hdfs://host-nn1/ and hdfs://host-nn2/, then they will have nothing in common except the live DataNode hostnames they share. A 'hadoop fs -ls' run against each will return completely independent results.
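To make the independence concrete, a small client-side sketch; the NameNode URIs are the ones from the example above, and the root listing is arbitrary:

```java
// Sketch: one client talking to two federated NameNodes. They share the
// same DataNodes underneath, yet each listing below is fully independent.
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FederatedListing {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem nn1 = FileSystem.get(new URI("hdfs://host-nn1/"), conf);
    FileSystem nn2 = FileSystem.get(new URI("hdfs://host-nn2/"), conf);
    // Equivalent to running 'hadoop fs -ls /' against each NameNode.
    for (FileStatus st : nn1.listStatus(new Path("/"))) {
      System.out.println("host-nn1: " + st.getPath());
    }
    for (FileStatus st : nn2.listStatus(new Path("/"))) {
      System.out.println("host-nn2: " + st.getPath());
    }
  }
}
```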
11-04-2015
12:52 AM
Thanks, will let you know. Bye
10-26-2015
01:14 AM
1 Kudo
Finally, I got it done, so I am posting my steps; maybe they will be helpful for someone who happens to hit the same problem. I use CDH 5.3.1. In outline: first, I set up a new CDH Manager and reconfigured all parameters and roles in it; then I added all the process_ids into the processes table in the SCM DB; and finally I changed server_host in /etc/cloudera-scm-agent/config.ini to point at the new manager and restarted all the agents.

First of all, we should back up and prepare for the worst. CDH provides two ways to back up: back up the database, or back up the config to a JSON file.

1.1 Back up the database: http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-2-x/topics/cm_ag_backup_dbs.html
For example:
backup: pg_dump -h localhost -p 7432 -U scm -W -F c -b -v -f "scm_db.db" scm
restore: pg_restore -p 7432 -U scm -W -d scm -v scm_db.db

1.2 Write the config to a JSON file: http://www.cloudera.com/content/www/en-us/documentation/enterprise/5-3-x/topics/cm_intro_api.html#xd_583c10bfdbd326ba--7f25092b-13fba2465e5--7f20
For example:
export: curl -u admin:admin "http://localhost:8888/api/v9/cm/deployment" > ~/cmf_config.json
import: curl --upload-file ~/cmf_config.json -u admin:admin http://localhost:8888/api/v9/cm/deployment?deleteCurrentDeployment=true

If we did not back up and lost our database, it is hard to restore; however, it can be done. These are my steps:

1. Reinstall a new CDH Manager on another machine with a different hostname.
2. Export a JSON configuration file from another currently working CDH Manager of the same version. That manager should include all services (such as HDFS, YARN, HBase, HDFS HA, etc.).
3. Run a script on all machines to collect each host's hostid and hostname; the hostid is in the file /var/lib/cloudera-scm-agent/uuid.
4. Modify the JSON file from step 2 as follows:
4.1 Delete all hosts in the JSON file and add all hosts' hostids and hostnames from step 3.
4.2 Delete all roles in the clusters' services.
4.3 Change the cluster name to the old cluster id. You can get the old cluster id from the HDFS NameNode HTTP web page if you do not remember it.
5. Import the new JSON file into the newly created CDH Manager.
6. Reconfigure all service parameters and add all instances to the services as before. (You should not use a host template, because the agents are not reporting to the new CMF server yet; however, you can add instances from the service pages.)
7. If you have HDFS HA enabled, you have to export the JSON file, add the HDFS HA roles to it, and import it again.
8. If you do not want to stop all the services, you have to do the following steps to get every process_id from all hosts; however, if you can stop the services, you can jump to step 11.
9. Run the following script on all hosts to get the process_ids and service names:
grep "spawned:.*with pid" /var/log/cloudera-scm-agent/supervisord.log |awk -vhost=$HOSTNAME '{ idx=index($5,"-"); name=substr($5,idx+1,length($5)-idx-1);pid=substr($5,2,idx-2);cc[name]=pid;}END{a=host;for (b in cc){ a=a"\t"b"\t"cc[b]} print a}'
10. Parse that output and insert a record into the SCM database's processes table for each process; before inserting into the processes table, you have to insert another record into the commands table to get a new command_id. (This step is hard; a small parsing sketch follows at the end of this post.)
11. Change server_host in /etc/cloudera-scm-agent/config.ini on all hosts to the new CDH Manager, kill the cmf listener, and restart all cmf agents.

Everything should then be fine: the agents will report to the new CDH server, and the server will return the same process_ids to the agents, so the running processes will not be killed.
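As referenced in step 10, here is a minimal sketch for parsing the step-9 output, whose format the awk script defines: one tab-separated line per host, with the hostname followed by alternating role-name/pid fields. The actual inserts into the processes and commands tables depend on the SCM schema and are not shown:

```java
// Sketch: parse the per-host output produced by the step-9 grep/awk script.
// Line format (tab-separated): hostname, then repeated (role-name, pid) pairs.
import java.io.BufferedReader;
import java.io.FileReader;

public class ParseProcessIds {
  public static void main(String[] args) throws Exception {
    try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
      String line;
      while ((line = in.readLine()) != null) {
        String[] fields = line.split("\t");
        String host = fields[0];
        // Walk the (role-name, pid) pairs that follow the hostname.
        for (int i = 1; i + 1 < fields.length; i += 2) {
          System.out.println(host + "\trole=" + fields[i] + "\tpid=" + fields[i + 1]);
        }
      }
    }
  }
}
```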
10-15-2015
08:21 AM
Thanks for the valuable information. I have fixed the above issue by changing the MapReduce Service property value in the Hive configuration to YARN.
10-14-2015
08:05 PM
I am able to connect to IBM MQ using the steps mentioned here, but when Flume tries to consume any messages from the queue, it throws the following exception:

com.ibm.msg.client.jms.DetailedMessageFormatException: JMSCC0053: An exception occurred deserializing a message, exception: 'java.lang.ClassNotFoundException: null class'. It was not possible to deserialize the message because of the exception shown.

1) I am using all the IBM MQ client jars. Flume starts without any exception, but the exception appears when it tries to consume the messages.
2) I am putting a custom message [Serializable object] into the queue, which Flume needs to consume.
3) Flume 1.5.0-cdh5.4.1
4) MQ version 8.x

a1.sources=s1
a1.channels=c1
a1.sinks=k1
a1.sources.s1.type=jms
a1.sources.s1.channels=c1
a1.sources.s1.initialContextFactory=com.sun.jndi.fscontext.RefFSContextFactory
a1.sources.s1.connectionFactory=FLUME_CF
a1.sources.s1.destinationName=MY.Q
a1.sources.s1.providerURL=file:///home/JNDI-Directory
a1.sources.s1.destinationType=QUEUE
a1.sources.s1.transportType=1
a1.sources.s1.userName=mqm
a1.sources.s1.batchSize=1
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100
a1.sinks.k1.type=logger
a1.sinks.k1.channel=c1
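For context, a minimal sketch of what the producer side might look like, assuming the same JNDI bindings the source config above reads (FLUME_CF and MY.Q); the payload class and password are hypothetical stand-ins. Note that deserializing a JMS ObjectMessage with standard Java serialization generally requires the payload's class to be on the consumer's (here, Flume's) classpath, which may be related to the ClassNotFoundException above:

```java
// Sketch: put a Serializable object message on the queue through the same
// file-based JNDI context the Flume source uses. Payload and password are
// hypothetical.
import java.io.Serializable;
import java.util.Hashtable;

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.Context;
import javax.naming.InitialContext;

public class PutObjectMessage {
  public static void main(String[] args) throws Exception {
    Hashtable<String, String> env = new Hashtable<String, String>();
    env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.fscontext.RefFSContextFactory");
    env.put(Context.PROVIDER_URL, "file:///home/JNDI-Directory");
    Context ctx = new InitialContext(env);

    ConnectionFactory cf = (ConnectionFactory) ctx.lookup("FLUME_CF");
    Queue queue = (Queue) ctx.lookup("MY.Q");

    Connection conn = cf.createConnection("mqm", "secret"); // hypothetical password
    try {
      Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
      MessageProducer producer = session.createProducer(queue);
      // A stand-in Serializable payload; the real post uses a custom class.
      Serializable payload = new java.util.HashMap<String, String>();
      ObjectMessage msg = session.createObjectMessage(payload);
      producer.send(msg);
    } finally {
      conn.close();
    }
  }
}
```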
10-09-2015
08:12 AM
1 Kudo
Thank you! I actually came across that solution just an hour ago while reading this, so I had already implemented the fix you suggested, but was still waiting to see if it would balance out evenly before posting: http://www.slideshare.net/cloudera/hadoop-troubleshooting-101-kate-ting-cloudera Regardless, thank you very much for your help in resolving this issue.
10-09-2015
02:00 AM
Hi Harsh, I have solved this issue. I think the problem was related to permissions, and the solution is that the agent should be started via sudo, even by the root user, like: # sudo ./cloudera-scm-agent start Then the distribution goes smoothly. Thank you for your tip about running 'curl'.