Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3171 | 12-25-2018 10:42 PM |
|  | 14192 | 10-09-2018 03:52 AM |
|  | 4763 | 02-23-2018 11:46 PM |
|  | 2481 | 09-02-2017 01:49 AM |
|  | 2911 | 06-21-2017 12:06 AM |
04-15-2016
05:32 AM
1 Kudo
Hi Anandha, (1) The same guide applies to upgrades to 2.4. (2) Just set hive.server2.support.dynamic.service.discovery to true in Ambari. You can try without it; then, on the page where you decide between Rolling and Express upgrade, Ambari will tell you whether you can proceed or not.
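If it helps, this is roughly how the setting looks once applied (it lives in hive-site, which you reach through the Hive configs in Ambari; exact placement in the UI may differ by version):

```
hive.server2.support.dynamic.service.discovery=true
```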
04-15-2016
04:46 AM
I'm not sure what exactly you want to do. What do you mean by "related parameters"? If you want to save all cluster configuration properties before the upgrade, you can extract a blueprint:

curl -u admin:admin -H "X-Requested-By: ambari" http://<ambari-server-fqdn>:8080/api/v1/clusters/:clusterName?format=blueprint

and convert its JSON output to CSV and import it into Excel. You can do the same after the upgrade and compare the output; some people do that. However, you can also check all config property changes using the "History" feature of Ambari by comparing config versions before and after the upgrade for each service. As for the upgrade itself, just follow the Ambari Upgrade guide (and don't skip any step!).
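As a minimal sketch of the before/after comparison (the hostname, cluster name, and admin credentials below are placeholders; substitute your own):

```bash
# snapshot the cluster configuration as a blueprint before the upgrade
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://ambari.example.com:8080/api/v1/clusters/MyCluster?format=blueprint" \
  -o blueprint-before.json

# ... perform the upgrade ...

# snapshot again afterwards and compare the two files
curl -u admin:admin -H "X-Requested-By: ambari" \
  "http://ambari.example.com:8080/api/v1/clusters/MyCluster?format=blueprint" \
  -o blueprint-after.json
diff blueprint-before.json blueprint-after.json
```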
04-15-2016
03:46 AM
1 Kudo
This error means that something is wrong with your Derby database. Can you check whether it was created? The path to the Derby files is given by "Oozie Data Dir" (oozie_data_dir) in Ambari->Oozie->Oozie Server, and by OOZIE_DATA in oozie-env.sh. Check the permissions on that path and retry. You can also try to create the DB manually if ooziedb.bat is available, but it's better to go through Ambari.
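A quick sanity check on the Oozie host might look like this (the path below is the usual Ambari default, so substitute the actual oozie_data_dir value from your cluster; the oozie-db subdirectory name is an assumption based on the default Derby setup):

```bash
# confirm the data dir exists and is owned/writable by the oozie user
ls -ld /hadoop/oozie/data

# the Derby database is normally created as a subdirectory under it
ls -l /hadoop/oozie/data/oozie-db 2>/dev/null || echo "Derby DB not created yet"
```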
04-15-2016
02:36 AM
Okay, great, yes, the error was about "no channel configured". Regarding the path in HDFS, I edited my answer to include the full path, including the NameNode: hdfs://sandbox.hortonworks.com:8020/user/Revathy/Flume/%y-%m-%d/%H%M/%S. It's good to organize your folders in HDFS in some way; here I put it under your home directory in HDFS. How does one know that the Flume agent works? Well, if it keeps on running, there are no errors in the logs, and the data written to the sinks is as expected. You can find a lot of details here. You can also run Flume from Ambari, in which case Ambari will let you know whether the Flume process is healthy and running. However, one still has to inspect the sinks to be sure.
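For example, to spot-check the sink and the agent from the sandbox shell (the log path below is an assumption; the exact file name depends on how the agent was started):

```bash
# list what the HDFS sink has written so far
hdfs dfs -ls -R /user/Revathy/Flume

# watch the agent's log for errors while events flow
tail -f /var/log/flume/flume.log
```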
04-14-2016
03:31 PM
1 Kudo
It's not recommended to change banned.users and allowed.system.users, for security reasons. It's always a good idea to run Yarn jobs as an end user. It's like when you have real users on the cluster: you create their accounts and let them log in and run their apps. The yarn user is used to manage Yarn, for example by running "yarn rmadmin" and other such commands. If you nevertheless want to try, the only way is to edit the cfg.j2 file located at /var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/package/templates/container-executor.cfg.j2.
04-14-2016
02:45 PM
1 Kudo
Hi @Ran Postar You can reduce "Minimum user ID for submitting job" (min_user_id) in yarn-env in Ambari->Yarn from the default 1000 to a smaller value, for example 500. The value is referenced as min.user.id={{min_user_id}} in container-executor.cfg.j2, so it should work.
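For reference, the relevant part of that template looks roughly like this (an excerpt from memory, so the exact contents vary by Ambari/HDP version; the banned.users list shown is the usual default rather than something confirmed here):

```
yarn.nodemanager.linux-container-executor.group={{yarn_executor_container_group}}
banned.users=hdfs,yarn,mapred,bin
min.user.id={{min_user_id}}
```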
04-14-2016
09:34 AM
Check the following two lines in your sink block:

source_agent.sinks.avro_sink.hdfs.filetype = Datastream
source_agent.sinks.avro_sink.hdfs.a1.sinks.k2.hdfs.path = /Revathy/Flume/%y-%m-%d/%H%M/%S

In the first one the capitalization is wrong, and in the second one the property name on the left-hand side is incorrect. Change them to the following and retry:

source_agent.sinks.avro_sink.hdfs.fileType = DataStream
source_agent.sinks.avro_sink.hdfs.path = hdfs://sandbox.hortonworks.com:8020/user/Revathy/Flume/%y-%m-%d/%H%M/%S
04-14-2016
02:12 AM
1 Kudo
Hi Adi, the threshold means that the utilization of storage on each node after balancing will be (ACU +- threshold), where I use ACU to denote "average cluster utilization". Example:

(1) Before adding new nodes: Let's say you have 10 data nodes, each with a capacity of 20T, and your data size is 100T. In this case ACU=50%, and if all nodes are perfectly balanced, each stores 10T of data.

(2) After: Let's say you add 4 large nodes, each with a capacity of 50T, and you still have 100T of data. Your total capacity is now doubled to 400T, and therefore ACU=25%. However, your new nodes are empty.

Running the balancer with threshold th=10% will ensure that the utilization of all nodes is between ACU-th and ACU+th, in this case between 15% and 35%. We are starting with old nodes at 50% and new nodes at 0% utilization. The balancer will keep on moving data until the old nodes' utilization is <= 35% and the new nodes' utilization is >= 15%, which means old nodes keeping less than 20*0.35=7T and new nodes keeping more than 50*0.15=7.5T. As you can see, in this particular case the data-per-node amounts are not so far away from each other, but as you keep adding more data the differences will grow little by little. If you are interested in more details about the balancer, please refer to HADOOP-1652 and the Balancer design document.
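As a minimal sketch, this is how you would check per-node utilization and run the balancer with that threshold (assuming an HDFS client is configured on the node where you run it):

```bash
# show capacity and DFS Used% for every DataNode
hdfs dfsadmin -report

# rebalance so every node ends up within +/-10 percentage points of the cluster average
hdfs balancer -threshold 10
```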
04-13-2016
10:47 AM
Are you on CentOS/RHEL-7? I recently did two upgrades, one using Ambari-2.2.0 and one using 2.2.1.1 (both from HDP-2.2.x), and I had no issues. However, both were on RHEL-6.x.
04-13-2016
01:50 AM
Well, I thought my answer of Apr. 5 ... 🙂