Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2626 | 12-25-2018 10:42 PM
 | 12058 | 10-09-2018 03:52 AM
 | 4164 | 02-23-2018 11:46 PM
 | 1838 | 09-02-2017 01:49 AM
 | 2166 | 06-21-2017 12:06 AM
04-07-2017
01:00 AM
1 Kudo
For NN data, some fault-tolerant RAID like RAID-1, 5, or 10 is fine. On worker nodes, for HDFS data you should use JBOD or RAID-0 per disk (so that you have 3 mount points). RAID-1 for the OS on all nodes is fine. I'm not sure what you mean by "cashing" (do you mean caching?).
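For illustration, a typical JBOD layout on a worker node looks roughly like this (device names, mount points and directory names below are just example assumptions):
# Three data disks, each formatted and mounted on its own mount point (JBOD, no RAID)
mkfs.xfs /dev/sdb && mount /dev/sdb /grid/0
mkfs.xfs /dev/sdc && mount /dev/sdc /grid/1
mkfs.xfs /dev/sdd && mount /dev/sdd /grid/2
# dfs.datanode.data.dir in hdfs-site.xml then lists one directory per mount point, e.g.:
# /grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data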
04-07-2017
12:53 AM
Thanks for your reply, but your solution will make all Zeppelin interpreters use py3. I want to have interpreters running both py2 and py3. I was able to set livy.pyspark to work on py3, and I'm looking for a setup that makes the spark.pyspark interpreter work on py3 as well.
04-06-2017
10:15 PM
1 Kudo
If you want to minimize your downtime you can try to stop nodes one by one, upgrade the RAM, and restart all components on the node once it is back up. You must do masters one by one, but workers can be done in sets of 2, or, if you have rack awareness, in sets of 3-4 or even rack by rack. You need an HA configuration of major services like HDFS, YARN, HBase, and Hive for this to work. If you have Kafka, you also need a replication factor of at least 2 on your Kafka topics, and you should do Kafka nodes one by one.
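If the cluster is managed by Ambari, the per-node stop/start can be scripted against the Ambari REST API. A rough sketch, assuming placeholder names for the Ambari server, cluster and worker host:
# Stop all components on one host (sets their desired state to INSTALLED)
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop components for RAM upgrade"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' \
  http://ambari.example.com:8080/api/v1/clusters/MyCluster/hosts/worker01.example.com/host_components
# ... power down, add RAM, boot the node, then start everything again
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Start components after RAM upgrade"},"Body":{"HostRoles":{"state":"STARTED"}}}' \
  http://ambari.example.com:8080/api/v1/clusters/MyCluster/hosts/worker01.example.com/host_components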
04-06-2017
09:54 AM
No, multiple HBASE services are not supported right now, not even by "copy". That would require creating additional services like HBASE2, HBASE3 and so on, similar to how SPARK2 now runs in addition to SPARK. Instead you can either add more nodes to your cluster and let the single HBase handle all your requirements, or create 2 clusters with one HBase in each. I'd suggest working with a single, strengthened HBase cluster. Recently I've been involved in an HBase cluster running on several hundred nodes, and after some tuning it works great. Initially we also considered 2 clusters, but this one is covering all our needs for the time being, and is scaling well so far.
04-06-2017
09:37 AM
1 Kudo
Yes, exactly! Data stored on HDFS is not affected in any way, so all files used by a single HBase region are still replicated only 3 times. What is additionally replicated to achieve RS HA are the read-only secondary region replicas held by other Region Servers. You can find a good explanation here. What you get in return is faster recovery for reads from HBase. For writes you still need to wait longer (just like without RS HA), until the HBase Master activates the affected regions on other Region Servers.
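For illustration, region replicas are typically enabled per table from the HBase shell (the table and column family names below are just examples):
# Create a table with one extra read-only replica per region
echo "create 'mytable', 'cf', {REGION_REPLICATION => 2}" | hbase shell
# Reads that are allowed to hit a secondary replica must request timeline consistency
echo "get 'mytable', 'row1', {CONSISTENCY => 'TIMELINE'}" | hbase shell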
04-01-2017
12:35 AM
3 Kudos
Cloudbreak is a popular, easy-to-use HDP component for cluster deployment on various cloud environments including Azure, AWS, OpenStack and GCP. This article shows how to create an Azure application for Cloudbreak using the Azure CLI. Note: To do this, you need access to an "Owner" account on your Azure subscription; "Developer" and other roles are not enough.
Download and install the Azure CLI using the instructions provided here: https://docs.microsoft.com/en-us/cli/azure/install-azure-cli. CLI versions are available for Windows, macOS and Linux.
Type "az" to make sure the CLI is available and on your command path. Log in to your Azure account in your web browser, and then also log in from the command line:
az login
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code HPBCSXTPJ to authenticate.
Follow the instructions on the web page. When done, you will see confirmation on the command line that your login was successful. Then run the following command. You can freely choose the values you enter here, including dummy URIs: the identifier URI and the homepage are never used by Azure, but they are required. Also make sure that the identifier URI is unique in your subscription. So, instead of "mycbdapp" you may choose a more descriptive name.
# URIs are dummy, never used, but required
az ad app create --identifier-uris http://mycbdapp.com --display-name mycbdapp --homepage http://mycbdapp.com
Ignore the output of this command, including its appId; that's not the one we need! Choose your password, and run the following command:
az ad sp create-for-rbac --name "mycbdapp" --password "mytopsecretpassword" --role Owner
{
"appId": "c19a48f3-492f-a87b-ac4a-b1d8e456f14e",
"displayName": "mycbdapp",
"name": "http://mycbdapp",
"password": "mytopsecretpassword",
"tenant": "891fd956-21c9-4c40-bfa7-ab88c1d8364c"
}
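Optionally, you can verify that the service principal works by logging in with it, using the appId, password and tenant from the output above:
# Log in as the service principal to confirm the credentials are valid
az login --service-principal -u c19a48f3-492f-a87b-ac4a-b1d8e456f14e -p "mytopsecretpassword" --tenant 891fd956-21c9-4c40-bfa7-ab88c1d8364c
# Switch back to your own account afterwards
az login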
Now log in to your Cloudbreak instance, select "manage credentials", then "+ create credential", and on the "Configure credential" page select Azure and fill in the form as shown on the screenshot. Use the appId, password, and tenant ID from the output above. Add your Azure subscription ID, and paste the public key of the ssh key pair you created before (this will be used to provide ssh access to cluster machines for the "cloudbreak" user). Then proceed by providing the other settings, and enjoy HDP on Cloudbreak!
03-30-2017
11:54 AM
2 Kudos
I'm trying to use the Zeppelin pyspark interpreter with python3. I set the "python" parameter of the interpreter to my python3 path, and installed python3 on all worker nodes in the cluster at the same path, but I'm getting an error when running simple commands:
%pyspark
file = sc.textFile("/data/x1")
file.take(3)
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions
It works from the command line, using "pyspark" after exporting PYSPARK_PYTHON set to my python3 path. But how do I tell this to Zeppelin? I haven't changed anything else. Actually, as the next step I'd like to create 2 spark interpreters, one running on python2 and another on python3.
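For reference, one common approach is to export PYSPARK_PYTHON for Zeppelin itself, e.g. in zeppelin-env.sh (the python3 path below is just an assumed example):
# conf/zeppelin-env.sh -- adjust the path to your python3 installation
export PYSPARK_PYTHON=/usr/local/bin/python3
# Zeppelin's Spark driver will then use python3; since the same path exists on all
# worker nodes in this setup, the version mismatch should go away. Restart Zeppelin afterwards.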
Labels:
- Apache Zeppelin
03-29-2017
11:24 PM
You can try WebHCat and its "mapreduce" command.
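A minimal sketch of submitting a MapReduce jar through WebHCat's REST API (the host, port, user, jar and class below are placeholder assumptions):
# Submit a MapReduce job via the WebHCat (Templeton) REST endpoint
curl -s -d user.name=myuser \
     -d jar=/user/myuser/wordcount.jar \
     -d class=org.myorg.WordCount \
     -d arg=/user/myuser/input -d arg=/user/myuser/output \
     'http://webhcat-host.example.com:50111/templeton/v1/mapreduce/jar'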
03-25-2017
05:52 AM
Well, it seems to be a bug, reported but unattended: HIVE-13983. A workaround is to use INSERT INTO ... SELECT, like:
insert into test select 'привет' from test limit 1;
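If you need to run the workaround from a script, the same statement can be issued through beeline (the JDBC URL and table name are just assumptions):
# Run the INSERT ... SELECT workaround through beeline; adjust the URL to your HiveServer2
beeline -u "jdbc:hive2://localhost:10000/default" \
        -e "insert into test select 'привет' from test limit 1;"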