Member since: 09-24-2015
Posts: 816
Kudos Received: 488
Solutions: 189
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2626 | 12-25-2018 10:42 PM
 | 12060 | 10-09-2018 03:52 AM
 | 4164 | 02-23-2018 11:46 PM
 | 1839 | 09-02-2017 01:49 AM
 | 2166 | 06-21-2017 12:06 AM
04-07-2017
01:00 AM
1 Kudo
For NN data, some fault-tolerant RAID, like RAID-1, 5, or 10, is fine. On worker nodes, for HDFS data you should use JBOD or RAID-0 per disk (so that you have 3 mount points, one per disk). RAID-1 for the OS on all nodes is fine. I'm not sure what you mean by "cashing".
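With JBOD, each data disk gets its own mount point and all of them are listed in dfs.datanode.data.dir in hdfs-site.xml. A minimal sketch, assuming three disks mounted under /grid/0 through /grid/2 (placeholder paths, use your actual mount points):
<!-- one HDFS data directory per physical disk -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/grid/0/hadoop/hdfs/data,/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data</value>
</property>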
04-07-2017
12:53 AM
Thanks for your reply, but your solution forces all Zeppelin interpreters to use py3. I want to have interpreters running both py2 and py3. I was able to set the livy.pyspark interpreter to work on py3, and I'm looking for a setup that makes the spark.pyspark interpreter work on py3 as well.
04-06-2017
10:15 PM
1 Kudo
If you want to minimize your downtime you can try to stop nodes one by one, upgrade the RAM, and restart all components on the node once it is back up. You must do masters one by one, but workers can be done in sets of 2, or, if you have rack awareness, in sets of 3-4 or even rack by rack. You need an HA configuration of the major services like HDFS, YARN, HBase, and Hive for this to work. If you have Kafka, you also need a replication factor of at least 2 on your topics, and you should do Kafka nodes one by one.
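If the cluster is managed by Ambari, stopping and restarting all components on a host can also be scripted against the Ambari REST API. A rough sketch, assuming default admin credentials and placeholder cluster and host names:
# put every component on the host into the INSTALLED (stopped) state
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Stop components for RAM upgrade"},"Body":{"HostRoles":{"state":"INSTALLED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MYCLUSTER/hosts/worker01.example.com/host_components
# after the node is back up, start them again
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT \
  -d '{"RequestInfo":{"context":"Start components after RAM upgrade"},"Body":{"HostRoles":{"state":"STARTED"}}}' \
  http://ambari-host:8080/api/v1/clusters/MYCLUSTER/hosts/worker01.example.com/host_components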
04-06-2017
09:54 AM
No, multiple HBase services are not supported right now, not even as a "copy" of the existing service. You'd need to create additional services like HBASE2, HBASE3 and so on, the way SPARK2 now runs in addition to SPARK. Instead you can either add more nodes to your cluster so that the single HBase can handle all your requirements, or create 2 clusters with one HBase in each. I'd suggest working with a single, strengthened HBase cluster. Recently I've been involved with an HBase cluster running on several hundred nodes, and after some tuning it works great. Initially we also considered 2 clusters, but this one covers all our needs for the time being and is scaling well so far.
04-06-2017
09:37 AM
1 Kudo
Yes, exactly! Data stored on HDFS is not affected in any way, so all files used by a single HBase region are still replicated only 3 times. What is additionally replicated to achieve RS HA are the read-only secondary region replicas held by other Region Servers. You can find a good explanation here. What you get in return is faster recovery for reads from HBase. For writes you still need to wait longer (just as without RS HA), until the HBase master activates the affected regions on other Region Servers.
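As a sketch, region replicas are enabled per table, for example from the HBase shell (the table and column family names here are made up):
create 't1', 'cf', {REGION_REPLICATION => 3}
# for an existing table, the replication count can be changed while it is disabled:
disable 't1'
alter 't1', {REGION_REPLICATION => 3}
enable 't1'
# reads that are allowed to hit the read-only secondaries use timeline consistency:
get 't1', 'row1', {CONSISTENCY => 'TIMELINE'}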
04-01-2017
12:35 AM
3 Kudos
Cloudbreak is a popular, easy-to-use HDP component for cluster deployment on various cloud environments including Azure, AWS, OpenStack, and GCP. This article shows how to create an Azure application for Cloudbreak using the Azure CLI. Note: To do this, you need the "Owner" role on your Azure subscription; "Developer" and other roles are not enough.
Download and install the Azure CLI using the instructions provided at https://docs.microsoft.com/en-us/cli/azure/install-azure-cli. CLI versions are available for Windows, macOS, and Linux.
Type "az" to make sure the CLI is available and in your command path. Login to your Azure account in your web browser, and then also login from your command line: az login
To sign in, use a web browser to open the page https://aka.ms/devicelogin and enter the code HPBCSXTPJ to authenticate.
Follow the instructions on the web page. When done, you will see confirmation on the command line that your login was successful. Then run the following command. You can freely choose the values to enter here, including dummy URIs: the identifier URI and the homepage are never used on Azure, but they are required. Also make sure that the identifier URI is unique within your subscription. So, instead of "mycbdapp" you may choose a more descriptive name.
URIs are dummy, never used, but required:
az ad app create --identifier-uris http://mycbdapp.com --display-name mycbdapp --homepage http://mycbdapp.com
Ignore the output of this command, including its appId; that's not the one we need! Choose your password, and run the following command:
az ad sp create-for-rbac --name "mycbdapp" --password "mytopsecretpassword" --role Owner
{
"appId": "c19a48f3-492f-a87b-ac4a-b1d8e456f14e",
"displayName": "mycbdapp",
"name": "http://mycbdapp",
"password": "mytopsecretpassword",
"tenant": "891fd956-21c9-4c40-bfa7-ab88c1d8364c"
}
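If you later need to look up these values again (except the password), the service principal can be inspected by the "name" from the output above; a sketch, assuming a reasonably recent Azure CLI:
az ad sp show --id http://mycbdapp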
Now log in to your Cloudbreak instance, select "manage credentials", then "+ create credential", and on the "Configure credential" page select Azure and fill in the form. Use the appId, password, and tenant ID from the output above. Add your Azure subscription ID, and paste the public key of the ssh key pair you created before (this will be used to provide ssh access to the cluster machines for the "cloudbreak" user).
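If you don't have your subscription ID at hand, it can be read from the CLI; a quick sketch (--output tsv prints just the ID):
az account show --query id --output tsv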
Then, proceed by providing other settings, and enjoy HDP on Cloudbreak!
03-30-2017
11:54 AM
2 Kudos
Trying to use the Zeppelin pyspark interpreter with python3, I set the "python" parameter of the interpreter to my python3 path, and installed python3 on all worker nodes in the cluster at the same path, but I get an error when running simple commands:
%pyspark
file = sc.textFile("/data/x1")
file.take(3)
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions
It works from the command line, using "pyspark" after exporting PYSPARK_PYTHON set to my python3 path. But how do I tell this to Zeppelin? I haven't changed anything else. Actually, as a next step, I'd like to create 2 spark interpreters, one running on python2 and another on python3.
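For the python3 one, the kind of setup I have in mind is roughly this: a second spark interpreter group created in the Zeppelin UI, with its Python-related properties pointing at python3. A sketch of the properties I'm experimenting with (the python3 path is just an example):
# properties of a second interpreter group, e.g. "spark_py3", created from the spark group
zeppelin.pyspark.python            /usr/local/bin/python3   # python used by the pyspark driver in Zeppelin
spark.executorEnv.PYSPARK_PYTHON   /usr/local/bin/python3   # python used by the executors on the worker nodes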
Labels:
- Apache Zeppelin
03-29-2017
11:24 PM
You can try WebHCat, specifically its mapreduce command.
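Submitting a MapReduce jar through WebHCat looks roughly like this (a sketch; the host, the jar path in HDFS, the class, and the data paths are placeholders, and passing user.name this way assumes an unsecured cluster):
curl -s -X POST \
  -d user.name=myuser \
  -d jar=/apps/wordcount.jar \
  -d class=org.myorg.WordCount \
  -d arg=/data/input -d arg=/data/output \
  'http://webhcat-host:50111/templeton/v1/mapreduce/jar'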
03-25-2017
05:52 AM
Well, it seems to be a bug, reported but unattended: HIVE-13983. A workaround is to use INSERT INTO ... SELECT, for example: insert into test select 'привет' from test limit 1;