Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6961 | 09-21-2018 09:54 PM |
| | 8721 | 03-31-2018 03:59 AM |
| | 2613 | 03-31-2018 03:55 AM |
| | 2754 | 03-31-2018 03:31 AM |
| | 6174 | 03-27-2018 03:46 PM |
11-28-2018
12:10 AM
Is there any way to debug the io cache component to find out why it's not caching?
02-15-2017
09:33 PM
Yes, it seems to be related. I'll keep an eye on it. Actually, I've just opened a case with Hortonworks too.
12-29-2016
07:30 AM
1 Kudo
For beginners like myself: if you added a new processor or changed a processor's class name, you will need to add or change that name in the .Processor file under <Home Dir>/Documents/nifi/ChakraProcessor/HWX/nifi-demo-processors/src/main/resources/META-INF/services. If you don't do this, the processor will not be loaded.
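As an illustration (the class name below is a made-up placeholder), the services file, named org.apache.nifi.processor.Processor, simply lists the fully qualified class name of every processor in the module, one per line:
com.example.nifi.processors.MyCustomProcessor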
12-26-2016
09:02 PM
2 Kudos
Introduction
h2o is a package for running H2O via its REST API from within R. This package allows the user to run basic H2O commands using R commands. No actual data is stored in the R workspace, and no actual work is carried out by R. R only saves the named objects, which uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON file with the relevant information that R then displays in the console.
Scope
I tested this installation guide on CentOS 7.2, but it should work on similar RedHat/Fedora/CentOS distributions.
Steps
1. Install R
sudo yum install R
2. Install Java
https://www.java.com/en/download/help/linux_x64rpm_install.xml
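For example, on CentOS/RHEL you can also install OpenJDK from the OS repositories instead of the Oracle RPM from the link above (assuming OpenJDK is acceptable for your environment):
sudo yum install java-1.8.0-openjdk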
3. Start R and install dependencies
install.packages("RCurl")
install.packages("bitops")
install.packages("rjson")
install.packages("statmod")
install.packages("tools")
4. Install the h2o package and load the library for use
install.packages("h2o")
library(h2o)
If this is your first time using CRAN, it will ask for a
mirror to use. If you want H2O installed site-wide (i.e., usable by all users
on that machine), run R as root (sudo R), then type
install.packages("h2o")
5. Test H2O installation
Type:
library(h2o)
If nothing complains, launch h2o:
h2o.init()
If all went well, you'll see lots of output about how it
is starting up H2O on your behalf, and then it should tell you all about your
cluster. If not, the error message should tell you which dependency is
missing, or what the problem is. Post a comment on this article and I will get
back to you.
Tips
#1 - The version of H2O on CRAN might be up to a month or two
behind the latest and greatest. Unless you are affected by a bug that you know
has been fixed, don’t worry about it.
#2 - By default, h2o.init() will only use two cores on your machine and maybe
a quarter of your system memory. To resize the resources, call h2o.shutdown() and start it again:
a) using all your cores:
h2o.init(nthreads = -1)
b) using all your cores and 4 GB:
h2o.init(nthreads = -1, max_mem_size = "4g")
#3 - To run H2O on your local machine, you could call h2o.init without any
arguments, and H2O will be automatically launched at localhost:54321, where the
IP is "127.0.0.1" and the port is 54321.
#4 - If H2O is running on a cluster, you must provide the IP and port of the remote machine as arguments to the h2o.init() call (see the example below). The operation will be done on the server where H2O is running, not within the R environment.
Tutorials
H2O Tutorial on the Hortonworks Data Platform Sandbox:
http://hortonworks.com/blog/oxdata-h2o-tutorial-hortonworks-sandbox/
Walk-Through Tutorials for the Web UI:
http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/tutorial/top.html
12-26-2016
08:15 PM
@ALFRED CHAN It is present in Oregon too. Ohio is a new region that was just added by Amazon. We will upload that image to that region too.
12-26-2016
09:33 PM
2 Kudos
@kiran gutha Since Solr 4.7, there is a class, MiniSolrCloudCluster, that actually "deploys" locally (in RAM only or on a temp dir, if you want) a complete Solr cluster, with ZooKeeper, shards and everything, for your tests. You can find the JIRA here: https://issues.apache.org/jira/browse/SOLR-5865 Here is an example:
private static MiniSolrCloudCluster miniCluster;
private static CloudSolrServer cloudSolrServer;
@BeforeClass
public static void setup() throws Exception {
miniCluster = new MiniSolrCloudCluster(2, null, new File("src/main/solr/solr.xml"), null, null);
uploadConfigToZk("src/main/solr/content/conf/", "content");
// override settings in the solrconfig include
System.setProperty("solr.tests.maxBufferedDocs", "100000");
System.setProperty("solr.tests.maxIndexingThreads", "-1");
System.setProperty("solr.tests.ramBufferSizeMB", "100");
// use non-test classes so RandomizedRunner isn't necessary
System.setProperty("solr.tests.mergeScheduler", "org.apache.lucene.index.ConcurrentMergeScheduler");
System.setProperty("solr.directoryFactory", "solr.RAMDirectoryFactory");
cloudSolrServer = new CloudSolrServer(miniCluster.getZkServer().getZkAddress(), false);
cloudSolrServer.setRequestWriter(new RequestWriter());
cloudSolrServer.setParser(new XMLResponseParser());
cloudSolrServer.setDefaultCollection("content");
cloudSolrServer.setParallelUpdates(false);
cloudSolrServer.connect();
createCollection(cloudSolrServer, "content", 2, 1, "content");
}
protected static void uploadConfigToZk(String configDir, String configName) throws Exception {
SolrZkClient zkClient = null;
try {
zkClient = new SolrZkClient(miniCluster.getZkServer().getZkAddress(), 10000, 45000, null);
uploadConfigFileToZk(zkClient, configName, "solrconfig.xml", new File(configDir, "solrconfig.xml"));
uploadConfigFileToZk(zkClient, configName, "schema.xml", new File(configDir, "schema.xml"));
uploadConfigFileToZk(zkClient, configName, "stopwords_en.txt", new File(configDir, "stopwords_en.txt"));
uploadConfigFileToZk(zkClient, configName, "stopwords_it.txt", new File(configDir, "stopwords_it.txt"));
System.out.println(zkClient.getChildren(ZkController.CONFIGS_ZKNODE + "/" + configName, null, true));
} finally {
if (zkClient != null)
zkClient.close();
}
}
protected static void uploadConfigFileToZk(SolrZkClient zkClient, String configName, String nameInZk, File file) throws Exception {
zkClient.makePath(ZkController.CONFIGS_ZKNODE + "/" + configName + "/" + nameInZk, file, false, true);
}
@AfterClass
public static void shutDown() throws Exception {
miniCluster.shutdown();
}
protected static NamedList createCollection(CloudSolrServer server, String name, int numShards, int replicationFactor, String configName) throws Exception {
ModifiableSolrParams modParams = new ModifiableSolrParams();
modParams.set(CoreAdminParams.ACTION, CollectionAction.CREATE.name());
modParams.set("name", name);
modParams.set("numShards", numShards);
modParams.set("replicationFactor", replicationFactor);
modParams.set("collection.configName", configName);
QueryRequest request = new QueryRequest(modParams);
request.setPath("/admin/collections");
return server.request(request);
}
@Test
public void test() throws Exception {
// Do your stuff here using cloudSolrServer as a normal SolrServer
}
12-26-2016
01:36 PM
deployment-hdp.png yumreposd-directory.png
Thank you so much. I rebuilt all the nodes and followed the prerequisites to prepare them. Ambari host registration with the server is now successful, but during deployment I am getting the error message below. Can you please help with this?
Log:
line 140, in _call_wrapper result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hdp-select' returned 1. One of the configured repositories failed (HDP-2.4), and yum doesn't have enough cached data to continue. At this point the only safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working upstream. This is most often useful if you are using a newer distribution release than is supported by the repository (and the packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled: yum --disablerepo=HDP-2.4 ...
4. Disable the repository permanently, so yum won't use it by default. Yum will then just ignore the repository until you permanently enable it again or use --enablerepo for temporary usage: yum-config-manager --disable HDP-2.4 or subscription-manager repos --disable=HDP-2.4
5. Configure the failing repository to be skipped, if it is unavailable. Note that yum will try to contact the repo when it runs most commands, so will have to try and fail each time (and thus yum will be much slower). If it is a very temporary problem though, this is often a nice compromise: yum-config-manager --save --setopt=HDP-2.4.skip_if_unavailable=true
failure: repodata/repomd.xml from HDP-2.4: [Errno 256] No more mirrors to try.
http://192.168.0.12/repo/HDP/centos7/2.x/updates/2.4.0.0/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
stdout: /var/lib/ambari-agent/data/output-200.txt
2016-12-26 18:21:41,206 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-26 18:21:41,208 - Group['spark'] {}
2016-12-26 18:21:41,319 - Initializing 2 repositories
2016-12-26 18:21:41,320 - Repository['HDP-2.4'] {'base_url': 'http://192.168.0.12/repo/HDP/centos7/2.x/updates/2.4.0.0/', 'action': ['create'], 'components': [u'HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP', 'mirror_list': None}
2016-12-26 18:21:41,339 - File['/etc/yum.repos.d/HDP.repo'] {'content': '[HDP-2.4]\nname=HDP-2.4\nbaseurl=http://192.168.0.12/repo/HDP/centos7/2.x/updates/2.4.0.0/\n\npath=/\nenabled=1\ngpgcheck=0'}
2016-12-26 18:21:41,341 - Repository['HDP-UTILS-1.1.0.20'] {'base_url': 'http://192.168.0.12/repo/HDP-UTILS-1.1.0.20/repos/centos7/', 'action': ['create'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2016-12-26 18:21:41,347 - File['/etc/yum.repos.d/HDP-UTILS.repo'] {'content': '[HDP-UTILS-1.1.0.20]\nname=HDP-UTILS-1.1.0.20\nbaseurl=http://192.168.0.12/repo/HDP-UTILS-1.1.0.20/repos/centos7/\n\npath=/\nenabled=1\ngpgcheck=0'}
2016-12-26 18:21:41,348 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2016-12-26 18:21:41,528 - Skipping installation of existing package unzip
2016-12-26 18:21:41,528 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2016-12-26 18:21:41,558 - Skipping installation of existing package curl
2016-12-26 18:21:41,558 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2016-12-26 18:21:41,593 - Installing package hdp-select ('/usr/bin/yum -d 0 -e 0 -y install hdp-select')
2016-12-26 18:21:41,932 - Execution of '/usr/bin/yum -d 0 -e 0 -y install hdp-select' returned 1 (same repository failure and suggested fixes as in the output above)
2016-12-26 18:21:41,932 - Failed to install package hdp-select. Executing '/usr/bin/yum clean metadata'
2016-12-26 18:21:42,267 - Retrying to install package hdp-select after 30 seconds
Command failed after 1 tries
BR-Sampath
01-03-2017
09:41 PM
Alicia, please see my answer above from Oct 24. If you are running Spark on YARN, you will have to go through the YARN RM UI to get to the Spark UI for a running job. The link for the YARN UI is available from the Ambari YARN service. For a completed job, you will need to go through the Spark History Server; the link for the Spark History Server is available from the Ambari Spark service.
12-23-2016
02:59 AM
12 Kudos
Introduction
The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier.
Optimization Approach
Batching is one of the big drivers of efficiency. To enable batching, the Kafka producer will attempt to accumulate data in memory and send out larger batches in a single request. The batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than some fixed latency bound (say 64k or 10 ms). This allows more bytes to accumulate before sending, and fewer, larger I/O operations on the servers. This buffering is configurable and gives a mechanism to trade off a small amount of additional latency for better throughput. To find the optimal batch size and latency, iterative testing supported by producer statistics monitoring is needed; a configuration sketch for this tuning loop is given just before the references at the end of this article.
Enable Monitoring
Start the producer with the JMX parameters enabled:
JMX_PORT=10102 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic
Producer Metrics
Use the jconsole application to connect via JMX at port 10102. Tip: run jconsole remotely to avoid impact on the broker machine. The metrics are listed in the MBeans tab. The clientId parameter is the producer client ID for which you want the statistics.
kafka.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=console-producer
This MBean gives the rate of producer requests taking place as well as the latencies involved in that process. It gives latencies as a mean and as the 50th, 75th, 95th, 98th, 99th, and 99.9th percentiles. It also gives the time taken to produce the data as a mean, a one-minute average, a five-minute average, and a fifteen-minute average, along with the count.
kafka.producer:type=ProducerRequestMetrics,name=ProducerRequestSize,clientId=console-producer
This MBean gives the request size for the producer: the count, mean, max, min, standard deviation, and the 50th, 75th, 95th, 98th, 99th, and 99.9th percentiles of request sizes.
kafka.producer:type=ProducerStats,name=FailedSendsPerSec,clientId=console-producer
This gives the number of failed sends per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of failed requests per second.
kafka.producer:type=ProducerStats,name=SerializationErrorsPerSec,clientId=console-producer
This gives the number of serialization errors per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of serialization errors per second.
kafka.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=console-producer
This gives the number of messages produced per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of messages produced per second.
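Tuning Sketch
To experiment with the batch size and latency bound, here is a rough sketch of the tuning loop. The property names below are the Java producer's batching settings and the values are only illustrative starting points; depending on your Kafka version, and on whether the console producer uses the old or the new producer client, the exact MBean names you observe may differ. Put the settings in a properties file, pass it to the console producer, and watch ProducerRequestSize and ProducerRequestRateAndTimeMs while adjusting:
# producer.properties (illustrative values)
batch.size=65536
linger.ms=10
JMX_PORT=10102 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic --producer.config producer.properties
Increase batch.size or linger.ms if requests are small and frequent; decrease them if end-to-end latency grows beyond your target.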
References
https://kafka.apache.org/documentation.html#monitoring
Apache Kafka Cookbook by Saurabh Minni, 2015
12-23-2016
03:51 PM
@Randy Gelhausen, thank you for the link. I added that to my favorites! :) @Constantin Stanca Thank you so much for your updated response. It provided valuable reasoning and advice, and it made Wes' article easier for me to read.