Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6961 | 09-21-2018 09:54 PM |
| | 8721 | 03-31-2018 03:59 AM |
| | 2613 | 03-31-2018 03:55 AM |
| | 2754 | 03-31-2018 03:31 AM |
| | 6174 | 03-27-2018 03:46 PM |
11-28-2018
12:10 AM
Is there any way to debug the io cache component to find out why it's not caching?
02-15-2017
09:33 PM
Yes, it seems to be related. I'll keep an eye on it. Actually, I've just opened a case with Hortonworks too.
12-29-2016
07:30 AM
1 Kudo
For beginners like myself: if you added a new processor or changed a processor's class name, you will need to add or change that name in the .Processor file under <Home Dir>/Documents/nifi/ChakraProcessor/HWX/nifi-demo-processors/src/main/resources/META-INF/services. If you don't do this, the processor will not be loaded.
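As an illustration (the class name below is a made-up placeholder), the services file, named org.apache.nifi.processor.Processor, simply lists the fully qualified class name of every processor in the module, one per line:
com.example.nifi.processors.MyCustomProcessor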
12-26-2016
09:02 PM
2 Kudos
Introduction
h2o is a package for running H2O via its REST API from within R. This package allows the user to run basic H2O commands using R commands. No actual data is stored in the R workspace, and no actual work is carried out by R. R only saves the named objects, which uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON file with the relevant information that R then displays in the console.
Scope
I tested this installation guide on CentOS 7.2, but it should work on similar RedHat/Fedora/CentOS distributions.
Steps
1. Install R
sudo yum install R
2. Install Java
https://www.java.com/en/download/help/linux_x64rpm_install.xml
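For example, on CentOS/RHEL you can also install OpenJDK from the OS repositories instead of the Oracle RPM from the link above (assuming OpenJDK is acceptable for your environment):
sudo yum install java-1.8.0-openjdk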
3. Start R and install dependencies
install.packages("RCurl")
install.packages("bitops")
install.packages("rjson")
install.packages("statmod")
install.packages("tools")
4. Install the h2o package and load the library for use
install.packages("h2o")
library(h2o)
If this is your first time using CRAN, it will ask for a
mirror to use. If you want H2O installed site-wide (i.e., usable by all users
on that machine), run R as root (sudo R), then type
install.packages("h2o")
5. Test H2O installation
Type:
library(h2o)
If nothing complains, launch h2o:
h2o.init()
If all went well, you'll see lots of output about how it
is starting up H2O on your behalf, and then it should tell you all about your
cluster. If not, the error message should tell you which dependency is
missing, or what the problem is. Post a comment on this article and I will get
back to you.
Tips
#1 - The version of H2O on CRAN might be up to a month or two
behind the latest and greatest. Unless you are affected by a bug that you know
has been fixed, don’t worry about it.
#2 - By default, h2o.init() will only use two cores on your machine and maybe
a quarter of your system memory. To resize the resources, call h2o.shutdown() and start it again:
a) using all your cores:
h2o.init(nthreads = -1)
b) using all your cores and 4 GB:
h2o.init(nthreads = -1, max_mem_size = "4g")
#3 - To run H2O on your local machine, you could call h2o.init without any
arguments, and H2O will be automatically launched at localhost:54321, where the
IP is "127.0.0.1" and the port is 54321.
#4 - If H2O is running on a cluster, you must provide the IP and port of the remote machine as arguments to the h2o.init() call (see the example below). The operation will be done on the server where H2O is running, not within the R environment.
Tutorials
H2O Tutorial on the Hortonworks Data Platform Sandbox:
http://hortonworks.com/blog/oxdata-h2o-tutorial-hortonworks-sandbox/
Walk-Through Tutorials for the Web UI:
http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/tutorial/top.html
12-26-2016
08:15 PM
@ALFRED CHAN It is present in Oregon too. Ohio is a new region that was just added by Amazon. We will upload that image to that region too.
12-26-2016
09:33 PM
2 Kudos
@kiran gutha Since Solr 4.7, there is a class, MiniSolrCloudCluster, that actually "deploys" locally (in RAM only or on a temp dir, if you want) a complete Solr cluster, with ZooKeeper, shards and everything, for your tests. You can find the JIRA here: https://issues.apache.org/jira/browse/SOLR-5865 Here is an example:
private static MiniSolrCloudCluster miniCluster;
private static CloudSolrServer cloudSolrServer;
@BeforeClass
public static void setup() throws Exception {
miniCluster = new MiniSolrCloudCluster(2, null, new File("src/main/solr/solr.xml"), null, null);
uploadConfigToZk("src/main/solr/content/conf/", "content");
// override settings in the solrconfig include
System.setProperty("solr.tests.maxBufferedDocs", "100000");
System.setProperty("solr.tests.maxIndexingThreads", "-1");
System.setProperty("solr.tests.ramBufferSizeMB", "100");
// use non-test classes so RandomizedRunner isn't necessary
System.setProperty("solr.tests.mergeScheduler", "org.apache.lucene.index.ConcurrentMergeScheduler");
System.setProperty("solr.directoryFactory", "solr.RAMDirectoryFactory");
cloudSolrServer = new CloudSolrServer(miniCluster.getZkServer().getZkAddress(), false);
cloudSolrServer.setRequestWriter(new RequestWriter());
cloudSolrServer.setParser(new XMLResponseParser());
cloudSolrServer.setDefaultCollection("content");
cloudSolrServer.setParallelUpdates(false);
cloudSolrServer.connect();
createCollection(cloudSolrServer, "content", 2, 1, "content");
}
protected static void uploadConfigToZk(String configDir, String configName) throws Exception {
SolrZkClient zkClient = null;
try {
zkClient = new SolrZkClient(miniCluster.getZkServer().getZkAddress(), 10000, 45000, null);
uploadConfigFileToZk(zkClient, configName, "solrconfig.xml", new File(configDir, "solrconfig.xml"));
uploadConfigFileToZk(zkClient, configName, "schema.xml", new File(configDir, "schema.xml"));
uploadConfigFileToZk(zkClient, configName, "stopwords_en.txt", new File(configDir, "stopwords_en.txt"));
uploadConfigFileToZk(zkClient, configName, "stopwords_it.txt", new File(configDir, "stopwords_it.txt"));
System.out.println(zkClient.getChildren(ZkController.CONFIGS_ZKNODE + "/" + configName, null, true));
} finally {
if (zkClient != null)
zkClient.close();
}
}
protected static void uploadConfigFileToZk(SolrZkClient zkClient, String configName, String nameInZk, File file) throws Exception {
zkClient.makePath(ZkController.CONFIGS_ZKNODE + "/" + configName + "/" + nameInZk, file, false, true);
}
@AfterClass
public static void shutDown() throws Exception {
miniCluster.shutdown();
}
protected static NamedList createCollection(CloudSolrServer server, String name, int numShards, int replicationFactor, String configName) throws Exception {
ModifiableSolrParams modParams = new ModifiableSolrParams();
modParams.set(CoreAdminParams.ACTION, CollectionAction.CREATE.name());
modParams.set("name", name);
modParams.set("numShards", numShards);
modParams.set("replicationFactor", replicationFactor);
modParams.set("collection.configName", configName);
QueryRequest request = new QueryRequest(modParams);
request.setPath("/admin/collections");
return server.request(request);
}
@Test
public void test() throws Exception {
// Do your stuff here using cloudSolrServer as a normal SolrServer
}
12-26-2016
01:36 PM
deployment-hdp.png yumreposd-directory.png
Thank you so much. I rebuilt all the nodes and followed the prerequisites to prepare them. Ambari host registration with the server is now successful, but during deployment I am getting the error message below. Can you please help with this?
Log:
line 140, in _call_wrapper result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 293, in _call raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hdp-select' returned 1. One of the configured repositories failed (HDP-2.4), and yum doesn't have enough cached data to continue. At this point the only safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working upstream. This is most often useful if you are using a newer distribution release than is supported by the repository (and the packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled: yum --disablerepo=HDP-2.4 ...
4. Disable the repository permanently, so yum won't use it by default. Yum will then just ignore the repository until you permanently enable it again or use --enablerepo for temporary usage: yum-config-manager --disable HDP-2.4 or subscription-manager repos --disable=HDP-2.4
5. Configure the failing repository to be skipped, if it is unavailable. Note that yum will try to contact the repo when it runs most commands, so will have to try and fail each time (and thus yum will be much slower). If it is a very temporary problem though, this is often a nice compromise: yum-config-manager --save --setopt=HDP-2.4.skip_if_unavailable=true
failure: repodata/repomd.xml from HDP-2.4: [Errno 256] No more mirrors to try.
http://192.168.0.12/repo/HDP/centos7/2.x/updates/2.4.0.0/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
stdout: /var/lib/ambari-agent/data/output-200.txt
2016-12-26 18:21:41,206 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-26 18:21:41,208 - Group['spark'] {}
2016-12-26 18:21:41,319 - Initializing 2 repositories
2016-12-26 18:21:41,320 - Repository['HDP-2.4'] {'base_url': 'http://192.168.0.12/repo/HDP/centos7/2.x/updates/2.4.0.0/', 'action': ['create'], 'components': [u'HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP', 'mirror_list': None}
2016-12-26 18:21:41,339 - File['/etc/yum.repos.d/HDP.repo'] {'content': '[HDP-2.4]\nname=HDP-2.4\nbaseurl=http://192.168.0.12/repo/HDP/centos7/2.x/updates/2.4.0.0/\n\npath=/\nenabled=1\ngpgcheck=0'}
2016-12-26 18:21:41,341 - Repository['HDP-UTILS-1.1.0.20'] {'base_url': 'http://192.168.0.12/repo/HDP-UTILS-1.1.0.20/repos/centos7/', 'action': ['create'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2016-12-26 18:21:41,347 - File['/etc/yum.repos.d/HDP-UTILS.repo'] {'content': '[HDP-UTILS-1.1.0.20]\nname=HDP-UTILS-1.1.0.20\nbaseurl=http://192.168.0.12/repo/HDP-UTILS-1.1.0.20/repos/centos7/\n\npath=/\nenabled=1\ngpgcheck=0'}
2016-12-26 18:21:41,348 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2016-12-26 18:21:41,528 - Skipping installation of existing package unzip
2016-12-26 18:21:41,528 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2016-12-26 18:21:41,558 - Skipping installation of existing package curl
2016-12-26 18:21:41,558 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2016-12-26 18:21:41,593 - Installing package hdp-select ('/usr/bin/yum -d 0 -e 0 -y install hdp-select')
2016-12-26 18:21:41,932 - Execution of '/usr/bin/yum -d 0 -e 0 -y install hdp-select' returned 1 (same repository failure and suggested fixes as in the output above)
2016-12-26 18:21:41,932 - Failed to install package hdp-select. Executing '/usr/bin/yum clean metadata'
2016-12-26 18:21:42,267 - Retrying to install package hdp-select after 30 seconds
Command failed after 1 tries
BR-Sampath
01-03-2017
09:41 PM
Alicia, please see my answer above from Oct 24. If you are running Spark on YARN, you will have to go through the YARN RM UI to get to the Spark UI for a running job. The link for the YARN UI is available from the Ambari YARN service. For a completed job, you will need to go through the Spark History Server; the link for the Spark History Server is available from the Ambari Spark service.
12-23-2016
02:59 AM
12 Kudos
Introduction
The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier.
Optimization Approach
Batching is one of the big drivers of efficiency. To enable batching, the Kafka producer will attempt to accumulate data in memory and send out larger batches in a single request. The batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than some fixed latency bound (say 64k or 10 ms). This allows more bytes to accumulate before sending, and fewer, larger I/O operations on the servers. This buffering is configurable and gives a mechanism to trade off a small amount of additional latency for better throughput. To find the optimal batch size and latency, iterative testing supported by producer statistics monitoring is needed; a configuration sketch for this tuning loop is given just before the references at the end of this article.
Enable Monitoring
Start the producer with the JMX parameters enabled:
JMX_PORT=10102 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic
Producer Metrics
Use the jconsole application to connect via JMX at port 10102. Tip: run jconsole remotely to avoid impact on the broker machine. The metrics are listed in the MBeans tab. The clientId parameter is the producer client ID for which you want the statistics.
kafka.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=console-producer
This MBean gives the rate of producer requests taking place as well as the latencies involved in that process. It gives latencies as a mean and as the 50th, 75th, 95th, 98th, 99th, and 99.9th percentiles. It also gives the time taken to produce the data as a mean, a one-minute average, a five-minute average, and a fifteen-minute average, along with the count.
kafka.producer:type=ProducerRequestMetrics,name=ProducerRequestSize,clientId=console-producer
This MBean gives the request size for the producer: the count, mean, max, min, standard deviation, and the 50th, 75th, 95th, 98th, 99th, and 99.9th percentiles of request sizes.
kafka.producer:type=ProducerStats,name=FailedSendsPerSec,clientId=console-producer
This gives the number of failed sends per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of failed requests per second.
kafka.producer:type=ProducerStats,name=SerializationErrorsPerSec,clientId=console-producer
This gives the number of serialization errors per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of serialization errors per second.
kafka.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=console-producer
This gives the number of messages produced per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of messages produced per second.
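Tuning Sketch
To experiment with the batch size and latency bound, here is a rough sketch of the tuning loop. The property names below are the Java producer's batching settings and the values are only illustrative starting points; depending on your Kafka version, and on whether the console producer uses the old or the new producer client, the exact MBean names you observe may differ. Put the settings in a properties file, pass it to the console producer, and watch ProducerRequestSize and ProducerRequestRateAndTimeMs while adjusting:
# producer.properties (illustrative values)
batch.size=65536
linger.ms=10
JMX_PORT=10102 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic --producer.config producer.properties
Increase batch.size or linger.ms if requests are small and frequent; decrease them if end-to-end latency grows beyond your target.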
References
https://kafka.apache.org/documentation.html#monitoring
Apache Kafka Cookbook by Saurabh Minni, 2015
12-23-2016
03:51 PM
@Randy Gelhausen, thank you for the link. I added that to my favorites! :) @Constantin Stanca Thank you so much for your updated response. It provided valuable reasoning and advice, and it made Wes' article easier for me to read.