Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6963 | 09-21-2018 09:54 PM |
| | 8722 | 03-31-2018 03:59 AM |
| | 2614 | 03-31-2018 03:55 AM |
| | 2754 | 03-31-2018 03:31 AM |
| | 6175 | 03-27-2018 03:46 PM |
12-29-2016
12:16 AM
3 Kudos
These tools are used the same way as in any software SDLC; the difference is that the software you develop executes on a Hadoop/Spark cluster. You can still build your jars the same way and use Git as your source code repository. You will be submitting the job for execution on a distributed cluster; however, there are pseudo-clusters suitable for development. For example, you can use the Hadoop mini cluster: https://github.com/sakserv/hadoop-mini-clusters
A good reference on how to use this mini cluster for testing: http://www.lopakalogic.com/articles/hadoop-articles/hadoop-testing-with-minicluster/ For Spark development you could use Spark standalone.
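As a minimal sketch of cluster-free development, the following Java program runs a trivial Spark job entirely in local mode; the class name and app name are placeholders:

```java
import org.apache.spark.sql.SparkSession;

public class LocalSparkSmokeTest {
    public static void main(String[] args) {
        // local[*] runs Spark inside this JVM, so the same jar you build
        // for the cluster can be smoke-tested on a laptop or in CI.
        SparkSession spark = SparkSession.builder()
                .appName("local-smoke-test")
                .master("local[*]")
                .getOrCreate();

        // Trivial job: count a generated range to verify the wiring.
        long count = spark.range(0, 1000).count();
        if (count != 1000) {
            throw new IllegalStateException("Expected 1000 rows, got " + count);
        }

        spark.stop();
    }
}
```

The same jar can then be submitted to a real cluster with spark-submit, swapping the hard-coded master for one supplied at submit time.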
12-28-2016
11:59 PM
3 Kudos
@Eric Periard Check this: https://community.hortonworks.com/questions/59122/hadoop-examples-110-snapshotjar-missing.html It would have helped if the documentation were more precise. Duly noted. Thanks.
12-28-2016
11:37 PM
1 Kudo
Start here: http://lucene.apache.org/solr/quickstart.html Search for "Indexing Solr XML" and perform the steps indicated. At the end, you can browse the indexed documents at http://localhost:8983/solr/gettingstarted/browse. That is what the output you are interested in looks like. Of course, replace "localhost" with your own host in the URL. The /browse UI defaults to assuming the gettingstarted schema, and its sample data is a catch-all mix of structured XML, JSON, and CSV example data plus unstructured rich documents. Your own data may not look ideal at first, though the /browse templates are customizable.
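If you prefer to index documents from code rather than with the quickstart's post tool, here is a minimal SolrJ sketch, assuming SolrJ 6.x (where HttpSolrClient.Builder is available) and the quickstart's gettingstarted collection; the field values are made up:

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDoc {
    public static void main(String[] args) throws Exception {
        // Point the client at the quickstart's gettingstarted collection.
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/gettingstarted").build();

        // Equivalent of one <doc> entry from the example Solr XML files.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "Hello Solr");

        solr.add(doc);
        solr.commit(); // make the document visible at /browse
        solr.close();
    }
}
```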
12-28-2016
11:16 PM
2 Kudos
@rama This package makes it easy to transform data between wide and long formats. Package versions are backward compatible until a function is deprecated. If a function is deprecated, your scripts may not work, but that does not mean you have lost data. Your script could throw an error if any of the functions it calls don't support the API invoked. Most likely your newer version of the package worked just fine. Assuming that you installed a newer version of the package and still want to go back, you can revert to the previous version using the approach presented in this blog: https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
12-28-2016
11:06 PM
1 Kudo
@Shihab The Hive view calls GetResultSetMetadata without first verifying query completion with GetOperationStatus, which results in an error status from HiveServer. This is bug https://issues.apache.org/jira/browse/AMBARI-13575, which was supposed to be fixed in 2.2.0. However, you may be encountering a variation of the same issue for which there is no tracking ticket yet. You should log in to https://issues.apache.org/jira/browse/AMBARI-19313?jql=project%20%3D%20AMBARI and submit the issue with proper documentation. If you are still on Ambari 2.2.2.2, try to upgrade to a more recent version. I have seen the error you describe in 2.2.x versions, and I recall similar issues in 2.3.x, but I haven't seen it in 2.4.x.
12-28-2016
10:35 PM
2 Kudos
@Patrick Wong The reason is that your user home path contains a space. You need to "qualify" the path by putting quotes around it, space included. You should do that in every XML file in the hdp directory that contains the argument -Dhadoop.id.str, for all the services you start. To generalize the proposed solution: if a service fails with this error, search for its servicename.xml in the C:\hdp folder. For example, if the namenode fails (see the sketch after this list):
1. Search for namenode.xml in C:\hdp.
2. Open the XML file in a text editor.
3. Find the key -Dhadoop.id.str in the file.
4. Put the value in quotes if it contains a space, e.g. change -Dhadoop.id.str=My Name to -Dhadoop.id.str="My Name".
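For illustration only, a hypothetical fragment of such a service XML file; the surrounding element name and the other flags are assumptions, and the only point is the quoting of the value that contains a space:

```xml
<!-- Hypothetical fragment of a service XML under C:\hdp (e.g. namenode.xml).
     The element name is illustrative; the fix is quoting the value. -->
<arguments>-Xmx1024m -Dhadoop.id.str="My Name" -Dhadoop.root.logger=INFO,DRFA</arguments>
```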
12-28-2016
10:07 PM
5 Kudos
Set the retention and aggregation timeline for YARN logging; YARN logs, including the summary, will be removed after the configured timeline. Go to https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml and search for "log"; you will find the parameters that let you manage log retention, for both detailed and aggregated logs. I think you are looking for yarn.resourcemanager.max-completed-applications, which is set to 10,000 by default. You could change that value via Ambari; you may have to restart Ambari. Test in a development or test environment to determine the best combination of settings for your case. A good blog, even if a bit old: http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
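For instance, a minimal sketch of the corresponding yarn-site.xml overrides; the values below are illustrative placeholders, not recommendations:

```xml
<!-- Illustrative yarn-site.xml overrides; choose values for your environment. -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <!-- How many completed applications the ResourceManager keeps (default 10000). -->
  <value>1000</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- How long to keep aggregated logs; e.g. 604800 seconds = 7 days. -->
  <value>604800</value>
</property>
```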
12-28-2016
09:18 PM
1 Kudo
For equal rowkeys the order is random; otherwise, results are sorted by rowkey as a String. The behavior is the same in Hadoop 1.x and 2.x.
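Assuming this refers to HBase scan ordering, here is a minimal client sketch that demonstrates the sorted iteration; the table name and configuration are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanOrderDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("demo_table"))) {
            // A full-table scan returns rows sorted by rowkey
            // (lexicographically over the rowkey bytes).
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}
```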
12-28-2016
09:00 PM
1 Kudo
@Roberto Sancho Are you running HDP or CDH, and which version? Which tutorials? Please provide links to the tutorials. Check your library versions and their alignment. Also, your script seems to pass a null value where a value is expected. Validate your input.
12-28-2016
08:48 PM
@vikas reddy Most likely you have mismatched versions. Check the Lucene library.