Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 6963 | 09-21-2018 09:54 PM |
| | 8722 | 03-31-2018 03:59 AM |
| | 2614 | 03-31-2018 03:55 AM |
| | 2754 | 03-31-2018 03:31 AM |
| | 6175 | 03-27-2018 03:46 PM |
12-29-2016
12:16 AM
3 Kudos
These tools are used the same way as in any software SDLC; the difference is that the software you develop executes on a Hadoop/Spark cluster. You can still build your jars the same way and use Git as your source code repository. You will be submitting the job for execution on a distributed cluster; however, there are pseudo-clusters suitable for development. For example, you can use the Hadoop mini cluster: https://github.com/sakserv/hadoop-mini-clusters
A good reference on how to use this mini cluster for testing: http://www.lopakalogic.com/articles/hadoop-articles/hadoop-testing-with-minicluster/ For Spark development you could use Spark standalone.
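As a minimal sketch of cluster-free development, the following Java program runs a trivial Spark job entirely in local mode; the class name and app name are placeholders:

```java
import org.apache.spark.sql.SparkSession;

public class LocalSparkSmokeTest {
    public static void main(String[] args) {
        // local[*] runs Spark inside this JVM, so the same jar you build
        // for the cluster can be smoke-tested on a laptop or in CI.
        SparkSession spark = SparkSession.builder()
                .appName("local-smoke-test")
                .master("local[*]")
                .getOrCreate();

        // Trivial job: count a generated range to verify the wiring.
        long count = spark.range(0, 1000).count();
        if (count != 1000) {
            throw new IllegalStateException("Expected 1000 rows, got " + count);
        }

        spark.stop();
    }
}
```

The same jar can then be submitted to a real cluster with spark-submit, swapping the hard-coded master for one supplied at submit time.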
12-28-2016
11:59 PM
3 Kudos
@Eric Periard Check this: https://community.hortonworks.com/questions/59122/hadoop-examples-110-snapshotjar-missing.html It would have helped if the documentation were more precise. Duly noted. Thanks.
12-28-2016
11:37 PM
1 Kudo
Start here: http://lucene.apache.org/solr/quickstart.html Search for "Indexing Solr XML" and perform the steps indicated. At the end, you can browse the indexed documents at http://localhost:8983/solr/gettingstarted/browse. That is what the output you are interested in looks like. Of course, replace "localhost" with your own host in the URL. The /browse UI defaults to assuming the gettingstarted schema, and its sample data is a catch-all mix of structured XML, JSON, and CSV example data plus unstructured rich documents. Your own data may not look ideal at first, though the /browse templates are customizable.
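If you prefer to index documents from code rather than with the quickstart's post tool, here is a minimal SolrJ sketch, assuming SolrJ 6.x (where HttpSolrClient.Builder is available) and the quickstart's gettingstarted collection; the field values are made up:

```java
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class IndexOneDoc {
    public static void main(String[] args) throws Exception {
        // Point the client at the quickstart's gettingstarted collection.
        HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/gettingstarted").build();

        // Equivalent of one <doc> entry from the example Solr XML files.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title", "Hello Solr");

        solr.add(doc);
        solr.commit(); // make the document visible at /browse
        solr.close();
    }
}
```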
12-28-2016
11:16 PM
2 Kudos
@rama This package makes it easy to transform data between wide and long formats. Package versions are backward compatible until a function is deprecated. If a function is deprecated, your scripts may not work, but that does not mean you have lost data. Your script could throw an error if any of the functions it calls don't support the API invoked. Most likely your newer version of the package worked just fine. Assuming that you installed a newer version of the package and still want to go back, you can revert to the previous version using the approach presented in this blog: https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages
12-28-2016
11:06 PM
1 Kudo
@Shihab The Hive view calls GetResultSetMetadata without first verifying query completion with GetOperationStatus, which results in an error status from HiveServer. This is bug https://issues.apache.org/jira/browse/AMBARI-13575, which was supposed to be fixed in 2.2.0. However, you may be encountering a variation of the same issue for which there is no tracking ticket yet. You should log in to https://issues.apache.org/jira/browse/AMBARI-19313?jql=project%20%3D%20AMBARI and submit the issue with proper documentation. If you are still on Ambari 2.2.2.2, try to upgrade to a more recent version. I have seen the error you describe in 2.2.x versions, and I recall similar issues in 2.3.x, but I haven't seen it in 2.4.x.
12-28-2016
10:35 PM
2 Kudos
@Patrick Wong The reason is that your user home path contains a space. You need to "qualify" the path by putting quotes around it, space included. You should do that in every XML file in the hdp directory that contains the argument -Dhadoop.id.str, for all the services you start. To generalize the proposed solution: if a service fails with this error, search for its servicename.xml in the C:\hdp folder. For example, if the namenode fails (see the sketch after this list):
1. Search for namenode.xml in C:\hdp.
2. Open the XML file in a text editor.
3. Find the key -Dhadoop.id.str in the file.
4. Put the value in quotes if it contains a space, e.g. change -Dhadoop.id.str=My Name to -Dhadoop.id.str="My Name".
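For illustration only, a hypothetical fragment of such a service XML file; the surrounding element name and the other flags are assumptions, and the only point is the quoting of the value that contains a space:

```xml
<!-- Hypothetical fragment of a service XML under C:\hdp (e.g. namenode.xml).
     The element name is illustrative; the fix is quoting the value. -->
<arguments>-Xmx1024m -Dhadoop.id.str="My Name" -Dhadoop.root.logger=INFO,DRFA</arguments>
```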
12-28-2016
10:07 PM
5 Kudos
Set the retention and aggregation timeline for YARN logging; YARN logs, including the summary, will be removed after the configured timeline. Go to https://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml and search for "log"; you will find the parameters that let you manage log retention, for both detailed and aggregated logs. I think you are looking for yarn.resourcemanager.max-completed-applications, which is set to 10,000 by default. You could change that value via Ambari; you may have to restart Ambari. Test in a development or test environment to determine the best combination of settings for your case. A good blog, even if a bit old: http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
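For instance, a minimal sketch of the corresponding yarn-site.xml overrides; the values below are illustrative placeholders, not recommendations:

```xml
<!-- Illustrative yarn-site.xml overrides; choose values for your environment. -->
<property>
  <name>yarn.resourcemanager.max-completed-applications</name>
  <!-- How many completed applications the ResourceManager keeps (default 10000). -->
  <value>1000</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <!-- How long to keep aggregated logs; e.g. 604800 seconds = 7 days. -->
  <value>604800</value>
</property>
```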
12-28-2016
09:18 PM
1 Kudo
For equal rowkeys the order is random; otherwise, results are sorted by rowkey as a String. The behavior is the same in Hadoop 1.x and 2.x.
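Assuming this refers to HBase scan ordering, here is a minimal client sketch that demonstrates the sorted iteration; the table name and configuration are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanOrderDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("demo_table"))) {
            // A full-table scan returns rows sorted by rowkey
            // (lexicographically over the rowkey bytes).
            try (ResultScanner scanner = table.getScanner(new Scan())) {
                for (Result result : scanner) {
                    System.out.println(Bytes.toString(result.getRow()));
                }
            }
        }
    }
}
```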
12-28-2016
09:00 PM
1 Kudo
@Roberto Sancho Are you running HDP or CDH, and which version? Which tutorials? Please provide links to the tutorials. Check your library versions and their alignment. Also, your script seems to pass a null value where a value is expected. Validate your input.
12-28-2016
08:48 PM
@vikas reddy Most likely you have mismatched versions. Check the Lucene library.