Member since 08-23-2016
261 Posts
201 Kudos Received
106 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 922 | 01-26-2018 07:28 PM |
 | 768 | 11-29-2017 04:02 PM |
 | 27263 | 11-29-2017 03:56 PM |
 | 1690 | 11-28-2017 01:01 AM |
 | 398 | 11-22-2017 04:08 PM |
09-08-2017
04:25 PM
1 Kudo
Hi @Pingping Shang Ambari Infra is a specialized deployment meant for Ambari's own internal use. If you want to index your own data, the recommended approach is to use HDP Search.
09-08-2017
03:15 PM
@Yuki Iwasaki From your screenshots, it looks like you can get to the Docker port on 12122. If you log in to that one via SSH, you can follow the steps in the link I shared with you above to start the HDF sandbox VM, and try that.
09-07-2017
03:20 PM
Hi @Yuki Iwasaki Does SSH work? The HDF Sandbox had a quirk in the past: if the VM was ever shut down, it would need to be restarted using the scripts within the Docker host. If this is the case, try suspending/pausing instead of shutting the machine down. See the post here for a detailed walkthrough in case it resolves your issue: https://community.hortonworks.com/questions/103790/hdf-sandbox-strange-issue.html#answer-103794
09-06-2017
03:36 PM
Hi @Sanaz Janbakhsh You could probably achieve that by combining processors. Use the Tika-based processor to extract everything from the PDF as plain text, then use another processor (for example, ExtractText with a regular expression that matches your content) to pull out the specific text you want, and decide what to do with that content from there.
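To make the second step more concrete, here is a rough sketch in Python (not NiFi configuration) of the kind of regular expression you might hand to ExtractText. The sample text, the "Invoice No" label, and the function name are all made up for illustration; your own pattern depends on what the extracted PDF text looks like.

```python
import re

# Hypothetical pattern: capture the token that follows "Invoice No:".
# Replace it with whatever marks the content you are after.
INVOICE_PATTERN = re.compile(r"Invoice No:\s*(\S+)")


def extract_field(text, pattern=INVOICE_PATTERN):
    """Return the first capture group of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Made-up stand-in for the text lifted out of a PDF.
sample = "ACME Corp\nInvoice No: INV-0042\nTotal: 19.99\n"
print(extract_field(sample))
```

Trying the expression against a few samples of the extracted text like this, before putting it into the processor, tends to save a lot of trial and error in the flow itself.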
09-05-2017
03:18 PM
@John T Apart from using the FIFO prioritizer on all of your connections, have you looked at the EnforceOrder processor in the latest version of NiFi? I think it does what you want: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.3.0/org.apache.nifi.processors.standard.EnforceOrder/index.html
09-05-2017
03:09 PM
Hi @Shota Akhalaia In most of the deployments I have been involved with, these services were also installed on bare metal machines, since a lot of organizations tend to run HDP and HDF together. That said, I think these should be OK to virtualize as well.
09-01-2017
03:23 PM
Hi @Manikandan Urithevan You can always edit your post and remove anything that was not safe to post.
09-01-2017
03:17 PM
1 Kudo
Hi @Shota Akhalaia The master services for the various technologies are not usually overly IO heavy and can therefore be virtualized (and backed by SAN) without too much of an issue. That includes the NameNode (NN) as well as the master services for the other technologies within the platform. Keeping the worker nodes on physical hardware can help you maximize your cluster's performance.
09-01-2017
03:09 PM
Hi @YoungJoon Ji There is a good article here explaining NiFi's zero-master clustering: https://hortonworks.com/blog/apache-nifi-1-0-0-zero-master-clustering/ You may want to look at the tutorials online to get familiar with NiFi and the processors: https://hortonworks.com/tutorial/realtime-event-processing-in-hadoop-with-nifi-kafka-and-storm/
08-31-2017
05:11 PM
Hi @Sanaz Janbakhsh I just did a quick test using GetFile to ingest a PDF, and used the custom processor as is, without any configuration. I then used PutFile to drop the extracted text into a directory. As expected, the output is the text lifted from the original PDF, as a plain text file. No special configuration required. If you are looking to play with the metadata using Tika, you can look at the ExtractMediaMetadata processor, which comes with modern versions of NiFi out of the box and uses Tika under the hood.
08-31-2017
03:51 PM
@Samir Sinha Happy to help. If this solved the problem, perhaps you can Accept the answer above.
08-31-2017
03:01 PM
2 Kudos
Hi @Samir Sinha If you are using a 2.6 based Sandbox, the Druid service should be there already but needs to be installed. My 2.6 Sandbox is from 05_05_2017 (based on the Sandbox filename), and it contains the Druid service, but the service is not installed by default. In Ambari, on the bottom left, click the Add Service button, choose Druid from the list of available services, and continue through the setup.
08-30-2017
09:26 PM
Hi @Sravanthi Bellamkonda You can run the following command to list/locate corrupt blocks: hdfs fsck -list-corruptfileblocks If the affected files are not critical data, then just deleting them might solve your problem (note that this removes the corrupted files themselves, so list them first): hdfs fsck / -delete There is a good thread with links to more detail on finding/removing corrupt HDFS blocks here: https://community.hortonworks.com/questions/17917/best-way-of-handling-corrupt-or-missing-blocks.html
08-30-2017
09:09 PM
1 Kudo
Hi @Alexander Carreño I've run into this a couple of times myself. Usually just closing the tab, and coming back in allows me to access the post again. It could be issues following the upgrade from a couple of weeks ago, not sure.
08-29-2017
03:05 PM
Hi @John Koop I saw the screenshots, and screenshot #1 looks good. Once you are there, can you access Ambari on http://127.0.0.1:8080 ? If so, you can start the tutorials; the welcome page on port 8888 is not really mandatory.
08-28-2017
06:47 PM
Hi @John Koop On your local laptop, you can use a text editor to add an entry to your hosts file (/etc/hosts if you are using a *nix or Mac machine) that looks like the following: 127.0.0.1 sandbox.hortonworks.com sandbox
This allows your local laptop/PC to resolve "sandbox.hortonworks.com" in a browser to your local host's IP address, 127.0.0.1. When you import the appliance into VirtualBox, you can start the VM and, as you have already noted, boot into the third option in the list of Linux OSes to start the Hortonworks Sandbox software. When the software boots, you should see a screen that advises you to go to the start page at http://127.0.0.1:8888 or http://sandbox.hortonworks.com:8888 This start page contains some introductory material, as well as links to other areas like Ambari (http://sandbox.hortonworks.com:8080). You should be able to follow the tutorial once you have booted into the correct OS (the third one) and are able to bring up the screens in a browser.
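If you want to double-check the hosts entry before booting the VM, something like the following Python sketch can help. It only inspects text you pass in; the function name and the sample contents are illustrative, and on a real machine you would read the file yourself (for example /etc/hosts on *nix or Mac).

```python
def hosts_entry_present(hosts_text, ip, hostname):
    """Return True if some line in hosts_text maps hostname to ip."""
    for raw_line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # First field is the address, the rest are names/aliases.
        if fields[0] == ip and hostname in fields[1:]:
            return True
    return False


# Illustrative contents; substitute the text of your own hosts file.
sample_hosts = (
    "127.0.0.1 localhost\n"
    "# sandbox entry\n"
    "127.0.0.1 sandbox.hortonworks.com sandbox\n"
)
print(hosts_entry_present(sample_hosts, "127.0.0.1", "sandbox.hortonworks.com"))
```

On a *nix or Mac machine you could call it as `hosts_entry_present(open("/etc/hosts").read(), "127.0.0.1", "sandbox.hortonworks.com")`.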
08-24-2017
10:54 PM
Hi @L Hurley Are you specifying the port? http://127.0.0.1:8080/ When you are logged in as root@sandbox, also try running: ambari-server status and ensure it's running.
08-24-2017
10:49 PM
1 Kudo
Hi @Ramya Jayathirtha Try the following: create table movies( movieid int, title string, genre string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'; This should resolve your issue, as the OpenCSVSerde's default properties should work for your use case.
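To illustrate why the defaults should be enough: as I understand them, OpenCSVSerde defaults to a comma separator, a double-quote quote character, and a backslash escape character, so a quoted title that contains a comma stays in one column. The Python sketch below only approximates that behaviour with the standard csv module; it is not the SerDe itself, and the sample row is made up. Also note that, as far as I know, OpenCSVSerde exposes every column as a string regardless of the declared type, so you may need to cast movieid in your queries.

```python
import csv
import io


def parse_like_opencsv(line):
    """Split one CSV line using separator ',', quote '"' and escape '\\'."""
    reader = csv.reader(
        io.StringIO(line),
        delimiter=",",
        quotechar='"',
        escapechar="\\",
    )
    return next(reader)


# Made-up row: the title contains a comma, so it is quoted.
row = parse_like_opencsv('11,"American President, The (1995)",Comedy|Drama|Romance')
print(row)
```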
08-24-2017
03:07 PM
You'll need to check the entire error message; typically the root cause is somewhere near the bottom.
08-24-2017
03:06 PM
Hi @L Hurley From the snippet you pasted, you are already logged in! Your prompt shows root@sandbox, which means you are logged into the sandbox as root, so ssh'ing in a second time would surely result in an error.
08-23-2017
04:07 PM
Hi @Prasad T The URL config doesn't look correct; the http:// scheme appears twice: <url>http://http://myhost:18081/</url>
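If the value is generated by a script or template, a small guard like the one below can catch this kind of mistake. This is just a sketch in Python; the function name is made up, and it only handles a repeated http:// or https:// prefix.

```python
import re

# One scheme followed by one or more further schemes at the start of the string.
_REPEATED_SCHEME = re.compile(r"^(https?://)(?:https?://)+", re.IGNORECASE)


def collapse_repeated_scheme(url):
    """Keep the first http(s):// prefix and drop any that directly follow it."""
    return _REPEATED_SCHEME.sub(r"\1", url)


print(collapse_repeated_scheme("http://http://myhost:18081/"))
```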
08-23-2017
04:05 PM
Hi @L Hurley What you want to do is open the known_hosts file in a text editor and delete any lines that correspond to the IP address of the sandbox, which in your case is 127.0.0.1, and then try again.
08-23-2017
02:55 PM
Hi @Ashnee Sharma You'll likely need to go through the whole error message; the underlying root cause will be in the error/exception message. Typically, you see this when the DataNode storage is full or when network communication was severed.
08-22-2017
02:35 PM
Hi @nfleming No, I am not claiming that. You can change the property within the same session as you see fit.
08-21-2017
10:57 PM
1 Kudo
Looks like the host key has changed. On your local system, open .ssh/known_hosts and delete any lines related to the sandbox. Then try the ssh again with -p 2222; you'll be prompted to re-accept the key and should be good to go.
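If you would rather not edit the file by hand, the clean-up can be sketched in Python as below. The function name and sample keys are made up. It relies on the usual known_hosts layout, where the host field comes first and a non-default port is written as [host]:port; hashed entries (host fields starting with |1|) will not match, so treat it as a starting point and keep a backup of the file.

```python
def drop_host_entries(known_hosts_text, host, port=22):
    """Return known_hosts_text without the lines that name host on port."""
    # Entries for a non-default port are stored as "[host]:port".
    target = host if port == 22 else f"[{host}]:{port}"
    kept = []
    for line in known_hosts_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            kept.append(line)  # keep blank lines and comments untouched
            continue
        fields = stripped.split()
        # Lines may begin with a marker such as @cert-authority or @revoked.
        if fields[0].startswith("@") and len(fields) > 1:
            host_field = fields[1]
        else:
            host_field = fields[0]
        if target in host_field.split(","):
            continue  # drop the stale sandbox entry
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else "")


# Made-up file contents with placeholder keys.
sample = (
    "[127.0.0.1]:2222 ssh-rsa AAAAexample1\n"
    "git.example.org ssh-ed25519 AAAAexample2\n"
)
print(drop_host_entries(sample, "127.0.0.1", port=2222), end="")
```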
08-21-2017
09:21 PM
Hi @L Hurley Can you try the ssh command again but use port 2222 ? 8080 is usually the default web port for the Ambari UI.
08-21-2017
09:15 PM
Hi @Shyam Shaw Are you using the Hortonworks Data Cloud product? I believe that setup is only for the Ambari DB if you are not using the Hortonworks Data Cloud product. The supported databases for HDP (and not HDC) are listed here: https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-SupportedBackendDatabasesforMetastore
08-21-2017
03:44 PM
1 Kudo
Hi @scottgr Writing scripts is usually the most popular way. The scripts can be automated through a number of ways including oozie/workflow manager UI in Ambari. While it is a common task, each use case is different. We can't make a general assumption about any cluster since to some people, older data may be more important than newer data depending on the data set. It gets even further complicated in multi-tenant datalakes as you can imagine.
08-21-2017
03:40 PM
Hi @Timo Burmeister You may find this tutorial helpful to follow. https://hortonworks.com/hadoop-tutorial/searching-data-solr/
08-21-2017
03:25 PM
1 Kudo
Hi @zkfs I haven't seen a query to do that yet in Hive. Instead, you can query the Hive metastore for the information, though be mindful that queries run directly against the metastore could impact your Hive performance and are not recommended for production systems. Look at the TBL_PRIVS and TBLS tables within the Hive DB in the metastore; joining these on TBL_ID may give you the table view you are looking for. You can probably construct a similar metastore query to look at it by PRINCIPAL_TYPE (role) as well.
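As a rough illustration of that join, the Python sketch below builds two tiny stand-in tables in an in-memory SQLite database and runs the query against them. Only TBLS, TBL_PRIVS, TBL_ID and PRINCIPAL_TYPE come from the suggestion above; the other column names (TBL_NAME, PRINCIPAL_NAME, TBL_PRIV), the sample rows, and the function name are assumptions, and the real metastore schema depends on your Hive version and backing database, so check it before adapting the SQL.

```python
import sqlite3

# Stand-in tables with a reduced, assumed set of columns and made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT);
CREATE TABLE TBL_PRIVS (
    TBL_GRANT_ID INTEGER PRIMARY KEY,
    TBL_ID INTEGER,
    PRINCIPAL_NAME TEXT,
    PRINCIPAL_TYPE TEXT,
    TBL_PRIV TEXT
);
INSERT INTO TBLS VALUES (1, 'movies'), (2, 'ratings');
INSERT INTO TBL_PRIVS VALUES
    (10, 1, 'analyst', 'ROLE', 'SELECT'),
    (11, 1, 'etl_user', 'USER', 'INSERT'),
    (12, 2, 'analyst', 'ROLE', 'SELECT');
""")


def table_privileges(connection, principal_type=None):
    """List (table, principal type, principal, privilege) rows, optionally filtered."""
    sql = (
        "SELECT t.TBL_NAME, p.PRINCIPAL_TYPE, p.PRINCIPAL_NAME, p.TBL_PRIV "
        "FROM TBL_PRIVS p JOIN TBLS t ON t.TBL_ID = p.TBL_ID "
    )
    params = ()
    if principal_type is not None:
        sql += "WHERE p.PRINCIPAL_TYPE = ? "
        params = (principal_type,)
    sql += "ORDER BY t.TBL_NAME, p.PRINCIPAL_NAME"
    return connection.execute(sql, params).fetchall()


for row in table_privileges(conn):
    print(row)
```

Calling `table_privileges(conn, "ROLE")` narrows the result to grants made to roles, which corresponds to the PRINCIPAL_TYPE view mentioned above.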