Member since: 02-09-2016
Posts: 559
Kudos Received: 422
Solutions: 98
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2110 | 03-02-2018 01:19 AM
 | 3426 | 03-02-2018 01:04 AM
 | 2339 | 08-02-2017 05:40 PM
 | 2334 | 07-17-2017 05:35 PM
 | 1694 | 07-10-2017 02:49 PM
02-24-2017
12:11 PM
1 Kudo
@Mourad Chahri If you are looking to demo just NiFi on an HDP cluster, then you can use this: https://github.com/abajwa-hw/ambari-nifi-service It is unsupported and not intended for production use. If you want to deploy Hortonworks DataFlow (HDF), which includes NiFi, then you currently have to install it using a separate Ambari instance from the one managing HDP.
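As a rough sketch of the usual install flow for that community service definition (the stack version in the path is an assumption, so match it to your HDP release and check the project's README):

```
# Sketch only: clone the NIFI service definition into Ambari's HDP stack
# services directory (the "2.5" path segment is an assumption -- use your
# HDP stack version), then restart Ambari so NiFi appears under "Add Service".
sudo git clone https://github.com/abajwa-hw/ambari-nifi-service.git \
  /var/lib/ambari-server/resources/stacks/HDP/2.5/services/NIFI
sudo ambari-server restart
```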
02-22-2017
07:06 PM
1 Kudo
@K Aleksandra As @Sonu Sahi said, you are adding some complexity by using the Sandbox. Having said that, the default Sandbox container does not expose port 8670 (Ambari Agent) and ports 8440/8441 (Ambari Server ports). You have to expose these 3 ports on the Sandbox for the agent to be able to talk to the Sandbox. You can refer to my article on how to expose ports on the Sandbox: https://community.hortonworks.com/articles/65914/how-to-add-ports-to-the-hdp-25-virtualbox-sandbox.html
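For reference, the general approach from that article looks roughly like the sketch below. The container name, image tag, start command, and port list are assumptions here; the article lists the full set of ports the Sandbox normally publishes.

```
# Sketch only: stop the sandbox container, commit it to an image, then start a
# new container that also publishes the Ambari agent/server ports.
# "sandbox" and the /usr/sbin/init start command are assumptions -- take the
# real values from your existing container (docker inspect sandbox).
docker stop sandbox
docker commit sandbox sandbox:ambari-ports
docker rm sandbox
docker run -d --name sandbox --hostname sandbox.hortonworks.com \
  -p 8080:8080 -p 8440:8440 -p 8441:8441 -p 8670:8670 \
  sandbox:ambari-ports /usr/sbin/init
```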
02-22-2017
02:40 PM
@regie canada You can certainly use Solr and Banana to do Big Data reporting. Some things to keep in mind:

1. You need to ensure you are giving Solr enough resources in terms of memory and CPU for indexing and querying your data. In test environments this isn't a huge issue, but it is certainly something to take into consideration for production environments.

2. You need to be careful with the dashboards and queries you use with Banana. It runs in a web browser, and all of the data you manipulate within Banana is loaded into memory. It's relatively easy to create very taxing queries and dashboards that consume a lot of memory and put a strain on Solr. This can also cause a lot of memory usage in the end user's web browser, making it unresponsive.

The above two points lead me to your problem. How much memory have you allocated to the Sandbox? The minimum memory requirement is 8 GB, however 10-12 GB will work much better. Are most of the components of HDP turned on in the Sandbox? All of these things take up memory and can cause the Solr JVM to run out of memory. My recommendations would be:

1. Stop any unused components in the HDP stack. This will free up some system memory.
2. Allocate more memory to the Sandbox. As I said, 10 GB works much better than 8; I prefer to give it 12 GB.
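If you do end up with headroom after freeing Sandbox memory, you can also start Solr with a larger JVM heap. A rough sketch (the heap size, port, and ZooKeeper address are placeholders for your environment):

```
# Sketch only: restart Solr in SolrCloud mode with a larger JVM heap.
# Heap size (-m), port (-p), and ZooKeeper host (-z) are placeholders.
bin/solr stop -p 8983
bin/solr start -c -z sandbox.hortonworks.com:2181 -p 8983 -m 2g
```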
02-17-2017
04:44 PM
1 Kudo
@Joseph Hawkins
1. There is nothing inherently wrong with your cluster layout for a small cluster.
2. You generally don't want to force HDFS out of safe mode. Can you determine why HDFS is in safe mode? If this is the initial configuration, there shouldn't be any good reason for it to be in safe mode.
3. I haven't found a configuration setting for HBase to wait for HDFS. The assumption is that HDFS is running properly and writable. Safe mode makes HDFS read-only, which is why HBase shuts down.

How are you starting the processes? If you are starting the processes from the command line, then you do need to ensure that HDFS comes up prior to starting HBase. Is this cluster managed with Ambari? When you start services with Ambari, it starts the services in the proper sequence and ensures each component is up before starting the next one.
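To investigate the safe mode question, something like the following should help (run as the hdfs user); only leave safe mode manually if the filesystem is healthy and you understand why it did not exit on its own:

```
# Check safe mode status and look for missing/corrupt blocks before doing anything.
hdfs dfsadmin -safemode get
hdfs fsck /
# Only if the filesystem is healthy and you understand why it stayed in safe mode:
hdfs dfsadmin -safemode leave
```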
02-15-2017
08:41 PM
1 Kudo
@Sedat Kestepe No, formatting and mounting the hard drives is not directly related to formatting HDFS. Conceptually the idea of "formatting" is the same, but the two tasks are completely separate with no direct relationship. The hadoop format command does not format or mount the hard drives; the hard drives should already be formatted and mounted. When you run the format command for HDFS, it prepares the NameNode fsimage file so that the NameNode knows where all of the storage blocks are across the data disks. If you feel my answer has been helpful, please accept it to help others.
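For context, the HDFS format step is just a one-time initialization of the NameNode metadata on a brand-new cluster:

```
# One-time initialization of the NameNode metadata (fsimage) for a new cluster.
# Never run this on a cluster that already holds data -- it discards the
# NameNode's view of the existing filesystem. Run as the hdfs user.
hdfs namenode -format
```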
02-15-2017
01:24 PM
@Maher Hattabi The default port is 50070. To get a list of files in a directory you would use: curl -i "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/output/?op=LISTSTATUS"
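If you then want to read one of the files back over WebHDFS, the same pattern applies. The file name below is only an example; note the -L so curl follows the redirect to the DataNode that serves the data:

```
# Read a file's contents via WebHDFS; OPEN redirects to a DataNode, hence -L.
# The part-r-00000 file name is only an example.
curl -i -L "http://sandbox.hortonworks.com:50070/webhdfs/v1/user/root/output/part-r-00000?op=OPEN"
```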
02-15-2017
12:32 PM
1 Kudo
@Sedat Kestepe Are you performing a manual installation or are you using Ambari? I highly recommend you use Ambari if you can, as it will take care of things such as formatting HDFS.

For the HDFS slave nodes, you should format the data drives individually at the OS level, then mount each drive at its own mount path, such as /grid/disk01, /grid/disk02, etc. (see the sketch below). You should not use RAID for your data drives.

For the master servers, if you want to use RAID 1 to create mirrors for the NameNode directories, ZooKeeper directories, etc., then you should also do that at the OS level before installing HDP. Once you have created the RAID configuration for the drives, mount them at the OS level. During the installation process with Ambari you can then specify that you want to use those OS-mounted locations for the directories.

To use Ambari, follow these instructions: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-installation/content/ch_Getting_Ready.html
You may find this HCC article helpful: https://community.hortonworks.com/articles/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html
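As a rough example of the per-disk prepare-and-mount step on a slave node (the device name, filesystem, and mount point are placeholders; repeat for each data disk):

```
# Sketch only: format one data disk and mount it at its own path.
# /dev/sdb and /grid/disk01 are placeholders for your hardware.
mkfs -t ext4 /dev/sdb
mkdir -p /grid/disk01
mount /dev/sdb /grid/disk01
# Make the mount persistent across reboots (noatime is a common choice for HDFS data disks).
echo "/dev/sdb /grid/disk01 ext4 defaults,noatime 0 0" >> /etc/fstab
```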
02-15-2017
12:25 PM
1 Kudo
@Sedat Kestepe
This process is not specific to your configuration. The normal way to interact with Solr is to define a schema configuration for a collection and then create the collection using that configuration. The data_driven_schema_configs is a schemaless configuration, which allows you to push data to Solr without having to explicitly define every field in advance. However, Solr always requires that you create a collection that is associated with a configuration.

The easy approach is to upload the data_driven_schema_configs to Zookeeper and create all of your collections using that configuration. If you use the command tool to create a collection, it will automatically upload the configuration to Zookeeper. If you have a standard config, such as data_driven_schema_configs, you can upload that config to Zookeeper, then use the Collections API to create collections multiple times using the same configuration. You can read more here: https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files

If you just want to upload your configs to Zookeeper, use:

$ bin/solr zk upconfig -n <name for configset> -d <path to directory with configset>

Then you can use the Collections API with the collection.configName parameter to refer to the configuration you uploaded to Zookeeper.
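Putting the two steps together, a rough end-to-end example (the config name, collection name, configset path, Solr host, and ZooKeeper address are placeholders for your environment):

```
# Sketch only: upload a configset to ZooKeeper, then create a collection that
# references it via collection.configName. All names/hosts are placeholders.
bin/solr zk upconfig -n my_configs -d server/solr/configsets/data_driven_schema_configs -z sandbox.hortonworks.com:2181
curl "http://sandbox.hortonworks.com:8983/solr/admin/collections?action=CREATE&name=my_collection&numShards=1&replicationFactor=1&collection.configName=my_configs"
```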
02-11-2017
07:18 PM
@Sedat Kestepe If you want to create collections using the API (https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-CREATE:CreateaCollection), the configuration (collection.configName) must already be stored in Zookeeper.
02-11-2017
07:00 PM
@Greg Frair As @Sunile Manjee pointed out, the Sandbox isn't intended for long-running usage. Most Hadoop components can generate a lot of logging, and my guess is that logging is the likely culprit. You don't have to be heavily using the Sandbox for the components to generate logs; there are regular service checks that occur that will generate logs. When you are not using the Sandbox, you should shut it down or suspend it.
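To confirm that logs are what is eating the disk, you can check where the space is going inside the Sandbox; a quick sketch (log paths are the usual defaults, adjust as needed):

```
# See overall disk usage, then list the largest directories under /var/log.
df -h
du -sh /var/log/* 2>/dev/null | sort -h | tail -20
```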