Member since: 09-24-2015
Posts: 47
Kudos Received: 21
Solutions: 8
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 13384 | 06-07-2017 09:09 PM |
| 607 | 03-28-2017 04:46 PM |
| 639 | 12-08-2016 10:33 PM |
| 644 | 11-15-2016 05:41 PM |
| 2356 | 09-23-2016 04:26 PM |
07-28-2017
05:17 PM
The sandbox is intended to be run on a desktop with a NAT network interface. It's really not designed to be on a server with multiple people accessing it, and using it like this will likely result in errors, warnings, difficulty accessing services, etc. For a "shared sandbox", the best option is probably to run the sandbox in a cloud environment such as AWS. This is described at https://community.hortonworks.com/articles/103754/hdp-sandbox-on-aws-1.html. If you'd still like to give it a try in your environment, just be aware that there are several ports that have to be forwarded in order to access the services / components of the Sandbox. Here are a couple of links that should help:
- Default Sandbox port forwards - https://hortonworks.com/tutorial/hortonworks-sandbox-guide/section/3/
- Port forwarding guide - https://hortonworks.com/tutorial/sandbox-port-forwarding-guide/
07-27-2017
09:40 PM
1 Kudo
@Shubham Saxena Cloudbreak does support the "configurations" section; below is an example of one I have running in AWS right now. You might verify that you don't have incorrect or extraneous characters in there somewhere, and that the blueprint is otherwise formatted correctly. For proper formatting and structure, reviewing https://cwiki.apache.org/confluence/display/AMBARI/Blueprints#Blueprints-BlueprintStructure might be helpful.
{
"host_groups": [
{
"name": "host_group_master_1",
"configurations": [],
"components": [
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "HISTORYSERVER"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "NAMENODE"
},
{
"name": "OOZIE_SERVER"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "FALCON_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "MAPREDUCE2_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "host_group_master_2",
"configurations": [],
"components": [
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "PIG"
},
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "HIVE_SERVER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "HIVE_METASTORE"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "MYSQL_SERVER"
},
{
"name": "MAPREDUCE2_CLIENT"
},
{
"name": "RESOURCEMANAGER"
},
{
"name": "WEBHCAT_SERVER"
}
],
"cardinality": "1"
},
{
"name": "host_group_master_3",
"configurations": [],
"components": [
{
"name": "ZOOKEEPER_SERVER"
},
{
"name": "APP_TIMELINE_SERVER"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "HBASE_MASTER"
},
{
"name": "HBASE_CLIENT"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "SECONDARY_NAMENODE"
}
],
"cardinality": "1"
},
{
"name": "host_group_client_1",
"configurations": [],
"components": [
{
"name": "ZOOKEEPER_CLIENT"
},
{
"name": "PIG"
},
{
"name": "OOZIE_CLIENT"
},
{
"name": "HBASE_CLIENT"
},
{
"name": "HCAT"
},
{
"name": "KNOX_GATEWAY"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "FALCON_CLIENT"
},
{
"name": "TEZ_CLIENT"
},
{
"name": "SLIDER"
},
{
"name": "SQOOP"
},
{
"name": "HDFS_CLIENT"
},
{
"name": "HIVE_CLIENT"
},
{
"name": "YARN_CLIENT"
},
{
"name": "METRICS_COLLECTOR"
},
{
"name": "MAPREDUCE2_CLIENT"
}
],
"cardinality": "1"
},
{
"name": "host_group_slave_1",
"configurations": [],
"components": [
{
"name": "HBASE_REGIONSERVER"
},
{
"name": "NODEMANAGER"
},
{
"name": "METRICS_MONITOR"
},
{
"name": "DATANODE"
}
]
}
],
"Blueprints": {
"blueprint_name": "hdp-small-default",
"stack_name": "HDP",
"stack_version": "2.6"
}
}
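As a quick sanity check before submitting a blueprint, a short Python sketch (this is only an illustration, not Cloudbreak's actual validation) can confirm that the JSON parses cleanly and that every host group carries a "configurations" list; `check_blueprint` is a hypothetical helper written for this answer.

```python
import json

# Minimal structural check for an Ambari blueprint (a sketch, not Cloudbreak's
# real validator): the JSON must parse, and each host group should have
# "components" and a "configurations" list, mirroring the example above.
def check_blueprint(text):
    bp = json.loads(text)  # raises ValueError on stray / extraneous characters
    problems = []
    for group in bp.get("host_groups", []):
        name = group.get("name", "<unnamed>")
        if not isinstance(group.get("configurations"), list):
            problems.append("%s: missing or non-list 'configurations'" % name)
        if not group.get("components"):
            problems.append("%s: no components defined" % name)
    if "Blueprints" not in bp:
        problems.append("top level: missing 'Blueprints' section")
    return problems

example = """
{
  "host_groups": [
    {"name": "host_group_master_1",
     "configurations": [],
     "components": [{"name": "NAMENODE"}],
     "cardinality": "1"}
  ],
  "Blueprints": {"blueprint_name": "demo", "stack_name": "HDP", "stack_version": "2.6"}
}
"""
print(check_blueprint(example))  # an empty list means the structure looks sane
```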
07-27-2017
09:09 PM
@Wendy Lam Can you detail what sort of setup you have? In general, the Sandbox will run as a completely standalone environment within either Virtualbox or VMware and there should not be a need to configure ports or IP addresses. For example, if you import the Sandbox into VMware using the directions at https://hortonworks.com/tutorial/sandbox-deployment-and-install-guide/section/2/, once the Sandbox starts up you will see a screen that says something along the lines of "To initiate your Hortonworks Sandbox session, please open a browser and enter this address in the browser's address field: http://192.168.10.150:8888/". At that point you should be able to put the address in a browser and connect to the Sandbox. If you are running a firewall of some kind on your local PC, try temporarily disabling it to see if that resolves the problem.
07-20-2017
07:21 PM
@umair ahmed The hostname would be the actual host name of the Exchange server. According to the documentation: "Network address of Email server (e.g., pop.gmail.com, imap.gmail.com . . .)". Hope this helps, and please accept the answer if it was useful.
06-08-2017
07:42 PM
@Ir Mar Starting with HDP 2.6, you can use Workflow Designer to design and schedule work flows, including Spark jobs. Documentation is at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_workflow-management/content/ch_wfm_basics.html. Alternatively, you can use Oozie to schedule Spark workflows, and details around that can be found at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_spark-component-guide/content/ch_oozie-spark-action.html. Hope this helps, and please remember to upvote / accept the answer if you found this useful.
06-07-2017
09:09 PM
1 Kudo
@bigdata.neophyte Here are a few answers for you:
- NiFi can be interacted with via the UI as well as its REST API. The API is documented at https://nifi.apache.org/docs/nifi-docs/rest-api/index.html.
- NiFi is primarily a data flow tool, whereas Kafka is a broker for a pub/sub type of use pattern. Kafka is frequently used as the backing mechanism for NiFi flows in a pub/sub architecture, so while they work well together, they provide two different functions in a given solution. NiFi has a visual command-and-control mechanism, while Kafka does not have a native command-and-control GUI.
- Apache Atlas, Kafka, and NiFi can all work together to provide a comprehensive lineage / governance solution. There is a high-level architecture slide at https://hortonworks.com/apache/atlas/#section_2 as well as a tutorial that might help this make more sense at https://hortonworks.com/hadoop-tutorial/cross-component-lineage-apache-atlas/.
- Data prioritization, back pressure, and balancing latency and throughput are all among NiFi's many strong points, and these can be leveraged easily. Kafka does not really provide data prioritization.
- Security aspects of both Kafka and NiFi are tightly integrated with Apache Ranger; take a look at https://hortonworks.com/apache/ranger/ for additional details.
Hope this helps, and please accept the answer if this was helpful.
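To make the REST API point concrete, here is a minimal sketch of driving NiFi over HTTP. The `/flow/status` endpoint is part of the documented REST API linked above; the host and port are assumptions for a default unsecured NiFi, and `build_request` / `flow_status` are hypothetical helpers written for this answer.

```python
import json
import urllib.request

# Assumed address of a default, unsecured NiFi instance; adjust for your setup.
NIFI_API = "http://localhost:8080/nifi-api"

def build_request(path):
    # Every NiFi REST call is an ordinary HTTP request against a documented
    # endpoint, e.g. GET /flow/status for an instance-wide status summary.
    return urllib.request.Request(NIFI_API + path, headers={"Accept": "application/json"})

def flow_status():
    # Fetch and decode the status document (requires a running NiFi).
    with urllib.request.urlopen(build_request("/flow/status")) as resp:
        return json.loads(resp.read().decode("utf-8"))

req = build_request("/flow/status")
print(req.full_url)  # the URL the status call would hit
```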
06-02-2017
04:16 PM
@Badshah Rehman There is a great article on NiFi performance that covers several tuning aspects, including disk partitioning, at https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html. I think it should provide you with all of the info you need.
05-31-2017
08:38 PM
1 Kudo
@Saikrishna Tarapareddy Yes, you should be able to. Take a look at this HCC article and see if it helps: https://community.hortonworks.com/articles/98394/accessing-data-from-osi-softs-pi-system.html.
05-31-2017
05:55 PM
@Naveen Keshava It is possible to use S3 as the storage for Hive; for example usage, refer to the documentation at https://docs.hortonworks.com/HDPDocuments/HDCloudAWS/HDCloudAWS-1.14.1/bk_hdcloud-aws/content/s3-hive/index.html.
05-30-2017
06:16 PM
@Sunil Neurgaonkar NiFi does not support this right now, but you might look at something like putting a proxy or a load balancer in front of NiFi that can remap the URL as needed.
05-24-2017
09:45 PM
@Vishal Prakash Shah See if this blog post helps - https://sharebigdata.wordpress.com/2016/06/12/hive-metastore-internal-tables/. Keep in mind that these are not HiveQL queries but rather queries to the underlying database.
05-24-2017
09:33 PM
@MB If you're getting a DNS error, that needs to be resolved either by configuring DNS for the hosts or by manually adding the host info to /etc/hosts on each node before you retry the cluster installation. Same goes for the repositories if you're using local repos.
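If you end up adding the host info by hand, a small sketch like this shows the shape the entries should take; the hostnames and IP addresses below are made-up placeholders, and `hosts_entries` is a hypothetical helper, not an Ambari tool.

```python
# Sketch: generate /etc/hosts entries for cluster nodes when DNS is not
# configured. Every node needs the same entries before retrying the install.
# The addresses and FQDNs here are placeholders; substitute your own.
nodes = {
    "192.168.1.101": "master1.cluster.local",
    "192.168.1.102": "worker1.cluster.local",
}

def hosts_entries(nodes):
    lines = []
    for ip, fqdn in sorted(nodes.items()):
        short = fqdn.split(".")[0]  # FQDN first, short alias second
        lines.append("%s\t%s\t%s" % (ip, fqdn, short))
    return "\n".join(lines)

print(hosts_entries(nodes))
```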
05-17-2017
09:27 PM
1 Kudo
@Phoncy Joseph Read through this post, it has some good insight into methods of copying Hive data to S3: https://community.hortonworks.com/questions/39405/options-for-copying-hive-data-to-s3.html. This can be scheduled using Oozie, or if you need additional functionality, Falcon can be used to build a more complex data pipeline. Also be aware of a new feature called Ambari Workflow Manager, which is available now. Refer to http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_workflow-management/content/ch_wfm_basics.html for details. Please don't forget to accept this answer if you find it helpful.
05-17-2017
08:54 PM
@Prameela Janardanan The error suggests that you are missing the host and port parts of the connection string. A properly formatted connection (using Beeline as an example) would look similar to this: beeline -u jdbc:hive2://localhost:10000/default -n scott -w password_file. The host and port are the parts the error refers to: the connect string needs to contain the hostname of the server you are connecting to (localhost in the example) and the port number (10000 in the example, which is the default port for HiveServer2). If you find this post helpful, please don't forget to "accept" the answer.
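As a small illustration, here is a sketch of assembling a HiveServer2 JDBC URL so the host and port are never left out; `hive2_url` is a hypothetical helper written for this answer, not part of any Hive client library.

```python
# Sketch: build a HiveServer2 JDBC URL with the pieces the error complains
# about (hostname and port) always present. 10000 is HiveServer2's default.
def hive2_url(host, port=10000, database="default"):
    if not host:
        raise ValueError("host is required in the connect string")
    return "jdbc:hive2://%s:%d/%s" % (host, port, database)

url = hive2_url("localhost")
print(url)  # jdbc:hive2://localhost:10000/default
# which plugs into Beeline as:
#   beeline -u jdbc:hive2://localhost:10000/default -n scott -w password_file
```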
04-11-2017
08:22 PM
@jason cafarelli The Ambari, Ranger, and Zeppelin UIs should work when proxied through Knox, but other UIs are not supported as of yet. See https://issues.apache.org/jira/browse/KNOX-628 for details.
03-28-2017
05:00 PM
@Jonathan T If you have the VM up and running properly now, there are several tutorials at https://hortonworks.com/tutorials/. They cover all kinds of scenarios from data ingest to processing and visualization, so they are very helpful for learning Hadoop basics.
03-28-2017
04:55 PM
One additional data point - although Safari does see the file and downloads it, when you open the CSV file (it's only 15 bytes) it just has a "404 not found" and no data in it.
03-28-2017
04:46 PM
2 Kudos
@Anishkumar Valsalam There is a good tutorial with sample flow templates available at https://hortonworks.com/hadoop-tutorial/learning-ropes-apache-nifi/. There are also several workflow templates available at https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates that can be leveraged as a base for testing and building your own flows.
03-27-2017
09:24 PM
@Nilesh Some documentation for Kafka integration can be found at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_storm-component-guide/content/storm-ingest-kafka.html, and included in that are details around using Kafka with various other components. Examples: to ingest data from HDFS - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_storm-component-guide/content/storm-ingest-hdfs.html; to write data to HBase - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_storm-component-guide/content/storm-write-to-hbase.html. These examples use Storm but similar patterns can be accomplished with NiFi. Additional Kafka documentation can be found at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_kafka-component-guide/content/ch_introduction_kafka.html. Hope this helps, and if not please provide more details in your question so that a better answer can be given.
03-22-2017
08:46 PM
@ashutosh parekh How much RAM are you allocating to the HDP Sandbox VM? It really needs 8-10GB to run well, otherwise you may see very slow (or no) response. After you launch the VM, it can take a few minutes for it to spin up and get all of the Sandbox services started so if you're trying to connect immediately on startup, you might wait a couple of minutes before connecting. Also, try going to http://127.0.0.1:8080 just to see if you even get the Ambari login page.
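For checking whether Ambari is answering yet, a quick probe like this can save some browser refreshing; `port_open` is a hypothetical helper written for this answer, and 127.0.0.1:8080 is the default Ambari address mentioned above.

```python
import socket

# Sketch: probe the Sandbox's Ambari port before opening a browser.
# Adjust host/port if your VM is configured differently.
def port_open(host, port, timeout=2.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

print(port_open("127.0.0.1", 8080))  # True once Ambari is up and listening
```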
03-15-2017
09:17 PM
@Jayagopal Venugopal Who is the owner / group for /var/hadoop/hdfs/namenode/? It looks like you are getting a permission-denied error on the directory, causing the NameNode to not start. Can you verify ownership and permissions?
03-15-2017
08:53 PM
@Srinivas Santhanam Not sure if this will help, but have you tried using the --files option to pass the Python script? See the answer here for more details: https://community.hortonworks.com/comments/41935/view.html.
03-15-2017
08:47 PM
@Joby Johny Have you looked into Solr? If what you need is an open-source index and search tool, it might be a good fit. Solr has seen some adoption in the AML space; for example, it is a component of the SAS AML solution.
03-02-2017
10:51 PM
@Sachin Ambardekar There is documentation at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/ch_hardware-recommendations_chapter.html that discusses overall cluster planning. Things like memory sizing, configurations for different types of nodes (masters vs. workers), and other hardware considerations are detailed at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_cluster-planning/content/server-node.1.html.
12-08-2016
10:33 PM
@justlearning There are a handful of documents and examples to get you started using Oozie; here are a few:
- Hortonworks Oozie documentation - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_data-movement-and-integration/content/ch_data_movement_using_oozie.html
- Apache Oozie documentation - http://oozie.apache.org/docs/4.2.0/
- Oozie Quick Start - http://oozie.apache.org/docs/4.2.0/DG_QuickStart.html
- Oozie examples - http://oozie.apache.org/docs/4.2.0/DG_Examples.html
11-15-2016
05:41 PM
1 Kudo
@viral Fichadiya The only way to install and manage HDP on Windows is via the MSI installer as described here - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2-Win/bk_HDP_Install_Win/content/ch_deploying.html. The GUI component, Ambari, is not available for the Windows platform. To get a feel for Ambari, I would encourage you to either try out the Linux installation of HDP or download the HDP Sandbox if possible. I think you'll find it to be a better overall experience.
09-23-2016
04:26 PM
4 Kudos
@Viraj Vekaria The list of operating systems that HDP 2.5 supports can be found at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_release-notes/content/ch01s02s01.html. Ambari has a web-based UI that will be consistent from OS to OS and since the cluster is typically primarily managed via Ambari, your choice of OS should be based upon your environment or personal preferences rather than ease of cluster management.
09-21-2016
09:14 PM
1 Kudo
@Jasper How much RAM is in the host machine? The newer sandbox has a lot of components in it and typically needs about 8GB of RAM allocated to it in order to run well. One other thing I've noticed is that the 2.5 sandbox takes longer to start than the earlier ones, but after 6+ minutes you should see some progress.
09-13-2016
05:28 PM
@Navin Lada You shouldn't be having an issue with Chrome, I've personally used it many times to download the sandbox. One thing to check - there are two sandboxes, one for VMWare and one for Virtualbox. Not to insult your intelligence, but have you verified that you are downloading the correct version? That's the first thing that comes to mind.