Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203
My Accepted Solutions
Views | Posted |
---|---|
5181 | 09-21-2018 09:54 PM |
6594 | 03-31-2018 03:59 AM |
2001 | 03-31-2018 03:55 AM |
2207 | 03-31-2018 03:31 AM |
4908 | 03-27-2018 03:46 PM |
03-31-2018
12:01 AM
4 Kudos
@rdoktorics Richard, Cloudbreak 1.16.5 is the version offered on the hortonworks.com website, in the Software Download / HDP section. However, the documentation covers Cloudbreak 2.4 (see https://docs.hortonworks.com/). Is that right? Where should Leszek go to download Cloudbreak 2.4 for his HDP 2.6.4 installation requirement?
03-30-2018
07:00 PM
4 Kudos
@Leszek Leszczynski It could be a bug. The reason is well explained in this article; in your case, however, the data node services are not present as expected. https://community.hortonworks.com/articles/12981/impact-of-hdfs-using-cloudbreak-scale-dwon.html I'll escalate the question to the Cloudbreak team.
03-30-2018
06:24 PM
5 Kudos
@Alex Woolford Look at: https://cwiki.apache.org/confluence/display/KAFKA/KIP-103%3A+Separation+of+Internal+and+External+traffic and https://issues.apache.org/jira/browse/KAFKA-4565 If helpful, please vote and accept the best answer.
03-28-2018
01:32 PM
3 Kudos
@Saikrishna Tarapareddy The community processor mentioned by Tim is a good example of how to write a custom processor. It is limited to the Put action and is quite old, so you would have to rebuild it using more up-to-date libraries. Community processors are not supported by Hortonworks.
03-28-2018
04:22 AM
3 Kudos
@Mushtaq Rizvi Yes. Please follow the instructions on how to add HDF components to an existing HDP 2.6.1 cluster: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1/bk_installing-hdf-on-hdp/content/upgrading_ambari.html This is not the latest HDF, but it is compatible with HDP 2.6.1; I have been pretty happy with its stability and recommend it. You would be able to add Apache NiFi 1.5 as well as Schema Registry. NiFi Registry is part of the latest HDF 3.1.x; however, you would have to install it in a separate cluster, and it is not worth the effort for what you are trying to achieve right now. I would proceed with the HDP upgrade when you are ready for HDF 3.2, which will probably be released in the next couple of months. If you can't add another node to your cluster for NiFi, try to use one of the nodes that has low CPU utilization and some disk space available for NiFi lineage data storage. How much you need depends on how much lineage you want to preserve, but you should probably be fine with several tens of GB for starters. If this response helped, please vote and accept the answer.
03-28-2018
04:13 AM
5 Kudos
@Saikrishna Tarapareddy Unfortunately, there is no specialized processor to connect to Google BigQuery and execute queries. There have been some discussions about a set of new processors to support various Google Cloud services, but those processors have not yet been planned into a release. Until then, you can use the ExecuteScript processor. Here is an example of how to write a script using Python: https://cloud.google.com/bigquery/create-simple-app-api#bigquery-simple-app-print-result-python At https://cloud.google.com/bigquery/create-simple-app-api you can see examples in other languages that are also supported by the ExecuteScript processor. There is also always the possibility of developing your own processor, leveraging the Java example provided in the Google documentation. Example of how to build a custom NiFi processor: https://community.hortonworks.com/articles/4318/build-custom-nifi-processor.html If this response reasonably addressed your question, please vote and accept the answer.
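As a rough illustration of the scripted route, here is a minimal sketch that prepares a query request for BigQuery's REST API and flattens the response. It is not tested inside NiFi. It assumes the `jobs.query` REST endpoint and an OAuth 2.0 access token obtained elsewhere; the function names, the project ID, and the token are hypothetical placeholders, and the code only builds and parses payloads rather than sending anything.

```python
import json

# Assumption: BigQuery's REST "jobs.query" endpoint; verify against Google's docs.
BIGQUERY_QUERY_URL = "https://bigquery.googleapis.com/bigquery/v2/projects/{project_id}/queries"


def build_bigquery_request(project_id, sql, access_token, max_results=100):
    """Return (url, headers, body) for a synchronous BigQuery query request."""
    url = BIGQUERY_QUERY_URL.format(project_id=project_id)
    headers = {
        "Authorization": "Bearer {0}".format(access_token),
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "query": sql,
        "useLegacySql": False,  # ask for standard SQL
        "maxResults": max_results,
    })
    return url, headers, body


def rows_from_response(response_text):
    """Flatten a query response into a list of dicts keyed by column name."""
    payload = json.loads(response_text)
    names = [field["name"] for field in payload.get("schema", {}).get("fields", [])]
    rows = []
    for row in payload.get("rows", []):
        values = [cell.get("v") for cell in row.get("f", [])]
        rows.append(dict(zip(names, values)))
    return rows
```

ExecuteScript runs Python through Jython, so plain standard-library code like this is the safer choice there; the returned URL, headers, and body could equally be handed to any HTTP client.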
03-27-2018
09:37 PM
3 Kudos
@Christian Lunesa As you probably know, 500 Internal Server Error is a very general HTTP status code: something has gone wrong on the server, but the server could not be more specific about what the exact problem is. There could be many reasons, so you need to provide more information. 1) Does it happen with other tables? 2) Is your cluster kerberized? 3) Did you check the Ambari server log for more details?
03-27-2018
03:46 PM
5 Kudos
@Mushtaq Rizvi As you already know, in addition to the API, Atlas uses Apache Kafka as a notification server for communication between hooks and downstream consumers of metadata notification events. There is no other notification server capability, such as SMTP. You would have to write your own filter over the events for the tables that you are interested in. That is the option 2 you presented. You may not like it, but this is the best answer as of now. If you had NiFi, you could easily build that notification server by filtering the events against a lookup list of tables. With the latest versions of NiFi you can take advantage of powerful processors like LookupRecord and QueryRecord, as well as processors for SMTP, email, etc.
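To make the filtering idea concrete, here is a minimal sketch of checking one notification against a lookup list of tables. The event shape shown in the docstring is a simplified assumption, not the exact Atlas message format, and the table names and function names are hypothetical; reading from Kafka and sending the email are left out.

```python
import json

# Hypothetical lookup list of tables to watch (qualified names are made up).
WATCHED_TABLES = {"sales.orders@cluster1", "sales.customers@cluster1"}


def watched_entities(raw_message, watched=WATCHED_TABLES):
    """Return qualified names in one notification that are on the watch list.

    Assumed (simplified) event shape:
    {"message": {"operationType": "...", "entities": [
        {"typeName": "hive_table", "values": {"qualifiedName": "..."}}]}}
    """
    event = json.loads(raw_message)
    entities = event.get("message", {}).get("entities", [])
    hits = []
    for entity in entities:
        name = entity.get("values", {}).get("qualifiedName")
        if entity.get("typeName") == "hive_table" and name in watched:
            hits.append(name)
    return hits


def format_alert(raw_message, watched=WATCHED_TABLES):
    """Build a one-line alert text, or None when nothing on the list changed."""
    hits = watched_entities(raw_message, watched)
    if not hits:
        return None
    operation = json.loads(raw_message).get("message", {}).get("operationType", "UNKNOWN")
    return "{0}: {1}".format(operation, ", ".join(hits))
```

The text returned by `format_alert` would then be passed to whatever sends the notification, for example an SMTP client.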
03-27-2018
03:34 PM
6 Kudos
@Gubbala Sasidhar No, it is not enough to do it only with Kafka and HBase. Kafka is your transport layer and HBase is your target data store. You need a few more components to connect to the source, post to Kafka, and post to HBase.
In order to read the Oracle DB log files you need a tool capable of performing Change Data Capture (CDC) from Oracle DB logs. That tool would then write to a Kafka topic. That is your "Kafka Producer" application. Then you would need to write an application that reads from the Kafka topic and puts the data into HBase. That is your "Kafka Consumer" application.
Examples of CDC-capable tools are GoldenGate, SharePlex, Attunity, etc. If you need a tool that will be used enterprise-wide to connect to various source types (e.g. Oracle, SQL Server, MySQL) and access database logs instead of issuing expensive queries on source databases, then Attunity is probably your best bet. However, if you don't plan to acquire a new tool and you already have GoldenGate or SharePlex, then use those. For example, SharePlex writes directly to Kafka.
Another option with Oracle would be to use its Change Data Capture feature (https://docs.oracle.com/cd/B28359_01/server.111/b28313/cdc.htm) and then write that Kafka Producer application to gather the data from the source and write it to a Kafka topic. Then have your consumer application pick up the data and put it into HBase.
Apache NiFi will add a CDC processor for Oracle this year. Currently, NiFi has only the MySQL CDC processor. If you want to make your life easier, use Apache NiFi (part of Hortonworks DataFlow) to implement the Kafka producer, the Kafka consumer, and the write to HBase.
I see that you tagged your question with kafka-streams. You probably intend to write that Kafka producer and consumer using Kafka Streams. That is an alternative to NiFi, but it will require more programming and it will require you to deal with HA and security aspects yourself, while NiFi provides them out of the box, and developing a NiFi flow is much easier.
NiFi also has a Registry component which allows you to manage versions of the flows like source code. Hortonworks Schema Registry provides the structure that allows your Kafka producer and consumer applications to share schemas.
If this response helped, please vote and accept it as the best answer, if appropriate.
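For the hand-written "Kafka Consumer" route described above, the core logic can be sketched as follows. This is a minimal illustration under stated assumptions: the change record layout, the `cf` column family, and the function names are hypothetical, and the Kafka and HBase clients are deliberately left out by passing in a plain iterable of messages and a `put` callable.

```python
import json


def to_hbase_put(change, column_family="cf"):
    """Map one change record to (row_key, columns) for an HBase put.

    Assumed (hypothetical) record shape:
    {"table": "ORDERS", "op": "UPDATE", "key": "1001", "after": {"STATUS": "SHIPPED"}}
    """
    row_key = "{0}:{1}".format(change["table"], change["key"])
    columns = {}
    for name, value in change.get("after", {}).items():
        columns["{0}:{1}".format(column_family, name)] = str(value)
    return row_key, columns


def consume(messages, put):
    """Apply each JSON change message to the store via put(row_key, columns).

    `messages` is any iterable of JSON strings (e.g. values read from a Kafka
    topic); `put` is any callable that writes a row (e.g. an HBase client call).
    Deletes are skipped here to keep the sketch short.
    """
    written = 0
    for raw in messages:
        change = json.loads(raw)
        if change.get("op") == "DELETE":
            continue
        row_key, columns = to_hbase_put(change)
        put(row_key, columns)
        written += 1
    return written
```

Keeping the mapping separate from the I/O makes it easy to test without a cluster; a real consumer would still need offset handling, retries, HA, and security, which is the extra work mentioned above.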
03-12-2018
02:25 AM
1 Kudo
@Jane Becker Happy it worked out. Enjoy the rest of the weekend!