Member since 03-16-2016
707 Posts
1753 Kudos Received
203 Solutions
02-22-2017
12:07 AM
9 Kudos
Introduction

Geospatial data is generated in huge volumes with the rise of the Internet of Things, and IoT sensor networks are pushing geospatial data rates even higher. There has been an explosion of sensor networks on the ground, mobile devices carried by people or mounted on vehicles, drones flying overhead, high-altitude balloons (such as Google's Project Loon) and tethered aerostats, atmosats at high altitude, and microsats in orbit.

Opportunity

Geospatial analytics can provide the tools and methods we need to make sense of all that data and put it to use in solving problems at every scale.

Challenges

Geospatial work requires atypical data types (e.g., points, shapefiles, map projections), potentially many layers of detail to process and visualize, and specialized algorithms - not your typical ETL (extract, transform, load) or reporting work.

Apache Spark Role in Geospatial Development

While Spark might seem to be influencing the evolution of
accessory tools, it’s also becoming a default in the geospatial analytics
industry. For example, consider the development of Azavea’s open source
geospatial library GeoTrellis. GeoTrellis was written in Scala and designed to
handle large-scale raster operations. GeoTrellis recently adopted Spark as its
distributed computation engine and, in combination with Amazon Web Services,
scaled the existing raster processing to support even larger datasets. Spark
brings amazing scope to the GeoTrellis project, and GeoTrellis supplies the
geospatial capabilities that Spark lacks. This reciprocal partnership is an
important contribution to the data engineering ecosystem, and particularly to
the frameworks in development for supporting Big Data.

About GeoTrellis

GeoTrellis is a Scala library and framework that uses Spark to work with raster data. It is released under the Apache 2 License. GeoTrellis reads, writes, and operates on raster data as fast as possible. It implements many Map Algebra operations as well as vector-to-raster and raster-to-vector operations. GeoTrellis also provides tools to render rasters into PNGs and to store metadata about raster files as JSON. It aims to provide raster processing at web speed (sub-second or less) with RESTful endpoints as well as fast batch processing of large raster data sets.

Getting Started

GeoTrellis is currently available for Scala 2.11 and Spark 2.0+. To get started with SBT, simply add the following to your build.sbt file:

libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-raster" % "1.0.0"

geotrellis-raster is just one submodule that you can depend on. To grab the latest snapshot build, add our snapshot repository:

resolvers += "LocationTech GeoTrellis Snapshots" at "https://repo.locationtech.org/content/repositories/geotrellis-snapshots"

GeoTrellis Modules
geotrellis-proj4 :
Coordinate reference systems and reprojection (Scala wrapper around Proj4J)
geotrellis-vector :
Vector data types and operations (Scala wrapper around JTS)
geotrellis-raster :
Raster data types and operations
geotrellis-spark :
Geospatially enables Spark; save to and from HDFS
geotrellis-s3 :
S3 backend for geotrellis-spark
geotrellis-accumulo :
Accumulo backend for geotrellis-spark
geotrellis-cassandra :
Cassandra backend for geotrellis-spark
geotrellis-hbase :
HBase backend for geotrellis-spark
geotrellis-spark-etl :
Utilities for writing ETL (Extract-Transform-Load), or "ingest"
applications for geotrellis-spark
geotrellis-geotools :
Conversions to and from GeoTools Vector and Raster data
geotrellis-geomesa :
Experimental GeoMesa integration
geotrellis-geowave :
Experimental GeoWave integration
geotrellis-shapefile :
Read shapefiles into GeoTrellis data types via GeoTools
geotrellis-slick :
Read vector data out of PostGIS via Lightbend Slick
geotrellis-vectortile :
Experimental vector tile support, including reading and writing
geotrellis-raster-testkit :
Testkit for testing geotrellis-raster types
geotrellis-vector-testkit :
Testkit for testing geotrellis-vector types
geotrellis-spark-testkit :
Testkit for testing geotrellis-spark code

A more complete feature list can be found in the GeoTrellis Features section at https://github.com/locationtech/geotrellis.

Hello Raster with GeoTrellis

scala> import geotrellis.raster._
import geotrellis.raster._
scala> import geotrellis.raster.op.focal._
import geotrellis.raster.op.focal._
scala> val nd = NODATA
nd: Int = -2147483648
scala> val input = Array[Int](
     |   nd, 7, 1, 1, 3, 5, 9, 8, 2,
     |   9, 1, 1, 2, 2, 2, 4, 3, 5,
     |   3, 8, 1, 3, 3, 3, 1, 2, 2,
     |   2, 4, 7, 1, nd, 1, 8, 4, 3)
input: Array[Int] = Array(-2147483648, 7, 1, 1, 3, 5, 9, 8, 2, 9, 1, 1, 2, 2, 2, 4, 3, 5, 3, 8, 1, 3, 3, 3, 1, 2, 2, 2, 4, 7, 1, -2147483648, 1, 8, 4, 3)
scala> val iat = IntArrayTile(input, 9, 4) // 9 and 4 here specify columns and rows
iat: geotrellis.raster.IntArrayTile = IntArrayTile([I@278434d0,9,4)
// The asciiDraw method is mostly useful when you're working with small tiles
// which can be taken in at a glance
scala> iat.asciiDraw()
res0: String =
" ND 7 1 1 3 5 9 8 2
9 1 1 2 2 2 4 3 5
3 8 1 3 3 3 1 2 2
2 4 7 1 ND 1 8 4 3
"
scala> val focalNeighborhood = Square(1) // a 3x3 square neighborhood
focalNeighborhood: geotrellis.raster.op.focal.Square =
O O O
O O O
O O O
scala> val meanTile = iat.focalMean(focalNeighborhood)
meanTile: geotrellis.raster.Tile = DoubleArrayTile([D@7e31c125,9,4)
scala> meanTile.getDouble(0, 0) // Should equal (1 + 7 + 9) / 3
res1: Double = 5.666666666666667
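Other focal (neighborhood) operations follow the same pattern as focalMean. As a quick sketch only (continuing the same REPL session; focalMax belongs to the same focal operation family, so treat the exact output formatting as illustrative):

scala> val maxTile = iat.focalMax(focalNeighborhood)  // per-cell maximum over the 3x3 window
scala> maxTile.get(0, 0)                              // maximum among the neighbors of the top-left cell: 9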
Documentation

Further examples and documentation of GeoTrellis use cases can be found in the docs/ folder of the repository. Scaladocs for the latest version of the project can be found here: http://geotrellis.github.com/scaladocs/latest/#geotrellis.package

References

Geospatial Data and Analysis by Aurelia Moser, Bill Day, and Jon Bruner, published by O'Reilly Media, Inc., 2017
http://geotrellis.io/
12-29-2016
06:15 AM
1 Kudo
This is an issue with Ambari versions prior to 2.2.0; the article should have clarified that. The JIRA specifies that it is fixed in 2.2.0. However, search engines will not surface this link in searches, so the exposure of the article is extremely limited.
12-26-2016
09:02 PM
2 Kudos
Introduction
h2o is a package for running H2O via its REST API from within R. This package allows the user to run basic H2O commands using R commands. No actual data is stored in the R workspace, and no actual work is carried out by R. R only saves the named objects, which uniquely identify the data set, model, etc. on the server. When the user makes a request, R queries the server via the REST API, which returns a JSON file with the relevant information that R then displays in the console.
Scope
I tested this installation guide on CentOS 7.2, but it
should work on similar RedHat/Fedora/CentOS distributions.
Steps
1. Install R
sudo yum install R
2. Install Java
https://www.java.com/en/download/help/linux_x64rpm_install.xml
3. Start R and install dependencies
install.packages("RCurl")
install.packages("bitops")
install.packages("rjson")
install.packages("statmod")
install.packages("tools")
4. Install h2o package and load library for use
install.packages("h2o")
library(h2o)
If this is your first time using CRAN, it will ask for a
mirror to use. If you want H2O installed site-wide (i.e., usable by all users
on that machine), run R as root, sudo R, then type
install.packages("h2o").
5. Test H2O installation
Type:
library(h2o)
If nothing complains, launch h2o:
h2o.init()
If all went well then you’ll see lots of output about how it
is starting up H2O on your behalf, and then it should tell you all about your
cluster. If not, the error message should be telling you what dependency is
missing, or what the problem is. Post a note to this article and I will get
back to you.
Tips
#1 - The version of H2O on CRAN might be up to a month or two
behind the latest and greatest. Unless you are affected by a bug that you know
has been fixed, don’t worry about it.
#2 - By default, h2o.init() will only use two cores on your machine and maybe a quarter of your system memory. To resize resources, use h2o.shutdown() and start it again:
a) using all your cores:
h2o.init(nthreads = -1)
b) using all your cores and 4 GB:
h2o.init(nthreads = -1, max_mem_size = "4g")
#3 - To run H2O on your local machine, you could call h2o.init without any
arguments, and H2O will be automatically launched at localhost:54321, where the
IP is "127.0.0.1" and the port is 54321.
#4 - If H2O is running on a
cluster, you must provide the IP and port of the remote machine as arguments to
the h2o.init() call. The operation will be done on the server associated with
the data object where H2O is running, not within the R environment.

Tutorials
H2O Tutorial on the Hortonworks Data Platform Sandbox:
http://hortonworks.com/blog/oxdata-h2o-tutorial-hortonworks-sandbox/
Walk-Though Tutorials for Web UI:
http://h2o-release.s3.amazonaws.com/h2o/rel-lambert/5/docs-website/tutorial/top.html
12-23-2016
02:59 AM
12 Kudos
Introduction

The producer sends data directly to the broker that is the leader for the partition, without any intervening routing tier.

Optimization Approach

Batching is one of the big drivers of efficiency. To enable batching, the Kafka producer will attempt to accumulate data in memory and to send out larger batches in a single request. The batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than some fixed latency bound (say 64k or 10 ms). This allows the accumulation of more bytes to send, and fewer, larger I/O operations on the servers. This buffering is configurable and gives a mechanism to trade off a small amount of additional latency for better throughput. To find the optimal batch size and latency, iterative testing supported by producer statistics monitoring is needed (a configuration sketch follows at the end of this article, just before the References).

Enable Monitoring

Start the producer with the JMX parameters enabled:

JMX_PORT=10102 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic testtopic

Producer Metrics

Use the jconsole application via JMX at port 10102. Tip: run jconsole remotely to avoid impact on the broker machine. The metrics appear in the MBeans tab. The clientId parameter is the producer client ID for which you want the statistics.

kafka.producer:type=ProducerRequestMetrics,name=ProducerRequestRateAndTimeMs,clientId=console-producer

This MBean gives the rate of producer requests taking place as well as the latencies involved in that process. It gives latencies as a mean and as the 50th, 75th, 95th, 98th, 99th, and 99.9th percentiles. It also gives the time taken to produce the data as a mean, a one-minute average, a five-minute average, and a fifteen-minute average, as well as the count.

kafka.producer:type=ProducerRequestMetrics,name=ProducerRequestSize,clientId=console-producer

This MBean gives the request size for the producer: the count, mean, max, min, standard deviation, and the 50th, 75th, 95th, 98th, 99th, and 99.9th percentiles of request sizes.

kafka.producer:type=ProducerStats,name=FailedSendsPerSec,clientId=console-producer

This gives the number of failed sends per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of failed requests per second.

kafka.producer:type=ProducerStats,name=SerializationErrorsPerSec,clientId=console-producer

This gives the number of serialization errors per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of serialization errors per second.

kafka.producer:type=ProducerTopicMetrics,name=MessagesPerSec,clientId=console-producer

This gives the number of messages produced per second: the count, mean rate, one-minute average, five-minute average, and fifteen-minute average of messages produced per second.
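To relate the Optimization Approach section back to concrete settings: when writing your own producer (rather than the console producer used above), the batch size and latency bound are exposed as configuration properties. The following is a minimal, illustrative sketch using the newer Java producer API from Scala; the broker address, topic name, and the 64 KB / 10 ms values are example assumptions, not recommendations:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("batch.size", "65536")   // accumulate up to 64 KB per partition batch before sending
props.put("linger.ms", "10")       // but never wait more than 10 ms for a batch to fill

val producer = new KafkaProducer[String, String](props)
(1 to 1000).foreach { i =>
  producer.send(new ProducerRecord[String, String]("testtopic", s"key-$i", s"value-$i"))
}
producer.close()

Re-running the JMX measurements described above while varying batch.size and linger.ms is one way to carry out the iterative testing mentioned earlier.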
References

https://kafka.apache.org/documentation.html#monitoring
Apache Kafka Cookbook by Saurabh Minni, 2015
12-20-2016
03:14 AM
13 Kudos
Pre-requisites

Hortonworks Data Platform 2.5 on CentOS 7.2
Python distribution that comes with HDP 2.5 - Python 2.7.5

Download and install pip

#wget https://bootstrap.pypa.io/get-pip.py
#python get-pip.py
Install add-on package

#pip install requests

Start Python CLI (default version)

#python

Import pre-reqs

>>>import requests
>>>import json
>>>import sys

Environment Variables

Set the Ambari domain variable to the IP address or FQDN of your Ambari node.

>>>AMBARI_DOMAIN = '127.0.0.1'

Set the Ambari port, Ambari user, and password variables to match your specifics.

>>>AMBARI_PORT = '8080'
>>>AMBARI_USER_ID = 'admin'
>>>AMBARI_USER_PW = 'admin'

Set the following variable to the IP address or FQDN of your ResourceManager node.

>>>RM_DOMAIN = '127.0.0.1'

Set the Resource Manager port variable.

>>>RM_PORT = '8088'

Ambari REST API Call Examples

Let's find the cluster name, cluster version, stack, and stack version:

>>>restAPI = '/api/v1/clusters'
>>>url = "http://"+AMBARI_DOMAIN+":"+AMBARI_PORT+restAPI
>>>r = requests.get(url, auth=(AMBARI_USER_ID, AMBARI_USER_PW))
>>>json_data = json.loads(r.text)
>>>CLUSTER_NAME = json_data["items"][0]["Clusters"]["cluster_name"]
>>>print(CLUSTER_NAME)
>>>CLUSTER_VERSION = json_data["items"][0]["Clusters"]["version"]
>>>print(CLUSTER_VERSION)
>>>STACK = CLUSTER_VERSION.split('-')[0]
>>>print(STACK)
>>>STACK_VERSION = CLUSTER_VERSION.split('-')[1]
>>>print(STACK_VERSION)
>>>CLUSTER_INFO = json_data
>>>print(CLUSTER_INFO)

Let's find the HDP stack repository:

>>>restAPI = "/api/v1/stacks/"+STACK+"/versions/"+STACK_VERSION+"/operating_systems/redhat7/repositories/"+CLUSTER_VERSION
>>>url = "http://"+AMBARI_DOMAIN+":"+AMBARI_PORT+restAPI
>>>r = requests.get(url, auth=(AMBARI_USER_ID, AMBARI_USER_PW))
>>>json_data = json.loads(r.text)
>>>print(json_data)
>>>REPOSITORY_NAME = json_data["Repositories"]["latest_base_url"]
>>>print(REPOSITORY_NAME)

A more elegant approach is to create utility functions. See my repo: https://github.com/cstanca1/HDP-restAPI/. The restAPIFunctions.py script in the repo defines a number of useful functions that I have collected over time.

Run restAPIFunctions.py

The same example presented above can now be implemented with a single-line call that returns CLUSTER_NAME, CLUSTER_VERSION and CLUSTER_INFO using the getClusterVersionAndName() function:

>>>CLUSTER_NAME,CLUSTER_VERSION,CLUSTER_INFO = getClusterVersionAndName()
>>>print(CLUSTER_NAME)
>>>print(CLUSTER_VERSION)
>>>print(CLUSTER_INFO)

Resource Manager REST API Call Examples

>>>RM_INFO=getResourceManagerInfo()
>>>RM_SCHEDULER_INFO=getRMschedulerInfo()
>>>print(RM_INFO)
>>>print(RM_SCHEDULER_INFO)

Other Functions

These are other functions included in the restAPIFunctions.py script:

getServiceActualConfigurations()
getClusterRepository()
getAmbariHosts()
getResourceManagerInfo()
getRMschedulerInfo()
getAppsSummary()
getNodesSummary()
getServiceConfigTypes()
getResourceManagerMetrics()
getCheckClusterForRollingUpgrades()
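The Ambari REST API is client-agnostic, so the same calls can be made from any HTTP client, not only Python. Purely as an illustration (this is not part of the article or the repo above), here is a minimal sketch of the first example from the JVM side in Scala, using only JDK classes (Java 8 Base64); the host, port, and credentials mirror the variables set earlier:

import java.net.{HttpURLConnection, URL}
import java.util.Base64
import scala.io.Source

val ambariDomain = "127.0.0.1"   // AMBARI_DOMAIN
val ambariPort = "8080"          // AMBARI_PORT
val credentials = Base64.getEncoder.encodeToString("admin:admin".getBytes("UTF-8"))

// GET /api/v1/clusters with HTTP basic authentication, the same call as requests.get(url, auth=(...))
val url = new URL(s"http://$ambariDomain:$ambariPort/api/v1/clusters")
val conn = url.openConnection().asInstanceOf[HttpURLConnection]
conn.setRequestProperty("Authorization", s"Basic $credentials")
val body = Source.fromInputStream(conn.getInputStream).mkString
conn.disconnect()
println(body)   // raw JSON; parse it with any JSON library to extract cluster_name, version, etc.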
12-18-2016
10:17 PM
13 Kudos
Background

Tungsten became the default in Spark 1.5 and can be enabled in earlier versions by setting spark.sql.tungsten.enabled to true (or disabled in later versions by setting this to false). Even without Tungsten, Spark SQL uses a columnar storage format with Kryo serialization to minimize storage cost.

Goal

The goal of Project Tungsten is to improve Spark execution by optimizing Spark jobs for CPU and memory efficiency (as opposed to network and disk I/O, which are considered fast enough).

Scope

Tungsten focuses on the hardware architecture of the platform Spark runs on, including but not limited to the JVM, LLVM, GPUs, NVRAM, etc.

Optimization Features

Off-Heap Memory Management: a binary in-memory data representation (aka the Tungsten row format) and explicit memory management
Cache Locality: cache-aware computations with cache-aware layouts for high cache hit rates
Whole-Stage Code Generation (aka CodeGen)

Design Improvements

Tungsten includes specialized in-memory data structures
tuned for the type of operations required by Spark, improved code generation,
and a specialized wire protocol. Tungsten’s representation is substantially smaller than
objects serialized using Java or even Kryo serializers. As Tungsten does not depend on Java objects, both on-heap
and off-heap allocations are supported. Not only is the format more compact, serialization times can
be substantially faster than with native serialization. Since Tungsten no longer depends on working with Java
objects, you can use either on-heap (in the JVM) or off-heap storage. If you
use off-heap storage, it is important to leave enough room in your containers
for the off-heap allocations, which you can get an approximate idea of from the web UI. Tungsten's data structures are also designed with the kind of processing for which they are used closely in mind. The classic example of this is sorting, a common and
expensive operation. The on-wire representation is implemented so that sorting
can be done without having to deserialize the data again. By
avoiding the memory and GC overhead of regular Java objects, Tungsten is able
to process larger data sets than the same hand-written aggregations.

Benefits

The following Spark jobs will benefit from Tungsten:

DataFrames: Java, Scala, Python, R
SparkSQL queries
Some RDD API programs, via general serialization and compression optimizations

Next Steps

In the future, Tungsten may make it more feasible to use certain non-JVM libraries. For many simple operations, the cost of using BLAS or similar linear algebra packages from the JVM is dominated by the cost of copying the data off-heap.
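As a concrete illustration of the Background note above, the setting can be toggled from a Spark 1.x shell and its effect observed in the physical plan. This is a minimal sketch only; the range size and column names are made up for illustration:

// spark-shell (Spark 1.5/1.6): sqlContext is provided by the shell
sqlContext.setConf("spark.sql.tungsten.enabled", "true")   // default since 1.5; set to "false" to compare

val df = sqlContext.range(0, 10000000)
val counts = df.groupBy((df("id") % 10).alias("bucket")).count()
counts.explain()   // with the flag enabled, the physical plan shows Tungsten-based operators
counts.show()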
References

Project Tungsten: Bringing Apache Spark Closer to Bare Metal
High Performance Spark by Holden Karau and Rachel Warren
Slides: Deep Dive Into Project Tungsten - Josh Rosen
Video: Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal - Josh Rosen (Databricks)
12-17-2016
01:33 AM
11 Kudos
Introduction

Many organizations have come to rely on Hadoop for dealing with the ever-increasing quantities of data that they gather. Today it is clear what problems Hadoop can solve; however, the cloud is still not the first choice for Hadoop deployment. Pros and cons for Hadoop in the cloud have been shared across multiple blogs and books, but the question always comes up in discussions with enterprises considering Hadoop in the cloud. Thus, I thought it would be useful to collate a few pros and cons, as well as mention a pragmatic approach, a hybrid cloud, for organizations that have made significant investments on-prem. For organizations coming to Hadoop for the first time, the cloud is probably a better bet, especially if they don't have a lot of IT expertise and a great stream of revenue exists and needs to be exploited immediately.

Pro Cloud

Lack of space. You don't have space to keep racks of physical servers, along with the necessary power and cooling.

Flexibility. It is much easier to reorganize instances, or expand or contract your footprint, for changing business needs. Everything is controlled through cloud provider APIs and web consoles. Changes can be scripted and put into effect manually or even automatically and dynamically based on current conditions.

New usage patterns. Cloud providers abstract computing resources such that they are not tied to physical configurations, which means they can be managed in ways that are otherwise impractical. For example, individuals could have their own instances, clusters, and even networks to work with, without much managerial overhead. The overall budget for CPU cores in your cloud provider account can be concentrated in a set of large instances, a larger set of smaller instances, or some mixture, and can even change over time. When an instance malfunctions, instead of troubleshooting what went wrong, you can just tear it down and replace it.

Worldwide availability. The largest cloud providers have data centers around the world. You can use resources close to where you work, or close to where your customers are, for the best performance. You can set up redundant clusters, or even entire computing environments, in multiple data centers, so that if local problems occur in one data center, you can shift to working elsewhere.

Data retention restrictions. If you have data that is required by law to be stored within specific geographic areas, you can keep it in clusters that are hosted in data centers in those areas.

Cloud provider features. Each major cloud provider offers an ecosystem of features to support the core functions of computing, networking, and storage. To use those features most effectively, your clusters should run in the cloud provider as well.

Capacity. Very few customers tax the infrastructure of a major cloud provider. You can establish large systems in the cloud that are not nearly as easy to put together, not to mention maintain, on-prem.

Pro On-Prem

Simplicity. Cloud providers start you off with reasonable defaults, but then it is up to you to figure out how all of their features work and when they are appropriate. It takes a lot of experience to become proficient at picking the right types of instances and arranging networks properly.

High levels of control. Beyond the general geographic locations of cloud provider data centers and the hardware specifications that providers reveal for their resources, it is not possible to have exacting, precise control over your cloud architecture. You cannot tell exactly where the physical devices sit, or what the devices near them are doing, or how data across them shares the same physical network. When the cloud provider has internal problems such as network outages, there's not much you can do but wait.

Unique hardware needs. You cannot have cloud providers attach specialized peripherals or dongles to their hardware for you. If your application requires resources that exceed what a cloud provider offers, you will need to host that part on-prem, away from your Hadoop clusters.

Saving money. For one thing, you are still paying for the resources you use. The hope is that the economy of scale that a cloud provider can achieve makes it more economical for you to pay to "rent" their hardware than to run your own. You will also still need people who understand system administration and networking to take care of your cloud infrastructure. Inefficient architectures can cost a lot of money in storage and data transfer costs, or in instances that are running idle.

Best of Both

Instead of running your clusters and associated applications completely in the cloud or completely on-prem, the overall system is split between the two - a hybrid cloud. Data channels are established between the cloud and on-prem worlds to connect the components needed to perform work.

Examples

Suppose there is a large, existing on-prem data processing system, perhaps using Hadoop clusters, which works well. In order to expand its capacity for running new analyses, rather than adding more on-prem hardware, Hadoop clusters can be created in the cloud. Data needed for the analyses is copied up to the cloud clusters, where it is analyzed, and the results are sent back on-prem. The cloud clusters can be brought up and torn down in response to demand, which helps to keep costs lower.

Assume the management of vast amounts of incoming data that needs to be centralized and processed. To avoid having one single choke point where all of the raw data is sent, a set of cloud clusters can share the load, perhaps each in a geographic location convenient to where the data is generated. These clusters can perform pre-processing of the data, such as cleaning and summarization, to reduce the work that the final centralized system must perform.

References

Moving Hadoop to the Cloud by Bill Havanki, published by O'Reilly Media, Inc., 2017.
11-28-2016
05:48 PM
@apappu You are using "code" blocks for non-code regular text. For example, you describe each step textually inside a code block; the same issue applies to the final note, which is text, not code. Also, the article should include a structure like: Problem Description, Assumptions, Steps, Conclusions. Could you clean up the article accordingly, run a spell check, and resubmit? Our articles need to have publisher quality.
11-24-2016
04:00 AM
@ambud.sharma Voted up :). Before, it was counter-intuitive.
11-24-2016
02:10 AM
10 Kudos
Behavior

The cells returned to the client are normally filtered based on the table configuration; however, when using the RAW => true parameter, you can retrieve all of the versions kept by HBase, unless there was a major compaction or a flush-to-disk event in the meantime.

Demonstration

Create a table with a single column family:

create 't1', 'f1'

Configure it to retain a maximum version count of 3:

alter 't1',NAME=>'f1',VERSIONS=>3

Perform 4 puts:

put 't1','r1','f1:c1',1
put 't1','r1','f1:c1',2
put 't1','r1','f1:c1',3
put 't1','r1','f1:c1',4

Scan with RAW=>true. I used VERSIONS as 100 as a catch-all; it could have been anything greater than 3 (the number of versions set previously). Unless specified, only the latest version is returned by the scan command.

scan 't1',{RAW=>true,VERSIONS=>100}

The above scan returns all four versions.

ROW COLUMN+CELL
r1 column=f1:c1,timestamp=1479950685181, value=4
r1 column=f1:c1,timestamp=1479950685155, value=3
r1 column=f1:c1,timestamp=1479950685132, value=2
r1 column=f1:c1,timestamp=1479950627736, value=1

Flush to disk:

flush 't1'

Then scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned.

ROW COLUMN+CELL
r1 column=f1:c1,timestamp=1479952079260, value=4
r1 column=f1:c1,timestamp=1479952079234, value=3
r1 column=f1:c1,timestamp=1479952079209, value=2
Do four more puts:

put 't1','r1','f1:c1',5
put 't1','r1','f1:c1',6
put 't1','r1','f1:c1',7
put 't1','r1','f1:c1',8
Flush to disk:

flush 't1'

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Six versions are returned:

ROW COLUMN+CELL
r1 column=f1:c1,timestamp=1479952349970, value=8
r1 column=f1:c1,timestamp=1479952349925, value=7
r1 column=f1:c1,timestamp=1479952349895, value=6
r1 column=f1:c1,timestamp=1479952079260, value=4
r1 column=f1:c1,timestamp=1479952079234, value=3
r1 column=f1:c1,timestamp=1479952079209, value=2
Force major compaction:

major_compact 't1'

Scan:

scan 't1',{RAW=>true,VERSIONS=>100}

Three versions are returned:

ROW COLUMN+CELL
r1 column=f1:c1,timestamp=1479952349970, value=8
r1 column=f1:c1,timestamp=1479952349925, value=7
r1 column=f1:c1,timestamp=1479952349895, value=6
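The same raw scan can also be issued programmatically. As a hedged illustration (not part of the original demonstration), here is a minimal sketch using the HBase Java client API from Scala, with class and method names per HBase 1.x as shipped with HDP 2.5; the table and column names match the shell example above:

import org.apache.hadoop.hbase.{CellUtil, HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Scan}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

val conf = HBaseConfiguration.create()                   // picks up hbase-site.xml from the classpath
val connection = ConnectionFactory.createConnection(conf)
val table = connection.getTable(TableName.valueOf("t1"))

val scan = new Scan()
scan.setRaw(true)          // equivalent of RAW => true in the shell
scan.setMaxVersions(100)   // catch-all, equivalent of VERSIONS => 100

val scanner = table.getScanner(scan)
scanner.asScala.foreach { result =>
  result.rawCells().foreach { cell =>
    println(s"ts=${cell.getTimestamp} value=${Bytes.toString(CellUtil.cloneValue(cell))}")
  }
}

scanner.close()
table.close()
connection.close()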
Conclusion

When deciding the number of versions to retain, it is best to treat that number as the minimum version count available at a given time and not as a constant. Until a flush to disk and a major compaction occur, the number of versions available can be higher than the number configured for the table.