About DennisJaheruddi

DennisJaheruddi · ‎05-06-2019

This seems relevant: In Python 2, printing a unicode object implicitly encodes it, and the encoding used is often ASCII (for example when the output is piped or the locale is not set). If the text can't be encoded in ASCII, you'll get that error. You probably want to explicitly encode it and then print the resulting str: print post.text.encode('utf-8')
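To illustrate the difference between the two encodings, here is a minimal sketch. It is written for Python 3 so it runs on a current interpreter (the one-liner above is Python 2 syntax), and the sample text and the helper name try_ascii are made up for the example.

```python
# Hypothetical sample text containing one non-ASCII character (e with acute accent).
text = "caf\u00e9"


def try_ascii(value):
    """Return the ASCII-encoded bytes, or None if the text is not pure ASCII."""
    try:
        return value.encode("ascii")
    except UnicodeEncodeError:
        # This is the same class of error that the implicit ASCII encoding raises.
        return None


ascii_bytes = try_ascii(text)      # fails, because of the accented character
utf8_bytes = text.encode("utf-8")  # works, UTF-8 can represent any character

print(ascii_bytes)
print(utf8_bytes)
```

Encoding explicitly means the choice of encoding no longer depends on the terminal or locale settings.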

DennisJaheruddi · ‎05-06-2019

Since the question was asked, the situation has changed. After Hortonworks and Cloudera merged, NiFi became supported by Cloudera. Shortly afterwards the integrations with CDH were also completed, so NiFi is now a fully supported and integrated component. Hence the question already contains the answer: please look into NiFi for solving this use case.

DennisJaheruddi · ‎04-10-2019

Assuming you want to access the data via Spark, the main question is how it should be stored. Drill is not supported by Cloudera for this, but Hive tables and Kudu are. So it boils down to whether you want to store the data in Hive or in Kudu, as Spark can work with both. If you want to insert your data record by record, or want to run interactive queries in Impala, then Kudu is likely the best choice. If you want to insert and process your data in bulk, then Hive tables are usually a good fit.

DennisJaheruddi · ‎04-10-2019

Though I don't know exactly how it works under the hood, I can confirm that it takes effect on the source DB side. (It will definitely NOT simply pull everything from the DB and then chop it up before writing to Hadoop.) If you are looking for the optimum, you are likely going to need some trial and error. However, as a starting point, I understand that the default value is 1000, and that you may want to try 10000 as a first step towards better performance.

DennisJaheruddi · ‎04-09-2019

There is something very unusual happening here. Based on your outputs, values are not only ending up in the wrong columns, but you are even getting different values! In the 'correct' record you have 5686.76, and in the 'wrong' record you have -5686.76. My first guess was that there is a mistake in how you send data to the appropriate columns, but I don't see how that can explain a minus sign appearing. To troubleshoot something like this, it is really important to dig into the details. I would therefore recommend that you bring your question down to a 'minimal reproducible example', eliminating any complexity that is not causing the unexpected results. For example: You show a load command to get data into Spark; consider replacing it with an actual string (and make sure to check whether the string allows you to reproduce the problem). You also show 2 writes, but if we have the exact input and code to reproduce the problem, the one that produces the correct result is probably not relevant. Also, you use some code to list columns; consider hardcoding the list first. As mentioned, really try to take out all complexity until we land on a minimal amount that still reproduces the problem. Hopefully you will already see the answer once you have eliminated all the distractions, and if not, you will have a fully trimmed-down version which you can use to update your question here!
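As a rough sketch of what such a trimmed-down example could look like: the snippet below uses plain Python and the standard csv module instead of Spark, purely so that it runs anywhere. The column names and the input line are invented for the illustration; only the value 5686.76 comes from the thread.

```python
import csv
import io

# Hypothetical input: the load command is replaced by a literal string.
RAW = "id,amount,currency\n1,5686.76,EUR\n"

# Hypothetical column list: hardcoded instead of derived by code.
COLUMNS = ["id", "amount", "currency"]


def parse(raw, columns):
    """Parse the literal CSV text and keep only the hardcoded columns."""
    reader = csv.DictReader(io.StringIO(raw))
    return [{name: row[name] for name in columns} for row in reader]


records = parse(RAW, COLUMNS)

# A single explicit check on the one value that goes wrong in the real job.
print(records[0]["amount"])
```

With the input and the column list fixed like this, any difference in the output has to come from the remaining code.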

DennisJaheruddi · ‎04-09-2019

This question is a bit broad, and at the same time quite dependent on your exact situation. I therefore recommend that you reach out to your Cloudera contact person for a more in-depth answer. However, what I can say is the following: Regarding your second question there is a nice answer here: https://community.cloudera.com/t5/Data-Ingestion-Integration/Flume-without-agents-on-web-server/m-p/40776/highlight/true#M1550 In short, you will want 'something' to push the data off the webserver (for instance a Flume or a MiNiFi agent), assuming your webserver does not already publish the messages to a bus like Kafka. In general, the solution that you use for moving data from the webserver to the cluster should also work in the opposite direction.

DennisJaheruddi · ‎01-17-2019

If you want to be sure whether one of the two components is updating the password, consider checking the md5sum before and after you change it (and make sure to change it to a different value).
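A minimal sketch of that before/after check in Python, assuming the password ends up in a file you can read. The file name and its contents below are hypothetical and are created in a temporary directory only to make the example self-contained.

```python
import hashlib
import os
import tempfile


def md5_of_file(path):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Hash a hypothetical credentials file before and after a change."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "credentials.conf")  # hypothetical file name
        with open(path, "w") as handle:
            handle.write("password=old-secret\n")
        before = md5_of_file(path)
        # Stand-in for changing the password through one of the components.
        with open(path, "w") as handle:
            handle.write("password=new-secret\n")
        after = md5_of_file(path)
    return before, after


before, after = demo()
print("changed" if before != after else "unchanged")
```

If the two digests are identical after the change, the component you used did not touch that file.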

DennisJaheruddi · ‎01-15-2019

ListFile has the option to define a File Filter and a Path Filter. From your explanation I would expect that you need to define the Path Filter. In the Path Filter you can place a regular expression (in Java syntax). There are a lot of resources available on regex; I think something like this should do the trick: (?!(B|C)).*
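To see what that expression accepts and rejects, here is a small sketch using Python's re module. For this particular pattern the Python and Java syntax agree. The sketch assumes the filter has to match the whole relative path, and the directory names are made up.

```python
import re

# The expression from the post: reject anything that starts with B or C.
PATH_FILTER = re.compile(r"(?!(B|C)).*")


def is_scanned(relative_dir):
    """Return True if the whole relative path matches the filter."""
    return PATH_FILTER.fullmatch(relative_dir) is not None


# Hypothetical directory names, for illustration only.
for name in ["A", "B", "C", "A/sub", "B/sub", "Backup"]:
    print(name, is_scanned(name))
```

Note that the lookahead only inspects the start of the path, so a directory such as Backup is excluded as well because it begins with B. If only the exact names B and C should be skipped, the expression needs to be tightened accordingly.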

DennisJaheruddi · ‎01-15-2019

First of all, double-check all configurations (incl. password), just to avoid moving in the wrong direction. Secondly, confirm that you do not need TLS enabled. If these don't help, the following might help with troubleshooting:
1. Become the nifi user on the node where NiFi is running
2. Send the message via Python
3. Share the Python command here
Note: Please explicitly specify everything that you configure in NiFi when executing Python (even if it is not needed, for instance because of good defaults).
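A possible starting point for the Python test, using only the standard library. All host names, ports, addresses and credentials here are placeholders, not values from the thread; fill in exactly what is configured in the processor. The actual send is left commented out so that nothing is sent by accident.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Build a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_test_mail(host, port, username, password, use_tls, msg):
    """Send the message, passing every setting explicitly."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        if use_tls:
            server.starttls()
        if username:
            server.login(username, password)
        server.send_message(msg)


# Placeholder addresses, replace with the values configured in NiFi.
message = build_message(
    "nifi@example.com",
    "you@example.com",
    "PutEmail test",
    "Sent from Python on the NiFi node.",
)

# Placeholder server settings, uncomment and adjust to run the real test:
# send_test_mail("smtp.example.com", 25, "", "", False, message)
```

If this works from the NiFi node with the same settings, the problem is more likely in the processor configuration than in the network or the mail server.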

DennisJaheruddi · ‎01-15-2019

@Louis Allen Just to confirm: Is the server where SQL runs kerberized, and have you set up a trust relationship?

Member Since	‎01-07-2019 03:54 AM
Last Visited	‎12-15-2021 03:18 AM
Posts	220
Kudos received	31

Cloudera Community

Re: How can a Flink program on a Kerberos-enabled cluster connect to an unauthenticated Kafka outside the cluster

Re: Attribute validation against MSSQL database

Re: Put array with Dates on nifi flowfile

Re: NiFi templates don't include all controller se...

Re: Concatenations of Multiple Attributes in Nifi

Re: Near/real-time Outlook email ingestion

Re: Near/real-time Outlook email ingestion

Re: Which one is best Hive vs Impala vs Drill vs K...

Re: What is a reasonable value for "--fetch-size" ...

Re: Unable to map the data properly from a CSV fil...

Re: Real time campaign

Re: User:amb_ranger_admin credentials on Ambari UI...

Re: Nifi Listfile - Exclude Directories

Re: NiFi PutEmail Error

Re: Access SQL Server via Nifi using Windows Authe...