Member since: 01-07-2019
Posts: 220
Kudos Received: 23
Solutions: 30
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5045 | 08-19-2021 05:45 AM |
| | 1811 | 08-04-2021 05:59 AM |
| | 879 | 07-22-2021 08:09 AM |
| | 3691 | 07-22-2021 08:01 AM |
| | 3429 | 07-22-2021 07:32 AM |
06-04-2019
01:29 AM
I found this resolution elsewhere: the problem was resolved after I copied the .class file from /tmp/sqoop-hduser/compile/ to hdfs /home/hduser/ and also to the current working directory from which I was running Sqoop. In case that does not work, this should help you get moving: specify --bindir to control where the compiled code and .jar file are placed. Without this argument, Sqoop places the generated Java source file in your current working directory, and the compiled .class file and .jar file in /tmp/sqoop-<username>/compile. For example: `sqoop import --bindir ./ --connect jdbc:mysql://localhost/hadoopguide --table widgets`
05-20-2019
07:24 AM
Hi Dennis,

As mentioned in the (edited) post, the solution suggested above finally worked for me. Thanks again for the help!

Regards,
Michal
05-06-2019
07:33 AM
Hello guys, yeah, that was a long time ago. I managed to get the job done using the following pipeline: Logstash -> Kafka -> Spark
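In case it helps a later reader, here is a minimal sketch of the Spark end of such a pipeline, assuming Structured Streaming with the spark-sql-kafka package available; the broker address and topic name are placeholders, not details from the original setup.

```python
# Sketch of the Spark end of a Logstash -> Kafka -> Spark pipeline.
# Broker address and topic name ("logstash-events") are illustrative.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

# Subscribe to the Kafka topic that Logstash publishes to.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "logstash-events")
    .load()
)

# Kafka records arrive as binary; cast the value to a string for processing.
lines = events.selectExpr("CAST(value AS STRING) AS line")

# Write the stream out (here just to the console, for demonstration).
query = lines.writeStream.format("console").start()
query.awaitTermination()
```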
05-06-2019
05:34 AM
Since the question was asked, the situation has changed. When Hortonworks and Cloudera merged, NiFi became supported by Cloudera. Shortly after, the integrations with CDH were also completed, so NiFi is now a fully supported and integrated component. Check the documentation for the latest information at any time, but in general Cloudera Manager is now able to install NiFi.
05-06-2019
04:53 AM
1 Kudo
Since the question was asked, the situation has changed. When Hortonworks and Cloudera merged, NiFi became supported by Cloudera. Shortly after, the integrations with CDH were also completed, so NiFi is now a fully supported and integrated component. Hence, the question already contains the answer: NiFi is Cloudera's answer for these use cases.
04-10-2019
04:15 AM
1 Kudo
Assuming you want to access the data via Spark, the main question is how it should be stored. Drill is not supported by Cloudera, but Hive tables and Kudu are. It then boils down to whether you want to store the data in Hive or in Kudu, as Spark can work with both. If you want to insert your data record by record, or run interactive queries in Impala, then Kudu is likely the best choice. If you want to insert and process your data in bulk, then Hive tables are usually the better fit.
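To illustrate the choice, a minimal sketch showing that Spark can write to either target. It assumes Hive support is enabled and the kudu-spark package is on the classpath; the table names and Kudu master address are placeholders.

```python
# Sketch: Spark can target either storage option. Table names and the
# Kudu master address below are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("storage-demo")
         .enableHiveSupport().getOrCreate())
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Option 1: bulk-oriented storage as a Hive table.
df.write.mode("append").saveAsTable("my_hive_table")

# Option 2: record-oriented storage in Kudu (requires the kudu-spark package).
(df.write
   .format("org.apache.kudu.spark.kudu")
   .option("kudu.master", "kudu-master:7051")
   .option("kudu.table", "my_kudu_table")
   .mode("append")
   .save())
```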
04-10-2019
02:34 AM
Though I don't know exactly how it works under the hood, I can confirm that the work happens on the source DB side. (It definitely does NOT simply pull everything from the DB and then chop it up before writing to Hadoop.) If you are looking for the optimum, you will likely need some trial and error. However, as a starting point: I understand the default value is 1000, and you may want to try 10000 as a first step towards better performance.
04-09-2019
07:41 AM
1 Kudo
There is something very unusual happening here. Based on your outputs, values are not only ending up in the wrong columns, you are even getting different values: the 'correct' record has 5686.76, while the 'wrong' record has -5686.76. My first guess was a mistake in how you map data to the appropriate columns, but I don't see how that would explain a minus sign changing position. To troubleshoot something like this, it is really important to dig into the details. I would therefore recommend boiling your question down to a minimal reproducible example, eliminating any complexity that is not causing the unexpected results. For example:

- You show a load command to get data into Spark; consider replacing it with a hardcoded string (and check that the hardcoded input still reproduces the problem).
- You show two writes; once we have the exact input and code that reproduce the problem, the write that produces the correct result is probably not relevant.
- You use some code to list the columns; consider hardcoding that list first.

As mentioned, really try to strip out complexity until you land on the minimal amount that still reproduces the problem. Hopefully you will already see the answer once you have eliminated the distractions; if not, you will have a fully trimmed-down version that you can use to update your question here. A sketch of what that might look like follows below.
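For illustration only, here is what such a trimmed-down PySpark repro might look like; all names, values, and paths are placeholders except the 5686.76 figure from the question.

```python
# Sketch of a minimal reproducible example: the load step is replaced by a
# hardcoded DataFrame and the column list is hardcoded too. All names and
# paths here are illustrative, not taken from the original question.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("min-repro").getOrCreate()

# Replace the original load(...) with an in-line row that still triggers
# the problem (verify this before trimming anything else).
df = spark.createDataFrame([("acct-1", 5686.76)], ["account", "amount"])

# Hardcode the column selection instead of computing it.
out = df.select("account", "amount")

# Keep only the write that produces the wrong result.
out.write.mode("overwrite").csv("/tmp/min-repro-out")
```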
08-08-2018
01:33 PM
1 Kudo
As indicated in an existing answer by @hduraiswamy, there are some things you can do:
1. Issue multiple insert commands in parallel; they will automatically be executed sequentially.
2. Write multiple files to a directory and then create a Hive table on top of the folder, as in the aforementioned answer (see the sketch below).
If this does not work for you, you can of course also work with a non-external Hive table.
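As an illustration of option 2, a minimal PySpark sketch assuming Parquet files and Hive support; the path, table name, and schema are placeholders.

```python
# Sketch of option 2: land files in one directory, then expose the
# directory as an external Hive table. Paths and names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder.appName("external-table")
         .enableHiveSupport().getOrCreate())

# Each writer appends its own files into the shared directory.
df = spark.createDataFrame([(1, "a")], ["id", "val"])
df.write.mode("append").parquet("/data/landing/events")

# One external table on top of the folder picks up everything written there.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS events (id INT, val STRING)
    STORED AS PARQUET
    LOCATION '/data/landing/events'
""")
```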
01-15-2019
11:24 AM
First of all, double-check all configurations (including the password), just to avoid moving in the wrong direction. Secondly, confirm that you do not need TLS enabled. If neither helps, the following might aid troubleshooting:
1. Become the nifi user on the node where NiFi is running.
2. Send the message via Python.
3. Share the Python command here.
Note: when executing the Python step, please explicitly specify everything that you configure in NiFi (even settings that are not strictly needed because of good defaults). A sketch of step 2 follows below.
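For illustration only, a minimal sketch of step 2, assuming the destination is Kafka (the original post does not name the target system) and using the kafka-python client; the broker, topic, and message are placeholders.

```python
# Sketch of step 2, assuming the target is Kafka (an assumption, not
# stated in the post). Uses the kafka-python client; the broker and
# topic names are placeholders.
from kafka import KafkaProducer

# Spell out every setting that NiFi also configures, even ones that would
# otherwise fall back to defaults, so the two setups are truly comparable.
producer = KafkaProducer(
    bootstrap_servers="broker-1:9092",
    security_protocol="PLAINTEXT",  # confirm TLS really is not needed
)

producer.send("test-topic", b"hello from python")
producer.flush()
```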