Member since 08-03-2019
186 Posts · 34 Kudos Received · 26 Solutions
            
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2770 | 04-25-2018 08:37 PM |
|  | 6686 | 04-01-2018 09:37 PM |
|  | 2159 | 03-29-2018 05:15 PM |
|  | 7808 | 03-27-2018 07:22 PM |
|  | 2668 | 03-27-2018 06:14 PM |
			
    
	
		
		
04-02-2018 03:15 AM · 2 Kudos

@Sri Kumaran Thiruppathy I don't think so! Sqoop and Spark SQL both use JDBC connectivity to fetch data from RDBMS engines, but Sqoop has an edge here since it was built specifically to migrate data between an RDBMS and HDFS, and every option it exposes has been fine-tuned for ingestion performance. You can start with the -m option, which controls the number of mappers; that is how Sqoop fetches data from the RDBMS in parallel. Can you do the same in Spark SQL? Of course, but the developer has to manage the parallel reads that Sqoop handles automatically. And the list goes on! Hope that helps!
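As a minimal sketch of that -m point (the connection string, table, column, and paths below are hypothetical placeholders):

```sh
# Pull a (hypothetical) orders table with 8 parallel mappers.
# -m / --num-mappers controls the parallelism; --split-by names the column
# whose value range Sqoop partitions so each mapper reads a distinct slice.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --split-by order_id \
  -m 8 \
  --target-dir /data/raw/orders
```

Getting the same parallel read in Spark SQL means hand-tuning the JDBC partitioning (partition column, bounds, and partition count) yourself.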
						
					
    
	
		
		
04-01-2018 09:37 PM · 1 Kudo

@Mahendra Hegde In the snapshot you attached, all the processors are stopped. Start all your processors and then verify the following:

1. Is your processor running on all the nodes? It sometimes happens that flow files sit on a node other than the Primary Node while the downstream processor is scheduled to run on the Primary Node only, which produces exactly this behavior.
2. If #1 does not apply and all your processors are running correctly on all nodes (or with whatever configuration is correct for your flow), check how many threads are running for your processor(s).
3. Check the JVM usage of your NiFi instance and verify it is not unusually high (see the sketch after this list).
4. Check the scheduling settings of your processors.

Let me know if these basic debugging steps show anything unusual. Hope that helps!
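For step 3, one quick way to watch the NiFi JVM's heap and GC from the shell; the `pgrep` pattern assumes a standard install where the JVM's main class is `org.apache.nifi.NiFi`:

```sh
# Find the NiFi JVM's PID, then sample heap and GC utilization every 5 seconds.
NIFI_PID=$(pgrep -f org.apache.nifi.NiFi | head -n 1)
jstat -gcutil "$NIFI_PID" 5000
```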
						
					
    
	
		
		
04-01-2018 09:27 PM

@Yassine Looking at your log, it seems you are trying to change the data type from Spark. Is that the case? If so, issue the DDL through the SQL context, for example:

val a = sqlContext.sql("alter table tableName change col col bigint")

As for the issue you are facing while converting the column's type, you need to understand the available Hive data types and the implicit casts allowed between them. Whenever you issue a statement like

alter table tableName change columnName columnName <newDataType>;

keep in mind that the column may currently hold string data that cannot be cast to the new type (int, bigint, and so on); those values will come back as NULL. Check the Hive documentation for the data types and the implicit cast options available.
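A quick way to see that NULL behavior end to end; the table and column names are made up, and on Hive versions that block incompatible column type changes you may first need to relax `hive.metastore.disallow.incompatible.col.type.changes`:

```sh
# Hypothetical demo: a STRING column holding non-numeric data, changed to BIGINT.
# '123' survives the implicit cast; 'abc' cannot be cast and reads back as NULL.
hive -e "
  SET hive.metastore.disallow.incompatible.col.type.changes=false;
  CREATE TABLE cast_demo (c STRING);
  INSERT INTO cast_demo VALUES ('123'), ('abc');
  ALTER TABLE cast_demo CHANGE c c BIGINT;
  SELECT c FROM cast_demo;
"
```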
						
					
    
	
		
		
04-01-2018 04:36 PM

@Sudha Chandrika Did the answer help resolve your query? If so, please close the thread by marking the answer as Accepted!
						
					
    
	
		
		
04-01-2018 04:35 PM

You can use multiple options, but each has its ifs and buts! 🙂 Here is the best option I can think of: use MergeContent to merge multiple flow files into one bigger file, write that bigger flow file to local disk, and load it with MySQL's "LOAD DATA" statement. It will be very fast!

Let me know if you need additional help on the topic! If the answer helped you resolve your query, actual or new :), please mark the answer as Accepted!
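As a sketch of that last step, with hypothetical file, database, and table names (LOAD DATA LOCAL requires local_infile to be enabled on both client and server):

```sh
# Bulk-load the merged CSV that NiFi wrote to local disk into MySQL.
# LOAD DATA is far faster than replaying the rows as individual INSERTs.
mysql --local-infile=1 -u etl_user -p sales_db -e "
  LOAD DATA LOCAL INFILE '/var/nifi/merged/orders.csv'
  INTO TABLE orders
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n';
"
```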
						
					
    
	
		
		
04-01-2018 04:27 PM

@rajdip chaudhuri Did the answer help resolve your query? If so, please close the thread by marking the answer as Accepted!
						
					
    
	
		
		
04-01-2018 04:24 PM

@heta desai Did the answer help resolve your query? If so, please close the thread by marking the answer as Accepted!
						
					
    
	
		
		
04-01-2018 04:24 PM

@vishal dutt Did the answer help resolve your query? If so, please close the thread by marking the answer as Accepted!
						
					
    
	
		
		
04-01-2018 04:23 PM · 1 Kudo

@ANKIT PATEL Did the answer help resolve your query? If so, please close the thread by marking the answer as Accepted!
						
					
    
	
		
		
04-01-2018 04:19 PM

@Vinitkumar Pandey Did the answer help resolve your query? If so, please close the thread by marking the answer as Accepted!
						
					