Member since
05-19-2016
93
Posts
17
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5443 | 01-30-2017 07:34 AM
 | 3762 | 09-14-2016 10:31 AM
06-08-2017
08:43 PM
7 Kudos
Below are a couple of good blog posts comparing the two: https://elitedatascience.com/r-vs-python-for-data-science http://www.kdnuggets.com/2015/05/r-vs-python-data-science.html Go through these and see if you can come to a conclusion. My preference, however, is Python.
05-08-2017
05:19 PM
@Arkaprova Saha
First of all, NiFi is far more mature and easier to use, and will do the job much better. That being said, let me try to answer your questions.

1. If I have millions of rows in the database, how can I proceed with the multi-threading option?

Depends on what you mean by multi-threading. Are you getting data from one table? In that case, set tasks.max in your property file. Check this link; it describes how to use it.

2. Can we use multiple brokers in Kafka Connect?

If you are getting data from different tables and you submit the JDBC connector to a distributed cluster, it will automatically divide the work into a number of tasks equal to the number of tables, spread across different workers.

3. How can we implement security in this offload from RDBMS to a Kafka topic?

Your RDBMS security is enforced by the database itself and will require authentication from your program, as well as authorization of what that user can read. As for Kafka, its security is described in this link here. You can authenticate clients via SSL or Kerberos and authorize individual operations. There is nothing special about this scenario: the setup is the same whether or not you are offloading from an RDBMS.

4. During the data offload, let's say my server goes down. How will it behave after the Kafka server restarts?

Please take a look at this discussion. It should answer your question.
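As a minimal sketch of the tasks.max approach mentioned in answer 1, a Kafka Connect JDBC source connector properties file might look like this (the connector name, connection URL, and column names here are hypothetical; adjust for your environment):

```properties
# jdbc-source.properties -- hypothetical example values
name=rdbms-offload
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
# Upper bound on parallel tasks; the connector splits work across them
tasks.max=4
connection.url=jdbc:mysql://dbhost:3306/sales
# Incremental mode: only fetch rows with a higher id than last seen
mode=incrementing
incrementing.column.name=id
topic.prefix=sales-
```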
04-16-2018
08:26 AM
Hi, I have the same issue. How can I solve it, please?
10-11-2018
06:14 AM
@Artem Ervits, the ZooKeeper ports are open for me, but I'm still facing the same exception.
01-30-2017
06:40 AM
@rguruvannagari Thanks a lot. After setting the Hive property, the issue is resolved.
01-30-2017
07:34 AM
This issue was fixed after setting the properties below in kms-site.xml: hadoop.kms.proxyuser.hive.users=*
hadoop.kms.proxyuser.hive.hosts=* Now I am no longer getting any authentication error.
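For reference, in kms-site.xml these settings take the usual Hadoop XML configuration form (property names taken from the post; the file's location varies by distribution):

```xml
<!-- kms-site.xml: allow the hive service user to proxy
     for any user connecting from any host -->
<property>
  <name>hadoop.kms.proxyuser.hive.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
```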
11-23-2016
03:34 PM
True. Here is the complete list of validation limitations: https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#validation

Validation currently only validates data copied from a single table into HDFS. The following are the limitations in the current implementation:

- the all-tables option
- the free-form query option
- data imported into a Hive or HBase table
- import with the --where argument
- incremental imports

For Hive: you can sqoop to HDFS with an external Hive table pointed at it. You can use the --validate feature here, and if it passes validation, you can load your Hive managed table from the landing-zone external table using INSERT ... SELECT. Note that this validation only validates the number of rows. If you want to validate every single value transferred from source to Hive, you would have to sqoop back to your source and compare checksums of the original and sqooped data in that source system.

For HBase: you can put a Phoenix front end in front of your HBase table and do the same as with Hive above.
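A minimal sketch of the single-table-into-HDFS import with validation described above (the connection string, table name, and target directory are hypothetical placeholders):

```shell
# --validate compares source and target row counts after the transfer;
# it is only supported for single-table imports into HDFS.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl -P \
  --table orders \
  --target-dir /landing/orders \
  --validate
```

An external Hive table created over /landing/orders can then feed the managed table via INSERT ... SELECT, as described in the post.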
01-12-2018
09:02 PM
https://stackoverflow.com/questions/45100487/how-data-is-split-into-part-files-in-sqoop can start to explain more, but ultimately (and thanks to the power of open-source) you'll have to go look for yourself - you can find source code at https://github.com/apache/sqoop. Good luck and happy Hadooping!
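As a rough illustration (not Sqoop's actual code), Sqoop's default splitter takes the MIN and MAX of the --split-by column and carves that range into one contiguous slice per mapper; each mapper's query produces one part file. A sketch of that range-splitting idea:

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the closed interval [lo, hi] into num_mappers contiguous
    (start, end) ranges, mimicking the min/max range splitting that
    Sqoop's default integer splitter performs for --split-by columns."""
    size = (hi - lo + 1) / num_mappers
    ranges = []
    start = lo
    for i in range(1, num_mappers + 1):
        # Last slice absorbs any rounding remainder so hi is covered
        end = lo + round(size * i) - 1 if i < num_mappers else hi
        ranges.append((start, end))
        start = end + 1
    return ranges

# Four mappers over ids 1..100 -> four equal slices (four part files)
print(split_ranges(1, 100, 4))
# -> [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each tuple becomes a WHERE clause (e.g. id >= 26 AND id <= 50) executed by one mapper, which is why the number of part files defaults to the number of mappers.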
09-04-2017
09:20 PM
Take the free self-paced course at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training. Additionally, Hadoop: The Definitive Guide, https://smile.amazon.com/Definitive-version-revised-English-Chinese/dp/7564159170/, is still a very good resource.
09-26-2016
06:05 PM
6 Kudos
@Arkaprova Saha It depends on how you feel about yourself and your future. If you consider yourself a software engineer with a solid Java background who wants to deliver highly optimized and scalable software products based on Spark, then you may want to focus more on Scala. If you are more focused on data wrangling, discovery and analysis, short-term use-focused studies, or resolving business problems as quickly as possible, then Python is awesome. Python has such a large community, with code snippets, applications, etc. Don't get me wrong: Python can also be used to deliver enterprise-level applications, but Java and Scala are more often used for highly optimized ones. Python has some culprits, which we will not debate here. Anyhow, I would say that Python is kind of a MUST HAVE and Scala is NICE TO HAVE. Obviously, this is my 2c, and I would be amazed if any of the responses in this thread were THE answer.