Member since
05-19-2016
93
Posts
17
Kudos Received
2
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5443 | 01-30-2017 07:34 AM
 | 3762 | 09-14-2016 10:31 AM
06-08-2017
08:43 PM
7 Kudos
Below are a couple of good blog posts comparing the two: https://elitedatascience.com/r-vs-python-for-data-science http://www.kdnuggets.com/2015/05/r-vs-python-data-science.html Go through these and see if you can come to a conclusion. My preference, however, is Python.
05-08-2017
05:19 PM
@Arkaprova Saha
First of all, NiFi is far more mature and easier to use, and will do the job much better. That being said, let me try to answer your questions.

1. If I have millions of rows in the database, how can I proceed with the multi-threading option?

Depends on what you mean by multi-threading. Are you getting data from one table? In that case, set tasks.max in your property file. Check this link; it describes how to use it.

2. Can we use multiple brokers in Kafka Connect?

If you are getting data from different tables and you submit the JDBC connector to a distributed cluster, it will automatically divide the work into a number of tasks equal to the number of tables, spread across different workers.

3. How can we implement security in this offload from RDBMS to a Kafka topic?

Your RDBMS security is enforced by the database itself and will require authentication from your program, as well as authorization of what that user can read. As for Kafka, its security is described in this link here. You can authenticate clients via SSL or Kerberos and authorize individual operations. There is nothing special about this scenario: the setup is the same whether or not you are offloading from an RDBMS.

4. During the data offload, let's say my server goes down. How will it behave after the Kafka server restarts?

Please take a look at this discussion. It should answer your question.
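As a minimal sketch of the tasks.max approach mentioned in answer 1, a Kafka Connect JDBC source connector properties file might look like this (the connector name, connection URL, and column names here are hypothetical; adjust for your environment):

```properties
# jdbc-source.properties -- hypothetical example values
name=rdbms-offload
connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
# Upper bound on parallel tasks; the connector splits work across them
tasks.max=4
connection.url=jdbc:mysql://dbhost:3306/sales
# Incremental mode: only fetch rows with a higher id than last seen
mode=incrementing
incrementing.column.name=id
topic.prefix=sales-
```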
04-16-2018
08:26 AM
Hi, I have the same issue. How can I solve it, please?
10-11-2018
06:14 AM
@Artem Ervits, the ZooKeeper ports are open for me, but I'm still facing the same exception.
01-30-2017
06:40 AM
@rguruvannagari Thanks a lot. After setting the Hive property, the issue is resolved.
01-30-2017
07:34 AM
This issue was fixed after setting the properties below in kms-site.xml: hadoop.kms.proxyuser.hive.users=*
hadoop.kms.proxyuser.hive.hosts=* Now I am no longer getting any authentication error.
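For reference, in kms-site.xml these settings take the usual Hadoop XML configuration form (property names taken from the post; the file's location varies by distribution):

```xml
<!-- kms-site.xml: allow the hive service user to proxy
     for any user connecting from any host -->
<property>
  <name>hadoop.kms.proxyuser.hive.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
```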
11-23-2016
03:34 PM
True. Here is the complete list of validation limitations: https://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#validation

Validation currently only validates data copied from a single table into HDFS. The following are the limitations in the current implementation:

- the all-tables option
- the free-form query option
- data imported into a Hive or HBase table
- import with the --where argument
- incremental imports

For Hive: you can sqoop to HDFS with an external Hive table pointed at it. You can use the --validate feature here, and if it passes validation, you can load your Hive managed table from the landing-zone external table using INSERT ... SELECT. Note that this validation only validates the number of rows. If you want to validate every single value transferred from source to Hive, you would have to sqoop back to your source and compare checksums of the original and sqooped data in that source system.

For HBase: you can put a Phoenix front end in front of your HBase table and do the same as with Hive above.
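A minimal sketch of the single-table-into-HDFS import with validation described above (the connection string, table name, and target directory are hypothetical placeholders):

```shell
# --validate compares source and target row counts after the transfer;
# it is only supported for single-table imports into HDFS.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl -P \
  --table orders \
  --target-dir /landing/orders \
  --validate
```

An external Hive table created over /landing/orders can then feed the managed table via INSERT ... SELECT, as described in the post.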
01-12-2018
09:02 PM
https://stackoverflow.com/questions/45100487/how-data-is-split-into-part-files-in-sqoop can start to explain more, but ultimately (and thanks to the power of open-source) you'll have to go look for yourself - you can find source code at https://github.com/apache/sqoop. Good luck and happy Hadooping!
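As a rough illustration (not Sqoop's actual code), Sqoop's default splitter takes the MIN and MAX of the --split-by column and carves that range into one contiguous slice per mapper; each mapper's query produces one part file. A sketch of that range-splitting idea:

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the closed interval [lo, hi] into num_mappers contiguous
    (start, end) ranges, mimicking the min/max range splitting that
    Sqoop's default integer splitter performs for --split-by columns."""
    size = (hi - lo + 1) / num_mappers
    ranges = []
    start = lo
    for i in range(1, num_mappers + 1):
        # Last slice absorbs any rounding remainder so hi is covered
        end = lo + round(size * i) - 1 if i < num_mappers else hi
        ranges.append((start, end))
        start = end + 1
    return ranges

# Four mappers over ids 1..100 -> four equal slices (four part files)
print(split_ranges(1, 100, 4))
# -> [(1, 25), (26, 50), (51, 75), (76, 100)]
```

Each tuple becomes a WHERE clause (e.g. id >= 26 AND id <= 50) executed by one mapper, which is why the number of part files defaults to the number of mappers.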
09-04-2017
09:20 PM
Take the free self-paced course at http://public.hortonworksuniversity.com/hdp-overview-apache-hadoop-essentials-self-paced-training. Additionally, Hadoop: The Definitive Guide, https://smile.amazon.com/Definitive-version-revised-English-Chinese/dp/7564159170/, is still a very good resource.
09-26-2016
06:05 PM
6 Kudos
@Arkaprova Saha It depends on how you feel about yourself and your future. If you consider yourself a software engineer with a solid Java background who wants to deliver highly optimized and scalable software products based on Spark, then you may want to focus more on Scala. If you are more focused on data wrangling, discovery and analysis, short-term use-focused studies, or resolving business problems as quickly as possible, then Python is awesome. Python has such a large community, with code snippets, applications, etc. Don't get me wrong: Python can also be used to deliver enterprise-level applications, but Java and Scala are more often used for highly optimized ones. Python has some culprits, which we will not debate here. Anyhow, I would say that Python is kind of a MUST HAVE and Scala is NICE TO HAVE. Obviously, this is my 2c, and I would be amazed if any of the responses in this thread were THE answer.