06-14-2016
04:57 PM
@Alpha3645 There are a number of ways in which Hadoop can help your company, and a data lake is a great way to start using Hadoop. By "Microsoft database", do you mean Microsoft SQL Server? If so, one possibility is to use Sqoop to copy databases and tables into your data lake and run the lake alongside your SQL Server instance. You can then use Hive for SQL queries in the data lake. There is no need to replace everything with Hadoop at first.

Regarding security, HDP ships with both Apache Knox (perimeter security and API access) and Apache Ranger (fine-grained user access), and in many cases these two will meet organizational security requirements with nothing else needed.

Regarding data quality, there are a number of commercial tools available, such as Talend, Informatica, and Trifacta. The Hadoop ecosystem also includes a number of tools for analysis and data manipulation, such as Hive, HBase, Pig, and Zeppelin.

Here are a few links to get you started:

Sqoop - https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
Hadoop Security - http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_Security_Guide/content/ch_hdp-security-guide-overview.html
Talend - https://www.talend.com/resource/data-quality-tools.html
Trifacta - https://www.trifacta.com/
Informatica - https://www.informatica.com/products/data-quality.html
Apache Zeppelin - https://zeppelin.apache.org/
Apache Hive - https://hive.apache.org/
Apache HBase - https://hbase.apache.org/
Apache Pig - https://pig.apache.org/
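To make the Sqoop suggestion above a bit more concrete, here is a rough sketch of what a single-table import from SQL Server into Hive might look like. It only builds and prints the command line so you can review it before running anything. The host name, database, table, user, password-file path, and Hive table are all made-up placeholders, and the helper function is just for illustration. It assumes Sqoop 1.4.x and that the Microsoft SQL Server JDBC driver jar is already available to Sqoop.

```python
import shlex


def build_sqoop_import(host, database, table, username, password_file,
                       hive_table, port=1433, num_mappers=4):
    """Build the argument list for a Sqoop 1.4.x import from SQL Server into Hive.

    Hypothetical helper: every value passed in is a placeholder to replace
    with your own environment's settings.
    """
    # JDBC URL format used by the Microsoft SQL Server JDBC driver.
    jdbc_url = f"jdbc:sqlserver://{host}:{port};databaseName={database}"
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        # Read the password from a protected file instead of the command line.
        "--password-file", password_file,
        "--table", table,
        # Load the imported data into a Hive table so it can be queried with SQL.
        "--hive-import",
        "--hive-table", hive_table,
        # Number of parallel map tasks used for the transfer.
        "--num-mappers", str(num_mappers),
    ]


# Placeholder values only; none of these come from a real environment.
cmd = build_sqoop_import(
    host="sqlserver.example.com",
    database="sales",
    table="orders",
    username="etl_user",
    password_file="/user/etl_user/.sqlserver.pw",
    hive_table="lake.orders",
)

# Print a shell-safe version of the command for review before running it.
print(" ".join(shlex.quote(part) for part in cmd))
```

Once an import like this has run, the table should be queryable from Hive (for example, a simple row count against the placeholder table `lake.orders`) while the original SQL Server instance keeps serving its existing workload.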