Member since
01-29-2017
6
Posts
0
Kudos Received
0
Solutions
03-15-2019
04:56 AM
Dear @AnisurRehman, you can import data from an RDBMS into HDFS using Sqoop alone. If you then want to work with the table through impala-shell, you only need to run the following command from a machine where Impala is installed: impala-shell -d db_name -q "INVALIDATE METADATA tablename"; INVALIDATE METADATA is required because the table is new to the Impala daemons, so their cached metadata does not include it yet. If you later append new data files to the existing table, a refresh is enough: impala-shell -d db_name -q "REFRESH tablename"; REFRESH suffices because you do not need to reload the full metadata for the table, only the block locations of the new data files. After that you can query the table through impala-shell and the Impala query editor.
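As a rough sketch of the choice described above, the helper below prints the impala-shell command to run after an import, depending on whether the table is new to Impala or only received new data files. The function name impala_sync_cmd and the sample names sales_db and orders are made up for illustration; only the -d and -q options and the two statements come from the post. It prints the command instead of running it, so it can be tried on a machine without Impala.

```shell
#!/bin/sh
# Hypothetical helper: print the impala-shell command to run after a Sqoop import.
# Usage: impala_sync_cmd <db_name> <table_name> <new|append>
impala_sync_cmd() {
    db="$1"
    table="$2"
    mode="$3"
    case "$mode" in
        # Table did not exist in Impala's metadata before: full invalidate.
        new)    stmt="INVALIDATE METADATA $table" ;;
        # Table already known, only new data files were added: refresh.
        append) stmt="REFRESH $table" ;;
        *)      echo "mode must be 'new' or 'append'" >&2; return 1 ;;
    esac
    printf 'impala-shell -d %s -q "%s"\n' "$db" "$stmt"
}

# Example (sales_db and orders are placeholder names):
impala_sync_cmd sales_db orders new
impala_sync_cmd sales_db orders append
```

To actually execute the statement, run the printed line on a host where impala-shell is installed.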
02-06-2017
05:53 AM
1 Kudo
Hi @AnisurRehman,
Did you check out the follow-up posts from the series, including:
Ralph Kimball and Kaiser Permanente: Q&A Part I – Hadoop and the Data Warehouse
Ralph Kimball and Kaiser Permanente: Q&A Part II – Building the Landing Zone
02-03-2017
01:02 PM
1 Kudo
https://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_ig_feature_differences.html The main difference is that you get many features that make management easier, specifically around configuration versioning, encryption, security, etc. There is no technical limitation on the services between the versions. Since you were told to get it from Apache, it is worth mentioning that CDH is a packaged distribution that Cloudera integrates and tests. This means you won't have to do that work yourself, but it also means you will have to move at Cloudera's pace when adopting new projects or new versions (technically you can add your own as well, but my view is that if you are going to do that anyway, why not do it for everything).
01-31-2017
12:24 PM
1 Kudo
@AnisurRehman
1. Please refer to this official link to learn more about Sqoop. Change the version in the URL to match your Sqoop version: https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html
2. Yes, bulk import is possible. Please refer to the "sqoop-import-all-tables" topic in the above link.
3. About incremental: please refer to "incremental import" in the above link.
4. About Impala for Sqoop:
a. Sqoop runs MapReduce mappers (no reducers by default). It refers to the Hive db/table only to identify the target location, and it never uses the Hive or Impala engine to do the import. So specifying Impala or Hive makes no difference, and Sqoop therefore provides the hive-import option by default. The bottom line is that you can continue to use the Hive options in the Sqoop script.
b. After the data import, it is up to you whether to use Hive or Impala, depending on your requirements. As you mentioned, you can use Impala in certain situations, so please use Impala only when it is necessary (some priority tables).
Thanks
Kumar
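For points 2 and 3 above, a minimal sketch of what the two kinds of Sqoop invocation might look like. The JDBC URL, user name, table, and column values are placeholders invented for this example, and the helper function names are hypothetical; the options themselves (import-all-tables, --hive-import, --incremental append, --check-column, --last-value) are described in the Sqoop user guide linked above. The script only prints the commands so they can be reviewed before being run against a real database.

```shell
#!/bin/sh
# Placeholder connection details (assumptions for illustration only).
JDBC_URL="jdbc:mysql://dbhost.example.com/sales"
DB_USER="etl_user"

# Print a Sqoop command that bulk-imports every table and registers it in Hive.
sqoop_bulk_cmd() {
    printf 'sqoop import-all-tables --connect %s --username %s -P --hive-import\n' \
        "$JDBC_URL" "$DB_USER"
}

# Print a Sqoop command that appends only rows whose check column
# is greater than the last imported value.
# Usage: sqoop_incremental_cmd <table> <check_column> <last_value>
sqoop_incremental_cmd() {
    printf 'sqoop import --connect %s --username %s -P --table %s --incremental append --check-column %s --last-value %s\n' \
        "$JDBC_URL" "$DB_USER" "$1" "$2" "$3"
}

sqoop_bulk_cmd
sqoop_incremental_cmd orders id 1000
```

The -P option makes Sqoop prompt for the password instead of taking it on the command line.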