
Sqoop import to HCatalog/Hive - table not visible

Super Collaborator

HDP-2.4.2.0-258 installed using Ambari 2.2.2.0

I have to import several SQL Server schemas that should be accessible via Hive, Pig, MR, and any third-party tool (in the future). As suggested by the community, I decided to import into HCatalog.

Sqoop provides ways to import into Hive OR HCatalog. I assume that if I import into HCatalog, the same table will be accessible from the Hive CLI, from MR, and from Pig (please evaluate my assumption).
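
For example, once the table is registered in HCatalog I would expect to be able to read it roughly like this (just a sketch, assuming the table lands in the default database under the name settingattribute):

# From the Hive CLI
hive -e "select count(*) from default.settingattribute;"
# From Pig, via HCatLoader (Pig started with -useHCatalog so the HCatalog jars are on the classpath)
pig -useHCatalog -e "A = LOAD 'default.settingattribute' USING org.apache.hive.hcatalog.pig.HCatLoader(); DUMP A;"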

Questions:

  • If imported to Hive directly, will the table be available to Pig and MR?
  • If imported to HCatalog, what needs to be done to access it via Hive?
  • Is pre-creation of the tables in Hive necessary? If yes, what is the advantage of importing into HCatalog over importing into Hive directly, or over importing into HDFS and then creating an external table? (A rough sketch of this last route is below.)
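
For reference, the "import into HDFS and then create an external table" route from the last question would look roughly like this (a sketch only; the credentials and the column list are placeholders, and having to spell out the columns by hand for hundreds of tables is exactly what I want to avoid):

# Hypothetical example: land the data at a custom HDFS path, then map an external table over it
sqoop import --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username <user> --password <pass> --table SettingAttribute --target-dir /org/data/schema1/SettingAttribute --fields-terminated-by ',' -- --schema Administration
hive -e "CREATE EXTERNAL TABLE settingattribute (<column list>) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LOCATION '/org/data/schema1/SettingAttribute';"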

Issue:

I wish to achieve the following in one step. I'm unsure whether it's possible, but threads like this and this suggest it is:

  1. Import the data (from the SQL Server tables)
  2. Store the data at a custom HDFS path, say /org/data/schema1, /org/data/schema2 and so on (or is this impossible because Sqoop does not support --target-dir/--warehouse-dir together with the HCatalog options?)
  3. Avoid 'pre-creating' the tables or writing CREATE statements for them (there are hundreds of them)
  4. Store the tables in ORC format

I executed the following command:

-bash-4.2$ sqoop import --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username --password  --table SettingAttribute -- --schema Administration  --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database default --hcatalog-table SettingAttribute --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile"

The source table contains 109 records, and all of them are fetched:

16/08/10 15:02:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.2.0-258
16/08/10 15:02:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
16/08/10 15:02:28 INFO manager.SqlManager: Using default fetchSize of 1000
16/08/10 15:02:28 INFO manager.SQLServerManager: We will use schema Administration
16/08/10 15:02:28 INFO tool.CodeGenTool: Beginning code generation
16/08/10 15:02:28 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [Administration].[SettingAttribute] AS t WHERE 1=0
16/08/10 15:02:28 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.4.2.0-258/hadoop-mapreduce
Note: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/08/10 15:02:30 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-ojoqcu/compile/dfab14748c41a566ec286b7e4b11004d/SettingAttribute.jar
16/08/10 15:02:30 INFO mapreduce.ImportJobBase: Beginning import of SettingAttribute
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.4.2.0-258/accumulo/lib/slf4j-log4j12.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/08/10 15:02:31 INFO impl.TimelineClientImpl: Timeline service address: http://l4373t.sss.com:8188/ws/v1/timeline/
16/08/10 15:02:31 INFO client.RMProxy: Connecting to ResourceManager at l4283t.sss.com/138.106.9.80:8050
16/08/10 15:02:33 INFO db.DBInputFormat: Using read commited transaction isolation
16/08/10 15:02:33 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([SettingAttributeId]), MAX([SettingAttributeId]) FROM [Administration].[SettingAttribute]
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: number of splits:4
16/08/10 15:02:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1467787344827_0013
16/08/10 15:02:34 INFO impl.YarnClientImpl: Submitted application application_1467787344827_0013
16/08/10 15:02:34 INFO mapreduce.Job: The url to track the job: http://l4283t.sss.com:8088/proxy/application_1467787344827_0013/
16/08/10 15:02:34 INFO mapreduce.Job: Running job: job_1467787344827_0013
16/08/10 15:02:41 INFO mapreduce.Job: Job job_1467787344827_0013 running in uber mode : false
16/08/10 15:02:41 INFO mapreduce.Job:  map 0% reduce 0%
16/08/10 15:02:47 INFO mapreduce.Job:  map 100% reduce 0%
16/08/10 15:02:48 INFO mapreduce.Job: Job job_1467787344827_0013 completed successfully
16/08/10 15:02:48 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=616636
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=540
                HDFS: Number of bytes written=10079
                HDFS: Number of read operations=16
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=8
        Job Counters
                Launched map tasks=4
                Other local map tasks=4
                Total time spent by all maps in occupied slots (ms)=16132
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=16132
                Total vcore-seconds taken by all map tasks=16132
                Total megabyte-seconds taken by all map tasks=66076672
        Map-Reduce Framework
                Map input records=109
                Map output records=109
                Input split bytes=540
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=320
                CPU time spent (ms)=6340
                Physical memory (bytes) snapshot=999870464
                Virtual memory (bytes) snapshot=21872697344
                Total committed heap usage (bytes)=943194112
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=10079
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Transferred 9.8428 KB in 17.2115 seconds (585.597 bytes/sec)
16/08/10 15:02:48 INFO mapreduce.ImportJobBase: Retrieved 109 records.

The files are created under my user's HDFS directory:

hdfs dfs -ls /user/ojoqcu/SettingAttribute
Found 5 items
-rw-------   3 ojoqcu hdfs          0 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/_SUCCESS
-rw-------   3 ojoqcu hdfs       8378 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00000
-rw-------   3 ojoqcu hdfs        144 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00001
-rw-------   3 ojoqcu hdfs       1123 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00002
-rw-------   3 ojoqcu hdfs        434 2016-08-10 15:02 /user/ojoqcu/SettingAttribute/part-m-00003

However, I cannot see anything in HCatalog (nor in Hive):

-bash-4.2$ /usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -e "show tables in default;"
WARNING: Use "yarn jar" to launch YARN applications.
16/08/10 15:07:12 WARN conf.HiveConf: HiveConf of name hive.server2.enable.impersonation does not exist
OK
Time taken: 2.007 seconds
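
Something along these lines can be used to double-check from the Hive side as well (assuming the stock HDP warehouse location of /apps/hive/warehouse):

# Hive CLI check for the table
hive -e "show tables in default;"
# Look for table directories under the default metastore warehouse path
hdfs dfs -ls /apps/hive/warehouse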

@Ravi Mutyala I guess you have considerable experience here; I have proceeded based on the inputs you provided earlier 🙂

1 ACCEPTED SOLUTION

Super Collaborator

Well, I'm unsure whether it was an authorization issue, a mere parsing problem, or both. I did the following and it worked:

  1. Switched to the hive user ('su hive')
  2. Executed the following command (the -- --schema option should probably be the last argument: everything after the standalone -- is handed to the SQL Server connector, so Sqoop simply ignores/drops any options placed after it!)
sqoop import --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database FleetManagement_Ape --hcatalog-table DatabaseLog --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username  --password  --table DatabaseLog -- --schema ape


7 REPLIES

Expert Contributor

I think that's an authorization issue. You may try relaxing the permissions on the data you sqooped over and see whether you can then load it in Hive.
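
For example, something like this on the imported directory (a sketch using the path from your output; loosen only as much as your environment allows):

# Make the sqooped files readable by the hive/hcat services
hdfs dfs -chmod -R 755 /user/ojoqcu/SettingAttribute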

Super Collaborator

As I mentioned, the only place where I see the data is under my user's HDFS directory; it is neither in HCatalog nor in Hive. Hence, can you elaborate on 'relax your permission on the data you sqooped over'?

I checked /var/log, but no logs exist there for Sqoop, Hive-HCatalog, or Hive. How can I view the authorization issue and fix it?

Master Guru
  • If imported to Hive directly, will the table be available to Pig and MR?
    • Yes, since Hive and Pig are both integrated with the Hive metastore (via HCatalog).
  • If imported to HCatalog, what needs to be done to access it via Hive?
    • Nothing. You should see the table in Hive.
  • Is pre-creation of the tables in Hive necessary? If yes, what is the advantage of importing into HCatalog over importing into Hive directly, or over importing into HDFS and then creating an external table?
    • No. You can use the --create-hive-table argument (see the sketch below).
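
For instance, a direct Hive import that creates the table on the fly might look roughly like this (a sketch; the connection string, credentials, and schema are placeholders modelled on the command in the question):

# Hypothetical direct-to-Hive import; Sqoop creates the Hive table from the source table's metadata
sqoop import --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username <user> --password <pass> --table SettingAttribute --hive-import --create-hive-table --hive-table default.settingattribute -- --schema Administration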

Super Collaborator

Thanks for the inputs, things are a bit clearer now.

I still have a doubt: if importing to Hive and importing to HCatalog are effectively synonymous, i.e. an import into one makes the table available in the other, why would anyone import to HCatalog at all? Why not import to Hive ONLY? The only advantage I see is that an HCatalog import gives you the option to store the table in ORC; can you validate my assumption?

Explorer

I believe it is an authorization issue... BTW, have you looked at our tutorial? https://community.hortonworks.com/articles/14802/sqoop-importexport-tutorial.html

Super Collaborator

I checked /var/log, but no logs exist there for Sqoop, Hive-HCatalog, or Hive. How can I view the authorization issue and fix it?

Super Collaborator

Well, I'm unsure whether it was an authorization issue, a mere parsing problem, or both. I did the following and it worked:

  1. Switched to the hive user ('su hive')
  2. Executed the following command (the -- --schema option should probably be the last argument: everything after the standalone -- is handed to the SQL Server connector, so Sqoop simply ignores/drops any options placed after it!)
sqoop import --hcatalog-home /usr/hdp/current/hive-webhcat --hcatalog-database FleetManagement_Ape --hcatalog-table DatabaseLog --create-hcatalog-table --hcatalog-storage-stanza "stored as orcfile" --connect 'jdbc:sqlserver://<IP>;database=FleetManagement' --username  --password  --table DatabaseLog -- --schema ape
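
After the import, a quick way to confirm the table is actually visible (a sketch reusing the database and table from the command above) is:

# Check from HCatalog and from Hive
/usr/hdp/2.4.2.0-258/hive-hcatalog/bin/hcat -e "show tables in FleetManagement_Ape;"
hive -e "select count(*) from FleetManagement_Ape.DatabaseLog;"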