The objective is to import a table of 2 million records. The MapReduce job seems to have completed OK and there is no error message, but the table did not appear in the target Hive database.
Before this import, I was able to import another table with just 7 records successfully, and the imported table appeared in the Hive database.
I compared the outputs of the two imports and found that the failing one stopped at this output:
19/05/20 13:10:28 INFO mapreduce.ImportJobBase: Transferred 2.3685 GB in 526.6253 seconds (4.6054 MB/sec)
19/05/20 13:10:28 INFO mapreduce.ImportJobBase: Retrieved 2925046 records.
But the successfully completed import had many more log lines after reporting “Retrieved # records”:
19/05/20 11:48:18 INFO mapreduce.ImportJobBase: Transferred 176 bytes in 67.7923 seconds (2.5962 bytes/sec)
19/05/20 11:48:18 INFO mapreduce.ImportJobBase: Retrieved 7 records.
19/05/20 11:48:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [DimDepartmentGroup] AS t WHERE 1=0
19/05/20 11:48:18 INFO hive.HiveImport: Loading uploaded data into Hive
19/05/20 11:48:20 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
Another difference between the two imports is that the successful one did the cleanup while the unsuccessful one did not. I can re-run the successful import without error. When I re-ran the unsuccessful import, I got an error that a folder under the hdfs://user/hive/ path already exists. I had to delete that folder so the MapReduce process could continue, although it still did not create the table at the end.
Can you share your sqoop import command? There is probably an issue with the command.
You need to explicitly tell Sqoop to create a Hive table, as below. After the run you should have a table named haijin_hive.
--hive-table haijin_hive \
--create-hive-table \
--target-dir /tmp/hive_temp
Add the above to your command and let me know.
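For reference, folded into a full import these suggestions might look like the sketch below. The connection string, credentials, and source table name are placeholders, not taken from the thread:

```shell
# Sketch only: host, database, credentials, and table are hypothetical.
sqoop import \
  --connect "jdbc:sqlserver://host:1433;database=DB" \
  --username user --password pass \
  --table "SourceTable" \
  --hive-import \
  --hive-table haijin_hive \
  --create-hive-table \
  --target-dir /tmp/hive_temp
```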
Thanks for the reply.
I tried including the --create-hive-table parameter and still got the same result. I did not apply the --target-dir parameter because I want Hive to control the target location.
This is the import command that ran successfully. I am able to re-run it several times without error. Of course, I dropped the TestHive.DimDepartmentGroup table before each rerun.
sqoop import \
  --connect "jdbc:sqlserver://abc:1433;database=ABC" \
  --username abc --password abc \
  --table "DimDepartmentGroup" \
  --hive-import \
  --hive-table TestHive.DimDepartmentGroup
This is the import command that did not create the table.
sqoop import \
  --connect 'jdbc:sqlserver://xyz\\sql1:57830;database=XYZ' \
  --username xyz --password xyz \
  --table 'Dim_Member' \
  --split-by Member_SKey \
  -- --schema mydw \
  --hive-import \
  --hive-table Testdb.dim_member \
  --create-hive-table
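One detail worth noting about this command: per the Sqoop user guide, arguments that follow a bare `--` are handed to the connection manager as extra arguments (that is how `--schema mydw` reaches the SQL Server connector), so the Hive options placed after the `--` here may never be parsed by Sqoop itself. A minimal shell sketch of that split, using the argument string from the command above:

```shell
# Illustration of the argument-splitting convention: the list is cut at
# the first bare ` -- `; everything after it goes to the connector.
args='--table Dim_Member --split-by Member_SKey -- --schema mydw --hive-import --hive-table Testdb.dim_member'
tool_args=${args%%' -- '*}    # options Sqoop itself would parse
extra_args=${args#*' -- '}    # options passed through to the connection manager
echo "tool:  $tool_args"
echo "extra: $extra_args"
```

If this is what happens, the `--hive-import`, `--hive-table`, and `--create-hive-table` flags end up on the connector side and moving them before the `--` would let Sqoop see them. This is a hypothesis based on the documented argument convention, not something verified against a live cluster.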
You can see there are three key differences between the two imports, but I do not think any of them matters.
Additional testing showed that the source SQL Server database schema name is the root cause. After I re-created Dim_Member (a SQL view) in the dbo schema, the import command below works.
sqoop import \
  --connect 'jdbc:sqlserver://xyz\\sql1:57830;database=XYZ' \
  --username xyz --password xyz \
  --table 'Dim_Member' \
  --split-by Member_SKey \
  --hive-import \
  --hive-table Testdb.dim_member
I cannot find any documentation about specifying a database schema in the Sqoop tool. My only reference is this link: https://community.hortonworks.com/questions/50557/sqoop-import-sql-server-non-default-schema.html
I think the syntax for specifying the database schema is correct, because at least the MapReduce job was created successfully and ran halfway.
I tested again with --target-dir specified and got the same result. There is no error, but the process stopped at the following output:
19/05/22 12:02:24 INFO mapreduce.ImportJobBase: Transferred # bytes in # seconds
19/05/22 12:02:24 INFO mapreduce.ImportJobBase: Retrieved # records.
I checked the path specified as --target-dir and it is empty.
--target-dir is not a scratch directory.
In any case, I did see the scratch directory created under the hdfs://users/hive path every time, and there were files inside. I expect the scratch directory to be deleted automatically if the import command runs through to the end. In fact, I had to manually delete this scratch folder before every test run, because my test import never completed and always stopped halfway.
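If the leftover scratch directory needs to be cleared by hand between runs, the HDFS shell can do it. The path below is illustrative only; substitute the directory the failed job actually left behind:

```shell
# Remove the leftover scratch directory (illustrative path) so the next
# sqoop run does not fail with an "already exists" error.
hdfs dfs -rm -r -f /user/hive/Dim_Member
```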
Thanks again for your help.
My sqoop import job still doesn't complete or create the Hive table.
Currently, my workaround is to create a dbo view on top of my warehouse view/table and then run the import job against it.