Created 05-20-2019 11:46 PM
The objective is to import a table of 2 million records. The MapReduce job seems to have completed OK. There is no error message, but the table did not appear in the target Hive database.
Before this import, I was able to import another table with just 7 records successfully. That imported table appeared in the Hive database.
I compared the outputs of the two imports and found that the failing one stopped at this output:
19/05/20 13:10:28 INFO mapreduce.ImportJobBase: Transferred 2.3685 GB in 526.6253 seconds (4.6054 MB/sec)
19/05/20 13:10:28 INFO mapreduce.ImportJobBase: Retrieved 2925046 records.
But the successfully completed import had a lot more output after reporting “Retrieved # records”:
19/05/20 11:48:18 INFO mapreduce.ImportJobBase: Transferred 176 bytes in 67.7923 seconds (2.5962 bytes/sec)
19/05/20 11:48:18 INFO mapreduce.ImportJobBase: Retrieved 7 records.
19/05/20 11:48:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [DimDepartmentGroup] AS t WHERE 1=0
19/05/20 11:48:18 INFO hive.HiveImport: Loading uploaded data into Hive
19/05/20 11:48:20 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.
Another difference between the two imports is that the successful one did the cleanup while the unsuccessful one did not. I can re-run the successful import without error. When I re-ran the unsuccessful import, I ran into an error saying that a folder in the hdfs://user/hive/ path already exists. I had to delete that folder so the MapReduce process could continue, although it still did not create the table at the end.
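A minimal sketch of that cleanup step (the path below is a placeholder, not the exact directory on my cluster; substitute whatever directory the "already exists" error reports):
# remove the leftover import directory so the next run does not fail on "already exists"
# placeholder path shown; adjust to the actual leftover folder
hdfs dfs -rm -r /user/hive/my_import_dir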
Created 05-21-2019 03:52 AM
Can you share your sqoop import command? There is likely an issue with the command.
You need to explicitly tell Sqoop to create a Hive table, as shown below. After the run you should have a table named haijin_hive created.
--hive-table haijin_hive \
--create-hive-table \
--target-dir /tmp/hive_temp
Add the above to your command and let me know.
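For context, a rough sketch of where those options sit in a full import command. The connection string, credentials, and source table below are made-up placeholders, not values from this thread:
# placeholder connection details; replace host, database, credentials, and table with your own
sqoop import \
  --connect "jdbc:sqlserver://myhost:1433;database=MYDB" \
  --username sqoopuser --password sqooppass \
  --table "MySourceTable" \
  --hive-import \
  --hive-table haijin_hive \
  --create-hive-table \
  --target-dir /tmp/hive_temp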
Created 05-21-2019 07:02 PM
Thanks for the reply.
I tried including the “--create-hive-table” parameter and still got the same result. I did not apply the “--target-dir” parameter because I want Hive to control this.
This is the import command that ran successfully. I am able to re-run it several times without error. Of course, I dropped the TestHive.DimDepartmentGroup table before each rerun.
sqoop import --connect "jdbc:sqlserver://abc:1433;database=ABC" --username abc --password abc --table "DimDepartmentGroup" --hive-import --hive-table TestHive.DimDepartmentGroup
This is the import command that did not create the table.
sqoop import --connect 'jdbc:sqlserver://xyz\\sql1:57830;database=XYZ' --username xyz --password xyz --table 'Dim_Member' --split-by Member_SKey -- --schema mydw --hive-import --hive-table Testdb.dim_member --create-hive-table
You can see there are three key differences between the two imports, but I do not think any of them matters.
Created 05-22-2019 05:38 PM
Additional testing showed that the schema name in the source SQL Server database is the root cause. After I re-created Dim_Member (a SQL view) in the dbo schema, the import command below works.
sqoop import --connect 'jdbc:sqlserver://xyz\\sql1:57830;database=XYZ' --username xyz --password xyz --table 'Dim_Member' --split-by Member_SKey --hive-import --hive-table Testdb.dim_member
I cannot find any documentation about specifying the database schema in the Sqoop tool. My only reference is this link: https://community.hortonworks.com/questions/50557/sqoop-import-sql-server-non-default-schema.html
I think the syntax for specifying the database schema is correct, because at least the MapReduce job was created successfully and ran halfway through.
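For reference, the Sqoop user guide treats --schema as a connector-specific extra argument that goes after a bare "--" separator, and everything following that separator is handed to the connection manager rather than parsed as regular Sqoop options. Below is a sketch of the same command with the Hive options moved before the separator and the --schema passthrough placed last (not verified to change the outcome here):
sqoop import \
  --connect 'jdbc:sqlserver://xyz\\sql1:57830;database=XYZ' \
  --username xyz --password xyz \
  --table 'Dim_Member' \
  --split-by Member_SKey \
  --hive-import \
  --hive-table Testdb.dim_member \
  --create-hive-table \
  -- --schema mydw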
Created 05-22-2019 04:13 PM
Can you include the --target-dir as I pointed out before?
--hive-table haijin_hive \
--create-hive-table \
--target-dir /tmp/hive_temp
And let me know 🙂
Created 05-22-2019 08:16 PM
I tested again with “--target-dir” specified and got the same result. There is no error, but the process stopped at the following output:
19/05/22 12:02:24 INFO mapreduce.ImportJobBase: Transferred # bytes in # seconds
19/05/22 12:02:24 INFO mapreduce.ImportJobBase: Retrieved # records.
I checked the path specified as target-dir and it is empty.
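For completeness, roughly how that check looks (assuming the /tmp/hive_temp path from the suggestion above):
# list the contents of the target directory; it comes back empty
hdfs dfs -ls /tmp/hive_temp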
Created 05-22-2019 08:44 PM
The --target-dir is a scratch directory; the data there is supposed to be loaded into the Hive table if you used --create-hive-table, isn't it?
Created 05-22-2019 11:03 PM
Target-dir is not a scratch directory.
But anyway, I did see the scratch directory created under the hdfs://user/hive path every time, and there were files inside. I expect the scratch directory to be deleted automatically if the import command runs through to the end. In fact, I had to manually delete this scratch folder before every test, because my import never completed and always stopped halfway.
Thanks again for your help.
Created 05-23-2019 04:37 AM
Wrong word on my part, “scratch dir”!!
So the Sqoop job still doesn't complete and create the Hive table, right?
Created 05-24-2019 02:48 PM
My sqoop import job still doesn't complete and create the hive tables.
Currently, my workaround is to create a dbo view on top of my warehouse view/table and then run the import job.