SQOOP importing a 2-million-record table from SQL Server

Contributor

The objective is to import a table of 2 million records. The MapReduce job seems to have completed OK and there is no error message, but the table did not appear in the target Hive database.

Before this import, I was able to successfully import another table with just 7 records, and that table did appear in the Hive database.

I compared the outputs of the two imports and found that the failing one stopped at this output:

19/05/20 13:10:28 INFO mapreduce.ImportJobBase: Transferred 2.3685 GB in 526.6253 seconds (4.6054 MB/sec)
19/05/20 13:10:28 INFO mapreduce.ImportJobBase: Retrieved 2925046 records.

But the successfully completed import produced a lot more output after reporting "Retrieved # records":

19/05/20 11:48:18 INFO mapreduce.ImportJobBase: Transferred 176 bytes in 67.7923 seconds (2.5962 bytes/sec)
19/05/20 11:48:18 INFO mapreduce.ImportJobBase: Retrieved 7 records.
19/05/20 11:48:18 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [DimDepartmentGroup] AS t WHERE 1=0
19/05/20 11:48:18 INFO hive.HiveImport: Loading uploaded data into Hive
19/05/20 11:48:20 INFO hive.HiveImport: SLF4J: Class path contains multiple SLF4J bindings.

Another difference between the two imports is that the successful one did the cleanup while the unsuccessful one did not. I can re-run the successful import without error, but when I re-run the unsuccessful one, I get an error that a folder under the hdfs://user/hive/ path already exists. I had to delete that folder so the MapReduce job could continue, although it still did not create the table at the end.
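
For reference, this is roughly how I check whether the staged files landed in HDFS and whether the Hive table was ever created; the staging path and database name below are only examples based on the commands later in this thread, not the exact ones from this job.

# check the staging directory Sqoop writes to before the Hive load (path is an example)
hdfs dfs -ls /user/hive/Dim_Member

# check whether the table was actually created in the target Hive database
hive -e "SHOW TABLES IN Testdb;"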


Re: SQOOP importing a 2-million-record table from SQL Server

Mentor

@Haijin Li

Can you share your sqoop import command? There may be an issue with the command.

You need to explicitly tell Sqoop to create a Hive table, as shown below. After the run you should have a table haijin_hive created:

--hive-table haijin_hive \
--create-hive-table \
--target-dir /tmp/hive_temp

Add the above options to your command and let me know.
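
Put together, a full command with those options would look roughly like this; the connection string, credentials and source table below are placeholders, so substitute your own values:

sqoop import \
  --connect "jdbc:sqlserver://<host>:1433;database=<db>" \
  --username <user> --password <pass> \
  --table "<source_table>" \
  --hive-import \
  --hive-table haijin_hive \
  --create-hive-table \
  --target-dir /tmp/hive_temp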


Re: SQOOP importing a 2-million-record table from SQL Server

Contributor

@Geoffrey Shelton Okot

Thanks for the reply.

I tried including the --create-hive-table parameter and still got the same result. I did not apply the --target-dir parameter because I want Hive to control that location.

This is the import command that ran successfully. I am able to re-run it several times without a problem. Of course, I dropped the TestHive.DimDepartmentGroup table before each rerun.

sqoop import --connect "jdbc:sqlserver://abc:1433;database=ABC" --username abc --password abc --table "DimDepartmentGroup" --hive-import --hive-table TestHive.DimDepartmentGroup

This is the import command that did not create the table.

sqoop import --connect 'jdbc:sqlserver://xyz\\sql1:57830;database=XYZ' --username xyz --password xyz  --table 'Dim_Member' --split-by Member_SKey -- --schema mydw --hive-import --hive-table Testdb.dim_member --create-hive-table

You can see there are three key differences between the two imports, but I do not think any of them should matter:

  • Source SQL Server instance – default vs named
  • Source SQL db schema name – default (dbo) vs user-defined (mydw)
  • Whether to use “split-by”

Re: SQOOP importing a 2-million-record table from SQL Server

Contributor

Additional testing showed that the source SQL Server database schema name is the root cause. After I re-created Dim_Member (a SQL view) in the dbo schema, the import command below works.

sqoop import --connect 'jdbc:sqlserver://xyz\\sql1:57830;database=XYZ' --username xyz --password xyz  --table 'Dim_Member' --split-by Member_SKey --hive-import --hive-table Testdb.dim_member

I cannot find any documentation on specifying the database schema in the Sqoop tool. My only reference is this link: https://community.hortonworks.com/questions/50557/sqoop-import-sql-server-non-default-schema.html

I think my syntax for specifying the database schema is correct, because the MapReduce job was at least created successfully and got halfway through the import.
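
That said, the Sqoop user guide does list --schema under the SQL Server connector's extra arguments, and extra connector arguments are supposed to go after a lone -- placed at the very end of the command line, after all regular tool arguments. A rough sketch of that ordering, with placeholder connection details, would be:

sqoop import \
  --connect 'jdbc:sqlserver://<host>\\<instance>:<port>;database=<db>' \
  --username <user> --password <pass> \
  --table 'Dim_Member' --split-by Member_SKey \
  --hive-import --hive-table Testdb.dim_member --create-hive-table \
  -- --schema mydw

In the failing command above, the Hive options come after the --, so it may be worth checking whether they are still being picked up as Sqoop tool arguments at all.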

Re: SQOOP importing a 2-million-record table from SQL Server

Mentor

@Haijin Li

Can you include the --target-dir as I pointed out before?

--hive-table haijin_hive \
--create-hive-table \
--target-dir /tmp/hive_temp

And let me know :-)

Re: SQOOP importing a 2-million-record table from SQL Server

Contributor

@Geoffrey Shelton Okot

I tested again with --target-dir specified and got the same result. There is no error, but the process stopped at the following output:

19/05/22 12:02:24 INFO mapreduce.ImportJobBase: Transferred # bytes in # seconds
19/05/22 12:02:24 INFO mapreduce.ImportJobBase: Retrieved # records.

I checked the path specified as target-dir and it is empty.

Re: SQOOP importing a 2-million-record table from SQL Server

Mentor

@Haijin Li

The --target-dir is a scratch directory; the data there is supposed to be loaded into the Hive table since you used --create-hive-table, isn't it?

Re: SQOOP importing a 2-million-record table from SQL Server

Contributor

@Geoffrey Shelton Okot

Target-dir is not a scratch directory.

https://stackoverflow.com/questions/37415989/difference-between-warehouse-dir-and-target-dir-command...

But anyway, I did see the scratch directory created under the hdfs://user/hive path every time, and there were files inside. I expect the scratch directory to be deleted automatically if the import command runs through to the end. In fact, I had to manually delete this scratch folder before every test, because my test import never completed and always stopped halfway.
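
In case it helps anyone else, this is roughly how I clear the leftover directory before re-running; the path is just an example based on my table name, and Sqoop's --delete-target-dir option can do the same cleanup as part of the job.

# remove the leftover staging directory from the previous failed run (path is an example)
hdfs dfs -rm -r -skipTrash /user/hive/Dim_Member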

Thanks again for your help.

Re: SQOOP importing a 2-million-record table from SQL Server

Mentor

@Haijin Li

Wrong word on my part, "scratch dir"!
So the Sqoop job still doesn't complete and create the Hive table, right?

Re: SQOOP importing a 2-million-record table from SQL Server

Contributor

My Sqoop import job still doesn't complete and create the Hive table.

Currently, my workaround is to create a dbo view on top of my warehouse view/table and then run the import job against that view.
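
For anyone taking the same route, the workaround looks roughly like the sketch below; the sqlcmd invocation, server address and view definition are illustrative guesses, not the exact statements I ran.

# create a dbo wrapper view over the warehouse-schema object (run against SQL Server)
sqlcmd -S 'xyz,57830' -d XYZ -U xyz -P xyz \
  -Q "CREATE VIEW dbo.Dim_Member AS SELECT * FROM mydw.Dim_Member;"

Then the import can use the same sqoop command shown earlier in the thread, without any schema argument.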