
Tutorial exercise 1: Problem ingesting structured data using sqoop

Hi there,

 

I've been following exercise one. I ran the sqoop command to import all tables into Hive; it imported the 'categories' table, but none of the other tables.

 

I'm using the quickstart VM hosted on GoGrid that includes the Tableau software.

 

Here is a full log:

 

[root@mb2d0-cldramaster-01 ~]# sqoop import-all-tables \
>     -m 12 \
>     --connect jdbc:mysql://216.121.94.146:3306/retail_db \
>     --username=retail_dba \
>     --password=cloudera \
>     --compression-codec=snappy \
>     --as-avrodatafile \
>     --warehouse-dir=/user/hive/warehouse
Warning: /opt/cloudera/parcels/CDH-5.2.0-1.cdh5.2.0.p0.36/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
15/06/08 00:37:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
15/06/08 00:37:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/06/08 00:37:30 INFO manager.SqlManager: Using default fetchSize of 1000
15/06/08 00:37:30 INFO tool.CodeGenTool: Beginning code generation
15/06/08 00:37:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/06/08 00:37:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/06/08 00:37:30 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-root/compile/6f9632f206ce58bd0d42187391fced45/categories.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/06/08 00:37:32 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/6f9632f206ce58bd0d42187391fced45/categories.jar
15/06/08 00:37:32 WARN manager.MySQLManager: It looks like you are importing from mysql.
15/06/08 00:37:32 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
15/06/08 00:37:32 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
15/06/08 00:37:32 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
15/06/08 00:37:32 INFO mapreduce.ImportJobBase: Beginning import of categories
15/06/08 00:37:32 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/06/08 00:39:06 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `categories` AS t LIMIT 1
15/06/08 00:39:07 INFO mapreduce.DataDrivenImportJob: Writing Avro schema file: /tmp/sqoop-root/compile/6f9632f206ce58bd0d42187391fced45/sqoop_import_categories.avsc
15/06/08 00:39:07 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/06/08 00:39:07 INFO client.RMProxy: Connecting to ResourceManager at mb2d0-cldramaster-01/10.104.23.2:8032
15/06/08 00:39:09 INFO db.DBInputFormat: Using read commited transaction isolation
15/06/08 00:39:09 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`category_id`), MAX(`category_id`) FROM `categories`
15/06/08 00:39:09 INFO mapreduce.JobSubmitter: number of splits:12
15/06/08 00:39:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1433738552738_0001
15/06/08 00:39:10 INFO impl.YarnClientImpl: Submitted application application_1433738552738_0001
15/06/08 00:39:10 INFO mapreduce.Job: The url to track the job: http://mb2d0-cldramaster-01:8088/proxy/application_1433738552738_0001/
15/06/08 00:39:10 INFO mapreduce.Job: Running job: job_1433738552738_0001
15/06/08 00:39:23 INFO mapreduce.Job: Job job_1433738552738_0001 running in uber mode : false
15/06/08 00:39:23 INFO mapreduce.Job:  map 0% reduce 0%
15/06/08 00:39:38 INFO mapreduce.Job:  map 25% reduce 0%
15/06/08 00:39:44 INFO mapreduce.Job:  map 50% reduce 0%
15/06/08 00:39:49 INFO mapreduce.Job:  map 75% reduce 0%
15/06/08 00:39:54 INFO mapreduce.Job:  map 92% reduce 0%
15/06/08 00:39:59 INFO mapreduce.Job:  map 100% reduce 0%
15/06/08 00:39:59 INFO mapreduce.Job: Job job_1433738552738_0001 completed successfully
15/06/08 00:41:02 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=1568938
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=1414
		HDFS: Number of bytes written=6868
		HDFS: Number of read operations=48
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=24
	Job Counters 
		Launched map tasks=12
		Other local map tasks=12
		Total time spent by all maps in occupied slots (ms)=205375
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=205375
		Total vcore-seconds taken by all map tasks=205375
		Total megabyte-seconds taken by all map tasks=210304000
	Map-Reduce Framework
		Map input records=58
		Map output records=58
		Input split bytes=1414
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=585
		CPU time spent (ms)=20150
		Physical memory (bytes) snapshot=2704818176
		Virtual memory (bytes) snapshot=18778705920
		Total committed heap usage (bytes)=3667918848
	File Input Format Counters 
		Bytes Read=0
	File Output Format Counters 
		Bytes Written=6868
15/06/08 00:41:03 INFO ipc.Client: Retrying connect to server: mb2d0-cldraagent-01/10.104.23.3:55949. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/06/08 00:41:04 INFO ipc.Client: Retrying connect to server: mb2d0-cldraagent-01/10.104.23.3:55949. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/06/08 00:41:05 INFO ipc.Client: Retrying connect to server: mb2d0-cldraagent-01/10.104.23.3:55949. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=3, sleepTime=1000 MILLISECONDS)
15/06/08 00:41:05 INFO mapred.ClientServiceDelegate: Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
15/06/08 00:41:05 ERROR tool.ImportAllTablesTool: Encountered IOException running import job: java.io.IOException: Job status not available 
You have new mail in /var/spool/mail/root
[root@mb2d0-cldramaster-01 ~]# hadoop fs -ls /user/hive/warehouse
Found 1 items
drwxr-xr-x   - root hive          0 2015-06-08 00:39 /user/hive/warehouse/categories

You can see that when I run the last command, it finds only one item; according to the tutorial, it should show six items.
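The "Job status not available" IOException above seems to have aborted `import-all-tables` after the first table. As a workaround sketch (the six table names below are assumed from the tutorial's retail_db schema, not confirmed from this log), you could generate one independent `sqoop import` command per table, so a failure on one table does not stop the others:

```shell
# Hypothetical sketch: print one standalone 'sqoop import' command per
# assumed retail_db table, instead of relying on import-all-tables,
# which stops at the first table that hits an error.
TABLES="categories customers departments order_items orders products"
for t in $TABLES; do
  # -P prompts for the password instead of exposing it on the command line
  echo "sqoop import --connect jdbc:mysql://216.121.94.146:3306/retail_db --username retail_dba -P --table $t --compression-codec=snappy --as-avrodatafile --warehouse-dir=/user/hive/warehouse"
done
```

Pipe the output to `sh` (or paste each line) to run the imports one at a time; each table then succeeds or fails on its own.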

 

Thanks so much for your help.

 

Regards,

Gaj

 

 

1 ACCEPTED SOLUTION

Hi,

 

I managed to get past this. I deleted the old directories using this command:

 

sudo -u hdfs hadoop fs -rm -r /user/hive/warehouse/\*

 

I then re-ran the import command and it seemed to work.  I don't know why it did not work the first time.
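A side note on the `\*` in the cleanup command above: the backslash stops the *local* shell from expanding the glob, so the literal `*` reaches `hadoop fs` and is matched against HDFS paths instead of local ones. A quick local demonstration with plain files (no cluster needed; `/tmp/globdemo` is just a scratch directory for illustration):

```shell
# Fresh scratch directory with two files
rm -rf /tmp/globdemo
mkdir -p /tmp/globdemo
touch /tmp/globdemo/a /tmp/globdemo/b

echo /tmp/globdemo/*    # local shell expands the glob against local files
echo /tmp/globdemo/\*   # backslash: the literal '*' is passed through untouched
```

The first `echo` prints the two local files; the second prints the literal pattern, which is what `hadoop fs` needs to receive so it can do the matching on HDFS.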

 

Thanks,

Gaj

Ember Software

