Created on 02-12-2016 02:22 PM - edited 09-16-2022 03:03 AM
Hi Cloudera community! Happy to join you! I'm a sysadmin who loves his job and likes working with new technology, so here I am on Cloudera. For testing, we created a 3-node cluster in a lab: one node for Cloudera Manager, one node acting as both NameNode and DataNode, and the last one as a DataNode only. It's a lab to explore the new Cloudera 5.5 release, so it's only for testing, not for production! We installed these services: HDFS, Hive, Hue, Impala, Oozie, ZooKeeper, MapReduce2 (YARN), and Sqoop1.

One of our developers tried to import some data into Hive, but we got an error. Here is the command line he used:

sqoop import --connect jdbc:mysql://our.database.url/database --username user --password passwordtest --table table_product --target-dir /path/to/db --split-by product_id --hive-import --hive-overwrite --hive-table table_product

The command starts successfully and we see the mappers reach 100%, but when the job finishes we get this error:

16/02/12 15:37:57 WARN hive.TableDefWriter: Column last_updated had to be cast to a less precise type in Hive
16/02/12 15:37:57 INFO hive.HiveImport: Loading uploaded data into Hive
16/02/12 15:37:57 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
16/02/12 15:37:57 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
	at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
	at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
....

I searched the configuration files for anything related to HIVE_CONF_DIR and found nothing weird. I also looked through the Cloudera Manager configuration. I can't find a solution and I'm blocked on it, so our developer can't continue his tests. I searched the web too, with no success. Do you have any idea about this? Thanks a lot for your help!
Created on 02-14-2016 07:09 PM - edited 02-14-2016 07:35 PM
Did you try passing the --config option?
Sorry to ask, but please make sure you have a hive-site.xml file located inside the Hive conf directory.
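For example, a quick check from the shell (the path here is assumed from a standard CDH package install):

ls -l /etc/hive/conf/hive-site.xml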
Also, could you please put your command line in a code section like the one below?
You will find an icon like this in the edit window {i} - click on it and put your code in the pop-up window.
thanks
Put your command line here for readability
Created 02-15-2016 07:05 AM
Hello MattSun,
Thanks for your help!
I asked my developer to try the --config option with this command:
sqoop import --connect jdbc:mysql://our.database.url/database --username user --password passwordtest --table table_product --target-dir /path/to/db --split-by product_id --hive-import --hive-overwrite --hive-table table_product
But the --config option is not available.
I checked the Sqoop documentation website and found nothing about this option:
https://sqoop.apache.org/docs/1.4.6/
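For what it's worth, you can list the options the import tool actually accepts; as far as I can tell, Sqoop only takes Hadoop's generic -conf argument, not a --config flag:

sqoop help import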
The /etc/hive/conf/hive-site.xml file is present, and hive-env.sh too.
To give more information about my cluster, here are the versions of our installed tools:
Parquet 1.5.0+cdh5.5.1+176
Impala 2.3.0+cdh5.5.1+0
YARN 2.6.0+cdh5.5.1+924
spark 1.5.0+cdh5.5.1+94
HDFS 2.6.0+cdh5.5.1+924
hue-common 3.9.0+cdh5.5.1+333
hadoop-kms 2.6.0+cdh5.5.1+924
Sqoop 1.4.6+cdh5.5.1+29
Oozie 4.1.0+cdh5.5.1+223
Zookeeper 3.4.5+cdh5.5.1+91
Hue 3.9.0+cdh5.5.1+333
MapReduce 1 2.6.0+cdh5.5.1+924
Hadoop 2.6.0+cdh5.5.1+924
Hive 1.1.0+cdh5.5.1+327
HCatalog 1.1.0+cdh5.5.1+327
MapReduce2 2.6.0+cdh5.5.1+924

Java 6: JAVA_HOME=/usr/java/jdk1.6.0_31
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)

Java 7: JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
java version "1.7.0_67"
Java(TM) SE Runtime Environment (build 1.7.0_67-b01)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
Thanks for your help.
Created 02-15-2016 07:15 AM
I tried stopping all the services in the cluster and restarting them.
I followed this documentation for the order in which to start the processes.
Created 02-18-2016 03:01 PM
To reproduce the problem, I installed hadoop-client and sqoop on my own machine.
Same error here...
The job starts and the data import into HDFS completes successfully (I can see the job status in Hue, and the database is in HDFS):
16/02/18 17:01:15 INFO mapreduce.Job: Running job: job_1455812803225_0020
16/02/18 17:01:24 INFO mapreduce.Job: Job job_1455812803225_0020 running in uber mode : false
16/02/18 17:01:24 INFO mapreduce.Job: map 0% reduce 0%
16/02/18 17:01:33 INFO mapreduce.Job: map 25% reduce 0%
16/02/18 17:01:34 INFO mapreduce.Job: map 50% reduce 0%
16/02/18 17:01:41 INFO mapreduce.Job: map 100% reduce 0%
16/02/18 17:01:41 INFO mapreduce.Job: Job job_1455812803225_0020 completed successfully
16/02/18 17:01:41 INFO mapreduce.Job: Counters: 30
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=555640
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=473
		HDFS: Number of bytes written=8432
		HDFS: Number of read operations=16
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=8
	Job Counters
		Launched map tasks=4
		Other local map tasks=4
		Total time spent by all maps in occupied slots (ms)=25664
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=25664
		Total vcore-seconds taken by all map tasks=25664
		Total megabyte-seconds taken by all map tasks=26279936
	Map-Reduce Framework
		Map input records=91
		Map output records=91
		Input split bytes=473
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=351
		CPU time spent (ms)=4830
		Physical memory (bytes) snapshot=802369536
		Virtual memory (bytes) snapshot=6319828992
		Total committed heap usage (bytes)=887095296
	File Input Format Counters
		Bytes Read=0
	File Output Format Counters
		Bytes Written=8432
16/02/18 17:01:41 INFO mapreduce.ImportJobBase: Transferred 8,2344 KB in 30,7491 seconds (274,219 bytes/sec)
16/02/18 17:01:41 INFO mapreduce.ImportJobBase: Retrieved 91 records.
but when the Hive import starts:
16/02/18 17:01:41 WARN hive.TableDefWriter: Column last_updated had to be cast to a less precise type in Hive
16/02/18 17:01:41 INFO hive.HiveImport: Loading uploaded data into Hive
16/02/18 17:01:41 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
16/02/18 17:01:41 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:50)
	at org.apache.sqoop.hive.HiveImport.getHiveArgs(HiveImport.java:392)
	at org.apache.sqoop.hive.HiveImport.executeExternalHiveScript(HiveImport.java:379)
	at org.apache.sqoop.hive.HiveImport.executeScript(HiveImport.java:337)
	at org.apache.sqoop.hive.HiveImport.importTable(HiveImport.java:241)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:514)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
	at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:195)
	at org.apache.sqoop.hive.HiveConfig.getHiveConf(HiveConfig.java:44)
	... 12 more
I tried a few things:
- Added the variable HIVE_CONF_DIR=/etc/hive/conf to my .bash_profile file: no success
- Added the same variable to /usr/lib/hive/conf/hive-env.sh: no success
- Copied /usr/lib/sqoop/conf/sqoop-env-template.sh to sqoop-env.sh and added the variable inside: no success
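In each case, the line I added was essentially this export (with an echo afterwards to confirm the shell picked it up):

export HIVE_CONF_DIR=/etc/hive/conf
echo $HIVE_CONF_DIR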
Hope somebody has an idea to help us!
Created 02-19-2016 09:52 AM
I am not getting it.
A ClassNotFoundException sounds more like a missing JAR to me.
I use the versions below, and it works fine:
Sqoop 1.4.4-cdh5.0.0
Hive 0.12.0-cdh5.0.0
I will dig more and let you know if I come up with anything. Sorry.
Created 02-22-2016 07:14 AM
Hi Matt,
Thanks. But my problem is still present...
Maybe someone else with a fresh install of Cloudera Manager/CDH 5.5 has hit the same problem.
As a test, I tried a fresh install on another single machine. Same error!
So maybe the problem comes from the client configuration.
For the installation I used our Cloudera Manager/CDH repositories, which are synced every day.
So I used packages, not parcels, during the installation.
My test VMs run CentOS 6.6, a supported version.
To run the command, I launch it from my own machine (Ubuntu).
I installed these packages to make it work:
sudo apt-get install hadoop-client hive oozie-client sqoop
I added these variables in my ".bash_profile":
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/share/java/slf4j-simple.jar
export HIVE_HOME=/usr/lib/hive
export PATH=$PATH:$HIVE_HOME/bin
Then I did an "scp" to pull over the "/etc/sqoop", "/etc/hive" and "/etc/hadoop" directories from the cluster.
So the configuration seems to be OK; if it weren't, the command couldn't start at all.
I tried adding the HIVE_CONF_DIR variable in different files:
- sqoop-env.sh
- hadoop-env.sh
- hive-env.sh
Without any success. The process starts, but the error is still present.
Hope somebody can help me!
Created 02-22-2016 02:57 PM
On my client machine, I tried to find the class, and found it inside a JAR file.
I went to the /usr/lib/hive/lib folder and looked inside hive-common.jar with this command:
jar tf hive-common.jar
At the end, I can see this line:
org/apache/hadoop/hive/conf/HiveConf.class
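In case it is useful to others, here is the kind of loop you can use to find which JAR ships a given class (a sketch; it assumes the JDK's jar tool is on the PATH):

for j in /usr/lib/hive/lib/*.jar; do
  jar tf "$j" | grep -q 'hive/conf/HiveConf.class' && echo "$j"
done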
So the class is present. Why can't it be found when the import starts?
HIVE_HOME is set to /usr/lib/hive, so the path is valid...
I'll keep searching, but maybe this gives you more information on the why, and how to solve it!
Created 02-25-2016 07:25 AM
It works!
In the output log we can see the HADOOP_CLASSPATH variable, but it contains no path to the Hive lib directory...
I first tried adding just the folder to HADOOP_CLASSPATH, but that didn't work.
The solution is to add the folder with /* appended, so that all the JARs are picked up.
So I added this line to my .bash_profile:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hive/lib/*
Then
source ~/.bash_profile
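As a quick sanity check (an extra step, not strictly required), you can print the classpath Hadoop resolves and confirm a Hive entry appears, since hadoop classpath includes anything set in HADOOP_CLASSPATH:

hadoop classpath | tr ':' '\n' | grep hive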
And now it works. The data was imported into Hive!
Now we can continue our lab with Cloudera 5!
Thanks!