Support Questions

Report Inappropriate Content · ‎05-11-2017

Hi team,

I created a HIVE database called employees_exercise and the creation was successful. However, when I am running a sqoop job to connect to a mysql instance and load into hdfs, i can getting the following error on running the sqoop command.

sqoop import --connect jdbc:mysql://ip-10-0-0-247.eu-west-2.compute.internal/employees --username <<Removed>> --password <<removed>> --table employees --hive-import --hive-table employees_exercise.employees --hive-import --as-parquetfile -m 1

error message is below:

17/05/11 10:10:26 INFO mapreduce.ImportJobBase: Beginning import of employees 17/05/11 10:10:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1 17/05/11 10:10:28 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name employees_exercise.employees is not alphanumeric (plus '_') org.kitesdk.data.ValidationException: Dataset name employees_exercise.employees is not alphanumeric (plus '_') at org.kitesdk.data.ValidationException.check(ValidationException.java:55) at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:103) at org.kitesdk.data.spi.hive.HiveManagedMetadataProvider.create(HiveManagedMetadataProvider.java:43) at org.kitesdk.data.spi.hive.HiveManagedDatasetRepository.create(HiveManagedDatasetRepository.java:77) at org.kitesdk.data.Datasets.create(Datasets.java:239) at org.kitesdk.data.Datasets.create(Datasets.java:307) at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107) at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89) at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108) at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260) at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673) at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605) at org.apache.sqoop.Sqoop.run(Sqoop.java:148) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235) at org.apache.sqoop.Sqoop.main(Sqoop.java:244)

Please could some one assist?

Thanks in advance

Regards

Naveen

slachterman · ‎05-11-2017

My hypothesis is this is an issue specific to using the --as-parquetfile argument as there appears to be a KiteSDK bug related to this behavior. Can you please try executing the command without that? The pattern would be to create a staging table first and then insert into a parquet-backed table.

View solution in original post

slachterman · ‎05-11-2017

My hypothesis is this is an issue specific to using the --as-parquetfile argument as there appears to be a KiteSDK bug related to this behavior. Can you please try executing the command without that? The pattern would be to create a staging table first and then insert into a parquet-backed table.

Report Inappropriate Content · ‎05-11-2017

Thank you. @slachterman. That woirked, I removed the --as-parquetfile and that seemed to have fixed the problem. When you say insert into a parquet-backed table, do you mean I have to create a table in HIVE ahead of loading into it? Amy documentation that you can point to will be helpful.

slachterman · ‎05-11-2017

Yes, @Naveen Keshava, that is correct. You will create two tables, say, employees_staging and employees_parquet. Employees_staging will be stored as text and employees_parquet will be stored as parquet. You will sqoop into employees_staging and then run another SQL command to insert into employees_parquet from employees_staging.

Report Inappropriate Content · ‎06-04-2017

thank you, @slachterman. one final question, when you meant "run another SQL command to insert into employees_parquet from employees_staging." were you refering to a simple "insert into" sql command?

I tried it, and it looks like the insert worked into the target table. But how do I then make sure the file is in hdfs? with parquet format?

slachterman · ‎06-04-2017

Yes.

You can use describe extended to see the HDFS path associated with the Hive table and you can use parquet-tools to interact with the parquet file.

remyasenan · ‎05-07-2018

I also had the same issue, but I was not ready to create a staging table, felt like that is not the real solution.

Just do like this

instead of using "--hive-table employees_exercise.employees" use like "--hive-database employees_exercise --hive-table employees"

It worked well for me

kalebyjaun · ‎06-28-2018

I faced the same issue in a different case, a incremental import from oracle to a parquet file in hdfs. In my case, the issue ocurred when i specified the --table parameter.

After some researches, i concluded that this issue is caused by the dot when i specified the "schema.table". In adition to mentioned above, you can use the paremeter --query instead the --table in your sqoop import. Eg:

instead of:

sqoop --import --connect jdbc:oracle:thin:@host:port/db --username user --P --table schema.table --split-by column --target-dir /some_hdfs_path/schema/table --incremental append --check-column ID --as-parquetfile --fields-terminated-by '|'

i used:

sqoop --import --connect jdbc:oracle:thin:@host:port/db --username user --P --query "SELECT * FROM schema.table WHERE \$CONDITIONS" --split-by column --target-dir /some_hdfs_path/schema/table --incremental append --check-column column --as-parquetfile --fields-terminated-by '|'

the \$CONDITIONS is required by sqoop, without this you'll got another error.

Cloudera Community

Support Questions

Using underscore _ in a database name in HIVE

Converting Spaces to Underscores for user names in...

Connect to CDP DataHub Hive using Cloudera ODBC Dr...

Change default permission of hive database

ConvertJSONToSQL - Ignores underscore in update ke...

Hive Statistics: Why Useful

RDBMS to Hive using NiFi (small-medium tables)

Hive Disaster Recovery using Falcon

US Presidential Election: tweet analysis using HDF...

Using phoenixdb to connect Cloudera Machine Learni...

Machine Learning with SQL using Apache Hive and Hi...