question Re: Using underscore _ in a database name in HIVE in Archives of Support Questions (Read Only)

Using underscore _ in a database name in HIVE

Former Member — Thu, 11 May 2017 21:16:05 GMT

Hi team,

I created a HIVE database called employees_exercise and the creation was successful. However, when I am running a sqoop job to connect to a mysql instance and load into hdfs, i can getting the following error on running the sqoop command.

sqoop import --connect jdbc:mysql://ip-10-0-0-247.eu-west-2.compute.internal/employees --username <<Removed>> --password <<removed>> --table employees --hive-import --hive-table employees_exercise.employees --hive-import --as-parquetfile -m 1

error message is below:

17/05/11 10:10:26 INFO mapreduce.ImportJobBase: Beginning import of employees 17/05/11 10:10:27 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `employees` AS t LIMIT 1 17/05/11 10:10:28 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.ValidationException: Dataset name employees_exercise.employees is not alphanumeric (plus '_') org.kitesdk.data.ValidationException: Dataset name employees_exercise.employees is not alphanumeric (plus '_') at org.kitesdk.data.ValidationException.check(ValidationException.java:55) at org.kitesdk.data.spi.Compatibility.checkDatasetName(Compatibility.java:103) at org.kitesdk.data.spi.hive.HiveManagedMetadataProvider.create(HiveManagedMetadataProvider.java:43) at org.kitesdk.data.spi.hive.HiveManagedDatasetRepository.create(HiveManagedDatasetRepository.java:77) at org.kitesdk.data.Datasets.create(Datasets.java:239) at org.kitesdk.data.Datasets.create(Datasets.java:307) at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:107) at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:89) at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:108) at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260) at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673) at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118) at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497) at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605) at org.apache.sqoop.Sqoop.run(Sqoop.java:148) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226) at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235) at org.apache.sqoop.Sqoop.main(Sqoop.java:244)

Please could some one assist?

Thanks in advance

Regards

Naveen

Re: Using underscore _ in a database name in HIVE

slachterman — Thu, 11 May 2017 21:55:44 GMT

My hypothesis is this is an issue specific to using the --as-parquetfile argument as there appears to be a KiteSDK bug related to this behavior. Can you please try executing the command without that? The pattern would be to create a staging table first and then insert into a parquet-backed table.

Re: Using underscore _ in a database name in HIVE

Former Member — Thu, 11 May 2017 22:34:07 GMT

Thank you. @slachterman. That woirked, I removed the --as-parquetfile and that seemed to have fixed the problem. When you say insert into a parquet-backed table, do you mean I have to create a table in HIVE ahead of loading into it? Amy documentation that you can point to will be helpful.

Re: Using underscore _ in a database name in HIVE

slachterman — Thu, 11 May 2017 23:38:47 GMT

Yes, @Naveen Keshava, that is correct. You will create two tables, say, employees_staging and employees_parquet. Employees_staging will be stored as text and employees_parquet will be stored as parquet. You will sqoop into employees_staging and then run another SQL command to insert into employees_parquet from employees_staging.

Re: Using underscore _ in a database name in HIVE

Former Member — Mon, 05 Jun 2017 05:05:14 GMT

thank you, @slachterman. one final question, when you meant "run another SQL command to insert into employees_parquet from employees_staging." were you refering to a simple "insert into" sql command?

I tried it, and it looks like the insert worked into the target table. But how do I then make sure the file is in hdfs? with parquet format?

Re: Using underscore _ in a database name in HIVE

slachterman — Mon, 05 Jun 2017 06:13:47 GMT

Yes.

You can use describe extended to see the HDFS path associated with the Hive table and you can use parquet-tools to interact with the parquet file.

Re: Using underscore _ in a database name in HIVE

remyasenan — Tue, 08 May 2018 01:29:52 GMT

I also had the same issue, but I was not ready to create a staging table, felt like that is not the real solution.

Just do like this

instead of using "--hive-table employees_exercise.employees" use like "--hive-database employees_exercise --hive-table employees"

It worked well for me

Re: Using underscore _ in a database name in HIVE

kalebyjaun — Fri, 29 Jun 2018 01:40:55 GMT

I faced the same issue in a different case, a incremental import from oracle to a parquet file in hdfs. In my case, the issue ocurred when i specified the --table parameter.

After some researches, i concluded that this issue is caused by the dot when i specified the "schema.table". In adition to mentioned above, you can use the paremeter --query instead the --table in your sqoop import. Eg:

instead of:

sqoop --import --connect jdbc:oracle:thin:@host:port/db --username user --P --table schema.table --split-by column --target-dir /some_hdfs_path/schema/table --incremental append --check-column ID --as-parquetfile --fields-terminated-by '|'

i used:

sqoop --import --connect jdbc:oracle:thin:@host:port/db --username user --P --query "SELECT * FROM schema.table WHERE \$CONDITIONS" --split-by column --target-dir /some_hdfs_path/schema/table --incremental append --check-column column --as-parquetfile --fields-terminated-by '|'

the \$CONDITIONS is required by sqoop, without this you'll got another error.