New Contributor
Posts: 1
Registered: ‎01-09-2015

Sqoop import --as-parquetfile with CDH5

I'm trying to import data directly from MySQL to Parquet, but it doesn't seem to work correctly...

I'm using CDH5.3, which includes Sqoop 1.4.5.

Here is my command line:

sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database --username username --password mypass --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id --hive-import --hive-table default.pages_users3 --target-dir hive_pages_users --as-parquetfile


Then I get this error:


Warning: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    15/01/09 14:31:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.0
    15/01/09 14:31:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    15/01/09 14:31:49 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
    15/01/09 14:31:49 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
    15/01/09 14:31:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    15/01/09 14:31:49 INFO tool.CodeGenTool: Beginning code generation
    15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
    Note: /tmp/sqoop-root/compile/b90e7b492f5b66554f2cca3f88ef7a61/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    15/01/09 14:31:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b90e7b492f5b66554f2cca3f88ef7a61/QueryResult.jar
    15/01/09 14:31:51 INFO mapreduce.ImportJobBase: Beginning query import.
    15/01/09 14:31:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    15/01/09 14:31:51 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:51 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:51 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
    15/01/09 14:31:51 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=default.pages_users3
    org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=default.pages_users3
        at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
        at org.kitesdk.data.Datasets.create(Datasets.java:189)
        at org.kitesdk.data.Datasets.create(Datasets.java:240)
        at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:81)
        at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:70)
        at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:112)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:262)
        at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:721)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

 
I have no problem importing data into Hive as plain text, but Parquet is a problem... Do you have any idea why this occurs?

Thank you :)

Posts: 1,892
Kudos: 431
Solutions: 302
Registered: ‎07-31-2013

Re: Sqoop import --as-parquetfile with CDH5

This is caused by some missing classpath entries (for "hive" dataset URIs) that Kite's Hive connector relies on.

You can work around the issue by manually exporting the env-vars below, for example:

export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog
sqoop import --connect jdbc:mysql://xx.xx.xx.xx/database --username username --password mypass --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' --split-by page_id --hive-import --hive-table default.pages_users3 --target-dir hive_pages_users --as-parquetfile
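
If you need these env-vars in every new session, you can persist the exports in your shell profile (a minimal sketch, assuming a bash login shell and the parcel paths above):

# append the exports so every new shell picks them up
echo 'export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive' >> ~/.bash_profile
echo 'export HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog' >> ~/.bash_profile
source ~/.bash_profile   # pick them up in the current shell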

 

This will be resolved in a future CDH release as well (i.e. removing the need to manually export the two env-vars above).
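
Separately, as the log output above warns, putting the password on the command line is insecure. A hardened variant of the same import would use -P to prompt for it interactively, or --password-file (a sketch; the local password-file path here is illustrative, and an HDFS path works as well):

# store the password without a trailing newline, readable only by you
echo -n 'mypass' > /home/me/.mysql-pass
chmod 400 /home/me/.mysql-pass

sqoop import \
  --connect jdbc:mysql://xx.xx.xx.xx/database \
  --username username \
  --password-file file:///home/me/.mysql-pass \
  --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' \
  --split-by page_id \
  --hive-import --hive-table default.pages_users3 \
  --target-dir hive_pages_users \
  --as-parquetfile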

 

Does this help?

 

New Contributor
Posts: 1
Registered: ‎01-25-2016

Re: Sqoop import --as-parquetfile with CDH5

Hi, is this fixed? If yes, could you please let me know which version of CDH has the fix?

Or do we still need to manually export the two env-vars specified above?

New Contributor
Posts: 2
Registered: ‎04-15-2016

Re: Sqoop import --as-parquetfile with CDH5

Any answer to this question? We are struggling with the same issue.

New Contributor
Posts: 2
Registered: ‎04-30-2016

Re: Sqoop import --as-parquetfile with CDH5


Hi guys, this works for me using --as-parquetfile with CDH5. I hope it helps!

 

sqoop import-all-tables \
  -m 1 \
  --connect jdbc:mysql://xx.xx.xx.xx/database \
  --username username \
  --password password \
  --compression-codec=snappy \
  --as-parquetfile \
  --warehouse-dir=/user/hive/warehouse/ \
  --hive-import
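
To confirm the import produced Parquet-backed tables, you can list a table's warehouse directory and spot-check it through Hive (a sketch; pages_users is just an example table name from earlier in the thread):

# the table directory should now contain .parquet files
hadoop fs -ls /user/hive/warehouse/pages_users

# spot-check a few rows
hive -e 'SELECT * FROM pages_users LIMIT 5'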

New Contributor
Posts: 1
Registered: ‎01-31-2019

Re: Sqoop import --as-parquetfile with CDH5


Reading around, this appears to be a bug in the Kite SDK. The definitive workaround is to have Sqoop write the Parquet files it generates into a subdirectory inside the directory associated with the table, and afterwards run a hadoop fs -mv command that moves them from the temporary directory into the table's directory, and that's it. For example:

--target-dir "/user/hive/warehouse/bd_equipo.db/phone_number/" --as-parquetfile (this is the configuration that fails)

We change it to:

--target-dir "/user/hive/warehouse/bd_ConGuionBajo.db/mi_tabla/temp" --as-parquetfile

After the sqoop-import finishes, we can move the files into the table's directory:

hadoop fs -mv /user/hive/warehouse/bd_ConGuionBajo.db/mi_tabla/temp/*.parquet /user/hive/warehouse/bd_ConGuionBajo.db/mi_tabla/
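
Putting the whole workaround into one script might look like this (a sketch; the connection details and the --table name are illustrative, reusing the example paths above):

TABLE_DIR=/user/hive/warehouse/bd_ConGuionBajo.db/mi_tabla

# 1) import into a temp subdirectory inside the table directory
sqoop import \
  --connect jdbc:mysql://xx.xx.xx.xx/database \
  --username username \
  -P \
  --table mi_tabla \
  --target-dir "$TABLE_DIR/temp" \
  --as-parquetfile

# 2) move the Parquet files into the table directory and clean up
hadoop fs -mv "$TABLE_DIR"/temp/*.parquet "$TABLE_DIR"/
hadoop fs -rm -r "$TABLE_DIR"/temp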

 

Happy importing!
