New Contributor
Posts: 1
Registered: 01-09-2015

Sqoop import --as-parquetfile with CDH5

I'm trying to import data directly from MySQL into Parquet, but it doesn't seem to work correctly.

I'm using CDH5.3 which includes Sqoop 1.4.5.

Here is my command line:

sqoop import \
  --connect jdbc:mysql://xx.xx.xx.xx/database \
  --username username \
  --password mypass \
  --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' \
  --split-by page_id \
  --hive-import \
  --hive-table default.pages_users3 \
  --target-dir hive_pages_users \
  --as-parquetfile


Then I get this error:


Warning: /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    15/01/09 14:31:49 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.3.0
    15/01/09 14:31:49 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    15/01/09 14:31:49 INFO tool.BaseSqoopTool: Using Hive-specific delimiters for output. You can override
    15/01/09 14:31:49 INFO tool.BaseSqoopTool: delimiters with --fields-terminated-by, etc.
    15/01/09 14:31:49 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    15/01/09 14:31:49 INFO tool.CodeGenTool: Beginning code generation
    15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:50 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:50 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
    Note: /tmp/sqoop-root/compile/b90e7b492f5b66554f2cca3f88ef7a61/QueryResult.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    15/01/09 14:31:51 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/b90e7b492f5b66554f2cca3f88ef7a61/QueryResult.jar
    15/01/09 14:31:51 INFO mapreduce.ImportJobBase: Beginning query import.
    15/01/09 14:31:51 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    15/01/09 14:31:51 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:51 INFO manager.SqlManager: Executing SQL statement: SELECT page_id,user_id FROM pages_users WHERE  (1 = 0)
    15/01/09 14:31:51 WARN spi.Registration: Not loading URI patterns in org.kitesdk.data.spi.hive.Loader
    15/01/09 14:31:51 ERROR sqoop.Sqoop: Got exception running Sqoop: org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=default.pages_users3
    org.kitesdk.data.DatasetNotFoundException: Unknown dataset URI: hive?dataset=default.pages_users3
        at org.kitesdk.data.spi.Registration.lookupDatasetUri(Registration.java:109)
        at org.kitesdk.data.Datasets.create(Datasets.java:189)
        at org.kitesdk.data.Datasets.create(Datasets.java:240)
        at org.apache.sqoop.mapreduce.ParquetJob.createDataset(ParquetJob.java:81)
        at org.apache.sqoop.mapreduce.ParquetJob.configureImportJob(ParquetJob.java:70)
        at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:112)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:262)
        at org.apache.sqoop.manager.SqlManager.importQuery(SqlManager.java:721)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:499)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:236)

 
I have no problem importing into the default Hive file format, but Parquet fails. Do you have any idea why this occurs?

Thank you :)

Posts: 1,826
Kudos: 406
Solutions: 292
Registered: 07-31-2013

Re: Sqoop import --as-parquetfile with CDH5

This is caused by missing classpath entries (for "hive" URIs) that Kite's Hive connector relies on.

You can work around it by manually exporting the environment variables below before running the import, for example:

export HIVE_HOME=/opt/cloudera/parcels/CDH/lib/hive
export HCAT_HOME=/opt/cloudera/parcels/CDH/lib/hive-hcatalog
sqoop import \
  --connect jdbc:mysql://xx.xx.xx.xx/database \
  --username username \
  --password mypass \
  --query 'SELECT page_id,user_id FROM pages_users WHERE $CONDITIONS' \
  --split-by page_id \
  --hive-import \
  --hive-table default.pages_users3 \
  --target-dir hive_pages_users \
  --as-parquetfile

 

This will be resolved in a future CDH release as well, removing the need to manually export the two environment variables above.
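For anyone applying the workaround, a quick way to confirm it actually produced a Parquet-backed table is to ask Hive for the table's storage format. This sketch assumes the table name default.pages_users3 from the command above and that the hive CLI is on your path:

```shell
# Print the table's storage details. A Parquet-backed table should list
# Parquet input/output format classes instead of the default TextFile ones.
hive -e 'DESCRIBE FORMATTED default.pages_users3' | grep -i 'Format'
```

You can also list the files under the table's HDFS directory; Parquet data files typically carry a .parquet extension.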

Does this help?
New Contributor
Posts: 1
Registered: 01-25-2016

Re: Sqoop import --as-parquetfile with CDH5

Hi, is this fixed? If yes, could you please let me know which version of CDH has the fix?

Or do we still need to manually export the two environment variables specified above?

New Contributor
Posts: 2
Registered: 04-15-2016

Re: Sqoop import --as-parquetfile with CDH5

Is there any answer to this question? We are struggling with the same issue.

New Contributor
Posts: 2
Registered: 04-30-2016

Re: Sqoop import --as-parquetfile with CDH5


Hi guys, this works for me when importing as Parquet with CDH5. I hope this helps!

sqoop import-all-tables \
  -m 1 \
  --connect jdbc:mysql://xx.xx.xx.xx/database \
  --username username \
  --password password \
  --compression-codec=snappy \
  --as-parquetfile \
  --warehouse-dir=/user/hive/warehouse/ \
  --hive-import
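A quick sanity check after the import is to list the generated files and inspect one of them. The path assumes the --warehouse-dir above; the per-table subdirectory and file names depend on your source database, so the placeholders here are illustrative, and the parquet-tools step only applies if that utility is installed on your node:

```shell
# List the per-table directories Sqoop created under the warehouse dir.
hdfs dfs -ls /user/hive/warehouse/

# Parquet data files normally end in .parquet. If parquet-tools is
# available, it can dump a file's schema to verify it is valid Parquet
# (replace <table> and <file> with real names from the listing above).
parquet-tools schema hdfs:///user/hive/warehouse/<table>/<file>.parquet
```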
