Member since 04-03-2020
Posts: 25
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 1263 | 06-03-2020 12:54 PM |
10-14-2020
11:45 AM
I've recently created the following table in Impala:

create table hpchistperf.oozie_job_stats (
  name string,
  start_time timestamp,
  end_time timestamp,
  rows_inserted integer,
  last_id_loaded integer,
  primary key (name, start_time))
PARTITION BY HASH PARTITIONS 16
stored as kudu;

I read that there was no problem creating the table in Impala and it could be accessed in Hive since they share the same metadata. The problem is that when I run the query on Hive it gives me the following error:

FAILED: RuntimeException java.lang.ClassNotFoundException: org.apache.kudu.mapreduce.KuduTableInputFormat
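For anyone hitting the same thing: a quick way to see why Hive is asking for a Kudu class it doesn't ship with is to check how the table was registered in the shared metastore. A minimal diagnostic (no fix implied), runnable from Hive or Impala:

-- shows the InputFormat / storage handler the table definition references
DESCRIBE FORMATTED hpchistperf.oozie_job_stats;

The Kudu classes listed there would have to be on Hive's classpath before Hive could read the table, whereas Impala has its own native Kudu support.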
Labels:
- Apache Hive
- Apache Impala
07-09-2020
08:40 AM
Thanks for the reply. I originally didn't know I had to use a meta connection or a metastore until I found that error, so I really don't have a configuration in mind yet. What would you recommend I do? Is it just a matter of adding the --meta-connect parameter?
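Not sure of your setup, but for reference the flag sqoop exposes is --meta-connect, which points the job tool at a shared metastore instead of the per-user HSQLDB files under ~/.sqoop. A minimal sketch, assuming a sqoop metastore is already running somewhere (host name below is a placeholder; 16000 is the default metastore port):

sqoop job \
  --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop \
  --list

The same --meta-connect argument would then go on the --create and --exec invocations so they all read and write the same job definitions.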
07-08-2020
01:43 PM
I'm scheduling a sqoop incremental job using a shell script, but it behaves differently from a conventional job and gives me the following error:

20/07/07 15:38:26 ERROR tool.JobTool: I/O error performing job operation: java.io.IOException: Exception creating SQL connection
  at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:217)
  at org.apache.sqoop.metastore.hsqldb.AutoHsqldbStorage.open(AutoHsqldbStorage.java:112)
  at org.apache.sqoop.tool.JobTool.run(JobTool.java:290)
  at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
  at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.sql.SQLException: File input/output error /home/denap0p/.sqoop/metastore.db.properties
java.io.FileNotFoundException: /home/denap0p/.sqoop/metastore.db.properties.new (No such file or directory)

The script has the following instructions:

sqoop job \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/denap0p/sqlserver.jceks \
  -Dsqoop.export.records.per.statement=1 \
  --create ch_job_inc \
  -- import \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username denap0p \
  --password-alias sqlserver.password \
  --query 'select * from dbo.Job where $CONDITIONS' \
  --incremental lastmodified \
  --check-column endtime \
  --hcatalog-database hpchistperf \
  --hcatalog-table ch_job \
  --split-by ID \
  --merge-key Id \
  --hcatalog-storage-stanza 'stored as parquet'

sqoop job --exec ch_job_inc

Based on the information I've read, it seems that I need to use a metastore. We are using Cloudera CDH-5.14.2-1.cdh5.14.2.p0.3. Any advice is welcome. Thanks in advance.
Labels:
- Apache Oozie
- Apache Sqoop
- Cloudera Hue
06-12-2020
08:06 AM
Thank you Eric. We basically want to grab the last value of an id, so we can use it as a reference to resume an incremental load into Hive. Once we have the last "id" value in Hive, we can go back to our SQL Server and select all the rows whose id is greater than that value to pull the new/remaining data into Hive. We were initially using --incremental lastmodified to do incremental loads, but some source tables don't have a datetime column, and when we don't have one sqoop blocks the incremental load:

Append mode for imports is not compatible with HCatalog.
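A rough sketch of that pattern, untested here: read the current max id from the Hive side, then feed it into the sqoop query. The JDBC URL variable, the hpchistperf.ch_job table, and the id/Id column names below are placeholders borrowed from elsewhere in this history, not a tested script:

# grab the last loaded id from Hive ($HIVE_JDBC_URL is a placeholder for your beeline URL)
last_id=$(beeline -u "$HIVE_JDBC_URL" --silent=true --showHeader=false --outputformat=tsv2 \
  -e "select coalesce(max(id), 0) from hpchistperf.ch_job")

# pull only the rows newer than that id from SQL Server
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/denap0p/sqlserver.jceks \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username denap0p \
  --password-alias sqlserver.password \
  --query "select * from dbo.Job where Id > $last_id and \$CONDITIONS" \
  --hcatalog-database hpchistperf \
  --hcatalog-table ch_job \
  -m 1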
06-12-2020
07:58 AM
Hello guys, based on your help I was able to create incremental loads in sqoop using Hive as a target database. I use the lastmodified option of sqoop: --incremental lastmodified. Unfortunately, the --incremental append mode doesn't work with Hive:

Append mode for imports is not compatible with HCatalog. Please remove the parameter --append-mode

We've found some source tables in SQL Server which don't have a datetime column, which prevents us from doing incremental loads with sqoop. Is there an alternative way to do incremental loads for those tables that don't have a datetime column? We cannot modify the source tables.
Labels:
- Apache HBase
- Apache Sqoop
06-11-2020
05:12 AM
Thank you Eric, I changed the form of the query:

--query 'select max(id) as maxid from NodeName where $CONDITIONS'

We had some progress, but for some reason sqoop is translating the query into the following form:

20/06/11 08:09:49 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select max(id) as maxid from NodeName where (1 = 1) ) AS t1
20/06/11 08:09:49 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/lrm0613/.staging/job_1590805442005_23634
20/06/11 08:09:49 WARN security.UserGroupInformation: PriviledgedActionException as:lrm0613@JACKSONNATIONAL.COM (auth:KERBEROS) cause:java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1 Invalid column name 'id'.
20/06/11 08:09:49 ERROR tool.ImportTool: Import failed: java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1 Invalid column name 'id'.

Any ideas?
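One detail in that log: the BoundingValsQuery wraps the aliased subquery and then asks for MIN(id)/MAX(id), but the subquery only exposes maxid, hence Invalid column name 'id'. Since the query returns a single row anyway, a sketch of a workaround (untested here) is to keep the alias and run with a single mapper, so sqoop doesn't need a split column or a bounding query at all:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) as maxid from NodeName where $CONDITIONS' \
  -m 1 \
  --target-dir /user/lrm0613/queries_output/tt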
06-10-2020
01:44 PM
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) from NodeName a where $CONDITIONS' \
  --split-by a.id \
  --target-dir /user/lrm0613/queries_output/tt
Generates the following error:
20/06/10 16:37:51 ERROR tool.ImportTool: Import failed: We found column without column name. Please verify that you've entered all column names in your query if using free form query import (consider adding clause AS if you're using column transformation)
I've noticed that if I remove the max() function the job completes successfully. How can I make it work with the function?
Labels:
- Apache Sqoop
- Security
06-09-2020
06:26 AM
Hello guys!
I'm trying to run the following commands in Hive:
use dataengsandbox;
set hive.mapred.supports.subdirectories=true;
insert overwrite directory '/tmp/hmb/test' select max(id) from nodename;

If I remove the insert overwrite directory '/tmp/hmb/test' part it works without issues, but I want my output to be written into an HDFS directory.
Here is the error that I've received:
Error while compiling statement: FAILED: SemanticException No valid privileges User lrm0613 does not have privileges for QUERY The required privileges: Server=server1->URI=hdfs://nameservice1/tmp/hmb/test->action=*;
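That message is a Sentry authorization error rather than a problem with the statement itself: INSERT OVERWRITE DIRECTORY needs a URI privilege on the target path. A sketch of the kind of grant a Sentry admin would run, assuming Sentry is managing Hive authorization (the role name below is purely a placeholder):

-- run by a Sentry admin from beeline; analyst_role is a hypothetical role
GRANT ALL ON URI 'hdfs://nameservice1/tmp/hmb/test' TO ROLE analyst_role;

Alternatively, writing to a path the user already has privileges on avoids the extra grant.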
Labels:
- Apache Hadoop
- Apache Hive
- HDFS
06-06-2020
02:28 PM
1 Kudo
Finally, our DBA team re-created the database correctly this time, and we no longer have this issue. We believe the root cause was that the directory in HDFS had an uppercase typo.
06-03-2020
12:54 PM
1 Kudo
Made it work! 🙂 I had to use --map-column-hive:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select * from Job where id=0 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table Job \
  --compress \
  --compression-codec snappy \
  --map-column-hive 'excludednodes=varchar(160)','errorparams=varchar(160)' \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
06-02-2020
12:21 PM
Hello guys, I'm trying to run a full import from a table in SQL Server, but sqoop is interpreting the data type incorrectly.

Column in SQL Server: ErrorParams (nvarchar(max))

Sqoop translates it to ErrorParams varchar, which generates an error:

create table `dataengsandbox`.`task` (
  `id` INT,
  `recordid` int,
  `state` int,
  `prevstate` int,
  `instanceid` int,
  `parentjobid` int,
  `closed` boolean,
  `requeuecount` int,
  `autorequeuecount` int,
  `modifytime` string,
  `submittime` string,
  `starttime` string,
  `endtime` string,
  `createtime` string,
  `timestamp` binary,
  `pendingreason` int,
  `failurereason` int,
  `exitcode` int,
  `failednodeid` int,
  `errorcode` int,
  `errorparams` varchar,
  `output` string,
  `executionfailureretrycount` int)
stored as parquet

I would expect it to translate it to ErrorParams varchar(64), or some other length, instead.

FAILED: ParseException line 22:22 mismatched input ',' expecting ( near 'varchar' in primitive type specification

Is there a way to make sqoop cast it correctly?

Sqoop job:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --query 'select * from Task where id=0 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table Task \
  --compress \
  --compression-codec snappy \
  --map-column-hive id=INT \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
Labels:
- Apache Sqoop
05-29-2020
10:27 AM
Thanks for your answer!
That solution might work, but we've also run a drop table in a different database with the same job, and it didn't require deleting the files from the filesystem. At this point we suspect that something is wrong in the database configuration.
We've requested our DBA team to re-create the database because they didn't follow the same procedure as the database that works properly.
Thank you very much for the help. It's really appreciated.
05-27-2020
07:01 AM
Thanks, I gave it a try, but it still keeps the data from before the table was dropped.

select count(*) from dataengsandbox.test;
ALTER TABLE test SET TBLPROPERTIES('EXTERNAL'='False');
drop table test purge;
select count(*) from dataengsandbox.test;  -- gave me double the records

describe formatted test;
Location: hdfs://nameservice1/data/DataEngSandbox/test
Table Type: MANAGED_TABLE
Table Parameters:
  COLUMN_STATS_ACCURATE false
  external.table.purge true
  numFiles 1
  numRows -1
  rawDataSize -1
  totalSize 19098747
  transient_lastDdlTime 1590587689
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
05-26-2020
01:35 PM
Thanks for your answer! This is a good learning experience for me. I didn't know we could set that property when creating the table. Unfortunately, the table data still persists after the drop table command.

Table name: test

When we create the table test, it is stored in /data/<database>/test. When we use a different database where the issue is not present, the table is stored in /data/<database>/hive/test. Could it be a database configuration issue?
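Not a fix, but one way to check whether it really is database-level configuration, since the database's LOCATION decides where new managed tables land (the second database name below is a placeholder for the one that behaves correctly):

DESCRIBE DATABASE EXTENDED dataengsandbox;
DESCRIBE DATABASE EXTENDED other_database;  -- the database where drop table works as expected

If the two locations differ in the way the table paths above suggest (/data/<database> vs /data/<database>/hive), that would point at how each database was created rather than at the sqoop job.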
05-26-2020
06:32 AM
Thank you for responding! We've verified that the underlying files are not being deleted from Hadoop.
05-20-2020
10:33 AM
Hello guys! I hope your day is going well. Something really weird (at least for me) is happening when I drop a table in Hive.

1. A table called "Test" is created by a sqoop import job using "--create-hcatalog-table". See the full job below.
2. Once the import job completes (with a select * from the source), I drop the table without issues: drop table test purge; At this point I expect that all the information is deleted, gone, bye, finito, adios!
3. I run the same sqoop job again, but it not only loads the table with the fresh information from the source, it also shows me the previous information from before the drop table!

Some guys have told me that's because the table has been created as an external table by the sqoop job. But even if it is an external table, a drop table <table> purge; is used to get rid of the data, isn't it? Please let me know your thoughts.

Sqoop job:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select * from JobHistory where $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table Test \
  --compress \
  --compression-codec snappy \
  --map-column-hive jobid=INT \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
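Not tested against this cluster, but a quick way to see what is happening on disk, assuming the table location shown in the describe output earlier in this history (hdfs://nameservice1/data/DataEngSandbox/test):

# list the table directory right after running: drop table test purge;
hdfs dfs -ls /data/DataEngSandbox/test

If parquet files are still sitting there after the drop, the next sqoop run just writes new files alongside the old ones, which would explain seeing both the fresh rows and the pre-drop rows.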
Labels:
- Apache Hive
- Apache Sqoop
05-06-2020
02:15 PM
Hi guys, I want the Hive table created by sqoop to have two partitions: one for 2019 and another for 2020. But for some reason it is putting "2019,2020" in the year column.

Command:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select jobhistoryid, requeueid, jobid, name, event, eventtime, submittime, starttime, service, year(EventTime) as year from JobHistory where year(EventTime)=2019 or year(EventTime)=2020 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistoryPartitioned \
  --hive-partition-key year \
  --hive-partition-value 2019,2020 \
  --compress \
  --compression-codec snappy \
  --map-column-hive jobid=int \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
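--hive-partition-value takes a single static value per import, which is why "2019,2020" ends up as a literal in the year column. A sketch of one workaround (untested here): run the import once per year, keeping the partition value and the query filter in sync. It assumes the target table already exists (create it once up front, or keep --create-hcatalog-table on the first pass only), and the year expression is dropped from the select list because the partition key supplies it:

for y in 2019 2020; do
  sqoop import \
    -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
    --connection-manager org.apache.sqoop.manager.SQLServerManager \
    --driver net.sourceforge.jtds.jdbc.Driver \
    --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
    --username 'lrm0613' \
    --password-alias sqlserver2.password \
    --query "select jobhistoryid, requeueid, jobid, name, event, eventtime, submittime, starttime, service from JobHistory where year(EventTime) = $y and \$CONDITIONS" \
    --hcatalog-database dataengsandbox \
    --hcatalog-table JobHistoryPartitioned \
    --hive-partition-key year \
    --hive-partition-value "$y" \
    --compress \
    --compression-codec snappy \
    --map-column-hive jobid=int \
    -m 1
done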
Labels:
- Apache Hive
- Apache Sqoop
04-17-2020
05:56 AM
Thanks a lot for your answer aakulov! I changed the command to the following format:

sqoop job \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  -Dsqoop.export.records.per.statement=1 \
  --create incjob \
  -- import \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username lrm0613 \
  --password-alias sqlserver2.password \
  --query 'select * from JobHistory where $CONDITIONS' \
  --incremental lastmodified \
  --check-column EventTime \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistory \
  -m 1 \
  --merge-key EventTime

It works now. However, I found that the query it internally runs is the following:

select * from JobHistory where [EventTime] < '2020-04-17 08:51:00.54' AND (1 = 0)

08:51 is the time when I ran it, not the time the last row was inserted. Will it then lose the rows inserted between the last time I ran it and now?
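For what it's worth, saved incremental jobs update their stored boundary after each successful run, and the next execution starts from that stored value. A quick way to see what the job will use as its lower bound next time (the exact property name can vary by sqoop version):

sqoop job --show incjob | grep -i last.value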
04-13-2020
01:13 PM
I'm trying to run an incremental import job on SQL Server:

sqoop job --create incjob \
  -- import \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --username lrm0613 \
  -P \
  --table JobHistory \
  --incremental lastmodified \
  --check-column JobId \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistory \
  -m 1 \
  --merge-key JobId

When I run the job, I get the following error:

$ sqoop job --exec incjob
Warning: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/04/13 15:58:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
Enter password:
20/04/13 15:58:48 WARN tool.BaseSqoopTool: Input field/record delimiter options are not used in HCatalog jobs unless the format is text. It is better to use --hive-import in those cases. For text formats
20/04/13 15:58:48 WARN tool.BaseSqoopTool: Output field/record delimiter options are not useful in HCatalog jobs for most of the output types except text based formats is text. It is better to use --hive-import in those cases. For non text formats,
20/04/13 15:58:48 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
20/04/13 15:58:48 INFO manager.SqlManager: Using default fetchSize of 1000
20/04/13 15:58:48 INFO tool.CodeGenTool: Beginning code generation
20/04/13 15:58:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-lrm0613/compile/f11a4812c94c4f349a579536733c37cb/JobHistory.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/04/13 15:58:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-lrm0613/compile/f11a4812c94c4f349a579536733c37cb/JobHistory.jar
20/04/13 15:58:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:53 ERROR manager.SqlManager: SQL exception accessing current timestamp: net.sourceforge.jtds.jdbc.AQSQLException: Id 102, Level 15, State 1, Line 1 Incorrect syntax near ')'.
net.sourceforge.jtds.jdbc.AQSQLException: Id 102, Level 15, State 1, Line 1 Incorrect syntax near ')'.
  at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:377)
  at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
  at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2335)
  at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:653)
  at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:477)
  at net.sourceforge.jtds.jdbc.JtdsStatement.executeQuery(JtdsStatement.java:1304)
  at org.apache.sqoop.manager.SqlManager.getCurrentDbTimestamp(SqlManager.java:987)
  at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:334)
  at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:504)
  at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
  at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
  at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
  at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
  at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
20/04/13 15:58:53 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not get current time from database
  at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:336)
  at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:504)
  at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
  at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
  at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
  at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
  at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
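The warning near the top of that log is probably the key part: with --driver set but no --connection-manager, sqoop falls back to GenericJdbcManager, and the generic current-timestamp query it issues is not valid SQL Server syntax, which surfaces as "Could not get current time from database". A sketch of the same job creation with the SQL Server manager pinned explicitly, untested here but matching the flags used elsewhere in this history:

sqoop job --create incjob \
  -- import \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username lrm0613 \
  -P \
  --table JobHistory \
  --incremental lastmodified \
  --check-column JobId \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistory \
  -m 1 \
  --merge-key JobId

Separately, lastmodified expects the check column to be a date/timestamp; the 04-17 job above switches to --check-column EventTime, which fits that requirement better than JobId.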
Tags:
- sql-server
- Sqoop
Labels:
- Apache Sqoop