Member since: 04-03-2020
Posts: 25
Kudos Received: 2
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 2369 | 06-03-2020 12:54 PM |
10-14-2020
11:45 AM
I've recently created the following table in Impala:

```sql
create table hpchistperf.oozie_job_stats (
  name string,
  start_time timestamp,
  end_time timestamp,
  rows_inserted integer,
  last_id_loaded integer,
  primary key (name, start_time)
)
PARTITION BY HASH PARTITIONS 16
stored as kudu;
```

I read that there is no problem creating the table in Impala and that it can then be accessed from Hive, since they share the same metastore. The problem is that when I run a query against it in Hive, I get the following error:

```
FAILED: RuntimeException java.lang.ClassNotFoundException: org.apache.kudu.mapreduce.KuduTableInputFormat
```
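The ClassNotFoundException means HiveServer2's classpath is missing the Kudu input-format classes, so Hive cannot read the Kudu-backed table even though the metastore entry is shared. As a quick check (a sketch, not a definitive fix), the same table can be queried through Impala, which ships with native Kudu support; getting Hive itself to read Kudu requires a Kudu integration jar on Hive's classpath, and whether your distribution ships one is version-dependent.

```bash
# Works around the Hive classpath gap by reading the Kudu table via Impala.
# Replace <impalad-host> with one of your Impala daemon hosts;
# add -k on a kerberized cluster.
impala-shell -i <impalad-host> -q "select * from hpchistperf.oozie_job_stats limit 10;"
```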
Labels:
- Apache Hive
- Apache Impala
06-12-2020
08:06 AM
Thank you Eric. We basically want to grab the last value of an id so we can use it as a reference to resume an incremental load into Hive. Once we have the last "id" value in Hive, we can go back to our SQL Server and select all the rows whose id is greater than that value to pull the new/remaining data into Hive. We were initially using --incremental lastmodified to do incremental loads, but some source tables don't have a datetime column, and sqoop prevents us from doing an incremental load without one in the source:

```
Append mode for imports is not compatible with HCatalog.
```
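Given that constraint, one pattern (a sketch; the LAST_ID variable and the bookkeeping step are assumptions, not something from this thread) is to skip --incremental entirely and splice the stored id into a free-form query, reusing the connection options from the jobs elsewhere in this thread:

```bash
# Hypothetical: LAST_ID holds the max id previously recorded on the Hive side.
LAST_ID=12345

# Pull only rows newer than the last load; $CONDITIONS is mandatory in
# free-form queries and sqoop replaces it with split predicates at runtime.
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query "select * from NodeName where id > ${LAST_ID} and \$CONDITIONS" \
  --split-by id \
  --hcatalog-database dataengsandbox \
  --hcatalog-table NodeName \
  -m 4
```

This keeps HCatalog in the pipeline while still loading only the rows newer than the last run.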
06-11-2020
05:12 AM
Thank you Eric. I changed the form of the query:

```bash
--query 'select max(id) as maxid from NodeName where $CONDITIONS'
```

We made some progress, but for some reason sqoop translates the query into the following form:

```
20/06/11 08:09:49 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select max(id) as maxid from NodeName where (1 = 1) ) AS t1
20/06/11 08:09:49 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/lrm0613/.staging/job_1590805442005_23634
20/06/11 08:09:49 WARN security.UserGroupInformation: PriviledgedActionException as:lrm0613@JACKSONNATIONAL.COM (auth:KERBEROS) cause:java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1 Invalid column name 'id'.
20/06/11 08:09:49 ERROR tool.ImportTool: Import failed: java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1 Invalid column name 'id'.
```

Any ideas?
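The BoundingValsQuery fails because --split-by a.id still points at the original column, while the outer SELECT only exposes the alias maxid, so SQL Server rejects MIN(id)/MAX(id) over the subquery. For a one-row aggregate there is nothing to split anyway, so one fix (a sketch, reusing the command from the 06-10 post below) is to drop --split-by and run a single mapper, which makes sqoop skip the bounding query altogether:

```bash
# Single mapper: sqoop needs no split column, so it skips the MIN/MAX
# bounding query that referenced the now-hidden 'id' column.
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) as maxid from NodeName where $CONDITIONS' \
  -m 1 \
  --target-dir /user/lrm0613/queries_output/tt
```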
06-10-2020
01:44 PM
```bash
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) from NodeName a where $CONDITIONS' \
  --split-by a.id \
  --target-dir /user/lrm0613/queries_output/tt
```

This generates the following error:

```
20/06/10 16:37:51 ERROR tool.ImportTool: Import failed: We found column without column name. Please verify that you've entered all column names in your query if using free form query import (consider adding clause AS if you're using column transformation)
```
I've noticed that if I remove the max() function, the job completes successfully. How can I make it work with the function?
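The error message's own hint ("consider adding clause AS") is the fix: max(id) produces an unnamed column, and sqoop needs a column name to generate its record class. A minimal sketch of the changed option; the rest of the command above stays the same, and the 06-11 follow-up earlier in this thread confirms this was the next step:

```bash
# Alias the aggregate so sqoop has a column name to work with.
--query 'select max(id) as maxid from NodeName a where $CONDITIONS'
```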
Labels:
- Apache Sqoop
- Security
06-09-2020
06:26 AM
Hello guys!
I'm trying to run the following commands in Hive:
```sql
use dataengsandbox;
set hive.mapred.supports.subdirectories=true;
insert overwrite directory '/tmp/hmb/test' select max(id) from nodename;
```
If I remove the INSERT OVERWRITE DIRECTORY '/tmp/hmb/test' part it works without issues, but I want my output to be written into an HDFS directory.
Here is the error that I've received:
```
Error while compiling statement: FAILED: SemanticException No valid privileges User lrm0613 does not have privileges for QUERY The required privileges: Server=server1->URI=hdfs://nameservice1/tmp/hmb/test->action=*;
```
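The Server=server1->URI=... wording is Sentry's: writing query output to an arbitrary HDFS path needs a URI privilege on that path, in addition to SELECT on the table. A sketch of the grant an admin could run; the role name analyst_role and the connection string are placeholders for whatever role and HiveServer2 endpoint apply in your environment:

```bash
# Run via beeline as a Sentry admin. GRANT ... ON URI authorizes HDFS-path
# writes for every user whose group carries the (hypothetical) role.
beeline -u 'jdbc:hive2://<hs2-host>:10000/default;principal=hive/_HOST@EXAMPLE.COM' -e "
GRANT ALL ON URI 'hdfs://nameservice1/tmp/hmb/test' TO ROLE analyst_role;
"
```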
Labels:
- Apache Hadoop
- Apache Hive
- HDFS
06-06-2020
02:28 PM
1 Kudo
Finally, our DBA team re-created the database correctly this time, and we no longer have this issue. We believe the root cause was an uppercase typo in the database's HDFS directory.
06-03-2020
12:54 PM
1 Kudo
Made it work! 🙂 I had to use --map-column-hive:

```bash
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select * from Job where id=0 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table Job \
  --compress \
  --compression-codec snappy \
  --map-column-hive 'excludednodes=varchar(160)','errorparams=varchar(160)' \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
```
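A quick way to confirm the override landed (a sketch; run it wherever the Hive CLI or beeline is available):

```bash
# The two overridden columns should now show as varchar(160) in the schema.
hive -e 'describe dataengsandbox.job;'
```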
06-02-2020
12:21 PM
Hello guys, I'm trying to run a full import from a table in SQL Server, but sqoop is interpreting one data type incorrectly. The column in SQL Server is ErrorParams (nvarchar(max)). Sqoop translates it to `errorparams varchar` with no length, which breaks the DDL it generates:

```sql
create table `dataengsandbox`.`task` (
  `id` INT,
  `recordid` int,
  `state` int,
  `prevstate` int,
  `instanceid` int,
  `parentjobid` int,
  `closed` boolean,
  `requeuecount` int,
  `autorequeuecount` int,
  `modifytime` string,
  `submittime` string,
  `starttime` string,
  `endtime` string,
  `createtime` string,
  `timestamp` binary,
  `pendingreason` int,
  `failurereason` int,
  `exitcode` int,
  `failednodeid` int,
  `errorcode` int,
  `errorparams` varchar,
  `output` string,
  `executionfailureretrycount` int)
stored as parquet
```

I would expect it to translate the column to something like `ErrorParams varchar(64)`, with some length, instead. The error:

```
FAILED: ParseException line 22:22 mismatched input ',' expecting ( near 'varchar' in primitive type specification
```

Is there a way to make sqoop cast it correctly? Sqoop job:

```bash
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --query 'select * from Task where id=0 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table Task \
  --compress \
  --compression-codec snappy \
  --map-column-hive id=INT \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
```
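Since nvarchar(max) carries no length for sqoop to copy into the generated varchar, another option besides the explicit varchar(160) override in the accepted answer above is to map the column to Hive's STRING type, which takes no length at all. A sketch of the changed option only; everything else in the command stays the same:

```bash
# STRING sidesteps the missing precision entirely for unbounded text columns.
--map-column-hive 'id=INT,errorparams=string'
```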
Labels:
- Apache Sqoop
05-29-2020
10:27 AM
Thanks for your answer!
That solution might work, but we've also run a DROP TABLE in a different database, with the same job, and it didn't require deleting the files from HDFS. At this point we suspect that something is wrong in the database configuration.
We've asked our DBA team to re-create the database, because it wasn't created with the same procedure as the database that works properly.
Thank you very much for the help. It's really appreciated.
05-27-2020
07:01 AM
Thanks, I gave it a try, but it still keeps the data from before the table was dropped:

```sql
select count(*) from dataengsandbox.test;
ALTER TABLE test SET TBLPROPERTIES('EXTERNAL'='False');
drop table test purge;
select count(*) from dataengsandbox.test;  -- returned double the records
```

Here is the relevant part of `describe formatted test`:

```
Location:            hdfs://nameservice1/data/DataEngSandbox/test
Table Type:          MANAGED_TABLE
Table Parameters:
  COLUMN_STATS_ACCURATE  false
  external.table.purge   true
  numFiles               1
  numRows                -1
  rawDataSize            -1
  totalSize              19098747
  transient_lastDdlTime  1590587689

# Storage Information
SerDe Library:       org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat:         org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat:        org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed:          No
Num Buckets:         -1
Bucket Columns:      []
Sort Columns:        []
Storage Desc Params:
```
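Given the Location shown above, one way to check whether the drop actually purged the files (a sketch; make sure nothing else reads the path before deleting anything) is to inspect the directory directly in HDFS:

```bash
# List whatever survived the DROP TABLE ... PURGE under the table's location.
hdfs dfs -ls hdfs://nameservice1/data/DataEngSandbox/test

# If stale files remain, removing them by hand clears the doubled rows;
# -skipTrash bypasses the trash so the space is freed immediately.
hdfs dfs -rm -r -skipTrash hdfs://nameservice1/data/DataEngSandbox/test
```

(As the 06-06 post in this thread notes, the root cause turned out to be a case typo in the HDFS directory, which would explain why the purge missed the real files.)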