Member since 04-03-2020
Posts: 25
Kudos Received: 2
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| | 1263 | 06-03-2020 12:54 PM |
10-14-2020
11:45 AM
I've recently created the following table in Impala:

create table hpchistperf.oozie_job_stats (
  name string,
  start_time timestamp,
  end_time timestamp,
  rows_inserted integer,
  last_id_loaded integer,
  primary key (name, start_time))
PARTITION BY HASH PARTITIONS 16
stored as kudu;

I read that there was no problem creating the table in Impala and it could be accessed in Hive since they share the same metadata. The problem is that when I run the query on Hive it gives me the following error:

FAILED: RuntimeException java.lang.ClassNotFoundException: org.apache.kudu.mapreduce.KuduTableInputFormat
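For anyone hitting the same thing: a quick way to see why Hive is asking for a Kudu class it doesn't ship with is to check how the table was registered in the shared metastore. A minimal diagnostic (no fix implied), runnable from Hive or Impala:

-- shows the InputFormat / storage handler the table definition references
DESCRIBE FORMATTED hpchistperf.oozie_job_stats;

The Kudu classes listed there would have to be on Hive's classpath before Hive could read the table, whereas Impala has its own native Kudu support.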
Labels:
- Apache Hive
- Apache Impala
07-09-2020
08:40 AM
Thanks for the reply. I originally didn't know I had to use a meta connection or a metastore until I found that error, so I really don't have a configuration in mind yet. What would you recommend I do? Is it just a matter of adding the --meta-connect parameter?
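Not sure of your setup, but for reference the flag sqoop exposes is --meta-connect, which points the job tool at a shared metastore instead of the per-user HSQLDB files under ~/.sqoop. A minimal sketch, assuming a sqoop metastore is already running somewhere (host name below is a placeholder; 16000 is the default metastore port):

sqoop job \
  --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop \
  --list

The same --meta-connect argument would then go on the --create and --exec invocations so they all read and write the same job definitions.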
07-08-2020
01:43 PM
I'm scheduling a sqoop incremental job using a shell script, but it behaves differently from a conventional job and gives me the following error:

20/07/07 15:38:26 ERROR tool.JobTool: I/O error performing job operation: java.io.IOException: Exception creating SQL connection
  at org.apache.sqoop.metastore.hsqldb.HsqldbJobStorage.init(HsqldbJobStorage.java:217)
  at org.apache.sqoop.metastore.hsqldb.AutoHsqldbStorage.open(AutoHsqldbStorage.java:112)
  at org.apache.sqoop.tool.JobTool.run(JobTool.java:290)
  at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
  at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.sql.SQLException: File input/output error /home/denap0p/.sqoop/metastore.db.properties
java.io.FileNotFoundException: /home/denap0p/.sqoop/metastore.db.properties.new (No such file or directory)

The script has the following instructions:

sqoop job \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/denap0p/sqlserver.jceks \
  -Dsqoop.export.records.per.statement=1 \
  --create ch_job_inc \
  -- import \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username denap0p \
  --password-alias sqlserver.password \
  --query 'select * from dbo.Job where $CONDITIONS' \
  --incremental lastmodified \
  --check-column endtime \
  --hcatalog-database hpchistperf \
  --hcatalog-table ch_job \
  --split-by ID \
  --merge-key Id \
  --hcatalog-storage-stanza 'stored as parquet'

sqoop job --exec ch_job_inc

Based on the information I've read, it seems that I need to use a metastore. We are using Cloudera CDH-5.14.2-1.cdh5.14.2.p0.3. Any advice is welcome. Thanks in advance.
Labels:
- Apache Oozie
- Apache Sqoop
- Cloudera Hue
06-12-2020
08:06 AM
Thank you Eric. We basically want to grab the last value of an id, so we can use it as a reference to resume an incremental load into Hive. Once we have the last "id" value in Hive, we can go back to our SQL Server and select all the rows whose id is greater than that value to pull the new/remaining data into Hive. We were initially using --incremental lastmodified to do incremental loads, but some source tables don't have a datetime column, and when we don't have one sqoop blocks the incremental load:

Append mode for imports is not compatible with HCatalog.
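A rough sketch of that pattern, untested here: read the current max id from the Hive side, then feed it into the sqoop query. The JDBC URL variable, the hpchistperf.ch_job table, and the id/Id column names below are placeholders borrowed from elsewhere in this history, not a tested script:

# grab the last loaded id from Hive ($HIVE_JDBC_URL is a placeholder for your beeline URL)
last_id=$(beeline -u "$HIVE_JDBC_URL" --silent=true --showHeader=false --outputformat=tsv2 \
  -e "select coalesce(max(id), 0) from hpchistperf.ch_job")

# pull only the rows newer than that id from SQL Server
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/denap0p/sqlserver.jceks \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username denap0p \
  --password-alias sqlserver.password \
  --query "select * from dbo.Job where Id > $last_id and \$CONDITIONS" \
  --hcatalog-database hpchistperf \
  --hcatalog-table ch_job \
  -m 1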
06-12-2020
07:58 AM
Hello guys, based on your help I was able to create incremental loads in sqoop using Hive as a target database. I use the lastmodified option of sqoop: --incremental lastmodified. Unfortunately, the --incremental append mode doesn't work with Hive:

Append mode for imports is not compatible with HCatalog. Please remove the parameter --append-mode

We've found some source tables in SQL Server which don't have a datetime column, which prevents us from doing incremental loads with sqoop. Is there an alternative way to do incremental loads for those tables that don't have a datetime column? We cannot modify the source tables.
Labels:
- Apache HBase
- Apache Sqoop
06-11-2020
05:12 AM
Thank you Eric, I changed the form of the query:

--query 'select max(id) as maxid from NodeName where $CONDITIONS'

We had some progress, but for some reason sqoop is translating the query into the following form:

20/06/11 08:09:49 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(id), MAX(id) FROM (select max(id) as maxid from NodeName where (1 = 1) ) AS t1
20/06/11 08:09:49 INFO mapreduce.JobSubmitter: Cleaning up the staging area /user/lrm0613/.staging/job_1590805442005_23634
20/06/11 08:09:49 WARN security.UserGroupInformation: PriviledgedActionException as:lrm0613@JACKSONNATIONAL.COM (auth:KERBEROS) cause:java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1 Invalid column name 'id'.
20/06/11 08:09:49 ERROR tool.ImportTool: Import failed: java.io.IOException: net.sourceforge.jtds.jdbc.AQSQLException: Id 207, Level 16, State 1, Line 1 Invalid column name 'id'.

Any ideas?
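One detail in that log: the BoundingValsQuery wraps the aliased subquery and then asks for MIN(id)/MAX(id), but the subquery only exposes maxid, hence Invalid column name 'id'. Since the query returns a single row anyway, a sketch of a workaround (untested here) is to keep the alias and run with a single mapper, so sqoop doesn't need a split column or a bounding query at all:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) as maxid from NodeName where $CONDITIONS' \
  -m 1 \
  --target-dir /user/lrm0613/queries_output/tt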
06-10-2020
01:44 PM
sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select max(id) from NodeName a where $CONDITIONS' \
  --split-by a.id \
  --target-dir /user/lrm0613/queries_output/tt
Generates the following error:
20/06/10 16:37:51 ERROR tool.ImportTool: Import failed: We found column without column name. Please verify that you've entered all column names in your query if using free form query import (consider adding clause AS if you're using column transformation)
I've noticed that if I remove the max() function the job completes successfully. How can I make it work with the function?
Labels:
- Apache Sqoop
- Security
06-09-2020
06:26 AM
Hello guys!
I'm trying to run the following commands in Hive:
use dataengsandbox;
set hive.mapred.supports.subdirectories=true;
insert overwrite directory '/tmp/hmb/test' select max(id) from nodename;

If I remove the insert overwrite directory '/tmp/hmb/test' part it works without issues, but I want my output to be written into an HDFS directory.
Here is the error that I've received:
Error while compiling statement: FAILED: SemanticException No valid privileges User lrm0613 does not have privileges for QUERY The required privileges: Server=server1->URI=hdfs://nameservice1/tmp/hmb/test->action=*;
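That message is a Sentry authorization error rather than a problem with the statement itself: INSERT OVERWRITE DIRECTORY needs a URI privilege on the target path. A sketch of the kind of grant a Sentry admin would run, assuming Sentry is managing Hive authorization (the role name below is purely a placeholder):

-- run by a Sentry admin from beeline; analyst_role is a hypothetical role
GRANT ALL ON URI 'hdfs://nameservice1/tmp/hmb/test' TO ROLE analyst_role;

Alternatively, writing to a path the user already has privileges on avoids the extra grant.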
Labels:
- Apache Hadoop
- Apache Hive
- HDFS
06-06-2020
02:28 PM
1 Kudo
Finally, our DBA team re-created the database correctly this time, and we no longer have this issue. We believe the root cause was that the directory in HDFS had an uppercase typo.
06-03-2020
12:54 PM
1 Kudo
Made it work! 🙂 I had to use --map-column-hive:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select * from Job where id=0 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table Job \
  --compress \
  --compression-codec snappy \
  --map-column-hive 'excludednodes=varchar(160)','errorparams=varchar(160)' \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
06-02-2020
12:21 PM
Hello guys, I'm trying to run a full import from a table in SQL Server, but sqoop is interpreting the data type incorrectly.

Column in SQL Server: ErrorParams (nvarchar(max))

Sqoop translates it to ErrorParams varchar, which generates an error:

create table `dataengsandbox`.`task` (
  `id` INT,
  `recordid` int,
  `state` int,
  `prevstate` int,
  `instanceid` int,
  `parentjobid` int,
  `closed` boolean,
  `requeuecount` int,
  `autorequeuecount` int,
  `modifytime` string,
  `submittime` string,
  `starttime` string,
  `endtime` string,
  `createtime` string,
  `timestamp` binary,
  `pendingreason` int,
  `failurereason` int,
  `exitcode` int,
  `failednodeid` int,
  `errorcode` int,
  `errorparams` varchar,
  `output` string,
  `executionfailureretrycount` int)
stored as parquet

I would expect it to translate it to ErrorParams varchar(64), or some other length, instead.

FAILED: ParseException line 22:22 mismatched input ',' expecting ( near 'varchar' in primitive type specification

Is there a way to make sqoop cast it correctly?

Sqoop job:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Scheduler' \
  --username 'lrm0613' \
  --query 'select * from Task where id=0 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table Task \
  --compress \
  --compression-codec snappy \
  --map-column-hive id=INT \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
Labels:
- Apache Sqoop
05-29-2020
10:27 AM
Thanks for your answer!
That solution might work, but we've also run a drop table in a different database with the same job, and it didn't require deleting the files from the filesystem. At this point we suspect that something is wrong in the database configuration.
We've requested our DBA team to re-create the database because they didn't follow the same procedure as the database that works properly.
Thank you very much for the help. It's really appreciated.
05-27-2020
07:01 AM
Thanks, I gave it a try, but it still keeps the data from before the table was dropped.

select count(*) from dataengsandbox.test;
ALTER TABLE test SET TBLPROPERTIES('EXTERNAL'='False');
drop table test purge;
select count(*) from dataengsandbox.test;  -- gave me double the records

describe formatted test;
Location: hdfs://nameservice1/data/DataEngSandbox/test
Table Type: MANAGED_TABLE
Table Parameters:
  COLUMN_STATS_ACCURATE false
  external.table.purge true
  numFiles 1
  numRows -1
  rawDataSize -1
  totalSize 19098747
  transient_lastDdlTime 1590587689
# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
05-26-2020
01:35 PM
Thanks for your answer! This is a good learning experience for me. I didn't know we could set that property when creating the table. Unfortunately, the table data still persists after the drop table command.

Table name: test

When we create the table test, it is stored in /data/<database>/test. When we use a different database where the issue is not present, the table is stored in /data/<database>/hive/test. Could it be a database configuration issue?
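Not a fix, but one way to check whether it really is database-level configuration, since the database's LOCATION decides where new managed tables land (the second database name below is a placeholder for the one that behaves correctly):

DESCRIBE DATABASE EXTENDED dataengsandbox;
DESCRIBE DATABASE EXTENDED other_database;  -- the database where drop table works as expected

If the two locations differ in the way the table paths above suggest (/data/<database> vs /data/<database>/hive), that would point at how each database was created rather than at the sqoop job.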
05-26-2020
06:32 AM
Thank you for responding! We've verified that the underlying files are not being deleted from Hadoop.
05-20-2020
10:33 AM
Hello guys! I hope your day is going well. Something really weird (at least for me) is happening when I drop a table in Hive.

1. A table called "Test" is created by a sqoop import job using "--create-hcatalog-table". See the full job below.
2. Once the import job completes (with a select * from the source), I drop the table without issues: drop table test purge; At this point I expect that all the information is deleted, gone, bye, finito, adios!
3. I run the same sqoop job again, but it not only loads the table with the fresh information from the source, it also shows me the previous information from before the drop table!

Some guys have told me that's because the table has been created as an external table by the sqoop job. But even if it is an external table, a drop table <table> purge; is used to get rid of the data, isn't it? Please let me know your thoughts.

Sqoop job:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select * from JobHistory where $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table Test \
  --compress \
  --compression-codec snappy \
  --map-column-hive jobid=INT \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
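Not tested against this cluster, but a quick way to see what is happening on disk, assuming the table location shown in the describe output earlier in this history (hdfs://nameservice1/data/DataEngSandbox/test):

# list the table directory right after running: drop table test purge;
hdfs dfs -ls /data/DataEngSandbox/test

If parquet files are still sitting there after the drop, the next sqoop run just writes new files alongside the old ones, which would explain seeing both the fresh rows and the pre-drop rows.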
Labels:
- Apache Hive
- Apache Sqoop
05-06-2020
02:15 PM
Hi guys, I want the Hive table created by sqoop to have two partitions: one for 2019 and another for 2020. But for some reason it is putting "2019,2020" in the year column.

Command:

sqoop import \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username 'lrm0613' \
  --password-alias sqlserver2.password \
  --query 'select jobhistoryid, requeueid, jobid, name, event, eventtime, submittime, starttime, service, year(EventTime) as year from JobHistory where year(EventTime)=2019 or year(EventTime)=2020 and $CONDITIONS' \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistoryPartitioned \
  --hive-partition-key year \
  --hive-partition-value 2019,2020 \
  --compress \
  --compression-codec snappy \
  --map-column-hive jobid=int \
  -m 1 \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as parquet'
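--hive-partition-value takes a single static value per import, which is why "2019,2020" ends up as a literal in the year column. A sketch of one workaround (untested here): run the import once per year, keeping the partition value and the query filter in sync. It assumes the target table already exists (create it once up front, or keep --create-hcatalog-table on the first pass only), and the year expression is dropped from the select list because the partition key supplies it:

for y in 2019 2020; do
  sqoop import \
    -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
    --connection-manager org.apache.sqoop.manager.SQLServerManager \
    --driver net.sourceforge.jtds.jdbc.Driver \
    --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
    --username 'lrm0613' \
    --password-alias sqlserver2.password \
    --query "select jobhistoryid, requeueid, jobid, name, event, eventtime, submittime, starttime, service from JobHistory where year(EventTime) = $y and \$CONDITIONS" \
    --hcatalog-database dataengsandbox \
    --hcatalog-table JobHistoryPartitioned \
    --hive-partition-key year \
    --hive-partition-value "$y" \
    --compress \
    --compression-codec snappy \
    --map-column-hive jobid=int \
    -m 1
done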
Labels:
- Apache Hive
- Apache Sqoop
04-17-2020
05:56 AM
Thanks a lot for your answer aakulov! I changed the command to the following format:

sqoop job \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/lrm0613/mydb2.jceks \
  -Dsqoop.export.records.per.statement=1 \
  --create incjob \
  -- import \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username lrm0613 \
  --password-alias sqlserver2.password \
  --query 'select * from JobHistory where $CONDITIONS' \
  --incremental lastmodified \
  --check-column EventTime \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistory \
  -m 1 \
  --merge-key EventTime

It works now. However, I found that the query it internally runs is the following:

select * from JobHistory where [EventTime] < '2020-04-17 08:51:00.54' AND (1 = 0)

08:51 is the time when I ran it, not the time the last row was inserted. Will it then lose the rows inserted between the last time I ran it and now?
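For what it's worth, saved incremental jobs update their stored boundary after each successful run, and the next execution starts from that stored value. A quick way to see what the job will use as its lower bound next time (the exact property name can vary by sqoop version):

sqoop job --show incjob | grep -i last.value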
04-13-2020
01:13 PM
I'm trying to run an incremental import job on SQL Server:

sqoop job --create incjob \
  -- import \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --username lrm0613 \
  -P \
  --table JobHistory \
  --incremental lastmodified \
  --check-column JobId \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistory \
  -m 1 \
  --merge-key JobId

When I run the job, I get the following error:

$ sqoop job --exec incjob
Warning: /opt/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/bin/../lib/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation.
20/04/13 15:58:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
Enter password:
20/04/13 15:58:48 WARN tool.BaseSqoopTool: Input field/record delimiter options are not used in HCatalog jobs unless the format is text. It is better to use --hive-import in those cases. For text formats
20/04/13 15:58:48 WARN tool.BaseSqoopTool: Output field/record delimiter options are not useful in HCatalog jobs for most of the output types except text based formats is text. It is better to use --hive-import in those cases. For non text formats,
20/04/13 15:58:48 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
20/04/13 15:58:48 INFO manager.SqlManager: Using default fetchSize of 1000
20/04/13 15:58:48 INFO tool.CodeGenTool: Beginning code generation
20/04/13 15:58:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH/lib/hadoop-mapreduce
Note: /tmp/sqoop-lrm0613/compile/f11a4812c94c4f349a579536733c37cb/JobHistory.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/04/13 15:58:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-lrm0613/compile/f11a4812c94c4f349a579536733c37cb/JobHistory.jar
20/04/13 15:58:53 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM JobHistory AS t WHERE 1=0
20/04/13 15:58:53 ERROR manager.SqlManager: SQL exception accessing current timestamp: net.sourceforge.jtds.jdbc.AQSQLException: Id 102, Level 15, State 1, Line 1 Incorrect syntax near ')'.
net.sourceforge.jtds.jdbc.AQSQLException: Id 102, Level 15, State 1, Line 1 Incorrect syntax near ')'.
  at net.sourceforge.jtds.jdbc.SQLDiagnostic.addDiagnostic(SQLDiagnostic.java:377)
  at net.sourceforge.jtds.jdbc.TdsCore.tdsErrorToken(TdsCore.java:2988)
  at net.sourceforge.jtds.jdbc.TdsCore.nextToken(TdsCore.java:2335)
  at net.sourceforge.jtds.jdbc.TdsCore.getMoreResults(TdsCore.java:653)
  at net.sourceforge.jtds.jdbc.JtdsStatement.executeSQLQuery(JtdsStatement.java:477)
  at net.sourceforge.jtds.jdbc.JtdsStatement.executeQuery(JtdsStatement.java:1304)
  at org.apache.sqoop.manager.SqlManager.getCurrentDbTimestamp(SqlManager.java:987)
  at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:334)
  at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:504)
  at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
  at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
  at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
  at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
  at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
20/04/13 15:58:53 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not get current time from database
  at org.apache.sqoop.tool.ImportTool.initIncrementalConstraints(ImportTool.java:336)
  at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:504)
  at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
  at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:244)
  at org.apache.sqoop.tool.JobTool.run(JobTool.java:299)
  at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
  at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
  at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
  at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
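The warning near the top of that log is probably the key part: with --driver set but no --connection-manager, sqoop falls back to GenericJdbcManager, and the generic current-timestamp query it issues is not valid SQL Server syntax, which surfaces as "Could not get current time from database". A sketch of the same job creation with the SQL Server manager pinned explicitly, untested here but matching the flags used elsewhere in this history:

sqoop job --create incjob \
  -- import \
  --connection-manager org.apache.sqoop.manager.SQLServerManager \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect 'jdbc:jtds:sqlserver://SQLQP002:1433;useNTLMv2=true;domain=JNL_NT;databaseName=TC31Reporting' \
  --username lrm0613 \
  -P \
  --table JobHistory \
  --incremental lastmodified \
  --check-column JobId \
  --hcatalog-database dataengsandbox \
  --hcatalog-table JobHistory \
  -m 1 \
  --merge-key JobId

Separately, lastmodified expects the check column to be a date/timestamp; the 04-17 job above switches to --check-column EventTime, which fits that requirement better than JobId.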
Tags:
- sql-server
- Sqoop
Labels:
- Apache Sqoop