Member since: 09-19-2020
Posts: 46
Kudos Received: 1
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3785 | 07-13-2021 12:09 AM |
07-02-2021 04:16 AM
Dear Team, how can I mask the last 5 digits in the field below in Kudu, using Ranger? Thanks, Roshan
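A minimal sketch (not part of the original post) of the kind of Impala expression a Ranger "Custom" column-masking policy could apply, assuming the column is a STRING; the column name msisdn_s and the table name are placeholders, not taken from the thread:

```sql
-- Hypothetical: replace the last 5 characters of a string column with 'x'.
-- msisdn_s and some_kudu_table are made-up names for illustration only.
SELECT regexp_replace(msisdn_s, '.{5}$', 'xxxxx') AS masked_value
FROM some_kudu_table;
```

The regexp_replace(...) expression alone is what would go into the custom masking policy; it can be tested in a plain SELECT first.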
Labels:
- Apache Impala
- Apache Kudu
- Apache Ranger
07-01-2021 04:30 AM
Hello Team, how can we extract XML values from the column below (type string) in Impala (Kudu)? Regards, Roshan
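A rough sketch (not from the original thread) of one common workaround: since Impala has no XPath functions, pull attribute values out of the XML string with regexp_extract. The column name xml_col, the table name, and the STREET_DESC attribute are placeholders:

```sql
-- Hypothetical: extract the STREET_DESC attribute from an XML fragment stored as STRING.
-- xml_col and some_kudu_table are illustrative names only.
SELECT regexp_extract(xml_col, 'STREET_DESC="([^"]*)"', 1) AS street_desc
FROM some_kudu_table;
```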
Labels:
- Apache Impala
- Apache Kudu
06-28-2021 10:41 PM
Hi, is there a way we can change the datatype of a column from INT to DOUBLE in Impala? The table holds around 3 billion records and I do not plan to drop and recreate it.
ALTER TABLE cbs.gprs_home_cdrs CHANGE percentage_val_n percentage_val_n double;
AnalysisException: Cannot change the type of a Kudu column using an ALTER TABLE CHANGE COLUMN statement: (INT vs DOUBLE)
impalad version 3.4.0-SNAPSHOT RELEASE (build 134517e42b7b6085e758195465f956f431e0e575)
Built on Sat Dec 12 11:15:02 UTC 2020
Version: Cloudera Enterprise 7.1.3 (#4999720 built by jenkins on 20200805-1701 git: fa596184790377f07ba80e9cd4da8b875237939c)
Java VM Name: OpenJDK 64-Bit Server VM
Java Version: 11.0.10
Thanks, Roshan
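Since Kudu does not allow an in-place INT-to-DOUBLE change, one possible workaround (a sketch only, not verified against the poster's cluster, and the UPDATE rewrites every row, which is costly on ~3 billion records) is to add a new DOUBLE column, backfill it, then drop the old column and rename the new one:

```sql
-- Workaround sketch using the table and column names from the post.
ALTER TABLE cbs.gprs_home_cdrs ADD COLUMNS (percentage_val_d DOUBLE);
-- Backfill the new column from the existing INT column (full-table rewrite).
UPDATE cbs.gprs_home_cdrs SET percentage_val_d = CAST(percentage_val_n AS DOUBLE);
ALTER TABLE cbs.gprs_home_cdrs DROP COLUMN percentage_val_n;
-- Rename the new column back to the original name; the type stays DOUBLE.
ALTER TABLE cbs.gprs_home_cdrs CHANGE percentage_val_d percentage_val_n DOUBLE;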
Labels:
- Apache Hive
- Apache Impala
- Apache Kudu
06-27-2021 03:51 AM
What does the code below do?
val conf = new SparkConf().setMaster("local").setAppName("testApp")
val sc = SparkContext.getOrCreate(conf)
Reference: https://www.educba.com/spark-rdd-operations/
06-26-2021 07:18 AM
Hi @aakulov, thanks for the update. Can you please advise how I can schedule this Sqoop job so that it updates the Hive table with incremental changes (CDC)? For example, if the XML fields are updated in Oracle, how can I schedule the Sqoop job to replicate those incremental changes to Hive and Kudu? Regards, Roshan
06-26-2021 04:05 AM
Thanks for the update.
scala> val myRDD=spark.read.textFile("/devsh_loudacre/frostroad.txt")
myRDD: org.apache.spark.sql.Dataset[String] = [value: string]
Why does myRDD.parallelize not work for the above?
scala> val myRDD1=sc.parallelize(myRDD)
<console>:26: error: type mismatch;
found   : org.apache.spark.sql.Dataset[String]
required: Seq[?]
Error occurred in an application involving default arguments.
val myRDD1=sc.parallelize(myRDD)
Does the above mean a Dataset has been created? What is the difference between the above and the command below?
val myRDD2=sc.textFile("/devsh_loudacre/frostroad.txt")
Can I add the .parallelize function to the command above? Thanks, Roshan
06-25-2021 08:41 AM
I managed to fix it, but now I am getting the error below because of the XML types.
[root@sandbox-hdp lib]# sqoop job --exec myjob7
Warning: /usr/hdp/3.0.1.0-187/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/06/25 15:32:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.1.0-187
21/06/25 15:32:55 INFO manager.SqlManager: Using default fetchSize of 1000
Enter password:
21/06/25 15:33:09 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
21/06/25 15:33:09 INFO manager.SqlManager: Using default fetchSize of 1000
21/06/25 15:33:09 INFO tool.CodeGenTool: Beginning code generation
21/06/25 15:33:10 INFO manager.OracleManager: Time zone has been set to GMT
21/06/25 15:33:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM cb_account_master t WHERE 1=0
Exception in thread "main" java.lang.NoClassDefFoundError: oracle/xdb/XMLType
at oracle.jdbc.oracore.OracleTypeADT.applyTDSpatches(OracleTypeADT.java:1081)
at oracle.jdbc.oracore.OracleTypeADT.parseTDSrec(OracleTypeADT.java:1002)
at oracle.jdbc.oracore.OracleTypeADT.parseTDS(OracleTypeADT.java:936)
at oracle.jdbc.oracore.OracleTypeADT.init(OracleTypeADT.java:489)
at oracle.jdbc.oracore.OracleTypeADT.init(OracleTypeADT.java:470)
at oracle.sql.TypeDescriptor.getTypeDescriptor(TypeDescriptor.java:981)
at oracle.jdbc.driver.NamedTypeAccessor.otypeFromName(NamedTypeAccessor.java:78)
at oracle.jdbc.driver.TypeAccessor.initMetadata(TypeAccessor.java:71)
at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:833)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:897)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1034)
at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3820)
at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3867)
at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1502)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:777)
at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:328)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1879)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1672)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:516)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:656)
at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:248)
at org.apache.sqoop.tool.JobTool.run(JobTool.java:303)
at org.apache.sqoop.Sqoop.run(Sqoop.java:150)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:186)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:240)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:249)
at org.apache.sqoop.Sqoop.main(Sqoop.java:258)
Caused by: java.lang.ClassNotFoundException: oracle.xdb.XMLType
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 33 more
[root@sandbox-hdp lib]#
Kindly advise. Thanks, Roshan
06-25-2021 07:47 AM
Hi,
I am using the Cloudera Sandbox for Hortonworks. Can anyone help me with Sqoop? I am trying to make an Oracle JDBC connection.
[root@sandbox-hdp lib]# sqoop list-databases --connect jdbc:oracle:thin:@10.124.0.70:1523/BI
Warning: /usr/hdp/3.0.1.0-187/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/06/25 14:32:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.1.0-187
21/06/25 14:32:10 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
21/06/25 14:32:10 INFO manager.SqlManager: Using default fetchSize of 1000
21/06/25 14:32:10 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:287)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:702)
at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
at org.apache.sqoop.Sqoop.run(Sqoop.java:150)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:186)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:240)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:249)
at org.apache.sqoop.Sqoop.main(Sqoop.java:258)
Thanks,
Roshan
06-25-2021 06:53 AM
Which method do you think would be most appropriate? I was thinking of using Hive to read the table / run the query in real time and save it as Parquet, then load from Parquet into Kudu; or using Sqoop to read the Oracle tables into HDFS and then load from HDFS into Kudu.
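For the second half of either option, the Parquet-staging-to-Kudu step could look roughly like the sketch below (not from the original thread; all table and column names are made up, and the Oracle extract is assumed to already land in the Parquet table via Sqoop or Hive):

```sql
-- Hypothetical staging table populated from the Oracle extract.
CREATE TABLE staging_accounts (id BIGINT, name STRING)
  STORED AS PARQUET;

-- Hypothetical Kudu target table, then a bulk copy from the staging table.
CREATE TABLE accounts_kudu (id BIGINT PRIMARY KEY, name STRING)
  PARTITION BY HASH (id) PARTITIONS 4
  STORED AS KUDU;

INSERT INTO accounts_kudu SELECT id, name FROM staging_accounts;
```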
06-25-2021 04:35 AM
Hello Team, can you please advise whether Kudu has an equivalent of the Oracle function used below to extract the address and location from a table with an XML data type?
select TMP_ACCOUNT_CODE_N,
       decode(EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@STREET_DESC'), '.', null,
              EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@STREET_DESC'))
       || ' ' || EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@SUB_LOCALITY_DESC')
       || ' ' || EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@CITY_DESC') New_installation_address
from tmp_address_xml@cbsstandby
where address_type_n = 4
Regards, Roshan
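Impala has no XPath functions, but a rough Hive-side equivalent is sketched below (an illustration only, not from the thread): it assumes the table is also readable from Hive, that address_x is stored as a STRING, and that the Hive version in use provides nullif, which stands in for the Oracle decode:

```sql
-- Hypothetical Hive query approximating the Oracle EXTRACTVALUE logic above.
SELECT tmp_account_code_n,
       concat_ws(' ',
         nullif(xpath_string(address_x, '//ADDRESS_DTLS/@STREET_DESC'), '.'),
         xpath_string(address_x, '//ADDRESS_DTLS/@SUB_LOCALITY_DESC'),
         xpath_string(address_x, '//ADDRESS_DTLS/@CITY_DESC')) AS new_installation_address
FROM tmp_address_xml
WHERE address_type_n = 4;
```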
Labels:
- Apache Kudu