Member since
09-19-2020
46
Posts
1
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2295 | 07-13-2021 12:09 AM
06-26-2021
04:05 AM
Thanks for the update.

scala> val myRDD = spark.read.textFile("/devsh_loudacre/frostroad.txt")
myRDD: org.apache.spark.sql.Dataset[String] = [value: string]

Why is sc.parallelize not working for the above?

scala> val myRDD1 = sc.parallelize(myRDD)
<console>:26: error: type mismatch;
 found   : org.apache.spark.sql.Dataset[String]
 required: Seq[?]
Error occurred in an application involving default arguments.
       val myRDD1 = sc.parallelize(myRDD)

Does the above mean a Dataset has been created? What is the difference between the above and the below?

val myRDD2 = sc.textFile("/devsh_loudacre/frostroad.txt")

Can I add the .parallelize function to the above command?

Thanks,
Roshan
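For what it's worth, the type mismatch is expected: sc.parallelize takes a local Scala collection (a Seq on the driver), while spark.read.textFile already returns a distributed Dataset[String]. A spark-shell sketch (the HDFS path is taken from the post above):

```scala
// 1) Dataset[String] via the Spark SQL API:
val myDS = spark.read.textFile("/devsh_loudacre/frostroad.txt")

// 2) RDD[String] via the core API -- already distributed, no parallelize needed:
val myRDD2 = sc.textFile("/devsh_loudacre/frostroad.txt")

// A Dataset can be converted to an RDD directly:
val myRDD1 = myDS.rdd

// sc.parallelize is only for local, in-memory collections:
val localRDD = sc.parallelize(Seq("one line", "another line"))
```

Since both textFile variants already return distributed data, there is nothing to parallelize; parallelize is the entry point only for data that starts out on the driver.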
06-25-2021
08:41 AM
I managed to fix it, but I am now getting the error below because of XML types:

[root@sandbox-hdp lib]# sqoop job --exec myjob7
Warning: /usr/hdp/3.0.1.0-187/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/06/25 15:32:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.1.0-187
21/06/25 15:32:55 INFO manager.SqlManager: Using default fetchSize of 1000
Enter password:
21/06/25 15:33:09 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
21/06/25 15:33:09 INFO manager.SqlManager: Using default fetchSize of 1000
21/06/25 15:33:09 INFO tool.CodeGenTool: Beginning code generation
21/06/25 15:33:10 INFO manager.OracleManager: Time zone has been set to GMT
21/06/25 15:33:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM cb_account_master t WHERE 1=0
Exception in thread "main" java.lang.NoClassDefFoundError: oracle/xdb/XMLType
        at oracle.jdbc.oracore.OracleTypeADT.applyTDSpatches(OracleTypeADT.java:1081)
        at oracle.jdbc.oracore.OracleTypeADT.parseTDSrec(OracleTypeADT.java:1002)
        at oracle.jdbc.oracore.OracleTypeADT.parseTDS(OracleTypeADT.java:936)
        at oracle.jdbc.oracore.OracleTypeADT.init(OracleTypeADT.java:489)
        at oracle.jdbc.oracore.OracleTypeADT.init(OracleTypeADT.java:470)
        at oracle.sql.TypeDescriptor.getTypeDescriptor(TypeDescriptor.java:981)
        at oracle.jdbc.driver.NamedTypeAccessor.otypeFromName(NamedTypeAccessor.java:78)
        at oracle.jdbc.driver.TypeAccessor.initMetadata(TypeAccessor.java:71)
        at oracle.jdbc.driver.T4CPreparedStatement.executeForDescribe(T4CPreparedStatement.java:833)
        at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:897)
        at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1034)
        at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3820)
        at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3867)
        at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1502)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:777)
        at org.apache.sqoop.manager.SqlManager.execute(SqlManager.java:786)
        at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(SqlManager.java:289)
        at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(SqlManager.java:260)
        at org.apache.sqoop.manager.SqlManager.getColumnTypes(SqlManager.java:246)
        at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:328)
        at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1879)
        at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1672)
        at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:106)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:516)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:656)
        at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:248)
        at org.apache.sqoop.tool.JobTool.run(JobTool.java:303)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:150)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:186)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:240)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:249)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:258)
Caused by: java.lang.ClassNotFoundException: oracle.xdb.XMLType
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 33 more
[root@sandbox-hdp lib]#

Kindly advise.

Thanks,
Roshan
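In case it helps the next reader: oracle.xdb.XMLType lives in Oracle's XDB support jar, which ships separately from the ojdbc driver jar. A possible fix, assuming the HDP 3.0.1 sandbox path shown in the log above; the jar names come from the Oracle 12c client download and vary between Oracle versions, so treat them as placeholders:

```shell
# Copy the XML-type support jars next to the JDBC driver on Sqoop's classpath.
cp xdb6.jar /usr/hdp/3.0.1.0-187/sqoop/lib/
cp xmlparserv2.jar /usr/hdp/3.0.1.0-187/sqoop/lib/   # sometimes also required
```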
06-25-2021
07:47 AM
Hi,
I am using the Cloudera Hortonworks Sandbox (HDP). Can anyone help me with Sqoop? I am trying to make an Oracle JDBC connection.
[root@sandbox-hdp lib]# sqoop list-databases --connect jdbc:oracle:thin:@10.124.0.70:1523/BI
Warning: /usr/hdp/3.0.1.0-187/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hadoop/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/3.0.1.0-187/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
21/06/25 14:32:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7.3.0.1.0-187
21/06/25 14:32:10 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
21/06/25 14:32:10 INFO manager.SqlManager: Using default fetchSize of 1000
21/06/25 14:32:10 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
java.lang.RuntimeException: Could not load db driver class: oracle.jdbc.OracleDriver
at org.apache.sqoop.manager.OracleManager.makeConnection(OracleManager.java:287)
at org.apache.sqoop.manager.GenericJdbcManager.getConnection(GenericJdbcManager.java:59)
at org.apache.sqoop.manager.OracleManager.listDatabases(OracleManager.java:702)
at org.apache.sqoop.tool.ListDatabasesTool.run(ListDatabasesTool.java:49)
at org.apache.sqoop.Sqoop.run(Sqoop.java:150)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:186)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:240)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:249)
at org.apache.sqoop.Sqoop.main(Sqoop.java:258)
Thanks,
Roshan
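The exception above means Sqoop cannot find the Oracle JDBC driver class on its classpath. A possible fix, assuming the HDP 3.0.1 sandbox paths from the log; the driver jar itself must be downloaded from Oracle (ojdbc8.jar is the Oracle 12.2 name, other versions differ):

```shell
# Copy the Oracle JDBC driver into Sqoop's lib directory, then rerun the command.
cp ojdbc8.jar /usr/hdp/3.0.1.0-187/sqoop/lib/
```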
06-25-2021
06:53 AM
Which method do you think would be most appropriate? I was thinking of using Hive to read the table/run the query in real time and save it as Parquet, then load from Parquet into Kudu. Or using Sqoop to read the Oracle tables into HDFS, then from HDFS into Kudu.
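For the Sqoop route, a minimal import sketch; the username, table, and target directory are placeholders, the connect string is the one used elsewhere in this thread, and --as-parquetfile writes Parquet directly so a separate Hive conversion step can be skipped:

```shell
# Sketch: pull one Oracle table into HDFS as Parquet files.
sqoop import \
  --connect jdbc:oracle:thin:@10.124.0.70:1523/BI \
  --username YOUR_USER -P \
  --table YOUR_TABLE \
  --as-parquetfile \
  --target-dir /user/cloudera/YOUR_TABLE
```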
06-25-2021
04:35 AM
Hello Team, can you please advise whether Kudu has an equivalent of the function below, used to extract address and location from a table with an XML data type?

select TMP_ACCOUNT_CODE_N,
       decode(EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@STREET_DESC'), '.', null,
              EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@STREET_DESC'))
       || ' ' || EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@SUB_LOCALITY_DESC')
       || ' ' || EXTRACTVALUE(address_x, '//ADDRESS_DTLS/@CITY_DESC') New_installation_address
  from tmp_address_xml@cbsstandby
 where address_type_n = 4

Regards,
Roshan
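Impala on Kudu has no XMLType column type and no EXTRACTVALUE, so one common option is to flatten the XML attributes *before* loading the rows into Kudu, for example in the ingest job. A Python sketch of that extraction (the attribute names come from the query above; the sample XML value is made up for illustration):

```python
import xml.etree.ElementTree as ET

def address_from_xml(address_x):
    """Rebuild the concatenated address the Oracle query produces."""
    root = ET.fromstring(address_x)
    dtls = root if root.tag == "ADDRESS_DTLS" else root.find(".//ADDRESS_DTLS")
    street = dtls.get("STREET_DESC")
    if street == ".":          # mirrors the decode(..., '.', null, ...) logic
        street = None
    parts = [street, dtls.get("SUB_LOCALITY_DESC"), dtls.get("CITY_DESC")]
    return " ".join(p for p in parts if p)

# Hypothetical sample value, only to show the shape of the input:
sample = '<ADDRESS_DTLS STREET_DESC="Main Rd" SUB_LOCALITY_DESC="Quatre Bornes" CITY_DESC="Port Louis"/>'
print(address_from_xml(sample))
```

Once the address is a plain STRING column, it can be stored and queried in Kudu like any other column.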
Labels:
- Apache Kudu
06-24-2021
07:05 AM
Hello Team, I am working through the tutorial on RDDs. I am having some difficulties understanding some commands. Can you please advise what the steps below do?

3. Encode the schema in a string
val schemaString = "name age"

4. Generate the schema based on the string of schema
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

5. Convert records of the RDD (people) to Rows
val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).trim))

6. Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

7. Create a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")

8. SQL can be run over a temporary view created using DataFrames
val results = spark.sql("SELECT name FROM people")

9. The results of SQL queries are DataFrames and support all the normal RDD operations. The columns of a row in the result can be accessed by field index or by field name
results.map(attributes => "Name: " + attributes(0)).show()

https://www.cloudera.com/tutorials/dataframe-and-dataset-examples-in-spark-repl.html (Programmatically Specifying Schema)

What does the code below do?

val ds = Seq(1, 2, 3).toDS()
val ds = Seq(Person("Andy", 32)).toDS()

The Dataset API section is clear. If we need to map a JSON file to a class we use the as(class name). So to map a file to a class we use ".as[ClassName]"?

Thanks,
Roshan
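A non-Spark sketch of what the schema/row steps do, in plain Python so the logic is visible without a cluster: build a schema from a space-separated string, then turn "name,age" text lines into records. The sample lines are made up; they stand in for the contents of peopleRDD:

```python
schema_string = "name age"                  # step 3: the schema encoded in a string
fields = schema_string.split(" ")           # step 4: one field per word
lines = ["Michael, 29", "Andy, 30"]         # stand-in for peopleRDD's text lines
# step 5: split each line into trimmed attribute tuples (Spark wraps these in Row):
rows = [tuple(a.strip() for a in line.split(",")) for line in lines]
# step 6, roughly: pair each value with its field name, i.e. apply the schema:
records = [dict(zip(fields, row)) for row in rows]
print(records)
```

As for Seq(1, 2, 3).toDS(): it turns a local Scala collection into a Dataset[Int], i.e. a typed, distributed collection, the same way createDataFrame turns rowRDD plus a schema into a DataFrame.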
Labels:
- Apache Spark
06-24-2021
06:43 AM
Hi @RangaReddy, thanks a lot for sharing the link. It will help me a lot. Can you please advise why we have to include df (the DataFrame name) before each column?

df.select(df("name"), df("age") + 1).show()

I noticed that in groupBy() there is no df. Grateful if you can clarify this.

Thanks,
Roshan
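A spark-shell sketch of the equivalent forms, assuming a DataFrame df with name and age columns: df("name") asks that specific DataFrame to resolve the column, which matters mainly when two joined DataFrames share a column name; simple string names work wherever Spark accepts them, which is why groupBy("age") needs no df prefix.

```scala
import org.apache.spark.sql.functions.col

df.select(df("name"), df("age") + 1).show()
df.select($"name", $"age" + 1).show()          // shorthand via spark.implicits._
df.select(col("name"), col("age") + 1).show()  // unattached column reference
df.groupBy("age").count().show()               // plain string names are accepted here
```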
06-24-2021
03:08 AM
Hi,
can you please advise why the source path is not recognized on Windows?

>>> spark-submit /D:/Spark/devsh/exercises/yarn/wordcount.py /devsh_loudacre/kb
  File "<stdin>", line 1
    spark-submit /D:/Spark/devsh/exercises/yarn/wordcount.py /devsh_loudacre/kb
SyntaxError: invalid syntax
Thanks,
Roshan
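The SyntaxError above is Python's, not Spark's: spark-submit was typed inside the Python interpreter (the >>> prompt), where it is not valid Python. spark-submit is an operating-system command, so it runs from cmd/PowerShell, and the local script path is written Windows-style without the leading slash (paths kept from the post above):

```shell
REM Run from a Windows command prompt, not from the >>> Python shell:
spark-submit D:\Spark\devsh\exercises\yarn\wordcount.py /devsh_loudacre/kb
```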
Labels:
- Apache Spark
- Apache YARN
06-23-2021
08:04 AM
I have been working with Oracle databases; in what way are DataFrames and Datasets similar to Oracle concepts? Are they similar to views?
06-23-2021
07:56 AM
Hello Everyone,

can you please tell me the difference between DataFrames and Datasets (with examples)? The explanation at http://spark.apache.org/docs/2.4.0/sql-programming-guide.html is still unclear to me.

Thanks,
Roshan
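Not an authoritative answer, but a plain-Python analogy may help: a DataFrame is Dataset[Row], i.e. its rows are untyped containers addressed by column name and checked only at runtime, while a Dataset[Person] is typed and checked at compile time (typed Datasets exist only in the Scala/Java API). In Python terms:

```python
from dataclasses import dataclass

# DataFrame-style row: untyped, keyed by column name, errors surface at runtime.
df_row = {"name": "Andy", "age": 32}
print(df_row["name"])

# Dataset-style row: a typed record (the analogue of a Scala case class),
# so a misspelled field fails immediately rather than deep inside a job.
@dataclass
class Person:
    name: str
    age: int

ds_row = Person("Andy", 32)
print(ds_row.name)
```

In Spark itself the two sides are df.select("name") versus ds.map(p => p.name): the first is resolved against column names at runtime, the second is checked by the Scala compiler.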
Labels:
- Apache Hadoop
- Apache Impala
- Apache Spark