Member since: 09-19-2020
Posts: 46
Kudos Received: 1
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3783 | 07-13-2021 12:09 AM |
06-24-2021
07:05 AM
Hello Team,

I am working through the tutorial on RDDs and am having some difficulty understanding some of the commands. Can you please advise what steps 3-8 do?

3. Encode the schema in a string
val schemaString = "name age"

4. Generate the schema based on the string of schema
val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
val schema = StructType(fields)

5. Convert records of the RDD (people) to Rows
val rowRDD = peopleRDD.map(_.split(",")).map(attributes => Row(attributes(0), attributes(1).trim))

6. Apply the schema to the RDD
val peopleDF = spark.createDataFrame(rowRDD, schema)

6. Create a temporary view using the DataFrame
peopleDF.createOrReplaceTempView("people")

7. SQL can be run over a temporary view created using DataFrames
val results = spark.sql("SELECT name FROM people")

8. The results of SQL queries are DataFrames and support all the normal RDD operations. The columns of a row in the result can be accessed by field index or by field name
results.map(attributes => "Name: " + attributes(0)).show()

https://www.cloudera.com/tutorials/dataframe-and-dataset-examples-in-spark-repl.html (Programmatically Specifying Schema)

What does the code below do?
val ds = Seq(1, 2, 3).toDS()
val ds = Seq(Person("Andy", 32)).toDS()

The DataSet API section is clear. If we need to map a JSON file to a class, we use as(class name). So to map a file to a class we use ".as[Classname]"? And what does this command do?
val ds = Seq(1, 2, 3).toDS()

Thanks,
Roshan
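For reference, the steps quoted above can be put together as one standalone program. This is only a minimal sketch under stated assumptions: it runs Spark in local mode, it builds `peopleRDD` from a few in-memory sample lines rather than the tutorial's input file, and the object name `ProgrammaticSchemaSketch` and helper `selectNames` are hypothetical names introduced for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object ProgrammaticSchemaSketch {
  // Hypothetical helper: runs steps 3-8 on in-memory lines instead of a file.
  def selectNames(spark: SparkSession, lines: Seq[String]): Array[String] = {
    import spark.implicits._ // supplies the String encoder needed by results.map

    // Assumption: peopleRDD is an RDD[String] whose lines look like "name, age".
    val peopleRDD = spark.sparkContext.parallelize(lines)

    // Step 3: the column names, written as one space-separated string.
    val schemaString = "name age"

    // Step 4: turn each name into a nullable string column, then wrap the
    // columns in a StructType, which is Spark's description of a row layout.
    val fields = schemaString.split(" ")
      .map(fieldName => StructField(fieldName, StringType, nullable = true))
    val schema = StructType(fields)

    // Step 5: split each text line on "," and pack the two pieces into a Row.
    val rowRDD = peopleRDD
      .map(_.split(","))
      .map(attributes => Row(attributes(0), attributes(1).trim))

    // Step 6: attach the schema to the rows, which yields a DataFrame.
    val peopleDF = spark.createDataFrame(rowRDD, schema)

    // Register the DataFrame under a name that SQL queries can refer to.
    peopleDF.createOrReplaceTempView("people")

    // Step 7: query the view; the result is again a DataFrame.
    val results = spark.sql("SELECT name FROM people")

    // Step 8: read the first column of each result row by index.
    results.map(attributes => "Name: " + attributes(0)).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("programmatic-schema-sketch")
      .master("local[*]") // assumption: local mode, no cluster required
      .getOrCreate()

    // Hypothetical sample data standing in for the tutorial's input file.
    selectNames(spark, Seq("Michael, 29", "Andy, 30", "Justin, 19")).foreach(println)

    spark.stop()
  }
}
```

Because every field is declared as `StringType`, the `age` column holds text in this sketch; a numeric type would have to be declared in the schema and the value converted in step 5.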
Labels:
- Apache Spark
06-24-2021
06:43 AM
Hi @RangaReddy, thanks a lot for sharing the link. It will help me a lot.

Can you please advise why we have to include df (the DataFrame name) before each column?
df.select(df("name"), df("age") + 1).show()

I noticed that in groupBy() there is no df. I would be grateful if you could clarify this.

Thanks,
Roshan
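To make the question concrete, here is a minimal sketch of the different ways a column can be referenced. It assumes Spark in local mode and a few hypothetical sample rows; the object and helper names are made up for illustration. The point it tries to show is that `df("age")` returns a `Column` object, which is needed for an expression such as `+ 1`, whereas `groupBy` also has an overload that accepts plain column names as strings. In plain Scala, `"age" + 1` would just be the string `"age1"`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnReferenceSketch {
  // df("age") returns a Column object, so "+ 1" builds a column expression.
  def viaDataFrameApply(df: DataFrame): DataFrame =
    df.select(df("name"), df("age") + 1)

  // col("age") builds the same kind of Column without naming the DataFrame.
  def viaColFunction(df: DataFrame): DataFrame =
    df.select(col("name"), col("age") + 1)

  // groupBy has an overload taking plain column names as Strings, which is
  // enough when no expression is applied to the column.
  def countByAge(df: DataFrame): DataFrame =
    df.groupBy("age").count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-reference-sketch")
      .master("local[*]") // assumption: local mode
      .getOrCreate()
    import spark.implicits._ // enables toDF and the $"name" syntax

    // Hypothetical sample rows.
    val df = Seq(("Andy", 30), ("Justin", 19), ("Sam", 19)).toDF("name", "age")

    viaDataFrameApply(df).show()
    viaColFunction(df).show()
    df.select($"name", $"age" + 1).show() // same idea with the $ interpolator
    df.select("name", "age").show()       // String overload: names only, no expressions
    countByAge(df).show()
    df.groupBy(df("age")).count().show()  // groupBy also accepts Column objects

    spark.stop()
  }
}
```

So the `df(...)` prefix is not required by `select` itself; it is one of several ways to obtain a `Column`, and it is mainly useful when two DataFrames in a join share a column name.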
06-23-2021
08:04 AM
I have been working with Oracle databases. In what way are DataFrames and DataSets similar to Oracle? Are they similar to views?
06-23-2021
07:56 AM
Hello Everyone,

Can you please tell me the difference between DataFrames and DataSets (with examples)? The explanation here is still unclear to me:
http://spark.apache.org/docs/2.4.0/sql-programming-guide.html

Thanks,
Roshan
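As one possible illustration of the difference, the sketch below builds both kinds of object from the same data. It assumes Spark in local mode; the `Person` case class mirrors the one used in the Spark programming guide, while the object and helper names are hypothetical. In the Scala API a `DataFrame` is an alias for `Dataset[Row]`, so its columns are looked up by name at runtime, whereas a `Dataset[Person]` exposes fields as ordinary Scala members that the compiler checks.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Defined at top level so that Spark can derive an encoder for it.
case class Person(name: String, age: Long)

object DataFrameVsDatasetSketch {
  // Seq(1, 2, 3).toDS() turns a local Scala collection into a Dataset[Int].
  def numbersPlusOne(spark: SparkSession): Array[Int] = {
    import spark.implicits._
    val ds: Dataset[Int] = Seq(1, 2, 3).toDS()
    ds.map(_ + 1).collect()
  }

  // Typed access: p.name is a normal field, checked at compile time.
  def typedNames(people: Dataset[Person], spark: SparkSession): Array[String] = {
    import spark.implicits._
    people.map(p => p.name).collect()
  }

  // Untyped access: the column is found by its name string at runtime.
  def untypedNames(people: DataFrame): Array[String] =
    people.select("name").collect().map(row => row.getString(0))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("dataframe-vs-dataset-sketch")
      .master("local[*]") // assumption: local mode
      .getOrCreate()
    import spark.implicits._

    // A Dataset of case-class instances.
    val peopleDS: Dataset[Person] = Seq(Person("Andy", 32)).toDS()

    // Dropping the type gives a DataFrame (a Dataset[Row]) ...
    val peopleDF: DataFrame = peopleDS.toDF()

    // ... and .as[Person] maps the rows back onto the class by column name.
    val typedAgain: Dataset[Person] = peopleDF.as[Person]

    println(numbersPlusOne(spark).mkString(", "))
    println(typedNames(typedAgain, spark).mkString(", "))
    println(untypedNames(peopleDF).mkString(", "))

    spark.stop()
  }
}
```

The same `.as[Person]` call is what maps a DataFrame read from a JSON file onto a class, provided the column names and types match the case class fields.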
Labels:
- Apache Hadoop
- Apache Impala
- Apache Spark
06-22-2021
02:54 AM
Hi, can you please advise on the steps to build the workflows on Hue? Thanks, Roshan
06-15-2021
05:30 AM
Hello Alex,

I have checked the port.

netstat -tulpn | grep impalad
tcp 0 0 <IP>:27000 0.0.0.0:* LISTEN 23290/impalad
tcp 0 0 0.0.0.0:25000 0.0.0.0:* LISTEN 23290/impalad
tcp6 0 0 :::22000 :::* LISTEN 23290/impalad
tcp6 0 0 :::23000 :::* LISTEN 23290/impalad

The port is 23290.

I checked on Cloudera whether there is SSL for Impala. Only Ranger configured.

[root@rb-hadoop-02 admin]# impala-shell -i <IP>:23290 --user=hive --ssl
Starting Impala Shell without Kerberos authentication
SSL is enabled. Impala server certificates will NOT be verified (set --ca_cert to change)
No handlers could be found for logger "thrift.transport.TSSLSocket"
Error connecting: TTransportException, Could not connect to <IP>:23290: [Errno 111] Connection refused
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v3.4.0-SNAPSHOT (134517e) built on Sat Dec 12 11:15:02 UTC 2020)

After running a query, type SUMMARY to see a summary of where time was spent.
***********************************************************************************
[Not connected] > exit;

Regards,
Roshan
06-13-2021
07:19 PM
Hi, thanks for the update. Please find the details below:

impalad version 3.4.0-SNAPSHOT RELEASE (build 134517e42b7b6085e758195465f956f431e0e575)
Built on Sat Dec 12 11:15:02 UTC 2020
Version: Cloudera Enterprise 7.1.3 (#4999720 built by jenkins on 20200805-1701 git: fa596184790377f07ba80e9cd4da8b875237939c)
Java VM Name: OpenJDK 64-Bit Server VM
Java Version: 11.0.10

Thanks,
Roshan