Created 06-23-2021 07:56 AM
Hello Everyone,
Can you please tell me the difference between DataFrames and Datasets (with examples)?
The explanation in the programming guide is still unclear to me:
http://spark.apache.org/docs/2.4.0/sql-programming-guide.html
Thanks,
Roshan
Created 06-23-2021 08:04 AM
I have been working with Oracle databases. In what way are DataFrames and Datasets similar to anything in Oracle? Are they similar to views?
Created 06-24-2021 04:25 AM
Hi, you can refer to the following tutorial:
https://www.cloudera.com/tutorials/dataframe-and-dataset-examples-in-spark-repl.html
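As a rough illustration of the difference asked about (a sketch of my own, not taken from the linked tutorial): in Scala a DataFrame is an alias for Dataset[Row], so its columns are looked up by name when the query is analyzed at runtime, while a Dataset[T] carries a concrete type T and its fields are checked by the compiler. The Person case class and the sample rows below are hypothetical, and the snippet assumes it is pasted into spark-shell (or run as a Scala script with Spark on the classpath).

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Long)

// In spark-shell this returns the already running session.
val spark = SparkSession.builder()
  .appName("df-vs-ds-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._ // encoders, toDS/toDF and the $"..." syntax

// Typed Dataset: every element is a Person.
val ds: Dataset[Person] = Seq(Person("Alice", 34), Person("Bob", 19)).toDS()

// Untyped DataFrame: the same data viewed as generic rows.
val df: DataFrame = ds.toDF()

// Dataset: filter with a typed lambda; a typo such as p.agee fails at compile time.
val adultsDs: Dataset[Person] = ds.filter(p => p.age >= 21)

// DataFrame: filter by column name; a typo such as $"agee" only fails
// when Spark analyzes the query at runtime.
val adultsDf: DataFrame = df.filter($"age" >= 21)

// A DataFrame can be turned back into a typed Dataset when the schema matches.
val typedAgain: Dataset[Person] = df.as[Person]

adultsDs.show()
adultsDf.show()
```

Note that the typed Dataset API exists only in Scala and Java; in PySpark you work with DataFrames only.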
Created 06-24-2021 06:43 AM
Hi @RangaReddy
Thanks a lot for sharing the link. It will help me a lot.
Can you please advise why we have to include df (data frame name) before each column?
df.select(df("name"), df("age") + 1).show()
I noticed in groupBy() there is no df.
Grateful if you can clarify this.
Thanks,
Roshan
Created 06-24-2021 08:21 AM
Hi @roshanbi
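Apache Spark offers several equivalent ways to refer to a column in select(), so prefixing the column with the DataFrame name is optional.
Scala Spark:
// Scala
import org.apache.spark.sql.functions.{expr, col, column}
import spark.implicits._ // needed for the 'ColumnName and $"ColumnName" forms; assumes a SparkSession named spark
// 6 ways to select a column
df.select(df.col("ColumnName"))
df.select(col("ColumnName"))
df.select(column("ColumnName"))
df.select('ColumnName) // Symbol syntax; deprecated in Scala 2.13
df.select($"ColumnName")
df.select(expr("ColumnName"))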
PySpark:
# Python
from pyspark.sql.functions import expr, col, column
# 4 ways to select a column
df.select(df.ColumnName)
df.select(col("ColumnName"))
df.select(column("ColumnName"))
df.select(expr("ColumnName"))
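To tie this back to the df("name") question, here is a small sketch under a few assumptions: the sample rows and value names are made up, and it is meant to be pasted into spark-shell. It shows that df("age"), col("age") and $"age" can be used interchangeably in select(), and that groupBy() accepts either a plain column name or a Column, which is why the tutorial's groupBy() call can do without the df prefix.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// In spark-shell this returns the already running session.
val spark = SparkSession.builder()
  .appName("column-refs-sketch")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._ // toDF and the $"..." syntax

// Hypothetical sample data, for illustration only.
val df = Seq(("Alice", 34), ("Bob", 19), ("Carol", 34)).toDF("name", "age")

// Three equivalent ways to build the same projection.
// A Column object is required here because "age + 1" is an expression.
val viaDf     = df.select(df("name"), df("age") + 1)
val viaCol    = df.select(col("name"), col("age") + 1)
val viaDollar = df.select($"name", $"age" + 1)

// When no expression is involved, select() also accepts plain column names.
val namesAndAges = df.select("name", "age")

// groupBy() likewise accepts plain names or Column objects.
val byAgeName   = df.groupBy("age").count()
val byAgeColumn = df.groupBy(df("age")).count()

viaDf.show()
byAgeName.show()
```

The df(...) form is mostly useful when two DataFrames in a join share a column name and you need to say which one you mean.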
Created 06-24-2021 08:23 AM