DataFrames and Datasets

Contributor

Hello Everyone,

 

Can you please tell me the difference between DataFrames and Datasets (with examples)?

 

The explanation here is still unclear to me:

http://spark.apache.org/docs/2.4.0/sql-programming-guide.html

 

Thanks,

 

Roshan


5 REPLIES 5

Contributor

I have been working with Oracle databases. In what way are DataFrames and Datasets similar to anything in Oracle? Are they similar to views?

Rising Star

Contributor

Hi @RangaReddy 

Thanks a lot for sharing the link. It will help me a lot.

 

Can you please advise why we have to include df (data frame name) before each column?

df.select(df("name"), df("age") + 1).show()

I noticed that groupBy() is called without the df prefix.

 

Grateful if you can clarify this.

 

Thanks,

 

Roshan

Rising Star

Hi @roshanbi 

 

Apache Spark offers several methods to use when selecting a column.

 

Scala Spark:

// Scala
import org.apache.spark.sql.functions.{expr, col, column}
// Needed for the 'ColumnName and $"ColumnName" forms; assumes a SparkSession named spark
import spark.implicits._
// 6 ways to select a column
df.select(df.col("ColumnName"))
df.select(col("ColumnName"))
df.select(column("ColumnName"))
df.select('ColumnName)
df.select($"ColumnName")
df.select(expr("ColumnName"))

 

PySpark:

# Python
from pyspark.sql.functions import expr, col, column
# 4 ways to select a column
df.select(df.ColumnName)
df.select(col("ColumnName"))
df.select(column("ColumnName"))
df.select(expr("ColumnName"))

 

Rising Star

Hi @roshanbi 

 

If you are satisfied with my answer, please Accept as Solution.
