DataFrames and Datasets

Contributor

Hello everyone,

Can you please explain the difference between DataFrames and Datasets (with examples)?

The explanation in the documentation is still unclear to me:

http://spark.apache.org/docs/2.4.0/sql-programming-guide.html

Thanks,

Roshan
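For reference, the distinction asked about here can be sketched as follows, assuming Spark 2.4 with Scala 2.12 and a local SparkSession. In Scala a DataFrame is simply a Dataset[Row]: columns are referenced by name and checked only when Spark analyzes the query at runtime. A Dataset[T] is bound to a JVM type (here a hypothetical case class Person), so lambdas such as the one passed to map work with real fields that the compiler checks. The names Person, DataFrameVsDataset, birthdayDf and birthdayDs are made up for this illustration. PySpark has no typed Dataset API, which is why Python examples only show DataFrames.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical record type used only for this illustration.
case class Person(name: String, age: Long)

object DataFrameVsDataset {
  // Untyped API: a DataFrame is a Dataset[Row]. Columns are looked up by name
  // when Spark analyzes the query, so a typo such as col("agee") compiles
  // but fails at runtime with an AnalysisException.
  def birthdayDf(df: DataFrame): DataFrame =
    df.select(col("name"), (col("age") + 1).as("age"))

  // Typed API: a Dataset[Person] exposes the case-class fields to the compiler,
  // so a typo such as p.agee is rejected at compile time.
  def birthdayDs(ds: Dataset[Person]): Dataset[Person] = {
    val spark = ds.sparkSession
    import spark.implicits._ // provides the Encoder[Person] that map needs
    ds.map(p => p.copy(age = p.age + 1))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("df-vs-ds")
      .master("local[*]") // local mode, assumed for trying this out
      .getOrCreate()
    import spark.implicits._

    val df: DataFrame = Seq(Person("Andy", 30), Person("Justin", 19)).toDF()
    val ds: Dataset[Person] = df.as[Person] // same rows, now typed as Person

    birthdayDf(df).show() // rows are generic Row objects
    birthdayDs(ds).show() // rows are Person objects
    spark.stop()
  }
}
```

One trade-off, offered as general guidance: the lambda passed to map is opaque to Spark's Catalyst optimizer, while column expressions such as col("age") + 1 can be optimized, so DataFrames are often preferred when compile-time type checks are not needed.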

2 ACCEPTED SOLUTIONS

Master Collaborator

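Hi @roshanbi

Apache Spark offers several ways to refer to a column when selecting it. Prefixing the column with the DataFrame, as in df("name"), is only one of them and is not required. Methods such as groupBy() also accept plain column names as strings, which is why no df appears there.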

Scala Spark:

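// Scala
import org.apache.spark.sql.functions.{expr, col, column}
import spark.implicits._ // enables $"..." and 'symbol; assumes a SparkSession named spark
// 6 ways to select a column (df is an existing DataFrame)
df.select(df.col("ColumnName"))  // same as df("ColumnName")
df.select(col("ColumnName"))
df.select(column("ColumnName"))
df.select('ColumnName)           // Scala symbol, converted to a Column by spark.implicits._
df.select($"ColumnName")         // string interpolator from spark.implicits._
df.select(expr("ColumnName"))    // parsed as a SQL expression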

PySpark:

# Python
from pyspark.sql.functions import expr, col, column
# 4 ways to select a column
df.select(df.ColumnName)
df.select(col("ColumnName"))
df.select(column("ColumnName"))
df.select(expr("ColumnName"))
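The answer above can be checked with a small sketch, assuming Spark 2.4 with Scala and a local SparkSession (ColumnReferences and its method names are hypothetical). It shows that df("age") and col("age") give the same result, so the df prefix is optional; the prefix is mainly useful when two DataFrames have a column with the same name, for example in a join condition such as people("id") === orders("id"). It also shows that groupBy is overloaded and accepts either column-name strings or Column objects, which is why groupBy("age") needs no df.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object ColumnReferences {
  // Column bound to this specific DataFrame via df.apply (same as df.col).
  def viaDfPrefix(df: DataFrame): DataFrame =
    df.select(df("name"), df("age") + 1)

  // Unbound Column, resolved by name against the DataFrame it is used on.
  def viaCol(df: DataFrame): DataFrame =
    df.select(col("name"), col("age") + 1)

  // groupBy(String, String*) overload: plain column names.
  def countByName(df: DataFrame): DataFrame =
    df.groupBy("age").count()

  // groupBy(Column*) overload: the same grouping expressed with a Column.
  def countByColumn(df: DataFrame): DataFrame =
    df.groupBy(col("age")).count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("column-references")
      .master("local[*]") // local mode, assumed for trying this out
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("Andy", 30), ("Ben", 30), ("Justin", 19)).toDF("name", "age")
    viaDfPrefix(df).show()
    viaCol(df).show()
    countByName(df).show()
    countByColumn(df).show()
    spark.stop()
  }
}
```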

 


5 REPLIES

Contributor

I have been working with Oracle databases. In what ways are DataFrames and Datasets similar to Oracle objects? Are they similar to views?
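A minimal sketch of the closest analogy, assuming Spark 2.4 with Scala and a local SparkSession (TempViewExample, adultsViaSql and adultsViaApi are hypothetical names). DataFrames and Datasets are evaluated lazily: transformations only build a query plan, much like a view definition, and data is read when an action such as show() or collect() runs. A DataFrame can also be registered under a name with createOrReplaceTempView and queried with SQL, somewhat like an Oracle view, except that the temporary view exists only for the lifetime of the SparkSession and is not stored in a database catalog.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object TempViewExample {
  // Registers the DataFrame under a name (roughly like CREATE VIEW in Oracle)
  // and queries it with SQL. The view is session-scoped and stores no data.
  def adultsViaSql(spark: SparkSession, people: DataFrame): DataFrame = {
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age >= 21")
  }

  // The same query written with the DataFrame API; nothing runs until an action.
  def adultsViaApi(people: DataFrame): DataFrame =
    people.filter(col("age") >= 21).select("name")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("temp-view")
      .master("local[*]") // local mode, assumed for trying this out
      .getOrCreate()
    import spark.implicits._

    val people = Seq(("Andy", 30), ("Justin", 19)).toDF("name", "age")
    adultsViaSql(spark, people).show() // action: the query is executed here
    adultsViaApi(people).show()
    spark.stop()
  }
}
```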

Master Collaborator

Contributor

Hi @RangaReddy 

Thanks a lot for sharing the link. It will help me a lot.

Can you please explain why we have to include df (the DataFrame name) before each column?

df.select(df("name"), df("age") + 1).show()

I noticed that there is no df prefix in groupBy().

I would be grateful if you could clarify this.

Thanks,

Roshan

Master Collaborator

Hi @roshanbi

If you are satisfied with my answer, please accept it as the solution.