Member since 10-14-2017 | 5 Posts | 0 Kudos Received | 0 Solutions
03-04-2019 01:00 PM
This is a good answer that lists other aspects to consider when choosing Tez over MapReduce (MR) for Hive: http://community.hortonworks.com/answers/83488/view.html
05-23-2018 06:59 PM
I couldn't come up with anything better than manually scanning the DataFrame to check whether all values in a column are NULL. Something like this:

import org.apache.spark.sql.DataFrame

// Returns the names of all empty (all-NULL) columns of the DataFrame
def getEmptyColNames(df: DataFrame): Seq[String] = {
  // Cache the input, because it is scanned once per column below
  df.cache()
  val colNames: Seq[String] = df.columns.toSeq
  colNames.filter { (colName: String) =>
    df.filter(df(colName).isNotNull).count() == 0
  }
}

// Drops all empty columns of the DataFrame
def dropEmptyCols(df: DataFrame): DataFrame = {
  val emptyColNames: Seq[String] = getEmptyColNames(df)
  if (emptyColNames.isEmpty) df
  else df.drop(emptyColNames: _*)
}

// Usage, where dfOriginal is the DataFrame to clean up:
val dfNonEmptyCols: DataFrame = dropEmptyCols(dfOriginal)

@Junfeng Chen, were you able to find a more efficient or smarter way?
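A possibly faster variant, as a sketch assuming Spark 2.x or later and column names that need no backtick quoting (no dots, etc.): the aggregate count(col) skips NULLs, so the non-null counts of all columns can be computed in a single aggregation, and any column whose count is 0 is all-NULL. This runs one Spark job instead of one count() per column. The helper names getEmptyColNamesSinglePass and dropEmptyColsSinglePass are just illustrative.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count}

// Returns the names of all columns of df that contain only NULLs, using one
// aggregation job instead of one count() per column.
// Assumes column names need no backtick quoting (no dots, etc.).
def getEmptyColNamesSinglePass(df: DataFrame): Seq[String] = {
  val colNames: Seq[String] = df.columns.toSeq
  if (colNames.isEmpty) Seq.empty
  else {
    // count(col(c)) counts only the non-NULL values of column c
    val nonNullCounts = df.select(colNames.map(c => count(col(c))): _*)
    // A global aggregation always yields exactly one row
    val row = nonNullCounts.head()
    colNames.zipWithIndex.collect { case (c, i) if row.getLong(i) == 0L => c }
  }
}

// Drops all empty columns of df (same behaviour as dropEmptyCols in the post)
def dropEmptyColsSinglePass(df: DataFrame): DataFrame = {
  val emptyColNames = getEmptyColNamesSinglePass(df)
  if (emptyColNames.isEmpty) df else df.drop(emptyColNames: _*)
}
```

Like the original, a DataFrame with no rows reports every column as empty, since all counts are 0. Because the input is read only once, the explicit df.cache() matters less here.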
10-23-2017 04:18 AM
Adding these configuration settings to hive-site.xml, as mentioned in HIVE-2006 (https://issues.apache.org/jira/browse/HIVE-2006), also didn't work:
1. hive.server.read.socket.timeout=1000
2. hive.server.read.socket.timeout=1000
10-23-2017 01:47 AM
I can't use the Cloudera interface because I'm running the Hive server inside a Docker container and connecting to it from outside the container through Python (PyHive v0.5, Python v2.7.13).