Member since: 02-02-2016
Posts: 583
Kudos Received: 518
Solutions: 98
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4191 | 09-16-2016 11:56 AM |
| | 1749 | 09-13-2016 08:47 PM |
| | 6943 | 09-06-2016 11:00 AM |
| | 4175 | 08-05-2016 11:51 AM |
| | 6247 | 08-03-2016 02:58 PM |
06-02-2016
10:09 AM
7 Kudos
@R Wys Can you please try this? Just remove the snapshot by name (you cannot delete the .snapshot directory itself): hdfs dfs -deleteSnapshot /path/ s201605-17-115857.294
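A minimal sketch of the workflow, assuming a hypothetical snapshottable directory /data and an assumed snapshot name (the poster's actual path and snapshot name are not shown here):

```shell
# On a real cluster you would first list the snapshots, then delete one by name:
#   hdfs dfs -ls /data/.snapshot
#   hdfs dfs -deleteSnapshot /data s20160517-115857.294
# (deleteSnapshot takes the directory and the snapshot NAME, not the .snapshot path)

# Sample `hdfs dfs -ls` output line (assumed) — extract just the snapshot name:
listing='drwxr-xr-x - hdfs hdfs 0 2016-05-17 11:58 /data/.snapshot/s20160517-115857.294'
snapshot=$(basename "$(echo "$listing" | awk '{print $NF}')")
echo "$snapshot"
```

The snapshot name is the last path component under .snapshot; that name is what `-deleteSnapshot` expects as its second argument.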
06-01-2016
05:30 PM
1 Kudo
@omar harb There are no separate workers when Spark runs on YARN; it runs as a normal YARN application, in the form of YARN containers. Please refer to the docs below for the Spark-on-YARN architecture. https://spark-summit.org/2014/wp-content/uploads/2014/07/Spark-on-YARN-A-Deep-Dive-Sandy-Ryza.pdf http://spark.apache.org/docs/latest/running-on-yarn.html
06-01-2016
03:52 PM
3 Kudos
@omar harb I think HDP ships Spark on YARN, not standalone mode, unless you installed it manually; that is why you won't find the Spark master UI inside the sandbox. Please start with the doc below. http://hortonworks.com/hadoop-tutorial/a-lap-around-apache-spark/
06-01-2016
12:45 PM
Can you please share your Ambari & HDP version?
06-01-2016
10:07 AM
5 Kudos
Hi @Roberto Sancho Can you test the performance after disabling vectorization and running the query on Tez? set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
set hive.execution.engine=tez;
Also, please refer to the doc below for Hive ORC optimization. https://streever.atlassian.net/wiki/display/HADOOP/Optimizing+ORC+Files+for+Query+Performance
06-01-2016
09:15 AM
@Roberto Sancho Great! :) Please accept the answer that helped you, to close this thread.
05-31-2016
07:56 PM
3 Kudos
@Timothy Spann Did you try this syntax? import org.apache.spark.sql.SaveMode
val Rddtb = objHiveContext.sql("select * from sample")
val dfTable = Rddtb.toDF()
dfTable.write.format("orc").mode(SaveMode.Overwrite).saveAsTable("db1.test1")
05-31-2016
07:32 PM
2 Kudos
@hari kiran You have lots of transmitted-packet errors, which will definitely degrade performance. They could be caused by many issues: a faulty NIC, a faulty cable, a bad RJ45 connector, a duplex mismatch, or other network-layer problems. Please ask your OS team to rectify this and see whether your HDFS writes improve. In the meantime, can you share the duplex setting? ethtool eth0
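A quick way to pull just the duplex mode out of the `ethtool` output; the sample text below is assumed for illustration, not taken from the poster's host:

```shell
# Assumed sample of `ethtool eth0` output (a half-duplex link is a classic
# cause of TX errors on a datanode):
sample='Settings for eth0:
	Speed: 1000Mb/s
	Duplex: Half
	Link detected: yes'

# Extract the value after "Duplex: ":
duplex=$(printf '%s\n' "$sample" | awk -F': ' '/Duplex/ {print $2}')
echo "Duplex mode: $duplex"

# On a real host:  ethtool eth0 | grep -i duplex
```

If this reports Half while the switch port is set to full duplex (or vice versa), that mismatch alone can produce the kind of packet-error counts described above.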
05-31-2016
04:54 PM
2 Kudos
@Sri Bandaru Adding the property below in the following path should solve this error. Ambari -> HDFS -> Config -> Core Site. hadoop.proxyuser.HTTP.groups = *
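For reference, what that Ambari setting renders to in core-site.xml. The `groups` value is from the reply above; the companion `hosts` property is an assumption, included because proxyuser group and host restrictions are commonly configured together (a `*` wildcard allows all, which you may want to tighten):

```xml
<property>
  <name>hadoop.proxyuser.HTTP.groups</name>
  <value>*</value>
</property>
<!-- Assumed companion setting, not stated in the original reply -->
<property>
  <name>hadoop.proxyuser.HTTP.hosts</name>
  <value>*</value>
</property>
```

A restart of HDFS (and dependent services) is needed for proxyuser changes to take effect.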
05-31-2016
04:48 PM
@Roberto Sancho Did you recently add any external jar in the cluster, or specifically in Hive?