Member since: 02-25-2017
Posts: 7
Kudos Received: 0
Solutions: 0
12-04-2019
08:59 PM
Hi,
Please share the download link for the HDP 2.6.4 Sandbox image for VMware.
Thanks & Regards,
Neha Jain
Labels:
- Hortonworks Data Platform (HDP)
11-06-2019
09:52 PM
Thanks for your reply. It is really helpful.
11-05-2019
07:14 AM
Hi,
We are currently using the HDP 2.6.4 image for our development environment and plan to use the same image for client deliveries in the production environment. This HDP image supports Java 8, whereas our company standard requires Java 11 support for any client delivery. How can we tackle this problem if we plan to use HDP in our production environment?
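For reference, one way to confirm which JVM a Spark job on the cluster actually runs under is to query it from a PySpark session. This is only a diagnostic sketch: sc._jvm is an internal py4j handle, not a public API.

from pyspark import SparkContext

sc = SparkContext(appName="jvm-check")
# Ask the JVM backing the driver which Java version it is running on.
print(sc._jvm.java.lang.System.getProperty("java.version"))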
Labels:
- Hortonworks Data Platform (HDP)
05-31-2017
10:45 AM
Input:
df.show()

userColumn rankColumn
U5         5
U6         1
U1         1
U1         2
U5         4
U5         2
U2         4
U3         1

df = df.orderBy(userColumn, rankColumn)
df.show()

Expected Output:
userColumn rankColumn
U1         1
U1         2
U2         4
U3         1
U5         2
U5         4
U5         5
U6         1

Actual Output (if Spark puts all data in one partition):
userColumn rankColumn
U1         1
U1         2
U2         4
U3         1
U5         2
U5         4
U5         5
U6         1

Actual Output (if Spark does not put all data in one partition):
userColumn rankColumn
U1         2
U1         1
U3         1
U2         4
U6         1
U5         2
U5         4
U5         5

Please let me know if you need any other details.
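For reference, a minimal PySpark sketch that reproduces this setup (the app name and the partition count of 4 are illustrative; Spark 1.6 uses SQLContext as the entry point):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="orderby-repro")
sqlContext = SQLContext(sc)

# Spread the sample rows over several partitions to mimic the cluster case.
rows = [("U5", 5), ("U6", 1), ("U1", 1), ("U1", 2),
        ("U5", 4), ("U5", 2), ("U2", 4), ("U3", 1)]
df = sqlContext.createDataFrame(sc.parallelize(rows, 4),
                                ["userColumn", "rankColumn"])

# orderBy performs a total sort: rows come back globally ordered when the
# resulting partitions are read in partition order.
df.orderBy("userColumn", "rankColumn").show()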
05-23-2017
07:39 AM
We are facing challenges sorting data in DataFrames in Spark 1.6. We are using df.orderBy(userColumn, rankColumn). The data sorts correctly when the DataFrame fits in a single partition, but as soon as the data spans multiple partitions, the sort no longer works in our clustered environment. We also tried the DISTRIBUTE BY and SORT BY approach described in http://saurzcode.in/2015/01/hive-sort-vs-order-vs-distribute-vs-cluster/, but that does not work either. Please suggest.
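For reference, a small diagnostic sketch (PySpark; df as above) that distinguishes a global sort from a per-partition sort. glom() groups each partition's rows so the partition boundaries become visible:

# Global sort: orderBy range-partitions the data, so reading the partitions
# back in partition order should yield a totally ordered result.
ordered = df.orderBy("userColumn", "rankColumn")
for i, part in enumerate(ordered.rdd.glom().collect()):
    print("partition", i, part)

# Per-partition sort only (what DISTRIBUTE BY / SORT BY gives you): each
# partition is sorted internally, but there is no order across partitions.
within = df.sortWithinPartitions("userColumn", "rankColumn")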
Labels:
- Apache Spark
02-25-2017
12:27 PM
I am facing issues while running Spark (1.6) jobs in YARN cluster mode with the below configuration:
--master yarn --deploy-mode cluster --executor-cores 8 --num-executors 3 --executor-memory 25G --driver-memory 6g --conf spark.network.timeout=10000000 --conf spark.cores.max=35 --conf spark.memory.fraction=0.6 --conf spark.memory.storageFraction=0.5 --conf spark.shuffle.memoryFraction=1
Also, I am setting spark.sql.shuffle.partitions=30 in the Spark config.xml. I am running the job with the above command on a three-node Hortonworks cluster where each node has around 51 GB of memory available. The input data is approximately 254 million records.
The job crashes while inserting data into Hive, with an "Executor Lost" error and exit code 143. There is very heavy shuffling of data during processing. Can you please suggest what can be done to resolve this issue? Also, how can we determine, based on the input size, the memory parameters to use for running the job?
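For reference, exit code 143 commonly means YARN killed a container that exceeded its memory limit. Each executor requests --executor-memory plus spark.yarn.executor.memoryOverhead (default: the larger of 384 MB or 10% of executor memory) from YARN, so a 25G executor actually asks for roughly 27.5 GB. Note also that in Spark 1.6, spark.shuffle.memoryFraction is a legacy setting and is ignored unless spark.memory.useLegacyMode=true. A hedged starting point (the numbers below are assumptions to illustrate the sizing arithmetic, not a definitive tuning) is smaller executors with an explicit overhead and more shuffle partitions, which keeps the same 24 total cores while leaving YARN headroom on each 51 GB node:

--master yarn --deploy-mode cluster --num-executors 6 --executor-cores 4 --executor-memory 12G --driver-memory 6g --conf spark.yarn.executor.memoryOverhead=2048 --conf spark.sql.shuffle.partitions=200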
Labels:
- Apache Spark
- Apache YARN