Member since: 12-26-2018
Posts: 4
Kudos Received: 2
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 5118 | 02-21-2021 06:24 AM |
02-21-2021 06:24 AM
1 Kudo
Answered my own question: two of the data nodes could not communicate (SSH) with each other due to a network configuration issue.
12-31-2020 02:02 AM
Hi, we are running an HDP 3.1 production cluster with 2 master nodes and 54 data nodes; the Spark version is 2.3 and YARN is the cluster manager. Each data node has about 250 GB RAM and 56 cores. We are using a combination of NiFi and Spark to set up a Hive DWH as follows:

- NiFi picks input CSV files from the source and loads them to HDFS in Parquet format.
- Spark jobs pick these files and load them into Hive managed tables (see the sketch after this post).
- Another set of Spark jobs aggregates the data in the Hive managed tables into hourly and daily Hive managed tables.

I noticed that my Spark jobs were very slow at times when writing to Hive. I ran a test where I ran the same spark-submit job several times, and the same job with the exact same parameters took anywhere from 10-13 minutes (fast) to 35-40 minutes (slow).

My spark-submit job:

sudo /usr/hdp/3.1.0.0-78/spark2/bin/spark-submit --files /etc/hive/3.1.0.0-78/0/hive-site.xml,/etc/hadoop/3.1.0.0-78/0/core-site.xml,/etc/hadoop/3.1.0.0-78/0/hdfs-site.xml --driver-class-path /usr/hdp/3.1.0.0-78/spark2/jars/postgresql-42.2.5.jar,/usr/hdp/3.1.0.0-78/spark2/jars/config-1.3.4.jar --jars /usr/hdp/3.1.0.0-78/spark2/jars/postgresql-42.2.5.jar,/usr/hdp/3.1.0.0-78/spark2/jars/config-1.3.4.jar --class my.domain.net.spark_job_name --master yarn --deploy-mode cluster --driver-memory 50G --driver-cores 40 --executor-memory 50G --num-executors 50 --executor-cores 10 --name spark_job_name --queue my_queue /home/nkimani/spark_job_name-1.0-SNAPSHOT.jar 2020-10-06 1 16 16

YARN and Spark logs indicated that two data nodes (data nodes 05 and 06) were consistently throwing the following error:

{"Event":"SparkListenerTaskEnd","Stage ID":6,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"FetchFailed","Block Manager Address":{"Executor ID":"32","Host":"svdt8c2r14-hdpdata06.my.domain.net","Port":38650},"Shuffle ID":0,"Map ID":546,"Reduce ID":132,"Message":"org.apache.spark.shuffle.FetchFailedException: Connection from svdt8c2r14-hdpdata06.my.domain.net/10.197.26.16:38650 closed\n\tat org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:528)\n\tat
..
..
{"Event":"SparkListenerTaskEnd","Stage ID":6,"Stage Attempt ID":0,"Task Type":"ShuffleMapTask","Task End Reason":{"Reason":"FetchFailed","Block Manager Address":{"Executor ID":"27","Host":"svdt8c2r14-hdpdata05.my.domain.net","Port":45584},"Shuffle ID":0,"Map ID":213,"Reduce ID":77,"Message":"org.apache.spark.shuffle.FetchFailedException: Connection from svdt8c2r14-hdpdata05.my.domain.net/10.197.26.15:45584 closed\n\tat org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:528)\n\tat

I have tried the following:

- Ran a whole-day ping test to the 'faulty' data nodes; the packet drop was 0%, ruling out network connectivity issues, I think.
- Checked CPU and memory usage for the two nodes; both were below the available capacity.
- Ensured that the two nodes are time-synced to our FreeIPA server.

I have run out of options and any help would be appreciated. What puzzles me is why these specific nodes. I would also like to add that the HBase service (in Ambari) has also been reporting connection errors to one of these nodes.

Thanks,
Kim
Labels: Apache Spark
12-29-2018 11:39 AM
I found the solution today. I modified my PutDatabaseRecord processor: I explicitly set the Oracle schema on its own instead of including it as part of the Table Name. I am now using only 2 processors, ExecuteSQL and PutDatabaseRecord. There is no need for the SplitAvro processor.
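For example, a hedged sketch of how the relevant PutDatabaseRecord properties might look after that change, using the IC_STAGE / TEMP_NIFI names from the related post below; the exact values are assumptions, not a copy of the actual flow:

```
Record Reader          : Avro Reader
Statement Type         : INSERT
Schema Name            : IC_STAGE     <- schema set explicitly here (assumption)
Table Name             : TEMP_NIFI    <- no schema prefix in the table name
```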
12-26-2018 11:28 AM
@mburgess I encountered a similar problem while trying to ingest data from one Oracle table to another. It does not work if I use SplitAvro either.

NiFi version: 1.7.1

Source table:

CREATE TABLE IC_STAGE.TEMP_NIFI_1
(
  ID_DATE VARCHAR2(500 BYTE),
  REC NUMBER
)

Destination table:

CREATE TABLE IC_STAGE.TEMP_NIFI
(
  ID_DATE VARCHAR2(500 BYTE),
  REC NUMBER
)

ExecuteSQL: SELECT * FROM IC_STAGE.TEMP_NIFI_1

PutDatabaseRecord properties:

- Record Reader: Avro Reader
- Schema Access Strategy: Use Embedded Avro Schema

My schema, when I view the details of the queue, looks like this:

{"type":"record","name":"NiFi_ExecuteSQL_Record","namespace":"any.data","fields":[{"name":"ID_DATE","type":["null","string"]},{"name":"REC","type":["null",{"type":"bytes","logicalType":"decimal","precision":10,"scale":0}]}]}

I got the same error even when I used "Schema Text" as my access strategy.
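As a side note on the embedded schema above: the Oracle NUMBER column (REC) comes through as an Avro bytes field with a decimal logical type. Purely for illustration, here is a small self-contained Scala sketch of how such a value decodes with the Avro Java API; the payload value 42 is made up and this is not part of the NiFi flow.

```scala
import java.math.BigDecimal
import java.nio.ByteBuffer

import org.apache.avro.{Conversions, LogicalTypes, Schema}

object DecodeRecField {
  def main(args: Array[String]): Unit = {
    // Mirror the REC field from the embedded schema: bytes + decimal(10, 0)
    val decimalType = LogicalTypes.decimal(10, 0)
    val recSchema = Schema.create(Schema.Type.BYTES)
    decimalType.addToSchema(recSchema)

    // Hypothetical payload: the two's-complement unscaled bytes of 42
    val payload = ByteBuffer.wrap(BigDecimal.valueOf(42).unscaledValue.toByteArray)

    // Avro's standard conversion turns the bytes back into a BigDecimal
    val rec = new Conversions.DecimalConversion().fromBytes(payload, recSchema, decimalType)
    println(rec) // prints 42
  }
}
```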