Member since: 03-31-2017
Posts: 57
Kudos Received: 1
Solutions: 0
07-05-2018
09:51 AM
Hi @Felix Albani, thanks.
06-28-2018
10:20 AM
Hi,
I want to fetch stock exchange data from the Alpha Vantage API using Spark Streaming.
I used the API below, which returns data in JSON format: https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=TCS&interval=1min&apikey=apikey
How can I fetch a continuous stream of stock exchange data using the Spark Streaming Java API?
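For reference, since Alpha Vantage is a polled REST endpoint rather than a push stream, one common pattern is a custom receiver that fetches the URL on an interval and hands each response to Spark Streaming. A minimal sketch, assuming Spark Streaming 2.x on the classpath; the class name, 60-second interval, and error handling are illustrative, not an official Alpha Vantage client:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import org.apache.spark.SparkConf;
import org.apache.spark.storage.StorageLevel;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.receiver.Receiver;

public class AlphaVantageReceiver extends Receiver<String> {
    private final String url;

    public AlphaVantageReceiver(String url) {
        super(StorageLevel.MEMORY_AND_DISK_2());
        this.url = url;
    }

    @Override
    public void onStart() {
        // Poll the REST endpoint on a background thread; store() hands each
        // JSON response to Spark Streaming as one record.
        new Thread(() -> {
            while (!isStopped()) {
                try {
                    StringBuilder json = new StringBuilder();
                    try (BufferedReader in = new BufferedReader(
                            new InputStreamReader(new URL(url).openStream()))) {
                        String line;
                        while ((line = in.readLine()) != null) {
                            json.append(line);
                        }
                    }
                    store(json.toString());
                    Thread.sleep(60_000); // TIME_SERIES_INTRADAY data is at 1min granularity
                } catch (Exception e) {
                    restart("Error polling Alpha Vantage", e);
                }
            }
        }).start();
    }

    @Override
    public void onStop() {
        // Nothing to clean up; the polling loop exits once isStopped() is true.
    }

    public static void main(String[] args) throws InterruptedException {
        SparkConf conf = new SparkConf().setAppName("AlphaVantageStream");
        JavaStreamingContext ssc = new JavaStreamingContext(conf, Durations.seconds(60));
        JavaDStream<String> quotes = ssc.receiverStream(new AlphaVantageReceiver(
                "https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY"
                + "&symbol=TCS&interval=1min&apikey=apikey"));
        quotes.print(); // replace with real JSON parsing and a sink
        ssc.start();
        ssc.awaitTermination();
    }
}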
Labels:
- Apache Spark
06-14-2018
07:44 AM
Hi @Felix Albani, I set the driver memory to 20 GB. I tried the spark-submit parameters below:
./bin/spark-submit --driver-memory 20g --executor-cores 3 --num-executors 20 --executor-memory 2g --conf spark.yarn.executor.memoryOverhead=1024 --conf spark.yarn.driver.memoryOverhead=1024 --class org.apache.TransformationOper --master yarn-cluster /home/hdfs/priyal/spark/TransformationOper.jar
The cluster configuration is 1 master node (r3.xlarge) and 1 worker node (r3.xlarge): 4 vCPUs, 30 GB memory, 40 GB storage.
I am still getting the same issue: the Spark job stays in the running state and YARN memory is 95% used.
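For reference, a quick back-of-the-envelope check of what this submission asks YARN for, using only the figures quoted in the post above; the class name is illustrative:

public class YarnRequestCheck {
    public static void main(String[] args) {
        int numExecutors = 20;
        double executorGb = 2.0 + 1.0; // --executor-memory 2g + 1024m memoryOverhead
        double driverGb = 20.0 + 1.0;  // --driver-memory 20g + 1024m memoryOverhead
        double requestedGb = numExecutors * executorGb + driverGb;
        // 20 * 3 + 21 = 81 GB requested against a single ~30 GB worker node,
        // which would explain YARN memory sitting near its ceiling.
        System.out.println("Requested from YARN: " + requestedGb + " GB");
    }
}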
06-13-2018
01:39 PM
Hi @Vinicius Higa Murakami, @Felix Albani, I have set spark.yarn.driver.memoryOverhead=1 GB, spark.yarn.executor.memoryOverhead=1 GB, and spark_driver_memory=12 GB. I have set the storage level to MEMORY_AND_DISK_SER(). The Hadoop cluster configuration is 1 master node (r3.xlarge) and 1 worker node (m4.xlarge).
Here are the spark-submit parameters:
./bin/spark-submit --driver-memory 12g --executor-cores 2 --num-executors 3 --executor-memory 3g --class org.apache.TransformationOper --master yarn-cluster /spark/TransformationOper.jar
The Spark job entered the running state, but it has been executing for the last hour and execution has still not completed.
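For reference, this is how the MEMORY_AND_DISK_SER storage level mentioned above is set from the Spark Java API; the class name and input path are illustrative, not the poster's actual job:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.storage.StorageLevel;

public class PersistExample {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("PersistExample"));
        JavaRDD<String> logs = sc.textFile("/Input/error.log");
        // Serialized storage that spills partitions to disk rather than
        // recomputing them: smaller heap footprint at the cost of CPU.
        logs.persist(StorageLevel.MEMORY_AND_DISK_SER());
        System.out.println("lines: " + logs.count());
        sc.stop();
    }
}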
06-11-2018
07:07 AM
Hi @Vinicius Higa Murakami, I want to process a 4 GB file, so I configured the executor memory to 10 GB and the number of executors to 10 in the spark-env.sh file. Here are the spark-submit parameters:
./bin/spark-submit --class org.apache.TransformationOper --master local[2] /root/spark/TransformationOper.jar /Input/error.log
I also tried to set the configuration manually using the spark-submit parameters below:
./bin/spark-submit --driver-memory 5g --num-executors 10 --executor-memory 10g --class org.apache.TransformationOper --master local[2] /root/spark/TransformationOper.jar
I also set the master to yarn-cluster and still got the OutOfMemoryError.
06-08-2018
11:48 AM
@Jay Kumar SenSharma, thanks.
06-08-2018
11:17 AM
Hi, I have created an HDP 2.6 cluster on AWS with 1 master node and 4 worker nodes. I am using Ambari as the cluster management tool.
I have configured the spark-env.sh file on the master node, and now I want to apply all of those settings to every worker node in the cluster. How do I refresh the cluster configuration so that the latest configs are reflected on all nodes?
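For reference, when spark-env.sh is edited through Ambari (Spark > Configs > Advanced spark-env) rather than by hand, restarting the service pushes the new config to every host. A minimal sketch of driving that restart through Ambari's REST API; the host, cluster name, and admin credentials are placeholders, and a real script would poll the asynchronous request between the stop and the start:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Base64;

public class AmbariRestartSpark {
    // PUT the Spark service into the given state via Ambari's REST API.
    static void setState(String state) throws Exception {
        URL url = new URL("http://ambari-host:8080/api/v1/clusters/mycluster/services/SPARK");
        HttpURLConnection c = (HttpURLConnection) url.openConnection();
        c.setRequestMethod("PUT");
        c.setRequestProperty("X-Requested-By", "ambari");
        c.setRequestProperty("Authorization", "Basic "
                + Base64.getEncoder().encodeToString("admin:admin".getBytes()));
        c.setDoOutput(true);
        String body = "{\"RequestInfo\":{\"context\":\"Apply new spark-env\"},"
                + "\"Body\":{\"ServiceInfo\":{\"state\":\"" + state + "\"}}}";
        try (OutputStream os = c.getOutputStream()) {
            os.write(body.getBytes());
        }
        System.out.println(state + " -> HTTP " + c.getResponseCode());
    }

    public static void main(String[] args) throws Exception {
        setState("INSTALLED"); // stop Spark on every host
        // Poll the request returned above until the stop finishes, then:
        setState("STARTED");   // start Spark, picking up the new spark-env.sh
    }
}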
Labels:
- Apache Hadoop
06-08-2018
11:11 AM
Hi, I have created an HDP 2.6 cluster on AWS with a master node (m4.2xlarge) and 4 worker nodes (m4.xlarge).
I want to process a 4 GB log file using a Spark job, but I am getting the error below while executing it:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure:
Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236)
I configured the spark-env.sh file on the master node with SPARK_EXECUTOR_MEMORY="5G" and SPARK_DRIVER_MEMORY="5G", but it throws the same error.
I also configured the worker nodes with those settings and increased the Java heap size for the Hadoop client, ResourceManager, NodeManager, and YARN, but the Spark job is still aborted. Thanks.
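For reference, an OutOfMemoryError inside Arrays.copyOf during a textFile job usually means too much data is being materialized in a single JVM. A minimal sketch of one way to keep each task's slice small and avoid pulling raw records back to the driver; the class name, path, filter predicate, and partition count are illustrative assumptions, not the poster's actual job:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TransformationOperSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext(
                new SparkConf().setAppName("TransformationOperSketch"));
        // A minPartitions hint keeps each task's slice of the 4 GB file small
        // enough to fit comfortably in a 5G executor heap.
        JavaRDD<String> lines = sc.textFile("/Input/error.log", 64);
        long errors = lines.filter(l -> l.contains("ERROR")).count();
        // Only the scalar count comes back to the driver; calling collect()
        // on the raw lines would pull all 4 GB into one heap and OOM.
        System.out.println("error lines: " + errors);
        sc.stop();
    }
}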
Labels:
- Apache Spark
03-23-2018
09:50 AM
Hi @Rahul Soni, I edited the comment. Please check it.
03-23-2018
05:41 AM
@Rahul Soni, thanks. Actually it was a typing mistake: I edited my question and found that I had forgotten to close the ')). I want to fetch the following values: [/aLog/transaction],POST,[application/vnd.app.v1+json || application/json]
I tried the script below (with the regex literal and AS clause closed, which the snippet in my question was missing):
extract = FOREACH matched GENERATE FLATTEN(REGEX_EXTRACT_ALL(logmessage, '^(\\S+)\\s+"(\\{(\\S+),.*=(.*),.*=(.*)\\})"+\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+)\\s+(\\S+).*')) AS (t1:chararray,t2:chararray,t3:chararray,t4:chararray,url:chararray,type:chararray,produces:chararray,t5:chararray,t6:chararray,classes:chararray,throw:chararray,exception:chararray);
Output:
(Mapped,{[/auditConfirmation/businessDates],methods=[GET],produces=[application/vnd.app.v1+json || application/json]},[/auditConfirmation/businessDates],[GET],[application/vnd.app.v1+json || application/json],onto,public,java.lang.String,com.fhlb.controllers.rest.auditconfirmation.AuditConfirmationRestService.getCloseOFBusinessDates(java.lang.String),throws,com.fhlb.commons.CustomException)
I fetched the output I want, but I am getting one extra field. Could you help me with a regex that extracts only the expected output? I want to remove "{[/auditConfirmation/businessDates],methods=[GET],produces=[application/vnd.app.v1+json || application/json]}" from the output. I did get the expected output using the script below:
output = FOREACH extract GENERATE $4 AS url, $5 AS requesttype, $6 AS produces;
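For reference, a plain-Java sketch of a tighter pattern that captures only the three wanted fields, so the group structure is easy to test before porting it back into REGEX_EXTRACT_ALL; the sample line is shortened from the output above, and the pattern is an assumption about the log format, not a drop-in replacement:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogRegexTest {
    public static void main(String[] args) {
        String line = "Mapped \"{[/auditConfirmation/businessDates],methods=[GET],"
                + "produces=[application/vnd.app.v1+json || application/json]}\" onto ...";
        // One capturing group per wanted field: url, request type, produces.
        // The surrounding "{...}" is matched but never captured, so it cannot
        // appear in the output as an extra field.
        Pattern p = Pattern.compile(
                "\\{(\\[\\S+\\]),methods=(\\[[^\\]]+\\]),produces=(\\[[^}]+\\])\\}");
        Matcher m = p.matcher(line);
        if (m.find()) {
            System.out.println(m.group(1)); // [/auditConfirmation/businessDates]
            System.out.println(m.group(2)); // [GET]
            System.out.println(m.group(3)); // [application/vnd.app.v1+json || application/json]
        }
    }
}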