Member since: 11-30-2016
Posts: 33
Kudos Received: 5
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 27001 | 02-28-2017 04:58 PM
11-02-2017
06:54 AM
Thanks for the quick response. One such issue is https://issues.apache.org/jira/browse/SPARK-20922, which affected Spark 2.1.1, and a few more are listed at https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=spark, which in many places recommends using Spark 2.2.0. Thanks,
11-02-2017
05:29 AM
Dear All, We are planning to upgrade our HDP cluster from 2.4.2 to 2.6.2 because we need to use Spark 2.2.0 in our project. Our security team has asked us to use Spark 2.2.0, as it is the latest vulnerability-free version. So the question is: is Spark 2.1.1, which ships with HDP 2.6.2, free of known vulnerabilities? As far as we have read and understood, the latest vulnerability-free version of Apache Spark is 2.2.0. Thanks in advance, Param.
Labels:
05-09-2017
07:34 AM
@yvora Thanks for the response. 1. "Can you try setting spark.yarn.stagingDir to hdfs:///user/tmp/?" This is not working. 2. "Can you please share which Spark config you are trying to set that requires the RM address?" I am launching the Spark application from a Java program, so when the master is yarn it connects to the ResourceManager at 0.0.0.0:8032 by default. To override this I need to set the same in the Spark configuration, i.e.
sparkConf.set("spark.hadoop.yarn.resourcemanager.hostname", resourcemanagerHostname);
sparkConf.set("spark.hadoop.yarn.resourcemanager.address", resourcemanagerAddress);
But the problem is, when ResourceManager HA is enabled, how do I connect to it? One idea I found for my question: there is a way to achieve this on the SparkContext, as below.
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
jsc.hadoopConfiguration().addResource(new Path(hadoopClusterSiteFilesBasePath + "core-site.xml"));
jsc.hadoopConfiguration().addResource(new Path(hadoopClusterSiteFilesBasePath + "hdfs-site.xml"));
jsc.hadoopConfiguration().addResource(new Path(hadoopClusterSiteFilesBasePath + "mapred-site.xml"));
jsc.hadoopConfiguration().addResource(new Path(hadoopClusterSiteFilesBasePath + "yarn-site.xml"));
But the ResourceManager and staging-directory configuration is needed even before the context is created, so there is still a problem. What I am looking for is something like the above for the SparkConf class/object. Thanks, Param.
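In case it helps a future reader: the YARN ResourceManager HA properties can themselves be passed through SparkConf using the spark.hadoop. prefix, which Spark copies into the Hadoop Configuration it builds internally, so no SparkContext is needed first. A minimal configuration sketch, assuming placeholder hostnames (rm1.example.com, rm2.example.com) and logical RM IDs:

```java
import org.apache.spark.SparkConf;

public class RmHaConf {
    public static SparkConf build() {
        return new SparkConf()
            .setMaster("yarn")
            .setAppName("rm-ha-example")
            // Enable RM HA and declare the logical RM IDs (placeholders).
            .set("spark.hadoop.yarn.resourcemanager.ha.enabled", "true")
            .set("spark.hadoop.yarn.resourcemanager.ha.rm-ids", "rm1,rm2")
            // One hostname per logical ID; the YARN client fails over between them.
            .set("spark.hadoop.yarn.resourcemanager.hostname.rm1", "rm1.example.com")
            .set("spark.hadoop.yarn.resourcemanager.hostname.rm2", "rm2.example.com")
            .set("spark.yarn.stagingDir", "hdfs:///user/tmp/");
    }
}
```

This is a sketch of the spark.hadoop. passthrough mechanism, not a verified fix for this cluster; the property names mirror those found in yarn-site.xml.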
05-08-2017
10:24 AM
Hi All, When I run a Spark application in YARN mode using the HDFS file system, it works fine when I provide the properties below:
sparkConf.set("spark.hadoop.yarn.resourcemanager.hostname", resourcemanagerHostname);
sparkConf.set("spark.hadoop.yarn.resourcemanager.address", resourcemanagerAddress);
sparkConf.set("spark.yarn.stagingDir", stagingDirectory);
But the problems here are: 1. Since my HDFS has NameNode HA enabled, it does not work when I set spark.yarn.stagingDir to the nameservice URL, e.g. hdfs://hdcluster/user/tmp/; it gives an "unknown host hdcluster" error. It works fine when I give the URL as hdfs://<ActiveNameNode>/user/tmp/, but we don't know in advance which NameNode will be active, so how do we resolve this? One thing I have noticed is that SparkContext accepts a Hadoop configuration, but the SparkConf class has no method that accepts one. 2. How do I provide the ResourceManager address when the ResourceManagers are running in HA? Thanks in advance, Param.
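For the NameNode HA part of the question, one approach that avoids knowing the active NameNode is to pass the HDFS client-failover properties through SparkConf with the spark.hadoop. prefix, mirroring what hdfs-site.xml would contain. A hedged sketch, using the nameservice from the post (hdcluster) and placeholder NameNode hosts:

```java
import org.apache.spark.SparkConf;

public class NnHaConf {
    public static SparkConf build() {
        return new SparkConf()
            .setMaster("yarn")
            .setAppName("nn-ha-example")
            // Declare the logical nameservice and its NameNodes (hosts are placeholders).
            .set("spark.hadoop.dfs.nameservices", "hdcluster")
            .set("spark.hadoop.dfs.ha.namenodes.hdcluster", "nn1,nn2")
            .set("spark.hadoop.dfs.namenode.rpc-address.hdcluster.nn1", "nn1.example.com:8020")
            .set("spark.hadoop.dfs.namenode.rpc-address.hdcluster.nn2", "nn2.example.com:8020")
            // Client-side failover proxy so "hdcluster" resolves to the active NameNode.
            .set("spark.hadoop.dfs.client.failover.proxy.provider.hdcluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider")
            // With the above in place, the nameservice URL should work as the staging dir.
            .set("spark.yarn.stagingDir", "hdfs://hdcluster/user/tmp/");
    }
}
```

This is a configuration sketch under the assumption that the cluster's hdfs-site.xml defines hdcluster the same way; the values should be copied from the actual site file.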
Labels:
- Apache Spark
05-02-2017
05:01 AM
Thank you for the response. I agree with you completely, but my question is that even the partial result is not getting stored correctly. Thanks, Param.
04-27-2017
02:07 PM
Dear All, I am using Spark to train a model and save it to the file system using org.apache.spark.ml.PipelineModel.save(). This works fine when I run Spark in local mode, but when I run Spark in standalone mode with 2 workers, it trains the model correctly yet stores only a partial result when saving to file. The file system I am using is the local one, i.e. file://. Could someone please point out what the issue is here? And is it a problem to store a trained PipelineModel on the local file system? FYI: the same code works when I use the HDFS file system. The Spark version I am using is 2.0.0. Thanks in advance, Param.
Labels:
- Apache Spark
04-05-2017
06:09 AM
@yvora Actually I am submitting my Spark application through a program. I have done the same thing programmatically by adding sparkConfig.setJars(new String[]{"myjar.jar"}); but it is still not working.
04-04-2017
05:36 PM
Hi All, When I run Spark in local mode my code works fine, but when I run it in standalone mode with deploy mode client, I get a ClassNotFoundException ("Caused by: java.lang.ClassNotFoundException:") for my custom class. I am distributing the jar that contains the class to the workers by setting sparkConfig.setJars(new String[]{"myjar.jar"}); in the Spark configuration, and I can see the jar being distributed on the worker nodes, yet I still get the exception. Why is the executor not able to find the class in the jar on the worker node? Could someone please tell me what mistake I am making here? Thanks in advance, Param.
Labels:
- Apache Spark
03-31-2017
05:06 AM
@amankumbare Thanks for responding. Sumit and I work in the same team. It is happening for all topics, and the JDK we are using is 7. The main issue here is that the content of the broker's znode has no host and port information: ["PLAINTEXTSASL://xxxx.domain.com:9092"],"host":null,"version":2,"port":-1} - Is this the expected behavior? - It happens when the listener value is PLAINTEXTSASL://xxxx.domain.com:9092, and everything is fine when it is just PLAINTEXT. Thanks in advance, Param.
03-17-2017
02:42 PM
Hi All, How do I close a question in the Hortonworks community by accepting the answer? Thanks, Param.
03-15-2017
03:24 PM
@Sandeep Nemuri Thanks, it worked. I had tried this already but forgot to create the file on each node; now it's fine. And I have one more question here: if I run my Spark app in YARN mode I can set the memory parameter through the Spark configuration using the spark.yarn.driver.memoryOverhead property. Is something similar available for standalone and local mode? Thanks in advance, Param.
03-15-2017
11:31 AM
Hi All, When executing a Spark application on a YARN cluster, can I access the local file system (the underlying OS file system), even though YARN is pointing to HDFS? Thanks, Param.
Labels:
- Apache Hadoop
- Apache Spark
- Apache YARN
03-15-2017
11:25 AM
Sorry for the delayed reply, I got busy with some work. @Artem Ervits, thanks a lot for all the responses. I was able to achieve this by setting the Spark configuration as below:
sparkConfig.set("spark.hadoop.yarn.resourcemanager.hostname", "XXXXX");
sparkConfig.set("spark.hadoop.yarn.resourcemanager.address", "XXXXX:8032");
sparkConfig.set("spark.yarn.access.namenodes", "hdfs://XXXXX:8020,hdfs://XXXX:8020");
sparkConfig.set("spark.yarn.stagingDir", "hdfs://XXXXX:8020/user/hduser/");
sparkConfig.set("--deploy-mode", deployMode);
Thanks, Param.
03-08-2017
05:11 PM
Hi All, I am trying to describe a Kafka consumer group with the command below:
bin/kafka-consumer-groups.sh --new-consumer --bootstrap-server broker1:9092,broker2:9092 --describe --group console-consumer-44261 --command-config conf/security_user_created
But I am getting the error: Consumer group `console-consumer-44261` does not exist or is rebalancing. I have checked ZooKeeper; the consumer group exists, yet I always get this error. Could someone point out what mistake I am making here? The Kafka version I am using is 0.9. Thanks in advance, Param.
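One thing worth checking, offered as a guess rather than a confirmed diagnosis: a group named console-consumer-NNNNN created by a console consumer started with --zookeeper is an old (ZooKeeper-based) consumer group, and the --new-consumer mode of the tool only sees groups whose offsets are stored in Kafka itself. For a ZooKeeper-based group, the old-consumer form of the same tool applies (the ZooKeeper host is a placeholder):

```shell
# Describe an old (ZooKeeper-based) consumer group; --zookeeper replaces
# --new-consumer/--bootstrap-server for groups that keep offsets in ZK.
bin/kafka-consumer-groups.sh --zookeeper zk1.example.com:2181 \
  --describe --group console-consumer-44261
```

This requires a live cluster to run and is only a sketch of the distinction between the two offset-storage modes in Kafka 0.9.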
Labels:
- Apache Kafka
02-28-2017
05:08 PM
Hi All, When checking the offsets for a Kerberized Kafka with the command
bin/kafka-consumer-offset-checker.sh --zookeeper XXXX:2181 --group test-consumer --security-protocol SASL_PLAINTEXT
I get null output, i.e. Group Topic Pid Offset logSize Lag Owner Exiting due to: null. Could someone tell me whether there is an issue with the command I am using? The Kafka version I am using is 0.9.0.2.4. Thanks in advance, Param.
Labels:
- Apache Kafka
02-28-2017
04:58 PM
@Artem Ervits, thanks a lot for your time and help. I was able to achieve my objective by setting the Hadoop and YARN properties in the Spark configuration:
sparkConfig.set("spark.hadoop.yarn.resourcemanager.hostname", "XXX");
sparkConfig.set("spark.hadoop.yarn.resourcemanager.address", "XXX:8032");
sparkConfig.set("spark.yarn.access.namenodes", "hdfs://XXXX:8020,hdfs://XXXX:8020");
sparkConfig.set("spark.yarn.stagingDir", "hdfs://XXXX:8020/user/hduser/");
Regards, Param.
02-27-2017
05:54 PM
@Artem Ervits, thanks again! And sorry if I am asking too many questions here. What I am actually looking for: per the project requirement I should not use the spark-submit script, so I am passing the cluster configuration through the Spark config as given below.
SparkConf sparkConfig = new SparkConf().setAppName("Example App of Spark on Yarn");
sparkConfig.set("spark.hadoop.yarn.resourcemanager.hostname", "XXXX");
sparkConfig.set("spark.hadoop.yarn.resourcemanager.address", "XXXXX:8032");
It is able to identify the ResourceManager, but it is failing because it does not identify the file system, even though I am setting the HDFS file system configuration as well:
sparkConfig.set("fs.defaultFS", "hdfs://xxxhacluster");
sparkConfig.set("ha.zookeeper.quorum", "xxx:2181,xxxx:2181,xxxx:2181");
It assumes the local file system, and the error I am getting in the ResourceManager is: exited with exitCode: -1000 due to: File file:/tmp/spark-0e6626c2-d344-4cae-897f-934e3eb01d8f/__spark_libs__1448521825653017037.zip does not exist Thanks and regards, Param.
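A guess at the root cause, offered with appropriate hedging: keys set on SparkConf are only copied into Spark's Hadoop Configuration when they carry the spark.hadoop. prefix, so a bare fs.defaultFS key is not propagated and Spark falls back to the local file system for the __spark_libs__ upload. A sketch of the prefixed form (hostnames are placeholders):

```java
import org.apache.spark.SparkConf;

public class DefaultFsConf {
    public static SparkConf build() {
        return new SparkConf()
            .setMaster("yarn")
            .setAppName("default-fs-example")
            .set("spark.hadoop.yarn.resourcemanager.hostname", "rm.example.com")
            .set("spark.hadoop.yarn.resourcemanager.address", "rm.example.com:8032")
            // Prefixed with spark.hadoop. so it lands in the Hadoop Configuration;
            // a bare "fs.defaultFS" key in SparkConf does not reach it.
            .set("spark.hadoop.fs.defaultFS", "hdfs://xxxhacluster")
            .set("spark.hadoop.ha.zookeeper.quorum", "zk1:2181,zk2:2181,zk3:2181");
    }
}
```

This is a configuration sketch, not a verified fix; the HA nameservice (xxxhacluster) would also need its dfs.nameservices and failover-provider properties passed the same way.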
02-27-2017
06:44 AM
@Artem Ervits Thank you very much for the response. I am able to submit the job to YARN through the spark-submit command, but what I am actually looking for is a way to do the same thing through a program. It would be great if you could share a template for this, Java preferably. -Param.
02-27-2017
06:38 AM
@Sriharsha Chintalapani Could you please answer this question?
02-27-2017
06:36 AM
Thanks for the response. I very much agree with your answer.
02-26-2017
12:01 PM
1 Kudo
Hi All, I am new to Spark. I am trying to submit a Spark application from a Java program, and I am able to do so for a Spark standalone cluster. What I actually want to achieve is submitting the job to a YARN cluster, and I am able to connect to the YARN cluster by explicitly adding the ResourceManager property in the Spark config as below:
sparkConfig.set("spark.hadoop.yarn.resourcemanager.address", "XXXX:8032");
But the application is failing with exited with exitCode: -1000 due to: File file:/tmp/spark-0e6626c2-d344-4cae-897f-934e3eb01d8f/__spark_libs__1448521825653017037.zip does not exist This I got from the ResourceManager log; what I found is that it assumes the file system is local and does not upload the required libraries. Source and destination file systems are the same. Not copying file:/tmp/spark-1ed67f05-d496-4000-86c1-07fcf8526181/__spark_libs__1740543841989079602.zip This I got from the Spark application where I run my program. The issue I suspect is that it assumes the local file system rather than HDFS; correct me if I am wrong. My questions are: 1. What is the actual reason for the job to fail, given the data and log info above? 2. Could you please tell me how to add resource files to the Spark configuration, like addResource in the Hadoop configuration? Thanks in advance, Param.
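For readers with the same goal (programmatic submission to YARN without the spark-submit script), one alternative worth mentioning is Spark's own org.apache.spark.launcher.SparkLauncher, which honors HADOOP_CONF_DIR so the cluster's site files (and hence the HDFS default filesystem) are picked up and the Spark libs are uploaded to HDFS instead of being referenced as local files. A minimal sketch; the jar path and class name are placeholders:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class YarnSubmit {
    public static void main(String[] args) throws Exception {
        // HADOOP_CONF_DIR must point at the directory holding core-site.xml,
        // hdfs-site.xml, and yarn-site.xml for the target cluster.
        SparkAppHandle handle = new SparkLauncher()
            .setAppResource("/path/to/myapp.jar")   // placeholder
            .setMainClass("com.example.MyApp")      // placeholder
            .setMaster("yarn")
            .setDeployMode("cluster")
            .setConf("spark.driver.memory", "1g")
            .startApplication();
        // Poll until the application reaches a terminal state.
        while (!handle.getState().isFinal()) {
            Thread.sleep(1000);
        }
    }
}
```

This requires a Spark installation and a live cluster to run, so it is offered as a sketch of the launcher API rather than a tested solution.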
Labels:
- Apache Hadoop
- Apache Spark
- Apache YARN
02-26-2017
08:43 AM
1 Kudo
Hi All, When I try to get the lag from a Kerberized Kafka I get an "Exiting due to: null" error. Command I am using:
bin/kafka-consumer-offset-checker.sh --zookeeper XXXXXXX:2181 --group XXXX-consumer --security-protocol PLAINTEXTSASL
Output:
Could not fetch offset for [metrics,0] due to kafka.common.NotCoordinatorForConsumerException.
Group Topic Pid Offset logSize Lag Owner
Exiting due to: null.
Could someone please point out where I am making a mistake here? Thanks in advance, Param.
Labels:
- Apache Kafka
02-26-2017
08:36 AM
1 Kudo
Hi All, While Kerberizing Kafka using Ambari, it sets the Kafka security protocol to PLAINTEXTSASL instead of SASL_PLAINTEXT, but everywhere in the documentation it is stated that it must be SASL_PLAINTEXT. I have a few questions regarding this: 1. Why does Ambari set the security protocol to PLAINTEXTSASL; is it a bug? 2. We are able to produce and consume messages from a program written in Java even though the producer sets the security protocol to PLAINTEXTSASL and the consumer sets SASL_PLAINTEXT, and it works fine. How does it work fine when the actual protocol is just PLAINTEXTSASL? Thanks in advance, Param.
Labels:
- Apache Kafka
12-14-2016
05:54 PM
1 Kudo
Hi Everyone, We are using an endpoint coprocessor to fetch records from our HBase cluster. We have a 3-node cluster with 180 regions in total. Calls to the endpoint coprocessor are taking more time than usual, and after analysis the property I suspect is hbase.regionserver.handler.count, which is 30 by default. My client code makes a Batch call to the coprocessor; there can be 10 such batch calls running simultaneously, and each batch call creates 180 separate threads, so the total number of threads at the client end can sometimes be 1800. I have changed hbase.regionserver.handler.count from 30 to 100 and still see no significant performance improvement. Now to my questions: What is a feasible value for hbase.regionserver.handler.count? How do I know whether this property is impacting performance? If I increase this property, what other values should I modify for proper functioning? Thanks in advance, Param.
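To make the sizing concrete, the client-side concurrency described above can be computed directly from the numbers in the post (3 region servers, 180 regions, 10 simultaneous batch calls). This back-of-the-envelope sketch only restates the post's own figures; it is not a tuning recommendation:

```java
public class HandlerSizing {
    public static void main(String[] args) {
        int regionServers = 3;
        int regions = 180;
        int batchCalls = 10;                                // simultaneous Batch calls
        // Each batch call opens one RPC per region, so peak client-side RPCs:
        int peakClientRpcs = batchCalls * regions;          // 1800
        // Those RPCs are spread across the region servers:
        int rpcsPerServer = peakClientRpcs / regionServers; // 600
        // With the default hbase.regionserver.handler.count = 30, each server
        // can service only 30 RPCs at once; the rest wait in the call queue.
        int handlers = 30;
        System.out.println("peak RPCs per server: " + rpcsPerServer
            + ", concurrent handlers: " + handlers
            + ", queued: " + (rpcsPerServer - handlers));
    }
}
```

The arithmetic suggests each server may see roughly 600 concurrent requests against 30 (or 100) handlers, which is why raising the handler count alone may not remove the queueing.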
Labels:
- Apache Hadoop
- Apache HBase