Member since: 12-21-2015
Posts: 32
Kudos Received: 14
Solutions: 0
02-22-2016
07:36 PM
1 Kudo
@Joe Widen Thanks, Joe. Could you please also comment on the reverse behaviour?
... View more
02-22-2016
06:35 PM
@Vinod Bonthu Thanks, Vinod, I understand this. However, I tried with another dataset, and here I am getting the results I wanted. According to the explanation above, the result should be 55,55, but for the example in the question the results are not as per that expectation. Could you please explain why?
val nums = Array(99,99)
val rdd = sc.parallelize(nums)
rdd.collect // output is : 99,99
//Now doing similar changes as in the main question.
nums(0) = 55
nums(1) = 55
rdd.collect
// output is 99,99
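For reference, here is a minimal sketch of the difference between the two cases (this is my own guess, not confirmed in the thread, and assumes a SparkContext sc running in local mode): sc.parallelize appears to slice the input collection at RDD creation time, so an Array[Int] has its primitive values copied into the partitions, whereas an Array[Array[Int]] only has references to the inner arrays captured, so later mutation of those inner arrays can show up on collect.
val flat = Array(99, 99)
val flatRdd = sc.parallelize(flat)
flat(0) = 55
flatRdd.collect()     // still Array(99, 99): the Int values were copied at parallelize time
val nested = Array.fill(2, 2)(5)
val nestedRdd = sc.parallelize(nested)
nested(0)(1) = 99
nestedRdd.collect()   // Array(Array(5, 99), Array(5, 5)): the inner arrays are shared by reference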
... View more
02-22-2016
04:57 PM
2 Kudos
Hi folks, I have always read that RDDs are immutable, but to my surprise I found a different result today. I would like to know the reason, and supporting documentation if possible. scala> val m = Array.fill(2, 2)(5)
m: Array[Array[Int]] = Array(Array(5, 5), Array(5, 5))
scala> val rdd = sc.parallelize(m)
scala> rdd.collect()
res6: Array[Array[Int]] = Array(Array(5, 5), Array(5, 5))
// Interesting here.
scala> m(0)(1) = 99
scala> rdd.collect()
res8: Array[Array[Int]] = Array(Array(5, 99), Array(5, 5))
Thanks
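A small follow-up sketch (my own assumption, not from the thread): if the RDD contents should stay fixed regardless of later changes to m, deep-copying the inner arrays before parallelizing avoids the shared references.
// Copy each inner array so the RDD no longer shares references with `m`.
val snapshot = m.map(_.clone())
val stableRdd = sc.parallelize(snapshot)
m(0)(1) = 99
stableRdd.collect()   // still Array(Array(5, 5), Array(5, 5))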
... View more
Labels:
02-19-2016
03:32 PM
2 Kudos
I am trying to join two data sets: a customerIDSalesRecord RDD of type (Id, salesRecord) and another of type (Id, Name). The first data set is partitioned by a HashPartitioner and the second by a custom partitioner. When I join these RDDs by id and check which partitioner is retained, I sometimes see the custom partitioner on the joined RDD and sometimes the HashPartitioner. I also got different partitioner results when changing the number of partitions. According to the Learning Spark book, rdd1.join(rdd2) retains the partitioner of rdd1. Here is the code. val hashPartitionedRDD = customerIDSalesRecord.partitionBy(new HashPartitioner(10))
println("hashPartitionedRDD's partitioner " + hashPartitionedRDD.partitioner) // Seeing Instance of HashParitioner
val customPartitionedRDD = customerIdNamePair1.partitionBy(new CustomerPartitioner)
println("customPartitionedRDD partitioner " + customPartitionedRDD.partitioner) // Seeing instance of CustomPartitioner
// Ok till this point.
val expectedHash = hashPartitionedRDD.join(customPartitionedRDD)
val expectedCustom = customPartitionedRDD.join(hashPartitionedRDD)
// Following both are showing random behavior.
println("Expected Hash " + expectedHash.partitioner) // Seeing instance of Custom Partitioner
println("Expected Custom " + expectedCustom.partitioner) //Seeing instance of Custom Partitioner
// One more observation: when I make the number of partitions of both data sets equal, I see the reverse results, i.e.
// expectedHash shows CustomPartitioner and
// expectedCustom shows Hashpartitioner Instance.
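If it helps, a hedged sketch based on my reading of the Spark 1.x source (so treat it as an assumption): the single-argument join picks its partitioner via Partitioner.defaultPartitioner, which prefers the partitioner of the input with the larger number of partitions rather than always the left-hand side, which would explain why the result flips with the partition counts. Passing a partitioner explicitly makes the outcome deterministic (this reuses the RDDs defined above):
import org.apache.spark.HashPartitioner
// Explicitly supply the partitioner the joined RDD should keep.
val joined = hashPartitionedRDD.join(customPartitionedRDD, new HashPartitioner(10))
println(joined.partitioner)   // Some(HashPartitioner) regardless of the other side's partitioner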
... View more
Labels:
02-19-2016
03:28 PM
1 Kudo
Given that Spark processes data in memory while Hadoop MapReduce is more disk-based (higher disk I/O), I was wondering about sizing containers and RAM: do we need more RAM to run the same use case with Spark than with Hadoop MapReduce?
... View more
Labels:
02-12-2016
11:31 AM
1 Kudo
Hi folks, One of the requirements is to redirect all kinds of logs (Ranger logs, access logs, and other components' logs) to an external file system, probably on NFS and not on HDFS or in a database. Does HDP provide an out-of-the-box solution for this? One workaround I can think of is using Flume, but I wanted to know about other approaches. Regards, DP
... View more
02-11-2016
12:03 PM
1 Kudo
Thanks @Artem Ervits
... View more
02-11-2016
12:03 PM
1 Kudo
Thanks @Neeraj Sabharwal
... View more
02-11-2016
11:30 AM
2 Kudos
Hi folks, I am trying to run the Spark Pi example on the Hortonworks cluster. I can run it successfully in local mode and in yarn-client mode. When I try to run it in YARN cluster mode I do not see any output or error, and I am not sure whether it ran or whether there is a bug. Can you please help me understand this behavior? Here are the commands I am trying to use:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster ../lib/spark-examples*.jar 10
I also tried:
spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster ../lib/spark-examples*.jar 10
(One more question: are both of the above correct?) Both times I see the following console log but not the value of Pi. (When I run in local mode or yarn-client mode I see the value of Pi printed on the console.)
16/02/11 11:06:05 WARN Client: spark.yarn.am.extraJavaOptions will not take effect in cluster mode
16/02/11 11:06:05 INFO SecurityManager: Changing view acls to: username
16/02/11 11:06:05 INFO SecurityManager: Changing modify acls to: username
16/02/11 11:06:05 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(username); users with modify permissions: Set(username)
16/02/11 11:06:06 INFO Client: Submitting application 432 to ResourceManager
16/02/11 11:06:06 INFO YarnClientImpl: Submitted application application_1454617624671_0432
16/02/11 11:06:07 INFO Client: Application report for application_1454617624671_0432 (state: ACCEPTED)
16/02/11 11:06:07 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1455188766121
final status: UNDEFINED
tracking URL: http://SomeIP:8088/proxy/application_1454617624671_0432/
user: username
..............
16/02/11 11:06:23 INFO Client: Application report for application_1454617624671_0432 (state: FINISHED)
16/02/11 11:06:23 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: APP_Master_host_IP
ApplicationMaster RPC port: 0
queue: default
start time: 1455188766121
final status: SUCCEEDED
tracking URL: http://SomeIP:8088/proxy/application_1454617624671_0432/
user: username
16/02/11 11:06:23 INFO ShutdownHookManager: Shutdown hook called
16/02/11 11:06:23 INFO ShutdownHookManager: Deleting directory /tmp/spark-54dc94ab-cf66-4d17-9940-1c31ba7e9850
[username@remoteIP bin]$
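As a workaround I am considering something like the following sketch (just a guess on my part; the object name and output path are hypothetical): in cluster mode the driver presumably runs inside the YARN ApplicationMaster, so its println output would only appear in the AM container's logs, and saving the result to HDFS makes it visible regardless of deploy mode.
import org.apache.spark.{SparkConf, SparkContext}
// Hypothetical Pi estimator that saves its result instead of only printing it.
object SparkPiToFile {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SparkPiToFile"))
    val n = 100000
    val inside = sc.parallelize(1 to n).filter { _ =>
      val x = math.random * 2 - 1
      val y = math.random * 2 - 1
      x * x + y * y < 1
    }.count()
    val pi = 4.0 * inside / n
    println(s"Pi is roughly $pi")                                      // only visible in the driver/AM logs in cluster mode
    sc.parallelize(Seq(pi), 1).saveAsTextFile("/tmp/spark-pi-result")  // assumed output path
    sc.stop()
  }
}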
... View more
Labels:
12-23-2015
04:59 PM
@Neeraj Sabharwal I tried configuring the Hive/Pig views as per the documentation. If you can confirm that the Pig/Hive views are not supported on a Kerberized cluster with NameNode HA, I will close the thread 🙂 Thank you very much.
... View more
12-23-2015
03:39 PM
@Predrag Minovic The hive.server2.transport.mode is set to http, and the File view is working. We are on Ambari version 2.1.2. Thank you. Is there anything possibly missing?
... View more
12-23-2015
02:51 PM
Thanks @Predrag Minovic, indeed this is quite detailed. I have a user ambariserver and the principal ambariserver/ambari_host_name@KDCRealm.com.
I also verified that the following two properties are added in the custom core-site:
hadoop.proxyuser.ambariserver.groups=*
hadoop.proxyuser.ambariserver.hosts=*
For the Pig/Hive views, I've added the following two properties in webhcat-site.xml:
webhcat.proxyuser.ambariserver.groups=*
webhcat.proxyuser.ambariserver.hosts=*
When accessing the Hive view, we see this error:
H020 Could not establish connecton to HiveServer2_HOST:10000:org.apache.thrift.transport.TTransportException
... View more
12-23-2015
12:02 PM
Well, the network is stable here, and no jobs are running on the cluster!
... View more
12-23-2015
10:54 AM
Hi, After restarting the cluster we randomly see a couple of red alerts in Ambari. I remember seeing them some time back as well, and now they have appeared again. Can you suggest what could be going wrong? I checked that ports 10001 and 9083 are open/in use.
Hive / Hive Metastore Process
====================================
Connection failed on host HIVE_HOST:10001 (Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/lib/hive/bin/:/usr/sbin/'"'"' ; ! beeline -u '"'"'jdbc:hive2://HIVE_HOST:10001/;transportMode=http;httpPath=cliservice;principal=hive/_HOST@REALM.COM'"'"' -e '"'"''"'"' 2>&1| awk '"'"'{print}'"'"'|grep -i -e '"'"'Connection refused'"'"' -e '"'"'Invalid URL'"'"''' was killed due timeout after 30 seconds)
Hive / HiveServer2 Process
=============================
Metastore on HIVE_HOST failed (Execution of 'ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'export PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/bin/:/usr/bin/:/usr/sbin/:/usr/hdp/current/hive-metastore/bin'"'"' ; export HIVE_CONF_DIR='"'"'/usr/hdp/current/hive-metastore/conf/conf.server'"'"' ; hive --hiveconf hive.metastore.uris=thrift://HIVE_HOST:9083 --hiveconf hive.metastore.client.connect.retry.delay=1 --hiveconf hive.metastore.failure.retries=1 --hiveconf hive.metastore.connect.retries=1 --hiveconf hive.metastore.client.socket.timeout=14 --hiveconf hive.execution.engine=mr -e '"'"'show databases;'"'"''' was killed due timeout after 30 seconds)
... View more
Labels:
12-22-2015
10:22 PM
@Neeraj Sabharwal @Eric Walk Some comments suggest that Ambari views have issues in HA. Is there a limitation that the Pig and Hive Ambari views cannot work with an HDP cluster in High Availability? Could you please confirm?
... View more
12-22-2015
05:52 PM
@Eric Walk For Hive, as per your suggestion: I stopped Ambari, did kdestroy, did kinit with the ambariserver keytab, and then tried accessing the Hive page, but I still see the same error.
Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; Host Details : local host is: "gateway/192.168.1.8"; destination host is: "NameNode1_Host":8020;
H020 Could not establish connecton to gateway_Host:10000: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused:
... View more
12-22-2015
10:30 AM
@Hemant Kumar @Predrag Minovic I think this is not true for a non-Kerberized cluster. I remember configuring the Pig view for an HA-enabled cluster on HDP 2.3, and it was working fine, though I had not checked the Pig view after Kerberization. When I checked yesterday, all of them were breaking.
... View more
12-22-2015
12:09 AM
@Eric Walk @Neeraj Sabharwal I can access the File view but am still facing issues with Pig and Hive. I followed the steps in the documentation for Pig/Hive as well. When I try to create a new script in Pig, I get the following error:
java.net.UnknownHostException: hahdfs
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
For Hive: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]; DP
... View more
12-21-2015
07:22 PM
Thanks will check and update in a few hours. 🙂
... View more
12-21-2015
07:22 PM
Thanks Mark.
... View more
12-21-2015
06:59 PM
1 Kudo
@Neeraj Sabharwal I started all the DataNodes and then restarted the HDFS master process; this worked!
... View more
12-21-2015
06:16 PM
@Neeraj Sabharwal I thought the same, but I have hardly 50 MB of data on the cluster, and it has been showing this status for the last 3 hours.
... View more
12-21-2015
06:08 PM
Hi, We have configured the NameNodes in HA mode on a Kerberized cluster. We shut down all the nodes over the weekend and started them again today. I obtained a valid TGT before issuing hadoop fs -ls commands. Now when I issue hadoop fs -ls I see the following stack trace (saying it is in safemode). When I checked, HDFS was indeed in safemode. Using the shell command I made it leave safemode, but when we issue hadoop fs -ls / on the console I still see that the NameNodes are in safe mode.
15/12/21 16:37:13 INFO retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB over Namenode_HOST_2/192.168.1.4:8020 after 1 fail over attempts. Trying to fail over after sleeping for 676ms.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1872)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1306)
$ hadoop dfsadmin -safemode get
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
... View more
Labels:
12-21-2015
05:56 PM
1 Kudo
Hi folks, In the Kerberized cluster we integrated AD for Ambari authentication. Using the AD users, I am able to log in to Ambari, and by default it lands on the views. But when I click any of the views, I see an error:
500 Authentication required
org.apache.hadoop.security.AccessControlException: Authentication required at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:334)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:91)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:608)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.access$100(WebHdfsFileSystem.java:458)
at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner$1.run(WebHdfsFileSystem.java:487)
While configuring the File view, here are the properties I've used:
Settings:
WebHDFS Username = ${username}
WebHDFS Authorization = auth=KERBEROS;proxyuser=admin
Cluster Configuration = the cluster HDFS and NameNode details.
After Kerberization I created a principal "ambari-user/ambari-Host_name_here@KDCRealm.com", created a keytab, and copied it onto the Ambari server machine. I stopped the Ambari server, ran ambari-server setup-security, specified the keytab of the newly created ambari-user principal from the KDC, and started the Ambari server again. When I try to access the Ambari view I get the above error. Did anyone face a similar issue? I am following the HDP documentation section Configuring Ambari User Views with a Secure Cluster: http://hortonworks.com/wp-content/uploads/2015/04/AmbariUserViewsTechPreview_v1.pdf Regards, DP
... View more
Labels:
12-21-2015
03:46 PM
@Neeraj Sabharwal I have a local KDC. In the local KDC I have admin/admin, but not in AD.
... View more
12-21-2015
02:34 PM
@Neeraj Sabharwal Yeah, it looks like users are getting synced. However, I think the problem is this: the local user admin is being changed to an LDAP user (the flag changes in the users table). Is that expected behavior?
... View more
12-21-2015
01:50 PM
One more observation, @Neeraj Sabharwal: I updated the admin user's ldap_user flag to 0 as you mentioned and tried running the Ambari LDAP sync operation. On the console I get this error:
Enter Ambari Admin password:
Syncing all....... ERROR: Exiting with exit code 1. REASON: Sync event check failed. Error details: HTTP Error 403: Bad credentials
Now, checking the users table in the Ambari database, I see a few more users have been imported into the users table, and the admin user's ldap_user flag is set to 1 again. 🙂
... View more
12-21-2015
01:41 PM
You were right. The admin user had the ldap_user flag set to 1. Not sure how it got changed...
... View more
12-21-2015
01:30 PM
I logged in to the Ambari database and reset the password to 'admin', but it still did not work.
... View more