Member since: 09-16-2017
Posts: 20
Kudos Received: 1
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 1142 | 12-01-2017 05:05 PM |
|  | 2601 | 12-01-2017 04:59 PM |
12-01-2017
05:05 PM
All - just an update. The ES-Hadoop connector is, as you would expect, geared more toward Elasticsearch than toward Spark or Hadoop. It lets me connect to the Elasticsearch cluster from spark-shell or PySpark, which is great for ad-hoc queries; for long-term data movement, however, use Apache NiFi. The setup, if you are interested, can be found on Stack Overflow, where I got some great help: https://stackoverflow.com/questions/47399391/using-nifi-to-pull-elasticsearch-indexes?noredirect=1#comment82139433_47399391 One issue I ran into was that we have SSL enabled on Elasticsearch, and even while referencing that certificate (I had to convert the PEM format to JKS, since Hadoop/Spark only understand JKS), it wasn't working. After working with Elasticsearch support, they had me add the cert to the cacerts truststore in my Java installation, and everything worked after that. I had to do this on each box in my cluster if I ran a job across the cluster; in stand-alone mode, the single box was fine. Either way, this can save you a lot of trouble: just add your Elasticsearch cert to cacerts using keytool, roughly as sketched below.
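A minimal sketch of that keytool import, assuming a PEM-format CA certificate and the default JDK truststore location/password; the alias and paths here are placeholders, not the exact ones from my setup:

```bash
# Import the Elasticsearch CA certificate (PEM) into the JVM's default truststore (cacerts).
# The cert path, alias, JAVA_HOME layout and the default "changeit" password are assumptions;
# adjust them for your JDK install, and repeat on every node that runs Spark/Hadoop tasks.
sudo keytool -importcert \
  -alias elasticsearch-ca \
  -file /etc/elasticsearch/certs/ca.pem \
  -keystore "$JAVA_HOME/jre/lib/security/cacerts" \
  -storepass changeit -noprompt

# Verify the cert landed in the truststore.
keytool -list -keystore "$JAVA_HOME/jre/lib/security/cacerts" -storepass changeit | grep -i elasticsearch
```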
12-01-2017
04:59 PM
All - just an update. I was able to get help resolving this on StackOverflow. See the post here: https://stackoverflow.com/questions/47399391/using-nifi-to-pull-elasticsearch-indexes?noredirect=1#comment82139433_47399391
12-01-2017
05:06 AM
I have a flow that gets data from Elasticsearch (ScrollElasticsearchHttp); from there I run it through EvaluateJsonPath, then SplitJson, then InferAvroSchema to get it into a format I can feed to PutParquet and write into my HDFS. As you can tell, my goal is to read entire indexes from Elasticsearch and land them in HDFS as Parquet so I can do some machine learning with Hadoop/Spark. My issue is that the flow proceeds all the way to the end (PutParquet) after the first scroll (10,000 scroll size). It writes the file, and then on the second scroll it errors out because the file already exists. The goal is to read the entire index before processing further, so the whole index is written to HDFS as one file. I thought MergeContent would do this, but I can't get it to work the way I expect, so maybe I'm going about it the wrong way. How can I make sure that the data from all of the scrolls ends up in one file at the end and is placed as Parquet in HDFS? I assume there is a processor that merges the content/data as it comes in, but I'm not sure which processor to use or where in my flow it should be placed. Any help on how to solve this? Thanks so much!!
Labels:
- Apache NiFi
11-28-2017
11:26 PM
All - thanks in advance for any help that can be provided. Big picture: I have stood up a Hadoop/Spark cluster using Ambari (HDP 2.6.2 / Hadoop 2.7.3 / Spark 2.1.1) and want to do some advanced machine learning/analytics on some data. The first use case is anomaly detection in syslog, and we get that data from our Elasticsearch cluster. I was pointed to NiFi as a solution for automating the data movement from ES to HDFS. Each day in ES is a unique index (i.e. logstash-syslog-2017.11.28, etc.), and my goal is to set up NiFi to grab those indexes and save them to HDFS in Parquet format. Since everything in HDFS is going to be processed with MapReduce or Spark, this is a natural choice. I have a basic flow set up with some help from Stack Overflow and Reddit (see flow.png). For now it uses a GenerateFlowFile processor, since I already know how my data comes back: I took a single returned document and put it in as custom text so I wouldn't have to pound my ES cluster during testing. Note that I have three outputs (PutFile, PutHDFS and PutParquet). The first two work just fine - they write files locally and to HDFS with no problem. The problem comes in on the third output, which is the one I actually need: PutParquet. I get an error: "Failed to write due to org.apache.nifi.processors.hadoop.exception.RecordReaderFactoryException: Unable to create RecordReader: Unable to create RecordReader." So I thought maybe it was due to all of the nested JSON I get back from Elasticsearch. I decided to go back to basics and get something working, so I followed the example from https://community.hortonworks.com/articles/140422/convert-data-from-jsoncsvavro-to-parquet-with-nifi.html - but I still get the same error, even when I change the JSON example and the Avro schema to the ones specified in that article. So I must have something unnecessary in my flow, or some setting I am unaware I need to configure. Things must have changed between whatever version of NiFi that article used and 1.4: the example shows the schema settings as part of the PutParquet processor options, whereas in my version they come as part of the JsonTreeReader and the associated Avro schema. I am new to all of this, and while I get the gist of NiFi, I start to get lost around the JsonTreeReader and Avro schema pieces. What I'd like is to read JSON from Elastic, convert that JSON to Parquet and store it. Do I need to define a schema for this, or is there some automated way I can have NiFi convert the incoming JSON into something that can be stored as Parquet? Here is an example of how my data looks coming back from ES (some fields have been masked for obvious reasons). Any help on sorting this out and getting a working flow from ES to a Parquet file would be amazing - I've been working on this for around a week and am starting to come to the end of my rope. Thanks so much!
[
{
"hits": [
{
"app": {
"threadID": "6DE2CB70",
"environment": "DEV",
"service": "commonvpxLro",
"opID": "opID=HB-host-3009@205149-7520446b-5c",
"service_info": "VpxLRO"
},
"severity": "info",
"hostIP_geo": {
"location": {
"lon": XXXX,
"lat": XXXX
},
"postal_code": "Location 1"
},
"hostname": "DEV3-02",
"@timestamp": "2017-11-27T22:20:51.617Z",
"hostIP": "10.10.0.1",
"meta": {
"grok_match": "grok_match_1",
"received_at_indexer": "2017-11-27T22:20:51.727Z",
"received_from": "10.10.0.1",
"processed_at_indexer": "xvzzpaXXXXc",
"kafka_topic": "syslog",
"received_at_shipper": "2017-11-27T22:20:51.661Z",
"processed_at_shipper": "xvzzpaXXXXb"
},
"@version": "1",
"syslog": {
"program": "Vpxa",
"type": "vmware_esxi",
"priority": "166"
},
"message": "-- BEGIN session[938e0611-282b-22c4-8c93-776436e326c7]52dd2640-f406-2da1-6931-24930920b5db -- -- vpxapi.VpxaService.retrieveChanges -- 938e0611-282b-22c4-8c93-776436e326c7\n",
"type": "syslog",
"tags": [
"syslog",
"vmware",
"esxi",
"index_static",
"geoip"
]
}
]
}
]
Tags:
- Data Ingestion & Streaming
- nifi-controller-service
- nifi-processor
- nifi-repository
- nifi-streaming
- nifi-templates
Labels:
- Apache NiFi
10-31-2017
07:36 PM
Perfect! Thanks so much!
10-31-2017
06:03 PM
So, I think I fixed this: as the 'hdfs' user, I simply ran 'hdfs dfs -chmod -R 777 /spark2-history' and restarted the services. I'm no longer seeing the access/permission errors. Let me know if this was the correct fix or if I did something I shouldn't have.... Thanks!
10-31-2017
05:21 PM
@Aditya Sirna - I checked this prior to posting - the Spark user does own that directory, but I don't think the issue is with the Spark user. It seems to be with the other user, zx6878a: org.apache.hadoop.security.AccessControlException: Permission denied: user=spark, access=READ, inode="/spark2-history/local-1505774309971":zx6878a:hadoop:-rwxrwx--- I think what is happening is that this user is running PySpark / spark-submit under his own username, not as the Spark user. At least that is my guess. Would doing a chmod on that /spark2-history folder to give everyone read and write access (chmod 777, roughly as sketched below) be appropriate, and would it fix this?
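A minimal sketch of what that would look like, assuming the fix is the blunt recursive 777 discussed here (whether that is appropriate is exactly the open question):

```bash
# Run as the hdfs superuser. This makes every existing event log in the history directory
# world-readable/writable - the quick-and-dirty option, not necessarily best practice.
sudo -u hdfs hdfs dfs -chmod -R 777 /spark2-history

# Confirm the history server's spark user can now read the logs.
sudo -u hdfs hdfs dfs -ls /spark2-history | head
```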
10-30-2017
10:12 PM
We are just getting underway using Spark and the rest of our HDP 2.6.2 distribution for some machine learning. I got a ticket from our infrastructure guys late last week stating that disk usage was running high on one of my nodes. This particular node happens to be a Spark2 History Server, so I went to check it out. Sure enough, /var/log/spark2/ had one log that was over 14 GB! I removed that file and restarted the service, and when I came in this morning after the weekend to check on it, it was back up to ~12 GB. So I checked the logs and see stuff like this:
17/10/30 15:00:47 INFO FsHistoryProvider: Replaying log path: hdfs://xczzpa0073.apsc.com:8020/spark2-history/local-1505774309971
17/10/30 15:00:47 ERROR FsHistoryProvider: Exception encountered when attempting to load application log hdfs://xczzpa0073.apsc.com:8020/spark2-history/local-1505774309971
org.apache.hadoop.security.AccessControlException: Permission denied: user=spark, access=READ, inode="/spark2-history/local-1505774309971":zx6878a:hadoop:-rwxrwx---
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1939)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1913)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2001)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1970)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1883)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
at sun.reflect.GeneratedConstructorAccessor5.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1240)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1225)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1213)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:309)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:274)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:266)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1538)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:331)
at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:327)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:327)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:786)
at org.apache.spark.scheduler.EventLoggingListener$.openEventLog(EventLoggingListener.scala:312)
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:647)
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:464)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$3$$anon$4.run(FsHistoryProvider.scala:352)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=spark, access=READ, inode="/spark2-history/local-1505774309971":zx6878a:hadoop:-rwxrwx---
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:219)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1939)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPathAccess(FSDirectory.java:1913)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:2001)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1970)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1883)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:700)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:377)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1554)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:272)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:291)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:203)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:185)
at com.sun.proxy.$Proxy11.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1238)
... 20 more
Ok, so it appears to be a permissions thing, but I am not sure how to fix it. A little background: I am in an enterprise setting, but this is a vanilla HDP deployment set up with Ambari - no AD/Kerberos, I am letting local process accounts deal with things. When you see a user like zxXXXX, that is a local user; in the example above, it is one of our contractors doing some of the heavy lifting on our machine learning algorithms. It looks like maybe he is running Spark or PySpark as his own user, not the Spark user, but I can't really tell. Any idea what is going on here and how I can fix it to keep these error logs from building up? Thanks!
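One quick way to check that theory (a sketch; the path is the one from the log above) is to look at who owns the individual event logs in the history directory:

```bash
# List the Spark event logs with their owners and permissions. Entries owned by individual
# users (e.g. zx6878a) with mode -rwxrwx--- are the ones the spark history server user cannot read.
hdfs dfs -ls /spark2-history

# Show the directory entry itself (owner, group, mode) for comparison.
hdfs dfs -ls -d /spark2-history
```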
Labels:
- Apache Ambari
- Apache Spark
10-25-2017
10:01 PM
Since Spark 2.2 is not provided by HDP (yet), and we are trying to use computeSVD, is there an alternative that gives the same functionality under Spark 2.1? Basically, the code we are using to compute the singular value decomposition of a matrix of message identifiers needs computeSVD. It is provided in the Scala API of Spark 2.1.1, but not in the Python API. Is there something else I can use for this?
10-25-2017
05:55 PM
Thanks in advance on this - I am running Ambari and have deployed HDP 2.6.2.0 on a 12-node cluster (one name node, one secondary name node and 10 data nodes). When I did this originally, I deployed Spark 2.1.1, HDFS 2.7.3 and the other dependencies. One of our data scientists wants to use computeSVD, but it is only available via the Python API in Spark 2.2. I'd like to upgrade Spark in place, but I'm not sure whether I need to upgrade other things, whether I can do this via Ambari, or what. Is there a process for doing this? Is Spark 2.2 provided in HDP at all yet? Thanks!
Labels:
- Apache Ambari
- Apache Spark
10-17-2017
11:52 PM
1 Kudo
I have a Hadoop/Spark cluster set up via Ambari (HDP 2.6.2.0). Now that the cluster is running, I want to feed some data into it. We have an on-premises Elasticsearch cluster (version 5.6), and I want to set up the ES-Hadoop connector (https://www.elastic.co/guide/en/elasticsearch/hadoop/current/doc-sections.html) that Elastic provides so I can dump some data from Elastic to HDFS. I grabbed the ZIP file with the JARs and followed the directions in a blog post from CERN: https://db-blog.web.cern.ch/blog/prasanth-kothuri/2016-05-integrating-hadoop-and-elasticsearch-%E2%80%93-part-2-%E2%80%93-writing-and-querying So far this seems reasonable, but I have some questions: 1. We have SSL/TLS set up on our Elasticsearch cluster, so when I perform a query I get an error using the example from the blog. What do I need to do on my Hadoop/Spark side and on the Elastic side to make this communication work? 2. I read that I need to add those JARs to the Spark classpath - is there a rule of thumb as to where I should put them on my cluster? I assume one of my Spark client nodes, but I am not sure. Also, once I put them there, is there a way to add them to the classpath so that all of my nodes / client nodes have the same classpath? Maybe Ambari provides something for that? Basically, what I am looking for is to be able to run a query against ES from Spark that triggers a job telling ES to push "X" amount of data to my HDFS. Based on what I can read on the Elastic site, this is how I think it should work, but I am really confused by the documentation - it's sparse and has confused both me and my Elastic team. Can someone provide some clear directions, or some clarity around what I need to do to set this up?
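To make the question concrete, here is a rough sketch of the kind of invocation I have in mind, pieced together from the es-hadoop docs; the jar path, host name, truststore path/password and index name are placeholders, and the SSL/truststore part is exactly what I am unsure about:

```bash
# Launch PySpark with the es-hadoop connector jar on the classpath and point it at the
# TLS-enabled cluster. Every path/host below is a placeholder; the es.net.ssl.* settings
# assume a JKS truststore that already contains the Elasticsearch CA. The "spark." prefix
# is required by --conf and, per the es-hadoop docs, is stripped by the connector.
pyspark \
  --jars /opt/elasticsearch-hadoop/dist/elasticsearch-hadoop-5.6.3.jar \
  --conf spark.es.nodes=es-node-01.example.com \
  --conf spark.es.port=9200 \
  --conf spark.es.net.ssl=true \
  --conf spark.es.net.ssl.truststore.location=file:///etc/pki/es/truststore.jks \
  --conf spark.es.net.ssl.truststore.pass=changeit

# Inside the shell, reading an index and writing it back out as Parquet would then look like:
#   df = spark.read.format("org.elasticsearch.spark.sql").load("logstash-syslog-2017.11.28")
#   df.write.parquet("hdfs:///data/syslog/2017.11.28")
```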
Labels:
- Apache Hadoop
- Apache Spark
10-16-2017
03:37 PM
@Subramaniam Ramasubramania - Thanks for the feedback. I'm glad this will work with multiple applications. I tried it with the Pi example, but since that example executes so fast, it was still grabbing the 40000 port for both jobs. I'm relieved to know this will work long term. So for the port spacing, based on your recommendation, should I do the following: spark.blockManager.port = 40000, spark.broadcast.port = 40033, spark.driver.port = 40065, spark.executor.port = 40097, spark.fileserver.port = 40129, spark.replClassServer.port = 40161, spark.port.maxRetries = 5000? I know 5000 ports is a lot for the max retries, but I could probably bring that down to like 250, I just wanted to be safe. Does this sound better than what I have?
10-13-2017
08:00 PM
So, I decided to start back at square one and assign specific ports via the Spark settings (in the Ambari interface). Here are those custom settings: spark.blockManager.port = 40000, spark.broadcast.port = 40001, spark.driver.port = 40002, spark.executor.port = 40003, spark.fileserver.port = 40004, spark.replClassServer.port = 40005, spark.port.maxRetries = 5000. I added that last one after reading that it gives Spark a range to use. Now things are working. Will this somehow prevent me from running multiple applications at once? That is the feeling I get. What is the purpose of these settings?
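For anyone who lands here with the same question, my understanding (hedged) is that each *.port property pins a base port, and spark.port.maxRetries is how many successive ports Spark will try above that base before giving up, which is what lets several applications run at the same time. A sketch of seeing that in action; the Pi example path is the usual HDP 2.6 location and is an assumption:

```bash
# Each Spark application starts at the configured base port and walks upward, trying up to
# spark.port.maxRetries successive ports until it finds a free one, so two concurrent jobs
# with identical settings bind e.g. 40000 and 40001 rather than clashing.
spark-submit \
  --master yarn \
  --conf spark.blockManager.port=40000 \
  --conf spark.driver.port=40002 \
  --conf spark.port.maxRetries=250 \
  /usr/hdp/current/spark2-client/examples/src/main/python/pi.py 1000

# While a couple of jobs are running, confirm which ports were actually bound on the driver host.
ss -tlnp | grep -E ':400[0-9]{2}'
```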
10-13-2017
06:01 PM
I checked on the incoming/outgoing configuration, and that is not an issue. We don't configure things to that level; we either open the port or we don't. I found another post that seems similar to my issue: http://apache-spark-user-list.1001560.n3.nabble.com/spark-shell-driver-interacting-with-Workers-in-YARN-mode-firewall-blocking-communication-td5237.html . I think what I am facing is: on the data nodes, how does Spark know which port the worker should be listening on? It makes sense that, even though the firewall is open, nothing can connect if nothing is listening....
10-12-2017
11:06 PM
So, something one of my security guys just mentioned: even though I have the ports open, how does the worker node know which port to listen on? He's got me convinced that even though the port range is open, something still needs to be listening on those ports when the Spark driver tries to be contacted by the job.... This seems logical, especially since it works when I turn off the firewall services. Am I going down the wrong path here?
10-12-2017
07:30 PM
I have just rolled out a Hadoop/Spark cluster in an effort to kick-start a data science program at my company. I used Ambari as the manager and installed the Hortonworks distribution (HDFS 2.7.3, Hive 1.2.1, Spark 2.1.1, as well as the other required services). By the way, I am running RHEL 7. I have 2 name nodes, 10 data nodes, 1 Hive node and 1 management node (Ambari). I built a list of firewall ports based on the Apache and Ambari documentation and had my infrastructure guys push those rules. I ran into an issue with Spark wanting to pick random ports. When I attempted to run a Spark job (the traditional Pi example), it would fail, as I did not have the whole ephemeral port range open. Since we will probably be running multiple jobs, it makes sense to let Spark handle this and just choose from the ephemeral range of ports (1024 - 65535) rather than specifying a single port. I know I can pick a range, but to make it easy I just asked my guys to open the whole ephemeral range. At first my infrastructure guys balked at that, but when I told them the purpose, they went ahead and did so. Based on that, I thought I had my issue fixed, but when I attempt to run a job, it still fails with:
Log Type: stderr
Log Upload Time: Thu Oct 12 11:31:01 -0700 2017
Log Length: 14617
Showing 4096 bytes of 14617 total. Click here for the full log.
Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:52 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:53 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:54 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:55 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:56 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:57 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:57 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:28:59 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:00 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:01 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:02 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:03 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:04 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:05 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:06 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:06 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:07 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:09 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:10 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:11 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:12 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:13 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:14 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:15 ERROR ApplicationMaster: Failed to connect to driver at 10.10.98.191:33937, retrying ...
17/10/12 11:29:15 ERROR ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:607)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:461)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:283)
at org.apache.spark.deploy.yarn.ApplicationMaster$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:783)
at org.apache.spark.deploy.SparkHadoopUtil$anon$1.run(SparkHadoopUtil.scala:67)
at org.apache.spark.deploy.SparkHadoopUtil$anon$1.run(SparkHadoopUtil.scala:66)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:781)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:804)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
17/10/12 11:29:15 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
17/10/12 11:29:15 INFO ShutdownHookManager: Shutdown hook called
At first I thought maybe I had some sort of misconfiguration between Spark and the namenodes/datanodes. However, to test it, I simply stopped firewalld on every node and attempted the job again, and it worked just fine. So, my question: I have the entire 1024 - 65535 port range open, and I can see the Spark drivers are trying to connect on those high ports (as shown above - the 30k - 40k range). Yet for some reason, when the firewall is on it fails, and when it's off it works. I checked the firewall rules and sure enough, the ports are open - and those rules are working, as I can access the web services for Ambari, YARN and HDFS, which are specified in the same firewalld XML rules file.... I am new to Hadoop/Spark, so I am wondering: is there something I am missing? Is there some lower port under 1024 I need to account for? Here is a list of the ports below 1024 I have open, in addition to the 1024 - 65535 port range:
88
111
443
1004
1006
1019
It's quite possible I missed a lower-numbered port that I really need and just don't know it. Above that, everything else should be handled by the 1024 - 65535 port range. Thank you in advance.
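For reference, roughly what opening and checking that range with firewalld looks like - a sketch only, since my infrastructure team manages the actual rules and the zone may differ:

```bash
# Open the ephemeral range Spark draws its driver/executor ports from, then reload firewalld.
sudo firewall-cmd --permanent --zone=public --add-port=1024-65535/tcp
sudo firewall-cmd --reload

# Confirm the rule is active, and see what is actually listening while a job runs.
sudo firewall-cmd --zone=public --list-ports
sudo ss -tlnp | grep -E ':3[0-9]{4}|:4[0-9]{4}'
```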
Labels:
- Apache Spark
09-17-2017
06:03 AM
Ok everyone - I think I found the final piece of the solution: https://community.hortonworks.com/questions/12663/hdp-install-issues-about-hdp-select.html I had a few nodes where, for some reason, I had to reinstall the hdp-select package. I'm not sure why it wasn't installed by the agent when the deployment started, but on 5 or 6 nodes, when I ran "yum install hdp-select", it turned out not to be installed yet. After doing that, I re-ran the installation and BAM! Everything started installing as expected. I do believe installing libtirpc-devel-0.2.4-0.8.el7_3.i686.rpm was a key piece of this too (see my other comment). In the end - OS: RHEL 7.2, Ambari 2.5, HDP 2.6 - I was able to install HDFS, all the Ambari Metrics stuff, Spark2 and Hive without an issue...
09-17-2017
06:03 AM
Another update - I attempted the suggestion from here: https://community.hortonworks.com/questions/112821/installing-a-3-node-cluster-in-aws-and-facing-some.html I installed the libtirpc-devel package using the following method:
wget http://mirror.5ninesolutions.com/centos/7.3.1611/updates/x86_64/Packages/libtirpc-devel-0.2.4-0.8.el7_3.i686.rpm
yum install libtirpc-devel-0.2.4-0.8.el7_3.i686.rpm
ambari-server reset
Then I started the installation all over again, twice - once with HDP 2.5 and once with 2.6. Neither worked, and I get the same error on the installation:
2017-09-15 18:58:41,430 - Package['hadoop_2_6_0_3_8'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-09-15 18:58:41,539 - Installing package hadoop_2_6_0_3_8 ('/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8')
2017-09-15 18:58:41,922 - Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8' returned 1. Error: Nothing to do
2017-09-15 18:58:41,922 - Failed to install package hadoop_2_6_0_3_8. Executing '/usr/bin/yum clean metadata'
2017-09-15 18:58:42,235 - Retrying to install package hadoop_2_6_0_3_8 after 30 seconds
Still searching for a solution.....
09-17-2017
06:03 AM
Also, as an update - I have attempted the process described in https://community.hortonworks.com/questions/67376/hdp-25-installation-problem-in-centos7.html. However, I get the same failures installing Hadoop/HDFS... I'm still investigating and trying things, but wanted to update the post to note that I have tried this solution.
09-17-2017
06:03 AM
Hello all - I am reaching out to the community as I have hit a wall. I am attempting to install a Hadoop/Spark cluster on a series of 13 machines: 2 name nodes, 10 data nodes, and 1 Hive server. These machines have been provided to me by my infrastructure team, so I have little say over what I get. They are installed with RHEL 7.3 (Maipo). Memory/CPU/disk are not an issue at this point; I have 8-core/64 GB/1 TB boxes. This is a small proof of concept. I have attempted the install with HDP 2.6 and reverted back to try 2.5.3.0 (the last attempt). Each time I get the same issue: it seems Ambari is having trouble installing the HDFS client (package: hadoop_2_6_0_3_8). Installing the Ambari server and agents was smooth - no issues there at all. Registering the agents and hosts went fine too. It was only when I attempted to deploy the cluster that the trouble started. Here is the output from one of the failures:
stderr: /var/lib/ambari-agent/data/errors-147.txt
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 78, in <module>
HdfsClient().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_client.py", line 38, in install
self.install_packages(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 708, in install_packages
retry_count=agent_stack_retry_count)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 54, in action_install
self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 53, in install_package
self.checked_call_with_retries(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 86, in checked_call_with_retries
return self._call_with_retries(cmd, is_checked=True, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 98, in _call_with_retries
code, out = func(cmd, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8' returned 1. Error: Nothing to do
stdout: /var/lib/ambari-agent/data/output-147.txt
2017-09-15 16:58:19,497 - Stack Feature Version Info: Cluster Stack=2.5, Cluster Current Version=None, Command Stack=None, Command Version=None -> 2.5
2017-09-15 16:58:19,508 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
User Group mapping (user_group) is missing in the hostLevelParams
2017-09-15 16:58:19,509 - Skipping creation of User and Group as host is sys prepped or ignore_groupsusers_create flag is on
2017-09-15 16:58:19,509 - Skipping setting dfs cluster admin and tez view acls as host is sys prepped
2017-09-15 16:58:19,509 - FS Type:
2017-09-15 16:58:19,509 - Directory['/etc/hadoop'] {'mode': 0755}
2017-09-15 16:58:19,511 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2017-09-15 16:58:19,526 - Initializing 2 repositories
2017-09-15 16:58:19,526 - Repository['HDP-2.5'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.3.0', 'action': ['create'], 'components': [u'HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP', 'mirror_list': None}
2017-09-15 16:58:19,536 - File['/etc/yum.repos.d/HDP.repo'] {'content': '[HDP-2.5]\nname=HDP-2.5\nbaseurl=http://public-repo-1.hortonworks.com/HDP/centos7/2.x/updates/2.5.3.0\n\npath=/\nenabled=1\ngpgcheck=0'}
2017-09-15 16:58:19,537 - Repository['HDP-UTILS-1.1.0.21'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7', 'action': ['create'], 'components': [u'HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2017-09-15 16:58:19,541 - File['/etc/yum.repos.d/HDP-UTILS.repo'] {'content': '[HDP-UTILS-1.1.0.21]\nname=HDP-UTILS-1.1.0.21\nbaseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.21/repos/centos7\n\npath=/\nenabled=1\ngpgcheck=0'}
2017-09-15 16:58:19,541 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-09-15 16:58:19,652 - Skipping installation of existing package unzip
2017-09-15 16:58:19,652 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-09-15 16:58:19,663 - Skipping installation of existing package curl
2017-09-15 16:58:19,663 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-09-15 16:58:19,674 - Skipping installation of existing package hdp-select
2017-09-15 16:58:19,888 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-09-15 16:58:19,901 - Stack Feature Version Info: Cluster Stack=2.5, Cluster Current Version=None, Command Stack=None, Command Version=None -> 2.5
2017-09-15 16:58:19,933 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-09-15 16:58:19,953 - checked_call['rpm -q --queryformat '%{version}-%{release}' hdp-select | sed -e 's/\.el[0-9]//g''] {'stderr': -1}
2017-09-15 16:58:19,985 - checked_call returned (0, '2.6.0.3-8', '')
2017-09-15 16:58:19,996 - Package['hadoop_2_6_0_3_8'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2017-09-15 16:58:20,107 - Installing package hadoop_2_6_0_3_8 ('/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8')
2017-09-15 16:58:20,490 - Execution of '/usr/bin/yum -d 0 -e 0 -y install hadoop_2_6_0_3_8' returned 1. Error: Nothing to do
2017-09-15 16:58:20,490 - Failed to install package hadoop_2_6_0_3_8. Executing '/usr/bin/yum clean metadata'
2017-09-15 16:58:20,805 - Retrying to install package hadoop_2_6_0_3_8 after 30 seconds
Command failed after 1 tries
I'd be forever in your debt if you can help me figure this out. I have been pulling my hair out all week and need to have a running cluster by Monday for a project start date. Does anyone have any ideas about the issue above and how to resolve it? I also found the following post, which seems similar to my issue (though I'm not 100% sure): https://community.hortonworks.com/questions/96763/hdp-26-ambari-install-fails-on-rhel-7-on-libtirpc.html I have little control over the OS version, but if anyone has a specific recipe to get Ambari/Hadoop/Spark installed on a working cluster with RHEL 7.3, I am all ears.
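For anyone hitting the same "Error: Nothing to do" from yum, a few checks worth running on the affected nodes - a sketch, assuming the default repo names Ambari lays down:

```bash
# Make sure the HDP repos Ambari created are enabled and the metadata is fresh.
yum clean all
yum repolist enabled | grep -i -E 'HDP|HDP-UTILS'

# See whether yum can find the package Ambari is asking for at all.
yum list available 'hadoop_2_6*'

# The linked threads point at hdp-select (and libtirpc-devel); confirm hdp-select is
# actually installed on each node before re-running the deployment wizard.
rpm -q hdp-select || yum install -y hdp-select
```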
Labels: