Member since: 01-05-2016
Posts: 55
Kudos Received: 37
Solutions: 6

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 804 | 10-21-2019 05:16 AM
 | 3762 | 01-29-2018 07:05 AM
 | 2577 | 06-27-2017 06:42 AM
 | 37045 | 05-26-2016 04:05 AM
 | 26693 | 05-17-2016 02:15 PM
05-26-2020
05:01 AM
Hi, as we all know, CDH 6.3.3 and subsequent versions are no longer available under the "Express" licensing model. Yet I was in the process of setting up a 6.3.1 installation, and apparently this is not possible either, because valid (enterprise) authentication is required. What am I doing wrong? Below is a screenshot from the Installation Guide. When I try to import the 6.3.1 repository (WITHOUT username and password, as stated in the documentation itself for versions < 6.3.3), I get the following error:

# rpm --import https://archive.cloudera.com/p/cm6/6.3.1/redhat7/yum/RPM-GPG-KEY-cloudera
curl: (22) The requested URL returned error: 401 Authentication required
error: https://archive.cloudera.com/p/cm6/6.3.1/redhat7/yum/RPM-GPG-KEY-cloudera: import read failed(2)

What am I doing wrong? @Cloudera1 Is this a mistake I'm making, or a problem with Cloudera implementing their policies in the wrong way? Thanks for any insights.
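For reference, with valid paywall credentials the same import should work by passing them as HTTP basic auth in the URL (a sketch; <USERNAME> and <PASSWORD> are placeholders for subscription credentials, not values from this thread):

# Import the repository GPG key, authenticating against the paywalled archive
rpm --import https://<USERNAME>:<PASSWORD>@archive.cloudera.com/p/cm6/6.3.1/redhat7/yum/RPM-GPG-KEY-cloudera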
10-21-2019
05:16 AM
1 Kudo
You can query the API exposed by Cloudera Manager and simplify your life. For example, you can run the following:

curl -u <CM_USER>:<CM_PASSWD> http://<CM_IP_ADDRESS>:7180/api/v19/clusters/<CLUSTER_NAME>/services/hive2

You'll get a JSON answer with all the details about the desired service's status. You can then parse the JSON (e.g. using "jq", or directly inside your bash script) and take the desired actions. HTH
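A minimal sketch of the parsing step, assuming jq is installed and relying on the "serviceState" field that the CM API returns for a service (placeholders as above):

# Extract the service state (e.g. STARTED, STOPPED) from the API response
STATE=$(curl -s -u <CM_USER>:<CM_PASSWD> \
  http://<CM_IP_ADDRESS>:7180/api/v19/clusters/<CLUSTER_NAME>/services/hive2 | jq -r '.serviceState')

# Take action based on the result
if [ "$STATE" != "STARTED" ]; then
  echo "hive2 is in state $STATE"  # e.g. restart it, send an alert, etc.
fi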
01-29-2018
07:26 AM
1) Apparently, yes.
2) The name of the user you're trying to use to log in to the remote system, I suppose. Please note that the user you specify here would be the user that "oozie" runs as, so you may eventually hit other problems of an unpredictable nature when using Oozie.
3) I don't really know, sorry about that... The fact is that even though I'm fairly sure I've understood the cause of your issue, I've never had to deal with it directly myself. Maybe the easiest way would be to follow the additional suggestions I wrote in my previous answer (give the OS user "yarn" permission to "ssh" and/or "su"). Or, alternatively, you could create a "yarn" user on the remote system and grant it the correct permissions to reach the final working directory. I hope you'll manage to get through the problems and make it 🙂
01-29-2018
07:05 AM
This is probably related to the fact that a shell action, when run from Oozie, runs as user "yarn" and not as the user you're specifying in the ssh command. You can refer to this thread for more information about the issue: https://community.cloudera.com/t5/Batch-Processing-and-Workflow/How-to-run-Oozie-workfllow-or-action-as-another-user/td-p/26794 It should all boil down (in case your cluster is not secured with Kerberos) to setting up your environment, specifically the "linux-container-executor" configuration parameter (in the Cloudera Manager Admin UI, go to YARN --> Configuration). It's all explained in the linked document. Another alternative could be to grant the OS user "yarn" permission to execute "ssh" and/or "su", so you can switch user in your script before executing the remote ssh command. HTH
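A minimal sketch of that last alternative, assuming sudo is available on the node and "targetuser" stands for the account the script should switch to (both names are illustrative, not taken from this thread):

# /etc/sudoers.d/yarn-oozie: let the yarn user become targetuser without a password
yarn ALL=(targetuser) NOPASSWD: ALL

# In the Oozie shell action script, switch user before running the remote command
sudo -u targetuser ssh targetuser@remote-host '/path/to/command'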
07-22-2017
08:45 AM
Thanks mbigelow, following your suggestions I solved the massive error-logging issue. I ran the specific log file referenced in the Java stack trace through a JSON validator: /user/spark/applicationHistory/application_1494352758818_0117_1 The format was correct according to the validator, so I just moved the file away into a temporary directory. As soon as I did, the error messages stopped clogging the system logs. So it was probably corrupted in a very subtle way... but it was definitely corrupted. That JSON file had indeed been generated by the Spark action that is giving me problems, but it was an OLD file. New instances of that Spark action are generating new JSON logs, and they are not giving the History Server any trouble (the flood of logged exceptions has stopped, as I just said). Unfortunately, the Spark job itself is still failing and needs further investigation on my side, so apparently the failure is not related to that specific error message. But I've solved an annoying problem, and at the same time I've ruled out the possibility that the Spark action issue is related to that Java exception. Thanks!
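One detail worth noting: Spark event logs are newline-delimited JSON (one event per line), so a whole-file validator can miss a single truncated line. A minimal per-line check, assuming the file is still in HDFS and jq is installed (the path is the one from this thread):

# Validate each event-log line separately and report the first malformed one
hdfs dfs -cat /user/spark/applicationHistory/application_1494352758818_0117_1 | \
  nl -ba | while read -r n line; do
    printf '%s\n' "$line" | jq . >/dev/null 2>&1 || echo "malformed JSON at line $n"
  done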
07-19-2017
08:58 AM
Hi mbigelow, I've tried what you suggested (stopping Hive and running the action from the dropdown menu). The process was successful, but the warning in the Spark CLI is still there...
07-19-2017
07:05 AM
Additional info: if I run the Spark CLI (where my Spark procedures work, by the way, unlike when they are launched from Oozie), as soon as I try to define a DataFrame I receive the following warning, which I had never seen before the upgrade:

In [17]: utenti_DF = sqlContext.table("xxxx.yyyy")
17/07/19 15:48:58 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
17/07/19 15:48:58 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

Anyway, as I said, things work from the CLI. I just thought this might be relevant.
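These two warnings usually mean the Spark session is not finding the Hive metastore configuration and is falling back to a local metastore. A minimal sanity check, assuming a CDH-style Spark gateway where the client configuration lives under /etc/spark/conf (an assumption; adjust the path to your deployment):

# Verify that hive-site.xml is present on the Spark client configuration path
ls -l /etc/spark/conf/hive-site.xml

# Check that it points at the real metastore rather than a local one
grep -A1 hive.metastore.uris /etc/spark/conf/hive-site.xml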
07-19-2017
06:47 AM
Hi all, after recently upgrading to CDH 5.11 I get tons of the following "Unexpected end-of-input" log entries, related to "SPARK" (running on YARN) and classified as "ERROR". I'm experiencing malfunctions (failed Oozie jobs) and I believe they are related to these errors, so I'd really like to solve the underlying issue and see if the situation gets any better. In the logs, "source" is: FsHistoryProvider And "message" is: Exception encountered when attempting to load application log hdfs://xxxxx.xxxxx.zz:8020/user/spark/applicationHistory/application_1494352758818_0117_1
com.fasterxml.jackson.core.JsonParseException: Unexpected end-of-input: was expecting closing quote for a string value
at [Source: java.io.StringReader@1fec7fc4; line: 1, column: 3655]
at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1369)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:599)
at com.fasterxml.jackson.core.base.ParserMinimalBase._reportInvalidEOF(ParserMinimalBase.java:532)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString2(ReaderBasedJsonParser.java:1517)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser._finishString(ReaderBasedJsonParser.java:1505)
at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:205)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:20)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:28)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:42)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:35)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:2888)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2034)
at org.json4s.jackson.JsonMethods$class.parse(JsonMethods.scala:19)
at org.json4s.jackson.JsonMethods$.parse(JsonMethods.scala:44)
at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:583)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$16.apply(FsHistoryProvider.scala:410)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$16.apply(FsHistoryProvider.scala:407)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:407)
at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$3$$anon$4.run(FsHistoryProvider.scala:309)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)

Any suggestions/ideas? Thanks!
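Since the parser complains about an unexpected end-of-input, a quick way to see whether the event log was truncated mid-write is to look at its tail (a sketch, using the HDFS path from the error message above):

# Inspect the end of the suspect event log; a cut-off final line suggests an interrupted write
hdfs dfs -cat hdfs://xxxxx.xxxxx.zz:8020/user/spark/applicationHistory/application_1494352758818_0117_1 | tail -c 200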
Labels:
- Apache Spark
- Apache YARN
06-27-2017
06:42 AM
In the end I've been able to solve the issue. I was tricked by the fact that applying the "YARN Resources Allocation Tuning Guide" again from scratch proposed a (in my opinion) misleading way of calculating a few important parameters. The guide can be found here: https://www.cloudera.com/documentation/enterprise/5-10-x/topics/cdh_ig_yarn_tuning.html As a matter of fact, the guide contains a downloadable XLS file, a tool that automatically calculates and proposes values to assign to the YARN configuration. At step 4, it proposed "2" for "yarn.nodemanager.resource.cpu-vcores" and "5632" for "yarn.nodemanager.resource.memory-mb". I later found out that the correct values to assign to those configurations are the two values proposed at step 5. It was definitely partly my fault (I do not have deep knowledge of YARN configuration), but the documentation is partly misleading indeed. I am now fine-tuning, trying different settings for the various Java heap sizes, etc. Still, I have no idea why everything worked fine until recently and stopped working after upgrading to 5.11, as I did not change any configuration while upgrading and the physical resources are identical.
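To verify which values the NodeManagers actually picked up after a change, you can query a running daemon's effective configuration over its web port (a sketch; 8042 is the default NodeManager web UI port, and the host is a placeholder):

# Dump the NodeManager's effective configuration and filter the two tuning parameters
curl -s http://<NODEMANAGER_HOST>:8042/conf | \
  grep -E -A2 'yarn.nodemanager.resource.(cpu-vcores|memory-mb)'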