Member since
11-04-2016
74
Posts
16
Kudos Received
7
Solutions
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 3326 | 02-28-2019 03:22 AM |
| | 2944 | 02-01-2019 01:15 AM |
| | 4168 | 04-16-2018 03:38 AM |
| | 32649 | 09-16-2017 04:36 AM |
| | 9119 | 09-11-2017 02:43 PM |
02-20-2018
01:18 PM
Apache Livy is not exactly like running Apache Spark on YARN in Zeppelin. Even in Zeppelin 0.7.3 with Livy 0.4.0, there are things you will notice you don't have:

- No multiple outputs: if you have a count, a show, and a bunch of other statements in one block, you will only see the last result in Livy.
- No ZeppelinContext in Livy.
- No dep() for users to add dependencies themselves; you have to add them manually and restart the Livy server.

I like Livy, but I had to move to the Spark interpreter because of these missing features. Also, from time to time you see an error 500, which is really hard to debug to find out what crashed your app, whereas the Spark interpreter will just show you the error itself.
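When Livy only surfaces an opaque error 500, the Livy REST API's session log endpoint is one place to dig for the underlying stack trace. A minimal sketch (the `livy-host` URL and the helper name are placeholders, not from this post):

```python
import json
from urllib.request import Request, urlopen

# Assumption: your Livy server's address -- adjust host/port for your cluster.
LIVY_URL = "http://livy-host:8998"

def session_log_request(session_id, size=100):
    """Build a GET request for Livy's session log endpoint
    (GET /sessions/{id}/log), which often contains the real error
    behind an 'error 500' shown in the notebook UI."""
    return Request(f"{LIVY_URL}/sessions/{session_id}/log?size={size}",
                   headers={"Accept": "application/json"})

# usage (against a live Livy server):
#   with urlopen(session_log_request(0)) as resp:
#       for line in json.load(resp)["log"]:
#           print(line)
```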
01-25-2018
08:40 AM
I just upgraded the entire cluster to 5.14 and the issue still remains: CDH 5.14, CM 5.14.
01-22-2018
07:48 AM
1 Kudo
Unfortunately this happened again in 5.13.1 as I did "Update Hive Metastore NameNodes" and it added the port twice.
01-22-2018
03:11 AM
If I export the current job it shows me this:

"applications" : [ {
"applicationId" : "application_1516618738289_0001",
"name" : "livy-session-0",
"startTime" : "1970-01-01T00:00:00.000Z",
"user" : "maziyar",
"pool" : "root.users.maziyar",
"state" : "RUNNING",
"progress" : 10.0,
"attributes" : { },
"mr2AppInformation" : { }
}, {
"applicationId" : "application_1516618738289_0002",
"name" : "Main",
"startTime" : "1970-01-01T00:00:00.000Z",
"user" : "maziyar",
"pool" : "root.users.maziyar",
"state" : "RUNNING",
"progress" : 10.0,
"attributes" : { },
"mr2AppInformation" : { }
} ],
"warnings" : [ ]
}

The startTime is in 1970 for some reason! This date is really famous in Unix: "January 1, 1970 is the so-called Unix epoch. It's the date where they started counting Unix time. If you get this date as a return value, it usually means that the conversion of your date to the Unix timestamp returned a (near-)zero result, so the date conversion didn't succeed." So is it the backend of Cloudera Manager that returns 0, or a MySQL conversion somewhere being passed an unsupported format?
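As a sanity check, a (near-)zero millisecond timestamp rendered as a UTC date comes out as exactly the value in the export. A minimal Python sketch of that rendering (an illustration of the epoch effect, not Cloudera Manager's actual code):

```python
from datetime import datetime, timezone

def render_start_time(start_time_ms):
    """Render a millisecond timestamp in the ISO-8601 style the export
    uses, e.g. 1970-01-01T00:00:00.000Z for a zero value."""
    dt = datetime.fromtimestamp(start_time_ms / 1000.0, tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"

# A failed/missing conversion (0 ms) yields the value seen in the export:
print(render_start_time(0))              # 1970-01-01T00:00:00.000Z

# whereas the timestamp embedded in the application ID is a plausible
# real start time in January 2018:
print(render_start_time(1516618738289))
```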
01-18-2018
09:17 AM
I cleared all the logs and previous jobs, but CM still has all the finished jobs with the wrong date. Also, it still shows the new apps with that weird duration (it looks like the conversion from milliseconds to another time format went wrong). Does anybody know where this data comes from? I have a MySQL setup for my CM. Can I look there to see whether this is a front-end issue, a back-end issue, or data being inserted into a file/table wrongly from the beginning? Many thanks.
01-15-2018
11:07 AM
1 Kudo
Hello, I am having a problem for which I can't find any logical solution. Every job that requires YARN shows up in the "YARN Applications" UI in Cloudera Manager. Even though I can see all the running jobs in the YARN Applications UI, the ResourceManager UI, or the Spark UI, I have to widen my time selector to a year or two to see the finished jobs. I think this has something to do with the displayed time. All the running jobs have the static `17540.7d` as their duration. At the same time, these applications in the ResourceManager UI show up with the right date/time. As you can see, this makes it really hard to monitor and track anything in the YARN Applications view in Cloudera Manager. Cloudera Manager Express: 5.13.1, CDH: 5.13.1, Ubuntu Server 16.04. I also checked the date/time on all the machines to make sure they are in sync, but unfortunately I can't find any issue in my cluster. NOTE: there is one similar issue here, but I guess he can't see any jobs even by widening the time window (I can see jobs with a wider time window of 1-2 years): http://community.cloudera.com/t5/Batch-Processing-and-Workflow/Completed-YARN-applications-not-visible-in-Cloudera-Manager-s/m-p/19858#M617 Best, Maziyar
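For what it's worth, `17540.7d` is consistent with the duration being computed as `now - startTime` with `startTime` stuck at 0 (the 1970 epoch): about 48 years expressed in days. A minimal Python sketch of that arithmetic (an assumption about how the UI derives the duration, not CM's actual code):

```python
from datetime import datetime, timezone

def duration_days(start_time_ms, now):
    """Duration in days the way a UI might compute it:
    (now - start) in milliseconds, divided by ms per day."""
    now_ms = now.timestamp() * 1000.0
    return (now_ms - start_time_ms) / 86_400_000.0

# With startTime = 0, a "running" job observed in early January 2018
# shows a duration of roughly 17540 days -- about 48 years:
observed = datetime(2018, 1, 9, 16, 48, tzinfo=timezone.utc)
print(round(duration_days(0, observed), 1))  # 17540.7
```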
Labels:
- Apache YARN
11-02-2017
01:57 AM
1 Kudo
I think you are missing this, which was mentioned here:

[desktop]
use_new_editor=true

Hope it helps.
09-16-2017
04:36 AM
Hi, Sorry I forgot to come back here and say how I found a quick workaround. So, here's how I do it:

// every database can have a different warehouse; I am not using the default
// warehouse, but users' directories for warehousing DBs and tables
val options = Map("path" -> "this is the path to your warehouse")

// and simply write it!
df.write.options(options).saveAsTable("db_name.table_name")

So as you can see, a simple path to the warehouse of the database solves the problem. I want to say Spark 2 is not aware of this metadata, but when you look at spark.catalog you can see everything is there! So I don't know why it can't determine the path to your database when you want to write/save. Hope this helps 🙂
09-12-2017
03:49 PM
OK, I have tried it, and it seems it's best to copy hive-site.xml into livy/conf/; Livy will then load it in every session. Best,
09-12-2017
03:31 PM
Hi, Something to be careful about: when you do "Deploy Client Configuration" on your Spark2 service, it will remove the symlink or the hive-site.xml you copied. I have noticed all these configs are in $SPARK_CONF_DIR/yarn-conf/, so I wish Livy could also load them when it starts up Spark.