Member since 04-13-2016 · 80 Posts · 12 Kudos Received · 1 Solution
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 3226 | 03-17-2017 06:10 PM |
06-11-2018 05:38 AM
Yes, that is it. Basically, inside the iterator you would build one large insert statement:

INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

The column names come from the dataframe and the values come from the dataframe itself, so nothing is hard-coded and you can reuse this code for virtually any database that accepts ANSI SQL inserts.
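A minimal sketch of what that iterator body could look like. The `build_insert` helper and its None-to-DEFAULT convention are my own illustration, not from this thread; with a pandas DataFrame you would pass `df.columns` and `df.itertuples(index=False, name=None)`:

```python
def build_insert(table, columns, rows):
    """Assemble one multi-row ANSI-SQL INSERT statement.

    None is rendered as DEFAULT; strings are naively quoted for
    illustration only -- real code should use parameterized queries.
    """
    def fmt(v):
        if v is None:
            return "DEFAULT"
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("'", "''") + "'"

    value_lists = ",\n".join(
        "(" + ", ".join(fmt(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n{value_lists};"

sql = build_insert(
    "films",
    ["code", "title", "did", "date_prod", "kind"],
    [("B6717", "Tampopo", 110, "1985-02-10", "Comedy"),
     ("HG120", "The Dinner Game", 140, None, "Comedy")],
)
```

Because only column names and values vary, the same helper works against any ANSI-SQL target; the quoting shown is deliberately simplistic.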
06-28-2017 12:48 PM
Hi @MPH, check the other related parameters in the documentation I referenced above, such as yarn.timeline-service.entity-group-fs-store.scan-interval-seconds, and make sure to restart the YARN Timeline Server after making the changes.
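For reference, such a parameter is set in yarn-site.xml (sketch only; the value shown is illustrative, check your version's defaults):

```xml
<!-- yarn-site.xml: how often the Timeline Server entity-group
     filesystem store scans for new application data -->
<property>
  <name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
  <value>60</value>
</property>
```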
04-11-2017 02:34 PM
What was your solution to fixing the corrupted interpreter.json? I ran into this same issue and was able to resolve it in the following manner.

My issue arose when my namenode (which runs Ambari and Zeppelin) ran out of disk space. This started a chain reaction in Ambari where services started dropping due to the inability to write data (logs) to the local filesystem. After freeing up some space in the local fs, the failed services became healthy again in Ambari once the health checks returned successful statuses. Zeppelin was then the only one not working, and restarting the service didn't go through; the error message was the same as the original poster's: ValueError: No JSON object could be decoded

To resolve this, I went to the /etc/zeppelin/conf directory and noted that the interpreter.json file, which contains all the interpreter settings, was 0 bytes. After renaming this file with the suffix .bkp, I restarted the Zeppelin service in Ambari and the interpreter.json file was repopulated. The ownership of the file did not match the others in the directory, so I needed to chown the file to the appropriate owner.

Note: after interpreter.json is corrupted and repopulated, any changes made before the corruption are lost, so you will need to add them again in Zeppelin. Also, notebook-authentication.json, which is in the same folder, can sometimes become corrupted as well. That file, however, is not repopulated on service restart; it contains interpreter-specific authentication information.
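The detection step above (a 0-byte or unparseable interpreter.json) can be sketched as a small check before moving the file aside. The `is_corrupt` helper is my own illustration, not part of Zeppelin:

```python
import json
import os

def is_corrupt(path):
    """True if the file is missing, empty, or not parseable JSON."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return True
    try:
        with open(path) as f:
            json.load(f)
        return False
    except ValueError:  # the same "No JSON object could be decoded" failure
        return True

# If corrupt, rename it aside (e.g. add a .bkp suffix) and restart Zeppelin
# so the service regenerates it:
#     if is_corrupt("/etc/zeppelin/conf/interpreter.json"):
#         os.rename(path, path + ".bkp")
```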
11-07-2017 06:22 PM
I would like to know the same: is there a way to change the default port via the Ambari bootstrap instead of the UI wizard?
01-10-2017 03:47 PM
1 Kudo
Hi @MPH, the best practice for a production environment is to have a dedicated cluster for HDF (it is easier for high availability and resource management). However, if you are not looking for high availability and have only one HDF node, then you could imagine running HDF on an edge node. Keep in mind that, at the moment, HDP and HDF are managed by two different Ambari instances. Hope this helps.
01-24-2017 09:01 PM
I have checked the Spark 1.5.0 documentation, and model.save(sc, "hdfs path") and <ModelClass>.load(sc, "hdfs path") are supported. Can you give a specific example?
09-12-2016 01:01 PM
@mike harding To add to this: Tez, by default, initializes an AM up front, whereas MapReduce does so only at job submission. That is why you see the behavior you describe. The Tez container has a timeout setting, as you stated, and that determines how long-lived the initial AM is.
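For illustration, the session AM idle timeout is typically set in tez-site.xml; the property name and value below are my best recollection, so verify them against your Tez version's documentation before relying on them:

```xml
<!-- tez-site.xml: how long (seconds) a session AM waits for a DAG
     to be submitted before shutting itself down -->
<property>
  <name>tez.session.am.dag.submit.timeout.secs</name>
  <value>300</value>
</property>
```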
08-02-2016 11:29 PM
1 Kudo
Livy sessions are recycled after an hour of session inactivity. This timeout is configured with livy.server.session.timeout.
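In livy.conf that would look roughly like the following (the exact accepted value format may vary by Livy version, so treat this as a sketch):

```properties
# livy.conf: recycle idle sessions after this much inactivity
livy.server.session.timeout = 1h
```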
02-13-2017 02:36 PM
Good point. For my Sandbox testing, I decided to just use the steps provided in http://stackoverflow.com/questions/40550011/zeppelin-how-to-restart-sparkcontext-in-zeppelin to stop the SparkContext when I need to do something outside of Zeppelin. Not ideal, but working well enough for some multi-framework prototyping I'm doing.
07-27-2016 11:16 AM
The cluster is fairly small, as it's mostly experimental, but 3 of the 4 nodes each have 4 vCores and 1 GB of memory, with a global YARN minimum container size of 256 MB. So when you say slots, I'm assuming that would translate into potentially 12 slots/containers, i.e. a container representing 1 vCore + 256 MB. I had assumed that the resources (CPU/RAM) available in my cluster would be more than enough for the query I'm running at the dataset sizes I'm working with, i.e. 30-40k records.
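The 12-container estimate works out as follows; this is a back-of-envelope sketch assuming each container takes 1 vCore and exactly the 256 MB minimum allocation (real schedulers may pack differently):

```python
# Rough YARN container "slot" count for the cluster described above.
NODES = 3
VCORES_PER_NODE = 4
MEM_PER_NODE_MB = 1024       # 1 GB per node
MIN_CONTAINER_MB = 256       # yarn.scheduler.minimum-allocation-mb

# Each node is limited by whichever runs out first: vCores or memory.
containers_per_node = min(VCORES_PER_NODE, MEM_PER_NODE_MB // MIN_CONTAINER_MB)
total_containers = NODES * containers_per_node
print(total_containers)  # 12
```

Here both limits happen to agree (4 vCores, 4 memory slices per node), so memory is exactly as constraining as CPU; any container request larger than 256 MB would cut the count below 12.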