Member since 04-13-2016 · 80 Posts · 12 Kudos Received · 1 Solution
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 3226 | 03-17-2017 06:10 PM |
06-11-2018 05:38 AM
Yes, that is it. Basically, inside the iterator you would build one large insert statement:

INSERT INTO films (code, title, did, date_prod, kind) VALUES
('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

The column names come from the dataframe and the values come from the dataframe itself, so nothing is hard-coded and you can reuse this code for virtually any database that accepts ANSI SQL inserts.
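A minimal sketch of what that iterator body could look like. The `build_insert` helper and its None-to-DEFAULT convention are my own illustration, not from this thread; with a pandas DataFrame you would pass `df.columns` and `df.itertuples(index=False, name=None)`:

```python
def build_insert(table, columns, rows):
    """Assemble one multi-row ANSI-SQL INSERT statement.

    None is rendered as DEFAULT; strings are naively quoted for
    illustration only -- real code should use parameterized queries.
    """
    def fmt(v):
        if v is None:
            return "DEFAULT"
        if isinstance(v, (int, float)):
            return str(v)
        return "'" + str(v).replace("'", "''") + "'"

    value_lists = ",\n".join(
        "(" + ", ".join(fmt(v) for v in row) + ")" for row in rows
    )
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES\n{value_lists};"

sql = build_insert(
    "films",
    ["code", "title", "did", "date_prod", "kind"],
    [("B6717", "Tampopo", 110, "1985-02-10", "Comedy"),
     ("HG120", "The Dinner Game", 140, None, "Comedy")],
)
```

Because only column names and values vary, the same helper works against any ANSI-SQL target; the quoting shown is deliberately simplistic.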
06-28-2017 12:48 PM
Hi @MPH, check the other related parameters in the documentation I referenced above, such as yarn.timeline-service.entity-group-fs-store.scan-interval-seconds, and make sure to restart the YARN Timeline Server after making the changes.
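For reference, such a parameter is set in yarn-site.xml (sketch only; the value shown is illustrative, check your version's defaults):

```xml
<!-- yarn-site.xml: how often the Timeline Server entity-group
     filesystem store scans for new application data -->
<property>
  <name>yarn.timeline-service.entity-group-fs-store.scan-interval-seconds</name>
  <value>60</value>
</property>
```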
04-11-2017 02:34 PM
What was your solution to fixing the corrupted interpreter.json? I ran into this same issue and was able to resolve it in the following manner.

My issue arose when my namenode (which runs Ambari and Zeppelin) ran out of disk space. This started a chain reaction in Ambari where services started dropping due to the inability to write data (logs) to the local filesystem. After freeing up some space in the local fs, the failed services became healthy again in Ambari once the health checks returned successful statuses. Zeppelin was then the only one not working, and restarting the service didn't go through; the error message was the same as the original poster's: ValueError: No JSON object could be decoded

To resolve this, I went to the /etc/zeppelin/conf directory and noted that the interpreter.json file, which contains all the interpreter settings, was 0 bytes. After renaming this file with the suffix .bkp, I restarted the Zeppelin service in Ambari and the interpreter.json file was repopulated. The ownership of the file did not match the others in the directory, so I needed to chown the file to the appropriate owner.

Note: after interpreter.json is corrupted and repopulated, any changes made before the corruption are lost, so you will need to add them again in Zeppelin. Also, notebook-authentication.json, which is in the same folder, can sometimes become corrupted as well. That file, however, is not repopulated on service restart; it contains interpreter-specific authentication information.
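The detection step above (a 0-byte or unparseable interpreter.json) can be sketched as a small check before moving the file aside. The `is_corrupt` helper is my own illustration, not part of Zeppelin:

```python
import json
import os

def is_corrupt(path):
    """True if the file is missing, empty, or not parseable JSON."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return True
    try:
        with open(path) as f:
            json.load(f)
        return False
    except ValueError:  # the same "No JSON object could be decoded" failure
        return True

# If corrupt, rename it aside (e.g. add a .bkp suffix) and restart Zeppelin
# so the service regenerates it:
#     if is_corrupt("/etc/zeppelin/conf/interpreter.json"):
#         os.rename(path, path + ".bkp")
```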
11-07-2017 06:22 PM
I would like to know the same: is there a way to change the default port via the Ambari bootstrap instead of the UI wizard?
01-10-2017 03:47 PM
1 Kudo
Hi @MPH, the best practice for a production environment is to have a dedicated cluster for HDF (it is easier for high availability and resource management). However, if you are not looking for high availability and have only one HDF node, then you could imagine running HDF on an edge node. Keep in mind that, at the moment, HDP and HDF are managed by two different Ambari instances. Hope this helps.
01-24-2017 09:01 PM
I have checked the Spark 1.5.0 documentation, and model.save(sc, "hdfs path") and <ModelClass>.load(sc, "hdfs path") are supported. Can you give a specific example?
09-12-2016 01:01 PM
@mike harding To add to this: Tez, by default, initializes an AM up front, whereas MapReduce does so only at job submission. That is why you see the behavior you describe. The Tez container has a timeout setting, as you stated, and that determines how long-lived the initial AM is.
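For illustration, the session AM idle timeout is typically set in tez-site.xml; the property name and value below are my best recollection, so verify them against your Tez version's documentation before relying on them:

```xml
<!-- tez-site.xml: how long (seconds) a session AM waits for a DAG
     to be submitted before shutting itself down -->
<property>
  <name>tez.session.am.dag.submit.timeout.secs</name>
  <value>300</value>
</property>
```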
08-02-2016 11:29 PM
1 Kudo
Livy sessions are recycled after an hour of session inactivity. This timeout is configured with livy.server.session.timeout.
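In livy.conf that would look roughly like the following (the exact accepted value format may vary by Livy version, so treat this as a sketch):

```properties
# livy.conf: recycle idle sessions after this much inactivity
livy.server.session.timeout = 1h
```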
02-13-2017 02:36 PM
Good point. For my Sandbox testing, I decided to just use the steps provided in http://stackoverflow.com/questions/40550011/zeppelin-how-to-restart-sparkcontext-in-zeppelin to stop the SparkContext when I need to do something outside of Zeppelin. Not ideal, but working well enough for some multi-framework prototyping I'm doing.
07-27-2016 11:16 AM
The cluster is fairly small, as it's mostly experimental, but 3 of the 4 nodes each have 4 vCores and 1 GB of memory, with a global YARN minimum container size of 256 MB. So when you say slots, I'm assuming that would translate into potentially 12 slots/containers, i.e. a container representing 1 vCore + 256 MB. I had assumed that the resources (CPU/RAM) available in my cluster would be more than enough for the query I'm running at the dataset sizes I'm working with, i.e. 30-40k records.
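The 12-container estimate works out as follows; this is a back-of-envelope sketch assuming each container takes 1 vCore and exactly the 256 MB minimum allocation (real schedulers may pack differently):

```python
# Rough YARN container "slot" count for the cluster described above.
NODES = 3
VCORES_PER_NODE = 4
MEM_PER_NODE_MB = 1024       # 1 GB per node
MIN_CONTAINER_MB = 256       # yarn.scheduler.minimum-allocation-mb

# Each node is limited by whichever runs out first: vCores or memory.
containers_per_node = min(VCORES_PER_NODE, MEM_PER_NODE_MB // MIN_CONTAINER_MB)
total_containers = NODES * containers_per_node
print(total_containers)  # 12
```

Here both limits happen to agree (4 vCores, 4 memory slices per node), so memory is exactly as constraining as CPU; any container request larger than 256 MB would cut the count below 12.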