About Jim_B

Jim_B · 08-17-2016

Is there a chart or other summary documentation on when it is necessary to install Hadoop clients on a specific host? What exactly does installing a client do, other than making sure that the config files are installed?

Jim_B · 06-15-2016

Some additional information: this only happens when installing the Sqoop client (I added the clients one by one until Java got updated!). I also tried removing Java 1.8 from the ambari-server.properties file, but there was no change in behaviour. I have not yet had time to try the yum.conf exclude statement, but will try today.
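
If you do try the yum.conf route, here is a minimal Python sketch of adding an entry to the exclude list in the [main] section, so yum will not install or update matching packages. The pattern java-1.8.0* is an assumption (it matches the usual OpenJDK 1.8 package names on RHEL/CentOS), and add_yum_exclude is a hypothetical helper, not part of yum or Ambari. It prints the result rather than overwriting /etc/yum.conf, so you can review it first.

```python
import sys


def add_yum_exclude(conf_text: str, pattern: str) -> str:
    """Return yum.conf text with `pattern` added to the [main] exclude list.

    Line-based on purpose, so comments and ordering in the file survive.
    """
    out = []
    in_main = False
    done = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [main] without having seen an exclude line: add one.
            if in_main and not done:
                out.append(f"exclude={pattern}")
                done = True
            in_main = stripped == "[main]"
        elif in_main and not done:
            key, sep, value = stripped.partition("=")
            if sep and key.strip() == "exclude":
                # yum accepts space- or comma-separated exclude lists.
                existing = value.replace(",", " ").split()
                if pattern not in existing:
                    existing.append(pattern)
                out.append("exclude=" + " ".join(existing))
                done = True
                continue
        out.append(line)
    if in_main and not done:
        out.append(f"exclude={pattern}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/yum.conf"
    with open(path) as fh:
        # "java-1.8.0*" is an assumed package glob; adjust for your repos.
        print(add_yum_exclude(fh.read(), "java-1.8.0*"), end="")
```

One caveat: if a client package really does depend on the excluded Java package, the install may then fail dependency resolution instead of quietly upgrading Java.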

Jim_B · 06-15-2016

When running ambari-server setup, the Java home was specified as /usr/java/latest, and this directory exists on each node, pointing to Java 1.7.
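
A quick way to confirm what each node actually resolves is to compare the target of /usr/java/latest with the java found on the PATH, since the two can diverge if a package install switches the system default. A minimal Python sketch; the function names are hypothetical, and the expected "1.7." prefix comes from the setup described above.

```python
import os
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_java_version(output: str) -> Optional[str]:
    """Extract e.g. '1.7.0_80' from `java -version` output."""
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def java_version_of(java_bin: str) -> Optional[str]:
    """Run a java binary and return its reported version, or None."""
    if not os.access(java_bin, os.X_OK):
        return None
    # `java -version` prints its banner on stderr, not stdout.
    proc = subprocess.run([java_bin, "-version"], capture_output=True, text=True)
    return parse_java_version(proc.stderr)


def check_java_home(java_home: str = "/usr/java/latest",
                    expected_prefix: str = "1.7.") -> Tuple[str, Optional[str], bool]:
    """Return (resolved target, reported version, matches expected prefix)."""
    resolved = os.path.realpath(java_home)
    version = java_version_of(os.path.join(java_home, "bin", "java"))
    ok = version is not None and version.startswith(expected_prefix)
    return resolved, version, ok


if __name__ == "__main__":
    print("JAVA_HOME:", check_java_home())
    on_path = shutil.which("java")
    if on_path:
        print("PATH java:", os.path.realpath(on_path), java_version_of(on_path))
```

Running it on a client node and on a NameNode should show whether the 1.8 install changed /usr/java/latest itself or only the default java on the PATH.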

Jim_B · 06-13-2016

We are building out a new cluster with Ambari 2.1, using a pre-installed JDK 1.7 (the custom JDK option during setup). However, on the nodes where we install clients, Java is automatically being updated to 1.8. This does not happen on nodes where we do NOT install clients (e.g. NameNodes). Has anyone heard of this?

Jim_B · 06-07-2016

In the document "http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_installing_manually_book/content/installing_flume.html", it states "Hortonworks recommends that administrators not install Flume agents on any node in a Hadoop cluster." That is a really subtle (and hard to notice!) way of saying to put Flume on dedicated servers. As noted above, in a smaller cluster you can get away with putting them on other nodes. A lot of this depends on the volume of data being processed by Flume and what else is running on the host. There is also some good info on Flume resource usage at https://cwiki.apache.org/confluence/display/FLUME/Flume's+Memory+Consumption.
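
To put rough numbers on the memory side: a memory channel keeps its events on the JVM heap, so its worst case is roughly capacity times the average event size, capped by byteCapacity if that is set. Per the Flume user guide, byteCapacity only counts event bodies, so headers add more on top. Each agent is its own JVM, so on a shared host these add up alongside everything else running there. A minimal sketch under those assumptions; the helper names and example numbers are made up.

```python
from typing import Iterable, Optional, Tuple


def memory_channel_bytes(capacity: int, avg_event_body_bytes: int,
                         byte_capacity: Optional[int] = None) -> int:
    """Worst-case event-body bytes one memory channel can hold on the heap."""
    worst_case = capacity * avg_event_body_bytes
    if byte_capacity is not None:
        # byteCapacity caps the total body bytes the channel will accept.
        worst_case = min(worst_case, byte_capacity)
    return worst_case


def host_memory_channel_bytes(
        channels: Iterable[Tuple[int, int, Optional[int]]]) -> int:
    """Sum the worst case over every memory channel of every agent on a host.

    Each tuple is (capacity, avg_event_body_bytes, byte_capacity or None).
    """
    return sum(memory_channel_bytes(c, e, b) for c, e, b in channels)
```

For example, host_memory_channel_bytes([(10_000, 2_000, None), (100_000, 500, 20_000_000)]) gives 40,000,000 bytes (about 40 MB) of event bodies, before any JVM or header overhead.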

Jim_B · 05-10-2016

@Abdelkrim Hadjidj So, I AM trying to get the powers that be to switch over to NiFi, but in the meantime we have a short time frame to port what they have with as few changes as possible. Under Starting Flume, the document also shows starting Flume from the command line. In this scenario, you could put each agent in a separate config file. I am just wondering if this is how most large enterprises are running in production and, if so, how they are monitoring them. BTW, I had accidentally posted this as an answer, so I am not sure if everyone saw it.
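
For the command-line route, each agent gets its own config file and its own flume-ng process, and Flume's built-in JSON reporting (-Dflume.monitoring.type=http plus a port) gives each one an HTTP metrics endpoint at /metrics that an existing monitoring system can poll. A minimal Python sketch that plans those command lines and flags nearly full channels from the metrics JSON. The /etc/flume/conf path, the 34545 base port, the agent names, and the 90% threshold are assumptions, and the CHANNEL.<name> / ChannelFillPercentage keys should be checked against what your Flume version actually returns.

```python
from typing import Dict, List


def flume_agent_command(agent_name: str, conf_dir: str, conf_file: str,
                        monitoring_port: int) -> List[str]:
    """argv for one standalone agent with HTTP/JSON metrics enabled."""
    return [
        "flume-ng", "agent",
        "--name", agent_name,
        "--conf", conf_dir,
        "--conf-file", conf_file,
        "-Dflume.monitoring.type=http",
        f"-Dflume.monitoring.port={monitoring_port}",
    ]


def plan_agents(agent_names: List[str], conf_dir: str = "/etc/flume/conf",
                base_port: int = 34545) -> Dict[str, List[str]]:
    """One config file and one distinct metrics port per agent (hypothetical layout)."""
    return {
        name: flume_agent_command(name, conf_dir, f"{conf_dir}/{name}.conf",
                                  base_port + i)
        for i, name in enumerate(sorted(agent_names))
    }


def full_channels(metrics: Dict[str, Dict[str, str]],
                  threshold: float = 90.0) -> List[str]:
    """Names of channels whose fill percentage is at or above `threshold`."""
    return sorted(
        name for name, counters in metrics.items()
        if name.startswith("CHANNEL.")
        and float(counters.get("ChannelFillPercentage", "0")) >= threshold
    )
```

Because each agent has its own file and process, changing one agent's config only means restarting that one command; launching and auto-restarting the processes would be left to whatever process supervisor you already use.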

Jim_B · 05-06-2016

I need to find out what the best practice is for running a set of Flume agents in production. All the answers I find dance around the issue. I am clear that when setting up in Ambari, you can create a number of config groups for Flume, and each agent needs to be concatenated into the flume.conf for that group. So each agent runs one instance on each host associated with the configuration group. At this point, you can see and restart individual agents through Ambari. However (and here's the problem), if you change any agent's configuration or add a new agent, you need to restart ALL of the agents in that group for the change to take effect! That is not acceptable in my case, where I have 4 apps running 2 or 3 agents each. It certainly does not seem acceptable to have to restart every application's Flume agents whenever a change is made. So, am I missing something, or are large enterprises simply using shell scripts to start the agent on each host? If they are using scripts, then what is being used for monitoring and auto-restart?
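
For reference, the concatenation works because every Flume property is prefixed with its agent name, and flume-ng --name picks out one agent from the shared file. A minimal sketch of building one config group's flume.conf from per-agent blocks, with a check that no block sets properties for a different agent; merge_agent_configs and the agent names are hypothetical.

```python
from typing import Dict


def merge_agent_configs(agent_configs: Dict[str, str]) -> str:
    """Concatenate per-agent Flume property blocks into one flume.conf body.

    Raises ValueError if a block contains a property not prefixed with its
    own agent name, so one agent's block cannot override another's.
    """
    sections = []
    for agent, text in sorted(agent_configs.items()):
        for raw in text.splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # blank lines and comments are allowed anywhere
            key = line.split("=", 1)[0].strip()
            if not key.startswith(agent + "."):
                raise ValueError(f"property {key!r} in block for {agent!r} "
                                 "is not prefixed with the agent name")
        sections.append(f"# ---- agent: {agent} ----\n{text.strip()}\n")
    return "\n".join(sections)
```

Keeping each agent's block in its own source file and generating the group's flume.conf this way at least makes changes reviewable per agent, although Ambari will still restart the whole group.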

Jim_B · 04-22-2016

Just in case you do want to manually clean the trash, use expunge:

Usage: hadoop fs -expunge

Empty the Trash. Refer to the HDFS Architecture Guide for more information on the Trash feature.

Jim_B · 04-12-2016

We have 4 apps running Flume and are experiencing performance issues and running out of file descriptors. The 4 apps run 4 instances each across 16 data nodes, with approximate volumes:

App A - 60 GB per month
App B - 150 KB per month
App C - 54 GB per day
App D - 330 GB per day

We have been advised to move these onto dedicated hosts (4 hosts, each running 1 agent for each app = 4 agents per node). My questions are:

1. Is this a best practice for placement of Flume agents?
2. Will this cause downsides for data locality of the HDFS files that are written out?
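
Converting those volumes to average rates makes the comparison easier. A small Python sketch, assuming decimal units (1 GB = 10^9 bytes) and a 30-day month:

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_MONTH = 30  # assumption: treat every month as 30 days
UNIT_BYTES = {"KB": 10**3, "MB": 10**6, "GB": 10**9}  # assumption: decimal units
PERIOD_DAYS = {"day": 1, "month": DAYS_PER_MONTH}


def bytes_per_second(amount: float, unit: str, period: str) -> float:
    """Average ingest rate for `amount` `unit`s delivered per `period`."""
    return amount * UNIT_BYTES[unit] / (PERIOD_DAYS[period] * SECONDS_PER_DAY)


# Volumes as listed above.
VOLUMES = {
    "App A": (60, "GB", "month"),
    "App B": (150, "KB", "month"),
    "App C": (54, "GB", "day"),
    "App D": (330, "GB", "day"),
}

if __name__ == "__main__":
    for app, (amount, unit, period) in VOLUMES.items():
        rate = bytes_per_second(amount, unit, period)
        print(f"{app}: {rate / 1e3:,.1f} KB/s average")
```

That works out to roughly 3.8 MB/s for App D, 625 KB/s for App C, 23 KB/s for App A, and effectively nothing for App B; peak rates will be higher than these averages.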

Jim_B · 03-30-2016

Are there any rules of thumb for the total size of the metadata databases for, say, a 100-node cluster? I have an initial install with all metadata (Ambari, Hive, Oozie, Hue) in a single MySQL instance. The DBAs are asking how much space to expect once the cluster is up to production size.

Last Visited	11-12-2020 12:26 AM

Member Since	05-22-2019 10:28 AM
Posts	70
Kudos received	22

Cloudera Community

Re: Hive queries are failing in Interactive query HD...

Re: Disable Hive shell for user and provide access...

Re: Unable to get Nifi site-to-site RPG to balance...

Re: Ranger permissions to create temporary functio...

Re: Ambari client install performs unwanted JDK up...

When to install Hadoop clients

Re: Ambari client install performs unwanted JDK up...

Re: Ambari client install performs unwanted JDK up...

Ambari client install performs unwanted JDK upgra...

Re: Do i need to install flume agent in dedicated ...

Re: Flume in Production - To Ambari or not to Amb...

Flume in Production - To Ambari or not to Ambari,...

Re: hdfs trash compaction

Best Practice for Flume placement - data nodes vs ...

Re: What are Oozie Production Recommendations?