Member since
11-19-2015
158
Posts
25
Kudos Received
21
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 11730 | 09-01-2018 01:27 AM |
 | 1096 | 09-01-2018 01:18 AM |
 | 3668 | 08-20-2018 09:39 PM |
 | 485 | 07-20-2018 04:51 PM |
 | 1461 | 07-16-2018 09:41 PM |
02-21-2018
06:51 PM
It might be beneficial to put this in a Docker container and run
make prod tarball
Then you can run the Docker container for the respective environment and simply copy the tarball out to external clusters.
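For example, a rough sketch of that flow (image name, container name, and tarball path are hypothetical, assuming your Dockerfile provides the build toolchain):
# Build the image with the toolchain, run the build inside a container,
# then copy the resulting tarball out for distribution to other clusters.
docker build -t myapp-build .
docker run --name myapp-build-1 myapp-build make prod tarball
docker cp myapp-build-1:/opt/myapp/build/release.tar.gz ./release.tar.gz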
... View more
02-21-2018
06:49 PM
I don't believe hardware or infrastructure setup should follow any such workflow. Maybe you need separate environments to isolate workloads, but other than that, it's the code itself that follows development patterns, the same as anything else. Examples of such code include MapReduce jobs, Hive scripts, Oozie jobs, Spark processes, NiFi dataflows, etc. For MapReduce or Spark, you can use CI/CD processes to build the code and push it to HDFS, then submit it to YARN to run once, or submit it to Oozie to run on a schedule. Hadoop itself just offers HDFS, YARN, and MapReduce; everything else is very specific to your needs and processes.
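For example, a minimal sketch of that build-and-run flow for a Spark job (jar name, HDFS path, and class name are hypothetical):
# CI builds the artifact
mvn clean package
# Publish the jar to HDFS so every environment runs the same artifact
hdfs dfs -put -f target/myjob.jar /apps/myteam/myjob.jar
# Run it once on YARN (or wire the same jar into an Oozie coordinator for a schedule)
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyJob hdfs:///apps/myteam/myjob.jar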
... View more
02-13-2018
07:42 PM
Can you describe what issues you are having, or what you have tried already? Kafka Connect JDBC or Apache NiFi should be able to set JDBC properties for SQL Server and give you the ability to produce to Kafka.
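For example, a rough sketch using the Confluent JDBC source connector through the Kafka Connect REST API (hostnames, credentials, table, and column names are placeholders):
# POST a JDBC source connector config to a Connect worker (default REST port 8083)
curl -X POST -H "Content-Type: application/json" http://connect-worker:8083/connectors -d '{
  "name": "sqlserver-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:sqlserver://sqlserver-host:1433;databaseName=mydb",
    "connection.user": "myuser",
    "connection.password": "mypassword",
    "mode": "incrementing",
    "incrementing.column.name": "id",
    "table.whitelist": "my_table",
    "topic.prefix": "sqlserver-"
  }
}'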
... View more
02-12-2018
06:25 PM
@Mahesh Jadhav Hue is deprecated in the latest HDP, but see http://gethue.com/hadoop-hue-3-on-hdp-installation-tutorial/ and http://gethue.com/how-to-configure-hue-in-your-hadoop-cluster Note: Hue does not need to be part of the cluster. It is completely detachable from the cluster, and it communicates with the appropriate network ports for each service.
... View more
02-09-2018
08:09 PM
It might be beneficial to search before you ask. https://stackoverflow.com/questions/22769129/differences-between-hadoop-jar-and-yarn-jar
... View more
02-07-2018
08:55 PM
Ideally, you would not be removing DataNodes, only NodeManagers (YARN) or Spark executors (standalone Spark). The need to do this depends on your hardware resources. For example, in the cloud, such as AWS EMR, you can scale up a job to add more compute via AWS auto-scaling groups. The data is persisted long-term in S3 and only exists briefly on HDFS for running the necessary actions quickly. You pay for the run time of these clusters, and if you only run a job every morning, then you don't need the cluster sitting idle in the afternoon.
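For example, a hedged sketch of shrinking YARN capacity without touching HDFS, assuming yarn.resourcemanager.nodes.exclude-path points at the exclude file shown here:
# Add the NodeManager host to the YARN exclude file, then tell the ResourceManager
# to re-read its node lists; the DataNode on that host is left alone.
echo "worker05.example.com" >> /etc/hadoop/conf/yarn.exclude
yarn rmadmin -refreshNodes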
... View more
02-07-2018
08:49 PM
1 Kudo
You could write an alias/wrapper around the hadoop (and hdfs) CLI commands that would block this.
For example, spark-submit in HDP is not the real "spark-submit"; it detects whether you have exported SPARK_MAJOR_VERSION with a value of 1 or 2, then forwards to the real Spark bin folder.
Essentially, put this script somewhere (named hadoop), then make sure it's first in the $PATH for all users.
#!/usr/bin/env bash
# Wrapper that sits ahead of the real hadoop binary in $PATH and blocks accidental formatting
if [[ "$1 $2" = "namenode -format" ]]; then
echo "ERROR: Namenode Formatting disabled"
exit 1
fi
# Forward everything else to the real binary by full path (adjust for your layout,
# e.g. /usr/hdp/current/hadoop-client/bin/hadoop); calling plain "hadoop" here would
# just re-invoke this wrapper.
exec /usr/bin/hadoop "$@"
Sample usage:
$ ./hadoop namenode -format
ERROR: Namenode Formatting disabled
... View more
02-07-2018
08:37 PM
A "backup node" isn't a concept here. Several components have a High-Availability setup; for example, see your other question about the NameNode: https://community.hortonworks.com/questions/171755/what-is-active-and-passive-namenode-in-hadoop.html The ResourceManager, HiveServer, HBase Masters, and other components have similar availability considerations.
... View more
02-02-2018
09:58 PM
I assume this returns a limited result set, though, for large tables?
... View more
02-02-2018
09:56 PM
Well, Oozie just executes spark-submit. Also, this leaves out that Spark runs "over" YARN as a resource manager, or "over" HDFS as a filesystem.
... View more
02-02-2018
09:55 PM
Can you clarify what information you have not learned from the Spark documentation?
... View more
02-02-2018
09:45 PM
1 Kudo
I believe you meant the Spark "Thrift Server", @kgautam.
https://community.hortonworks.com/articles/29928/using-spark-to-virtually-integrate-hadoop-with-ext.html
http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbcodbc-server
The alternative would be to use Apache Livy: http://livy.apache.org/docs/latest/programmatic-api.html
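For example, once the Thrift Server is running you can connect to it with beeline just like HiveServer2 (hostname and port are assumptions; check your own config for the Spark Thrift Server port):
beeline -u jdbc:hive2://sparkthrift.example.com:10016 -n myuser -e "show tables"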
... View more
02-02-2018
04:32 PM
@Junfeng Chen - Sorry, it would appear you are correct. I think your best option would be to mirror the Apache Bigtop repo on a machine that does have internet access; then you can install Hue via the package manager of your OS. I'm not sure what version of Hue that would be, though. Another option is to build Hue somewhere with internet access and copy it over. As long as the Python version is the same, it should be okay, since the make command builds a portable virtualenv. I have done this before and used the FPM tool to package an RPM.
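For reference, a rough sketch of that build-and-package approach (version and install prefix are assumptions; check the Hue build docs for your release):
# On a machine with internet access: build Hue into a self-contained prefix
PREFIX=/usr/local make install
# Package the built tree as an RPM with fpm, then copy the RPM to the offline hosts
fpm -s dir -t rpm -n hue -v 4.1.0 /usr/local/hue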
... View more
02-02-2018
04:07 PM
@Carlton Patterson, you don't need to understand beeline. If you take the command that was given to you, it is mostly copy-paste; the JDBC URL is even on the Ambari dashboard for you to copy exactly.
beeline -u <URL> --outputformat=<FORMAT> -f <YOUR_SCRIPT> > <DESTINATION FILE ON LOCAL DISK>
The only confusing part, if you are unfamiliar with shell commands, is the output redirection to a file. The rest is very similar to any terminal-based execution of a SQL script. As far as I know, this is the only free (as in money) way to get the data out to a file in full. The alternative is to download a trial version of RazorSQL or pay for a tool like Tableau that can export the SQL results. Depending on your data size, Excel or LibreOffice might work.
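For example (hostname, database, script, and output file are placeholders):
beeline -u jdbc:hive2://hiveserver.example.com:10000/default --outputformat=csv2 -f my_query.sql > /tmp/results.csv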
... View more
02-02-2018
03:56 PM
Use "-getmerge" to combine all files into one.
... View more
02-02-2018
12:24 AM
The Download link on this page should give you a pre-built version, so you don't need to clone from GitHub and run "make apps": http://gethue.com/hue-4-1-is-out/
... View more
01-30-2018
09:31 PM
"Add Service" in Ambari creates multiple Zeppelin Servers. You would need an external load balancer like HAProxy, Nginx, etc to get a single URL to switch between all instances. Cluster work loads are typically running in YARN, and should be distributed on their own, with or without Zeppelin.
... View more
01-26-2018
11:05 PM
Did you mean version 0.7? https://cwiki.apache.org/confluence/display/RANGER/Support+for+%24username+variable
... View more
01-25-2018
04:15 PM
The default ports are as follows:
Kafka: 9092
NiFi: 8080
ZooKeeper: 2181
You can access NiFi via the web UI running on port 8080. You can access Kafka from your local machine, outside of a broker, by downloading the Kafka package for your respective broker version, then running a command like this to see all the topics in the cluster:
kafka-topics --list --zookeeper yourZkServer:2181
Perhaps the Kafka Quickstart guide would be a good start for you, and you can extend that produce/consume knowledge over to the NiFi UI. http://kafka.apache.org/quickstart
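For example, from that same downloaded package you can produce and consume messages against the cluster (broker hostname and topic are placeholders):
bin/kafka-console-producer.sh --broker-list yourKafkaBroker:9092 --topic test
bin/kafka-console-consumer.sh --bootstrap-server yourKafkaBroker:9092 --topic test --from-beginning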
... View more
01-23-2018
06:50 PM
@phil gib If you want to try using Control Center, you can use version 3.1.2, which is for brokers running 0.10.1.1 https://docs.confluent.io/3.1.2/control-center/docs/index.html
... View more
01-23-2018
06:46 PM
I'm not sure what "beautiful GUI" you are referring to, whether that is NiFi, or SAM, but these tools did not always exist, and they only exist as part of the HDF package, not native HDP. Pentaho works with all Hadoop environments, not only HDP. As for why people use it, you have to ask them, but if I had to guess, they were sold it by some vendor/consultant, or it was marketed to them through other channels.
... View more
01-21-2018
01:52 AM
Confluent, the company that supports Kafka, has this documentation: https://docs.confluent.io/current/kafka/deployment.html#multi-node-configuration
- Each broker must connect to the same ZooKeeper ensemble at the same chroot via the zookeeper.connect configuration.
- Each broker must have a unique value for broker.id set explicitly in the configuration, OR broker.id.generation.enable must be set to true.
- Each broker must be able to communicate with every other broker directly via one of the methods specified in the listeners or advertised.listeners configuration.
Word of advice: don't store ZooKeeper and Kafka data on the same volume. Also store the OS and process logs separately from the actual ZooKeeper and Kafka data.
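For example, a minimal sketch of those per-broker settings (hostnames, chroot, and file path are assumptions), appended to each broker's server.properties:
cat >> /etc/kafka/server.properties <<'EOF'
# Same ZooKeeper ensemble and chroot on every broker
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
# Unique on each broker, or set broker.id.generation.enable=true instead
broker.id=1
# How clients and the other brokers reach this broker
listeners=PLAINTEXT://broker1.example.com:9092
EOF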
... View more
01-20-2018
04:15 AM
Confluent Control Center requires an enterprise license in the long term; if you are going to use it, you should probably install and maintain your own Kafka installation outside of HDP, maybe sharing a ZooKeeper. In any case, Confluent 3.3.0 is for Kafka version 0.11.0.0, which means your brokers are old. Also, the Kafka topic that Control Center listens to requires having Confluent's packages on your Kafka classpath. Therefore, while it may be possible, those packages need to be distributed to all Kafka brokers and Connect workers, and Ambari isn't going to do that just by changing some configurations. If you want some type of alerting and performance monitoring, simply exposing JMX metrics can provide lots of useful information such as in-sync replicas, partition count per broker, total bytes in/out, and message throughput. For example, see https://www.robustperception.io/monitoring-kafka-with-prometheus/
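As a hedged example of exposing JMX when starting a broker by hand (port and working directory are assumptions; the stock Kafka scripts pick up JMX_PORT):
export JMX_PORT=9999
bin/kafka-server-start.sh config/server.properties
# Then point jconsole, jmxtrans, or the Prometheus JMX exporter at port 9999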
... View more
01-20-2018
04:04 AM
Ambari is only okay if the agents are healthy and responding. You will at least need something like Nagios to check when services are down, disks are dead or full, fans have stopped working, RAM is bad, etc.

Personally, I'm a big fan of Ansible for running distributed SSH commands across the entire cluster. Ansible uses Jinja2 templates, just like Ambari, for templating out config files; it can start/stop services, sync files across machines, etc. Much better than SSH-ing to each host one by one. With the recent release of Ansible Tower, you can make a centralized location for all your Ansible scripts. Alternative tools such as Puppet and Chef exist, and many older infrastructures already have those tools in place elsewhere. If you have RHEL, then Satellite might be worth using.

For tracing problems, you absolutely need some log collection framework and JMX enabled on every single Java/Hadoop process. You can pay for Splunk, or you can roll your own setup using Solr or Elasticsearch. Ambari recently added Ambari Infra and Log Search, which are backed by Solr. Lucidworks has a project named Banana that adds a nice dashboarding UI on top of Solr, although Grafana is also nice for dashboarding. If you go with Elasticsearch, it offers the Logstash and Beats products, which integrate well with many other external systems.
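For example, a couple of hedged Ansible ad-hoc commands (inventory, group, mount point, and service names are assumptions for a packaged install):
# Check data-disk usage on every worker
ansible workers -i hosts.ini -m shell -a "df -h /grid"
# Make sure the DataNode service is running everywhere (needs sudo)
ansible workers -i hosts.ini -b -m service -a "name=hadoop-hdfs-datanode state=started"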
... View more
01-17-2018
03:52 AM
1 Kudo
Hi Micheal. I trust your ability to make your own PowerPoint with the following information. Most importantly, Ambari has nothing to do with Kafka; I strongly suggest you explain Kafka on its own, without ever mentioning Ambari.

Moving on, at a high-level view, there is the Ambari Server (the web UI you log in to) and the agents (the hosts that you can add services to, manage, and monitor). Ambari has no concept of workers. The Ambari Server requires a running relational database: PostgreSQL, MySQL, or Oracle. Perhaps you should start here, but I will try to continue. https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Design

Ambari uses widgets to display the dashboards and graphs. Services running on external systems are configured via SSH communication to the Ambari agents. Ambari gives you a central location to define configuration files for any environment. Hadoop is not required for Ambari to work; while it is commonly used for it, Ambari is fully extendable via what are called "stacks." The HDP stack includes Hadoop, Hive, HBase, Pig, Spark, Ranger, etc.

When you first log in to a fresh Ambari server, you have a default login account, and you must define a cluster and add hosts before you can do anything useful with Ambari. It is preferred to use Ambari itself to set up and manage services on new hosts rather than attempting to add existing hosts with pre-installed services to Ambari. For example, you should not attempt to install Hadoop with Puppet/Chef/Ansible and then add that server to Ambari. You should use those tools to manage the Ambari Agent installation, then continue with a typical Ambari "Add Host" operation. The agents periodically send heartbeats to the Ambari Server to let it know they are alive and able to accept requests.

Ambari offers different account access restrictions via its login methods. For example, if you want administrators to change and restart services, as well as read-only users to view overall cluster usage or access the HDFS file browser, you can selectively allow these actions. Ambari also has "Ambari Views," which allow you to extend and expose your own type of "web portal" to any system running in your environment.

Hope this gets you started, but the Ambari wiki page is a fine resource for more information.
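As a small illustration of the server/agent model (credentials and hostname are assumptions), the same REST API that backs the web UI will show you which hosts have registered with the server:
curl -u admin:admin http://ambari-server.example.com:8080/api/v1/hosts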
... View more
01-17-2018
03:19 AM
@Tu Nguyen I suggest you post a new question rather than hijack this one. Your error does not relate directly to transactional tables, but rather to the ORC splits generated for your table. What happens if you try spark.read.format("orc") against the table's files on the filesystem directly?
org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:998)
... 111 more
Caused by: java.lang.NumberFormatException: For input string: "0248155_0000"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
... View more
01-16-2018
02:47 AM
@Guillaume Roger And what will the end user do with those zipped CSV files once they get them? Load them into Excel? Surely, you can expose some SQL interface or BI tool to allow the datasets to be queried and explored as they were meant to be within the hadoop ecosystem.
... View more
01-16-2018
02:29 AM
@Tu Nguyen Where are you reading that you need to use JDBC from Spark to communicate with Hive? It isn't in the SparkSQL documentation. https://spark.apache.org/docs/latest/sql-programming-guide.html#hive-tables
1. Try using an alternative JDBC client and see if you get similar results.
2. What happens when you simply use the following?
import org.apache.spark.sql.SparkSession
val warehouseLocation = "/apps/hive/warehouse" // adjust to your Hive warehouse directory
val spark = SparkSession
  .builder()
  .appName("Spark Transactional Hive Example")
  .config("spark.sql.warehouse.dir", warehouseLocation)
  .enableHiveSupport()
  .getOrCreate()
spark.table("tnguy.table_transactional_test").count()
... View more