Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 14817 | 09-01-2018 01:27 AM
 | 1825 | 09-01-2018 01:18 AM
 | 5416 | 08-20-2018 09:39 PM
 | 922 | 07-20-2018 04:51 PM
 | 2409 | 07-16-2018 09:41 PM
04-06-2018
04:47 PM
@Rajesh Reddy - No, wget only tests HTTP/S connections, not the plain TCP connections that Kafka and ZooKeeper use.
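For reference, plain TCP reachability can be checked with netcat instead; a minimal sketch, assuming placeholder hostnames and the default ports (9092 for Kafka, 2181 for ZooKeeper):

# Check that the broker port accepts a TCP connection
nc -vz kafka-broker.example.com 9092

# ZooKeeper also answers four-letter commands over plain TCP;
# "ruok" returns "imok" when the server is healthy
echo ruok | nc zookeeper.example.com 2181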
04-05-2018
07:57 PM
Well, can you issue "kafka-topics --zookeeper $ZK_VIP --list" or "kafka-console-producer --broker-list $KAFKA_VIP --topic test" without error?
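As a fuller smoke test, something along these lines should confirm end-to-end connectivity; the VIP variables and topic name are placeholders, and the topic must already exist (or auto-creation be enabled):

# List topics through the ZooKeeper VIP
kafka-topics --zookeeper "$ZK_VIP" --list

# Produce one message through the Kafka VIP
echo "hello" | kafka-console-producer --broker-list "$KAFKA_VIP" --topic connectivity-test

# Read it back from the beginning of the topic
kafka-console-consumer --bootstrap-server "$KAFKA_VIP" --topic connectivity-test \
  --from-beginning --max-messages 1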
03-20-2018
06:57 PM
2 Kudos
You will need to query the Hive metastore to:
1. Filter on external tables
2. JOIN all tables (the TBLS table) with all databases (the DBS table)
3. Select the path

See here, for example - https://stackoverflow.com/questions/44151670/search-a-table-in-all-databases-in-hive - and the sketch below. Or, if you cannot connect to the metastore, you will need to scan over the Hive tables themselves - https://stackoverflow.com/questions/35004455/how-to-get-all-table-definitions-in-a-database-in-hive
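A minimal sketch against a MySQL-backed metastore; the standard metastore schema uses the TBLS, DBS, and SDS tables, but the connection host, user, and database name below are placeholders for your environment:

# SDS holds the storage descriptor for each table, including its path
mysql -h metastore-db.example.com -u hive -p hive_metastore -e "
SELECT d.NAME AS db_name, t.TBL_NAME, s.LOCATION
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
JOIN SDS s ON t.SD_ID = s.SD_ID
WHERE t.TBL_TYPE = 'EXTERNAL_TABLE';"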
03-12-2018
08:05 PM
I would suggest you use HDFS Connect rather than Spark Streaming, as it is more fault tolerant. Kafka Connect is built into the base Kafka libraries, but you need to compile HDFS Connect separately and add it to the Connect classpath. Build from here: https://github.com/confluentinc/kafka-connect-hdfs - and use a tagged branch rather than master, since the tagged releases build against publicly available libraries, whereas master produces SNAPSHOT builds that require you to compile Kafka from source.
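For illustration, a minimal standalone setup might look like the following; the topic name, HDFS URL, and build output path are placeholders, and worker.properties stands in for the stock worker config that ships with Kafka (config/connect-standalone.properties):

# Minimal HDFS sink connector config
cat > hdfs-sink.properties <<'EOF'
name=hdfs-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=my-topic
hdfs.url=hdfs://namenode.example.com:8020
flush.size=1000
EOF

# Put the compiled connector jars on the Connect classpath
# (adjust to your actual build output location)
export CLASSPATH="/opt/kafka-connect-hdfs/target/*"

# Start a standalone Connect worker with the sink config
connect-standalone worker.properties hdfs-sink.properties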
02-24-2018
07:02 PM
2 Kudos
@Tom C That's just a warning that you have existing processes on the machines. If you let it uninstall packages or delete user accounts, you'll have downtime on the cluster, and services might not stop gracefully, so you risk additional corruption. I've added machines like this that are provisioned by Puppet, so there are some extra background services running, but I just ignore that warning, and Ambari has set them up fine. Regarding the Hive Metastore: if you have set it up to use an external Postgres/MySQL database (recommended), I would probably let Ambari first install the embedded Derby database for Hive, then manually edit the hive-site XML to point to the old one (see the sketch below).
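If you go that route, Ambari ships a small helper for editing configs from the command line, which avoids hand-editing the XML on each node; a sketch, assuming placeholder Ambari credentials, hostnames, cluster name, and JDBC URL:

cd /var/lib/ambari-server/resources/scripts

# Repoint hive-site at the pre-existing external metastore database
./configs.sh -u admin -p admin set ambari.example.com mycluster hive-site \
  javax.jdo.option.ConnectionURL "jdbc:postgresql://db.example.com:5432/hive"
./configs.sh -u admin -p admin set ambari.example.com mycluster hive-site \
  javax.jdo.option.ConnectionDriverName "org.postgresql.Driver"

# Restart the Hive Metastore afterwards so the change takes effect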
02-24-2018
03:25 AM
1 Kudo
Ambari doesn't control any data residing on the datanodes, so you should be safe there.

What I would do is let all the Hadoop components keep running "in the dark" by stopping all ambari-agents in the cluster, maybe even uninstalling them. Then install and set up a new Ambari server and add a cluster, but register no hosts. Configure each of the stopped ambari-agents to point at the new Ambari server address and start them (see the sketch below). Add the hosts in the Ambari server UI, selecting the "manual registration" option at the bottom of the dialog. Hopefully all the hosts register successfully, after which you are given the option of installing clients and servers.

Now, you could try to "reinstall" what is already there, but you might want to deselect all the servers in the datanode column. In theory, it will try to perform the OS package installation, see that the service already exists, and not error out. If it does error, restart the install process but deselect everything, at which point it should continue, and you will have Ambari back up and running with all the hosts monitored, just with no processes to configure.

To add the services back, you would need to use the Ambari REST API to add the respective Services, Components, and Host Components that you have running on the cluster. If you can't remember what those are from all the things you have the option of installing in HDP, go to each host and do a process check to see what's running.
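A sketch of the two hands-on pieces, using placeholder hostnames and an HDFS DataNode as the example service; the agent's server address lives in /etc/ambari-agent/conf/ambari-agent.ini:

# On each host: repoint the stopped agent at the new Ambari server
sed -i 's/^hostname=.*/hostname=new-ambari.example.com/' /etc/ambari-agent/conf/ambari-agent.ini
ambari-agent start

# Later, add a service, its component, and the host component
# back through the REST API (X-Requested-By is required by Ambari)
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://new-ambari.example.com:8080/api/v1/clusters/mycluster/services/HDFS
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://new-ambari.example.com:8080/api/v1/clusters/mycluster/services/HDFS/components/DATANODE
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  http://new-ambari.example.com:8080/api/v1/clusters/mycluster/hosts/worker1.example.com/host_components/DATANODE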
02-22-2018
08:19 PM
@Rakesh AN I have worked for at least three companies trying to follow Agile/Scrum, and their code development cycles did follow it. It's hard, though, to upgrade hundreds of Hadoop nodes and software versions, and to make sure they all work with the other components of the cluster, all without breaking other pieces in two-week sprints. Stand-up meetings are all about perception management between team members and management. Again, there is no special relationship or difference there, whether it's Hadoop development, web or mobile development, etc.
02-21-2018
06:51 PM
It might be beneficial to put this in a Docker container and run
make prod tarball
Then you can run the Docker container for the respective environment and simply copy the tarball out to the external clusters, as in the sketch below.
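A sketch of what that could look like; the image name, tarball path, and destination host are all placeholders, since the project's own build setup isn't shown here:

# Build inside a container so the toolchain is pinned per environment
docker run --rm -v "$(pwd)":/src -w /src build-env:prod make prod tarball

# Copy the resulting tarball out to an external cluster
scp dist/app.tar.gz user@edge-node.example.com:/opt/deploys/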
02-21-2018
06:49 PM
I don't believe hardware or infrastructure setup should follow any such workflow. Maybe you need separate environments to isolate workloads, but other than that, it's the code itself that follows the same development patterns as anything else. Examples of such code could include MapReduce, Hive scripts, Oozie jobs, Spark processes, NiFi dataflows, etc. In terms of MapReduce or Spark, you can use CI/CD processes to build the code and push it to HDFS, then submit it to YARN to run once, or to Oozie to run on a schedule (see the sketch below). Hadoop itself just offers HDFS, YARN, and MapReduce; everything else is very specific to your needs and processes.
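As a concrete sketch of that CI/CD hand-off, with placeholder jar names, class names, paths, and hosts:

# Push the built artifact to HDFS (-f overwrites the previous version)
hdfs dfs -put -f target/my-job.jar /apps/my-job/my-job.jar

# Run it once on YARN
spark-submit --master yarn --deploy-mode cluster \
  --class com.example.MyJob hdfs:///apps/my-job/my-job.jar

# Or hand it to Oozie to run on a schedule, per a coordinator
# defined in job.properties
oozie job -oozie http://oozie.example.com:11000/oozie -config job.properties -run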
02-02-2018
09:58 PM
I assume this returns a limited result set for large tables, though?