Member since 10-01-2016
13 Posts
1 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
517 | 10-21-2016 05:00 AM |
10-13-2017 06:07 AM
We are planning to set up HDF and mainly use NiFi. The advice we have seen is to run it as a separate cluster from our existing HDP cluster. However, it seems HDF would also need a minimum of 3 nodes to have a valid ZooKeeper setup. We don't expect heavy data flow yet, so we hesitate to make this investment. What is the best practice for a 'small' setup? For example:
- Can HDF be set up on a single server?
- Can HDF be managed together with HDP?

Regards, Allen
Labels:
- Cloudera DataFlow (CDF)
05-28-2017 07:31 AM
We are at a decision point in choosing the right approach to transform our data, and I would like your input.

Our case: we use Hive as the main data lake store, and all the data (so far) is structured. As in a traditional data warehouse, we need to run transformations (lookups, aggregations, etc.) from source tables to target tables, and we now need to decide which approach to take. I tend towards the coding approach (HiveQL, Spark) with our own metadata, but tools like Talend have also been recommended by others, so I want to hear some ideas here.

One driver behind the decision is that I want to build a team with strong technical skills. I have a traditional ETL background (Informatica, DataStage, etc.) and see the pros and cons. So I don't want to settle for "a large team of low-skill programmers supporting a single tool", and I believe "today's big data developers are a bit more technical than their data warehousing counterparts. And so, they are even less enamored by clunky frameworks, less intimidated of writing a lot of code if necessary".

Your thoughts?
Labels:
- Apache Hive
- Apache Spark
04-19-2017 09:00 AM
I tried to add the Hive client and Tez client to a host, but the installation failed. When I tried to re-install them, I found that the two components had been removed from the list of available components, and there seems to be no option to delete the existing (unsuccessfully installed) ones either. Any suggestions on how to proceed from here to re-install them? Thanks
Labels:
- Apache Ambari
03-19-2017 02:34 PM
For a cluster managed by Ambari, it seems Ambari itself is also a single point of failure. Should we consider building HA for Ambari? If so, what is the common practice? There is a lot of discussion about HA for the NameNode, Hive, etc., but seemingly nothing for Ambari, and I also wonder why.
Labels:
- Apache Ambari
10-21-2016 05:00 AM
This question can be ignored. After further reading, I understand this is expected. Ambari keeps the configuration somewhere else (most likely in its own repository, from which it extracts and refreshes the .xml files during startup; that makes sense given the use of a repository).
10-20-2016 05:57 AM
1 Kudo
Just wondering whether Ambari does anything additional when starting HDFS/YARN, compared to starting them directly from the command line. I noticed the case below while testing rack awareness, which prompted this question.
Thanks, Allen.

I have a 4-node cluster installed by Ambari.
By default, it seems Ambari already has rack awareness enabled.
core-site.xml: <name>net.topology.script.file.name</name>
core-site.xml: <value>/etc/hadoop/conf/topology_script.py</value>
topology_script.py reads its config data from topology_mappings.data:
topology_script.py:DATA_FILE_NAME = os.path.dirname(os.path.abspath(__file__)) + "/topology_mappings.data"
For testing purposes, I modified the rack info in topology_mappings.data:
a. When I restart HDFS and YARN manually, the change takes effect.
b. When I restart them through Ambari, the change does NOT take effect. Furthermore, the content of topology_mappings.data is overwritten with the 'default' value and all my changes are gone. (Is Ambari collecting the rack info automatically from somewhere?)
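For anyone reproducing this test, here is a minimal sketch of the lookup a topology script performs. Hadoop calls the script named in net.topology.script.file.name with one or more hostnames or IPs as arguments and expects one rack path per argument on standard output. Everything else below is an assumption for illustration: the `host=rack` line format, the `/default-rack` fallback, the function names `parse_mappings` and `resolve_racks`, and the sample hosts. It is not the script that Ambari generates.

```python
DEFAULT_RACK = "/default-rack"  # assumed fallback for hosts missing from the mapping


def parse_mappings(text):
    """Parse 'host=rack' lines, skipping blanks, comments and [section] headers."""
    mappings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "[")):
            continue
        host, sep, rack = line.partition("=")
        if sep:  # ignore lines that contain no '='
            mappings[host.strip()] = rack.strip()
    return mappings


def resolve_racks(hosts, mappings, default=DEFAULT_RACK):
    """Return one rack path per host, in the same order as the input."""
    return [mappings.get(host, default) for host in hosts]


# Hypothetical sample content standing in for topology_mappings.data.
SAMPLE = """
[network_topology]
node1.example.com=/rack1
node2.example.com=/rack2
"""

racks = resolve_racks(["node1.example.com", "node3.example.com"], parse_mappings(SAMPLE))
print(" ".join(racks))  # prints "/rack1 /default-rack"
```

Because the lookup is driven entirely by the mapping file, anything that rewrites that file on restart also resets the racks the script reports.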
Labels:
- Apache Ambari
- Apache Hadoop
10-18-2016 12:05 PM
Yes, I did use the local repository. It seems I had not changed the local repository under /etc/yum.repos.d. After I added it, I restarted the installation (wiped out the Ambari repository and created a new one) and it seems to be progressing. Fingers crossed until completion. Thanks
10-18-2016 07:29 AM
Hi Sagar, Thanks. See the screenshot and log below.
a) There is no error, and there does not seem to be much info in the output*.
b) One thing I noticed: the timestamp of the last file (hostcheck.result) is only a few minutes old. Does this mean it is still running, just slowly (about 1 hour 40 min since the second-to-last log)?
Regards, Allen
log.txt
10-18-2016 07:02 AM
The installation has been hanging at the last step, showing 0% overall for a long time. All previous steps were OK.
- How can I check what exactly is happening? Is there a log file somewhere on the server that I can check?
- How do I restart or start fresh? I tried to interrupt the installation (kill -9 on the back end, even restarting the server), but when I restart the server, Ambari seems to jump directly to the last step and continue hanging.
- FYI, I am trying to set up a 4-node cluster on Red Hat 7 with HDP 2.3.
Labels:
- Apache Ambari