Member since: 10-01-2016
Posts: 13
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 226 | 10-21-2016 05:00 AM |
10-13-2017
06:07 AM
We are planning to set up HDF, mainly to use NiFi. We tend to follow the suggestion to keep it on a cluster separate from our existing HDP. However, it seems HDF will also need a minimum of 3 nodes to have a valid ZooKeeper setup. So far we don't expect heavy data flow, and we hesitate to make this investment. What is the best practice for a 'small' setup? For example:
- Can HDF be set up on a single server?
- Can HDF be managed together with HDP?
Regards, Allen
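The 3-node figure comes from ZooKeeper's majority rule: the ensemble stays available only while more than half of its servers are up. A minimal sketch of that arithmetic follows; the function names are hypothetical and the snippet says nothing about how HDF itself sizes its cluster.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"servers={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

By this rule a single server can run, but it tolerates no failure, and a second server adds no tolerance; three is the smallest ensemble that survives the loss of one server.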
- Tags:
- hdf
05-28-2017
07:31 AM
We are at a decision point in selecting the right approach to transform our data, and would like your input.

Our case: we use Hive as the main data lake store, and all the data (so far) is structured. As in a traditional data warehouse, we need to apply transformations (lookups, aggregations, etc.) from source tables to target tables, and we now need to decide which approach to take.

I currently lean toward a coding approach (HiveQL, Spark) and building our own metadata, but tools like Talend were also recommended by others, so I would like to hear some ideas here.

One driver behind the decision is that I want to build a team with strong technical skills. I have a traditional ETL background (Informatica, DataStage, etc.) and see the pros and cons. So I don't want to settle for "a large team of low-skill programmers supporting a single tool", and I believe "today's big data developers are a bit more technical than their data warehousing counterparts. And so, they are even less enamored by clunky frameworks, less intimidated of writing a lot of code if necessary".

Your thoughts?
04-19-2017
09:00 AM
I tried to add the Hive Client and Tez Client to a host, but the installation failed. When I tried to re-install them, I found that the two components had been removed from the list of available components, and there seems to be no option to delete the existing (unsuccessfully installed) ones either. Any suggestions on how to proceed from here to re-install them? Thanks
03-19-2017
02:34 PM
For a cluster managed by Ambari, it seems Ambari itself is also a single point of failure. Should we consider building HA for Ambari? If yes, what is the common practice? There are many discussions about HA for the NameNode, Hive, etc., but there seems to be nothing for Ambari, and I am wondering why.
11-12-2016
01:46 PM
Maybe a dumb question. For a Unix user on a data node to perform HDFS operations, does the same user also need to be created on the NameNode?

From the documents below, the answer seems to be yes: the user/group has to exist on the NameNode because the mapping is "performed on the NameNode". However, in the testing I did, the user on the data node can run hdfs commands without the same user existing on the NameNode. Did I miss something here?

Thanks, Allen

The documents:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/GroupsMapping.html
"For HDFS, the mapping of users to groups is performed on the NameNode. Thus, the host system configuration of the NameNode determines the group mappings for the users."

The testing:
- On the data node, create a Unix user (aniu) and add it to the hadoop group. This user does not exist on the NameNode.
- $ id
uid=1012(aniu) gid=1012(aniu) groups=1012(aniu),1001(hadoop)
$ hdfs dfs -mkdir /tmp/aniu
$ hdfs dfs -ls /tmp
drwxr-xr-x - aniu hdfs 0 2016-11-12 08:22 /tmp/aniu
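The quoted documentation only says that the user-to-group mapping is resolved on the NameNode host. As a rough illustration of what such an OS-level lookup amounts to, here is a sketch using Python's standard `pwd` and `grp` modules. It is not Hadoop's implementation, and `local_groups` is a hypothetical helper. Run on a host where the account is missing, it returns an empty list. That would be consistent with the test above if the cluster uses simple authentication (an assumption), where the user name is supplied by the client and only the group lookup happens on the NameNode.

```python
import grp
import pwd


def local_groups(username):
    """Return the group names this host's OS knows for username, or [] if unknown."""
    try:
        user = pwd.getpwnam(username)
    except KeyError:
        return []  # the account does not exist on this host
    # Supplementary groups: every group that lists the user as a member.
    names = {g.gr_name for g in grp.getgrall() if username in g.gr_mem}
    try:
        names.add(grp.getgrgid(user.pw_gid).gr_name)  # primary group
    except KeyError:
        pass  # primary gid has no group entry on this host
    return sorted(names)


if __name__ == "__main__":
    # 'aniu' is the user from the test above; the result depends on the host.
    print(local_groups("aniu"))
```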
10-21-2016
05:00 AM
This question can be ignored. After further reading, I understood that this is expected. Ambari keeps the configuration somewhere else (most likely in its own repository) and then extracts it and refreshes the .xml files during startup, which makes sense given the use of a repository.
10-20-2016
05:57 AM
1 Kudo
Just wondering whether Ambari does anything additional when starting HDFS/YARN, compared to starting them directly from the command line. I noticed the case below while testing rack awareness, which prompted this question.
Thanks, Allen

I have a 4-node cluster installed by Ambari. By default, it seems Ambari already has rack awareness enabled.
core-site.xml: <name>net.topology.script.file.name</name>
core-site.xml: <value>/etc/hadoop/conf/topology_script.py</value>
topology_script.py reads its config data from topology_mappings.data:
topology_script.py:DATA_FILE_NAME = os.path.dirname(os.path.abspath(__file__)) + "/topology_mappings.data"
For testing purposes, I modified the rack info in topology_mappings.data:
a. When I restart HDFS and YARN manually, the change takes effect.
b. When I restart them through Ambari, the change does NOT take effect. Furthermore, the content of topology_mappings.data is overwritten with the 'default' values and all my changes are gone. (Is Ambari collecting the rack info automatically somewhere?)
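For context, a Hadoop topology script is called with one or more host names or IP addresses as arguments and is expected to print one rack path per argument. The sketch below shows that contract in isolation. The INI-style `host=rack` layout, the `network_topology` section name, the sample hosts, and the helper names are all assumptions made for illustration; they are not taken from the actual topology_script.py or topology_mappings.data.

```python
import configparser
import sys

DEFAULT_RACK = "/default-rack"  # fallback when a host has no mapping
SECTION = "network_topology"    # assumed section name, not confirmed by the post


def load_mappings(text):
    """Parse INI-style 'host=rack' lines into a dict (assumed file layout)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(SECTION):
        return {}
    return dict(parser.items(SECTION))  # keys are lower-cased by ConfigParser


def resolve_racks(hosts, mappings):
    """Return one rack path per host, in the same order as the input."""
    return [mappings.get(h.lower(), DEFAULT_RACK) for h in hosts]


# Hypothetical sample data standing in for a mappings file.
SAMPLE = """
[network_topology]
node1.example.com=/rack-a
node2.example.com=/rack-b
"""

if __name__ == "__main__":
    # Hosts arrive as command-line arguments; racks are written to stdout.
    print(" ".join(resolve_racks(sys.argv[1:], load_mappings(SAMPLE))))
```

Because the script only reads the mappings file, whatever process rewrites that file decides the rack assignment that HDFS sees.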
10-18-2016
12:05 PM
Yes, I did use the local repository. It seems I had not changed the local repository under /etc/yum.repos.d. After I added it, I restarted the installation (wiped out the Ambari repository and created a new one), and it seems to be progressing. Fingers crossed until completion. Thanks
10-18-2016
07:29 AM
Hi Sagar, Thanks. See the screenshot and log below.
a) No errors, and there seems to be little info in output*.
b) One thing I noticed: the timestamp of the last file (hostcheck.result) is only a few minutes old. Does this mean it is still running, just slowly (about 1 hour 40 min since the second-to-last log)?
Regards, Allen
log.txt
10-18-2016
07:02 AM
The installation has been hanging at the last step, showing 0% overall for a long time. All previous steps were OK.
- How can I check what exactly is happening? Is there a log file somewhere on the server that I can check?
- How can I do a restart or a fresh start? I tried to interrupt the installation (kill -9 on the back end, even restarting the server), but when I restart the server, Ambari seems to jump directly to the last step and continues hanging.
- FYI, I am trying to set up a 4-node cluster on Red Hat 7 with HDP 2.3.
10-01-2016
03:34 AM
I got the error below while starting the Hive CLI as root. FYI, Hive seems to work fine under Ambari, where I can run SQL commands. Thanks.
[root@sandbox aniu]# hive cli
WARNING: Use "yarn jar" to launch YARN applications.
Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
Exception in thread "main" java.lang.RuntimeException: org.apache.tez.dag.api.TezException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application_1475289177250_0006 to YARN : Application application_1475289177250_0006 submitted by user root to unknown queue: default
- Tags:
- Data Processing
- Hive