Member since 09-29-2015

286 Posts
601 Kudos Received
60 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 12846 | 03-21-2017 07:34 PM |
|  | 3766 | 11-16-2016 04:18 AM |
|  | 2142 | 10-18-2016 03:57 PM |
|  | 5100 | 09-12-2016 03:36 PM |
|  | 8448 | 08-25-2016 09:01 PM |
			
    
	
		
		
02-29-2016 09:41 PM

There is no explicit deny. You should set Hive "Run as User" to False; for Ranger, all queries should run as the hive user.
Then set your database access policy in Ranger and it will work.
See also https://community.hortonworks.com/articles/234/securing-hdp-23-with-apache-ranger.html
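For reference, the Ambari "Run as end user instead of Hive user" toggle corresponds to the hive.server2.enable.doAs property. A minimal check of the effective value, assuming the default HDP client config path /etc/hive/conf, could look like this:

# Verify that HiveServer2 impersonation is off, so queries run as the hive user and Ranger policies apply
grep -A1 "hive.server2.enable.doAs" /etc/hive/conf/hive-site.xml
# The following <value> element should be false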
						
					
02-29-2016 03:02 PM

Additional white papers from EMC: http://www.criticism.com/white-papers/white-papers.php
Latest EMC Best Practices, January 2015 version: https://www.emc.com/collateral/white-papers/h13926-wp-emc-isilon-hadoop-best-practices-onefs72.pdf
						
					
02-28-2016 12:31 AM
2 Kudos

@Junichi Oda See also https://community.hortonworks.com/articles/19601/how-to-limit-the-size-of-ranger-log-and-number-of.html
						
					
02-19-2016 01:59 AM
2 Kudos

@Robin Dong There are many ways. Use the Teradata Connector:
Download the connector: http://hortonworks.com/hdp/addons/
Documentation: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_HortonworksConnectorForTeradata/content/index.html
You would then need to script the imports to get data for the 100 tables; there is a limit to how many you can run in parallel. Or use an ETL tool like Talend.

SUPPORTING FILES: Copy the following files (attached) to the Sqoop library folder:
/usr/lib/sqoop/lib/hortonworks-teradata-connector-xxxxxx.jar
/usr/lib/sqoop/lib/teradata-connector-xxxxhadoopxxxx.jar
/usr/lib/sqoop/lib/terajdbc4.jar
/usr/lib/sqoop/lib/tdgssconfig.jar
# Note: this may already be installed in the TDH
# Place the JDBC drivers in /usr/lib/sqoop/lib
# Set the classpath
export HIVE_HOME=/usr/lib/hive
export HADOOP_HOME=/usr/lib/hadoop
export SQOOP_HOME=/usr/lib/sqoop
export HADOOP_CLASSPATH=$(hcat -classpath)
export LIB_JARS=$(echo ${HADOOP_CLASSPATH} | sed -e 's/::*/,/g')
# Hive import (options only; add your --connect string and --table):
sqoop import --hive-import --hive-overwrite --create-hive-table --hive-table <table-name> --null-string '\\N' --null-non-string '\\N'
# Define a table based on one in a database:
sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \
  --table employees --hive-table emps
# Other examples
sqoop import -libjars ${LIB_JARS} -Dteradata.db.input.target.table.schema="cust_id int, acct_type string, acct_nbr string, acct_start_date date, acct_end_date date" -Dteradata.db.input.file.format=orcfile --connect jdbc:teradata://<teradata host ip address>/Database=financial --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password dbc --table accts --hive-import --hive-table financial.accts
sqoop import --connect jdbc:teradata://192.168.1.13/Database=retail --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password dbc --table accts --hive-import --hive-table financial.accts
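Since the 100 tables need to be scripted, here is a minimal loop sketch; the tables.txt file, Teradata host, database, and credentials are placeholders, and it assumes the Hortonworks Teradata connection manager shown above:

# Hypothetical driver loop: one sqoop import per table name listed in tables.txt
while read -r TABLE; do
  sqoop import \
    --connect jdbc:teradata://<teradata-host>/Database=retail \
    --connection-manager org.apache.sqoop.teradata.TeradataConnManager \
    --username dbc --password dbc \
    --table "${TABLE}" \
    --hive-import --hive-table "financial.${TABLE}" || echo "FAILED: ${TABLE}" >> failed_tables.txt
done < tables.txt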
    
 
						
					
02-19-2016 01:53 AM
1 Kudo

@Robin Dong Has anyone from HW been in contact with you? From the myriad of questions, it does seem you need some assistance with your deployment. While it is admirable that you are doing so much on your own, and we want to continue to provide assistance in this forum, perhaps the most efficient way we can help is to understand your use case offline and see how we can support you.
						
					
02-18-2016 05:42 AM

							 Use SyncSort   http://www.syncsort.com/getattachment/989f1bac-4cda-4e70-bd97-41b4ff72fffc/Syncsort-Mainframe-to-Hadoop.aspx 
						
					
02-17-2016 06:39 AM

If you just want to execute it, only Select is needed. Remember, UDFs are used in DDL and DML in Hive statements.
So Select means you can use a UDF in a SELECT statement in SQL.
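As a concrete illustration (the JDBC URL, function name, and table below are placeholders, not from the original thread), a user holding only the Select privilege on the UDF can still invoke it in a query once someone with DDL rights has created the function:

# Use an already-registered UDF from a SELECT via beeline; creating the function itself requires DDL privileges
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" \
  -e "SELECT my_upper(name) FROM customers LIMIT 10;"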
						
					
02-17-2016 06:26 AM
2 Kudos

This is the Hive UDF "Allow" setting in the Sandbox. And no, you do not have to set Delegate Admin.
						
					
02-13-2016 04:50 PM
57 Kudos

 This article is for those who want a cheat sheet for a smooth installation of HDP in a Dev or Test environment with one or more of the following requirements: 
 
 Place all the log data into a different directory, not /var/log 
 All your service user names must be prefixed with the cluster name. The requirement is that these users must be centrally managed by AD or an LDAP. 
 You do not have any local users in the Hadoop cluster, including Hadoop service users. This becomes important if you wish to have Centrify deployed also, or if you will be deploying multiple clusters with a single LDAP/AD integration. Once again, these service names should have a cluster prefix. 
 You want to set appropriate YARN, Tez, MapReduce, and Ambari Metrics memory parameters during install. 
 
 Side Note: It is always prudent to get Professional Services assistance to either install or configure your production deployment, to make sure all the prerequisites unique to your environment are covered and met. 
 -------------------------------------------------------------------------------------------------------- 
  Step 1: Do Your Research..... Plan, Plan, Plan, Do it Right the First time, or Risk Doing it Over, and Over Again 
 This article is not intended to replace the Hortonworks docs or all the excellent resources here in HCC or elsewhere. 
 Apart from the Hortonworks docs, review: 
 
 Hortonworks Operational Best Practices Webcast and Slides 
 Typical Hadoop Cluster Networking Practices 
 Best Practice Linux File System for Hadoop and ext4 vs. XFS 
 Yarn Directories Recommended Size and Disk. 
 Best Practice Zookeeper Placement 
 Best Practice for Storm and Kafka Deployment and Unofficial Storm and Kafka Best practices Guide 
 Name Node Garbage Collection Best Practice 
 Tools to test the Performance, Scale and Reliability of Your Cluster 
 
 -------------------------------------------------------------------------------------------------------- 
 Step 2: Get your Disk partitions Right 
 See the following for some guidance. Take note of the hadoop properties and default locations. You need to have this done ahead of time. 
   
 Disk Partition Baseline (diagram) 
 Name Nodes Disk Partitioning (diagram) 
 Data Nodes Disk Partition (diagram) 
 Ambari/ Edge/ Ranger/ Knox Nodes Disk Partition (diagram) 
 Storm and Kafka Nodes Disk Partition (diagram) 
 -------------------------------------------------------------------------------------------------------- 
 Step 3: Don't Scrimp on Master Nodes. Know the Placement of Your Master Services 
 If you want to do yourself an injustice, just allocate one or two master nodes. 
 If you want to do things properly, and you want to be set for up to 50 nodes, then please have at least 3 master nodes (better 4 if you are doing HA), with at least 1 Edge node and 1 Admin/Ambari Server. 
 It is a PAIN, and some effort is involved, to move master services if you don't get this right. 
 Figure out where you are placing your Master Services. Use the following as a guide: 
 Master Services Placement (diagram) 
 -------------------------------------------------------------------------------------------------------- 
 Step 4: Get a Dedicated Database Server with HA for Ambari, Hive, Metastore, Oozie, Ranger  
   
 Oozie by default installs on Derby. You do not want Derby in your cluster. 
 Ambari by default installs on Postgres. You can decide to keep it there. 
 Hive's metastore uses MySQL. You can use a dedicated MySQL Database for Hive, Ranger Admin, and Oozie. Bear in mind though that if you restart Hive's metastore, it may affect Ranger and Oozie. 
 The instructions for setting up the databases before an Ambari install are located at Using Non Default Databases 
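 As a sketch of that pre-creation step (database names, users, and passwords below are illustrative assumptions, not prescribed values), the dedicated MySQL server can be prepared roughly like this before running the Ambari wizard: 
# Run on the dedicated MySQL server; substitute strong passwords and restrict the hosts to your Hive/Oozie/Ranger nodes
mysql -u root -p <<'SQL'
CREATE DATABASE hive;
CREATE DATABASE oozie;
CREATE DATABASE ranger;
CREATE USER 'hive'@'%' IDENTIFIED BY 'HivePass1!';
CREATE USER 'oozie'@'%' IDENTIFIED BY 'OoziePass1!';
CREATE USER 'rangeradmin'@'%' IDENTIFIED BY 'RangerPass1!';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';
GRANT ALL PRIVILEGES ON ranger.* TO 'rangeradmin'@'%';
FLUSH PRIVILEGES;
SQL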
 -------------------------------------------------------------------------------------------------------- 
 Step 5: Create Service Accounts Beforehand in your LDAP 
 Decide what your cluster prefix will be. Do not put an underscore "_" or a hyphen "-" in your prefix. 
 The list of service accounts you need to create is located here. 
 Solr is missing from that list. You need this user if you want to install Ranger, because Ranger uses Solr from HDP 2.3 and above for auditing and to show audit events in the UI. 
 Create a solr user with default group solr, with membership in the hadoop group also. 
 IMPORTANT: On each node, get the AD or LDAP UID for hdfs and the GID for group hadoop; edit /etc/passwd and /etc/group and add the users there with the CORRECT UID from AD or LDAP. I have found that even though you choose the Skip Group Modifications option so the Linux groups in the cluster are not modified, and you tell Ambari not to manage the HDFS user, some of the yum installs still try to create them; Ambari will respect your wishes, but yum will not. 
 Make sure the entries in your /etc/passwd and /etc/group have your cluster prefix. 
 When you install through Ambari it is very important that you configure the right properties so that Ambari is aware of your centrally managed, cluster-prefixed service names: 
 Set Skip Group Modification
Tell Ambari not to manage HDFS
 
 Follow the instructions at 
 Setting properties that depend on service usernames/groups 
 There is one property missing from the doc: 
 Also set HDFS User to your <cluster-prefix>-hdfs in Advanced hadoop-env. 
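 A minimal per-node sketch of that passwd/group fix-up (the "c1-" prefix is a hypothetical example, and it assumes the node can already resolve the directory accounts through getent, e.g. via SSSD or Centrify): 
# Pin the directory-managed, cluster-prefixed hdfs user and the hadoop group locally,
# so yum-created local accounts cannot collide with the AD/LDAP UIDs.
PREFIX="c1-"
HDFS_UID=$(getent passwd "${PREFIX}hdfs" | cut -d: -f3)
HADOOP_GID=$(getent group hadoop | cut -d: -f3)
grep -q "^${PREFIX}hdfs:" /etc/passwd || \
  echo "${PREFIX}hdfs:x:${HDFS_UID}:${HADOOP_GID}::/home/${PREFIX}hdfs:/bin/bash" >> /etc/passwd
grep -q "^hadoop:" /etc/group || \
  echo "hadoop:x:${HADOOP_GID}:${PREFIX}hdfs" >> /etc/group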
    
 -------------------------------------------------------------------------------------------------------- 
 Step 6: Use Hortonworks Handy Scripts to Automatically Prepare the Environment Across all Nodes  
 So you have your disk partitions, your network is set up, you have decided on your master services placement, you have created the service names in LDAP with a cluster prefix, and you have edited your /etc/passwd and /etc/group. 
 Here comes the fun part. 
 Go to your Ambari node and perform the following: 
# Install Hortonworks Public Tools
> yum install wget
> wget -qO- --no-check-certificate https://github.com/hortonworks/HDP-Public-Utilities/raw/master/Installation/install_tools.sh | bash
> ./install.sh
> cd hdp
# Everything will be installed to /root/hdp; create the /root/hdp/Hostdetail.txt file with all the hostnames for your cluster.
# e.g. hostname -f >> /root/hdp/Hostdetail.txt
> vi /root/hdp/Hostdetail.txt
# To set up password-less SSH
> ssh-keygen
> chmod 700 ~/.ssh
> chmod 600 ~/.ssh/id_rsa
# Distribute the keys to the other nodes.  The copy command is needed because the ./distribute_ssh_keys.sh script expects the private key at /tmp/ec2_keypair.  Alternatively, if you set up your nodes with a root password, just enter it when prompted by the script.
> cp <your node's private key> /tmp/ec2_keypair
> ./distribute_ssh_keys.sh ~/.ssh/id_rsa.pub
# Optional: Copy the private key to all nodes if you want password-less ssh from any node to any node.  Don't do this if you only want password-less ssh from the Ambari node.  Password-less ssh is only needed for Ambari to install Agents on all nodes; without it you need to install and configure the Agents yourself.
> ./copy_file.sh ~/.ssh/id_rsa ~/.ssh/id_rsa
# Test password-less SSH
> ssh <node>
# Now run a script to set all the OS prerequisites for a cluster install.  You may have to edit ./run_command.sh and add -tty to the ssh command, since the ./hdp_preinstall.sh script has sudo commands in it.
> ./run_command.sh 'mkdir /root/hdp'
> ./copy_file.sh /root/hdp/hdp_preinstall.sh /root/hdp/hdp_preinstall.sh
> vi run_command.sh (add "-tty" to the ssh call)
# Now in one swoop set the OS parameters
> ./run_command.sh '/root/hdp/hdp_preinstall.sh'
REBOOT ALL NODES
# DOUBLE CHECK that all the nodes retain all the OS environment configuration changes for the HDP install
> ./pre_install_check.sh | tee report.txt
# View the report.  Ignore the repo warnings for Ambari and HDP if you are connected to the internet and will pull the repos from there during install.
> vi report.txt
# Now get your YARN parameters to use when you install the cluster via Ambari
# Download the Hortonworks companion files
> wget http://public-repo-1.hortonworks.com/HDP/tools/2.3.4.0/hdp_manual_install_rpm_helper_files-2.3.4.0.3485.tar.gz
> tar -zxvf hdp_manual_install_rpm_helper_files-2.3.4.0.3485.tar.gz
> cd hdp_manual_install_rpm_helper_files-2.3.4.0.3485/scripts
# Now run the script to determine the memory parameters that you will set in Ambari during the Customize Services step.  Pass your number of cores (-c), memory per node in GB (-m), disks per node for HDFS (-d), and whether HBase will be installed (-k) to the python call
> python yarn-utils.py -c 16 -m 64 -d 4 -k True
 
 See Determine YARN and HDP memory 
 Make a note of these memory settings to plug in during the Ambari install. 
 -------------------------------------------------------------------------------------------------------- 
 Step 7: Installing Ambari 
   
 Now you can start installing Ambari and HDP from the doc at 
 http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Installing_HDP_AMB/content/_using_a_local_repository.html 
 
 Don't forget about setting your cluster-prefixed service name for hdfs and hbase 
 Don't choose a cluster name that has an underscore (_) because HDFS HA does not like it. 
 Don't forget to change the directory locations for all services as per the Disk Partition diagrams above. 
 You can change the directory for Hadoop logs upon install if you wish. See https://community.hortonworks.com/questions/4329/log-file-location-is-there-a-way-to-change-varlog.html 
 Don't forget to set the YARN and MapReduce Memory Parameters found from the python script. 
 Don't forget to set the Name Node Garbage Collection. 
 You can do the following to get Ambari running better during install: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/ch_tuning_ambari_performance.html 
 During Install you can configure Ambari Metrics: See https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning and http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/_ams_general_guidelines.html 
 You can follow this to tune Tez During the Install. See https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html 
 IMPORTANT: For fewer than 10 Data Nodes 
 
 Set mapred.submit.replication = 3 in mapred-site.xml 
 This is to prevent the job-related staging files from being created with the default replication factor of 10, which would lead to under-replicated block warnings. 
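 If you prefer to set that from the command line rather than the Ambari UI, a sketch using Ambari's bundled configs.sh helper (the path, admin credentials, Ambari host, and cluster name below are assumptions for illustration) would be: 
# Set mapred.submit.replication=3 in mapred-site via the Ambari API helper, then restart MapReduce2/YARN from Ambari
/var/lib/ambari-server/resources/scripts/configs.sh -u admin -p admin set ambari-host.example.com mycluster mapred-site "mapred.submit.replication" "3"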
 -------------------------------------------------------------------------------------------------------- 
 Step 8: Install SmartSense (only offered by Hortonworks) 
 Finally INSTALL SMARTSENSE, if you are a Hortonworks Customer. If you are not, why NOT? You are missing all the value from SmartSense to auto tune your cluster. (In Ambari 2.2 it is available as a Service.) 
 -------------------------------------------------------------------------------------------------------- 
 Step 9: Security Tips 
 
 If you plan to install Ranger, INSTALL SOLR FIRST. Don't add the Ranger service right away after you install the cluster. 
 Make sure that you use the <cluster-prefix>-solr user in your install, so that the process runs under that user. 
 Enable Kerberos if you can BEFORE adding Ranger. If not, that is fine; you will have to configure Ranger and all the plug-ins after the fact, but it is easier if you enable Kerberos first (a quick sanity check is sketched after this list). 
 Storm, Kafka, and Solr need Kerberos before you can authorize them with Ranger. 
 There is no security without Kerberos. 
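 A hypothetical post-kerberization sanity check (the keytab path and principal follow common HDP defaults and are assumptions; adjust to your cluster name and realm): 
# Confirm the headless hdfs keytab works and that HDFS accepts the Kerberos identity
klist -kt /etc/security/keytabs/hdfs.headless.keytab
kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-mycluster@EXAMPLE.COM
hdfs dfs -ls /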
 
 -------------------------------------------------------------------------------------------------------- 
 Finally 
 Most issues are due to a rogue process running with a local UID rather than the LDAP/AD UID, so double-check using ps -ef. If you set up your /etc/passwd and /etc/group properly beforehand, you should not have this issue. 
 Some issues come up if your files and/or logs are owned by the local hdfs user. Again, if you did not choose the 'Skip Group Modification' option, tell Ambari not to manage the HDFS user, set the hdfs user properly during install to <cluster-prefix>-hdfs, or set up your /etc/passwd and /etc/group, you will hit this problem. 
 Remember, some yum installs do not care what you set in Ambari for the hdfs user, so you may have to run those manually; look out for that. 
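 A small sketch of that check (the "c1-" prefix is an illustrative assumption): list Hadoop-related processes whose owner is a local account rather than the directory-managed, prefixed one, and compare the UID sources. 
# Flag hadoop-related processes not owned by a cluster-prefixed (directory) account
ps -eo user,pid,comm,args | grep -iE 'hadoop|hdfs|yarn|hive' | grep -v grep | grep -v '^c1-'
# Compare the UID sources: a local duplicate shows up in /etc/passwd with a different UID
getent passwd c1-hdfs
grep 'hdfs' /etc/passwd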
 -------------------------------------------------------------------------------------------------------- 
 Update: 
 A good resource: 
 https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=45580306 
 https://community.hortonworks.com/questions/21405/where-to-write-fsimage-files-when-running-qjm-nn-h.html 
   
						
					
02-13-2016 06:12 AM
1 Kudo

							@Gerd Koenig You should be able to accept your own answer now 
						
					