Member since 09-29-2015
02-29-2016
09:41 PM
There is no explicit deny in Ranger. Set Hive's Run as end user option (hive.server2.enable.doAs) to false. With Ranger, all queries should run as the hive user.
Then set your database access policy in Ranger, and it will work.
See also https://community.hortonworks.com/articles/234/securing-hdp-23-with-apache-ranger.html
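Here is a minimal Python model of allow-only policy evaluation that shows what "no explicit deny" means, along with the effect of turning off Run as end user. It is a sketch, not Ranger's implementation. The Policy class and both functions are hypothetical. The only real name in it is the Hive property hive.server2.enable.doAs, which is the setting behind that option.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical, simplified Ranger-style Hive policy: it can only allow."""
    database: str                                   # "*" matches any database
    users: set = field(default_factory=set)
    permissions: set = field(default_factory=set)   # e.g. {"select", "update"}


def is_allowed(policies, user, database, permission):
    # Allow-only evaluation: access is granted if any policy matches.
    # There is no deny policy that could override a match.
    return any(
        p.database in (database, "*")
        and user in p.users
        and permission in p.permissions
        for p in policies
    )


def effective_os_user(end_user, do_as):
    # do_as mirrors hive.server2.enable.doAs. When it is False, HiveServer2
    # runs the query as the hive service user, while Ranger still authorizes
    # the end user against its policies.
    return end_user if do_as else "hive"


if __name__ == "__main__":
    policies = [Policy("sales", {"alice"}, {"select"})]
    print(is_allowed(policies, "alice", "sales", "select"))  # True
    print(is_allowed(policies, "bob", "sales", "select"))    # False, no matching policy
    print(effective_os_user("alice", do_as=False))            # hive
```

Because access is granted whenever any policy matches, the only way to block a user in this model is to leave them out of every matching policy.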
02-29-2016
03:02 PM
Additional white papers from EMC: http://www.criticism.com/white-papers/white-papers.php
Latest EMC Best Practices (January 2015 version): https://www.emc.com/collateral/white-papers/h13926-wp-emc-isilon-hadoop-best-practices-onefs72.pdf
02-28-2016
12:31 AM
@Junichi Oda See also https://community.hortonworks.com/articles/19601/how-to-limit-the-size-of-ranger-log-and-number-of.html
02-19-2016
01:59 AM
@Robin Dong There are many ways. One is to use the Teradata Connector:
Download the connector: http://hortonworks.com/hdp/addons/
Documentation: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_HortonworksConnectorForTeradata/content/index.html
You would then need a script to get the data for the 100 tables. There is a limit to how many imports you can run in parallel. Alternatively, use an ETL tool like Talend.
SUPPORTING FILES: Copy the following files (attached) to the Sqoop library folder:
/usr/lib/sqoop/lib/hortonworks-teradata-connector-xxxxxx.jar
/usr/lib/sqoop/lib/teradata-connector-xxxxhadoopxxxx.jar
/usr/lib/sqoop/lib/terajdbc4.jar
/usr/lib/sqoop/lib/tdgssconfig.jar
#Note this may already be installed in the TDH
#Place the JDBC Drivers in /usr/lib/sqoop/lib
#Set Classpath
export HIVE_HOME=/usr/lib/hive
export HADOOP_HOME=/usr/lib/hadoop
export SQOOP_HOME=/usr/lib/sqoop
export HADOOP_CLASSPATH=$(hcat -classpath)
export LIB_JARS=$(echo ${HADOOP_CLASSPATH} | sed -e 's/::*/,/g')
# Hive import:
sqoop import --connect <jdbc-url> --table <source-table> --hive-import --hive-overwrite --create-hive-table --hive-table <table-name> --null-string '\\N' --null-non-string '\\N'
#Define a Table based on one in a database:
sqoop create-hive-table --connect jdbc:mysql://db.example.com/corp \
--table employees --hive-table emps
#Other Examples
sqoop import -libjars ${LIB_JARS} -Dteradata.db.input.target.table.schema="cust_id int, acct_type string, acct_nbr string, acct_start_date date, acct_end_date date" -Dteradata.db.input.file.format=orcfile --connect jdbc:teradata://<teradata host ip address>/Database=financial --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password dbc --table accts --hive-import --hive-table financial.accts
sqoop import --connect jdbc:teradata://192.168.1.13/Database=retail --connection-manager org.apache.sqoop.teradata.TeradataConnManager --username dbc --password dbc --table accts --hive-import --hive-table financial.accts
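The Python sketch below scripts the import for 100 tables and caps how many run at once. It assumes one sqoop import per table, using the options from the examples above. The helper names, the max_parallel default of 4, and the use of Sqoop's --password-file instead of --password are assumptions, not a tested recipe. to_libjars does in Python what the sed call does when building LIB_JARS.

```python
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor

TERADATA_MANAGER = "org.apache.sqoop.teradata.TeradataConnManager"


def to_libjars(hadoop_classpath):
    # Python equivalent of: sed -e 's/::*/,/g'  (each run of ':' becomes one ',')
    return re.sub(r":+", ",", hadoop_classpath)


def build_import_command(table, jdbc_url, username, password_file, hive_db,
                         libjars=None):
    """One sqoop import per source table (hypothetical helper)."""
    cmd = ["sqoop", "import"]
    if libjars:
        # Hadoop generic options such as -libjars must precede the tool options.
        cmd += ["-libjars", libjars]
    cmd += [
        "--connect", jdbc_url,
        "--connection-manager", TERADATA_MANAGER,
        "--username", username,
        "--password-file", password_file,  # assumption: keeps the password off the command line
        "--table", table,
        "--hive-import",
        "--hive-table", f"{hive_db}.{table}",
    ]
    return cmd


def plan_imports(tables, jdbc_url, username, password_file, hive_db, libjars=None):
    # Build every command up front so they can be reviewed before running.
    return [build_import_command(t, jdbc_url, username, password_file, hive_db, libjars)
            for t in tables]


def run_all(commands, max_parallel=4):
    # Each import launches its own MapReduce job and puts load on the
    # Teradata system, so cap how many run at the same time.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

Call plan_imports first to review the commands (nothing is executed yet). Once they look right, pass them to run_all, which returns one exit code per table.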
02-19-2016
01:53 AM
@Robin Dong Has anyone from Hortonworks been in contact with you? From the number of questions you have asked, it does seem you need some assistance with your deployment. While it is admirable that you are doing so much on your own, and we want to continue to provide assistance in this forum, perhaps the most efficient way we can help is to understand your use case offline and see how we can support you.
02-18-2016
05:42 AM
Use Syncsort: http://www.syncsort.com/getattachment/989f1bac-4cda-4e70-bd97-41b4ff72fffc/Syncsort-Mainframe-to-Hadoop.aspx
02-17-2016
06:39 AM
If you just want to execute the UDF, only the Select permission is needed. Remember that UDFs are used in DDL and DML in a Hive statement.
So Select means you can use the UDF in a SELECT statement in SQL.
02-17-2016
06:26 AM
This is Hive UDF Allow in the Sandbox. And no, you do not have to set Delegate Admin.
02-13-2016
04:50 PM
This article is a cheat sheet for a smooth installation of HDP in a Dev or Test environment with one or more of the following requirements:
Place all the log data into a different directory, not /var/log
All your service user names must be prefixed with the cluster name, and these users must be centrally managed by AD or LDAP.
You do not have any local users in the Hadoop cluster, including Hadoop service users. This becomes important if you also wish to deploy Centrify, or if you will deploy multiple clusters with a single LDAP/AD integration. Once again, these service user names should have a cluster prefix.
You want to set appropriate YARN, Tez, MapReduce and Ambari Metrics memory parameters during install.
Side Note: It is always prudent to get Professional Services assistance to install or configure your production deployment, to make sure all the prerequisites unique to your environment are covered and met.
--------------------------------------------------------------------------------------------------------
Step 1: Do Your Research. Plan, Plan, Plan. Do It Right the First Time, or Risk Doing It Over and Over Again
This article is not intended to replace the Hortonworks docs or all the excellent resources here in HCC or elsewhere.
Apart from the Hortonworks docs, review:
Hortonworks Operational Best Practices Webcast and Slides
Typical Hadoop Cluster Networking Practices
Best Practice Linux File System for Hadoop and ext4 vs. XFS
YARN Directories: Recommended Size and Disk
Best Practice Zookeeper Placement
Best Practice for Storm and Kafka Deployment, and Unofficial Storm and Kafka Best Practices Guide
NameNode Garbage Collection Best Practice
Tools to test the Performance, Scale and Reliability of Your Cluster
--------------------------------------------------------------------------------------------------------
Step 2: Get your Disk partitions Right
See the following for some guidance. Take note of the Hadoop properties and default locations. You need to have the partitioning done ahead of time.
Disk Partition Baseline
--------------------------------------------------------------------------------------------------------
Name Nodes Disk Partitioning
--------------------------------------------------------------------------------------------------------
Data Nodes Disk Partition
--------------------------------------------------------------------------------------------------------
Ambari/ Edge/ Ranger/ Knox Nodes Disk Partition
--------------------------------------------------------------------------------------------------------
Storm and Kafka Nodes Disk Partition
--------------------------------------------------------------------------------------------------------
Step 3: Don't Scrimp on Master Nodes. Know the Placement of Your Master Services
If you want to do yourself an injustice, just allocate one or two master nodes.
If you want to do things properly and be set for up to 50 nodes, then have at least 3 master nodes (better 4 if you are doing HA), with at least 1 edge node and 1 admin/Ambari server.
Moving master services later is a PAIN and takes real effort if you don't get it right the first time.
Figure out where you are placing your master services. Use the following as a guide:
--------------------------------------------------------------------------------------------------------
Step 4: Get a Dedicated Database Server with HA for Ambari, Hive Metastore, Oozie and Ranger
Oozie by default installs on Derby. You do not want Derby in your cluster.
Ambari by default installs on Postgres. You can decide to keep it there.
Hive's metastore uses MySQL. You can use a dedicated MySQL database for Hive, Ranger Admin and Oozie. Bear in mind, though, that if you restart Hive's metastore database, it may affect Ranger and Oozie.
The instructions for setting up the databases before an Ambari install are at Using Non-Default Databases.
--------------------------------------------------------------------------------------------------------
Step 5: Create Service Accounts Beforehand in your LDAP
Decide what your cluster prefix will be. Do not put an underscore "_" or a hyphen "-" in your prefix.
The list of service accounts you need to create is located here.
Solr is missing from the list. You need this user if you want to install Ranger, because Ranger in HDP 2.3 and above uses Solr for auditing and to show audit events in the UI.
Create a solr user with default group solr, and also make it a member of the hadoop group.
IMPORTANT: On each node, get the AD or LDAP UID for hdfs and the GID for the hadoop group; edit /etc/passwd and /etc/group and add the users there with the CORRECT UID from AD or LDAP. I have found that even when you choose the Skip Group Modifications option (so the Linux groups in the cluster are not modified) and tell Ambari not to manage HDFS, some of the yum installs still try to create the users. Ambari respects your wishes, but yum does not.
Make sure the entries in your /etc/passwd and /etc/group have your cluster prefix.
When you install through Ambari, it is very important that you configure the right properties so that Ambari is aware of your centrally managed, cluster-prefixed service names:
Set Skip Group Modification
Tell Ambari not to manage HDFS
Follow the instructions at
Setting properties that depend on service usernames/groups
There is one property missing from the doc:
Also set HDFS User to <cluster-prefix>-hdfs in Advanced hadoop-env.
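Before running Ambari, a quick check of the prefix rule and of the local /etc/passwd entries can save a reinstall. The Python sketch below assumes you have already exported the expected UIDs from AD/LDAP into a dictionary (how you query the directory is up to you). The function names and the sample prefix prod1 are hypothetical.

```python
def validate_prefix(prefix):
    # Step 5: the cluster prefix must not contain "_" or "-".
    if not prefix or "_" in prefix or "-" in prefix:
        raise ValueError(f"invalid cluster prefix: {prefix!r}")
    return prefix


def prefixed(prefix, service):
    # e.g. ("prod1", "hdfs") -> "prod1-hdfs", the <cluster-prefix>-hdfs form
    return f"{validate_prefix(prefix)}-{service}"


def parse_passwd(passwd_text):
    """Map user name -> UID from /etc/passwd text (name:pw:uid:gid:gecos:home:shell)."""
    uids = {}
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3:
            uids[fields[0]] = int(fields[2])
    return uids


def uid_mismatches(passwd_text, expected_uids):
    """Return {user: (local_uid or None, directory_uid)} for every user whose
    local entry is missing or carries a UID different from AD/LDAP."""
    local = parse_passwd(passwd_text)
    return {user: (local.get(user), uid)
            for user, uid in expected_uids.items()
            if local.get(user) != uid}


if __name__ == "__main__":
    passwd = ("root:x:0:0:root:/root:/bin/bash\n"
              "prod1-hdfs:x:1005:1001::/home/prod1-hdfs:/bin/bash\n")
    expected = {prefixed("prod1", "hdfs"): 50012, prefixed("prod1", "yarn"): 50013}
    print(uid_mismatches(passwd, expected))
```

Run it against each node's /etc/passwd. An empty result means every expected service user exists locally with the directory's UID.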
--------------------------------------------------------------------------------------------------------
Step 6: Use Hortonworks Handy Scripts to Automatically Prepare the Environment Across all Nodes
So you have your disk partitions, your network is set up, you have decided on your master services placement, you have created the service names in LDAP with a cluster prefix, and you have edited your /etc/passwd and /etc/group.
Here comes the fun part.
Go to your Ambari node and perform the following:
# Install Hortonworks Public Tools
> yum install wget
> wget -qO- --no-check-certificate https://github.com/hortonworks/HDP-Public-Utilities/raw/master/Installation/install_tools.sh | bash
>./install.sh
>cd hdp
#Everything will be installed to /root/hdp; create the /root/hdp/Hostdetail.txt file with all the hostnames for your cluster.
# hostname -f > /root/hdp/Hostdetail.txt
vi /root/hdp/Hostdetail.txt
#To set up Password-less SSH
> ssh-keygen
>chmod 700 ~/.ssh
>chmod 600 ~/.ssh/id_rsa
# Distribute the keys to the other nodes. The copy command is needed because the ./distribute_ssh_keys.sh script expects the private key at /tmp/ec2_keypair. Otherwise, if your nodes have a root password, just enter it when the script prompts for it.
> cp <your nodes private key> /tmp/ec2_keypair
> ./distribute_ssh_keys.sh ~/.ssh/id_rsa.pub
#Optional: Copy the private key to all nodes if you want password-less SSH from any node to any node. Don't do this if you want password-less SSH ONLY from the Ambari node. Password-less SSH is only needed for Ambari to install the Agents on all nodes; without it, you need to install and configure the Agents yourself.
> ./copy_file.sh ~/.ssh/id_rsa ~/.ssh/id_rsa
# Test passwordless SSH
> ssh <node>
#Now run a script to set all the OS prerequisites for a cluster install. You may have to edit ./run_command.sh and add -tt to the ssh command, since the ./hdp_preinstall.sh script has sudo commands in it.
> ./run_command.sh 'mkdir /root/hdp'
> ./copy_file.sh /root/hdp/hdp_preinstall.sh /root/hdp/hdp_preinstall.sh
> vi run_command.sh (add "-tt" to the ssh call)
# Now in one swoop set the OS parameters
> ./run_command.sh 'bash /root/hdp/hdp_preinstall.sh'
REBOOT ALL NODES
#DOUBLE CHECK That all the Nodes retain all the OS Environment Configuration Changes for HDP Install
> ./pre_install_check.sh | tee report.txt
#View the report. Ignore the repo warnings for Ambari and HDP if you are connected to the internet and will pull the repos from there during install.
> vi report.txt
# Now get your YARN Parameters to use when you install the cluster via Ambari
# Download Hortonworks Companion files
> wget http://public-repo-1.hortonworks.com/HDP/tools/2.3.4.0/hdp_manual_install_rpm_helper_files-2.3.4.0.3485.tar.gz
> tar -zxvf hdp_manual_install_rpm_helper_files-2.3.4.0.3485.tar.gz
> cd hortonworks-HDP-Public-Utilities-d617f44
# Now run the Script to determine your memory parameters that you would set in Ambari during the Customize Services Step. Put your Number of Cores (c), Memory per Node (m), Disks per Node for HDFS (d) and Whether HBase will be installed or not (-k) into the python call
>python yarn-utils.py -c 16 -m 64 -d 4 -k True
See Determine YARN and HDP memory
Make a note of these memory settings to plug in during the Ambari install.
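For reference, here is a Python sketch of the heuristic behind yarn-utils.py, as described in the HDP "Determine YARN and MapReduce memory" docs. It reserves memory for the OS (and HBase), picks a minimum container size from node RAM, then takes containers = min(2*cores, 1.8*disks, available RAM / minimum container size). This is an illustrative re-implementation. The reserved-memory table approximates the documented one, and rounding may differ from the real script, so plug the script's own output into Ambari.

```python
import math

# (node RAM upper bound in GB, reserved for OS/stack in GB, reserved for HBase in GB)
# Approximates the reserved-memory table in the HDP docs.
RESERVED_GB = [
    (4, 1, 1), (8, 2, 1), (16, 2, 2), (24, 4, 4), (48, 6, 8), (64, 8, 8),
    (72, 8, 8), (96, 12, 16), (128, 24, 24), (256, 32, 32), (512, 64, 64),
]


def reserved_gb(total_gb, with_hbase):
    # Use the first row whose bound covers this node; fall back to the last row.
    row = next((r for r in RESERVED_GB if total_gb <= r[0]), RESERVED_GB[-1])
    _, system, hbase = row
    return system + (hbase if with_hbase else 0)


def min_container_mb(total_gb):
    if total_gb <= 4:
        return 256
    if total_gb <= 8:
        return 512
    if total_gb <= 24:
        return 1024
    return 2048


def yarn_memory(cores, total_gb, disks, with_hbase):
    """Sketch of the yarn-utils.py heuristic; returns suggested settings in MB."""
    available_mb = (total_gb - reserved_gb(total_gb, with_hbase)) * 1024
    min_mb = min_container_mb(total_gb)
    containers = max(1, int(min(2 * cores, math.ceil(1.8 * disks), available_mb / min_mb)))
    ram = max(min_mb, available_mb // containers)
    if ram > 1024:
        ram = ram // 512 * 512  # round down to a multiple of 512 MB
    return {
        "containers": containers,
        "yarn.nodemanager.resource.memory-mb": containers * ram,
        "yarn.scheduler.minimum-allocation-mb": ram,
        "yarn.scheduler.maximum-allocation-mb": containers * ram,
        "mapreduce.map.memory.mb": ram,
        "mapreduce.reduce.memory.mb": 2 * ram,
        "mapreduce.map.java.opts": f"-Xmx{int(0.8 * ram)}m",
        "mapreduce.reduce.java.opts": f"-Xmx{int(0.8 * 2 * ram)}m",
        "yarn.app.mapreduce.am.resource.mb": 2 * ram,
    }


if __name__ == "__main__":
    # Same inputs as: python yarn-utils.py -c 16 -m 64 -d 4 -k True
    for key, value in yarn_memory(16, 64, 4, True).items():
        print(f"{key}={value}")
```

As in the script call above, memory is given in GB. For those inputs, the sketch arrives at 8 containers of 6144 MB each: the 4 disks are the limiting factor, not the cores.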
--------------------------------------------------------------------------------------------------------
Step 7: Installing Ambari
Now install Ambari and HDP following the doc at
http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_Installing_HDP_AMB/content/_using_a_local_repository.html
Don't forget to set your cluster-prefixed service names for hdfs and hbase.
Don't choose a cluster name that has an underscore (_) because HDFS HA does not like it.
Don't forget to change the locations of all directories as per the Disk Partition diagrams above.
You can change the directory for Hadoop logs upon install if you wish. See https://community.hortonworks.com/questions/4329/log-file-location-is-there-a-way-to-change-varlog.html
Don't forget to set the YARN and MapReduce memory parameters from the python script.
Don't forget to set NameNode garbage collection.
To get Ambari running better during install, see: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/ch_tuning_ambari_performance.html
During Install you can configure Ambari Metrics: See https://cwiki.apache.org/confluence/display/AMBARI/Configurations+-+Tuning and http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/_ams_general_guidelines.html
You can follow this to tune Tez During the Install. See https://community.hortonworks.com/articles/14309/demystify-tez-tuning-step-by-step.html
IMPORTANT: For fewer than 10 DataNodes
Set mapred.submit.replication=3 in mapred-site.xml
This prevents the job-related staging files from being created with the default replication factor of 10, which would lead to under-replicated block warnings.
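The rule above fits in a small helper that emits the mapred-site.xml property only when the cluster is too small for the default. This Python sketch is illustrative. Capping the value at the DataNode count for very small clusters is an assumption beyond the article. On Hadoop 2, the same setting also goes by mapreduce.client.submit.file.replication, and mapred.submit.replication is kept as a deprecated alias.

```python
import xml.etree.ElementTree as ET


def submit_replication_property(datanodes, default_replication=10, value=3):
    """Return a mapred-site.xml <property> snippet, or None when not needed.

    With fewer DataNodes than the default replication of 10, job staging
    files can never be fully replicated, so Step 7 sets the value to 3.
    Capping at the DataNode count for tiny clusters is an extra assumption.
    """
    if datanodes >= default_replication:
        return None
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "mapred.submit.replication"
    ET.SubElement(prop, "value").text = str(min(value, datanodes))
    return ET.tostring(prop, encoding="unicode")


if __name__ == "__main__":
    print(submit_replication_property(5))
```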
--------------------------------------------------------------------------------------------------------
Step 8: Install SmartSense, Offered Only by Hortonworks
Finally, INSTALL SMARTSENSE if you are a Hortonworks customer. If you are not, why NOT? You are missing all the value SmartSense provides in auto-tuning your cluster. (In Ambari 2.2 it is available as a service.)
--------------------------------------------------------------------------------------------------------
Step 9: Security Tips
If you plan to install Ranger, INSTALL SOLR FIRST. Don't add the Ranger service yet when you install the cluster.
Make sure that you use the <cluster-prefix>-solr user in your install, so that the process runs under that user.
Enable Kerberos BEFORE adding Ranger if you can. If not, that is fine: you would have to configure Ranger and all the plugins after the fact, but it is easier if you enable Kerberos first.
Storm, Kafka and Solr need Kerberos before you can authorize them with Ranger.
There is no Security without Kerberos.
--------------------------------------------------------------------------------------------------------
Finally
Most issues are due to a rogue process running under a local UID rather than the LDAP/AD UID, so double check using ps -ef. If you set up your /etc/passwd and /etc/group properly beforehand, you should not have this issue.
Some issues come up if your files and/or logs are owned by the local hdfs user. Again, you would get this problem if you did not choose the 'Skip Group Modification' option and tell Ambari not to manage HDFS, did not set the hdfs user to <cluster-prefix>-hdfs during install, or did not set up your /etc/passwd and /etc/group.
Remember that some yum installs ignore the hdfs user you set in Ambari, so you may have to run those manually. Look out for that.
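To make the ps -ef check repeatable, the Python sketch below scans ps output for processes owned by an unprefixed Hadoop service user (for example hdfs rather than <cluster-prefix>-hdfs). The service-name set is illustrative, not the complete list. ps -ef may truncate long user names or show a numeric UID instead, so output from ps -eo user:32,pid,args is easier to scan.

```python
# Illustrative, not exhaustive: unprefixed names of common HDP service users.
SERVICE_USERS = {"hdfs", "yarn", "mapred", "hive", "hbase", "zookeeper",
                 "oozie", "ranger", "solr", "ams", "kafka", "storm"}


def rogue_processes(ps_output):
    """Return the ps lines whose owner is a local, unprefixed service user.

    Expects `ps -ef` or `ps -eo user:32,pid,args` output: a header line,
    then one process per line with the owning user in the first column.
    """
    flagged = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if fields and fields[0] in SERVICE_USERS:
            flagged.append(line)
    return flagged


if __name__ == "__main__":
    sample = (
        "USER PID COMMAND\n"
        "hdfs 123 java -Dproc_namenode\n"
        "prod1-hdfs 124 java -Dproc_datanode\n"
    )
    for line in rogue_processes(sample):
        print(line)
```

On a live node, pass it the stdout of subprocess.run(["ps", "-eo", "user:32,pid,args"], capture_output=True, text=True). Any line it returns is a process to investigate.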
--------------------------------------------------------------------------------------------------------
Update:
Good resources:
https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=45580306
https://community.hortonworks.com/questions/21405/where-to-write-fsimage-files-when-running-qjm-nn-h.html
02-13-2016
06:12 AM
@Gerd Koenig You should be able to accept your own answer now.