Member since: 06-05-2019
Posts: 128
Kudos Received: 133
Solutions: 11
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1792 | 12-17-2016 08:30 PM |
| | 1334 | 08-08-2016 07:20 PM |
| | 2375 | 08-08-2016 03:13 PM |
| | 2475 | 08-04-2016 02:49 PM |
| | 2280 | 08-03-2016 06:29 PM |
07-28-2016
08:53 AM
I recommend you try it in dev or in a virtual environment first. Did you use Ubuntu 14.04 to install HDP? Probably not. Do one machine at a time.
06-29-2016
01:54 PM
5 Kudos
Security is a key element when discussing Big Data, and a common requirement is data encryption. By following the instructions below, you'll be able to set up transparent data encryption in HDFS on defined directories, otherwise known as encryption zones (EZs). Before starting this step-by-step tutorial, three HDP services must be installed: 1) HDFS 2) Ranger 3) Ranger KMS

Step 1: Prepare the environment (as explained in the HDFS "Data at Rest" Encryption manual)

a) If using Oracle JDK, verify JCE is installed (OpenJDK has JCE installed by default). If the server running Ranger KMS is using Oracle JDK, you must install JCE; it is necessary for Ranger KMS to run. Instructions on installing JCE can be found here.

b) CPU support for AES-NI optimization. AES-NI optimization requires an extended CPU instruction set for AES hardware acceleration. There are several ways to check for this; for example:

cat /proc/cpuinfo | grep aes

Look for the 'aes' flag in the output.

c) Library support for AES-NI optimization. You will need a version of the libcrypto.so library that supports hardware acceleration, such as OpenSSL 1.0.1e. (Many OS versions ship an older version of the library that does not support AES-NI.) A version of the libcrypto.so library with AES-NI support must be installed on HDFS cluster nodes and MapReduce client hosts -- that is, any host from which you issue HDFS or MapReduce requests. The following instructions describe how to install and configure the libcrypto.so library on RHEL/CentOS 6.5 or later.

On HDP cluster nodes, the installed version of libcrypto.so supports AES-NI, but you will need to make sure that the symbolic link exists:

sudo ln -s /usr/lib64/libcrypto.so.1.0.1e /usr/lib64/libcrypto.so

On MapReduce client hosts, install the openssl-devel package:

sudo yum install openssl-devel

d) Verify AES-NI support. To verify that a client host is ready to use the AES-NI instruction set optimization for HDFS encryption, use the following command:

hadoop checknative

You should see a response similar to the following:

15/08/12 13:48:39 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/12/12 13:48:39 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib64/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so

Step 2: Create an encryption key

This step outlines how to create an encryption key using Ranger.

a) Log in to Ranger at http://RANGER_FQDN_ADDR:6080/
* To access Ranger KMS (Encryption), log in with the username "keyadmin"; the default password is "keyadmin" - remember to change this password.
b) Choose Encryption > Key Manager
* In this tutorial, "hdptutorial" is the name of the HDP cluster. Yours will be different, depending on your cluster name.
c) Choose Select Service > yourclustername_kms
d) Choose "Add New Key"
e) Create the new key with a length of either 128 or 256 bits
* A length of 256 requires JCE installed on all hosts in the cluster: "The default key size is 128 bits. The optional -size parameter supports 256-bit keys, and requires the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy File on all hosts in the cluster. For installation information, see the Ambari Security Guide."

Step 3: Add Ranger KMS policies for the key used to encrypt the directory

a) Log in to Ranger at http://RANGER_FQDN_ADDR:6080/
* To access Ranger KMS (Encryption), log in with the username "keyadmin"; the default password is "keyadmin" - remember to change this password.
b) Choose Access Manager > Resource Based Policies
c) Choose Add New Policy
d) Create a policy for the key "yourkeyname"
- the user hdfs must be added to GET_METADATA and GENERATE_EEK -> operations performed as any user call the user hdfs in the background
- the user "nicole" is a custom user I created to be able to read/write data using the key "yourkeyname"
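Optionally, keys can also be inspected or created from the command line with the hadoop key shell instead of the Ranger UI. This is a minimal sketch, assuming the default Ranger KMS port 9292 and a placeholder host name; if hadoop.security.key.provider.path is already set on the client, the -provider flag can be omitted.

# List keys known to the configured KMS provider (placeholder host shown)
hadoop key list -metadata -provider kms://http@RANGER_KMS_FQDN:9292/kms

# Create a 256-bit key named "yourkeyname" (256-bit keys require JCE on all hosts)
hadoop key create yourkeyname -size 256 -provider kms://http@RANGER_KMS_FQDN:9292/kms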
Step 4: Create an encryption zone

a) Create a new directory:

hdfs dfs -mkdir /zone_encr

* Leave the directory empty until it has been encrypted (I recommend using a superuser to create the directory).

b) Create an encryption zone:

hdfs crypto -createZone -keyName yourkeyname -path /zone_encr

* Here the user "nicole" from above creates the encryption zone.

c) Validate that the encryption zone exists:

hdfs crypto -listZones

* You must be a superuser to call this command (or part of a superuser group like hdfs). The command should output:

[nicole@hdptutorial01 security]$ hdfs crypto -listZones
/zone_encr yourkeyname
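As a quick smoke test, you can write and read a file in the zone as the user granted read/write access to the key in the Ranger KMS policy. A minimal sketch, assuming the user "nicole" and the paths used above, and that nicole has HDFS write permission on /zone_encr; the local file name is just an example.

# As the user "nicole", write a file into the encryption zone
echo "hello ez" > /tmp/ez_test.txt
hdfs dfs -put /tmp/ez_test.txt /zone_encr/

# Reading it back returns clear text transparently for an authorized user
hdfs dfs -cat /zone_encr/ez_test.txt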
* You will now be able to read and write data in your encrypted directory /zone_encr. If you receive any errors - including an "IOException:" when creating the encryption zone in Step 4 (b) - take a look at /var/log/ranger/kms/kms.log on your Ranger KMS server; it is usually a permission issue accessing the key.
* To find out more about how transparent data encryption in HDFS works, refer to the Hortonworks blog here.

Tested in HDP: 2.4.2
06-17-2016
03:31 PM
Hi @Ryan Cicak The best practice is to configure Ranger audits to both Solr and HDFS. HDFS is used for long-term audit storage, where you won't want to delete audit data. Solr should be used for short-term storage; because the data is indexed there, you can query it quickly from the Ranger UI. I am not aware of any setting or property in Ranger to set a TTL and automatically delete audit data. You may leverage Solr's TTL feature to purge data (link) or schedule a job that issues a delete query periodically, as sketched below.
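A minimal sketch of the scheduled-delete approach, assuming the default Ranger audit collection name ranger_audits, the audit timestamp field evtTime, and Solr listening on port 8983; adjust the host, collection, field, and retention window for your environment.

# Delete Ranger audit documents older than 90 days from Solr (run from cron, e.g. daily)
curl "http://SOLR_HOST:8983/solr/ranger_audits/update?commit=true" \
  -H "Content-Type: text/xml" \
  --data-binary "<delete><query>evtTime:[* TO NOW-90DAYS]</query></delete>"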
06-10-2016
11:35 PM
2 Kudos
A remote Linux system can use NFS (Network File System) to mount an HDFS file system and interact with it. Before proceeding, it's important to understand that your Linux instance is directly accessing HDFS over the network, so you will incur network latency. Depending on your dataset size, remember that you could potentially be processing gigabytes or more of data on a single machine, so this is not the best approach for large datasets. These steps show how to mount and interact with a remote HDFS node from your Linux system:

1) The Linux system must have NFS installed (CentOS is used for this demo):

yum install nfs-utils nfs-utils-lib

2) Your HDP cluster must have an NFS Gateway installed (Ambari allows this option with one click).
* Keep track of either the FQDN or IP address of the NFS Gateway.

3) In Ambari, under HDFS > Advanced > General, set Access time precision = 3600000.

4) Mount the NFS Gateway on your Linux system (must be root):

mount -t nfs -o vers=3,proto=tcp,nolock myipaddressorfqdnofnfsgateway:/ /opt/remotedirectory

5) On both your HDFS node and the remote Linux system, add the same user with the same uid (making sure neither already exists):

useradd -u 1234 testuser

* If the user/uid doesn't match between the HDFS node and your remote Linux system, whatever uid you are logged in as on the remote Linux system will be passed to and interpreted by the NFS Gateway. For example, if your Linux system has usertest (uid = 501) and you write a file to HDFS's /tmp, the owner of the file will be whichever user on the HDFS node matches uid=501 - therefore it is good practice to match both the username and the uid across both systems.

6) On your remote Linux system, log in as "testuser" and go to your mounted NFS directory:

cd /opt/remotedirectory

You will now be able to interact with HDFS using native Linux commands such as cp, less, more, etc., as in the short example below.
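A minimal sketch of working through the mount; the file names are just examples and assume the mount point /opt/remotedirectory used above.

# Copy a local file into HDFS through the NFS mount
cp /home/testuser/data.csv /opt/remotedirectory/tmp/

# List and read it back with ordinary Linux tools
ls -l /opt/remotedirectory/tmp/
less /opt/remotedirectory/tmp/data.csv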
06-25-2016
01:53 AM
1 Kudo
As mentioned in https://community.hortonworks.com/questions/37192/error-no-package-python27-available-while-setting.html, the tutorial has been corrected.
06-12-2016
01:59 AM
1 Kudo
Updated tutorial:
1) using centos-release-scl
2) wget https://bootstrap.pypa.io/ez_setup.py
Thanks!
06-07-2016
07:09 AM
1 Kudo
If you want to see dates and update history for tutorials, I would suggest looking at the source on GitHub: https://github.com/hortonworks/tutorials Tutorials are updated on the Sandbox release schedule, which tends to correspond to major HDP releases. Here you can see the latest version of the HDP tutorials for HDP 2.4: https://github.com/hortonworks/tutorials/tree/hdp/tutorials/hortonworks For example, I believe you had a question earlier on the IPython Notebook with Spark tutorial, and here you can see the history of updates for that tutorial: https://github.com/hortonworks/tutorials/commits/hdp/tutorials/hortonworks/ipython-notebook-with-spark/tutorial.md Each tutorial lists its prerequisites. If you want to learn more about the tutorials or make a contribution, at the bottom of each tutorial there is a paragraph that covers the GitHub repo and the contribution guide. I am happy to chat with you to see how we can make this template more descriptive.
06-01-2016
07:05 PM
Lockout is not at the DB level here, since we are not authenticating with a DB username/password but with an Ambari username/password. So I don't think there will be a way to lock accounts out at the DB level; it has to be implemented at the Ambari application level and, as @jeff pointed out, could be filed as an enhancement.
05-31-2016
06:42 PM
2 Kudos
The design leaves room for handling more than one cluster at a time in a single Ambari Server instance. Maybe Ambari could offer a convenience resource /api/v1/cluster (like you mention) that resolves automatically to the [0] cluster? If that makes sense, maybe file that JIRA in the Ambari project?
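For context, a minimal sketch of how the current API exposes the cluster list; the host, port, credentials, and cluster name are placeholders for your environment.

# List all clusters managed by this Ambari Server; today a client typically picks items[0]
curl -u admin:admin http://AMBARI_HOST:8080/api/v1/clusters

# Then address a specific cluster by the name returned above
curl -u admin:admin http://AMBARI_HOST:8080/api/v1/clusters/CLUSTER_NAME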
06-28-2016
12:57 AM
@Ryan Cicak Hi, this demo helped me a lot. But when I executed the command: insert into brancha(full_name,ssn,location) values ('ryan', '111-222-333', 'chicago'); it reported an error like this: java.io.FileNotFoundException: File does not exist: hdfs://localhost:9000/usr/local/data-governance/apache-atlas-0.7-incubating-SNAPSHOT/hook/hive/atlas-client-0.7-incubating-SNAPSHOT.ja The details of this issue are posted in my other thread: https://community.hortonworks.com/questions/41898/using-hive-hook-file-does-not-exist-atlas-client-0.html Please take a look - I hope you can help me. Thank you very much.