Member since: 06-05-2019
Posts: 117
Kudos Received: 127
Solutions: 11
My Accepted Solutions
Title | Views | Posted
---|---|---
| 361 | 12-17-2016 08:30 PM
| 271 | 08-08-2016 07:20 PM
| 618 | 08-08-2016 03:13 PM
| 480 | 08-04-2016 02:49 PM
| 547 | 08-03-2016 06:29 PM
05-16-2018
11:20 PM
7 Kudos
To debug DLM pairing, you'll need the following prerequisite: 1) root access to the DPS VM.
Problem statement: have you received an error when pairing a cluster? Follow these step-by-step instructions to access the DLM log and gain the granular log information that will help you debug:
1) Run "sudo docker ps" to find the container ID for "dlm-app". In the screenshot above, the container ID for "dlm-app" is "83d879e9a45e".
2) Once you have the container ID, run "sudo docker exec -it 83d879e9a45e /bin/tailf /usr/dlm-app/logs/application.log". This gives you insight into the DPS-DLM application; in the example above you'll see "ERROR". The error is logged as soon as you click "Pair" in the DLM UI. Using the information from the log, you'll be able to troubleshoot your issue.
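If you'd rather not copy the container ID by hand, here is a minimal sketch that resolves it and follows the log in one step (it assumes the container is still named "dlm-app" and uses the log path shown above):
# Look up the dlm-app container ID and follow the DLM application log (container name and log path taken from the steps above)
DLM_CONTAINER=$(sudo docker ps --filter "name=dlm-app" --format "{{.ID}}")
sudo docker exec -it "$DLM_CONTAINER" tail -f /usr/dlm-app/logs/application.log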
... View more
- Find more articles tagged with:
- dlm
- dps
- Issue Resolution
- pair
- setup
04-03-2018
07:50 PM
4 Kudos
Reading and writing files to a MapR cluster (version 6) is simple using the standard PutFile or GetFile processors over the MapR NFS. If you've searched high and low on how to do this, you've likely read articles and GitHub projects specifying steps. I've tried those steps without success; what's out there is either too complicated or too outdated to get NiFi reading from and writing to MapR. You don't need to re-compile the HDFS processors with the MapR dependencies - just follow the steps below:
1) Install the MapR client on each NiFi node
#Install syslinux (for the rpm install)
sudo yum install syslinux
#Download the RPM for your OS http://package.mapr.com/releases/v6.0.0/redhat/
rpm -Uvh mapr-client-6.0.0.20171109191718.GA-1.x86_64.rpm
#Configure the mapr client connecting with the cldb
/opt/mapr/server/configure.sh -c -N ryancicak.com -C cicakmapr0.field.hortonworks.com:7222 -genkeys -secure
#Once you have the same users/groups on your OS (as MapR), you will be able to use maprlogin password (allowing you to login with a Kerberos ticket)
#Prove that you can access the MapR FS
hadoop fs -ls /
2) Mount the MapR FS on each NiFi node:
sudo mount -o hard,nolock cicakmapr0.field.hortonworks.com:/mapr /mapr
*This lets you access the MapR FS on the mount point /mapr/yourclustername.com/location
3) Use the PutFile and GetFile processors, referencing the /mapr directory on your NiFi nodes.
*Following steps 1-3 allows you to quickly read/write to MapR using NiFi.
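If you want the mount to survive reboots, here is a sketch of a persistent /etc/fstab entry (the host and options are taken from the mount command above - double-check them against your MapR documentation before relying on this):
# Add an NFS entry so the MapR mount is re-created at boot (same host/options as the manual mount above)
echo "cicakmapr0.field.hortonworks.com:/mapr  /mapr  nfs  hard,nolock  0 0" | sudo tee -a /etc/fstab
sudo mount -a
# Confirm the NiFi service account can see the cluster path
ls /mapr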
... View more
- Find more articles tagged with:
- HDFS
- How-ToTutorial
- mapr
- NiFi
- nifi-processor
- putfile
- solutions
Labels:
11-19-2018
03:23 PM
What file extension should the PACKAGES file have?
... View more
10-25-2017
07:37 PM
7 Kudos
Installing the Alarm Fatigue Demo via Cloudbreak:
There are multiple ways to deploy the Alarm Fatigue Demo via Cloudbreak. Below are four options:
1) Deploy via the Cloudbreak UI
a) Log in to https://cbdtest.field.hortonworks.com
b) Select your credentials - if your credentials don't exist, create them under "Manage Credentials"
c) Once your credentials are selected, click "Create Cluster"
d) Make up a cluster name, choose the Availability Zone (SE), and then click "Setup Network and Security"
e) "fieldcloud-openstack-network" should be selected; click "Choose Blueprint"
f) Select the Blueprint called "alarm_fatigue_v2": Host Group 1 (select Ambari Server, alarm-fatigue-demo, and pre-install-java8), Host Group 2 (select pre-install-java8), Host Group 3 (select pre-install-java8)
g) Click "Review and Launch"
h) Click "Create and start cluster" (after clicking, the deployment via Cloudbreak will likely take 30-50 minutes - go get a coffee)
2) Deploy via Bash Script (specifying configuration file)
Create a file named .deploy.config with the following contents:
Version=0.5
CloudBreakServer=https://cbdtest.field.hortonworks.com
CloudBreakIdentityServer=http://cbdtest.field.hortonworks.com:8089
CloudBreakUser=admin@example.com
CloudBreakPassword=yourpassword
CloudBreakCredentials=
CloudBreakClusterName=alarmfatigue-auto
CloudBreakTemplate=openstack-m3-xlarge
CloudBreakRegion=RegionOne
CloudBreakSecurityGroup=openstack-connected-platform-demo-all-services-port-v3
CloudBreakNetwork=fieldcloud-openstack-network
CloudBreakAvailabilityZone=SE
Change the highlighted values to match your environment.
Then execute the following (a consolidated sketch also appears after the option list below):
wget -O - https://raw.githubusercontent.com/ryancicak/northcentral_hackathon/master/CloudBreakArtifacts/cloudbreak-cmd/deployer.sh | bash
3) Deploy via Bash Script, inputting configurations when prompted. Just execute wget -O - https://raw.githubusercontent.com/ryancicak/northcentral_hackathon/master/CloudBreakArtifacts/cloudbreak-cmd/deployer.sh | bash and fill out the information as prompted.
4) Deploy via Jenkins
All four options will deploy, install, configure, and run all necessary services, including "Alarm Fatigue Demo Control".
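For option 2, here is a consolidated sketch that writes the configuration and runs the deployer in one pass (it assumes deployer.sh picks up .deploy.config from the working directory, as described above; replace the password and credential values with your own):
# Write the deployment configuration (same keys as listed above - substitute your own values)
cat > .deploy.config <<'EOF'
Version=0.5
CloudBreakServer=https://cbdtest.field.hortonworks.com
CloudBreakIdentityServer=http://cbdtest.field.hortonworks.com:8089
CloudBreakUser=admin@example.com
CloudBreakPassword=yourpassword
CloudBreakCredentials=
CloudBreakClusterName=alarmfatigue-auto
CloudBreakTemplate=openstack-m3-xlarge
CloudBreakRegion=RegionOne
CloudBreakSecurityGroup=openstack-connected-platform-demo-all-services-port-v3
CloudBreakNetwork=fieldcloud-openstack-network
CloudBreakAvailabilityZone=SE
EOF
# Fetch and run the deployer script against that configuration
wget -O - https://raw.githubusercontent.com/ryancicak/northcentral_hackathon/master/CloudBreakArtifacts/cloudbreak-cmd/deployer.sh | bash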
... View more
- Find more articles tagged with:
- Cloudbreak
- demo
- hdf
- hdp-2.3.4
- How-ToTutorial
- solutions
- streaming
10-25-2017
04:09 PM
Repo Description
Quickly spin up an end-to-end Alarm Fatigue Demo via Cloudbreak. All services (including the "Alarm Fatigue Demo Control") will be installed, configured, and running after the Cloudbreak Blueprint / Recipe executes. Watch the YouTube installation of the Alarm Fatigue Demo: https://www.youtube.com/watch?v=Kilnu-YOCcc&feature=youtu.be
The Alarm Fatigue Demo consists of a custom Ambari Service called "Alarm Fatigue Demo Control", which generates patient vitals every 5 seconds for 4 devices (4 patients). NiFi is used to pull the vitals (tailing the log file), store all vitals in Hive, enrich the data (from Hive), and write the enriched data to Kafka. Streaming Analytics Manager then picks up the enriched patient information (with vitals) in real time, stores the enriched data in HDFS, aggregates the vitals every 1 minute, stores the aggregates in Druid cubes, and finally runs rules (pulseRate > 100), sending notification(s) to the doctor - reducing Alarm Fatigue. One device will consistently throw high pulse rates: the Hive table "device" has a column problemPercentage between 0.0-1.0 (0-100%), and device GUID ec93da97-08c6-43c4-a0a6-cb689723cf19 will throw a high pulse rate (greater than 100) 100% of the time (a query sketch for checking this appears at the end of this post).
HDP services used: HDFS, YARN, MapReduce2, Tez, Hive (ACID), Zookeeper, Atlas, Cloudbreak, Ambari
HDF services used: Kafka, Druid, NiFi, Schema Registry, Streaming Analytics Manager
What is "Alarm Fatigue"?
Alarm fatigue or alert fatigue occurs when one is exposed to a large number of frequent alarms (alerts) and consequently becomes desensitized to them. Desensitization can lead to longer response times or to missing important alarms. There were 138 preventable deaths between 2010 and 2015 caused by alarm fatigue.
(https://en.wikipedia.org/wiki/Alarm_fatigue)
How can Alarm Fatigue be reduced? Instead of only sounding an alarm that is heard by the closest nurse or doctor, a notification should be sent to the proper doctor/nurse containing a severity level and requiring acknowledgement.
What will HDP/HDF do to reduce Alarm Fatigue? It all starts on the edge device - the various sensors in a hospital room (blood pressure, pulse rate monitor, respiratory rate monitor, thermometer, etc.). For this use case, we will assume our target hospital contains sensors with active connections to Raspberry Pi device(s). The Raspberry Pi device gathers logs from the sensors, so we install MiNiFi and tail the logs. MiNiFi then communicates bi-directionally with a centralized NiFi instance located at the hospital. The custom service "Alarm Fatigue Demo Control" emulates the function of a Raspberry Pi running MiNiFi collecting data from the sensors.
High-Level Architecture:
Repo Info
Github Repo URL https://github.com/ryancicak/northcentral_hackathon.git
Github account name ryancicak
Repo name northcentral_hackathon.git
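For reference, here is a hedged sketch for inspecting the per-device problemPercentage in the Hive "device" table described above (the HiveServer2 JDBC URL is a placeholder, and the exact table/column names are taken from the description - verify them against the repo):
# Inspect each demo device's failure probability (connection string is hypothetical)
beeline -u "jdbc:hive2://your-hiveserver2-host:10000/default" \
  -e "SELECT guid, problemPercentage FROM device;"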
... View more
- Find more articles tagged with:
- ambari-extensions
- Cloudbreak
- Data Ingestion & Streaming
- demo
Labels:
09-22-2017
03:08 PM
Thanks Ryan. Can you please verify the following would work?
The value of the FlowFile attribute grok.expression is (?<severity>.{1}) (?<time>.{8}) (?<sequence>.{8}) (?<source>.{12}) (?<destination>.{12}) (?<action>.{30}) %{GREEDYDATA:data}
Within Configure Processor of the ExtractGrok processor, the value of Grok Expression is ${grok.expression}
The expected behavior is that the ExtractGrok processor would continue to work as though the Grok Expression were hardcoded with (?<severity>.{1}) (?<time>.{8}) (?<sequence>.{8}) (?<source>.{12}) (?<destination>.{12}) (?<action>.{30}) %{GREEDYDATA:data}
... View more
03-16-2017
11:59 PM
In Azure, if I have an ExpressRoute in place, do you recommend a cache-only DNS server in the Hadoop VNet?
... View more
07-10-2018
09:18 AM
Hello, thank you for the post. The data flow is working fine; however, I am getting duplicate records in the Hive table. Am I missing something here? I would like to import the entire table only once, followed by incremental records only.
... View more
03-10-2017
05:36 PM
Hi @Ryan Cicak, I checked the metastore URI - it was not correct, so I fixed that, but now I'm getting a different error. There is no Kerberos in my setup, and our HDP is HA. This error is similar to what @Matt Burgess pointed to as an issue at one time in the past - NIFI-2873 - but that issue should be resolved in NiFi 1.1.0 and above. I am trying this in NiFi 1.1.2, yet I still hit the same issue as pre-1.1.0. Any thoughts?
... View more
03-02-2017
03:23 PM
Hi Sunile, as we discussed yesterday, I found this while installing HDP 2.5.3 using Ambari 2.4.2. Looking further into it, RHEL 7.3 comes with snappy 1.1.0-3.el7 installed, while HDP 2.5.3 needs snappy 1.0.5-1.el6.x86_64. I spun up a RHEL 7.3 instance and ran the following command, which showed that snappy 1.1.0-3.el7 came pre-installed: As Jay posted, looking at the latest documentation for Ambari 2.4.2, I found this problem under "Resolving Cluster Deployment Problems" - there should be a bug fix for RHEL 7 (so we don't rely on a RHEL 6 dependency): https://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-troubleshooting/content/resolving_cluster_install_and_configuration_problems.html - What do you think?
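The version check referenced above didn't carry over with the screenshot, but it was presumably something along these lines:
# List the installed snappy package on RHEL 7.3 to confirm the pre-installed version
rpm -qa | grep snappy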
... View more
02-03-2017
03:05 AM
Detailed documentation about Tez View and debugging Hive Views is available:
For a quick overview of what Tez View can do, see How to Analyze or Debug Hive Queries. For how to set up Tez View once Ambari is up and running, and for details on understanding its visualizations (such as DAGs), see Using Tez View and Hive High Performance Best Practices.
... View more
11-30-2016
01:18 AM
5 Kudos
Prerequisites
1) The Ambari Infra service is installed -> Ranger will use Ambari Infra's SolrCloud for Ranger Audit
2) MySQL is installed and running (I'll use Hive's Metastore MySQL instance; MySQL is one of the many DB options)
Installing Apache Ranger using Ambari Infra (SolrCloud) for Ranger Audit
1) Find the location of mysql-connector-java.jar (assume /usr/share/java/mysql-connector-java.jar) and run the following command on the Ambari server: sudo ambari-server setup --jdbc-db=mysql --jdbc-driver=/usr/share/java/mysql-connector-java.jar
2) In Ambari, click Add Service
3) Choose Ranger and click Next
4) Choose "I have met all the requirements above." and click Proceed (this was done in #1 above)
5) Assign master(s) for "Ranger Usersync" and "Ranger Admin" and click Next
6) Assign Slaves and Clients - since we did not install Apache Atlas, Ranger TagSync is not required - and click Next
7) Customize Services -> Ranger Audit: click "OFF" to enable SolrCloud (before/after screenshots in the original post)
8) Customize Services -> Ranger Admin: enter the "Ranger DB host" for the DB you chose (in my case, MySQL) and a "Ranger DB password" for the user rangeradmin. *Ranger will automatically add the user "rangeradmin". Add credentials for a DB user that has administrator privileges (this administrator will create the rangeradmin user and the Ranger tables). To create such an administrator user in MySQL (*Note: rcicak2.field.hortonworks.com is the server where Ranger is being installed):
CREATE USER 'ryan'@'rcicak2.field.hortonworks.com' IDENTIFIED BY 'lebronjamesisawesome';
GRANT ALL PRIVILEGES ON *.* TO 'ryan'@'rcicak2.field.hortonworks.com' WITH GRANT OPTION;
Click Next (a quick connectivity check is sketched below)
9) Review -> Click Deploy (*Install, Start and Test will show you the progress of the Ranger installation)
10) Choose Ranger in Ambari
11) Choose "Configs" and "Ranger Plugin" and select the services you'd like Ranger to authorize (you'll need to restart the service after saving changes)
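Before clicking Deploy, it can be worth confirming that the administrator account actually accepts connections from the Ranger host; a minimal check (the MySQL hostname is a placeholder - the user and Ranger host come from the example above):
# From the Ranger Admin host (rcicak2.field.hortonworks.com in this example), confirm the admin account can log in to MySQL
mysql -h your-mysql-host -u ryan -p -e "SELECT 1;"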
... View more
Labels:
04-12-2017
12:51 PM
@Ryan Cicak I followed the procedure, but I am getting a HandshakeException. I also set "nifi.remote.input.secure" to false, but that still didn't help. Please help me - did I miss anything?
2017-04-12 06:13:50,696 INFO [StandardProcessScheduler Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GetFile[id=616d3e3e-015b-1000-0000-000000000000] to run with 1 threads
2017-04-12 06:13:51,006 ERROR [Timer-Driven Process Thread-1] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:13:51,008 ERROR [Timer-Driven Process Thread-1] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:13:51,020 INFO [NiFi Site-to-Site Connection Pool Maintenance] o.apache.nifi.remote.client.PeerSelector org.apache.nifi.remote.client.PeerSelector@1656d5bc Successfully refreshed Peer Status; remote instance consists of 1 peers
2017-04-12 06:14:01,023 ERROR [Timer-Driven Process Thread-3] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:14:01,023 ERROR [Timer-Driven Process Thread-3] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:14:11,036 ERROR [Timer-Driven Process Thread-2] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:14:11,036 ERROR [Timer-Driven Process Thread-2] o.a.nifi.remote.StandardRemoteGroupPort RemoteGroupPort[name=from minifi,target=http://hdf.hadoop.com:9090/nifi/] failed to communicate with http://hdf.hadoop.com:9090/nifi/ due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
2017-04-12 06:14:21,049 ERROR [Timer-Driven Process Thread-4] o.a.n.r.c.socket.EndpointConnectionPool EndpointConnectionPool[Cluster URL=http://hdf.hadoop.com:9090/nifi/] failed to communicate with Peer[url=nifi://namenode.hadoop.com:10000,CLOSED] due to org.apache.nifi.remote.exception.HandshakeException: org.apache.nifi.remote.exception.ProtocolException: Expected to receive ResponseCode, but the stream did not have a ResponseCode
Thanks, Chaitanya
... View more
10-03-2016
05:17 PM
2 Kudos
If you've received the error exitCode=7 after enabling Kerberos, you are hitting this Jira bug. Notice the bug outlines the issue but does not outline a solution. The good news is the solution is simple, as I'll document below. Problem: If you've enabled Kerberos through Ambari, you'll get through around 90-95% of the last step "Start and Test Services" and then receive the error: 16/09/26 23:42:49 INFO mapreduce.Job: Running job: job_1474928865338_0022
16/09/26 23:42:55 INFO mapreduce.Job: Job job_1474928865338_0022 running in uber mode : false
16/09/26 23:42:55 INFO mapreduce.Job: map 0% reduce 0%
16/09/26 23:42:55 INFO mapreduce.Job: Job job_1474928865338_0022 failed with state FAILED due to: Application application_1474928865338_0022 failed 2 times due to AM Container for appattempt_1474928865338_0022_000002 exited with
exitCode: 7
For more detailed output, check application tracking page:
http://master2.fqdn.com:8088/cluster/app/application_1474928865338_0022
Then, click on links to logs of each attempt.Diagnostics: Exception from container-launch.
Container id: container_e05_1474928865338_0022_02_000001
Exit code: 7
Stack trace: ExitCodeException exitCode=7:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:576)
at org.apache.hadoop.util.Shell.run(Shell.java:487)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:753)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:371)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:303)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is ambari-qa
main : requested yarn user is ambari-qa
Container exited with a non-zero exit code 7
Failing this attempt. Failing the application.
You'll notice that running "Service Checks" for Tez, MapReduce2, YARN, or Pig (any service that involves creating a YARN container) fails with exitCode=7. This is because the mounts backing YARN's local-dirs likely have the "noexec" flag set, meaning the binaries placed in those directories cannot be executed.
Solution: Open /etc/fstab (with the proper permissions) and remove the noexec flag from all mounted drives backing the directories listed under "local-dirs" in YARN. Then either remount or reboot your machine - problem solved.
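Before editing /etc/fstab, you can confirm which mount backs a YARN local-dir and whether it carries noexec; a small sketch (the /hadoop/yarn/local path below is only an example - use whatever yarn.nodemanager.local-dirs points to on your cluster):
# Show the mount (and its options - look for "noexec") backing a YARN local-dir; the path is an example
findmnt -T /hadoop/yarn/local
# After removing noexec from /etc/fstab, remount the affected mount point (e.g. /hadoop) instead of rebooting
sudo mount -o remount /hadoop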
... View more
- Find more articles tagged with:
- Hadoop Core
09-24-2016
12:24 AM
1 Kudo
You may be in a bind if you need to install HDP on Azure with CentOS 6 or RHEL 6 and only certain services (not everything). By following the steps below, you will be able to use ambari-server to install HDP on any of the supported Hortonworks/Azure VMs.
1) Configure your VMs - use the same VNet for all VMs.
Run the next steps as root, or sudo the commands:
2) Update /etc/hosts on all your machines: vi /etc/hosts
172.1.1.0 master1.jd32j3j3kjdppojdf3349dsfeow0.dx.internal.cloudapp.net
172.1.1.1 master2.jd32j3j3kjdppojdf3349dsfeow0.dx.internal.cloudapp.net
172.1.1.2 master3.jd32j3j3kjdppojdf3349dsfeow0.dx.internal.cloudapp.net
172.1.1.3 worker1.jd32j3j3kjdppojdf3349dsfeow0.dx.internal.cloudapp.net
172.1.1.4 worker2.jd32j3j3kjdppojdf3349dsfeow0.dx.internal.cloudapp.net
172.1.1.5 worker3.jd32j3j3kjdppojdf3349dsfeow0.dx.internal.cloudapp.net
* Use the FQDN (find the FQDN by typing hostname -f). The IP addresses are internal and can be found on eth0 by typing ifconfig.
3) Edit /etc/sudoers.d/waagent so that you don't need to type a password when sudoing:
a) Change permissions on /etc/sudoers.d/waagent: chmod 600 /etc/sudoers.d/waagent
b) Update the file, changing "username ALL = (ALL) ALL" to "username ALL = (ALL) NOPASSWD: ALL": vi /etc/sudoers.d/waagent
c) Change permissions back on /etc/sudoers.d/waagent: chmod 440 /etc/sudoers.d/waagent
* Change username to the user you sudo with (the user that will install Ambari)
4) Disable iptables
a) service iptables stop
b) chkconfig iptables off
* If you need iptables enabled, please make the necessary port configuration changes found here
5) Disable transparent huge pages
a) Run the following in your shell: cat > /usr/local/sbin/ambari-thp-disable.sh <<-'EOF'
#!/usr/bin/env bash
# disable transparent huge pages: for Hadoop
thp_disable=true
if [ "${thp_disable}" = true ]; then
for path in redhat_transparent_hugepage transparent_hugepage; do
for file in enabled defrag; do
if test -f /sys/kernel/mm/${path}/${file}; then
echo never > /sys/kernel/mm/${path}/${file}
fi
done
done
fi
exit 0
EOF
b) chmod 755 /usr/local/sbin/ambari-thp-disable.sh
c) sh /usr/local/sbin/ambari-thp-disable.sh
* Perform a-c on all hosts to disable transparent huge pages
6) If you don't have a private key generated (where the host running ambari-server can use a private key to log in to all the hosts), perform this step:
a) ssh-keygen -t rsa -b 2048 -C "username@master1.jd32j3j3kjdppojdf3349dsfeow0.dx.internal.cloudapp.net"
b) ssh-copy-id -i /locationofgeneratedinaabove/id_rsa.pub username@master1
* Run b above against all hosts; this way you can ssh as the username into all hosts from the ambari-server host without a password (a loop version is sketched after these steps)
7) Install the ambari repo on the server where you'll install Ambari (see the Ambari documentation): wget -nv http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.2.2.0/ambari.repo -O /etc/yum.repos.d/ambari.repo
8) Install ambari-server: yum install ambari-server
9) Setup ambari-server: ambari-server setup
* You can use the defaults by pressing ENTER
10) Start ambari-server: ambari-server start
* This could take a few minutes to start up depending on the speed of your machine
11) Open your browser and go to the IP address where ambari-server is running: http://ambariipaddress:8080
* Continue with your HDP 2.4.3 installation
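For step 6b, here is a small sketch that pushes the key to every host in one pass (the username, key path, and short hostnames are placeholders based on the /etc/hosts example above):
# Copy the public key to each cluster host so the ambari-server host can ssh without a password (username and key path are placeholders)
for host in master1 master2 master3 worker1 worker2 worker3; do
  ssh-copy-id -i ~/.ssh/id_rsa.pub username@${host}
done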
... View more
09-14-2016
05:25 AM
1 Kudo
Hi @Ryan Cicak, several processors that call an API (if not all) have a Connection Timeout property. You can set this property to wait for a fixed duration depending on your data source, network conditions, and so on (look at GetHttp for instance). You can combine this property with a max-retry strategy: the processor waits until the timeout expires and tries again until it reaches a maximum retry count. If the max retry count is reached, the FlowFile goes to a processor that handles this special case (alert an admin, store the data in an error directory, etc.).
... View more
09-14-2016
11:43 AM
2 Kudos
What you described is the correct process, the NAR needs to be copied to the lib directory on each node of the cluster, and then the nodes need to be restarted. Nothing has changed in 1.0.0 that changes this approach.
... View more
02-07-2017
06:31 PM
1 Kudo
Hi @Sunile Manjee, the tag-policy flow works as follows: once an access request comes in for, say, Hive, the Hive service is scanned for any link to a tag-based service. If one is found, all policies under the tag service are scanned for the tag associated with the resources in the request. There can be only one tag policy per tag, so the policy matching the tag is evaluated. If there is a deny policy item denying the user access to the tagged resource, the flow terminates and the access request is denied. Like @Chethana Krishnakumar mentioned, you could check the association between the Hive service and the tag service. Also, you could check whether the Hive resource is actually tagged properly through Atlas.
... View more
08-10-2016
09:59 PM
Hi @Ryan Cicak, once I had restarted the VM from Azure, Ambari highlighted that I had to restart some of the services.
... View more
09-14-2016
04:53 PM
1 Kudo
I was able to take a closer look at this one and it appears that reading from directories with a large number of files is going to be a problem with both the GetFile and the ListFile processors in their current form. The root of the problem is that the processors are using the java.io.File.listFiles() method to bring back the directory listing. This is known to be a hog with directories containing a large number of files. The filters and batch size properties are applied after the full listing has been pulled back, meaning that you'll have to bring back a list of all files in a directory even if you only want a small subset of them. A potential solution (for a later version) would be to use the java.nio packages to read the files as a directory stream, allowing you to apply the filter to the stream itself and stop at a configurable batch size. I would also argue that ListFile needs a configurable batch size for this very reason. I will submit an issue for this one.
... View more
08-08-2016
07:32 PM
1 Kudo
Hi @john doe, I recently ran PutKafka and GetKafka in NiFi (connecting to a local VM). I found that adding the FQDN and IP to /etc/hosts made this work for me. For example, if the FQDN is host1.local and the IP is 192.168.4.162, then adding "192.168.4.162 host1.local" to /etc/hosts made this work.
... View more
08-04-2016
03:50 PM
Hi @sbhat - this is certainly helpful, thank you for the reference!
... View more
08-03-2016
03:45 AM
Thank you! I believe a Java upgrade is not required; only an OpenSSL upgrade should fix this.
... View more
07-15-2016
11:28 PM
8 Kudos
Teradata's JDBC connector contains two jar files (tdgssconfig.jar and terajdbc4.jar) that must both be on the classpath. NiFi database processors like ExecuteSQL or PutSQL use a connection pool such as DBCPConnectionPool, which defines your JDBC connection to a database like Teradata. Follow the steps below to integrate the Teradata JDBC connector into your DBCPConnectionPool:
1) Download the Teradata connector (tdgssconfig.jar and terajdbc4.jar) - you can download the Teradata v1.4.1 connector from http://hortonworks.com/downloads/
2) Extract the jar files (tdgssconfig.jar and terajdbc4.jar) from hdp-connector-for-teradata-1.4.1.2.3.2.0-2950-distro.tar.gz and move them to NIFI_DIRECTORY/lib/
3) Restart NiFi (a shell sketch of steps 2 and 3 follows below)
4) Under Controller > Controller Services, edit your existing DBCPConnectionPool (if your pool is active, disable it before editing)
5) Under Configure Controller Service > Properties, define the following:
Database Connection URL: your Teradata JDBC connection URL
Database Driver Class Name: com.teradata.jdbc.TeraDriver
Database Driver Jar Url: do not define anything - since you added the two jars to the NiFi classpath (nifi/lib), the driver jars will be picked up automatically. You can only add one jar here and you need two, which is why we added them to the nifi/lib directory.
Database User: provide the database user
Password: provide the password for the database user
You're all set - you'll now be able to connect to Teradata from NiFi!
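A minimal sketch of steps 2 and 3 as shell commands (the NiFi install directory /opt/nifi and the tarball layout are assumptions - adjust the paths for your environment):
# Extract the Teradata connector tarball and copy both driver jars into NiFi's lib directory (paths are examples)
tar -xzf hdp-connector-for-teradata-1.4.1.2.3.2.0-2950-distro.tar.gz
sudo cp $(find . -name "tdgssconfig.jar" -o -name "terajdbc4.jar") /opt/nifi/lib/
# Restart NiFi so the new jars are picked up
sudo /opt/nifi/bin/nifi.sh restart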
... View more
- Find more articles tagged with:
- Data Ingestion & Streaming
- Database
- How-ToTutorial
- NiFi
- nifi-processor
- teradata
Labels:
08-18-2017
06:55 PM
Looks like this issue: https://issues.apache.org/jira/browse/RANGER-1631
... View more
06-29-2016
01:54 PM
5 Kudos
Security is a key element when discussing Big Data. A common requirement with security is data encryption. By following the instructions below, you'll be able to set up transparent data encryption in HDFS on defined directories, otherwise known as encryption zones ("EZ"). Before starting this step-by-step tutorial, there are three HDP services that are essential (must be installed): 1) HDFS 2) Ranger 3) Ranger KMS
Step 1: Prepare the environment (as explained in the HDFS "Data at Rest" Encryption manual)
a) If using Oracle JDK, verify JCE is installed (OpenJDK has JCE installed by default). If the server running Ranger KMS is using Oracle JDK, you must install JCE (necessary for Ranger KMS to run); instructions on installing JCE can be found here.
b) CPU Support for AES-NI optimization: AES-NI optimization requires an extended CPU instruction set for AES hardware acceleration. There are several ways to check for this; for example: cat /proc/cpuinfo | grep aes
Look for output with flags containing 'aes'.
c) Library Support for AES-NI optimization: You will need a version of the libcrypto.so library that supports hardware acceleration, such as OpenSSL 1.0.1e. (Many OS versions have an older version of the library that does not support AES-NI.) A version of the libcrypto.so library with AES-NI support must be installed on HDFS cluster nodes and MapReduce client hosts -- that is, any host from which you issue HDFS or MapReduce requests. The following instructions describe how to install and configure the libcrypto.so library.
RHEL/CentOS 6.5 or later: On HDP cluster nodes, the installed version of libcrypto.so supports AES-NI, but you will need to make sure that the symbolic link exists: sudo ln -s /usr/lib64/libcrypto.so.1.0.1e /usr/lib64/libcrypto.so
On MapReduce client hosts, install the openssl-devel package: sudo yum install openssl-devel
d) Verify AES-NI support: To verify that a client host is ready to use the AES-NI instruction set optimization for HDFS encryption, use the following command: hadoop checknative
You should see a response similar to the following: 15/08/12 13:48:39 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
14/12/12 13:48:39 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
Native library checking:
hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
zlib: true /lib64/libz.so.1
snappy: true /usr/lib64/libsnappy.so.1
lz4: true revision:99
bzip2: true /lib64/libbz2.so.1
openssl: true /usr/lib64/libcrypto.so
Step 2: Create an Encryption Key
This step outlines how to create an encryption key using Ranger.
a) Log in to Ranger at http://RANGER_FQDN_ADDR:6080/ *To access Ranger KMS (Encryption), log in using the username "keyadmin"; the default password is "keyadmin" - remember to change this password
b) Choose Encryption > Key Manager *In this tutorial, "hdptutorial" is the name of the HDP cluster. Your name will be different, depending on your cluster name.
c) Choose Select Service > yourclustername_kms
d) Choose "Add New Key"
e) Create the new key. Length: either 128 or 256. *A length of 256 requires JCE installed on all hosts in the cluster: "The default key size is 128 bits. The optional -size parameter supports 256-bit keys, and requires the Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy File on all hosts in the cluster. For installation information, see the Ambari Security Guide."
Step 3: Add Ranger KMS policies for the encrypted directory
a) Log in to Ranger at http://RANGER_FQDN_ADDR:6080/ *To access Ranger KMS (Encryption), log in using the username "keyadmin"; the default password is "keyadmin" - remember to change this password
b) Choose Access Manager > Resource Based Policies
c) Choose Add New Policy
d) Create a policy - the user hdfs must be granted GET_METADATA and GENERATE_EEK (using any user calls the user hdfs in the background); the user "nicole" is a custom user I created to be able to read/write data using the key "yourkeyname"
Step 4: Create an Encryption Zone
a) Create a new directory: hdfs dfs -mkdir /zone_encr *Leave the directory empty until it has been encrypted (it is recommended to use a superuser to create the directory)
b) Create an encryption zone: hdfs crypto -createZone -keyName yourkeyname -path /zone_encr *Using the user "nicole" above to create the encryption zone
c) Validate that the encryption zone exists: hdfs crypto -listZones *You must be a superuser (or part of a superuser group like hdfs) to call this command. The command should output: [nicole@hdptutorial01 security]$ hdfs crypto -listZones
/zone_encr yourkeyname
* You will now be able to read/write data in your encrypted directory /zone_encr. If you receive any errors - including an "IOException:" when creating an encryption zone in Step 4 (b) - take a look at /var/log/ranger/kms/kms.log on your Ranger KMS server; there is usually a permission issue accessing the key. (A quick read/write check is sketched below.)
* To find out more about how transparent data encryption in HDFS works, refer to the Hortonworks blog here.
Tested in HDP: 2.4.2
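A quick end-to-end check of the zone, run as the user granted read/write access through the key policy above ("nicole" in this example):
# Write a test file into the encryption zone and read it back as the policy user
echo "encryption zone test" > /tmp/ez_test.txt
hdfs dfs -put /tmp/ez_test.txt /zone_encr/
hdfs dfs -cat /zone_encr/ez_test.txt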
... View more
- Find more articles tagged with:
- Encryption
- HDFS
- How-ToTutorial
- Ranger
- ranger-kms
- Security
06-17-2016
03:31 PM
Hi @Ryan Cicak The best practice is to configure Ranger audits to both Solr and HDFS. HDFS is used for long term audit storage so you won't want to delete audit data. Solr should be used for short term storage. By using Solr you have data indexed and you can query it quickly from Ranger UI. I am not aware of any setting or property in Ranger to set a TTL and automatically delete data. You may leverage Solr TTL feature to purge data (link) or schedule a job to issue a delete query periodically.
... View more
06-10-2016
11:35 PM
2 Kudos
A remote Linux system can use NFS (Network File System) to mount an HDFS file system and interact with it. Before proceeding, it's important to understand that your Linux instance is directly accessing your HDFS system over the network, so you will incur network latency. Depending on your dataset size, remember that you could potentially be processing gigabytes or more of data on a single machine, so this is not the best approach for large datasets. These steps show you how to mount and interact with a remote HDFS node from your Linux system:
1) The Linux system must have NFS installed (CentOS for this demo): yum install nfs-utils nfs-utils-lib
2) Your HDP cluster must have an NFS Gateway installed (Ambari allows this option with one click). *Keep track of either the FQDN or IP address of the NFS Gateway
3) In Ambari, under HDFS > Advanced > General, set Access time precision = 3600000
4) Mount the NFS Gateway on your Linux system (must be root): mount -t nfs -o vers=3,proto=tcp,nolock myipaddressorfqdnofnfsgateway:/ /opt/remotedirectory
5) On both your HDFS node and your remote Linux system, add the same user with the same uid (making sure neither already exists): useradd -u 1234 testuser *If the user/uid doesn't match between the HDFS node and your remote Linux system, whatever uid you are logged in as on the remote Linux system will be passed along and interpreted by the NFS Gateway. For example, if your Linux system has usertest (uid = 501) and you write a file to HDFS's /tmp, the owner of the file will be whichever user on the HDFS node matches uid=501 - therefore it is good practice to match both the username and the uid across both systems.
6) On your remote Linux system, log in as your "testuser" and go to your mounted NFS directory: cd /opt/remotedirectory
You will now be able to interact with HDFS using native Linux commands such as cp, less, more, etc. (A short usage example follows below.)
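Once the mount is in place, a short usage sketch (the mount point and user are the ones from the steps above; the HDFS paths and file names are just examples):
# Interact with HDFS through the NFS mount using ordinary Linux tools (paths are examples)
su - testuser
cd /opt/remotedirectory
ls tmp/                   # roughly equivalent to: hdfs dfs -ls /tmp
cp ~/localfile.txt tmp/   # copies the file into HDFS /tmp
less tmp/localfile.txt    # read it back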
... View more
- Find more articles tagged with:
- Ambari
- Hadoop Core
- HDFS
- How-ToTutorial
- Linux
- nfs
Labels:
06-25-2016
01:53 AM
1 Kudo
As mentioned in https://community.hortonworks.com/questions/37192/error-no-package-python27-available-while-setting.html, the tutorial has been corrected.
... View more
06-07-2016
07:09 AM
1 Kudo
If you want to see dates and update history for tutorials, I would suggest looking at the source on GitHub: https://github.com/hortonworks/tutorials Tutorials are updated on the Sandbox update release schedule, which tends to correspond to major HDP releases. Here you can see the latest version of the HDP tutorials for HDP 2.4: https://github.com/hortonworks/tutorials/tree/hdp/tutorials/hortonworks For example, I believe you earlier had a question on the IPython Notebook with Spark tutorial, and here you can see the history of updates for that tutorial: https://github.com/hortonworks/tutorials/commits/hdp/tutorials/hortonworks/ipython-notebook-with-spark/tutorial.md The tutorials list the prerequisites for doing each tutorial. If you want to learn more about the tutorials or make a contribution, then at the bottom of each tutorial there is a paragraph that talks about the GitHub repo and contribution guide. I am happy to chat with you to see how we can make this template more descriptive.
... View more