Member since: 01-14-2019
Posts: 28
Kudos Received: 14
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1417 | 11-30-2016 04:08 PM
 | 8639 | 04-22-2016 09:30 PM
06-22-2017
06:43 PM
If I'm understanding correctly, you simply want a way to transfer from your shared folder to another Linux file system on the edge node and NOT to HDFS. There are a few ways to do it:

1. Use WinSCP, as @Jay SenSharma mentioned.
2. Create an NFS share on the edge node so you can simply drag and drop from your regular Windows folder.
3. Create an FTP(S) service on the edge node so you can use FTP to transfer the file.
4. Use a program like FileZilla to do the transfer for you.

And these are just some of the options. You may also be interested in how to upload from your workstation to, eventually, HDFS:

1. After uploading to the edge node, if you have the HDFS client, simply use hdfs commands as @Jay SenSharma has mentioned.
2. Use an NFS Gateway. This way, the Hadoop file system can be displayed as a regular folder on your Windows machine... pretty cool actually.
3. Use the Ambari Files view to upload files from your shared folder directly to HDFS.
4. Use 3rd-party tools to move the files for you. BI tools have hooks that can use the WebHDFS API to upload the files directly to HDFS.

Options 1 and 4 of that second list are sketched below.
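To make options 1 and 4 concrete, here's a minimal sketch. The file name, user, and NameNode host are placeholders, and the WebHDFS port assumes the HDP default of 50070:

# Option 1: from the edge node, push a local file into HDFS with the client
# (/tmp/data.csv and /user/myuser are placeholders)
> hdfs dfs -put /tmp/data.csv /user/myuser/
# Option 4: upload straight from your workstation via the WebHDFS REST API;
# -L follows the 307 redirect to the DataNode and -T streams the file in the PUT body
> curl -i -X PUT -L -T data.csv "http://namenode-host:50070/webhdfs/v1/user/myuser/data.csv?op=CREATE&user.name=myuser"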
02-21-2017
06:53 AM
1 Kudo
Is performing a:
hdfs haadmin -failover nn1 nn2
ever a disruptive procedure? Assuming all services are up and running (ZooKeeper, ZKFC, JournalNodes), it's my understanding that this should be a safe procedure and wouldn't cause jobs to fail, but I wanted to know under what circumstances this could potentially be problematic.
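For context, here's how I'd check state around the failover; a sketch assuming nn1 and nn2 are the service IDs defined in dfs.ha.namenodes.<nameservice>:

# confirm which NameNode is currently active
> hdfs haadmin -getServiceState nn1
> hdfs haadmin -getServiceState nn2
# graceful failover, then confirm nn2 took over
> hdfs haadmin -failover nn1 nn2
> hdfs haadmin -getServiceState nn2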
Labels: Apache Hadoop
12-07-2016
01:34 AM
Ahh... actually I misread your query... I thought you were simply reading off a schema... @Michael Young explains it better 🙂
12-07-2016
01:31 AM
1 Kudo
Simply put, the first query only hits the metastore database and doesn't launch a MapReduce job. On the other hand, the second query runs a map-side MapReduce job.

EDIT: Interestingly enough, for the first query, Hive makes some good decisions on how to read the data. A simple SELECT * could essentially just be fetching a file from HDFS, like an hdfs get... simplified, but true.
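To illustrate with a hypothetical table (the fetch-task shortcut for the first case is governed by hive.fetch.task.conversion):

# served by a fetch task: effectively a plain HDFS read, no MR job
> hive -e "SELECT * FROM my_table LIMIT 10;"
# forces a real MapReduce job for the aggregation
> hive -e "SELECT COUNT(*) FROM my_table;"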
12-06-2016
11:56 PM
1 Kudo
As @Ramesh Mani mentioned, this seems to be authorization-related. For a quick fix, try assigning read permissions at the HDFS level (hadoop fs -chmod 755 /apps/hive/warehouse). For the proper way of doing it, go to Ranger, open your HDFS policies, and make sure the hive user has the proper permissions on the said directory.
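To verify the quick fix, you can test access as the hive user itself; this sketch assumes you can sudo on a cluster node:

> hadoop fs -chmod 755 /apps/hive/warehouse
# confirm the hive user can now list the directory
> sudo -u hive hadoop fs -ls /apps/hive/warehouse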
11-30-2016
04:08 PM
1 Kudo
@Sunile Manjee This would really depend on the cluster size and the number of jobs running, so it's very hard to gauge. A 10-node cluster with around 15-20 components can easily generate 1 GB of audit logs PER DAY; again, it depends on the cluster activity. You could use this as a baseline (at that rate, even a 90-day retention window already needs roughly 90 GB), but again, it's really hard to gauge. Then again, consider this only if you're being forced to use a DB, and after strongly advising against using a DB as opposed to Solr for Ranger audits 🙂
08-03-2016
05:45 PM
Oh, sorry, I misread the path... it works. Much appreciated!
08-03-2016
03:13 PM
@Jitendra Yadav it's under /usr/hdp/2.3.4.7/ranger-hive-plugin/lib/ranger-hive-plugin/ranger-hive-plugin-.... Seems like it exists....
08-03-2016
02:53 PM
We just installed Ranger and turned on the HDFS and Hive plugins. However, HiveServer2 keeps throwing a ClassNotFoundException saying that org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory could not be found. This is on HDP 2.3.4.7 (Kerberized), Ranger 0.5.0.2.3. Any ideas?
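Two quick diagnostics I'm using (paths assume the standard HDP layout, so treat them as placeholders): whether the plugin jar is on disk, and whether HiveServer2 is actually configured to load the Ranger authorizer:

# is the plugin jar present? (path assumes the standard HDP layout)
> ls /usr/hdp/current/ranger-hive-plugin/lib/ | grep ranger-hive-plugin
# is HiveServer2 pointed at the Ranger authorizer factory?
> grep -A1 hive.security.authorization.manager /etc/hive/conf/hiveserver2-site.xml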
Labels: Apache Hive
05-26-2016
04:18 AM
2 Kudos
We recently had some issues with Ambari and Kafka alerts. It all started in HDP 2.3.4: every time we changed the Kafka listener port from 6667 to any other port, Ambari would complain and give us an error saying that it couldn't reach port 6667, even though the broker service was actually running on another port. Here's the exact error:
“Connection failed: [Errno 111] Connection refused to sandbox.hortonworks.com:6667”
This can be quite annoying, especially if you have multiple brokers all reporting CRITICAL and you just can't seem to get rid of it.
To cut a long story short, here are the steps we took to get rid of the problem; the Troubleshooting section below covers some additional tips:
1. Get the ID of the Kafka broker alert definition
> curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions"
2. Get the definition and save it locally
> curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" > kafka_alerts.json
3. Edit kafka_alerts.json
Remove the href line.
Change 6667.0 to your new port (e.g. 9092). Do NOT keep the decimal, or you'll get a NumberFormatException in ambari-server.log and no alerts.
The final JSON file should look like this:
{
"AlertDefinition" : {
"cluster_name" : "Sandbox",
"component_name" : "KAFKA_BROKER",
"description" : "This host-level alert is triggered if the Kafka Broker cannot be determined to be up.",
"enabled" : true,
"id" : 47,
"ignore_host" : false,
"interval" : 1,
"label" : "Kafka Broker Process",
"name" : "kafka_broker_process",
"scope" : "HOST",
"service_name" : "KAFKA",
"source" : {
"default_port" : 9092,
"reporting" : {
"critical" : {
"value" : 5.0,
"text" : "Connection failed: {0} to {1}:{2}"
},
"warning" : {
"text" : "TCP OK - {0:.3f}s response on port {1}",
"value" : 1.5
},
"ok" : {
"text" : "TCP OK - {0:.3f}s response on port {1}"
}
},
"type" : “PORT”,
"uri" : "{{kafka-broker/port}}"
}
}
4. Upload the file by running this command in the same directory where you saved kafka_alerts.json:
> curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" -d @kafka_alerts.json
It can take up to a minute for Ambari to run the check again. To speed things up, you can force Ambari to run the alert check:
> curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47?run_now=true"
This should solve the issue.
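As an optional sanity check, you can fetch the definition back (same endpoint as step 2) and confirm the new port took effect:

# should print the default_port line with your new port value
> curl -s -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" | grep default_port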
Troubleshooting:
If you're still having trouble, these suggestions/tips should help you out.
When uploading the JSON, make sure the JSON is valid. This is easy to catch, as the PUT will return an error saying the structure is invalid.
Make sure the default_port is an INTEGER when you upload. This is tricky because if you keep the decimal (e.g. 6667.0), you won't get an error response, but if you look at /var/log/ambari-server/ambari-server.log you'll see a NumberFormatException. What's even trickier is that Ambari will start ignoring these metrics altogether and you end up with no alerts at all.
Tail the /var/log/ambari-agent/ambari-agent.log file when you run the PUT commands and look out for these types of log entries:
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 274, in __json_to_callable
source = json_definition['source']
TypeError: 'NoneType' object is unsubscriptable
This means that your JSON is invalid, either because of a number format exception or for other reasons. Correlate against ambari-server.log for additional information.
Manually trigger the alert and look for these types of log entries for validation:
INFO 2016-05-25 16:48:33,355 AlertSchedulerHandler.py:374 - [AlertScheduler] Executing on-demand alert kafka_broker_process (1e0e1edc-e051-45bc-8d38-97ae0b3b83f0)
This at least gives you confidence that your alert definition is valid. If instead you get this type of error:
ERROR 2016-05-25 19:40:21,470 AlertSchedulerHandler.py:379 - [AlertScheduler] Unable to execute the alert outside of the job scheduler
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 363, in execute_alert
alert_definition = execution_command['alertDefinition']
KeyError: 'alertDefinition'
Then you know something's wrong with the alert definition.
If things still don't work, try removing the uri from the alert definition. This will force Ambari to fall back to the default_port: Ambari's alert scheduler first looks at the URI and uses it if it's valid; if not, it falls back to using default_port. Remember, if you remove the uri, be sure to also remove the comma after "PORT"; the edited source block is sketched below.
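For reference, here's the source block from the JSON above with the uri removed (everything else unchanged; note there is no longer a comma after "PORT"):
"source" : {
"default_port" : 9092,
"reporting" : {
"critical" : {
"value" : 5.0,
"text" : "Connection failed: {0} to {1}:{2}"
},
"warning" : {
"text" : "TCP OK - {0:.3f}s response on port {1}",
"value" : 1.5
},
"ok" : {
"text" : "TCP OK - {0:.3f}s response on port {1}"
}
},
"type" : "PORT"
}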