Member since: 01-14-2019
Posts: 28
Kudos Received: 14
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1464 | 11-30-2016 04:08 PM
 | 8854 | 04-22-2016 09:30 PM
06-22-2017
06:43 PM
If I'm understanding correctly, you simply want a way to transfer from your shared folder to another Linux file system on the edge node and NOT to HDFS. There are a few ways to do it:
1. Use WinSCP, as @Jay SenSharma mentioned.
2. Create an NFS share on the edge node so you can simply use your regular Windows folder to drag and drop.
3. Create an FTP(S) service on the edge node so you can use FTP to transfer the file.
4. Use a program like FileZilla to do the transfer for you.
And these are just some of the options. You may also be interested in how to upload from your workstation to, eventually, HDFS:
1. After uploading to the edge node, and if you have the HDFS client, simply use hdfs commands as @Jay SenSharma has mentioned (see the sketch after this list).
2. Use an NFS gateway. This way, the Hadoop file system can be displayed as a regular folder on your Windows machine... pretty cool actually.
3. Use the Ambari Files view to upload files in your shared folder directly to HDFS.
4. Use 3rd-party tools to move the files for you. BI tools have hooks that can use the WebHDFS API to upload the files directly to HDFS.
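For the first option, a minimal sketch of the hdfs commands involved (the file name and HDFS path here are hypothetical, and it assumes the file is already on the edge node):
> hdfs dfs -mkdir -p /user/myuser/uploads                # create the target directory if needed
> hdfs dfs -put /tmp/mydata.csv /user/myuser/uploads/    # copy from the edge node's local disk into HDFS
> hdfs dfs -ls /user/myuser/uploads                      # verify the file landed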
02-21-2017
06:53 AM
1 Kudo
Is performing a:
hdfs haadmin -failover nn1 nn2
ever a disruptive procedure? Suppose all services are up and running (ZooKeeper, ZKFC, JournalNodes). It's my understanding that this should be a safe procedure and wouldn't cause jobs to fail, but I wanted to know under what circumstances this could potentially be problematic.
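For context, a minimal sketch of how one might verify the NameNode states around the failover (nn1/nn2 are the service IDs from the command above; the expected outputs in the comments are assumptions, not guaranteed):
> hdfs haadmin -getServiceState nn1   # expect: active
> hdfs haadmin -getServiceState nn2   # expect: standby
> hdfs haadmin -failover nn1 nn2      # request a graceful failover from nn1 to nn2
> hdfs haadmin -getServiceState nn2   # should now report: active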
Labels: Apache Hadoop
12-07-2016
01:34 AM
Ahh... I actually misread your query. I thought you were simply reading off a schema. @Michael Young explains it better 🙂
12-07-2016
01:31 AM
1 Kudo
Simply put, the first query only hits the metastore database and doesn't launch a MapReduce job. On the other hand, the second query runs a map-side MapReduce job. EDIT: Interestingly enough, for the first query, Hive makes some good decisions on how to read the data. A simple SELECT * can essentially just be fetching a file from HDFS, like an hdfs get... simplified, but true.
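You can see the difference with EXPLAIN (the table name here is hypothetical, and the exact plan text varies by Hive version):
> hive -e 'EXPLAIN SELECT * FROM mytable LIMIT 10'   # plan shows only a Fetch stage, no MR job
> hive -e 'EXPLAIN SELECT count(*) FROM mytable'     # plan includes a Map Reduce stage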
12-06-2016
11:56 PM
1 Kudo
As @Ramesh Mani mentioned, this seems to be more authorization related. For a quick fix, try assigning read permissions at the HDFS level (hadoop fs -chmod 755 /apps/hive/warehouse). For a more valid way of doing it, go to Ranger, open your HDFS policies, and make sure the hive user has the proper permissions to access the said directory.
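A minimal sketch of the quick-fix route, assuming the warehouse path above:
> hadoop fs -ls -d /apps/hive/warehouse          # inspect the current owner and permissions first
> hadoop fs -chmod 755 /apps/hive/warehouse      # open read/execute to group and others
> hadoop fs -chmod -R 755 /apps/hive/warehouse   # recursive variant, only if the tables underneath are affected too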
11-30-2016
04:08 PM
1 Kudo
@Sunile Manjee This would really depend on the cluster size and the number of jobs running. Very hard to gauge. A 10-node cluster with around 15-20 components can easily generate 1 GB of audit logs PER DAY. Again, it depends on the cluster activity. You could use this as a baseline, but again, it's really hard to gauge. Then again, consider this only if you're being forced to use a DB, and after strongly advising against using a DB as opposed to Solr for Ranger audits 🙂
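As a rough capacity check using that baseline: 1 GB/day works out to about 30 GB/month and roughly 365 GB/year per cluster, before any retention or purge policy, so size the DB (and argue for Solr) accordingly.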
08-03-2016
05:45 PM
Oh, sorry, I misread the path... it works. Much appreciated!
08-03-2016
03:13 PM
@Jitendra Yadav it's under /usr/hdp/2.3.4.7/ranger-hive-plugin/lib/ranger-hive-plugin/ranger-hive-plugin-... So it seems like the jar exists...
08-03-2016
02:53 PM
We just installed Ranger and turned on the HDFS and Hive plugins. However, HiveServer2 keeps throwing a ClassNotFoundException saying that org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory could not be found. This is on HDP 2.3.4.7 (Kerberized), Ranger 0.5.0.2.3. Any ideas?
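For anyone hitting the same thing, a minimal sketch of what to check (the config and lib locations below are typical for HDP but may differ on your cluster):
> grep -A1 'hive.security.authorization.manager' /etc/hive/conf/hiveserver2-site.xml   # should point at RangerHiveAuthorizerFactory
> ls /usr/hdp/current/hive-server2/lib/ | grep ranger                                  # the ranger-hive-plugin jars should be linked here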
Labels: Apache Hive
05-26-2016
04:18 AM
2 Kudos
We recently had some issues with Ambari and Kafka alerts. It all started in HDP 2.3.4: every time we changed the Kafka listener port from 6667 to any other port, Ambari would complain with an error saying it couldn't reach port 6667, even though the broker service was actually running on another port. Here's the exact error:
“Connection failed: [Errno 111] Connection refused to sandbox.hortonworks.com:6667”
This can be quite annoying especially if you have multiple brokers and they’re all reporting CRITICAL and you just can’t seem to get rid of it.
To cut a long story short, here are the steps we took to get rid of the problem; further down, we'll jump into some troubleshooting tips:
1. Get the ID of the Kafka broker process alert by listing all alert definitions:
> curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions"
2. Get the definition (ID 47 here; yours may differ) and save it locally:
> curl -u admin:admin -H 'X-Requested-By: ambari' -X GET "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" > kafka_alerts.json
3. Edit kafka_alerts.json:
Remove the href line.
Change 6667.0 to your new port (e.g. 9092). Do NOT keep the decimal, or you'll get a NumberFormatException in ambari-server.log and no alerts.
The final JSON file should look like this:
{
  "AlertDefinition" : {
    "cluster_name" : "Sandbox",
    "component_name" : "KAFKA_BROKER",
    "description" : "This host-level alert is triggered if the Kafka Broker cannot be determined to be up.",
    "enabled" : true,
    "id" : 47,
    "ignore_host" : false,
    "interval" : 1,
    "label" : "Kafka Broker Process",
    "name" : "kafka_broker_process",
    "scope" : "HOST",
    "service_name" : "KAFKA",
    "source" : {
      "default_port" : 9092,
      "reporting" : {
        "critical" : {
          "value" : 5.0,
          "text" : "Connection failed: {0} to {1}:{2}"
        },
        "warning" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}",
          "value" : 1.5
        },
        "ok" : {
          "text" : "TCP OK - {0:.3f}s response on port {1}"
        }
      },
      "type" : "PORT",
      "uri" : "{{kafka-broker/port}}"
    }
  }
}
4. Upload the file. Do this by running this command in the same directory where you saved the kafka_alerts.json file:
> curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47" -d @kafka_alerts.json
It can take up to a minute for Ambari to run the metrics again. To speed things up, you can force Ambari to run the alert check:
> curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT "http://localhost:8080/api/v1/clusters/Sandbox/alert_definitions/47?run_now=true"
This should solve the issue.
Troubleshooting:
If you're still having trouble, these suggestions/tips should help you out.
When uploading the JSON, make sure the JSON is valid. This is easy to catch, as the PUT will return an error saying the structure is invalid.
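A quick way to validate the file before uploading (any JSON linter works; this one ships with Python):
> python -m json.tool kafka_alerts.json   # prints the parsed JSON, or an error pointing at the bad spot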
Make sure the default_port is an INTEGER when you upload. This is tricky because if you keep the decimal (e.g. 6667.0), you won't get an error response, but if you look at /var/log/ambari-server/ambari-server.log you'll see a NumberFormatException. What's even trickier is that Ambari will start ignoring these metrics altogether. To catch this, tail the ambari-agent log (/var/log/ambari-agent/ambari-agent.log) when you run the PUT commands and look out for these types of log entries:
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 274, in __json_to_callable
source = json_definition['source']
TypeError: 'NoneType' object is unsubscriptable
This means that your JSON is invalid, either because of a number format exception or for other reasons. Correlate against ambari-server.log to find additional information.
Manually trigger the alert and look for these types of logs for validation:
INFO 2016-05-25 16:48:33,355 AlertSchedulerHandler.py:374 - [AlertScheduler] Executing on-demand alert kafka_broker_process (1e0e1edc-e051-45bc-8d38-97ae0b3b83f0)
This at least gives you confidence that your alert definition is valid. If instead you get this type of error:
ERROR 2016-05-25 19:40:21,470 AlertSchedulerHandler.py:379 - [AlertScheduler] Unable to execute the alert outside of the job scheduler
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/ambari_agent/AlertSchedulerHandler.py", line 363, in execute_alert
alert_definition = execution_command['alertDefinition']
KeyError: 'alertDefinition'
Then you know something's wrong with the alert definition.
If things still don't work, try removing the uri from the alert definition. This forces Ambari to fall back to the default_port: the alert scheduler looks at the URI first and uses it if it's valid; if not, it falls back to default_port. Remember, if you remove the uri, be sure to also remove the trailing comma after "PORT".
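For reference, a sketch of what the edited "source" block would look like with the uri removed (the reporting section is unchanged and elided here):
"source" : {
  "default_port" : 9092,
  "reporting" : { ... },
  "type" : "PORT"
}
Since "type" is now the last key, it carries no trailing comma.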