Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1172 | 01-16-2018 03:38 PM
 | 6141 | 11-13-2017 05:45 PM
 | 3034 | 11-13-2017 12:30 AM
 | 1519 | 10-27-2017 03:58 AM
 | 28430 | 10-19-2017 03:17 AM
06-22-2017
02:39 PM
2 Kudos
The best way is definitely just to increase the ulimit if possible; Spark assumes clusters will be able to accommodate this. You might be able to work around it by decreasing the number of reducers [or cores used by each node], but that could have performance implications for your job. In general, if a node in your cluster has C assigned cores and you run a job with X reducers, Spark will open C*X files in parallel and start writing. Shuffle consolidation helps decrease the total number of files created, but the number of file handles open at any one time doesn't change, so it won't help with the ulimit problem.
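As a rough illustration, here is a minimal PySpark sketch of those two knobs. The property names apply to the Spark 1.x era (shuffle consolidation was removed in later versions), and the values are placeholders, not recommendations:

```python
# Minimal sketch (Spark 1.x era): lowering C (cores per executor) and
# X (reducer count) to shrink the C*X concurrent shuffle file handles.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("ulimit-workaround")
        # Fewer cores per executor -> fewer files open concurrently (lower C)
        .set("spark.executor.cores", "2")
        # Consolidates shuffle outputs; cuts total files created, but not
        # the number open at once, so it does not fix ulimit by itself
        .set("spark.shuffle.consolidateFiles", "true"))
sc = SparkContext(conf=conf)

pairs = sc.parallelize(range(100000)).map(lambda x: (x % 100, 1))
# Fewer reducers (lower X) via an explicit partition count
counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=32)
print(counts.take(5))
```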
06-22-2017
02:26 PM
3 Kudos
Phoenix works with JdbcTemplate and Spring, using the JDBC driver org.apache.phoenix.jdbc.PhoenixDriver. Here is an example Spring implementation with connection pooling: http://blog.csdn.net/eric_sunah/article/details/44494321 (the blog post is in Chinese, but it translates well in Chrome). If you are specifically looking for an ORM, you can use https://phoenix.apache.org/phoenix_orm.html
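If you ever need to hit Phoenix from Python rather than Spring/JDBC, a minimal sketch using the phoenixdb package (which talks to the Phoenix Query Server instead of the JDBC driver above; the host and port are placeholders for your Query Server address):

```python
# Minimal sketch: querying Phoenix from Python via the Phoenix Query Server.
# 'pqs-host:8765' is a placeholder; 8765 is the usual Query Server port.
import phoenixdb

conn = phoenixdb.connect('http://pqs-host:8765/', autocommit=True)
try:
    cursor = conn.cursor()
    cursor.execute("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")
    for row in cursor.fetchall():
        print(row)
finally:
    conn.close()
```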
06-14-2017
07:33 PM
3 Kudos
While some databases like MySQL and Teradata allow using a column alias in the GROUP BY clause, Hive does not, to the best of my knowledge. Conceptually, GROUP BY processing happens before the SELECT list is computed, so it would be circular to allow such references.

To understand this better, use the EXPLAIN feature in Hive for your query. It gives a logical breakdown of the query and reveals how it is processed and in what order. Here is the link from the official wiki: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain That page walks through a GROUP BY example, and you will see that GROUP BY processing happens before SELECT, which leads to the error you are facing.

Workaround

If you still want to write queries that group by a derived column, you can do it with this nifty technique. Set the following property:

SET hive.groupby.orderby.position.alias=true;

This allows you to group by columns based on their position in the select clause. So instead of writing your query like this:

select sum(ordertotal), year(order_date)
from orders
group by year(order_date)

you can write the group by clause using the position, as shown below:

select sum(ordertotal), year(order_date)
from orders
group by 2

This gets you the desired result without having to repeat "year(order_date)" in the group by clause. As always, if this answer helps you, please consider accepting it.
06-05-2017
03:32 PM
2 Kudos
If you are using PySpark, there appears to be a bug where pyspark crashes for large datasets: https://issues.apache.org/jira/browse/SPARK-12261 Since you are just trying to see sample data, you could use collect and then print. However, collect should not be used for large datasets: it brings all the data to the driver node and can make the driver run out of memory. This link gives details on how to print RDD elements using Scala; refer here for PySpark.
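As a safer alternative to collect, here is a minimal PySpark sketch that only pulls a bounded number of elements to the driver (the parallelized range is just a stand-in for your RDD):

```python
# Minimal sketch: printing a bounded sample instead of collect()ing
# the entire RDD onto the driver.
from pyspark import SparkContext

sc = SparkContext(appName="print-sample")
rdd = sc.parallelize(range(1000000))  # stand-in for your RDD

# take(n) transfers only n elements to the driver
for value in rdd.take(10):
    print(value)

# takeSample gives a random sample, still bounded in size
for value in rdd.takeSample(False, 10, seed=42):
    print(value)
```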
06-01-2017
07:55 PM
2 Kudos
Found the answer with the help of this KB Article. https://community.hortonworks.com/content/supportkb/48773/ambari-metric-system-fails-with-connection-failed.html
05-31-2017
12:22 AM
1 Kudo
I was using Ambari 2.1 to manage HDP 2.3.0 [AWS Community AMI]. I was able to upgrade to Ambari 2.4.0.1 successfully, and all agents were also upgraded and started. However, on the Ambari dashboard, Ambari fails to restart Falcon and Atlas. I tried restarting a couple of times, but to no avail. Here are the services on my cluster: HDFS, YARN, MapReduce2, Tez, Hive, Pig, Oozie, Zookeeper, Falcon, Ambari Metrics, Atlas, Slider

Here is the Atlas Metadata Server restart error log:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/metadata_server.py", line 217, in <module>
MetadataServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 685, in restart
self.stop(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/metadata_server.py", line 113, in stop
user=params.metadata_user,
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'source /etc/atlas/conf/atlas-env.sh; /usr/hdp/current/atlas-server/bin/atlas_stop.py' returned 255. -bash: /etc/atlas/conf/atlas-env.sh: No such file or directory
Exception: [Errno 17] File exists: '/usr/hdp/2.3.0.0-2557/atlas/conf'
Traceback (most recent call last):
File "/usr/hdp/current/atlas-server/bin/atlas_stop.py", line 53, in <module>
returncode = main()
File "/usr/hdp/current/atlas-server/bin/atlas_stop.py", line 28, in main
confdir = mc.dirMustExist(mc.confDir(metadata_home))
File "/usr/hdp/2.3.0.0-2557/atlas/bin/atlas_config.py", line 94, in dirMustExist
os.mkdir(dirname)
OSError: [Errno 17] File exists: '/usr/hdp/2.3.0.0-2557/atlas/conf'

Here is the Falcon Server restart error log:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon_server.py", line 177, in <module>
FalconServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 685, in restart
self.stop(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon_server.py", line 55, in stop
falcon('server', action='stop', upgrade_type=upgrade_type)
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon.py", line 251, in falcon
environment=environment_dictionary)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/hdp/current/falcon-server/bin/falcon-stop' returned 1. Hadoop home is set, adding libraries from '/usr/hdp/current/hadoop-client/bin/hadoop classpath' into falcon classpath
/usr/hdp/current/falcon-server/bin/service-stop.sh: line 37: kill: (3090) - No such process

Any help/hint is greatly appreciated.
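From the Atlas traceback, the OSError appears to come from dirMustExist calling os.mkdir on a conf path that already exists (in my cluster, /usr/hdp/2.3.0.0-2557/atlas/conf may be a leftover directory or symlink from the old stack). A hypothetical sketch of a tolerant version, just to illustrate the failure mode:

```python
import os

def dir_must_exist(dirname):
    # Hypothetical tolerant variant: os.path.isdir follows symlinks, so an
    # existing conf dir (or a symlink to one) no longer triggers os.mkdir,
    # which is what raises [Errno 17] File exists in the traceback above.
    if not os.path.isdir(dirname):
        os.mkdir(dirname)
    return dirname

print(dir_must_exist('/tmp'))  # existing dir is returned unchanged, no error
```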
Labels:
- Apache Ambari
- Apache Atlas
- Apache Falcon
05-30-2017
08:13 PM
1 Kudo
This article worked well for me except for the following two commands:

pg_dump -U ambari ambari > ambari.sql
pg_dump -U mapred ambarirca > ambarirca.sql

To fix this, I used the -f option instead:

pg_dump -U ambari ambari -f ambari.sql
pg_dump -U mapred ambarirca -f ambarirca.sql
05-18-2017
07:52 PM
2 Kudos
@ramsai janapana - use the following command:

ssh root@namenode

Password: hadoop

As always, if this answer helps you, please don't forget to accept it. Thank you.
05-15-2017
03:15 PM
1 Kudo
I am using HDP 2.4.0.7.

I am trying to find the default values for the following Phoenix/HBase properties:

- hbase.hconnection.threads.max
- phoenix.client.connection.max.allowed.connections

Since these values have not been set in my configuration, I believe the defaults are in effect; I want to know what those default values are.
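In case it helps, one way to inspect the values an HBase daemon actually loaded is the Hadoop-style /conf servlet on the Master UI. A minimal sketch, assuming the UI runs on the default HBase 1.x port 16010 and that 'hbase-master-host' is a placeholder; note that purely client-side Phoenix properties will not appear there:

```python
# Minimal sketch: dump the configuration the HBase Master actually loaded
# via its /conf servlet, returned as JSON by the Hadoop ConfServlet.
import json
import urllib2  # cluster-era Python 2; use urllib.request on Python 3

resp = urllib2.urlopen("http://hbase-master-host:16010/conf?format=json")
for prop in json.load(resp)["properties"]:
    if prop["key"] == "hbase.hconnection.threads.max":
        # 'resource' shows whether the value came from hbase-default.xml
        # (i.e. a built-in default) or from hbase-site.xml (an override)
        print("%s = %s (from %s)"
              % (prop["key"], prop["value"], prop["resource"]))
```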
Labels:
- Apache HBase
- Apache Phoenix