Member since: 10-04-2016
Posts: 243
Kudos Received: 281
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1172 | 01-16-2018 03:38 PM
 | 6141 | 11-13-2017 05:45 PM
 | 3034 | 11-13-2017 12:30 AM
 | 1519 | 10-27-2017 03:58 AM
 | 28430 | 10-19-2017 03:17 AM
06-22-2017
02:39 PM
2 Kudos
The best way is definitely just to increase the ulimit if possible; Spark assumes clusters will be able to accommodate this. You might be able to work around it by decreasing the number of reducers [or cores used by each node], but that could have performance implications for your job. In general, if a node in your cluster has C assigned cores and you run a job with X reducers, Spark will open C*X files in parallel and start writing. Shuffle consolidation helps decrease the total number of files created, but the number of file handles open at any one time doesn't change, so it won't help with the ulimit problem.
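As a rough illustration, here is a minimal PySpark sketch of those two knobs. The property names apply to the Spark 1.x era (shuffle consolidation was removed in later versions), and the values are placeholders, not recommendations:

```python
# Minimal sketch (Spark 1.x era): lowering C (cores per executor) and
# X (reducer count) to shrink the C*X concurrent shuffle file handles.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("ulimit-workaround")
        # Fewer cores per executor -> fewer files open concurrently (lower C)
        .set("spark.executor.cores", "2")
        # Consolidates shuffle outputs; cuts total files created, but not
        # the number open at once, so it does not fix ulimit by itself
        .set("spark.shuffle.consolidateFiles", "true"))
sc = SparkContext(conf=conf)

pairs = sc.parallelize(range(100000)).map(lambda x: (x % 100, 1))
# Fewer reducers (lower X) via an explicit partition count
counts = pairs.reduceByKey(lambda a, b: a + b, numPartitions=32)
print(counts.take(5))
```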
06-22-2017
02:26 PM
3 Kudos
Phoenix works with JdbcTemplate and Spring, using the JDBC driver org.apache.phoenix.jdbc.PhoenixDriver. Here is an example Spring implementation with connection pooling: http://blog.csdn.net/eric_sunah/article/details/44494321 (the blog post is in Chinese, but it translates well in Chrome). If you are specifically looking for an ORM, you can use https://phoenix.apache.org/phoenix_orm.html
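If you ever need to hit Phoenix from Python rather than Spring/JDBC, a minimal sketch using the phoenixdb package (which talks to the Phoenix Query Server instead of the JDBC driver above; the host and port are placeholders for your Query Server address):

```python
# Minimal sketch: querying Phoenix from Python via the Phoenix Query Server.
# 'pqs-host:8765' is a placeholder; 8765 is the usual Query Server port.
import phoenixdb

conn = phoenixdb.connect('http://pqs-host:8765/', autocommit=True)
try:
    cursor = conn.cursor()
    cursor.execute("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")
    for row in cursor.fetchall():
        print(row)
finally:
    conn.close()
```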
06-14-2017
07:33 PM
3 Kudos
While some databases like MySQL and Teradata allow using a column alias in the GROUP BY clause, Hive does not, to the best of my knowledge. Conceptually, GROUP BY processing happens before the SELECT list is computed, so it would be circular to allow such references.

To understand this better, use the EXPLAIN feature in Hive for your query. It gives a logical breakdown of the query and reveals how it is processed and in what order. Here is the link from the official wiki: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Explain That page walks through a GROUP BY example, and you will see that GROUP BY processing happens before SELECT, which leads to the error you are facing.

Workaround

If you still want to write queries that group by a derived column, you can do it with this nifty technique. Set the following property:

SET hive.groupby.orderby.position.alias=true;

This allows you to group by columns based on their position in the select clause. So instead of writing your query like this:

select sum(ordertotal), year(order_date)
from orders
group by year(order_date)

you can write the group by clause using the position, as shown below:

select sum(ordertotal), year(order_date)
from orders
group by 2

This gets you the desired result without having to repeat "year(order_date)" in the group by clause. As always, if this answer helps you, please consider accepting it.
06-05-2017
03:32 PM
2 Kudos
If you are using PySpark, there appears to be a bug where pyspark crashes for large datasets: https://issues.apache.org/jira/browse/SPARK-12261 Since you are just trying to see sample data, you could use collect and then print. However, collect should not be used for large datasets: it brings all the data to the driver node and can make the driver run out of memory. This link gives details on how to print RDD elements using Scala; refer here for PySpark.
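As a safer alternative to collect, here is a minimal PySpark sketch that only pulls a bounded number of elements to the driver (the parallelized range is just a stand-in for your RDD):

```python
# Minimal sketch: printing a bounded sample instead of collect()ing
# the entire RDD onto the driver.
from pyspark import SparkContext

sc = SparkContext(appName="print-sample")
rdd = sc.parallelize(range(1000000))  # stand-in for your RDD

# take(n) transfers only n elements to the driver
for value in rdd.take(10):
    print(value)

# takeSample gives a random sample, still bounded in size
for value in rdd.takeSample(False, 10, seed=42):
    print(value)
```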
06-01-2017
07:55 PM
2 Kudos
Found the answer with the help of this KB Article. https://community.hortonworks.com/content/supportkb/48773/ambari-metric-system-fails-with-connection-failed.html
05-31-2017
12:22 AM
1 Kudo
I was using Ambari 2.1 to manage HDP 2.3.0 [AWS Community AMI]. I was able to upgrade to Ambari 2.4.0.1 successfully, and all agents were also upgraded and started. However, on the Ambari dashboard, Ambari fails to restart Falcon and Atlas. I tried restarting a couple of times, but to no avail. Here are the services on my cluster: HDFS, YARN, MapReduce2, Tez, Hive, Pig, Oozie, Zookeeper, Falcon, Ambari Metrics, Atlas, Slider

Here is the Atlas Metadata Server restart error log:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/metadata_server.py", line 217, in <module>
MetadataServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 685, in restart
self.stop(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/ATLAS/0.1.0.2.3/package/scripts/metadata_server.py", line 113, in stop
user=params.metadata_user,
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'source /etc/atlas/conf/atlas-env.sh; /usr/hdp/current/atlas-server/bin/atlas_stop.py' returned 255. -bash: /etc/atlas/conf/atlas-env.sh: No such file or directory
Exception: [Errno 17] File exists: '/usr/hdp/2.3.0.0-2557/atlas/conf'
Traceback (most recent call last):
File "/usr/hdp/current/atlas-server/bin/atlas_stop.py", line 53, in <module>
returncode = main()
File "/usr/hdp/current/atlas-server/bin/atlas_stop.py", line 28, in main
confdir = mc.dirMustExist(mc.confDir(metadata_home))
File "/usr/hdp/2.3.0.0-2557/atlas/bin/atlas_config.py", line 94, in dirMustExist
os.mkdir(dirname)
OSError: [Errno 17] File exists: '/usr/hdp/2.3.0.0-2557/atlas/conf'

Here is the Falcon Server restart error log:

Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon_server.py", line 177, in <module>
FalconServer().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 685, in restart
self.stop(env, upgrade_type=upgrade_type)
File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon_server.py", line 55, in stop
falcon('server', action='stop', upgrade_type=upgrade_type)
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/FALCON/0.5.0.2.1/package/scripts/falcon.py", line 251, in falcon
environment=environment_dictionary)
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 273, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 71, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 93, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 141, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 294, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/hdp/current/falcon-server/bin/falcon-stop' returned 1. Hadoop home is set, adding libraries from '/usr/hdp/current/hadoop-client/bin/hadoop classpath' into falcon classpath
/usr/hdp/current/falcon-server/bin/service-stop.sh: line 37: kill: (3090) - No such process

Any help/hint is greatly appreciated.
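From the Atlas traceback, the OSError appears to come from dirMustExist calling os.mkdir on a conf path that already exists (in my cluster, /usr/hdp/2.3.0.0-2557/atlas/conf may be a leftover directory or symlink from the old stack). A hypothetical sketch of a tolerant version, just to illustrate the failure mode:

```python
import os

def dir_must_exist(dirname):
    # Hypothetical tolerant variant: os.path.isdir follows symlinks, so an
    # existing conf dir (or a symlink to one) no longer triggers os.mkdir,
    # which is what raises [Errno 17] File exists in the traceback above.
    if not os.path.isdir(dirname):
        os.mkdir(dirname)
    return dirname

print(dir_must_exist('/tmp'))  # existing dir is returned unchanged, no error
```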
Labels:
- Apache Ambari
- Apache Atlas
- Apache Falcon
05-30-2017
08:13 PM
1 Kudo
This article worked well for me except for the following two commands:

pg_dump -U ambari ambari > ambari.sql
pg_dump -U mapred ambarirca > ambarirca.sql

To fix this, I used the -f option instead:

pg_dump -U ambari ambari -f ambari.sql
pg_dump -U mapred ambarirca -f ambarirca.sql
05-18-2017
07:52 PM
2 Kudos
@ramsai janapana - use the following command:

ssh root@namenode

Password: hadoop

As always, if this answer helps you, please don't forget to accept it. Thank you.
05-15-2017
03:15 PM
1 Kudo
I am using HDP 2.4.0.7.

I am trying to find the default values for the following Phoenix/HBase properties:

- hbase.hconnection.threads.max
- phoenix.client.connection.max.allowed.connections

Since these values have not been set in my configuration, I believe the defaults are in effect; I want to know what those default values are.
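In case it helps, one way to inspect the values an HBase daemon actually loaded is the Hadoop-style /conf servlet on the Master UI. A minimal sketch, assuming the UI runs on the default HBase 1.x port 16010 and that 'hbase-master-host' is a placeholder; note that purely client-side Phoenix properties will not appear there:

```python
# Minimal sketch: dump the configuration the HBase Master actually loaded
# via its /conf servlet, returned as JSON by the Hadoop ConfServlet.
import json
import urllib2  # cluster-era Python 2; use urllib.request on Python 3

resp = urllib2.urlopen("http://hbase-master-host:16010/conf?format=json")
for prop in json.load(resp)["properties"]:
    if prop["key"] == "hbase.hconnection.threads.max":
        # 'resource' shows whether the value came from hbase-default.xml
        # (i.e. a built-in default) or from hbase-site.xml (an override)
        print("%s = %s (from %s)"
              % (prop["key"], prop["value"], prop["resource"]))
```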
Labels:
- Apache HBase
- Apache Phoenix