Member since: 04-06-2016
Posts: 47
Kudos Received: 7
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5985 | 12-02-2016 10:20 PM
 | 4626 | 11-23-2016 08:59 PM
 | 1152 | 07-26-2016 03:11 AM
03-21-2020
07:27 AM
I have a similar issue connecting to Hive from SQuirreL. I use Beeline version 3.1.0.3.0.1.0-187, connecting to the Hortonworks image through a VM. Here are the jars I added, but the connection is refused with the error "Unexpected Error occurred attempting to open an SQL connection. class java.net.ConnectException: Connection refused: connect":
hive-jdbc-3.1.0.3.0.1.0-187.jar
hive-jdbc-3.1.0.3.0.1.0-187-sources.jar
hive-jdbc-3.1.0.3.0.1.0-187-standalone.jar
JDBC URL: jdbc:hive2://sandbox-hdp.hortonworks.com:2181/default
Any idea how to fix this?
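One thing worth noting: port 2181 in that URL is ZooKeeper, not HiveServer2, so the driver usually needs the service-discovery parameters appended. A minimal sketch (the host is taken from the post; the namespace and user are assumptions based on sandbox defaults):

# Connect through ZooKeeper service discovery rather than a direct
# HiveServer2 port; serviceDiscoveryMode and zooKeeperNamespace are
# standard Hive JDBC URL parameters.
beeline -u "jdbc:hive2://sandbox-hdp.hortonworks.com:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive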
10-04-2019
04:19 PM
In case you get the error below, make sure you use the NiFi host FQDN in the API call and NOT the IP address. Also, make sure DNS is configured correctly.
HTTP ERROR 401
Problem accessing /nifi-api/access/kerberos. Reason:
    Unauthorized
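For reference, a minimal sketch of the call in question, assuming a Kerberos ticket is already in the credential cache (the principal, host name, and port below are placeholders, not values from the post):

# Request an access token from NiFi's Kerberos endpoint over SPNEGO;
# --negotiate -u : tells curl to authenticate with the ticket from kinit.
kinit myuser@EXAMPLE.COM
curl -k -X POST --negotiate -u : "https://nifi-host.example.com:9091/nifi-api/access/kerberos"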
02-16-2017
09:56 AM
@Jay SenSharma Following the link below, I have added all the respective properties in the custom core-site.xml (via Ambari), but no success: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_ambari_views_guide/content/_configuring_your_cluster_for_files_view.html I have also added 4 more properties: hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*
hadoop.proxyuser.admin.groups=*
hadoop.proxyuser.admin.hosts=*
11-04-2017
12:19 PM
Hi @Jeff Watson. You are correct about SAS's use of String datatypes. Good catch! One of my customers also had to deal with this. String datatype conversions can perform very poorly in SAS. With SAS/ACCESS to Hadoop you can set the libname option DBMAX_TEXT (added with the SAS 9.4m1 release) to globally restrict the character length of all columns read into SAS. However, for restricting column size, SAS specifically recommends using the VARCHAR datatype in Hive whenever possible. http://support.sas.com/documentation/cdl/en/acreldb/67473/HTML/default/viewer.htm#n1aqglg4ftdj04n1eyvh2l3367ql.htm
Use Case
Large table, all columns of type String: Table A stored in Hive has 40 columns, all of type String, with 500M rows. By default, SAS/ACCESS converts String to $32K, i.e. 32K characters per column. The math for this table yields a 1.2MB row length x 500M rows, which brings the system to a halt: too large to store in LASR or WORK. The following techniques can be used to work around the challenge in SAS, and they all work:
1. Use char and varchar in Hive instead of String.
2. Set the libname option DBMAX_TEXT to globally restrict the character length of all columns read in.
3. In Hive, use SET TBLPROPERTIES to add SASFMT formats for SAS on the schema in Hive (see the sketch below).
4. Add formatting to SAS code during inbound reads, for example: Sequence Length 8 Informat 10. Format 10.
I hope this helps.
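A hedged sketch of technique 3, assuming a hypothetical table tablea with a String column name (the SASFMT table-property convention comes from the SAS documentation; the table, column, server, and length here are made up for illustration):

# Attach a SAS format hint to a Hive column so SAS/ACCESS reads it as
# CHAR(100) instead of the default 32K-character string.
beeline -u "jdbc:hive2://hiveserver:10000/default" \
  -e "ALTER TABLE tablea SET TBLPROPERTIES ('SASFMT:name'='CHAR(100)')"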
07-28-2016
01:09 PM
3 Kudos
1. Is there a way to restrict the max size that users can use for the Spark executor and driver when submitting jobs on a YARN cluster?
You can set an upper limit for all tasks (yarn.scheduler.maximum-allocation-mb in yarn-site.xml), but there is no way I am aware of to specifically restrict Spark applications, or applications in one queue.
2. What is the best practice around determining the number of executors required for a job?
It's a good question; there was an interesting presentation about that. The conclusion for executor size is: "It depends, but usually 10-40GB and 3-6 cores per executor is a good limit." The max number of executors is not that easy: it depends on the amount of data you want to analyze and the speed you need. So let's assume you have 4 cores per executor, each executor can run 8 tasks, you want to analyze 100GB of data, and you want around 128MB (one block) per task; you would then need roughly a thousand tasks in total. To run them all at the same time you could go up to 100 executors for maximum performance, but you can also go smaller; it would then be slower. Bottom line: it's not unlike a MapReduce task. If you want a rule of thumb, the upper limit is data amount / HDFS block size / number of cores per executor x 2; more will not help you much (see the sketch below). http://www.slideshare.net/HadoopSummit/running-spark-in-production-61337353
Is there a max limit that users can be restricted to? You can use YARN to create a queue for your Spark users. There is a YARN parameter, user-limit, which allows you to prevent a single user from taking more than a specific share of a queue; user-limit = 0.25, for example, would restrict a user from taking more than 25% of the queue. Or you could give every user a queue.
3. How does the RM handle resource allocation if most of the resources are consumed by Spark jobs in a queue? How is preemption handled?
Like with any other task in YARN; Spark is not special. Preemption with Spark will kill executors, and that is not great for Spark (although it can survive it for a while). I would avoid preemption if I could.
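A hedged sketch of that sizing arithmetic as a spark-submit invocation (the memory values and application file are illustrative assumptions, not recommendations from the thread):

# 100GB / 128MB blocks ≈ 1000 tasks; at 8 tasks per 4-core executor,
# ~100 executors can run them all concurrently.
spark-submit \
  --master yarn \
  --num-executors 100 \
  --executor-cores 4 \
  --executor-memory 20g \
  --driver-memory 4g \
  my_job.py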
07-26-2016
01:59 PM
Another source for master/slave KDC configuration is https://web.mit.edu/kerberos/krb5-1.12/doc/admin/install_kdc.html
06-08-2016
08:00 AM
Hi: it's OK now, it works with this: --query "select ID_INTERNO_PE,MI_FECHA_FIN_MES,COD_NRBE_EN,COD_LINEA,ID_GRP_PD,MI_NUM_TOT_AC_ACT,MI_NUM_AC_SUS,MI_SDO_AC_P,MI_NUM_AC_P,MI_DIA_AC_P,MI_INT_DEV_ACR_D,MI_INT_DEV_DEU_D,MI_COMIS_APL_D,MI_TOT_MOV_D,MI_TOT_MOV_H,MI_TOT_IMP_MOV_D,MI_TOT_IMP_MOV_H from RDWC01.MI_CLTE_ECO_GEN where \$CONDITIONS AND COD_RL_PERS_AC = 01 AND COD_LINEA in ('01','03','04','05') AND COD_NRBE_EN = '3159' AND TRUNC(MI_FECHA_FIN_MES) >=TO_DATE('2010-01-01', 'YYYY-MM-DD')" \
Note: If you are issuing the query wrapped with double quotes ("), you will have to use \$CONDITIONS instead of just $CONDITIONS to disallow your shell from treating it as a shell variable. For example, a double-quoted query may look like: "SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"
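For context, a hedged sketch of how such a query sits inside a complete sqoop import command (the connection string, credentials, target directory, and abbreviated column list are placeholders, not values from this thread):

# Free-form query import: Sqoop replaces \$CONDITIONS with its split
# predicate; --split-by is required when running --query in parallel.
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
  --username myuser -P \
  --query "select ID_INTERNO_PE, MI_FECHA_FIN_MES from RDWC01.MI_CLTE_ECO_GEN where \$CONDITIONS" \
  --split-by ID_INTERNO_PE \
  --target-dir /user/myuser/mi_clte_eco_gen \
  -m 4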
Many thanks, all of you.
05-18-2016
11:15 AM
@Saurabh Kumar Then I can only think of increasing the yarn.nodemanager.log-dirs capacity by adding multiple mount points. But I still suspect that something else is also occupying the space.
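A minimal sketch of that change, assuming extra mount points are available (the paths below are assumptions; the property takes a comma-separated list in yarn-site.xml and can be edited under the YARN configs in Ambari):

# Spread NodeManager container logs across several mount points.
yarn.nodemanager.log-dirs=/grid/0/hadoop/yarn/log,/grid/1/hadoop/yarn/log,/grid/2/hadoop/yarn/log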
05-04-2016
03:16 PM
Thanks Maxwell. I was using a 2-day date range whereas data was only available for a few hours; I was thinking it would still respond with whatever data it had. Now I ran it with a 2-minute range and it came back with metrics. However, this data looks very granular. A couple of questions for you:
1. Is there a way to get 15-minute aggregate metrics? Right now, it looks like data is available every 30 seconds.
2. What is the use of the last parameter in the query (i.e., the current timestamp)?
3. How does the step param (the last parameter in the date range query) work in a time range query?
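For reference, a hedged sketch of the temporal query format this appears to describe: the Ambari REST API takes [start-time, end-time, step] on a metrics field, in epoch seconds (the cluster, host, metric, credentials, and timestamps below are placeholders, not values from this thread):

# Temporal metrics query: fields=metrics/<path>[start,end,step];
# step is the requested sampling interval in seconds, so step=900
# asks for roughly one datapoint per 15 minutes across the range.
curl -u admin:admin \
  "http://ambari-host:8080/api/v1/clusters/MyCluster/hosts/host1?fields=metrics/cpu/cpu_user[1462352400,1462359600,900]"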