Member since: 04-06-2016
Posts: 47
Kudos Received: 7
Solutions: 4
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5985 | 12-02-2016 10:20 PM
 | 4626 | 11-23-2016 08:59 PM
 | 1152 | 07-26-2016 03:11 AM
03-21-2020
07:27 AM
I have a similar issue connecting to Hive from SQuirreL. I use Beeline version 3.1.0.3.0.1.0-187, connecting to the Hortonworks image through a VM. Here are the jars I added, but the connection is refused with the error "Unexpected Error occurred attempting to open an SQL connection. class java.net.ConnectException: Connection refused: connect":
hive-jdbc-3.1.0.3.0.1.0-187.jar
hive-jdbc-3.1.0.3.0.1.0-187-sources.jar
hive-jdbc-3.1.0.3.0.1.0-187-standalone.jar
JDBC URL: jdbc:hive2://sandbox-hdp.hortonworks.com:2181/default
Any idea how to fix this?
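One thing worth noting: port 2181 in that URL is ZooKeeper, not HiveServer2, so the driver usually needs the service-discovery parameters appended. A minimal sketch (the host is taken from the post; the namespace and user are assumptions based on sandbox defaults):

# Connect through ZooKeeper service discovery rather than a direct
# HiveServer2 port; serviceDiscoveryMode and zooKeeperNamespace are
# standard Hive JDBC URL parameters.
beeline -u "jdbc:hive2://sandbox-hdp.hortonworks.com:2181/default;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n hive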
10-04-2019
04:19 PM
In case you get the error below, make sure you use the NiFi host FQDN in the API call and NOT the IP address. Also, make sure DNS is configured correctly.
HTTP ERROR 401
Problem accessing /nifi-api/access/kerberos. Reason:
    Unauthorized
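For reference, a minimal sketch of the call in question, assuming a Kerberos ticket is already in the credential cache (the principal, host name, and port below are placeholders, not values from the post):

# Request an access token from NiFi's Kerberos endpoint over SPNEGO;
# --negotiate -u : tells curl to authenticate with the ticket from kinit.
kinit myuser@EXAMPLE.COM
curl -k -X POST --negotiate -u : "https://nifi-host.example.com:9091/nifi-api/access/kerberos"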
02-16-2017
09:56 AM
@Jay SenSharma Following the link below, I have added all the respective properties in the custom core-site.xml (via Ambari), but no success: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_ambari_views_guide/content/_configuring_your_cluster_for_files_view.html I have also added 4 more properties: hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*
hadoop.proxyuser.admin.groups=*
hadoop.proxyuser.admin.hosts=*
11-04-2017
12:19 PM
Hi @Jeff Watson. You are correct about SAS's use of String datatypes. Good catch! One of my customers also had to deal with this. String datatype conversions can perform very poorly in SAS. With SAS/ACCESS to Hadoop you can set the libname option DBMAX_TEXT (added with the SAS 9.4m1 release) to globally restrict the character length of all columns read into SAS. However, for restricting column size, SAS specifically recommends using the VARCHAR datatype in Hive whenever possible. http://support.sas.com/documentation/cdl/en/acreldb/67473/HTML/default/viewer.htm#n1aqglg4ftdj04n1eyvh2l3367ql.htm
Use Case
Large table, all columns of type String: Table A stored in Hive has 40 columns, all of type String, with 500M rows. By default, SAS/ACCESS converts String to $32K, i.e. 32K characters per column. The math for this table yields a 1.2MB row length x 500M rows, which brings the system to a halt: too large to store in LASR or WORK. The following techniques can be used to work around the challenge in SAS, and they all work:
1. Use char and varchar in Hive instead of String.
2. Set the libname option DBMAX_TEXT to globally restrict the character length of all columns read in.
3. In Hive, use SET TBLPROPERTIES to add SASFMT formats for SAS on the schema in Hive (see the sketch below).
4. Add formatting to SAS code during inbound reads, for example: Sequence Length 8 Informat 10. Format 10.
I hope this helps.
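A hedged sketch of technique 3, assuming a hypothetical table tablea with a String column name (the SASFMT table-property convention comes from the SAS documentation; the table, column, server, and length here are made up for illustration):

# Attach a SAS format hint to a Hive column so SAS/ACCESS reads it as
# CHAR(100) instead of the default 32K-character string.
beeline -u "jdbc:hive2://hiveserver:10000/default" \
  -e "ALTER TABLE tablea SET TBLPROPERTIES ('SASFMT:name'='CHAR(100)')"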
07-28-2016
01:09 PM
3 Kudos
1. Is there a way to restrict the max size that users can use for the Spark executor and driver when submitting jobs on a YARN cluster?
You can set an upper limit for all tasks (yarn.scheduler.maximum-allocation-mb in yarn-site.xml), but there is no way I am aware of to specifically restrict Spark applications, or applications in one queue.
2. What is the best practice around determining the number of executors required for a job?
It's a good question; there was an interesting presentation about that. The conclusion for executor size is: "It depends, but usually 10-40GB and 3-6 cores per executor is a good limit." The max number of executors is not that easy: it depends on the amount of data you want to analyze and the speed you need. So let's assume you have 4 cores per executor, each executor can run 8 tasks, you want to analyze 100GB of data, and you want around 128MB (one block) per task; you would then need roughly a thousand tasks in total. To run them all at the same time you could go up to 100 executors for maximum performance, but you can also go smaller; it would then be slower. Bottom line: it's not unlike a MapReduce task. If you want a rule of thumb, the upper limit is data amount / HDFS block size / number of cores per executor x 2; more will not help you much (see the sketch below). http://www.slideshare.net/HadoopSummit/running-spark-in-production-61337353
Is there a max limit that users can be restricted to? You can use YARN to create a queue for your Spark users. There is a YARN parameter, user-limit, which allows you to prevent a single user from taking more than a specific share of a queue; user-limit = 0.25, for example, would restrict a user from taking more than 25% of the queue. Or you could give every user a queue.
3. How does the RM handle resource allocation if most of the resources are consumed by Spark jobs in a queue? How is preemption handled?
Like with any other task in YARN; Spark is not special. Preemption with Spark will kill executors, and that is not great for Spark (although it can survive it for a while). I would avoid preemption if I could.
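A hedged sketch of that sizing arithmetic as a spark-submit invocation (the memory values and application file are illustrative assumptions, not recommendations from the thread):

# 100GB / 128MB blocks ≈ 1000 tasks; at 8 tasks per 4-core executor,
# ~100 executors can run them all concurrently.
spark-submit \
  --master yarn \
  --num-executors 100 \
  --executor-cores 4 \
  --executor-memory 20g \
  --driver-memory 4g \
  my_job.py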
07-26-2016
01:59 PM
Another source for master/slave KDC configuration is https://web.mit.edu/kerberos/krb5-1.12/doc/admin/install_kdc.html
06-08-2016
08:00 AM
Hi: it's OK now, it works with this: --query "select ID_INTERNO_PE,MI_FECHA_FIN_MES,COD_NRBE_EN,COD_LINEA,ID_GRP_PD,MI_NUM_TOT_AC_ACT,MI_NUM_AC_SUS,MI_SDO_AC_P,MI_NUM_AC_P,MI_DIA_AC_P,MI_INT_DEV_ACR_D,MI_INT_DEV_DEU_D,MI_COMIS_APL_D,MI_TOT_MOV_D,MI_TOT_MOV_H,MI_TOT_IMP_MOV_D,MI_TOT_IMP_MOV_H from RDWC01.MI_CLTE_ECO_GEN where \$CONDITIONS AND COD_RL_PERS_AC = 01 AND COD_LINEA in ('01','03','04','05') AND COD_NRBE_EN = '3159' AND TRUNC(MI_FECHA_FIN_MES) >=TO_DATE('2010-01-01', 'YYYY-MM-DD')" \
Note: If you are issuing the query wrapped with double quotes ("), you will have to use \$CONDITIONS instead of just $CONDITIONS to disallow your shell from treating it as a shell variable. For example, a double-quoted query may look like: "SELECT * FROM x WHERE a='foo' AND \$CONDITIONS"
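For context, a hedged sketch of how such a query sits inside a complete sqoop import command (the connection string, credentials, target directory, and abbreviated column list are placeholders, not values from this thread):

# Free-form query import: Sqoop replaces \$CONDITIONS with its split
# predicate; --split-by is required when running --query in parallel.
sqoop import \
  --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
  --username myuser -P \
  --query "select ID_INTERNO_PE, MI_FECHA_FIN_MES from RDWC01.MI_CLTE_ECO_GEN where \$CONDITIONS" \
  --split-by ID_INTERNO_PE \
  --target-dir /user/myuser/mi_clte_eco_gen \
  -m 4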
Many thanks, all of you.
05-18-2016
11:15 AM
@Saurabh Kumar Then I can only think of increasing the yarn.nodemanager.log-dirs capacity by adding multiple mount points. But I still suspect that something else is also occupying the space.
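A minimal sketch of that change, assuming extra mount points are available (the paths below are assumptions; the property takes a comma-separated list in yarn-site.xml and can be edited under the YARN configs in Ambari):

# Spread NodeManager container logs across several mount points.
yarn.nodemanager.log-dirs=/grid/0/hadoop/yarn/log,/grid/1/hadoop/yarn/log,/grid/2/hadoop/yarn/log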
05-04-2016
03:16 PM
Thanks Maxwell. I was using a 2-day date range whereas data was only available for a few hours; I was thinking it would still respond with whatever data it had. Now I ran it with a 2-minute range and it came back with metrics. However, this data looks very granular. A couple of questions for you:
1. Is there a way to get 15-minute aggregate metrics? Right now, it looks like data is available every 30 seconds.
2. What is the use of the last parameter in the query (i.e., the current timestamp)?
3. How does the step param (the last parameter in the date range query) work in a time range query?
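For reference, a hedged sketch of the temporal query format this appears to describe: the Ambari REST API takes [start-time, end-time, step] on a metrics field, in epoch seconds (the cluster, host, metric, credentials, and timestamps below are placeholders, not values from this thread):

# Temporal metrics query: fields=metrics/<path>[start,end,step];
# step is the requested sampling interval in seconds, so step=900
# asks for roughly one datapoint per 15 minutes across the range.
curl -u admin:admin \
  "http://ambari-host:8080/api/v1/clusters/MyCluster/hosts/host1?fields=metrics/cpu/cpu_user[1462352400,1462359600,900]"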