Member since
02-24-2016
175
Posts
56
Kudos Received
3
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1355 | 06-16-2017 10:40 AM |
| | 11710 | 05-27-2016 04:06 PM |
| | 1324 | 03-17-2016 01:29 PM |
06-26-2017
03:42 PM
Hi,

On HDP 2.6, I configured the Spark Thrift Server for Spark 1.6.x based on the community wiki, and with that setup queries are executed as the end user: when a user connects to the Spark 1 Thrift Server using beeline, I see a new application listed under the Resource Manager running as that end user (the YARN process runs as the end user).

Now I am trying to configure the Spark2 Thrift Server, following the official documentation:

- Added hive.server2.enable.doAs=true
- Added the DataNucleus jars to spark.jars (on the classpath)
- Set spark.master to local
- Restarted the Spark2 Thrift Server

My understanding is that now:

1. Queries should run as the end user (but they still run as the hive user).
2. With spark.master=local, the Spark2 Thrift Server's work should be listed under the Resource Manager UI as the end user (I do not see it listed now).
3. When all JDBC connections to the STS are closed, the STS application should disappear, since the STS is started in local mode and, for each user/connection (if not shared), queries are executed by a Spark Application Master launched on behalf of the end user.

None of the above holds for the Spark2 Thrift Server, although with impersonation support in the Spark 1 Thrift Server all three work as expected. Attaching screenshots of the anomalies:

1. Queries still run as the hive user.
2. The STS is not listed under the Resource Manager.
3. The Spark2 Thrift Server still runs as the hive user.

I am not sure if I missed something here. Thanks in advance.

Regards, SS

Any inputs? @cdraper, @amcbarnett, @Ana Gillan?
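For reference, the settings I applied boil down to the fragment below. The property names are from the official documentation; the jar paths are illustrative placeholders for a typical HDP layout, not verbatim from my cluster:

```properties
# hive-site.xml used by the Spark2 Thrift Server:
# run queries as the connecting user instead of the hive service user
hive.server2.enable.doAs=true

# spark2-thrift-sparkconf: run the STS driver locally so per-user
# YARN applications are launched for actual query execution
spark.master=local

# DataNucleus jars on the Thrift Server classpath
# (illustrative paths; adjust to your install)
spark.jars=/usr/hdp/current/spark2-client/jars/datanucleus-api-jdo.jar,/usr/hdp/current/spark2-client/jars/datanucleus-core.jar,/usr/hdp/current/spark2-client/jars/datanucleus-rdbms.jar
```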
Labels:
- Apache Spark
06-22-2017
06:11 AM
Hi @Bala Vignesh N V, I have a similar issue. I have done the above settings, but they do not help. I have posted a question on HCC: https://community.hortonworks.com/questions/109365/controlling-number-of-small-files-while-inserting.html.
06-22-2017
05:51 AM
Hi,

For a nightly load we run queries of the form "INSERT INTO target_table SELECT a, b, c FROM x WHERE ...". Each insert goes into a new partition of target_table. The concern: the insert loads hardly any data (I would say less than 128 MB per day) but produces 1200 files, each only a few kilobytes, and this is slowing down performance.

How can we make sure this load does not generate so many small files? I have already set hive.merge.mapfiles and hive.merge.mapredfiles to true in the custom/advanced hive-site.xml, but the load job still writes 1200 small files. I know where 1200 comes from: it is the maximum number of reducers/containers configured in one of the hive-site settings. (I do not think a cluster-wide change is a good idea, as it can affect other jobs that could use the cluster when it has free containers.) What other settings could keep this insert from taking 1200 slots and generating lots of small files?

I also have a second, partly contrary question (relatively less important): when I reload this table by creating another table with a SELECT on the target table, the newly created table does not contain many small files. What could be the reason?
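One session-scoped approach (avoiding any cluster-wide change) is to enable output-file merging and hint the row distribution for just this load. The property names below are standard Hive settings, but the size thresholds are illustrative assumptions to tune for your data, and the partition column and value are hypothetical:

```sql
-- Session-level only: affects this load, not the whole cluster.
-- Merge small output files when running on the Tez engine:
SET hive.merge.tezfiles=true;
-- MR-engine equivalents (already set cluster-wide in my case):
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
-- Merge when the average output file is below 64 MB,
-- aiming for ~256 MB merged files (illustrative sizes):
SET hive.merge.smallfiles.avgsize=67108864;
SET hive.merge.size.per.task=268435456;

-- Alternatively, cluster rows onto fewer reducers so far fewer
-- output files are written in the first place:
INSERT INTO target_table PARTITION (load_date='2017-06-22')
SELECT a, b, c FROM x WHERE ...
DISTRIBUTE BY a;
```

The DISTRIBUTE BY variant trades some parallelism for fewer, larger files, so it suits loads like this one where the daily volume is well under a block size.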
Labels:
- Apache Hive
- Apache Tez
06-16-2017
10:40 AM
Well, this worked "as is" in the North Virginia region! Earlier I was using a different region.
04-26-2017
08:32 PM
Hi @William Gonzalez, I cleared the exam on 26/March/2017 but have not received any communication from Hortonworks about the badge. After that I wrote and cleared HDPCA on 23/April; for HDPCA I got the digital badge, but not for HCA. I wrote 4 emails to certification at hortonworks dot com and got ticket numbers from Zendesk, but unfortunately I have not received any response. Kindly help. Best regards.
04-19-2017
09:02 PM
Hi Gurus,

Following the Hortonworks documentation (https://2xbbhjxc6wk3v21p62t8n4d4-wpengine.netdna-ssl.com/wp-content/uploads/2015/04/HDPCA-PracticeExamGuide.pdf), I selected the HDPCA AMI and the c3.4xlarge instance type, and created a security group allowing incoming traffic from all addresses on ports 5901, 9999, and 8888 (the last two are not in the documentation, but I wanted to make sure my instance runs).

Open ports for incoming traffic:

| Port | Source | Protocol |
|---|---|---|
| 8888 | 0.0.0.0/0, ::/0 | tcp |
| 9999 | 0.0.0.0/0, ::/0 | tcp |
| 22 | 0.0.0.0/0, ::/0 | tcp |
| 5901 | 0.0.0.0/0, ::/0 | tcp |

Now, as per the instructions, I am trying to connect to the instance using VNC Viewer. I copy-paste the DNS name/IP from the instance's public DNS/IP columns and use DNSName:5901, DNSName:9999, IP:5901, etc. It does not work. Every time I see: "Cannot establish connection. Are you sure you have entered the correct network address, and port number if necessary?"
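Before blaming the security group or VNC Viewer, it can help to check whether the port is reachable at all. A minimal sketch (the hostname in the comment is a placeholder, not a real instance):

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with a placeholder host:
# port_open("ec2-xx-xx-xx-xx.compute-1.amazonaws.com", 5901)
```

If this returns False for 5901 while SSH on port 22 works, the likely cause is that no VNC server is listening on the instance (e.g. vncserver never started), rather than a wrong security group rule.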
Labels:
- Security
01-24-2017
03:12 PM
@Prajwal Kumar It did not work for me. Removing it started giving new errors: ERROR [2017-01-24 15:10:47,037] ({pool-2-thread-2} LivyHelper.java[createSession]:128) - Error getting session for user
java.lang.Exception: Cannot start spark.
at org.apache.zeppelin.livy.LivyHelper.createSession(LivyHelper.java:117)
at org.apache.zeppelin.livy.LivySparkInterpreter.interpret(LivySparkInterpreter.java:101)
01-24-2017
02:51 PM
I am also facing this. @Edgar Daeds, did you get it working? Thanks.
01-24-2017
09:40 AM
Hi @jzhang, did you get a chance to look at it? Thanks.
01-23-2017
02:09 PM
I get the stack trace in the UI, but not in the logs. Copying the notebook UI output:

%livy.spark
sqlContext
sqlContext.sql("show databases").show()

res20: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@52c70144

zeppelinuioutput.txt