Member since
02-24-2016
175
Posts
56
Kudos Received
3
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1355 | 06-16-2017 10:40 AM |
| | 11710 | 05-27-2016 04:06 PM |
| | 1324 | 03-17-2016 01:29 PM |
06-26-2017
03:42 PM
Hi,

On HDP 2.6, I configured the Spark Thrift Server for Spark 1.6.x based on the community wiki, and with that setup queries are executed as the end user: when a user connects to the Spark 1 Thrift Server using beeline, I see a new application listed under the Resource Manager running as that end user (the YARN process runs as the end user).

Now I am trying to configure the Spark2 Thrift Server, following the official documentation:

- Added hive.server2.enable.doAs=true
- Added the DataNucleus jars to spark.jars (on the classpath)
- Set spark.master to local
- Restarted the Spark2 Thrift Server

My understanding is that now:

1. Queries should run as the end user (but they still run as the hive user).
2. With spark.master=local, the Spark2 Thrift Server's work should be listed under the Resource Manager UI as the end user (I do not see it listed now).
3. When all JDBC connections to the STS are closed, the STS application should disappear, since the STS is started in local mode and, for each user/connection (if not shared), queries are executed by a Spark Application Master launched on behalf of the end user.

None of the above holds for the Spark2 Thrift Server, although with impersonation support in the Spark 1 Thrift Server all three work as expected. Attaching screenshots of the anomalies:

1. Queries still run as the hive user.
2. The STS is not listed under the Resource Manager.
3. The Spark2 Thrift Server still runs as the hive user.

I am not sure if I missed something here. Thanks in advance.

Regards, SS

Any inputs? @cdraper, @amcbarnett, @Ana Gillan?
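For reference, the settings I applied boil down to the fragment below. The property names are from the official documentation; the jar paths are illustrative placeholders for a typical HDP layout, not verbatim from my cluster:

```properties
# hive-site.xml used by the Spark2 Thrift Server:
# run queries as the connecting user instead of the hive service user
hive.server2.enable.doAs=true

# spark2-thrift-sparkconf: run the STS driver locally so per-user
# YARN applications are launched for actual query execution
spark.master=local

# DataNucleus jars on the Thrift Server classpath
# (illustrative paths; adjust to your install)
spark.jars=/usr/hdp/current/spark2-client/jars/datanucleus-api-jdo.jar,/usr/hdp/current/spark2-client/jars/datanucleus-core.jar,/usr/hdp/current/spark2-client/jars/datanucleus-rdbms.jar
```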
Labels:
- Apache Spark
06-22-2017
06:11 AM
Hi @Bala Vignesh N V, I have a similar issue. I have done the above settings, but they do not help. I have posted a question on HCC: https://community.hortonworks.com/questions/109365/controlling-number-of-small-files-while-inserting.html.
06-22-2017
05:51 AM
Hi,

For a nightly load we run queries of the form "INSERT INTO target_table SELECT a, b, c FROM x WHERE ...". Each insert goes into a new partition of target_table. The concern: the insert loads hardly any data (I would say less than 128 MB per day) but produces 1200 files, each only a few kilobytes, and this is slowing down performance.

How can we make sure this load does not generate so many small files? I have already set hive.merge.mapfiles and hive.merge.mapredfiles to true in the custom/advanced hive-site.xml, but the load job still writes 1200 small files. I know where 1200 comes from: it is the maximum number of reducers/containers configured in one of the hive-site settings. (I do not think a cluster-wide change is a good idea, as it can affect other jobs that could use the cluster when it has free containers.) What other settings could keep this insert from taking 1200 slots and generating lots of small files?

I also have a second, partly contrary question (relatively less important): when I reload this table by creating another table with a SELECT on the target table, the newly created table does not contain many small files. What could be the reason?
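One session-scoped approach (avoiding any cluster-wide change) is to enable output-file merging and hint the row distribution for just this load. The property names below are standard Hive settings, but the size thresholds are illustrative assumptions to tune for your data, and the partition column and value are hypothetical:

```sql
-- Session-level only: affects this load, not the whole cluster.
-- Merge small output files when running on the Tez engine:
SET hive.merge.tezfiles=true;
-- MR-engine equivalents (already set cluster-wide in my case):
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
-- Merge when the average output file is below 64 MB,
-- aiming for ~256 MB merged files (illustrative sizes):
SET hive.merge.smallfiles.avgsize=67108864;
SET hive.merge.size.per.task=268435456;

-- Alternatively, cluster rows onto fewer reducers so far fewer
-- output files are written in the first place:
INSERT INTO target_table PARTITION (load_date='2017-06-22')
SELECT a, b, c FROM x WHERE ...
DISTRIBUTE BY a;
```

The DISTRIBUTE BY variant trades some parallelism for fewer, larger files, so it suits loads like this one where the daily volume is well under a block size.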
Labels:
- Apache Hive
- Apache Tez
06-16-2017
10:40 AM
Well, this worked "as is" in the North Virginia region! Earlier I was using a different region.
04-26-2017
08:32 PM
Hi @William Gonzalez, I cleared the exam on 26/March/2017 but have not received any communication from Hortonworks about the badge. After that I wrote and cleared HDPCA on 23/April; for HDPCA I got the digital badge, but not for HCA. I wrote 4 emails to certification at hortonworks dot com and got ticket numbers from Zendesk, but unfortunately I have not received any response. Kindly help. Best regards.
04-19-2017
09:02 PM
Hi Gurus,

Following the Hortonworks documentation (https://2xbbhjxc6wk3v21p62t8n4d4-wpengine.netdna-ssl.com/wp-content/uploads/2015/04/HDPCA-PracticeExamGuide.pdf), I selected the HDPCA AMI and the c3.4xlarge instance type, and created a security group allowing incoming traffic from all addresses on ports 5901, 9999, and 8888 (the last two are not in the documentation, but I wanted to make sure my instance runs).

Open ports for incoming traffic:

| Port | Source | Protocol |
|---|---|---|
| 8888 | 0.0.0.0/0, ::/0 | tcp |
| 9999 | 0.0.0.0/0, ::/0 | tcp |
| 22 | 0.0.0.0/0, ::/0 | tcp |
| 5901 | 0.0.0.0/0, ::/0 | tcp |

Now, as per the instructions, I am trying to connect to the instance using VNC Viewer. I copy-paste the DNS name/IP from the instance's public DNS/IP columns and use DNSName:5901, DNSName:9999, IP:5901, etc. It does not work. Every time I see: "Cannot establish connection. Are you sure you have entered the correct network address, and port number if necessary?"
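Before blaming the security group or VNC Viewer, it can help to check whether the port is reachable at all. A minimal sketch (the hostname in the comment is a placeholder, not a real instance):

```python
import socket

def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example with a placeholder host:
# port_open("ec2-xx-xx-xx-xx.compute-1.amazonaws.com", 5901)
```

If this returns False for 5901 while SSH on port 22 works, the likely cause is that no VNC server is listening on the instance (e.g. vncserver never started), rather than a wrong security group rule.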
Labels:
- Security
01-24-2017
03:12 PM
@Prajwal Kumar It did not work for me. Removing it started giving new errors: ERROR [2017-01-24 15:10:47,037] ({pool-2-thread-2} LivyHelper.java[createSession]:128) - Error getting session for user
java.lang.Exception: Cannot start spark.
at org.apache.zeppelin.livy.LivyHelper.createSession(LivyHelper.java:117)
at org.apache.zeppelin.livy.LivySparkInterpreter.interpret(LivySparkInterpreter.java:101)
01-24-2017
02:51 PM
I am also facing this. @Edgar Daeds, did you get it working? Thanks.
01-24-2017
09:40 AM
Hi @jzhang, did you get a chance to look at it? Thanks.
01-23-2017
02:09 PM
I get the stack trace in the UI, but not in the logs. Copying the notebook UI output:

%livy.spark
sqlContext
sqlContext.sql("show databases").show()

res20: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@52c70144

zeppelinuioutput.txt