About dkozlowski

dkozlowski · ‎02-23-2017

@Oriane Try the "Reward User" option.

dkozlowski · ‎02-23-2017

Thanks @Jay SenSharma

dkozlowski · ‎02-23-2017

@Oriane I am glad you have this working now. If you believe I helped, please vote up my answer and select it as the best one 🙂

dkozlowski · ‎02-23-2017

@Oriane For the "prefix not found" error: double-check that the Spark interpreter is bound to that notebook. See my screenshot: Spark needs to be "blue".

dkozlowski · ‎02-23-2017

@Oriane Do exactly this:
- in the new section type: %spark
- press the <Enter> button
- type: sc.version
- press the <Enter> button
Now run it. Does this help? I am asking because I noticed that the copied code was causing issues.

dkozlowski · ‎02-23-2017

@Oriane Can you provide the following:
1. As @Bernhard Walter already asked, can you attach a screenshot of your Spark interpreter config from the Zeppelin UI?
2. Create a new notebook, run the below, and send the output:
%sh
whoami
3. Can you attach the output of:
$ ls -lrt /usr/hdp/current/zeppelin-server/local-repo
4. Is your cluster Kerberized?

dkozlowski · ‎02-20-2017

OK, I got this working now. If anyone is interested, here you are:
val df = sqlContext.sql("SELECT * from table1")
val tempResult = df.filter(df("field1") > 10)
tempResult.write.mode("overwrite").saveAsTable("default.new_table")
val df1 = sqlContext.sql("SELECT * from default.new_table")
df1.show()
NOTE: the "new_table" table can, but does not need to, exist before writing to it.

dkozlowski · ‎02-20-2017

Problem
Interpreters do not work in Zeppelin when there is no internet access. Checking /usr/hdp/current/zeppelin-server/local-repo shows that the folder is empty or contains an "org" folder with no jars. Running Hive through Zeppelin returns:
%jdbc(hive)
show tables;
...
org.apache.hive.jdbc.HiveDriver class java.lang.ClassNotFoundException
...

Solution
The permanent solution is currently planned to be delivered in HDP 2.6. Here is a workaround to follow:
a) I have tarred /usr/hdp/current/zeppelin-server/local-repo into zeppelin-local-repo.tar.gz and placed it at https://drive.google.com/drive/folders/0B-YVWxQz56HubWhUdEdVWGZ1Mms?usp=sharing. As this is my Google Drive, I can give you access after receiving a request from you.
b) Download the file into the /tmp/zeppelin folder.
c) Extract it. This should create a local-repo folder, so you will get /tmp/zeppelin/local-repo with all the subfolders in it.
d) Copy the content of /tmp/zeppelin/local-repo into /usr/hdp/current/zeppelin-server/local-repo.
e) Change the owner of local-repo and all its folders/files to zeppelin:hadoop (or zeppelin:hdfs, whatever GROUP you have).
f) Change the permissions of local-repo and all its folders/files to 755.
g) Restart the Zeppelin service.
NOTE: step a) is something you can do yourself. Just install the environment on a temporary machine with internet access and get the content of /usr/hdp/current/zeppelin-server/local-repo from there.
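For anyone who prefers to script it, steps b) to f) of the workaround could look roughly like the sketch below. It assumes the archive contains a top-level local-repo directory (as step c describes); the function name install_local_repo and its arguments are hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: unpack a local-repo archive and copy it into Zeppelin's local-repo.
# install_local_repo and its argument order are hypothetical, not part of HDP or Zeppelin.
set -euo pipefail

install_local_repo() {
  local archive="$1"    # e.g. /tmp/zeppelin/zeppelin-local-repo.tar.gz
  local workdir="$2"    # e.g. /tmp/zeppelin
  local target="$3"     # e.g. /usr/hdp/current/zeppelin-server/local-repo
  local owner="${4:-}"  # e.g. zeppelin:hadoop; ownership is left alone when empty

  mkdir -p "$workdir" "$target"

  # c) extract; assumes the archive holds a top-level local-repo directory
  tar -xzf "$archive" -C "$workdir"

  # d) copy the content of the extracted folder into the target folder
  cp -R "$workdir/local-repo/." "$target/"

  # e) change the owner recursively (needs root on a real cluster)
  if [ -n "$owner" ]; then
    chown -R "$owner" "$target"
  fi

  # f) set permissions to 755 on the folder and everything below it
  chmod -R 755 "$target"
}
```

On a real node you would run this as root, passing the archive path, /tmp/zeppelin, /usr/hdp/current/zeppelin-server/local-repo and zeppelin:hadoop (or whatever group you have), and then restart the Zeppelin service as in step g).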

dkozlowski · ‎02-20-2017

I have got the following:
val df = sqlContext.sql("SELECT * from table1")
var tempResult = df.filter(df("field1") > 10)
I have also already created another table, table2, with the same structure as table1. How can I save/insert the result of tempResult into table2?

dkozlowski · ‎02-14-2017

@Srikanth Puli Please have a look at this: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE

For a non-partitioned table, you can issue the command:
ANALYZE TABLE Table1 COMPUTE STATISTICS FOR COLUMNS;
to gather column statistics of the table (Hive 0.10.0 and later).

If Table1 is a partitioned table, then for basic statistics you have to specify partition specifications like above in the analyze statement. Otherwise a semantic analyzer exception will be thrown. However, for column statistics, if no partition specification is given in the analyze statement, statistics for all partitions are computed.

You can view the stored statistics by issuing the DESCRIBE command. Statistics are stored in the Parameters array. Suppose you issue the analyze command for the whole table Table1, then issue the command:
DESCRIBE EXTENDED TABLE1;
then among the output, the following would be displayed:
... , parameters:{numPartitions=4, numFiles=16, numRows=2000, totalSize=16384, ...}, ....

I hope this helps.

Last Visited	‎02-06-2018 06:34 AM

Member Since	‎03-25-2016 06:26 AM
Posts	142
Kudos received	48

Cloudera Community

Re: ORC Table Timestamp PySpark 2.1 CASTIssue

Re: Can Kafka handle the mixture of authentication...

Re: How do I automate setting up LDAP in Ambari?

Re: Does jar files missing for spark interpreter?

Re: How to save results from dataframe into a sepa...

Re: Does jar files missing for spark interpreter?

Re: Does jar files missing for spark interpreter?

Re: Does jar files missing for spark interpreter?

Re: Does jar files missing for spark interpreter?

Re: Does jar files missing for spark interpreter?

Re: Does jar files missing for spark interpreter?

Re: How to save results from dataframe into a sepa...

Zeppelin does not get installed properly without i...

How to save results from dataframe into a separate...

Re: Hive - Get number of rows, total size resulted...