Member since: 06-09-2016
Posts: 529
Kudos Received: 129
Solutions: 104

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1402 | 09-11-2019 10:19 AM
 | 8436 | 11-26-2018 07:04 PM
 | 1990 | 11-14-2018 12:10 PM
 | 4129 | 11-14-2018 12:09 PM
 | 2692 | 11-12-2018 01:19 PM
08-24-2018
01:45 PM
@Manikandan Jeyabal The problem could then be at the project level. Check your pom file and make sure you have all the necessary Spark dependencies. I tried this in the Zeppelin UI and it works fine. Also make sure you clean/build, and perhaps exit Eclipse in case something is stale on the Eclipse side. Finally, here is a link on how to set up the dependencies in Eclipse: https://community.hortonworks.com/articles/147787/how-to-setup-hortonworks-repository-for-spark-on-e.html HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
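For reference, the dependency entries in pom.xml might look like the sketch below; the Spark version and Scala suffix are assumptions, so match them to your cluster:
<!-- Spark version and Scala suffix are assumptions; match your cluster -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.3.0</version>
    <scope>provided</scope>
</dependency>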
08-24-2018
12:32 PM
@Manikandan Jeyabal
Perhaps you can try this out:
import org.apache.spark.sql.Encoders

// Define the record structure as a case class...
case class Airlines(Airline_id: Integer, Name: String, Alias: String, IATA: String, ICAO: String, Callsign: String, Country: String, Active: String)

// ...and derive a Spark SQL schema (StructType) from it
Encoders.product[Airlines].schema
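As a sketch of how the derived schema could then be used, for instance when reading a CSV file (the path and reader options here are assumptions):
val schema = Encoders.product[Airlines].schema
// apply the case-class-derived schema instead of relying on schema inference
val airlinesDF = spark.read.schema(schema).option("header", "false").csv("/path/to/airlines.csv")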
There are also some examples of case class usage in the following Spark example:
https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/SparkSQLExample.scala
Let me know if this helps!
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
08-23-2018
12:17 PM
@Quan Pham If the cluster is not secured/kerberized and you have not configured an alternative authentication method such as LDAP, then this is exactly what you will experience. You should consider securing your cluster with Kerberos authentication; you can enable Kerberos using Ambari. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
08-22-2018
11:06 AM
1 Kudo
@Guozhen Li In yarn-client mode the client machine (your Windows machine) needs network access to all of the cluster worker nodes (the executors and the AM could potentially run on any of them), and vice versa: the executors must be able to connect back to the driver running on the Windows client machine. So I think you are right that this may be due to a firewall or network problem. HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
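If you want to make the firewall rules easier to pin down, you can fix the driver's address and port explicitly; a sketch, where the host name, port, class, and jar names are assumptions:
spark-submit --master yarn --deploy-mode client \
  --conf spark.driver.host=my-windows-client.example.com \
  --conf spark.driver.port=40000 \
  --class com.example.MyApp my-app.jar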
08-21-2018
12:03 PM
@Sudharsan Ganeshkumar If the above has helped, please take a moment to login and click the "accept" link on the answer.
08-21-2018
12:00 PM
@Sundar Gampa If the above helped, please remember to login and click the "accept" link on the answer.
08-17-2018
01:47 PM
@Sundar Gampa That path looks like the Spark container working directory, am I correct? It is taken from the yarn configuration property yarn.nodemanager.local-dirs. Out of the box, Spark provides ways to copy data to this directory via the --files, --jars, and --archives arguments of the spark-submit command. You can read more about those here: https://spark.apache.org/docs/latest/running-on-yarn.html That said, if you would like to add the resdata directory, you simply need to zip the files that should make up the directory and pass the archive to spark-submit as --archives resdata.zip#resdata (the part after # is the name the archive is unpacked under in the container working directory; unlike --files, --archives actually extracts the zip). HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
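As a sketch of the full sequence (the application class and jar names are assumptions):
# zip the directory contents so they unpack directly under the alias
cd resdata && zip -r ../resdata.zip . && cd ..
spark-submit --master yarn --deploy-mode cluster \
  --archives resdata.zip#resdata \
  --class com.example.MyApp my-app.jar
Inside the application the files are then reachable under the relative path resdata/ in each container's working directory.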
08-17-2018
01:35 PM
@Xiong Duan Based on the error stack I think it could be due to missing configuration. It would be helpful if you shared your workflow file and properties file. Have you tried using a shell action instead?
08-17-2018
12:20 PM
@Narendra Dev When you defined your collection fields, did you add the path as a field? Or are you using dynamic fields? As an example, in your core's managed-schema you should have a field like this: <field name="path" type="string" indexed="true" stored="true" required="true" multiValued="false" /> If you are using dynamic fields, make sure the field is stored, otherwise it won't be returned when you search. And last but not least, make sure you are passing the path as part of the JSON/XML document that is being posted to Solr. HTH
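For example, posting a document that carries the path field could look like this (the host, collection name, and field values are assumptions):
curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/mycollection/update?commit=true' \
  -d '[{"id": "1", "path": "/data/docs/file1.pdf"}]'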
08-17-2018
12:04 PM
@Sudharsan Ganeshkumar AFAIK the CSV format is not compatible between Spark SQL and the Hive serde, hence the error you are getting. A solution to this problem would be to:
1. Create an external table pointing to the path where you will save the CSV file.
2. Save the CSV file to that path instead of using the saveAsTable function.
spark.sql("CREATE EXTERNAL TABLE Student_Spark2(col1 int, col2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '/path/in/hdfs/student_spark2'")
// later, save the data as CSV into the table's location (it must match the LOCATION above)
rddstudent.write.format("csv").save("/path/in/hdfs/student_spark2")
HTH *** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
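Once the files land in that location, a quick sanity check from Spark SQL:
spark.sql("SELECT * FROM Student_Spark2").show()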