Member since: 11-25-2017
Posts: 10
Kudos Received: 0
Solutions: 2
| Title | Views | Posted |
|---|---|---|
| | 2348 | 11-29-2017 11:57 AM |
| | 3735 | 11-26-2017 06:02 PM |
11-29-2017
11:57 AM
Finally I was able to do it:

#code:
df.write.format('com.databricks.spark.csv').mode('overwrite').option("header", "false").option("quoteMode", "ALL").save(output_path)

#call:
--packages com.databricks:spark-csv_2.10:1.5.0
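The effect of `quoteMode=ALL` (quote every field, not only fields that contain the delimiter or quote character) can be illustrated with Python's standard-library `csv` module, which exposes the same behaviour as `csv.QUOTE_ALL`. This is a stdlib sketch for illustration, not Spark itself:

```python
import csv
import io

# Mirror spark-csv's quoteMode=ALL with the stdlib csv module:
# QUOTE_ALL wraps every field in quotes, not only fields that
# contain the delimiter or the quote character.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["aaa", "bbb"])

print(buf.getvalue().strip())  # → "aaa","bbb"
```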
11-28-2017
11:21 PM
I am trying to save a CSV to HDFS with quoting, but it does not work. Could you please suggest what I am doing wrong?

Code:
df.write.format('com.databricks.spark.csv').mode('overwrite').option("quote","\"").option("header", "false").save(output_path)

I am calling it with the following:
--packages com.databricks:spark-csv_2.10:1.5.0 --repositories http://repo.hortonworks.com/content/groups/public/

I also tried some other versions, but it still writes without quoting. I also tested with other quote characters.

#output:
aaa,bbb
#expected output:
"aaa","bbb"

Thanks
Labels:
- Hortonworks Data Platform (HDP)
11-26-2017
06:02 PM
Found the issue. It is exactly the same problem mentioned in the link below: https://github.com/tweag/sparkle/issues/105 I went back to Spark 1.6 and it works.
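Since the root cause here is a Spark 1.x vs. 2.x binary mismatch, a quick sanity check before picking a connector jar is to read the running Spark version from the SparkContext. `sc.version` is real pyspark API; the helper below is only an illustrative parser:

```python
def spark_major_version(version_string):
    """Return the major version from a Spark version string like '1.6.3'."""
    return int(version_string.split(".")[0])

# In a pyspark session, sc.version returns the running Spark version
# as a string, so the check would be: spark_major_version(sc.version)
print(spark_major_version("1.6.3"))  # → 1
print(spark_major_version("2.2.0"))  # → 2
```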
11-26-2017
03:43 PM
https://github.com/tweag/sparkle/issues/105 It does not seem to be a ZooKeeper issue to me; they discuss the same error message here.
11-26-2017
03:25 PM
Still the same; see my answer above. Could the problem be that I installed Phoenix on my own on the sandbox? I just realized that there is a Phoenix enabler for HBase in Ambari. I restarted my Docker image from scratch to use the built-in version. Is there any config I should set, or is it enough to enable it and restart HBase? Which jar should I use in the call: the one with the version number or phoenix-client.jar? Thanks in advance.

Update: checked with the built-in Phoenix service; same issue.
11-26-2017
11:29 AM
I changed localhost to sandbox.hortonworks.com (which I had in the conf file for ZooKeeper).

Code:
import phoenix
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext(appName="Phoenix loader")
sqlContext = SQLContext(sc)
table = sqlContext.read.format("org.apache.phoenix.spark").option("table", "INPUT_TABLE").option("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure").load()

But I am still getting the same error:

17/11/26 11:27:39 INFO MetricsSystemImpl: phoenix metrics system started
Traceback (most recent call last):
File "/root/hdp/phoenix_test2.py", line 8, in <module>
table = sqlContext.read.format("org.apache.phoenix.spark").option("table", "INPUT_TABLE").option("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure").load()
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 165, in load
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o42.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/DataFrame;
at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:117)
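As an aside, the `zkUrl` string Phoenix expects here has the shape `host:port:/znode`. A tiny parser (a hypothetical helper for illustration, not part of the Phoenix API) makes the three parts explicit:

```python
def parse_zk_url(zk_url):
    """Split a Phoenix zkUrl of the form host:port:/znode into its parts.
    Illustrative helper only; Phoenix does its own parsing internally."""
    host, port, znode = zk_url.split(":", 2)
    return host, int(port), znode

print(parse_zk_url("sandbox-hdp.hortonworks.com:2181:/hbase-unsecure"))
# → ('sandbox-hdp.hortonworks.com', 2181, '/hbase-unsecure')
```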
11-26-2017
12:57 AM
I copied the Phoenix client and server jars to the spark2 and hbase folders as well, and also updated the Spark conf file with the classes.
11-26-2017
12:57 AM
Hi,
I am trying to put together a data pipeline on the HDP 2.6.3 sandbox (Docker). I am using pyspark with Phoenix (4.7) and HBase.
I have installed the Phoenix project from Maven and successfully created a table with test records. I can see the data in HBase as well.
Now I am trying to read data from the table using pyspark with the following code:
import phoenix
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext(appName="Phoenix test")
sqlContext = SQLContext(sc)
table = sqlContext.read.format("org.apache.phoenix.spark").option("table", "INPUT_TABLE").option("zkUrl", "localhost:2181:/hbase-unsecure").load()
Phoenix DDL:
CREATE TABLE INPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (1, 'test_row_1',111);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (2, 'test_row_2',111 );
Call:
spark-submit --verbose --class org.apache.phoenix.spark --jars /usr/hdp/current/phoenix-server/phoenix-4.7.0.2.5.0.0-1245-client.jar http://repo.hortonworks.com/content/groups/public/ --files /etc/spark2/conf/hbase-site.xml phoenix_test.py
Error message:

Traceback (most recent call last):
File "/root/hdp/process_data.py", line 42, in <module>
.format(data_source_format)\
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 593, in save
File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/protocol.py", line 320, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o55.save.
: java.lang.UnsupportedOperationException: empty.tail

Thanks in advance
Labels:
- Apache HBase
- Apache Phoenix