Member since: 11-25-2017
Posts: 10
Kudos Received: 0
Solutions: 2
| Title | Views | Posted |
|---|---|---|
| | 2348 | 11-29-2017 11:57 AM |
| | 3735 | 11-26-2017 06:02 PM |
11-29-2017
11:57 AM
Finally I was able to do it:

#code:
df.write.format('com.databricks.spark.csv').mode('overwrite').option("header", "false").option("quoteMode", "ALL").save(output_path)

#call:
--packages com.databricks:spark-csv_2.10:1.5.0
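The effect of `quoteMode=ALL` (quote every field, not only fields that contain the delimiter or quote character) can be illustrated with Python's standard-library `csv` module, which exposes the same behaviour as `csv.QUOTE_ALL`. This is a stdlib sketch for illustration, not Spark itself:

```python
import csv
import io

# Mirror spark-csv's quoteMode=ALL with the stdlib csv module:
# QUOTE_ALL wraps every field in quotes, not only fields that
# contain the delimiter or the quote character.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(["aaa", "bbb"])

print(buf.getvalue().strip())  # → "aaa","bbb"
```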
11-28-2017
11:21 PM
I am trying to save a CSV to HDFS with quoting, but it does not work. Could you please suggest what I am doing wrong?

Code:
df.write.format('com.databricks.spark.csv').mode('overwrite').option("quote","\"").option("header", "false").save(output_path)

I am calling it with the following:
--packages com.databricks:spark-csv_2.10:1.5.0 --repositories http://repo.hortonworks.com/content/groups/public/

I also tried some other versions, but it still writes without quoting. I also tested with other quote characters.

#output:
aaa,bbb
#expected output:
"aaa","bbb"

Thanks
Labels:
- Hortonworks Data Platform (HDP)
11-26-2017
06:02 PM
Found the issue. It is exactly the same problem mentioned in the link below: https://github.com/tweag/sparkle/issues/105 I went back to Spark 1.6 and it works.
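Since the root cause here is a Spark 1.x vs. 2.x binary mismatch, a quick sanity check before picking a connector jar is to read the running Spark version from the SparkContext. `sc.version` is real pyspark API; the helper below is only an illustrative parser:

```python
def spark_major_version(version_string):
    """Return the major version from a Spark version string like '1.6.3'."""
    return int(version_string.split(".")[0])

# In a pyspark session, sc.version returns the running Spark version
# as a string, so the check would be: spark_major_version(sc.version)
print(spark_major_version("1.6.3"))  # → 1
print(spark_major_version("2.2.0"))  # → 2
```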
11-26-2017
03:43 PM
https://github.com/tweag/sparkle/issues/105 It does not seem to be a ZooKeeper issue to me; they discuss the same error message here.
11-26-2017
03:25 PM
Still the same; see my answer above. Could the problem be that I installed Phoenix on my own on the sandbox? I just realized that there is a Phoenix enabler for HBase in Ambari. I restarted my Docker image from scratch to use the built-in version. Is there any config I should set, or is it enough to enable it and restart HBase? Which jar should I use in the call: the one with the version number or phoenix-client.jar? Thanks in advance.

Update: checked with the built-in Phoenix service; same issue.
11-26-2017
11:29 AM
I changed localhost to sandbox.hortonworks.com (which I had in the conf file for ZooKeeper).

Code:
import phoenix
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext(appName="Phoenix loader")
sqlContext = SQLContext(sc)
table = sqlContext.read.format("org.apache.phoenix.spark").option("table", "INPUT_TABLE").option("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure").load()

But I am still getting the same error:

17/11/26 11:27:39 INFO MetricsSystemImpl: phoenix metrics system started
Traceback (most recent call last):
File "/root/hdp/phoenix_test2.py", line 8, in <module>
table = sqlContext.read.format("org.apache.phoenix.spark").option("table", "INPUT_TABLE").option("zkUrl", "sandbox-hdp.hortonworks.com:2181:/hbase-unsecure").load()
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 165, in load
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o42.load.
: java.lang.NoSuchMethodError: org.apache.spark.sql.SQLContext.createDataFrame(Lorg/apache/spark/rdd/RDD;Lorg/apache/spark/sql/types/StructType;)Lorg/apache/spark/sql/DataFrame;
at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:117)
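As an aside, the `zkUrl` string Phoenix expects here has the shape `host:port:/znode`. A tiny parser (a hypothetical helper for illustration, not part of the Phoenix API) makes the three parts explicit:

```python
def parse_zk_url(zk_url):
    """Split a Phoenix zkUrl of the form host:port:/znode into its parts.
    Illustrative helper only; Phoenix does its own parsing internally."""
    host, port, znode = zk_url.split(":", 2)
    return host, int(port), znode

print(parse_zk_url("sandbox-hdp.hortonworks.com:2181:/hbase-unsecure"))
# → ('sandbox-hdp.hortonworks.com', 2181, '/hbase-unsecure')
```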
11-26-2017
12:57 AM
I copied the Phoenix client and server jars to the spark2 and hbase folders as well, and also updated the Spark conf file with the classes.
11-26-2017
12:57 AM
Hi,
I am trying to put together a data pipeline on the HDP 2.6.3 sandbox (Docker). I am using pyspark with Phoenix (4.7) and HBase.
I have installed the Phoenix project from Maven and successfully created a table with test records. I can see the data in HBase as well.
Now I am trying to read data from the table using pyspark with the following code:
import phoenix
from pyspark import SparkContext
from pyspark.sql import SQLContext
sc = SparkContext(appName="Phoenix test")
sqlContext = SQLContext(sc)
table = sqlContext.read.format("org.apache.phoenix.spark").option("table", "INPUT_TABLE").option("zkUrl", "localhost:2181:/hbase-unsecure").load()
Phoenix DDL:
CREATE TABLE INPUT_TABLE (id BIGINT NOT NULL PRIMARY KEY, col1 VARCHAR, col2 INTEGER);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (1, 'test_row_1',111);
UPSERT INTO INPUT_TABLE (id, col1, col2) VALUES (2, 'test_row_2',111 );
Call:
spark-submit --verbose --class org.apache.phoenix.spark --jars /usr/hdp/current/phoenix-server/phoenix-4.7.0.2.5.0.0-1245-client.jar http://repo.hortonworks.com/content/groups/public/ --files /etc/spark2/conf/hbase-site.xml phoenix_test.py
Error message:

Traceback (most recent call last):
File "/root/hdp/process_data.py", line 42, in <module>
.format(data_source_format)\
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 593, in save
File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/java_gateway.py", line 1160, in __call__
answer, self.gateway_client, self.target_id, self.name)
File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/usr/lib/python2.6/site-packages/py4j-0.10.6-py2.6.egg/py4j/protocol.py", line 320, in get_return_value
format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o55.save.
: java.lang.UnsupportedOperationException: empty.tail

Thanks in advance
Labels:
- Apache HBase
- Apache Phoenix