03-24-2020
09:09 AM
I am using Spark 2.3.2 with PySpark to read from Hive on CDH-5.9.0-1.cdh5.9.0.p0.23. Here is my code:
import findspark
findspark.init('C:\spark-2.3.2-bin-hadoop2.7')

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("hive.metastore.uris", "thrift://172.30.294.196:9083") \
    .enableHiveSupport() \
    .getOrCreate()

import pandas as pd
sc = spark.sparkContext

from pyspark import SparkContext
from pyspark.sql import SQLContext
sql_sc = SQLContext(sc)
SparkContext.setSystemProperty("hive.metastore.uris", "thrift://172.30.294.196:9083")
sql_sc.sql("SELECT * FROM mtsods.model_result_abt limit 10")
It shows:
DataFrame[feature: string, profile_id: string, model_id: string, value: string, score: string, rank: string, year_d: string, taxpayer: string, it_ref_no: string]
But when I use this:
df = sql_sc.sql("SELECT * FROM mtsods.model_result_abt")
df.show()
it shows the error below:
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
C:\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
     62     try:
---> 63         return f(*a, **kw)
     64     except py4j.protocol.Py4JJavaError as e:

C:\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    327         "An error occurred while calling {0}{1}{2}.\n".
--> 328         format(target_id, ".", name), value)
    329     else:

Py4JJavaError: An error occurred while calling o53.showString.
: java.lang.IllegalArgumentException: java.net.UnknownHostException: dclvmsbigdmd01
	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
	at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
	at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
	at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:258)
	at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:340)
	at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
	at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3278)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
	at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2489)
	at org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3259)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:77)
	at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3258)
	at org.apache.spark.sql.Dataset.head(Dataset.scala:2489)
	at org.apache.spark.sql.Dataset.take(Dataset.scala:2703)
	at org.apache.spark.sql.Dataset.showString(Dataset.scala:254)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: dclvmsbigdmd01
	... 67 more
During handling of the above exception, another exception occurred:
IllegalArgumentException                  Traceback (most recent call last)
<ipython-input-9-1a6ce2362cd4> in <module>()
----> 1 df.show()

C:\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\dataframe.py in show(self, n, truncate, vertical)
    348         """
    349         if isinstance(truncate, bool) and truncate:
--> 350             print(self._jdf.showString(n, 20, vertical))
    351         else:
    352             print(self._jdf.showString(n, int(truncate), vertical))

C:\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py in __call__(self, *args)
   1255         answer = self.gateway_client.send_command(command)
   1256         return_value = get_return_value(
-> 1257             answer, self.gateway_client, self.target_id, self.name)
   1258
   1259         for temp_arg in temp_args:

C:\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\utils.py in deco(*a, **kw)
     77             raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
     78         if s.startswith('java.lang.IllegalArgumentException: '):
---> 79             raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
     80         raise
     81     return deco
IllegalArgumentException: 'java.net.UnknownHostException: dclvmsbigdmd01'
Please help me resolve this issue. I would be really thankful.
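The UnknownHostException suggests the Windows client cannot resolve the cluster hostname (dclvmsbigdmd01) that the metastore hands back inside the table's HDFS location; the metadata query works because it only talks to the metastore, while df.show() has to reach the NameNode. As a quick sanity check (a minimal sketch; the hostname is taken from the error above), this verifies whether the name resolves locally before calling df.show():

```python
import socket

def can_resolve(host):
    """Return True if the local machine can resolve `host` to an IP address."""
    try:
        socket.gethostbyname(host)
        return True
    except socket.gaierror:
        return False

# 'dclvmsbigdmd01' is the NameNode hostname from the error above; if this
# prints False, the usual fix on Windows is a hosts-file entry mapping the
# name to the NameNode's IP (C:\Windows\System32\drivers\etc\hosts)
print(can_resolve("dclvmsbigdmd01"))
```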
Labels:
- Apache Hive
- Apache Spark
- Cloudera Manager
03-18-2020
01:48 AM
I changed the port from 8888 to 9083 and it is working fine, but when I try to show the query result it shows:

df.show()
Traceback (most recent call last):
  File "<ipython-input-12-1a6ce2362cd4>", line 1, in <module>
    df.show()
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\dataframe.py", line 350, in show
    print(self._jdf.showString(n, 20, vertical))
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
IllegalArgumentException: 'java.net.UnknownHostException: quickstart.cloudera'

Can you help me regarding this @HadoopHelp
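quickstart.cloudera is the hostname the Cloudera QuickStart VM advertises, so the Windows client must be able to resolve it. The usual fix is a hosts-file entry; a sketch (assuming the VM's address is the 10.1.1.70 used for the metastore elsewhere in this thread; replace it with the actual IP):

```text
# C:\Windows\System32\drivers\etc\hosts  (edit as Administrator)
10.1.1.70    quickstart.cloudera
```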
03-18-2020
12:11 AM
I am using Spark 2.3.2 and I am trying to read tables from a database. I established the Spark connection.
But I am unable to read the database tables from Cloudera Hue, and unable to query them in PySpark as well.
Here is my code:
import findspark
findspark.init('C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7')

import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .config("hive.metastore.uris", "thrift://10.1.1.70:8888") \
    .enableHiveSupport() \
    .getOrCreate()
# spark.catalog.listTables("tp_policy_operation")

import pandas as pd
sc = spark.sparkContext
sc

from pyspark import SparkContext
from pyspark.sql import SQLContext
sql_sc = SQLContext(sc)
SparkContext.setSystemProperty("hive.metastore.uris", "thrift://10.1.1.70:8888")
spark.sql("SELECT * FROM tp_policy_operation")
The error I am getting:
Traceback (most recent call last):
  File "<ipython-input-4-8f0aa5852b01>", line 16, in <module>
    spark.sql("SELECT * FROM tp_policy_operation")
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\session.py", line 710, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\lib\py4j-0.10.7-src.zip\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\spark-2.3.2-bin-hadoop2.7\spark-2.3.2-bin-hadoop2.7\python\pyspark\sql\utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.thrift.transport.TTransportException;'
Kindly help me resolve the issue or guide me on the changes needed in the code above.
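A TTransportException here is consistent with pointing hive.metastore.uris at the wrong service: 8888 is typically Hue's web UI port, while the Hive metastore thrift service listens on 9083 by default. A small reachability check (a sketch; the host is the one from the code above) can confirm which port actually accepts connections before building the session:

```python
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 9083 is the default Hive metastore thrift port; if this prints True while
# port 8888 does not behave like a thrift endpoint, use 9083 in
# hive.metastore.uris instead
print(port_open("10.1.1.70", 9083))
```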
Labels:
- Apache Spark
- Cloudera Hue