06-08-2022
12:42 AM
First, we created a table on a JSON dataset from the Hive CLI using the query below:

CREATE EXTERNAL TABLE json10(
fruit string,
size string,
color string
)
ROW FORMAT SERDE
'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3a://json2/'
;

We are able to run SELECT queries on this table from the Hive CLI, but the same queries fail from a PySpark script.

PySpark script:

from pyspark.context import SparkContext
sc=SparkContext()
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
json=hive_context.table("default.json")
hive_context.sql("select * from json").show() ERROR MESSAGE 22/06/07 15:24:38 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, b7-36.lab.archivas.com, executor 1): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2436)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2321)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2321)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
at scala.collection.immutable.List$SerializationProxy.readObject(List.scala:490)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2321)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2430)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2354)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2212)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1668)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:502)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:460)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:413)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:419)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

We also tried adding the Hive HCatalog jar in the PySpark script and ran into the error below.

Python script:

from pyspark.context import SparkContext
sc=SparkContext()
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
json=hive_context.table("default.json")
hive_context.sql("ADD JAR /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar")
hive_context.sql("select * from json").show() ERROR MESSAGE: 22/06/07 15:15:30 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on b7-38.lab.archivas.com:46455 in memory (size: 6.0 KB, free: 366.3 MB)
22/06/07 15:15:30 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, b7-38.lab.archivas.com, executor 1): org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:182)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:487)
at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:486)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:645)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:265)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:257)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:413)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1334)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:419)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Start token not found where expected
at org.apache.hive.hcatalog.data.JsonSerDe.deserialize(JsonSerDe.java:170)
... 25 more

Can anyone suggest additional parameters or configuration needed to make JSON tables created in Hive work from a PySpark script? Please note that CSV and Parquet datasets work fine.
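For reference, the next variant we plan to test is sketched below; this is an assumption on our side, not a confirmed fix. It puts the HCatalog SerDe jar on the classpath at submit time (instead of ADD JAR at runtime) and builds the session with Hive support; the script name and the SparkSession-based approach are our own choices. Also worth noting for anyone reading: the HCatalog JsonSerDe typically expects one JSON record per line, so pretty-printed or multi-line JSON files can produce exactly the "Start token not found where expected" error above.

# Sketch (assumption, not a validated fix). Submit with the SerDe jar so that both the
# driver and the executors have it from the start:
#   spark-submit --jars /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar read_json_table.py
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-json-table-check")
    # same effect as --jars; path taken from the ADD JAR attempt above
    .config("spark.jars", "/opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar")
    .enableHiveSupport()
    .getOrCreate()
)

# table name taken from the DDL above; adjust if the table is registered under another name
spark.sql("SELECT * FROM default.json10").show()

# cross-check that the files themselves are readable without the Hive SerDe;
# spark.read.json also expects one JSON object per line by default
spark.read.json("s3a://json2/").show()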
Labels:
- Apache HCatalog
- Apache Spark
06-08-2022
12:26 AM
Resources we have in place: 7 nodes, each with 250 GB memory and 32 vCPUs.

Configuration specified in spark-defaults.conf:

spark.executor.memory = 100g
spark.executor.memoryOverhead = 49g
spark.driver.memoryOverhead = 200g
spark.driver.memory = 500g

Query we tried to execute:

hive_context.sql("select * from 5mcsv CROSS JOIN 2mcsv").show(8000000)

We hit the issue below when fetching 8 million rows with this query; fetching 7 million rows works fine.

Traceback (most recent call last):
File "/root/hivespark.py", line 29, in <module>
hive_context.sql("select * from 5mcsv CROSS JOIN 2mcsv").show(8000000)
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 381, in show
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o71.showString.
: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:141)
at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:364)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at scala.collection.TraversableOnce$class.addString(TraversableOnce.scala:357)
at scala.collection.AbstractTraversable.addString(Traversable.scala:104)
at org.apache.spark.sql.Dataset$$anonfun$showString$2.apply(Dataset.scala:330)
at org.apache.spark.sql.Dataset$$anonfun$showString$2.apply(Dataset.scala:330)
at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:74)
at org.apache.spark.sql.Dataset.showString(Dataset.scala:330)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:750)

We got the error below when we tried to fetch 9 million rows:

22/06/08 02:44:04 WARN hdfs.DataStreamer: Exception for BP-1037869773-172.18.105.90-1650524469800:blk_1073833296_92560
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:552)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1101)
22/06/08 02:44:04 WARN hdfs.DataStreamer: Error Recovery for BP-1037869773-172.18.105.90-1650524469800:blk_1073833296_92560 in pipeline [DatanodeInfoWithStorage[172.18.105.88:9866,DS-df61a542-f662-46db-9fc6-4c0b325e2e68,DISK], DatanodeInfoWithStorage[172.18.105.83:9866,DS-b781a5d9-5114-4807-9c91-0170578a8bb6,DISK], DatanodeInfoWithStorage[172.18.105.56:9866,DS-12f0be58-8862-4606-9b0c-0b3d6f77ce42,DISK]]: datanode 0(DatanodeInfoWithStorage[172.18.105.88:9866,DS-df61a542-f662-46db-9fc6-4c0b325e2e68,DISK]) is bad.
22/06/08 02:45:12 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on b7-38.lab.archivas.com:44696 in memory (size: 39.6 KB, free: 266.5 GB)
22/06/08 02:45:12 INFO spark.ContextCleaner: Cleaned accumulator 1
22/06/08 02:45:12 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on b7-38.lab.archivas.com:44696 in memory (size: 39.6 KB, free: 266.5 GB)
22/06/08 02:45:12 INFO spark.ContextCleaner: Cleaned accumulator 2
22/06/08 02:45:12 INFO spark.ContextCleaner: Cleaned accumulator 4
22/06/08 02:45:12 INFO spark.ContextCleaner: Cleaned accumulator 3
22/06/08 02:50:55 WARN hdfs.DataStreamer: Exception for BP-1037869773-172.18.105.90-1650524469800:blk_1073833296_92573
java.io.EOFException: Unexpected EOF while trying to read response from server
at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:552)
at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1101)
22/06/08 02:50:55 WARN hdfs.DataStreamer: Error Recovery for BP-1037869773-172.18.105.90-1650524469800:blk_1073833296_92573 in pipeline [DatanodeInfoWithStorage[172.18.105.83:9866,DS-b781a5d9-5114-4807-9c91-0170578a8bb6,DISK], DatanodeInfoWithStorage[172.18.105.56:9866,DS-12f0be58-8862-4606-9b0c-0b3d6f77ce42,DISK], DatanodeInfoWithStorage[172.18.105.84:9866,DS-aaf0af68-8eaf-4bba-99f1-5d641ddfe726,DISK]]: datanode 0(DatanodeInfoWithStorage[172.18.105.83:9866,DS-b781a5d9-5114-4807-9c91-0170578a8bb6,DISK]) is bad.
Traceback (most recent call last):
File "/root/hivespark.py", line 29, in <module>
hive_context.sql("select * from 5mcsv CROSS JOIN 2mcsv").show(9000000)
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 381, in show
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
File "/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o71.showString.
: java.lang.OutOfMemoryError: Requested array size exceeds VM limit
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:141)
at scala.collection.mutable.StringBuilder.append(StringBuilder.scala:200)
at scala.collection.TraversableOnce$$anonfun$addString$1.apply(TraversableOnce.scala:364)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)

Can anyone suggest the right memory configuration for the resources listed above, or any additional parameters that need to be set?
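For context, the workaround we are considering is sketched below (an assumption on our side: the goal is to materialise the full join result, not to print 8 million rows to the console). The trace shows the failure inside Dataset.showString, which formats every requested row into a single in-memory string on the driver, so show(8000000) is bounded by JVM array limits regardless of the memory settings above.

# Sketch (assumption): write the join result out instead of rendering it with show().
result = hive_context.sql("select * from 5mcsv CROSS JOIN 2mcsv")

# Writing is distributed across the executors and never builds one giant string on the driver.
# The output path below is a placeholder.
result.write.mode("overwrite").csv("/tmp/crossjoin_output")

# If only a quick look is needed on screen, keep the row count small:
result.show(20, truncate=False)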
Labels:
- Apache Hive
- Apache Spark
05-11-2022
12:01 AM
Hi @aakulov, we are using an on-prem (bare-metal) cluster with Cloudera Manager 7.6.1 and Cloudera Runtime 7.1.7 (Parcels). We configured the AWS credentials exactly as described in the links you shared, but we still get "unable to load AWS credentials" whenever the s3a URL includes a directory (e.g. "s3a://test/directory").
05-09-2022
03:03 AM
[root@b7-40 ~]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
WARNING: Use "yarn jar" to launch YARN applications.
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/log4j-slf4j-impl-2.17.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/CDH-7.1.7-1.cdh7.1.7.p1000.24102687/jars/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://b7-33.lab.archivas.com:2181,b7-40.lab.archivas.com:2181,b7-6.lab.archivas.com:2181/default;password=root;serviceDiscoveryMode=zooKeeper;ssl=true;user=root;zooKeeperNamespace=hiveserver2
22/04/29 03:00:25 [main]: WARN jdbc.HiveConnection: Failed to connect to b7-40.lab.archivas.com:10000
22/04/29 03:00:25 [main]: ERROR jdbc.Utils: Unable to read HiveServer2 configs from ZooKeeper
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport for any of the Server URI's in ZooKeeper: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target (state=08S01,code=0)
Beeline version 3.1.3000.7.1.7.1000-141 by Apache Hive
beeline> !connect jdbc:hive2://b7-40.lab.archivas.com:10000
Connecting to jdbc:hive2://b7-40.lab.archivas.com:10000
Enter username for jdbc:hive2://b7-40.lab.archivas.com:10000: root
Enter password for jdbc:hive2://b7-40.lab.archivas.com:10000: ********
22/04/29 03:01:22 [main]: WARN jdbc.HiveConnection: Failed to connect to b7-40.lab.archivas.com:10000
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://b7-40.lab.archivas.com:10000: Invalid status 21 (state=08S01,code=0)
beeline> !connect jdbc:hive2://b7-33.lab.archivas.com:10000
Connecting to jdbc:hive2://b7-33.lab.archivas.com:10000
Enter username for jdbc:hive2://b7-33.lab.archivas.com:10000: root
Enter password for jdbc:hive2://b7-33.lab.archivas.com:10000: ********
22/04/29 03:01:44 [main]: WARN jdbc.HiveConnection: Failed to connect to b7-33.lab.archivas.com:10000
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://b7-33.lab.archivas.com:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
beeline> !connect jdbc:hive2://b7-40.lab.archivas.com:10002
Connecting to jdbc:hive2://b7-40.lab.archivas.com:10002
Enter username for jdbc:hive2://b7-40.lab.archivas.com:10002: root
Enter password for jdbc:hive2://b7-40.lab.archivas.com:10002: ********
22/04/29 03:02:29 [main]: WARN jdbc.HiveConnection: Failed to connect to b7-40.lab.archivas.com:10002
Unknown HS2 problem when communicating with Thrift server.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://b7-40.lab.archivas.com:10002: Invalid status 21 (state=08S01,code=0)
Labels:
- Apache Hive
05-09-2022
02:46 AM
We specified the environment variables as below:

export AWS_ACCESS_KEY_ID=xMK6bdX8iY**************************************
export AWS_SECRET_KEY=34***************************************

After connecting to a Hive session, we specified the s3a credentials:

set fs.s3a.endpoint=cluster.domain.*;
set fs.s3a.access.key=$$$$$$$$$$$$$$###;
set fs.s3a.secret.key=####$$$$;

We then tried to create a table with the query below, pointing at a directory inside the S3 bucket (s3a://test/dir2/), and received the error shown below even though the S3 credentials were already specified as described above:

0: jdbc:hive2://> CREATE EXTERNAL TABLE s3dir (
. . . . . . . . > col1 int,
. . . . . . . . > col2 string,
. . . . . . . . > col3 string,
. . . . . . . . > col4 string
. . . . . . . . > )
. . . . . . . . > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
. . . . . . . . > LOCATION 's3a://test/dir2/'
. . . . . . . . > TBLPROPERTIES (
. . . . . . . . > "s3select.format" = "csv"
. . . . . . . . > );
22/05/03 03:06:32 [2199007f-0721-4e46-89b6-40cef824235c main]: WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
22/05/03 03:06:36 [HiveServer2-Background-Pool: Thread-71]: ERROR exec.Task: Failed
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)))
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1170) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1175) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.createTableNonReplaceMode(CreateTableOperation.java:140) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation.execute(CreateTableOperation.java:98) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.ddl.DDLTask.execute(DDLTask.java:82) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:357) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:330) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:109) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:749) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:504) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:498) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:166) [hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:226) [hive-service-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:88) [hive-service-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:327) [hive-service-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_322]
at javax.security.auth.Subject.doAs(Subject.java:422) [?:1.8.0_322]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898) [hadoop-common-3.1.1.7.1.7.1000-141.jar:?]
at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:345) [hive-service-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_322]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_322]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63918) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result$create_table_req_resultStandardScheme.read(ThriftHiveMetastore.java:63886) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_req_result.read(ThriftHiveMetastore.java:63812) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_req(ThriftHiveMetastore.java:1796) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_req(ThriftHiveMetastore.java:1783) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:3622) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.create_table_with_environment_context(SessionHiveMetaStoreClient.java:145) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1082) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:1067) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_322]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_322]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_322]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_322]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at com.sun.proxy.$Proxy35.createTable(Unknown Source) ~[?:?]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_322]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_322]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_322]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_322]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3515) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
at com.sun.proxy.$Proxy35.createTable(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:1159) ~[hive-exec-3.1.3000.7.1.7.1000-141.jar:3.1.3000.7.1.7.1000-141]
... 28 more
22/05/03 03:06:36 [HiveServer2-Background-Pool: Thread-71]: ERROR exec.Task: DDLTask failed, DDL Operation: class org.apache.hadoop.hive.ql.ddl.table.create.CreateTableOperation
org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)))
ERROR : FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY)))
Error: Error while compiling statement: FAILED: Execution Error, return code 40000 from org.apache.hadoop.hive.ql.ddl.DDLTask. MetaException(message:Got exception: java.nio.file.AccessDeniedException s3a://test/dir2: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : com.amazonaws.SdkClientException: Unable to load AWS credentials from environment variables (AWS_ACCESS_KEY_ID (or AWS_ACCESS_KEY) and AWS_SECRET_KEY (or AWS_SECRET_ACCESS_KEY))) (state=08S01,code=40000)

However, the same DDL works when the .csv file sits directly in the S3 bucket rather than inside a directory ('s3a://test/'):

0: jdbc:hive2://> CREATE EXTERNAL TABLE s3notdir (
. . . . . . . . > col1 int,
. . . . . . . . > col2 string,
. . . . . . . . > col3 string,
. . . . . . . . > col4 string
. . . . . . . . > )
. . . . . . . . > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
. . . . . . . . > LOCATION 's3a://test/'
. . . . . . . . > TBLPROPERTIES (
. . . . . . . . > "s3select.format" = "csv"
. . . . . . . . > );
OK
No rows affected (2.223 seconds)
0: jdbc:hive2://>
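For completeness, below is how we can cross-check the same endpoint and keys from a Spark session. This is only a sketch with placeholder values: it verifies that the credentials and endpoint work for S3A reads, and it does not change what HiveServer2 or the metastore sees when validating the table location.

# Sketch with placeholder values; fs.s3a.* settings are passed to Hadoop via the
# spark.hadoop.* prefix. path.style.access is an assumption for a non-AWS S3 endpoint.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("s3a-credential-check")
    .config("spark.hadoop.fs.s3a.endpoint", "cluster.domain.example")
    .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY_PLACEHOLDER")
    .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY_PLACEHOLDER")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# If this read works, the keys and endpoint are fine and the remaining problem is how
# Hive (HiveServer2 / metastore) picks up fs.s3a.* for the directory path.
spark.read.csv("s3a://test/dir2/").show(5)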
Labels:
- Apache Hive