Member since
07-18-2017
15
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2061 | 08-05-2018 06:56 AM |
08-05-2018
06:56 AM
Solved it. I noticed that the counts written to PostgreSQL were accurate when I read the Parquet data with the second option below.
parquet("/user-data/xyz/input/TABLE/*") // WRONG numbers in PostgreSQL
parquet("/user-data/xyz/input/TABLE/evnt_month=*") // Correct numbers in PostgreSQL
If anyone is aware of this problem, please comment.
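One way to see why the two globs disagree is to compare the files each read actually picks up. Below is a minimal sketch, assuming a Spark 2.x shell where `sqlContext` is available. `inputFiles` is a standard Dataset method that returns a best-effort list of the underlying files. The object name `GlobDiff` and its helpers are made up for illustration. This only lists the difference, it does not prove the cause.

```scala
import org.apache.spark.sql.SQLContext

// Hypothetical helper, not part of Spark.
object GlobDiff {
  // Files that appear in the broad listing but not in the narrow one.
  def extraFiles(broad: Seq[String], narrow: Seq[String]): Seq[String] = {
    val keep = narrow.toSet
    broad.filterNot(keep.contains).sorted
  }

  // Reads both globs, prints the row counts, and returns the extra files
  // that only the broad glob picked up.
  def report(sqlContext: SQLContext, broadPath: String, narrowPath: String): Seq[String] = {
    val broad  = sqlContext.read.parquet(broadPath)
    val narrow = sqlContext.read.parquet(narrowPath)
    println(s"broad count  = ${broad.count()}")
    println(s"narrow count = ${narrow.count()}")
    extraFiles(broad.inputFiles.toSeq, narrow.inputFiles.toSeq)
  }
}
```

With the paths from this thread it could be called as `GlobDiff.report(sqlContext, "/user-data/xyz/input/TABLE/*", "/user-data/xyz/input/TABLE/evnt_month=*").foreach(println)`. Any path printed is a file that only the `TABLE/*` read includes.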
07-11-2018
10:11 AM
@Felix Albani The table has millions of records, so it is very difficult to identify the missing or extra rows in PostgreSQL. Is there any known issue in Spark where the row count written to PostgreSQL does not match the DataFrame count?
07-10-2018
06:56 AM
When I write a DataFrame to PostgreSQL using Spark Scala, the row count in PostgreSQL is always higher than the count I get in Spark. The count in the Spark DataFrame is correct and as expected. I have even tried loading the data in monthly parts, but the count in PostgreSQL is still higher than in the Spark DataFrame.
val df = sqlContext.read.option("compression","snappy").parquet("/user-data/xyz/input/TABLE/")
val connection="jdbc:postgresql://localhost:5449/adb?user=aschema&password=abc"
val prop = new java.util.Properties
prop.setProperty("driver", "org.postgresql.Driver")
df.write.mode("Overwrite").jdbc(url= connection, table = "adb.aschema.TABLE", connectionProperties = prop)
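To measure the mismatch from the Spark side, the table can be read back over the same JDBC connection right after the write and the two counts compared. This is only a sketch: `JdbcCountCheck`, `Counts` and `writeAndCompare` are names invented here, while `DataFrameWriter.jdbc` and `DataFrameReader.jdbc` are the standard Spark calls. Note that `count()` and the write are separate actions, so each one re-reads the input path.

```scala
import java.util.Properties
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of Spark.
object JdbcCountCheck {
  // Row counts on the Spark side (source) and in the database (target).
  final case class Counts(source: Long, target: Long) {
    def delta: Long = target - source
    def matches: Boolean = delta == 0L
  }

  // Counts the DataFrame, overwrites the JDBC table with it, then reads the
  // table back through the same connection and counts it again.
  def writeAndCompare(df: DataFrame, url: String, table: String, props: Properties): Counts = {
    val source = df.count()
    df.write.mode("overwrite").jdbc(url, table, props)
    val target = df.sqlContext.read.jdbc(url, table, props).count()
    Counts(source, target)
  }
}
```

For the code in this post it would be called as `JdbcCountCheck.writeAndCompare(df, connection, "adb.aschema.TABLE", prop)`. A positive `delta` means PostgreSQL holds more rows than the DataFrame reported.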
Labels: Apache Spark
11-15-2017
05:31 AM
@Matt Burgess, this issue was resolved by downloading the HDF version of NiFi 1.2.0. Thanks.
11-14-2017
05:43 PM
Hello, I am using Hive 2.3.0 with Spark 2.0.2. When I try to run Hive commands on Spark from the Hive console, the job gets stuck and I have to kill it manually. The following message appears in the Spark worker log. Could you please advise if I am doing something wrong?
INFO worker.Worker: Executor app-20171114093447-0000/0 finished with state KILLED exitStatus 143
Labels: Apache Hive, Apache Spark
11-10-2017
03:36 PM
I am using Apache Hive 2.3.0. Please recommend which HDP version I should go for. I think a manual setup won't work for me.
11-10-2017
03:03 PM
@Matt Burgess My NiFi Hive NAR file is nifi-hive-nar-1.4.0.nar. Could you please tell me which NAR version I should use to make it work?
11-09-2017
12:29 PM
2017-11-09T15:21:13,760 ERROR [pool-8-thread-29] metastore.RetryingHMSHandler: java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:601
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1000)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:872)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:6362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy21.lock(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14155)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14139)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-11-09T15:21:13,760 INFO [pool-8-thread-29] org.apache.hadoop.hive.metastore.HiveMetaStore - 15: Done cleaning up thread local RawStore
2017-11-09T15:21:13,760 ERROR [pool-8-thread-29] server.TThreadPoolServer: Error occurred during processing of message.
java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:601
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1000) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:872) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:6362) ~[hive-exec-2.3.0.jar:2.3.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_45]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_45]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_45]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) ~[hive-exec-2.3.0.jar:2.3.0]
at com.sun.proxy.$Proxy21.lock(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14155) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14139) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) ~[hive-exec-2.3.0.jar:2.3.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_45]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_45]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.0.jar:2.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
2017-11-09T15:21:13,760 INFO [pool-8-thread-29] HiveMetaStore.audit: ugi=dwusr ip=127.0.0.1 cmd=Cleaning up thread local RawStore...
2017-11-09T15:21:13,760 INFO [pool-8-thread-29] HiveMetaStore.audit: ugi=dwusr ip=127.0.0.1 cmd=Done cleaning up thread local RawStore
2017-11-09T15:21:14,783 ERROR [pool-8-thread-32] metastore.RetryingHMSHandler: java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:601
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1000)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:872)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:6362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy21.lock(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14155)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14139)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I am getting the above error when running the PutHiveStreaming processor in NiFi. Any idea what is wrong here? I am using Hive 2.3.0 and NiFi 1.4.0.
Labels: Apache Hive, Apache NiFi
11-06-2017
07:03 AM
I have two tables, as below.
CREATE EXTERNAL TABLE IF NOT EXISTS TEMP_tab(id int,mytime STRING,age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://xxx';
CREATE TABLE IF NOT EXISTS main_TAB(id int,age int)
PARTITIONED BY (mytime STRING)
STORED AS ORC
tblproperties ("orc.compress"="ZLIB");
FROM TEMP_TAB INSERT OVERWRITE TABLE main_TAB
PARTITION (mytime)
SELECT *,substr(mytime,0,10) as mytime;
But the strange thing is that the insert does not work. It fails with the following error message:
Error: org.apache.spark.sql.AnalysisException: Cannot insert into table m16 . main_TAB because the number of columns are different: need 2 columns, but query has 3 columns.; (state=,code=0)
I have already set these two properties as well:
SET hive.exec.dynamic.partition = true
SET hive.exec.dynamic.partition.mode = nonstrict
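For comparison, here is a sketch of the usual shape of a dynamic-partition insert: the SELECT names each non-partition column explicitly and puts the partition column last, instead of `SELECT *` plus an extra `mytime`. It assumes Spark 2.x with a Hive-enabled `SparkSession`. The object name `DynamicPartitionInsert` is made up, and `substr(mytime, 1, 10)` uses a 1-based start position. Whether this clears the exact error reported above on this setup is not confirmed.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper; the SQL text is the part that matters.
object DynamicPartitionInsert {
  // Two data columns (id, age) followed by the dynamic partition column (mytime).
  val insertSql: String =
    """INSERT OVERWRITE TABLE main_TAB PARTITION (mytime)
      |SELECT id, age, substr(mytime, 1, 10) AS mytime
      |FROM TEMP_tab""".stripMargin

  // Applies the two dynamic-partition settings from the post, then runs the insert.
  def run(spark: SparkSession): Unit = {
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql(insertSql)
  }
}
```

The statement in `insertSql` can also be pasted as-is into the SQL console instead of going through Scala.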
Labels: Apache Hive, Apache Spark