Member since: 07-18-2017
Posts: 15
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1179 | 08-05-2018 06:56 AM |
08-05-2018
06:56 AM
Solved it. I noticed that the counts written to PostgreSQL were accurate when I read the Parquet data with the second option below:

parquet("/user-data/xyz/input/TABLE/*")            // WRONG numbers in PostgreSQL
parquet("/user-data/xyz/input/TABLE/evnt_month=*") // Correct numbers in PostgreSQL

If someone is aware of such a problem, please comment.
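For anyone hitting the same thing, a minimal sketch comparing the two read patterns directly in Spark (the path is the placeholder from the post, and sqlContext is assumed to be in scope; the count comparison is an added suggestion, not part of the original post):

// Both reads target the same partitioned table. The root glob can pick up
// files outside the evnt_month=* partition directories, which could explain
// extra rows showing up downstream in PostgreSQL.
val dfRoot  = sqlContext.read.parquet("/user-data/xyz/input/TABLE/*")
val dfParts = sqlContext.read.parquet("/user-data/xyz/input/TABLE/evnt_month=*")
println(s"root glob count: ${dfRoot.count()}, partition glob count: ${dfParts.count()}")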
07-11-2018
10:11 AM
@Felix Albani The table has millions of records, so it's very difficult to identify the missing or extra rows in PostgreSQL. Is there any known issue in Spark that would cause the count in PostgreSQL not to match?
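One possible way to pin down the differing rows instead of comparing bare counts (a sketch, not a verified fix; it reuses the connection string, df, and prop from the original question below):

// Read the table back from PostgreSQL with the same JDBC settings used for the write.
val pgDf = sqlContext.read.jdbc(connection, "adb.aschema.TABLE", prop)

// except() keeps rows present in one DataFrame but absent from the other,
// narrowing a count mismatch down to concrete rows. Note that except()
// de-duplicates, so exact duplicates within one side will not show up here.
val extraInPg   = pgDf.except(df) // rows PostgreSQL has but Spark does not
val missingInPg = df.except(pgDf) // rows Spark has but PostgreSQL does not
println(s"extra in PG: ${extraInPg.count()}, missing in PG: ${missingInPg.count()}")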
07-10-2018
06:56 AM
When I write a DataFrame to PostgreSQL using Spark (Scala), I have noticed that the count in PostgreSQL is always higher than what I get in Spark. The count in the Spark DataFrame is correct and expected. I have even tried loading the data in monthly parts, but the count in PostgreSQL is still higher than in the Spark DataFrame.
// Read the snappy-compressed Parquet input.
val df = sqlContext.read.option("compression", "snappy").parquet("/user-data/xyz/input/TABLE/")

// JDBC connection string and driver for PostgreSQL.
val connection = "jdbc:postgresql://localhost:5449/adb?user=aschema&password=abc"
val prop = new java.util.Properties
prop.setProperty("driver", "org.postgresql.Driver")

// Overwrite the target table with the DataFrame contents.
df.write.mode("Overwrite").jdbc(url = connection, table = "adb.aschema.TABLE", connectionProperties = prop)
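A quick sanity check right after the write is to read the count back over the same connection (a sketch using the variables defined above):

// Compare the row count PostgreSQL actually holds with the source DataFrame.
val pgCount = sqlContext.read.jdbc(connection, "adb.aschema.TABLE", prop).count()
println(s"Spark count: ${df.count()}, PostgreSQL count: $pgCount")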
Labels:
- Apache Spark
11-15-2017
05:31 AM
@Matt Burgess, this issue was resolved by downloading the HDF version of NiFi 1.2.0. Thanks!
11-14-2017
05:43 PM
Hello, I am using Hive 2.3.0 with Spark 2.0.2. When I try to run Hive commands on Spark from the Hive console, the job gets stuck and I have to kill it manually. The following error message appears in the Spark worker log; could you please advise if I am doing something wrong?

INFO worker.Worker: Executor app-20171114093447-0000/0 finished with state KILLED exitStatus 143
Labels:
- Apache Hive
- Apache Spark
11-10-2017
03:36 PM
I am using Apache Hive 2.3.0. Please recommend which HDP version I should go for. I think a manual setup won't work for me.
11-10-2017
03:03 PM
@Matt Burgess My NiFi Hive NAR file is nifi-hive-nar-1.4.0.nar. Could you please tell me which NAR version I should use to make it work?
11-09-2017
12:29 PM
2017-11-09T15:21:13,760 ERROR [pool-8-thread-29] metastore.RetryingHMSHandler: java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:601
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1000)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:872)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:6362)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
at com.sun.proxy.$Proxy21.lock(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14155)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14139)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-11-09T15:21:13,760 INFO [pool-8-thread-29] org.apache.hadoop.hive.metastore.HiveMetaStore - 15: Done cleaning up thread local RawStore
2017-11-09T15:21:13,760 ERROR [pool-8-thread-29] server.TThreadPoolServer: Error occurred during processing of message.
java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:601
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:1000) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:872) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:6362) ~[hive-exec-2.3.0.jar:2.3.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_45]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_45]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_45]
at java.lang.reflect.Method.invoke(Method.java:497) ~[?:1.8.0_45]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) ~[hive-exec-2.3.0.jar:2.3.0]
at com.sun.proxy.$Proxy21.lock(Unknown Source) ~[?:?]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14155) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:14139) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) ~[hive-exec-2.3.0.jar:2.3.0]
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_45]
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_45]
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698) ~[hadoop-common-2.7.3.jar:?]
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118) ~[hive-exec-2.3.0.jar:2.3.0]
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286) ~[hive-exec-2.3.0.jar:2.3.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_45]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_45]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_45]
2017-11-09T15:21:13,760 INFO [pool-8-thread-29] HiveMetaStore.audit: ugi=dwusr ip=127.0.0.1 cmd=Cleaning up thread local RawStore...
2017-11-09T15:21:13,760 INFO [pool-8-thread-29] HiveMetaStore.audit: ugi=dwusr ip=127.0.0.1 cmd=Done cleaning up thread local RawStore
2017-11-09T15:21:14,783 ERROR [pool-8-thread-32] metastore.RetryingHMSHandler: java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:601
(stack trace identical to the first one above)

I am getting the above error when running the PutHiveStreaming processor in NiFi. Any idea what is wrong here? I am using Hive 2.3.0 and NiFi 1.4.0.
Labels:
- Apache Hive
- Apache NiFi
11-09-2017
05:18 AM
Hi, using the ConvertCSVToAvro processor I was able to successfully convert CSV to Avro. Now my requirement is to insert this Avro data into an existing Hive table using NiFi, but I am stuck here. How can I do this? The statement below is what I currently use to expose the Avro files as a Hive table:

CREATE EXTERNAL TABLE avroTEST
ROW FORMAT SERDE
'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/test/csvData/AVRO'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/newavro.schema');

Also, please let me know if there is any other optimal way of doing this.
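On the "other optimal way" question: one alternative is to load the Avro files with Spark and append them to the existing table. This is only a sketch, and it assumes the third-party spark-avro package (com.databricks:spark-avro) is on the classpath and that the target table already exists; it also bypasses NiFi for the final insert, so treat it as an option rather than the recommended path:

import org.apache.spark.sql.SparkSession

// A SparkSession with Hive support, so insertInto() can resolve the Hive table.
val spark = SparkSession.builder()
  .appName("avro-to-hive")
  .enableHiveSupport()
  .getOrCreate()

// Read the Avro files written by NiFi (path taken from the DDL above).
val avroDf = spark.read.format("com.databricks.spark.avro").load("/user/test/csvData/AVRO")

// Append into the existing table; column order must match the table schema.
avroDf.write.mode("append").insertInto("avroTEST")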
Labels:
- Apache Hive
- Apache NiFi
11-06-2017
07:03 AM
I have two tables, created as below:

CREATE EXTERNAL TABLE IF NOT EXISTS TEMP_tab(id int, mytime STRING, age int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION 'hdfs://xxx';
CREATE TABLE IF NOT EXISTS main_TAB(id int,age int)
PARTITIONED BY (mytime STRING)
STORED AS ORC
tblproperties ("orc.compress"="ZLIB");
FROM TEMP_TAB INSERT OVERWRITE TABLE main_TAB
PARTITION (mytime)
SELECT *,substr(mytime,0,10) as mytime;
The strange thing is that the INSERT does not work; it fails with the following error message:
Error: org.apache.spark.sql.AnalysisException: Cannot insert into table m16.main_TAB because the number of columns are different: need 2 columns, but query has 3 columns.; (state=,code=0)

I have already set these two properties as well:

SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
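One likely cause, offered as a guess: SELECT * already returns id, mytime, and age, so appending substr(mytime,0,10) yields one column too many for main_TAB (two data columns plus the mytime dynamic partition). A sketch of the insert with an explicit column list, run through Spark SQL since the AnalysisException above comes from Spark (assumes a Hive-enabled `spark` session is in scope):

// Dynamic-partition settings, as in the original post.
spark.sql("SET hive.exec.dynamic.partition = true")
spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

// Explicit projection: id and age fill the table columns, and the
// substr() result feeds the mytime dynamic partition column.
spark.sql("""
  INSERT OVERWRITE TABLE main_TAB PARTITION (mytime)
  SELECT id, age, substr(mytime, 0, 10) AS mytime
  FROM TEMP_tab
""")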
Labels:
- Apache Hive
- Apache Spark
09-28-2017
01:00 AM
Hi, I have a strange issue. I created a Sqoop2 job and ran it fine. The next day, I observed that the existing links and job were no longer available. Is this a bug in Sqoop2, or am I doing something wrong? The error message was:

Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException
Message: SERVER_0006:Entity requested doesn't exist - Job: oracle doesn't exist
Labels:
- Apache Sqoop
07-19-2017
12:42 AM
Is there anyone who has encountered this issue?
07-18-2017
05:09 AM
I am trying to run a job that imports from Oracle to HDFS, but I am getting the following errors:

java.lang.RuntimeException: Could not determine columns in Oracle Table.
at org.apache.sqoop.connector.jdbc.oracle.OracleJdbcCommonInitializer.getSchema(OracleJdbcCommonInitializer.java:140)
at org.apache.sqoop.connector.jdbc.oracle.OracleJdbcCommonInitializer.getSchema(OracleJdbcCommonInitializer.java:48)
at org.apache.sqoop.driver.JobManager.getSchemaForConnector(JobManager.java:529)
at org.apache.sqoop.driver.JobManager.createJobRequest(JobManager.java:426)
at org.apache.sqoop.driver.JobManager.start(JobManager.java:317)
at org.apache.sqoop.handler.JobRequestHandler.startJob(JobRequestHandler.java:353)
at org.apache.sqoop.handler.JobRequestHandler.handleEvent(JobRequestHandler.java:114)
at org.apache.sqoop.server.v1.JobServlet.handlePutRequest(JobServlet.java:84)
at org.apache.sqoop.server.SqoopProtocolServlet.doPut(SqoopProtocolServlet.java:81)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:710)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:808)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1669)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:594)
at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticationFilter.doFilter(DelegationTokenAuthenticationFilter.java:291)
at org.apache.hadoop.security.authentication.server.AuthenticationFilter.doFilter(AuthenticationFilter.java:553)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.sql.SQLSyntaxErrorException: ORA-00936: missing expression
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:447)
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:396)
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:951)
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:513)
at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:227)
at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:531)
at oracle.jdbc.driver.T4CStatement.doOall8(T4CStatement.java:195)
at oracle.jdbc.driver.T4CStatement.executeForDescribe(T4CStatement.java:876)
at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1175)
at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1296)
at oracle.jdbc.driver.OracleStatement.executeQuery(OracleStatement.java:1498)
at oracle.jdbc.driver.OracleStatementWrapper.executeQuery(OracleStatementWrapper.java:406)
at org.apache.sqoop.connector.jdbc.oracle.util.OracleQueries.getTableColumns(OracleQueries.java:710)
at org.apache.sqoop.connector.jdbc.oracle.util.OracleQueries.getFromTableColumns(OracleQueries.java:770)
at org.apache.sqoop.connector.jdbc.oracle.util.OracleQueries.getFromTableColumnNames(OracleQueries.java:644)
at org.apache.sqoop.connector.jdbc.oracle.OracleJdbcFromInitializer.getColumnNames(OracleJdbcFromInitializer.java:94)
at org.apache.sqoop.connector.jdbc.oracle.OracleJdbcFromInitializer.getColumnNames(OracleJdbcFromInitializer.java:31)
at org.apache.sqoop.connector.jdbc.oracle.OracleJdbcCommonInitializer.getSchema(OracleJdbcCommonInitializer.java:129)
... 29 more

Any idea what's going wrong here? I can't find clear documentation on this. Thanks.
Labels:
- Apache Sqoop