Member since: 07-05-2016
Posts: 42
Kudos Received: 32
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3637 | 02-13-2018 10:56 PM
 | 1600 | 08-25-2017 05:25 AM
 | 9613 | 03-01-2017 05:01 AM
 | 5099 | 12-14-2016 07:00 AM
 | 1225 | 12-13-2016 05:43 PM
12-14-2016 06:23 AM
The join happens before the aggregation. You're aggregating the result of the join, which has an inflated row count because of the duplicates.
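A common fix is to aggregate before joining, so duplicate join keys can't multiply the measures. A minimal sketch: only outgrn and out_id appear in this thread, so grn_items, qty, and total_qty are hypothetical names for illustration.

-- Compute the totals in a subquery first, then join them to the deduplicated keys.
SELECT o.out_id,
       g.total_qty
FROM (SELECT DISTINCT out_id FROM outgrn) o    -- deduplicate the key side
JOIN (SELECT out_id, SUM(qty) AS total_qty     -- aggregate before the join
      FROM grn_items
      GROUP BY out_id) g
  ON g.out_id = o.out_id;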
12-14-2016 05:15 AM
Are there duplicates in the outgrn.out_id column?
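For reference, a quick way to check; outgrn and out_id are the names from this thread:

SELECT out_id, COUNT(*) AS cnt
FROM outgrn
GROUP BY out_id
HAVING COUNT(*) > 1;

Any rows returned are keys that will fan out the join and inflate downstream aggregates.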
12-13-2016 05:43 PM
2 Kudos
HBase column families have a time-to-live (TTL) property which, by default, is set to FOREVER. If you wanted to delete the HBase cell values a week after being inserted, you could set the TTL to 604800 (which is the number of seconds in a week: 60 * 60 * 24 * 7).
Here's an example:
Create a table where the column family has a TTL of 10 seconds:

hbase(main):001:0> create 'test', {'NAME' => 'cf1', 'TTL' => 10}
0 row(s) in 2.5940 seconds

Put a record into that table:

hbase(main):002:0> put 'test', 'my-row-key', 'cf1:my-col', 'my-value'
0 row(s) in 0.1420 seconds

If we scan the table right away, we can see the record:

hbase(main):003:0> scan 'test'
ROW                COLUMN+CELL
 my-row-key        column=cf1:my-col, timestamp=1481650256841, value=my-value
1 row(s) in 0.0260 seconds

Ten seconds later, the record has disappeared:

hbase(main):004:0> scan 'test'
ROW                COLUMN+CELL
0 row(s) in 0.0130 seconds

So, perhaps you could use TTL to manage your data retention.
11-04-2016 03:26 AM
4 Kudos
The `SYSTEM.CATALOG` table contains a column called `KEY_SEQ`. If it contains an integer, the column is part of the primary key. E.g.:

SELECT column_name
FROM system.catalog
WHERE table_name = '{{ your table name }}'
AND key_seq IS NOT NULL;
10-04-2016 12:58 PM
Thanks @Matt Burgess. That worked perfectly. Very much appreciated. @mkalyanpur: I'm running HDP 2.5.0.0 and HDF 2.0.0.0.
10-04-2016 06:32 AM
4 Kudos
I'm trying to stream data to Hive with the PutHiveStreaming NiFi processor. I saw this post: https://community.hortonworks.com/questions/59411/how-to-use-puthivestreaming.html and can confirm that:

- org.apache.hadoop.hive.ql.lockmgr.DbTxnManager is the transaction manager
- ACID transactions are enabled
- Run Compactor is enabled
- There are threads available for the compactor

I created an ORC-backed Hive table:

CREATE TABLE `streaming_messages`(
`message` string,
etc...)
CLUSTERED BY (message) INTO 5 BUCKETS
STORED AS ORC
LOCATION
'hdfs://hadoop01.woolford.io:8020/apps/hive/warehouse/mydb.db/streaming_messages'
TBLPROPERTIES('transactional'='true');
I then created a `PutHiveStreaming` processor that takes Avro messages and writes them to the Hive table. I notice that the NiFi processor uses Thrift to send data to Hive. There are some errors in nifi-app.log and hivemetastore.log.

Stacktrace from nifi-app.log:

2016-10-03 23:40:25,348 ERROR [Timer-Driven Process Thread-8] o.a.n.processors.hive.PutHiveStreaming
org.apache.nifi.util.hive.HiveWriter$ConnectFailure: Failed connecting to EndPoint {metaStoreUri='thrift://10.0.1.12:9083', database='mydb', table='streaming_messages', partitionVals=[] }
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:80) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.util.hive.HiveUtils.makeHiveWriter(HiveUtils.java:45) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.processors.hive.PutHiveStreaming.makeHiveWriter(PutHiveStreaming.java:827) [nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.processors.hive.PutHiveStreaming.getOrCreateWriter(PutHiveStreaming.java:738) [nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.processors.hive.PutHiveStreaming.lambda$onTrigger$4(PutHiveStreaming.java:462) [nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1880) ~[na:na]
at org.apache.nifi.controller.repository.StandardProcessSession.read(StandardProcessSession.java:1851) ~[na:na]
at org.apache.nifi.processors.hive.PutHiveStreaming.onTrigger(PutHiveStreaming.java:389) [nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27) ~[nifi-api-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1064) ~[na:na]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:136) ~[na:na]
at org.apache.nifi.controller.tasks.ContinuallyRunProcessorTask.call(ContinuallyRunProcessorTask.java:47) ~[na:na]
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:132) ~[na:na]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_77]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_77]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_77]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_77]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_77]
Caused by: org.apache.nifi.util.hive.HiveWriter$TxnBatchFailure: Failed acquiring Transaction Batch from EndPoint: {metaStoreUri='thrift://10.0.1.12:9083', database='mydb', table='streaming_messages', partitionVals=[] }
at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:255) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
at org.apache.nifi.util.hive.HiveWriter.<init>(HiveWriter.java:74) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
... 19 common frames omitted
Caused by: org.apache.hive.hcatalog.streaming.TransactionError: Unable to acquire lock on {metaStoreUri='thrift://10.0.1.12:9083', database='mydb', table='streaming_messages', partitionVals=[] }
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:578) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransaction(HiveEndPoint.java:547) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
at org.apache.nifi.util.hive.HiveWriter.nextTxnBatch(HiveWriter.java:252) ~[nifi-hive-processors-1.0.0.2.0.0.0-579.jar:1.0.0.2.0.0.0-579]
... 20 common frames omitted
Caused by: org.apache.thrift.transport.TTransportException: null
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) ~[hive-exec-1.2.1.jar:1.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_lock(ThriftHiveMetastore.java:3906) ~[hive-metastore-1.2.1.jar:1.2.1]
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.lock(ThriftHiveMetastore.java:3893) ~[hive-metastore-1.2.1.jar:1.2.1]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:1863) ~[hive-metastore-1.2.1.jar:1.2.1]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_77]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_77]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_77]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_77]
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152) ~[hive-metastore-1.2.1.jar:1.2.1]
at com.sun.proxy.$Proxy148.lock(Unknown Source) ~[na:na]
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.beginNextTransactionImpl(HiveEndPoint.java:573) ~[hive-hcatalog-streaming-1.2.1.jar:1.2.1]
... 22 common frames omitted
Stacktrace from hivemetastore.log:

2016-10-03 23:40:24,322 ERROR [pool-5-thread-114]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(195)) - java.lang.IllegalStateException: Unexpected DataOperationType: UNSET agentInfo=Unknown txnid:98201
at org.apache.hadoop.hive.metastore.txn.TxnHandler.enqueueLockWithRetry(TxnHandler.java:938)
at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:814)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5751)
at sun.reflect.GeneratedMethodAccessor28.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:139)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:97)
at com.sun.proxy.$Proxy12.lock(Unknown Source)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:11860)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$lock.getResult(ThriftHiveMetastore.java:11844)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:118)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Do you have any suggestions to resolve this?
Labels:
- Apache NiFi
10-03-2016 04:46 PM
That worked perfectly, @Bryan Bende. Very much appreciated.
10-03-2016 03:34 PM
I have NiFi running on a single node. It's working well except for one thing: when I try to drill down into a data provenance event, an error is returned: "Error: The cluster node identifier must be specified"

Here's my nifiproperties.txt. The hostname specified in the `nifi.cluster.node.address` property appears to be valid.

Q) How can I set the cluster node identifier?
Labels:
- Apache NiFi
09-19-2016 09:56 PM
The command `ambari-server install-mpack --mpack=/path/to/mpack/tar.gz --purge` sounds fairly benign (i.e. you're installing a management pack), and yet the consequences are serious. In this case, the `--purge` flag means "delete all the service definitions for my existing HDP cluster".
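For comparison, the non-destructive form of the same command simply installs the management pack and leaves the existing service definitions alone:

ambari-server install-mpack --mpack=/path/to/mpack/tar.gz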
09-11-2016 05:01 AM
1 Kudo
Documentation for the HDP 2.4 to HDP 2.5 upgrade process can be found here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.0.1/bk_ambari-upgrade/content/ambari_upgrade_guide.html

Upgrade Ambari first, then HDP.