Member since: 04-29-2016
Posts: 192
Kudos Received: 20
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1670 | 07-14-2017 05:01 PM |
 | 2829 | 06-28-2017 05:20 PM |
09-25-2019
10:23 AM
The question I posted is not hypothetical; it is a real use case. FYI, here is another thread about partial file consumption: https://stackoverflow.com/questions/45379729/nifi-how-to-avoid-copying-file-that-are-partially-written That thread does not suggest that the OS takes care of this automatically. The solution proposed there is to add a wait between ListFile and FetchFile, but in our case the requirement is to wait for an indicator file before we start file ingestion.
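To make the time-wait idea from that thread concrete, here is a minimal sketch in plain Python, outside NiFi, that only treats a file as safe to pick up once it has not been modified for a given number of seconds. The function name `files_older_than` and its parameters are hypothetical, and an age threshold is only a heuristic: a slow or stalled writer can still leave a partial file that looks old enough.

```python
import time
from pathlib import Path


def files_older_than(directory, min_age_seconds, now=None):
    """Return regular files whose last modification is at least min_age_seconds old.

    Hypothetical helper for illustration; `now` can be injected for testing.
    """
    now = time.time() if now is None else now
    stable = []
    for path in sorted(Path(directory).iterdir()):
        # A file that was modified recently may still be in the middle of a write.
        if path.is_file() and now - path.stat().st_mtime >= min_age_seconds:
            stable.append(path)
    return stable
```

This is why an age-based wait does not meet our requirement: it guesses that the writer has finished, whereas an indicator file is an explicit signal from the source system.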
09-25-2019
09:21 AM
Hello all, We're using Apache NiFi 1.0.1 (I know we're well behind on upgrading). Our use case is to pick up files from a local mount on the NiFi server and write them to HDFS; we're using ListFile and FetchFile to do this. Some files are huge, so the concern is that NiFi might start fetching a file before it has been completely written to the mount, which would result in partial files in HDFS. The proposed solution is for the source system to send us an indicator file with a specific name (located in a different directory); once we receive that file, we should start fetching the files with the FetchFile processor. So the question is: how do we build the NiFi dataflow so that FetchFile only starts after the indicator file is received? Do you have any suggestions on how to achieve this? Thanks in advance.
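To spell out the behavior being asked for, here is a minimal sketch of the gating logic in plain Python. It is not a NiFi flow definition, and the function name, directory arguments, and indicator file name are all hypothetical; it only illustrates "return nothing to fetch until the indicator file exists".

```python
from pathlib import Path


def files_ready_for_fetch(data_dir, indicator_dir, indicator_name):
    """Return the data files to fetch, or an empty list while the indicator is absent.

    Hypothetical helper: the indicator lives in a different directory from the data.
    """
    indicator = Path(indicator_dir) / indicator_name
    if not indicator.is_file():
        # The source system has not signalled completion yet, so fetch nothing.
        return []
    # Indicator received: every regular file in the data directory is eligible.
    return sorted(p for p in Path(data_dir).iterdir() if p.is_file())


if __name__ == "__main__":
    # Example call with made-up paths; replace them with real directories.
    print(files_ready_for_fetch(".", ".", "READY.flag"))
```

The dataflow we are after would need the equivalent of this check in front of FetchFile.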
Labels: Apache NiFi
03-14-2018
02:10 AM
@Pranay Vyas The Hive Export/Import worked well for us. Thanks.
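For anyone landing on this later, here is a rough sketch of what the Export/Import approach looks like, written as a small Python helper that only generates the HiveQL text. The database name, table names, and staging path are made up for illustration and are not the values we used. Copying the exported directories from one cluster to the other (for example with `hadoop distcp`) is a separate step that is not shown.

```python
def export_import_statements(database, tables, staging_dir):
    """Build HiveQL EXPORT statements for the source cluster and IMPORT
    statements for the target cluster. All names passed in are illustrative."""
    base = staging_dir.rstrip("/")
    export_sql = [f"USE {database};"]
    # The database has to exist on the target before tables are imported into it.
    import_sql = [f"CREATE DATABASE IF NOT EXISTS {database};", f"USE {database};"]
    for table in tables:
        path = f"{base}/{database}/{table}"
        export_sql.append(f"EXPORT TABLE {table} TO '{path}';")
        import_sql.append(f"IMPORT TABLE {table} FROM '{path}';")
    return {"export": export_sql, "import": import_sql}


if __name__ == "__main__":
    stmts = export_import_statements("sales", ["orders", "customers"], "/tmp/hive_export")
    print("\n".join(stmts["export"]))
    print("\n".join(stmts["import"]))
```

Because EXPORT writes the table metadata alongside the data, this route does not depend on the metastore database being the same type on both clusters.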
02-21-2018
04:00 PM
Hello, We have the following Hive migration scenario, which involves several changes between environments. We need to migrate Hive data from Source to Target:

Source | Target |
---|---|
Cluster A | Cluster B |
HDP 2.5.3 | HDP 2.6.2 |
Hive metastore DB - MySQL | Hive metastore DB - Oracle |
Has 7 databases to migrate | No existing data to preserve |

Both clusters are on the same network, and both have HDP running. What's the most efficient way to migrate the existing Hive data to the new cluster? Thanks.
Labels: Apache Hive
10-23-2017
04:35 PM
@mkalyanpur How would this differ for a Kerberized HDP environment? I'm having a lot of trouble connecting to Kerberized HDP 2.5 and 2.6 from NiFi 1.2.0; neither PutHiveStreaming nor PutHiveQL works. For PutHiveQL, the details of the error I get are here: https://community.hortonworks.com/questions/142110/nifi-processor-puthiveql-cannot-connect-to-kerberi.html
For PutHiveStreaming, I get the error below:
2017-10-23 11:32:23,841 INFO [put-hive-streaming-0] hive.metastore Trying to connect to metastore with URI thrift://server.domain.com:9083
2017-10-23 11:32:23,856 INFO [put-hive-streaming-0] hive.metastore Connected to metastore.
2017-10-23 11:32:23,885 INFO [Timer-Driven Process Thread-7] hive.metastore Trying to connect to metastore with URI thrift://server.domain.com:9083
2017-10-23 11:32:23,895 INFO [Timer-Driven Process Thread-7] hive.metastore Connected to metastore.
2017-10-23 11:32:24,730 WARN [put-hive-streaming-0] o.a.h.h.m.RetryingMetaStoreClient MetaStoreClient lost connection. Attempting to reconnect.
org.apache.thrift.TApplicationException: Internal error processing open_txns
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_open_txns(ThriftHiveMetastore.java:3834)
at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.open_txns(ThriftHiveMetastore.java:3821)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openTxns(HiveMetaStoreClient.java:1841)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:152)
at com.sun.proxy.$Proxy231.openTxns(Unknown Source)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl$1.run(HiveEndPoint.java:525)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.openTxnImpl(HiveEndPoint.java:522)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:504)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:461)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:345)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.access$500(HiveEndPoint.java:243)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl$2.run(HiveEndPoint.java:332)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl$2.run(HiveEndPoint.java:329)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:328)
at org.apache.nifi.util.hive.HiveWriter.lambda$nextTxnBatch$2(HiveWriter.java:259)
at org.apache.nifi.util.hive.HiveWriter.lambda$null$3(HiveWriter.java:368)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at org.apache.nifi.util.hive.HiveWriter.lambda$callWithTimeout$4(HiveWriter.java:368)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
The strange thing is that, using the same core-site, hdfs-site, and hive-site config files and the same principal and keytab, NiFi can connect to HDFS and HBase without any issues; only the Hive connection fails. Even a sample Java program that connects to Hive using the Kerberos principal and keytab works fine. Thanks for your time.
10-20-2017
02:54 PM
@Andrew Lim thanks for clarifying further.
10-20-2017
02:52 PM
@Abdelkrim Hadjidj Thanks for clarifying. It would have been nicer if controller services were accessible throughout the UI, regardless of where they were created.
10-20-2017
02:44 PM
Hello, In our NiFi instance (1.2.0), we're finding that controller services created from the Controller Settings menu (top right corner of the UI) are not visible or accessible when you look for them from a processor. For example, after a HiveConnectionPool controller service is created through the Controller Settings menu, it does not show up in the values of PutHiveQL's "Hive Database Connection Pooling Service" drop-down. Likewise, when a new controller service is created by selecting the "Create new service..." option from the drop-down in the PutHiveQL processor, that controller service does not show up in the controller services listing (accessed through the Controller Settings menu). It seems to have something to do with user access permissions in the UI; if so, how can this be corrected? I'm not familiar with the user access permission settings; our admin handles that. Thanks.
Labels: Apache NiFi
10-16-2017
04:33 PM
Hello, My requirement is to overwrite the existing data in an HCatalog table (or delete it prior to the import) during a Sqoop import. It appears that the hive-overwrite and delete-target-dir arguments don't work for this purpose. Any suggestions on how to do this? Thanks.
Labels: Apache HCatalog, Apache Sqoop
08-09-2017
04:46 PM
@mel mendoza In my case, after splitting the files, I was doing further processing on the split files; but if your requirement is to store the split files, you could use PutFile or PutHDFS to write them to the local file system or HDFS.