Member since: 04-27-2016
Posts: 218
Kudos Received: 133
Solutions: 25
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3609 | 08-31-2017 03:34 PM |
| | 7470 | 02-08-2017 03:17 AM |
| | 3295 | 01-24-2017 03:37 AM |
| | 10593 | 01-19-2017 03:57 AM |
| | 6024 | 01-17-2017 09:51 PM |
10-21-2016
02:21 PM
It seems I fixed this by using the ConvertCharacterSet processor. I will test more.
10-17-2016
02:22 PM
I didn't copy the flow.xml.gz but imported some templates, which must have updated the flow.xml. What's the fix to restart the NiFi instance?
09-19-2016
09:06 PM
1 Kudo
You can use the JMSConnectionFactoryProvider controller service to specify your vendor-specific details: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.jms.cf.JMSConnectionFactoryProvider/index.html
08-25-2016
08:16 PM
1 Kudo
@milind pandit That's a loaded question. First you have to define what the unique entity is. Once that's settled, you can use tools like Pig to parse through the data and produce a single record. This can also be done in Hive by using a GROUP BY on your natural key to return a single record from the source. Lastly, you can use tools like Informatica or Talend to do the same. A sketch of the Hive approach is below.
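To make the Hive option concrete, here is a minimal sketch. The table and column names (raw_events, events_deduped, event_id, payload, load_ts) are hypothetical placeholders, and it uses ROW_NUMBER() over the natural key rather than a plain GROUP BY so that the whole latest record is kept intact:

```bash
# Hypothetical table and column names; the window function keeps the most
# recent record per natural key (event_id) based on a load timestamp.
hive -e "
INSERT OVERWRITE TABLE events_deduped
SELECT event_id, payload, load_ts
FROM (
  SELECT event_id, payload, load_ts,
         ROW_NUMBER() OVER (PARTITION BY event_id ORDER BY load_ts DESC) AS rn
  FROM raw_events
) ranked
WHERE rn = 1;
"
```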
08-25-2016
08:05 PM
Another option would be to pre-convert XML to JSON.
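For example, a minimal pre-conversion sketch run outside NiFi before ingestion; the file names are placeholders and it relies on the third-party xmltodict package (pip install xmltodict):

```bash
python3 - <<'EOF'
# Convert one XML file to JSON before handing it to the flow.
import json
import xmltodict  # third-party package: pip install xmltodict

with open("input.xml", "rb") as src:      # placeholder input path
    doc = xmltodict.parse(src)

with open("output.json", "w") as dst:     # placeholder output path
    json.dump(doc, dst, indent=2)
EOF
```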
09-01-2016
10:13 PM
2 Kudos
We're going through this process now, migrating a non-trivial amount of data from an older cluster onto a new cluster and environment. We have a couple of requirements and constraints that limited some of the options:
- The datanodes on the two clusters don't have network connectivity. Each cluster resides in its own private firewalled network. (As an added complication, we also use the same hostnames in each of the two private environments.) distcp scaling requires the datanodes in the two clusters to be able to communicate directly.
- We have different security models in the two clusters. The old cluster uses simple authentication; the new cluster uses Kerberos. I've found that getting some of the tools to work with two different authentication models can be difficult.
- I want to preserve the file metadata from the old cluster on the new cluster - e.g. file create time, ownership, file system permissions. Some of the options can move the data from the source cluster, but they write 'new' files on the target cluster. The old cluster has been running for around two years, so there's a lot of useful information in those file timestamps.
- I need to perform a near-live migration. I have to keep the old cluster running in parallel while migrating data and users to the new cluster; I can't just cut access to the old cluster.
After trying a number of tools and combinations, including WebHDFS and Knox, we've settled on the following:
Export the old cluster via NFS gateways. We lock the NFS access controls down so that only the edge servers on the new cluster can mount the HDFS NFS volume. The edge servers in our target cluster are Airflow workers running as a grid, and we've created a source NFS gateway for each target Airflow worker, enabling a degree of scale-out - not as good as distcp scale-out, but better than a single pipe. A sketch of the gateway export and mount is below.
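A rough sketch of that setup; the hostnames, mount point, and allowed-hosts list are placeholders rather than our actual configuration:

```bash
# On the old cluster's NFS gateway hosts: restrict the HDFS NFS exports to the
# new cluster's edge servers (hdfs-site.xml), e.g.
#   <property>
#     <name>nfs.exports.allowed.hosts</name>
#     <value>edge1.newcluster.example rw;edge2.newcluster.example rw</value>
#   </property>

# On each new-cluster edge server (Airflow worker): mount its assigned gateway
# using the mount options from the HDFS NFS gateway documentation.
sudo mkdir -p /mnt/old_hdfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock,noacl,sync \
    nfsgw1.oldcluster.example:/ /mnt/old_hdfs
```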
Run good old-fashioned hdfs dfs -copyFromLocal -p <old_cluster_nfs_dir> <new_cluster_hdfs_dir>. This enables us to preserve the file timestamps as well as ownerships.
As part of managing the migration process, we're also making use of HDFS snapshots on both source and target to enable consistency management. Our migration jobs take snapshots at the beginning and end of each run and issue delta/difference reports to identify whether data was modified and possibly missed during the migration. I'm expecting that some of our larger data sets will take hours to complete - for the largest few, possibly more than 24 hours. To perform the snapshot management we also added some wrapper code; WebHDFS can be used to create and list snapshots, but it doesn't yet have an operation for returning a snapshot difference report. A sketch of the copy-plus-snapshot loop is below.
For the Hive metadata, the majority of our Hive DDL exists in git/source code control, and we're using this migration as an opportunity to enforce that for our production objects. For end-user objects, e.g. analysts' data labs, we're exporting the DDL on the old cluster and replaying it on the new cluster, with tweaks for any reserved-word collisions.
We don't have HBase operating on our old cluster, so I didn't have to come up with a solution for that problem.
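A minimal sketch of one migration job built around that copy plus the snapshot bookkeeping, assuming snapshots have already been allowed on the source path (hdfs dfsadmin -allowSnapshot); all paths and snapshot names are placeholders:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder paths: the dataset as seen through the NFS mount, the same path
# on the old cluster's HDFS, and the destination path on the new cluster.
SRC_NFS=/mnt/old_hdfs/data/some_dataset
SRC_HDFS=/data/some_dataset
DEST=/data/some_dataset
RUN=$(date +%Y%m%d%H%M%S)

# Snapshot the source before copying so changes made during the copy can be
# detected afterwards. In our setup these source-side calls actually go through
# the WebHDFS wrapper mentioned above; plain hdfs commands are shown here for
# readability, run with a client configuration pointing at the old cluster.
hdfs dfs -createSnapshot "$SRC_HDFS" "pre_migration_$RUN"

# Copy through the NFS mount, preserving timestamps, ownership and permissions.
hdfs dfs -copyFromLocal -p "$SRC_NFS" "$DEST"

# Snapshot again and report what changed while the copy ran; a non-empty diff
# means those paths need another pass before cut-over.
hdfs dfs -createSnapshot "$SRC_HDFS" "post_migration_$RUN"
hdfs snapshotDiff "$SRC_HDFS" "pre_migration_$RUN" "post_migration_$RUN"
```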
09-22-2016
10:35 AM
Awesome! Thanks Dominika!
08-16-2016
04:04 PM
Yes, I did this, but my output directory is d:/abc/${path}
02-07-2017
02:13 AM
The bucket was of course created, and I could access it via the S3 browser as well as the S3 command line.
11-30-2016
01:44 PM
2016-11-29 14:50:59,544 INFO [Write-Ahead Local State Provider Maintenance] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@6dc5e857 checkpointed with 3 Records and 0 Swap Files in 25 milliseconds (Stop-the-world time = 11 milliseconds, Clear Edit Logs time = 9 millis), max Transaction ID 8
2016-11-29 14:51:06,659 WARN [Timer-Driven Process Thread-7] o.apache.hadoop.hdfs.BlockReaderFactory I/O error constructing remote block reader.
java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.read0(Native Method) ~[na:1.8.0_111]
2016-11-29 14:51:06,659 WARN [Timer-Driven Process Thread-7] org.apache.hadoop.hdfs.DFSClient Failed to connect to sandbox.hortonworks.com/127.0.0.1:50010 for block, add to deadNodes and continue. java.io.IOException: An existing connection was forcibly closed by the remote host
java.io.IOException: An existing connection was forcibly closed by the remote host
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
2016-11-29 14:51:06,660 WARN [Timer-Driven Process Thread-7] org.apache.hadoop.hdfs.DFSClient Could not obtain block: BP-1464254149-172.17.0.2-1477381671113:blk_1073742577_1761 file=/user/admin/Data/trucks.csv No live nodes contain current block Block locations: 172.17.0.2:50010 Dead nodes: 172.17.0.2:50010. Throwing a BlockMissingException
2016-11-29 14:51:06,660 WARN [Timer-Driven Process Thread-7] org.apache.hadoop.hdfs.DFSClient Could not obtain block: BP-1464254149-172.17.0.2-1477381671113:blk_1073742577_1761 file=/user/admin/Data/trucks.csv No live nodes contain current block Block locations: 172.17.0.2:50010 Dead nodes: 172.17.0.2:50010. Throwing a BlockMissingException
2016-11-29 14:51:06,660 WARN [Timer-Driven Process Thread-7] org.apache.hadoop.hdfs.DFSClient DFS Read
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1464254149-172.17.0.2-1477381671113:blk_1073742577_1761 file=/user/admin/Data/trucks.csv
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:889) [hadoop-hdfs-2.6.2.jar:na]
2016-11-29 14:51:06,660 ERROR [Timer-Driven Process Thread-7] o.apache.nifi.processors.hadoop.GetHDFS GetHDFS[id=abb1f7a5-0158-1000-f1d4-ef83203b4aa1] Error retrieving file hdfs://sandbox.hortonworks.com:8020/user/admin/Data/trucks.csv from HDFS due to org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from org.apache.hadoop.hdfs.client.HdfsDataInputStream@7bea77c5 for StandardFlowFileRecord[uuid=34551c53-72ad-40fa-927d-5ac60fe6d83e,claim=,offset=0,name=712611918461157,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1464254149-172.17.0.2-1477381671113:blk_1073742577_1761 file=/user/admin/Data/trucks.csv: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from org.apache.hadoop.hdfs.client.HdfsDataInputStream@7bea77c5 for StandardFlowFileRecord[uuid=34551c53-72ad-40fa-927d-5ac60fe6d83e,claim=,offset=0,name=712611918461157,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1464254149-172.17.0.2-1477381671113:blk_1073742577_1761 file=/user/admin/Data/trucks.csv
2016-11-29 14:51:06,661 ERROR [Timer-Driven Process Thread-7] o.apache.nifi.processors.hadoop.GetHDFS
org.apache.nifi.processor.exception.FlowFileAccessException: Failed to import data from org.apache.hadoop.hdfs.client.HdfsDataInputStream@7bea77c5 for StandardFlowFileRecord[uuid=34551c53-72ad-40fa-927d-5ac60fe6d83e,claim=,offset=0,name=712611918461157,size=0] due to org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1464254149-172.17.0.2-1477381671113:blk_1073742577_1761 file=/user/admin/Data/trucks.csv
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2479) ~[na:na]
Caused by: org.apache.nifi.processor.exception.FlowFileAccessException: Unable to create ContentClaim due to org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-1464254149-172.17.0.2-1477381671113:blk_1073742577_1761 file=/user/admin/Data/trucks.csv
    at org.apache.nifi.controller.repository.StandardProcessSession.importFrom(StandardProcessSession.java:2472) ~[na:na]
    ... 14 common frames omitted