Member since: 04-27-2016
Posts: 218
Kudos Received: 133
Solutions: 25

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3609 | 08-31-2017 03:34 PM |
|  | 7470 | 02-08-2017 03:17 AM |
|  | 3295 | 01-24-2017 03:37 AM |
|  | 10593 | 01-19-2017 03:57 AM |
|  | 6024 | 01-17-2017 09:51 PM |
07-29-2016 07:37 PM
I got it resolved; the issue was related to permissions on the bucket for that particular user.
07-10-2017 05:11 PM
Is this article still valid for HDF version 3.0, which was released recently? Are there easier ways of deploying to Amazon?
06-01-2017 09:42 PM
@jeff Can you answer this? By the way, you get better visibility by posting a question as a separate thread rather than commenting below an article.
07-29-2016 11:24 AM
Many thanks @Ashnee Sharma....
07-07-2016 08:38 PM
1 Kudo
@milind pandit This is exactly the kind of thing that tags are for. When an entity, for example a Hive table, is tagged, it will show up when you view all entities associated with that tag. With the new Atlas/Ranger integration, you can create security policies that apply only to Hive tables, or even Hive table columns, carrying a given tag. This lets you control access to the table or its columns just by adding or removing the tag, or by adding or removing the user groups to whom the tag-based policy applies. For example, using Atlas and Ranger together, you can easily keep track of data sets classified as PII and control and audit access to those data sets, along the lines of the sketch below.
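A minimal sketch of the tagging side against the Atlas REST API, assuming Atlas is reachable at http://atlas-host:21000 with admin credentials; the PII tag name and the table GUID are hypothetical placeholders, not values from this thread. First create the classification (tag) type, then attach it to a Hive table entity by GUID:

curl -u admin:admin -X POST -H 'Content-Type: application/json' http://atlas-host:21000/api/atlas/v2/types/typedefs -d '{"classificationDefs":[{"name":"PII","superTypes":[],"attributeDefs":[]}]}'
curl -u admin:admin -X POST -H 'Content-Type: application/json' http://atlas-host:21000/api/atlas/v2/entity/guid/<table-guid>/classifications -d '[{"typeName":"PII"}]'

Ranger then picks the PII tag up via tagsync, and a single tag-based policy on PII governs access to every entity that carries the tag.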
07-19-2016 04:22 PM
Could you please ask the questions in a separate thread with a few more details on what you are trying to achieve? I will do my best to answer.
06-30-2016 04:54 AM
1 Kudo
@milind pandit There is no direct utility to find this. Files with different names but the same content will have the same checksum, so you can verify duplicates using the checksum option of hdfs. For example:
# hdfs dfs -ls /tmp/tst
Found 6 items
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/okay
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/pass
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/pass3
-rw-r--r-- 3 hdfs hdfs 1064 2016-06-29 21:46 /tmp/tst/pre
-rw-r--r-- 3 hdfs hdfs 1064 2016-06-29 21:46 /tmp/tst/pro
-rw-r--r-- 3 hdfs hdfs 2044 2016-06-29 21:46 /tmp/tst/word
# hdfs dfs -checksum /tmp/tst/okay
/tmp/tst/okay MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
# hdfs dfs -checksum /tmp/tst/pass
/tmp/tst/pass MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
# hdfs dfs -checksum /tmp/tst/pre
/tmp/tst/pre MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
# hdfs dfs -checksum /tmp/tst/pro
/tmp/tst/pro MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
From the above, the files "/tmp/tst/okay" and "/tmp/tst/pass" hold the same content even though the filenames are different, and you can see that both files have the same checksum. The same goes for "/tmp/tst/pro" and "/tmp/tst/pre". To check the checksums of all files in a folder (in this case "/tmp/tst"), the following can be done:
# hdfs dfs -checksum /tmp/tst/*
/tmp/tst/okay MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pass MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pass3 MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
/tmp/tst/pre MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
/tmp/tst/pro MD5-of-0MD5-of-512CRC32C 000002000000000000000000690e462cbf52c9c399fb7c0bcacef01d
/tmp/tst/word MD5-of-0MD5-of-512CRC32C 000002000000000000000000b1be3e03929521974dc321f9e7f27cc7
Also, you can use "hdfs dfs -find" for a larger recursive search:
# hdfs dfs -checksum `hdfs dfs -find /tmp -print`
The above command will list the checksums of all the files. To make duplicates easy to spot, you can also print the checksum first and sort on it, so that files with identical content end up on adjacent lines:
# hdfs dfs -checksum `hdfs dfs -find /tmp -print` | awk '{print $3,$1}' | sort
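Building on that, a small hedged sketch that prints only the checksums occurring more than once, together with the files that share them. It assumes the three-column "path algorithm checksum" output shown above, and redirects stderr because -checksum reports errors for any directories returned by -find:

# hdfs dfs -checksum `hdfs dfs -find /tmp -print` 2>/dev/null | awk '{paths[$3] = paths[$3] " " $1; count[$3]++} END {for (c in count) if (count[c] > 1) print c ":" paths[c]}'

Each output line is one duplicate group: the shared checksum followed by every file holding that content.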
07-26-2017 12:39 PM
@AnjiReddy Anumolu Just to add a little more detail to the above response from @zblanco. When NiFi ingests data, that data is turned into NiFi FlowFiles. A NiFi FlowFile consists of attributes (metadata) about the actual data plus the physical data itself. The FlowFile metadata is stored in the FlowFile repository, as well as in JVM heap memory for faster performance. The FlowFile attributes include things like the filename, ingest time, lineage age, file size, which connection in the dataflow the FlowFile currently resides in, any user-defined metadata, any processor-added metadata, and so on. The physical bytes that make up the actual data content are written to claims within the NiFi content repository. A claim can contain the bytes for one to many ingested data files. For more info on the content repository and how claims work, see the following link: https://community.hortonworks.com/articles/82308/understanding-how-nifis-content-repository-archivi.html Thanks, Matt
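If you want to inspect those FlowFile attributes for yourself, one hedged way is through NiFi's own REST API (NiFi 1.x), which can list the FlowFiles queued in a connection; the host, port, connection id, and request id below are hypothetical placeholders:

curl -X POST http://nifi-host:8080/nifi-api/flowfile-queues/<connection-id>/listing-requests
curl http://nifi-host:8080/nifi-api/flowfile-queues/<connection-id>/listing-requests/<request-id>
curl -X DELETE http://nifi-host:8080/nifi-api/flowfile-queues/<connection-id>/listing-requests/<request-id>

The first call asks NiFi to build the listing, the second polls it and returns each queued FlowFile's attributes (filename, size, lineage details, and so on), and the third cleans the request up.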
10-23-2017 12:45 PM
Hi, if the firewall is disabled and you still get the same issue, add the IP and hostname to the /etc/hosts file on the Ubuntu system. My issue was resolved after adding the host details to /etc/hosts.
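For reference, an /etc/hosts entry of the kind meant here looks like the line below; the IP address and hostnames are hypothetical placeholders for your own node's details:

192.168.1.10   node1.example.com   node1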
06-23-2016 03:51 PM
@milind pandit That is straightforward. Simply download the two files from the cluster and place them in your NiFi cluster.