Member since: 07-30-2019
Posts: 181
Kudos Received: 205
Solutions: 51
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4958 | 10-19-2017 09:11 PM |
| | 1591 | 12-27-2016 06:46 PM |
| | 1236 | 09-01-2016 08:08 PM |
| | 1176 | 08-29-2016 04:40 PM |
| | 3011 | 08-24-2016 02:26 PM |
04-27-2016 08:47 PM
@Roberto Sancho Pig is a good tool for ETL and data warehouse types of processing on your data. It provides an abstraction layer over the underlying processing engine (MapReduce or Tez), so you can use Tez as the execution engine to speed up processing. This Pig Tutorial has additional information.
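A minimal sketch of switching execution engines when launching Pig (the script name here is a placeholder):

```bash
# Run a Pig script on the default MapReduce engine
pig -f etl_script.pig

# Run the same script on Tez for faster DAG-based execution
pig -x tez -f etl_script.pig
```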
04-26-2016 03:55 PM
@David Lays The two main options for replicating the HDFS structure are Falcon and distcp. The distcp command is not very feature rich: you give it a source path and a destination cluster, and it copies everything to the same path on the destination. If the copy fails, you have to start it over yourself. Falcon is the other method for maintaining a replica of your HDFS structure; it offers more data-movement options and lets you manage the lifecycle of the data on both sides more effectively. If you're replicating Hive table structures, there is some added complexity in making sure the tables are created on the DR side, but the actual files are moved the same way.
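A minimal distcp sketch, assuming placeholder NameNode hostnames and paths:

```bash
# Copy /data from the source cluster to the same path on the DR cluster
hadoop distcp hdfs://nn-prod:8020/data hdfs://nn-dr:8020/data

# On subsequent runs, -update copies only files that have changed
hadoop distcp -update hdfs://nn-prod:8020/data hdfs://nn-dr:8020/data
```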
04-22-2016 04:19 PM
2 Kudos
@Hefei Li The data is stored encrypted, with a copy of the encrypted data encryption key (EDEK) attached to the file. No user can read the contents of the O/S-level files unless the KMS provides the decrypted data encryption key (DEK). The EDEK is stored with the file so the KMS can determine which key version was used to encrypt it and hand back the appropriate DEK once the policy checks for access to the file have passed. At the HDFS layer, the user must have policy access to the KMS key to decrypt the file; if that policy check fails, the user cannot decrypt it. If you uninstall Ranger and the KMS, you will start seeing errors in the HDFS logs when you try to access files in an encryption zone, because the NameNode can no longer reach the KMS for keys or Ranger for the key-access policies on the files.
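For reference, a hedged sketch of how an encryption zone is typically set up (the key name and path are examples):

```bash
# Create a key in the KMS
hadoop key create mykey

# Create an encryption zone backed by that key (the path must be an empty directory)
hdfs crypto -createZone -keyName mykey -path /secure

# List the configured encryption zones
hdfs crypto -listZones
```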
04-21-2016 06:37 PM
4 Kudos
@Artem Ervits This can definitely be done, but you'll need a different "database" (MySQL parlance) or "schema" (Oracle, DB2 parlance) for each Ambari cluster. For example, you might create an "ambari-Prod1" database or schema for the Prod1 HDP cluster and an "ambari-Test2" database/schema for the Test2 HDP cluster.
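A hedged MySQL sketch of creating one such database per cluster (names, user, and password are placeholders; note that identifiers containing hyphens need backtick quoting, or just use underscores):

```bash
mysql -u root -p <<'SQL'
-- One database per Ambari-managed cluster
CREATE DATABASE `ambari_prod1`;
CREATE USER 'ambari'@'%' IDENTIFIED BY 'changeme';
GRANT ALL PRIVILEGES ON `ambari_prod1`.* TO 'ambari'@'%';
FLUSH PRIVILEGES;
SQL
```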
04-20-2016 02:46 PM
2 Kudos
@rbiswas Using the security features of NiFi (like HTTPS transport) is a great way to secure the data in motion. You will also want to make sure the connection from NiFi to the HDP cluster is secured (depending on how you land the data, possibly with WebHDFS over HTTPS or Knox). Once the data has landed, consider at-rest encryption using the Ranger KMS for additional protection.
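As an illustrative check that the landing cluster answers over HTTPS (hostname, port, and path are placeholders; 50470 is a common default HTTPS port for the Hadoop 2.x NameNode):

```bash
# Verify WebHDFS responds over HTTPS on the secured cluster
curl --cacert /path/to/ca.pem \
  "https://namenode.example.com:50470/webhdfs/v1/landing/dir?op=LISTSTATUS"
```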
04-19-2016 01:58 PM
@Gowrisankar Periyasamy HDFS allocates space one block at a time, and each block belongs to a single file. If a file only partially fills its last block, that block (and its replicas) remains unfilled until an append is done to the file; an append then writes into that last block (and its replicas) until it is full. For very large files (which is mostly why people use Hadoop), having at most <blocksize> MB (plus replicas) of unused space per file is not too large of a concern. For example, a 99.9 GB file allocates 799 full blocks (at 128 MB/block) plus one block that is only about 20% full, which works out to roughly 0.1% unused space for that file.
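If you want to see how a file's blocks are actually laid out, one way is the fsck report (the path here is a placeholder):

```bash
# Report the blocks backing a file, including the partially filled last block
hdfs fsck /data/large_file.dat -files -blocks
```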
04-15-2016 01:20 PM
1 Kudo
@Alexander Check out this question thread and see if it helps.
04-13-2016 05:39 PM
1 Kudo
You can delete a service through the Ambari REST API. You will need to stop the service first, then use the following command:
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X DELETE http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/FALCON
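For completeness, a hedged sketch of stopping the service via the same API before deleting it (this uses the standard Ambari state-change request body; the cluster and service names match the example above):

```bash
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop FALCON"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  http://sandbox.hortonworks.com:8080/api/v1/clusters/Sandbox/services/FALCON
```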
04-12-2016 09:43 PM
@shannon luo What are the configuration attributes of your EvaluateJsonPath processor? Is your Destination set to "flowfile-content" or "flowfile-attribute"? I have a processor set up to evaluate Twitter JSON, and the Destination is set to "flowfile-attribute" with a number of attributes identified. Can you take a look at the attached image and see if your attributes are configured similarly?
04-12-2016 08:14 PM
@shannon luo You can use an EvaluateJsonPath processor to pull out the fields you want in the flow. Add a property for each field in the JSON that you want on the output flow file: the property name is what you want the field to be called on the output, and the value is a JsonPath expression pointing at the field in the input JSON (e.g., Name = twitter.name with Value = $.user.screen_name reads the user.screen_name value from the input JSON and creates an attribute called twitter.name on the output flow file). Thanks, Erik
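A hedged sketch of how those processor properties might look (the attribute names on the left are whatever you choose; the JsonPath expressions assume Twitter's JSON layout):

```
Destination        = flowfile-attribute
twitter.name       = $.user.screen_name
twitter.text       = $.text
twitter.created_at = $.created_at
```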