Member since
07-04-2016
40
Posts
5
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1302 | 09-16-2016 05:31 AM
10-07-2021
04:28 AM
Hi @shivanageshch, EMR is not part of Cloudera. If you are using a CDP/HDP cluster, go through the following tutorial.

Livy configuration: add the following properties to the livy.conf file:

# Use this keystore for the SSL certificate and key.
livy.keystore = <path-to-ssl_keystore>
# Specify the keystore password.
livy.keystore.password = <keystore_password>
# Specify the key password.
livy.key-password = <key_password>

Access the Livy server: after enabling SSL, the Livy server should be accessible over the HTTPS protocol:

https://<livy host>:<livy port>

References:
1. https://docs.cloudera.com/cdp-private-cloud-base/latest/security-encrypting-data-in-transit/topics/livy-configure-tls-ssl.html

Was your question answered? Make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
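To sanity-check the HTTPS endpoint from a client, something like the following works (a minimal sketch; the host name is hypothetical, 8998 is Livy's default port, and `cafile` should point at the CA certificate that signed the keystore configured above):

```python
import json
import ssl
import urllib.request

def livy_session_url(host: str, port: int) -> str:
    """Build the HTTPS URL for Livy's /sessions endpoint."""
    return f"https://{host}:{port}/sessions"

def list_sessions(host: str, port: int, cafile: str = None) -> dict:
    """List active Livy sessions over HTTPS.

    Omit cafile only if the server certificate is already in the
    system trust store.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    req = urllib.request.Request(livy_session_url(host, port))
    with urllib.request.urlopen(req, context=ctx) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Hypothetical host for illustration only.
    print(livy_session_url("livy.example.com", 8998))
```

If the call fails with a certificate error, the keystore/truststore configuration is the first thing to check.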
09-19-2016
05:32 PM
2 Kudos
@Shiva Nagesh I agree with @hkropp. While you can, that does not mean you should do it as-is. You need to account for shortcomings, both architectural and in resource management, not to mention the security concerns of bringing more services onto the edge nodes than is usually manageable. I get that you have capacity on those edge nodes and would like to use them as burst capacity in case of need. You could consider running Docker containers on your edge servers; that way you can separate the true edge nodes from on-demand workers. Those Docker containers would use a worker template and could be spun up quickly as additional nodes, similar to what you would do in a cloud.
08-04-2016
09:29 AM
Yeah, see above. I think you just have to have a client like Hive that opens a TezClient, creates an application master, and then submits more DAGs to it. Specifically, in Hive you have by default one Tez session per JDBC connection. So if you run multiple queries over the same JDBC connection, they use the same Tez client, the same Tez session, and, as long as the timeout is not reached, the same application master. Yes, I think it sounds a bit more magical than it is; the reuse is just session mode, where the client can send multiple DAGs to the same Tez AM. As mentioned, in LLAP you will have shared long-running processes that can be discovered, so it's a bit different. http://hortonworks.com/blog/introducing-tez-sessions/
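The timeout mentioned above is configurable. A hedged sketch of the relevant knobs (property names from the Tez and HiveServer2 configuration references; the values here are illustrative only):

```
# tez-site.xml: how long an idle Tez AM waits for another DAG
# before shutting itself down.
tez.session.am.dag.submit.timeout.secs=300

# hive-site.xml: HiveServer2 can also keep a pool of pre-warmed
# Tez sessions instead of starting one per connection.
hive.server2.tez.initialize.default.sessions=true
hive.server2.tez.sessions.per.default.queue=2
```

With a longer timeout, consecutive queries on the same connection are more likely to land on the same AM.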
08-12-2016
01:02 PM
1 Kudo
"If it runs in the AppMaster, what exactly are 'the computed input splits' that the JobClient stores into HDFS while submitting the job?"

InputSplits are simply the work assignments of a mapper. I.e., suppose the input folder contains:

/in/file1
/in/file2

and assume file1 has 200 MB and file2 100 MB (default block size 128 MB). Then the InputFormat will by default generate three input splits (on the AppMaster, split generation is a function of the InputFormat):

InputSplit1: /in/file1:0:128000000
InputSplit2: /in/file1:128000001:200000000
InputSplit3: /in/file2:0:100000000

(By default one split = one block, but the InputFormat could do whatever it wants. For small files, for example, it can use MultiFileInputSplits, which span multiple files.)

"And how does map work if the split spans data blocks in two different data nodes?"

So the mapper comes up (normally local to the block) and starts reading the file at the offset provided. HDFS is by definition global, and if you read non-local parts of a file, they are read over the network; local reads are obviously more efficient, but the mapper could read anything. The HDFS API makes this transparent. So normally InputSplit generation is done in a way that this does not happen, so data can be read locally, but that is not a necessary precondition. Often maps are non-local (you can see that in the Resource Manager), and then the mapper simply reads the data over the network. The API call is identical: reading an HDFS file in Java is the same as reading a local file, it is just an extension of the Java FileSystem API.
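The split arithmetic above can be sketched in a few lines (a toy illustration, not Hadoop's actual FileInputFormat code; the file sizes and the ~128 MB block size follow the example in the post):

```python
def compute_splits(files, block_size=128_000_000):
    """Toy version of default split generation: one split per block,
    with the last split of a file holding the remainder."""
    splits = []
    for path, size in files:
        offset = 0
        while offset < size:
            length = min(block_size, size - offset)
            splits.append((path, offset, length))
            offset += length
    return splits

files = [("/in/file1", 200_000_000), ("/in/file2", 100_000_000)]
for path, offset, length in compute_splits(files):
    print(f"{path}:{offset}:{length}")
# Three splits: two for file1 (a full block plus the 72 MB remainder)
# and one for file2.
```

Real InputFormats also consider minimum/maximum split sizes and record boundaries, which this sketch ignores.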
07-26-2016
12:58 PM
Very neatly explained!
07-19-2016
02:33 PM
1) Maintenance mode is turned on at a service/node level. It is turned on to perform activities including, but not limited to:

- OS maintenance
- configuration changes
- decommissioning a node

Generally speaking, when maintenance mode is switched on, alerts are suppressed and no bulk operations are performed on the node. However, the node is still listed in the NN's DN list.

2) Decommissioning a DN is highly recommended when maintenance mode is turned on (to avoid data loss). When the DN is set to the decommissioning state, the NN starts copying its blocks to other DNs. The DN is decommissioned only when the NN completes the copy process. This activity is performed to maintain the replication factor policy.

3) Deletion of a DN can be performed after successful completion of decommissioning. At that point, the DN is completely removed from the cluster and from the NN's list.

4) 'Rebalancer' is a manual activity performed on the cluster to rebalance data between under-utilized and over-utilized DNs.
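On the HDFS side, the decommission and rebalance steps above map roughly onto the following (a sketch; the exclude-file path is illustrative, and `dfs.hosts.exclude` must already be set in hdfs-site.xml so the NN knows where to look):

```
# hdfs-site.xml (assumed already configured):
#   dfs.hosts.exclude = /etc/hadoop/conf/dfs.exclude

# 1. Add the DN's hostname to the exclude file, then tell the NN to re-read it:
hdfs dfsadmin -refreshNodes

# 2. Watch the node move from "Decommission in progress" to "Decommissioned":
hdfs dfsadmin -report

# 3. Rebalance data across DNs (threshold = allowed % deviation from average usage):
hdfs balancer -threshold 10
```

Management tools like Ambari or Cloudera Manager drive the same mechanism through their UI.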
09-02-2016
10:10 AM
@Benjamin Leonhardi "but it is written in parallel to the other two nodes in a chain" — can you explain this? What do you mean by a chain? Are you saying it's sequential?
11-17-2017
11:24 AM
Nope, reducers don't communicate with each other, and neither do the mappers. Each of them runs in a separate JVM container and has no information about the others. The AppMaster is the daemon that takes care of and manages these JVM-based containers (mapper/reducer).