Member since: 01-02-2017
Posts: 18
Kudos Received: 3
Solutions: 0
07-27-2023
07:42 AM
Is this solution a better fit for streaming than PutHive3QL for about 10 GB during the day?
10-04-2021
05:24 AM
Hello Team, @ahadjidj
Running the suggested command mvn clean install -Pinclude-atlas -DskipTests against the pom.xml located here: /work/nar/framework/nifi-framework-nar-1.14.0.nar-unpacked/META-INF/maven/org.apache.nifi/nifi-framework-nar/pom.xml I get these messages:
[WARNING] The requested profile "include-atlas" could not be activated because it does not exist.
[ERROR] Failed to execute goal org.apache.nifi:nifi-nar-maven-plugin:1.3.1:nar (default-nar) on project nifi-evtx-nar: The plugin org.apache.nifi:nifi-nar-maven-plugin:1.3.1 requires Maven version 3.1.0 -> [Help 1]
Could you please provide any hint? We need to introduce lineage into Atlas with information coming from NiFi, so this area is currently a priority and we are stuck on it. Thanks a lot! Daniele.
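A quick way to narrow this down is a minimal diagnostic sketch like the one below; it assumes Maven is on the PATH and is run from the directory that holds the pom.xml being built:
# Check the local Maven version; the nifi-nar-maven-plugin error above says Maven 3.1.0 or newer is required.
mvn -version
# List every profile the current pom.xml (and its parents) actually defines;
# if "include-atlas" does not appear here, -Pinclude-atlas has nothing to activate.
mvn help:all-profiles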
10-02-2017
08:21 AM
1 Kudo
Hi @Joe Harvy, YARN and the other tenant applications are not aware of each other's resource usage. This becomes a much bigger problem when swap is configured, because the OS kills (technically "sacrifices") one of the processes based on its age and on how much memory the sacrifice would free up. It therefore becomes critical to organize the applications in a multi-tenant environment. Several things need to be considered when managing this kind of environment, namely memory, CPU, and disk bottlenecks.
Memory usage: In terms of memory, subtract each component's maximum heap allocation (-Xmx) plus the fixed reservations, such as 2 GB for the OS, 2 GB for the DataNode, 2 GB for Ambari Metrics, and so on; for HBase also subtract the BucketCache (off-heap) plus the RegionServer heap size, and do the same for Accumulo, Storm, etc. Whatever remains of the total memory after these subtractions can be allocated to YARN. An example of this is well documented in the HBase cache configuration guide.
CPU usage: This is a bit tricky, as configuring this value up front may not be straightforward. Use the SAR / Ambari Metrics data on CPU usage and allocate the remaining CPU to YARN. At the same time, verify that the load average on the host does not climb too high; if it does, control the amount of parallel work coming from the applications/YARN according to priority. This is where the YARN scheduler comes in handy.
Disk usage: Keep a keen eye on CPU I/O wait; an increase in that value is caused by high disk latency. The better option is not to share a disk for multiple purposes (e.g., DataNode directories plus other applications' activity), as that results in requests queuing up for the resource.
Hope this helps!!
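As a rough illustration of the memory arithmetic above, here is a minimal sketch; the component list and the sizes are placeholders for a hypothetical 128 GB host, not a recommendation:
# Subtract each component's maximum heap / fixed reservation from total memory,
# then hand the remainder to YARN (yarn.nodemanager.resource.memory-mb).
TOTAL_MB=131072        # 128 GB host
OS_MB=2048             # OS reservation
DATANODE_MB=2048       # DataNode heap (-Xmx)
AMS_MB=2048            # Ambari Metrics
HBASE_HEAP_MB=8192     # HBase RegionServer heap (-Xmx)
HBASE_OFFHEAP_MB=8192  # HBase BucketCache (off-heap)
YARN_MB=$((TOTAL_MB - OS_MB - DATANODE_MB - AMS_MB - HBASE_HEAP_MB - HBASE_OFFHEAP_MB))
echo "yarn.nodemanager.resource.memory-mb=${YARN_MB}"   # 108544 MB in this example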
06-13-2017
06:53 AM
I mean, Ambari itself is not upgradable, but the Hadoop cluster managed by it is.
03-12-2017
09:15 PM
2 Kudos
Hi @Joe Harvy The easiest way to achieve this is to pull data from the unsecure cluster rather than push to the secure cluster. You can do this with an output port in the unsecure cluster and a remote process group in the secure cluster that connects to this output port. Since the RPG points to an unsecure cluster, there is no need to configure certificates. The other approach is to configure your unsecure cluster by setting the keystore/truststore as you did for the secure cluster, but without activating SSL. You will also need to add the nodes in the secure cluster and give them the right to retrieve S2S details (see policies). Edit: I've been asked this question several times by customers, so I wrote a tutorial on these two options: https://community.hortonworks.com/articles/88473/site-to-site-communication-between-secured-https-a.html
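For reference, these are the nifi.properties entries on the unsecure cluster that the two options above touch (a minimal sketch; the values shown are placeholders and paths/ports depend on your install):
# Show the site-to-site and keystore/truststore settings currently in place:
grep -E 'nifi\.remote\.input|nifi\.security\.(keystore|truststore)' conf/nifi.properties
# Typical entries (option 1 leaves the keystore/truststore lines empty; option 2 fills them in
# as on the secure cluster, while site-to-site itself stays unsecured):
#   nifi.remote.input.host=node1.unsecure.example.com
#   nifi.remote.input.secure=false
#   nifi.remote.input.socket.port=10000
#   nifi.security.keystore=/etc/nifi/conf/keystore.jks
#   nifi.security.truststore=/etc/nifi/conf/truststore.jks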
01-27-2017
05:45 PM
Thanks @Eugene Koifman
Can you point me to complete and up-to-date documentation or a book on Hive features (ACID, LLAP, etc.)?
02-28-2017
05:35 AM
2 Kudos
Another way to back up and restore specific Hive tables is to use 'show create table' to back up the DDL, which can later be used to recreate the table. The saved HDFS files can then be dropped back into the warehouse directory to finish the table restore.
beeline -u jdbc:hive2://<cluster-name>:<port#>/<db-name> --outputformat=csv -e "show create table <table-name>;" > <table_name>.ddl
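Put together, the backup/restore flow could look like this (a minimal sketch; host, port, database, table name, and the warehouse path are placeholders and will differ per cluster):
# 1) Back up the DDL for the table:
beeline -u "jdbc:hive2://<host>:<port>/<db_name>" --outputformat=csv \
  -e "show create table <table_name>;" > <table_name>.ddl
# 2) Back up the table's data files out of the warehouse directory:
hdfs dfs -get /apps/hive/warehouse/<db_name>.db/<table_name> ./<table_name>_data
# 3) Restore: recreate the table from the saved DDL (clean up any quoting the csv
#    output format added), then copy the data files back:
beeline -u "jdbc:hive2://<host>:<port>/<db_name>" -f <table_name>.ddl
hdfs dfs -put ./<table_name>_data/* /apps/hive/warehouse/<db_name>.db/<table_name>/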