Member since: 01-02-2017
Posts: 18
Kudos Received: 3
Solutions: 0
07-27-2023
07:42 AM
Is this solution a better fit for streaming than PutHive3QL for about 10 GB during the day?
10-04-2021
05:24 AM
Hello Team, @ahadjidj
Running the suggested command mvn clean install -Pinclude-atlas -DskipTests against the pom.xml located here: /work/nar/framework/nifi-framework-nar-1.14.0.nar-unpacked/META-INF/maven/org.apache.nifi/nifi-framework-nar/pom.xml I get these messages:
[WARNING] The requested profile "include-atlas" could not be activated because it does not exist.
[ERROR] Failed to execute goal org.apache.nifi:nifi-nar-maven-plugin:1.3.1:nar (default-nar) on project nifi-evtx-nar: The plugin org.apache.nifi:nifi-nar-maven-plugin:1.3.1 requires Maven version 3.1.0 -> [Help 1]
Could you please provide any hint? We need to introduce lineage into Atlas with information coming from NiFi, so this area is currently a priority and we are stuck on it. Thanks a lot! Daniele.
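A quick way to narrow this down is a minimal diagnostic sketch like the one below; it assumes Maven is on the PATH and is run from the directory that holds the pom.xml being built:
# Check the local Maven version; the nifi-nar-maven-plugin error above says Maven 3.1.0 or newer is required.
mvn -version
# List every profile the current pom.xml (and its parents) actually defines;
# if "include-atlas" does not appear here, -Pinclude-atlas has nothing to activate.
mvn help:all-profiles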
10-02-2017
08:21 AM
1 Kudo
Hi @Joe Harvy, YARN and the other tenant applications are not aware of each other's resource usage. This becomes a much bigger problem when swap is configured, because the OS kills (technically "sacrifices") one of the processes based on its age and on how much memory the sacrifice would free up. It therefore becomes critical to organize the applications in a multi-tenant environment. Several things need to be considered when managing this kind of environment, namely memory, CPU, and disk bottlenecks.
Memory usage: In terms of memory, subtract each component's maximum heap allocation (-Xmx) plus the fixed reservations, such as 2 GB for the OS, 2 GB for the DataNode, 2 GB for Ambari Metrics, and so on; for HBase also subtract the BucketCache (off-heap) plus the RegionServer heap size, and do the same for Accumulo, Storm, etc. Whatever remains of the total memory after these subtractions can be allocated to YARN. An example of this is well documented in the HBase cache configuration guide.
CPU usage: This is a bit tricky, as configuring this value up front may not be straightforward. Use the SAR / Ambari Metrics data on CPU usage and allocate the remaining CPU to YARN. At the same time, verify that the load average on the host does not climb too high; if it does, control the amount of parallel work coming from the applications/YARN according to priority. This is where the YARN scheduler comes in handy.
Disk usage: Keep a keen eye on CPU I/O wait; an increase in that value is caused by high disk latency. The better option is not to share a disk for multiple purposes (e.g., DataNode directories plus other applications' activity), as that results in requests queuing up for the resource.
Hope this helps!!
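As a rough illustration of the memory arithmetic above, here is a minimal sketch; the component list and the sizes are placeholders for a hypothetical 128 GB host, not a recommendation:
# Subtract each component's maximum heap / fixed reservation from total memory,
# then hand the remainder to YARN (yarn.nodemanager.resource.memory-mb).
TOTAL_MB=131072        # 128 GB host
OS_MB=2048             # OS reservation
DATANODE_MB=2048       # DataNode heap (-Xmx)
AMS_MB=2048            # Ambari Metrics
HBASE_HEAP_MB=8192     # HBase RegionServer heap (-Xmx)
HBASE_OFFHEAP_MB=8192  # HBase BucketCache (off-heap)
YARN_MB=$((TOTAL_MB - OS_MB - DATANODE_MB - AMS_MB - HBASE_HEAP_MB - HBASE_OFFHEAP_MB))
echo "yarn.nodemanager.resource.memory-mb=${YARN_MB}"   # 108544 MB in this example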
06-13-2017
06:53 AM
I mean, Ambari itself is not upgradable, but the Hadoop cluster managed by it is.
03-12-2017
09:15 PM
2 Kudos
Hi @Joe Harvy The easiest way to achieve this is to pull data from the unsecure cluster rather than push to the secure cluster. You can do this with an output port in the unsecure cluster and a remote process group in the secure cluster that connects to this output port. Since the RPG points to an unsecure cluster, there is no need to configure certificates. The other approach is to configure your unsecure cluster by setting the keystore/truststore as you did for the secure cluster, but without activating SSL. You will also need to add the nodes in the secure cluster and give them the right to retrieve S2S details (see policies). Edit: I've been asked this question several times by customers, so I wrote a tutorial on these two options: https://community.hortonworks.com/articles/88473/site-to-site-communication-between-secured-https-a.html
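For reference, these are the nifi.properties entries on the unsecure cluster that the two options above touch (a minimal sketch; the values shown are placeholders and paths/ports depend on your install):
# Show the site-to-site and keystore/truststore settings currently in place:
grep -E 'nifi\.remote\.input|nifi\.security\.(keystore|truststore)' conf/nifi.properties
# Typical entries (option 1 leaves the keystore/truststore lines empty; option 2 fills them in
# as on the secure cluster, while site-to-site itself stays unsecured):
#   nifi.remote.input.host=node1.unsecure.example.com
#   nifi.remote.input.secure=false
#   nifi.remote.input.socket.port=10000
#   nifi.security.keystore=/etc/nifi/conf/keystore.jks
#   nifi.security.truststore=/etc/nifi/conf/truststore.jks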
01-27-2017
05:45 PM
Thanks @Eugene Koifman
Can you point me to complete and up-to-date documentation or a book on Hive features (ACID, LLAP, etc.)?
02-28-2017
05:35 AM
2 Kudos
Another way to back up and restore specific Hive tables is to use 'show create table' to back up the DDL, which can later be used to recreate the table. The saved HDFS files can then be dropped back into the warehouse directory to finish the table restore.
beeline -u jdbc:hive2://<cluster-name>:<port#>/<db-name> --outputformat=csv -e "show create table <table-name>;" > <table_name>.ddl
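Put together, the backup/restore flow could look like this (a minimal sketch; host, port, database, table name, and the warehouse path are placeholders and will differ per cluster):
# 1) Back up the DDL for the table:
beeline -u "jdbc:hive2://<host>:<port>/<db_name>" --outputformat=csv \
  -e "show create table <table_name>;" > <table_name>.ddl
# 2) Back up the table's data files out of the warehouse directory:
hdfs dfs -get /apps/hive/warehouse/<db_name>.db/<table_name> ./<table_name>_data
# 3) Restore: recreate the table from the saved DDL (clean up any quoting the csv
#    output format added), then copy the data files back:
beeline -u "jdbc:hive2://<host>:<port>/<db_name>" -f <table_name>.ddl
hdfs dfs -put ./<table_name>_data/* /apps/hive/warehouse/<db_name>.db/<table_name>/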