Member since: 07-27-2015
Posts: 92
Kudos Received: 4
Solutions: 1
My Accepted Solutions
Views | Posted |
---|---|
3012 | 12-15-2019 07:05 PM |
11-06-2016
07:09 AM
@mclark Thanks for your detailed answers. Paul
10-27-2016
01:29 AM
@mclark Thanks for your detailed answers. In our case, we would expect about 2 flow files to be routed to one node and another 2 flow files to be routed to a different node. For load balancing, that would be ideal for handling an ExecuteSQL that selects 1 million (100W) records. For load balancing the SQL flow files, the RPG's constant batch of 100 flow files is too large. My questions:
1. How do I configure that batch of 100 flow files down to, for example, 2 flow files per node, as you mentioned?
2. How many RPGs are recommended in one cluster with 3 nodes?
3. Is there another way to load balance ExecuteSQL?
Thanks. Paul
10-26-2016
10:39 AM
@mclark Thanks for your reply. Yes, I followed your picture of the NiFi data flow, and the data is now load balanced nicely. But it seems the flow in your picture is implemented in the root flow, because the input port => more_table_ino... must be placed in the root process group. The picture will become very confusing if we ingest more tables, because the best layout for me is one process group per department or per business. Could you give some advice on how to avoid this issue? Thanks, Paul
10-18-2016
01:33 AM
@mclark Having each node run different SQL statements is what meets my requirement. I built the NiFi flow shown in this picture. GenerateTableFetch executes on the primary node so that the SQL is not replicated. I try to distribute the SQL statements by sending them to a remote process group, but the processor on one node gets all of the SQL statements in its queue, so the statements are not distributed to every node. Could you give me some advice on how to execute different SQL statements in a distributed way? Thanks
10-17-2016
08:03 AM
Hi, this is very urgent for me. Why should I choose cluster mode if I cannot distribute the SQL statements to every node in the cluster? Can anyone give me an idea? Thanks in advance!
10-16-2016
02:57 AM
1 Kudo
Hi, I found this on Google:
"If you are using NiFi 1.0 you can use the GenerateTableFetch processor. It allows you to choose the "page" (aka partition) size, and will generate SQL statements, each of which will grab one "page" of data. You can route those into ExecuteSQL and it will retrieve smaller sets of results at a time. If you have a NiFi cluster, you can route GenerateTableFetch into a Remote Process Group that points at an Input Port on the same cluster (which will distribute the SQL statements across the cluster), then the Input Port can be connected to the ExecuteSQL. This allows you to fetch rows from a table in parallel."
I have three nodes in my NiFi cluster. Following the post, I set GenerateTableFetch to execute on the primary node (test01) and send its output to a remote process group on the same cluster, with the output port then connected to ExecuteSQL. But the actual behavior is that ExecuteSQL executes on just one node (test01, test02, or test03). My question is how to fetch rows from a table in parallel on all nodes (test01, test02, and test03). Thanks
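To make the "page" idea in the quoted advice concrete, here is a minimal Java sketch of how a table read can be split into page-sized SELECT statements that could each run on a different node. It is only an illustration: the class and method names, the `orders` table, the `id` column, and the LIMIT/OFFSET syntax are all assumptions, and this is not the SQL that GenerateTableFetch actually generates.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a table read into page-sized SELECT statements. */
class PagedFetchSketch {

    /**
     * Builds one SELECT per page. The LIMIT/OFFSET syntax and every name used
     * here are assumptions for illustration, not NiFi's generated SQL.
     */
    static List<String> pageQueries(String table, String orderColumn,
                                    long rowCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        // Each iteration covers rows [offset, offset + pageSize).
        for (long offset = 0; offset < rowCount; offset += pageSize) {
            queries.add("SELECT * FROM " + table
                    + " ORDER BY " + orderColumn
                    + " LIMIT " + pageSize
                    + " OFFSET " + offset);
        }
        return queries;
    }

    public static void main(String[] args) {
        // 2,500 rows with a page size of 1,000 yields three statements.
        for (String sql : pageQueries("orders", "id", 2500, 1000)) {
            System.out.println(sql);
        }
    }
}
```

Each generated statement is independent of the others, which is why the statements can be spread across nodes once the flow files carrying them are distributed.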
Labels: Apache NiFi
08-23-2016
02:47 AM
Hi, we pass the JDBC URL as below: ****************
<credentials>
<credential name="hs2-creds" type="hive2">
<property>
<name>hive2.server.principal</name>
<value>${jdbcPrincipal}</value>
</property>
<property>
<name>hive2.jdbc.url</name>
<value>${jdbcURL}</value>
</property>
</credential>
</credentials>
****************
<action name="drop-hive-partion" cred="hs2-creds">
<hive2 xmlns="uri:oozie:hive2-action:0.2">
<jdbc-url>${jdbcURL}</jdbc-url>
<script>drop_partion.hql</script>
<param>dateStr=${dateStr}</param>
<param>database_name=${database_name}</param>
<param>table_name=${table_name}</param>
</hive2>
<ok to="merge-file"/>
<error to="fail"/>
</action>
Maybe the problem occurs when we use HA HiveServer2 and an HA Hive Metastore. After we switched to a single HiveServer2 and a single Hive Metastore, the issue was gone. But I don't know whether this is the real cause.
08-22-2016
11:24 PM
1 Kudo
Hi Harsh, the issue was gone once I packaged the JRE's sunjce_provider.jar into the lib folder. Thanks. BR, Paul
08-22-2016
12:29 AM
Hi, we are working with a security-enabled CDH 5.7 cluster. I have a Java program that accesses HDFS. The Java code:
conf.addResource(path1);
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("arch_onedata@OD.BETA", filePath);
fs = FileSystem.get(conf);
When I run the code from a Java main in Eclipse, it works normally. When I run the code from a shell script that calls the Java main, the exception below happens:
15:17:02,718 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory:42 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)])
15:17:02,736 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory:42 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)])
15:17:02,743 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory:42 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, valueName=Time, value=[GetGroups])
15:17:02,746 DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl:231 - UgiMetrics, User and group related metrics
15:17:03,244 DEBUG org.apache.hadoop.security.SecurityUtil:110 - Setting hadoop.security.token.service.use_ip to true
15:17:03,358 DEBUG org.apache.hadoop.security.Groups:301 - Creating new Groups object
15:17:03,546 DEBUG org.apache.hadoop.util.Shell:419 - setsid exited with exit code 0
15:17:03,601 DEBUG org.apache.hadoop.security.Groups:112 - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000; warningDeltaMs=5000
15:17:03,726 DEBUG org.apache.hadoop.security.UserGroupInformation:221 - hadoop login
java.io.IOException: Login failure for arch_onedata@OD.BETA from keytab ../conf/arch_onedata.keytab: javax.security.auth.login.LoginException: Algorithm HmacMD5 not available
at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:962)
What happened? How can I resolve this issue? BR, Paul
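Since the exception reports "Algorithm HmacMD5 not available", one way to compare the Eclipse run with the shell-script run is to print which JRE and which security providers each one actually uses. The following is a small diagnostic sketch using only standard JDK classes; the class name `JceCheck` and its helper method are hypothetical, and it only reports what the running JVM can see rather than fixing anything.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.Security;
import javax.crypto.Mac;

/** Diagnostic sketch: reports whether a MAC algorithm is visible to this JVM. */
class JceCheck {

    /** Returns true if some registered provider supplies the given MAC algorithm. */
    static boolean isMacAvailable(String algorithm) {
        try {
            Mac.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shows which JRE the launcher (Eclipse or the shell script) picked up.
        System.out.println("java.home = " + System.getProperty("java.home"));
        // Lists the security providers registered in this JVM.
        for (Provider p : Security.getProviders()) {
            System.out.println("provider: " + p.getName());
        }
        System.out.println("HmacMD5 available: " + isMacAvailable("HmacMD5"));
    }
}
```

Running it once from Eclipse and once through the same shell script and classpath as the failing program should show whether the two launches differ in JRE or in registered providers.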
Labels: HDFS
08-10-2016
02:36 AM
Hi, I have worked on this issue for about three days. I tried to find the logs, and I changed the heap size to 6 GB for HiveServer2 and 8 GB for the Hive Metastore. The behavior still happens. I only run 12 Oozie jobs, of which maybe 3 run concurrently. To restate the issue: when I run a hive2 action with Oozie, HiveServer2 sometimes cannot connect to the Hive Metastore. Can anyone point me in the right direction? BR, Paul