About Paul Yang

Paul Yang · ‎11-06-2016

@mclark Thanks for your detailed answers. Paul

Paul Yang · ‎10-27-2016

@mclark Thanks for your detailed answers. In our case, we expect something like 2 flow files to be routed to one node and the other 2 flow files to be routed to another node. That kind of load balancing would be perfect for handling an ExecuteSQL that selects 100W records. But for load balancing the SQL flow files, the RPG's constant of 100 flow files is too large. My questions:
1. How do I configure the 100 files down to 2 files (or similar) per node, as you mentioned?
2. How many RPGs are recommended in one cluster with 3 nodes?
3. Is there another method to implement load balancing for ExecuteSQL?
Thanks. Paul
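To make the batch-size concern above concrete, here is a minimal toy model in Java. It is only a sketch and not NiFi code: it assumes the sender hands up to a fixed batch of flow files to one node before moving on to the next node, which is how the post describes the behavior. The class name `BatchDistributionSketch` and the method `distribute` are hypothetical names for illustration.

```java
import java.util.Arrays;

class BatchDistributionSketch {

    // Toy model (assumption, not NiFi's implementation): the sender gives up to
    // batchSize flow files to one node, then moves to the next node round-robin.
    static int[] distribute(int flowFiles, int nodeCount, int batchSize) {
        if (nodeCount <= 0 || batchSize <= 0) {
            throw new IllegalArgumentException("nodeCount and batchSize must be positive");
        }
        int[] perNode = new int[nodeCount];
        int node = 0;
        int remaining = flowFiles;
        while (remaining > 0) {
            int sent = Math.min(batchSize, remaining);
            perNode[node] += sent;
            remaining -= sent;
            node = (node + 1) % nodeCount;
        }
        return perNode;
    }

    public static void main(String[] args) {
        // 4 SQL flow files, 2 nodes, batch of 100: the first node takes all 4.
        System.out.println(Arrays.toString(distribute(4, 2, 100)));
        // Same flow files with a batch of 2: each node gets 2.
        System.out.println(Arrays.toString(distribute(4, 2, 2)));
    }
}
```

Under this model, a handful of SQL flow files only spreads out when the batch is smaller than the number of queued flow files, which is why a batch of 100 looks too large for this use case.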

Paul Yang · ‎10-26-2016

@mclark Thanks for your reply. Yes, I followed your picture of the NiFi data flow and got the data load balanced nicely. But it seems your picture is implemented in the root flow, because input port => more_table_ino... must be placed in the root process group. The picture will get very confusing if we ingest more tables, because the best approach for me is one process group per department or per business. Could you give some advice on how to avoid this issue? Thanks, Paul

Paul Yang · ‎10-18-2016

@mclark Having every node run a different SQL statement would meet my requirement. I built the NiFi flow shown in this picture. GenerateTableFetch executes on the primary node so that the SQL is not replicated. I try to distribute the SQL statements by sending them to a Remote Process Group, but the processor on one node gets all the SQL statements in its queue, so I cannot get the effect of distributing the SQL statements to every node. Could you give me some advice on how to execute different SQL statements across the nodes? Thanks

Paul Yang · ‎10-17-2016

Hi, this is very urgent for me. Why should I choose cluster mode if I cannot distribute SQL statements to every node in the cluster? Can anyone give me an idea? Thanks in advance!

Paul Yang · ‎10-16-2016

Hi, from Google:

If you are using NiFi 1.0 you can use the GenerateTableFetch processor. It allows you to choose the "page" (aka partition) size, and will generate SQL statements, each of which will grab one "page" of data. You can route those into ExecuteSQL and it will retrieve smaller sets of results at a time. If you have a NiFi cluster, you can route GenerateTableFetch into a Remote Process Group that points at an Input Port on the same cluster, (which will distribute the SQL statements across the cluster), then the Input Port can be connected to the ExecuteSQL. This allows you to fetch rows from a table in parallel.

I have three nodes in my NiFi cluster. I followed the post: I put GenerateTableFetch on the primary node (test01) to execute, then send to a Remote Process Group on the same cluster, then output port to ExecuteSQL. But the actual behavior is that ExecuteSQL executes on just one node (test02, or test03, or test01). My question is how to fetch rows from a table in parallel (on test02, test03 and test01). Thanks
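As a rough illustration of the "page" idea in the quoted answer, the Java sketch below builds one SELECT per page. It is not GenerateTableFetch's implementation: the `LIMIT`/`OFFSET` syntax, the table name `orders`, the ordering column `id`, and the names `PagedFetchSketch` and `pageQueries` are all assumptions made for the example, and the real statement text depends on the database.

```java
import java.util.ArrayList;
import java.util.List;

class PagedFetchSketch {

    // Builds one SELECT per page of pageSize rows. LIMIT/OFFSET is an assumed
    // dialect; other databases need different paging syntax.
    static List<String> pageQueries(String table, String orderColumn,
                                    long rowCount, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (long offset = 0; offset < rowCount; offset += pageSize) {
            queries.add("SELECT * FROM " + table
                    + " ORDER BY " + orderColumn
                    + " LIMIT " + pageSize
                    + " OFFSET " + offset);
        }
        return queries;
    }

    public static void main(String[] args) {
        // Hypothetical table with 1,000,000 rows, fetched 250,000 rows at a time.
        for (String sql : pageQueries("orders", "id", 1_000_000L, 250_000)) {
            System.out.println(sql);
        }
    }
}
```

Each generated statement is independent of the others, which is what makes it possible to run them on different nodes once the flow files are actually spread across the cluster.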

Paul Yang · ‎08-23-2016

Hi, we pass the JDBC URL as below:
****************
<credentials>
  <credential name="hs2-creds" type="hive2">
    <property>
      <name>hive2.server.principal</name>
      <value>${jdbcPrincipal}</value>
    </property>
    <property>
      <name>hive2.jdbc.url</name>
      <value>${jdbcURL}</value>
    </property>
  </credential>
</credentials>
****************
<action name="drop-hive-partion" cred="hs2-creds">
  <hive2 xmlns="uri:oozie:hive2-action:0.2">
    <jdbc-url>${jdbcURL}</jdbc-url>
    <script>drop_partion.hql</script>
    <param>dateStr=${dateStr}</param>
    <param>database_name=${database_name}</param>
    <param>table_name=${table_name}</param>
  </hive2>
  <ok to="merge-file"/>
  <error to="fail"/>
</action>
Maybe the problem occurs when we use HA HiveServer2 and HA Hive Metastore. After we went back to a single HiveServer2 and a single Hive Metastore, the issue was gone. But I don't know if this is the real reason.

Paul Yang · ‎08-22-2016

Hi Harsh, the issue went away when I packaged the JRE's sunjce_provider.jar into the lib folder. Thanks. BR, Paul

Paul Yang · ‎08-22-2016

Hi, we are working with a secured CDH 5.7 cluster. I have a Java program that accesses HDFS. The Java code:

conf.addResource(path1);
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromKeytab("arch_onedata@OD.BETA", filePath);
fs = FileSystem.get(conf);

When I run the code through the Java main in Eclipse, it works normally. When I run the code from a shell script that calls the Java main, the exception below happens:

15:17:02,718 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory:42 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, valueName=Time, value=[Rate of successful kerberos logins and latency (milliseconds)])
15:17:02,736 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory:42 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, valueName=Time, value=[Rate of failed kerberos logins and latency (milliseconds)])
15:17:02,743 DEBUG org.apache.hadoop.metrics2.lib.MutableMetricsFactory:42 - field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about=, always=false, sampleName=Ops, type=DEFAULT, valueName=Time, value=[GetGroups])
15:17:02,746 DEBUG org.apache.hadoop.metrics2.impl.MetricsSystemImpl:231 - UgiMetrics, User and group related metrics
15:17:03,244 DEBUG org.apache.hadoop.security.SecurityUtil:110 - Setting hadoop.security.token.service.use_ip to true
15:17:03,358 DEBUG org.apache.hadoop.security.Groups:301 - Creating new Groups object
15:17:03,546 DEBUG org.apache.hadoop.util.Shell:419 - setsid exited with exit code 0
15:17:03,601 DEBUG org.apache.hadoop.security.Groups:112 - Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000; warningDeltaMs=5000
15:17:03,726 DEBUG org.apache.hadoop.security.UserGroupInformation:221 - hadoop login
java.io.IOException: Login failure for arch_onedata@OD.BETA from keytab ../conf/arch_onedata.keytab: javax.security.auth.login.LoginException: Algorithm HmacMD5 not available
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:962)

What happened? How can I resolve this issue? BR, Paul
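One way to narrow down an "Algorithm HmacMD5 not available" failure is to run a small check with exactly the same JVM and classpath that the shell script uses. The Java sketch below uses only standard JDK calls (`Mac.getInstance`, `Security.getProviders`); the class name `JceCheck` and the method `isMacAvailable` are hypothetical. On Java 8 and earlier, sunjce_provider.jar normally sits in the JRE's lib/ext directory, so a launcher that picks a different JRE or overrides `java.ext.dirs` could lose it. That explanation is an assumption, not something confirmed in this thread.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.Security;
import javax.crypto.Mac;

class JceCheck {

    // Returns true if some installed JCE provider offers the given MAC algorithm.
    static boolean isMacAvailable(String algorithm) {
        try {
            Mac.getInstance(algorithm);
            return true;
        } catch (NoSuchAlgorithmException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Show which JRE is actually running; Eclipse and the script may differ.
        System.out.println("java.home     = " + System.getProperty("java.home"));
        // Prints null on Java 9+, where the extension mechanism was removed.
        System.out.println("java.ext.dirs = " + System.getProperty("java.ext.dirs"));
        for (Provider provider : Security.getProviders()) {
            System.out.println("provider      = " + provider.getName());
        }
        System.out.println("HmacMD5 available: " + isMacAvailable("HmacMD5"));
    }
}
```

Running this once from Eclipse and once from the shell script, and comparing the two outputs, should show whether the script's JVM is missing the provider that supplies HmacMD5.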

Paul Yang · ‎08-10-2016

Hi, I have worked on this issue for about three days. I tried to find the logs, and I changed the heap size to 6G for HiveServer2 and 8G for the Hive Metastore. The behavior still happens. I only run 12 Oozie jobs, with maybe 3 jobs running concurrently. To repeat the issue: when I use the hive2 action with Oozie, sometimes HiveServer2 cannot connect to the Hive Metastore. Who can show me the way? BR, Paul

Last Visited	‎06-19-2020 02:08 AM

Member Since	‎07-27-2015 07:07 PM
Posts	92
Kudos received	5

Cloudera Community
