Member since
07-10-2017
78
Posts
6
Kudos Received
4
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1821 | 10-17-2017 12:17 PM |
 | 2411 | 09-13-2017 12:36 PM |
 | 3069 | 07-14-2017 09:57 AM |
 | 1058 | 07-13-2017 12:52 PM |
07-03-2018
02:53 PM
Hi @Ya ko, Why not consider the new ORC? https://www.slideshare.net/Hadoop_Summit/orc-improvement-in-apache-spark-23-95295487 That way you will get the best performance when querying from Hive. And yes, you have to define your table with all the fields. Slide 20 shows how to specify the new ORC library; you just have to adjust the location setting to point to where your data will be stored in HDFS. Michel
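To illustrate (the table name, columns and HDFS path below are made up, not taken from the original question), an external ORC table definition would look roughly like this:
    # minimal sketch, assuming a HiveServer2 at hiveserver:10000 and an invented schema
    beeline -u "jdbc:hive2://hiveserver:10000/default" -e "
    CREATE EXTERNAL TABLE events (
      id BIGINT,
      event_time TIMESTAMP,
      payload STRING
    )
    STORED AS ORC
    LOCATION 'hdfs:///data/events';"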
06-13-2018
02:38 PM
Hi @Oleg Parkhomenko, The following link describes how you can secure a YARN queue so that only specific users can submit jobs to a specific queue; it is done with Ranger: https://community.hortonworks.com/articles/10797/apache-ranger-and-yarn-setup-security.html Normally, if you are in a Kerberos environment, you should not have jobs running as dr.who. Michel
06-13-2018
02:29 PM
Hi @rajat puchnanda, Based on your example, you are trying to do a "join". NiFi is not an ETL tool but more a flow manager: it allows you to move data across systems and to do some very simple transformations, like CSV to Avro. You should not do computations or joins with NiFi. For your use case it would be better to use another tool like Hive, Spark, ... Best regards, Michel
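For example (the table and column names here are invented, just to show the idea), the join could be done in Hive instead of NiFi:
    # minimal sketch with made-up table and column names
    beeline -u "jdbc:hive2://hiveserver:10000/default" -e "
    SELECT c.id, c.name, o.amount
    FROM customers c
    JOIN orders o ON c.id = o.customer_id;"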
06-13-2018
02:22 PM
Hi @Zack Riesland, Indeed, increasing the number of buckets will increase the parallelism when writing to HDFS (and then to the disks). If I were you I would have a look at the disk/IOPS usage: if you try to load a lot of data and you have only one disk, it can take a long time. Generally it's recommended to have multiple disks per node to avoid IOPS congestion. What's the exact query that you are running to insert the data? Does it contain some casting? What's the size of your data? Also, a good optimisation is to use an ORC table and not Avro. During the loading phase it should not change a lot, but when you query your data it will make the difference. Michel
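As a quick way to check whether the disks are the bottleneck (assuming the sysstat package is installed on the data nodes):
    # minimal sketch: extended disk stats on a data node, three samples at 5-second intervals
    iostat -x 5 3
    # high %util and long await times on the data disks usually point to IOPS congestion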
06-13-2018
02:09 PM
Hi @rajat puchnanda, If by merging you mean doing a union, you can use the MergeContent processor if the two CSVs have the same structure. Best regards, Michel
06-13-2018
02:07 PM
Hi @Oleg Parkhomenko, You should be able to kill all the jobs waiting in the queue with this script:
for app in `yarn application -list | awk '$6 == "ACCEPTED" { print $1 }'`; do yarn application -kill "$app"; done
Just put it in a .sh script and run it with a user that is allowed to kill the applications. Best regards, Michel
06-13-2018
02:01 PM
Hi, Usually timeouts happen because the cluster is undersized, there are no dedicated nodes for HBase, or the ingestion is so fast that HBase needs to do a lot of region splits.
- Do you manage a lot of data with HBase? If yes, did you pre-split your table?
- If I were you I would also have a look at the CPU, memory and disk I/O usage. If you don't have any dedicated nodes for HBase, other Hadoop components like Spark, Hive, etc. can have an impact.
As a general best practice, you should have dedicated nodes for HBase with enough CPU and several disks. Best regards, Michel
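For reference (the table name, column family and split points below are invented), pre-splitting can be done from the HBase shell, for example:
    # minimal sketch: create a table pre-split into 4 regions (made-up name, family and split keys)
    echo "create 'events', 'cf', SPLITS => ['25000000', '50000000', '75000000']" | hbase shell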
12-10-2017
06:18 PM
Hi @Ashish Singh, Can you show the command that you used to submit your spark application? Michel
11-15-2017
10:16 AM
@Arti Wadhwani Do you have the answer to your question? I'm trying to do that: connecting with ZooKeeper discovery and specifying the Tez queue, but it doesn't work.
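For context, this is the kind of connection string I am trying (the ZooKeeper hosts and queue name below are placeholders):
    # minimal sketch: HiveServer2 via ZooKeeper discovery, with a Tez queue passed as a Hive conf
    beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2?tez.queue.name=myqueue"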
11-06-2017
03:51 PM
Hi @Ennio Sisalli, Before running the query that saves the result in HDFS, can you try to set the following parameter: set hive.cli.print.header=true; Best regards, Michel
11-01-2017
03:19 PM
Hi @Simon Jespersen, Did you restart NiFi once you added the new NiFi property or modified the file? NiFi needs to be restarted in order to load the new parameter. Michel
10-17-2017
12:25 PM
Hello, I'm trying to ingest data into Hive with NiFi
(from JSON data => ConvertJSONToSQL => PutHiveQL) and I get this error message from the PutHiveQL processor:
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.ParseException: line 1:221 cannot recognize input near '?' ',' '?' in value row constructor
If I look at the input flowfile of the PutHiveQL, it has the correct insert query:
INSERT INTO nifilog (objectid, platform, bulletinid, bulletincategory, bulletingroupid, bulletinlevel, bulletinmessage, bulletinnodeid, bulletinsourceid, bulletinsourcename, bulletinsourcetype, bulletintimestamp) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Each flowfile has all the needed attributes: sql.args.N.type & sql.args.N.value. Any idea how to debug/solve this?
10-17-2017
12:17 PM
The solution is to use the "SiteToSiteBulletinReportingTask" as a reporting task. It can send all the bulletins to a NiFi instance, which can be the same instance as the NiFi that generated them. It sends them to a specific input port in JSON, and then you are able to process them. It has all the attributes needed. Here is an example: [{"objectId":"9c8e75e6-eb5a-4a52-9d4a-a3d3b7f0c80f",
"platform":"nifi",
"bulletinId":305,
"bulletinCategory":"Log Message",
"bulletinGroupId":"24a8726b-015f-1000-ffff-ffffae66ea1c",
"bulletinLevel":"ERROR",
"bulletinMessage":"PutHDFS[id=24b463f8-015f-1000-ffff-ffffd09bd856] PutHDFS[id=24b463f8-015f-1000-ffff-ffffd09bd856] failed to invoke @OnScheduled method due to java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.; processor will not be scheduled to run for 30 seconds: java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.",
"bulletinNodeId":"ede4721c-30fe-4879-b22e-20bfe602c615",
"bulletinSourceId":"24b463f8-015f-1000-ffff-ffffd09bd856",
"bulletinSourceName":"PutHDFS",
"bulletinSourceType":"PROCESSOR",
"bulletinTimestamp":"2017-10-17T08:16:48.945Z"},
10-16-2017
12:50 PM
Hi @Abdelkrim Hadjidj, Thanks for your reply. My objective is to get the error message, which can be many things (host not found, parsing error, connection refused, etc.) for the same failure relationship. Michel
10-16-2017
12:33 PM
Hi @Gayathri Devi, I can't give you more ideas than in my previous comment, because it depends on the system specifications that you have, the other load on the cluster, the size of the data, the size of each line, etc. The percentages that I gave you are based on benchmarks that I made in previous projects and on blogs/forums that I read in the past. The best that you can do is a test: I would recommend running one test with compression and another without, to see the impact it has on your environment. Moreover, be careful with Hive on top of HBase. You might get bad performance because it often starts a full scan of the HBase table, which is an expensive operation. Michel
10-16-2017
12:25 PM
Hi, If a processor fails and routes the flowfile to the failure relationship, is there an "error" attribute? If some processors have it, how can I tell which ones? For example, for PutHDFS I don't see anything in the documentation (doc puthdfs). Is there another way to have the reason of the failure attached to the flowfile? Thanks, Michel
10-13-2017
12:59 PM
1 Kudo
Hi @Gayathri Devi, The compression/decompression operations will increase the CPU load by around 5-10%. On the storage side, compression will decrease the disk space used by around 70%; moreover, since the size on disk is smaller, you will need fewer IOPS. Because of that you should see a general improvement in your performance. Michel
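As an illustration only (the table name and columns are invented, and I'm assuming Hive ORC tables here, which the thread doesn't state explicitly), enabling compression on a table could look like this:
    # minimal sketch: an ORC table with ZLIB compression (made-up schema)
    beeline -u "jdbc:hive2://hiveserver:10000/default" -e "
    CREATE TABLE page_views (
      user_id BIGINT,
      url STRING,
      view_time TIMESTAMP
    )
    STORED AS ORC
    TBLPROPERTIES ('orc.compress'='ZLIB');"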
10-06-2017
08:53 AM
Hi @Gobi Subramani, Is this normal in your code?: String node = "x.x.x.x:6667"; I think it should be an IP or a hostname. Michel
09-27-2017
01:48 PM
@Hemant, You said that you were able to interact with HDFS from the host that has NiFi. How did you get the ticket to interact with HDFS? Are you able to create a ticket with the user and keytab mentioned in the configuration of the processor? (Just to be sure that the keytab is working well.)
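For example (the principal and keytab path below are placeholders, use the ones from your processor configuration):
    # minimal sketch: request a ticket with the keytab to verify it works
    kinit -kt /etc/security/keytabs/nifi.keytab nifi/host.example.com@MY_REALM
    klist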
09-26-2017
03:25 PM
@Hemant for the user do you have this structure: hive/FQDN@MY_REALM ?
09-26-2017
02:55 PM
For info, I think that once you configure that property, you need to restart nifi
09-26-2017
02:54 PM
Hi @Hemant, Did you configure the nifi.kerberos.krb5.file in your nifi.properties?
09-26-2017
02:39 PM
Hi @Hemant, No, NiFi doesn't need to be kerberized, but you need to install the Kerberos client on the OS (where NiFi is installed) in order to be able to request a ticket. Michel
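For example, on a RHEL/CentOS host this would be something like the following (package names differ on other distributions):
    # minimal sketch: install the Kerberos client tools and check the configuration
    yum install -y krb5-workstation
    cat /etc/krb5.conf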
09-21-2017
07:44 AM
@nallen Is pcap_replay installed as a service by default with HCP 1.2? If not, how do I install it manually? Thanks
09-16-2017
12:20 PM
Hi @Rahul Gupta, Did you manage this? If yes, can you accept the answer? 🙂 Thanks, Michel
09-16-2017
12:18 PM
Hi @n c, You are welcome! 🙂 I don't think there are other objects in Hive (but I'm not sure). There are the UDFs: for those you need to export the jar that you use for the UDF in your first cluster. May I ask you to accept my answer? 🙂 Thanks! Michel
09-16-2017
12:12 PM
Hi @Piyush Chauhan, Do you need more info? If not, can you accept the answer? 🙂 Michel
09-15-2017
01:20 PM
Hi, I saw that it's possible to use the pycapa script in order to capture data and send it to Kafka. Do you know if there's an easy way to directly ingest a pcap file that has been generated by another system? Like a program that reads the pcap file and sends it to Kafka? Or another way to do it? Thanks, Michel
- Tags:
- CyberSecurity
- Metron
09-14-2017
09:51 AM
Hi @Nagesh Gollapudi, Can you take a screenshot of the configuration of your different processors in NiFi? 🙂 Michel
09-14-2017
09:45 AM
Hi @Piyush Chauhan, In the Hortonworks stack you have the following:
- HDFS ACLs: you manage the access rights on HDFS yourself. This can very quickly become a huge amount of work if you have a lot of users, and it only protects access to HDFS.
- HDFS TDE (encryption): this is an HDFS feature that encrypts, in a completely transparent manner, all the files in a folder. It provides strong protection for any data stored on HDFS, whether it comes from Hive, HBase, etc.
- Ranger: the most interesting part! It's a tool that helps you manage access to the different Hadoop components. For example, you can create a policy so that a specific group of users defined in your enterprise AD has read-only, write, or denied access to an HDFS folder. It can also restrict access to Hive, HBase, Solr, Kafka, etc. Ranger is really powerful and helps manage security by reducing the time needed to do it. Moreover, it provides an audit feature: if it is enabled, you can see who accessed what and when (you can also see if the permission was denied).
- Knox: you can see Knox as a kind of proxy. Every request from every user is sent to the Knox server, which redirects it to the correct service/server. It's useful if you don't want your users to know the network topology of your cluster, and if you don't want them to have direct access to the servers hosting the services (like the Hive database).
I would recommend the combination of Ranger + Knox, plus TDE encryption if you have the need. Best regards, Michel
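Just to illustrate the HDFS ACL point (the path, user and group below are made up), managing permissions by hand looks like this:
    # minimal sketch: grant a user read/execute and a group read-only on an invented folder
    # (requires ACLs to be enabled on the NameNode: dfs.namenode.acls.enabled=true)
    hdfs dfs -setfacl -m user:alice:r-x /data/project
    hdfs dfs -setfacl -m group:analysts:r-- /data/project
    hdfs dfs -getfacl /data/project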