Member since: 07-10-2017
Posts: 78
Kudos Received: 6
Solutions: 4
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3135 | 10-17-2017 12:17 PM |
| | 4315 | 09-13-2017 12:36 PM |
| | 4309 | 07-14-2017 09:57 AM |
| | 1811 | 07-13-2017 12:52 PM |
07-03-2018
02:53 PM
Hi @Ya ko, Why not consider the new ORC format? https://www.slideshare.net/Hadoop_Summit/orc-improvement-in-apache-spark-23-95295487 That way you will get the best performance when querying from Hive. And yes, you have to define your table with all the fields. Slide 20 shows how to specify the new ORC library; you will just have to add the LOCATION setting to point to where your data will be stored in HDFS (see the sketch below). Michel
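A minimal sketch of what I mean (table name, columns and the HDFS path are assumptions, not taken from your setup):

```sh
# Define the Hive table as ORC at an explicit HDFS location (names/paths are assumptions)
hive -e "
CREATE EXTERNAL TABLE events (id BIGINT, ts STRING, payload STRING)
STORED AS ORC
LOCATION '/data/events_orc';
"

# Enable the new native ORC reader/writer when running your Spark 2.3 job
spark-submit \
  --conf spark.sql.orc.impl=native \
  --conf spark.sql.hive.convertMetastoreOrc=true \
  your_app.py
```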
06-13-2018
02:38 PM
Hi @Oleg Parkhomenko, The following link describes how you can secure YARN queues so that only specific users can submit jobs to specific queues; it is done with Ranger: https://community.hortonworks.com/articles/10797/apache-ranger-and-yarn-setup-security.html Normally, if you are in a Kerberos environment, you should not have jobs running as dr.who. Michel
06-13-2018
02:29 PM
Hi @rajat puchnanda, Based on your example, you are trying to do a "join". NiFi is not an ETL tool but more of a flow manager: it allows you to move data across systems and to do some very simple transformations, like CSV to Avro. You should not do computation or joins with NiFi. For your use case it would be better to use another tool like Hive, Spark, ... (see the sketch below). Best regards, Michel
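For example, a minimal sketch of doing the join in Hive once NiFi has landed both datasets (table and column names are assumptions):

```sh
hive -e "
SELECT a.id, a.name, b.amount
FROM   customers a
JOIN   orders    b ON a.id = b.customer_id;
"
```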
06-13-2018
02:22 PM
Hi @Zack Riesland, Indeed, increasing the number of buckets will increase the parallelism of the writes to HDFS (and then to the disks). If I were you, I would have a look at the disk/IOPS usage: if you try to load a lot of data and you have only one disk, it can take a long time. Generally it is recommended to have multiple disks per node to avoid IOPS congestion. What is the exact query that you are using to insert the data? Does it contain some casting? What is the size of your data? Also, a good optimisation is to use an ORC table and not Avro: for the loading phase it should not change a lot, but when you query your data it will make the difference (see the sketch below). Michel
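A minimal sketch of loading into a bucketed ORC table (table names, columns and the bucket count are assumptions):

```sh
hive -e "
-- needed on older Hive releases so inserts respect the bucketing definition
SET hive.enforce.bucketing=true;

CREATE TABLE target_orc (id BIGINT, payload STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS ORC;

-- load from the existing Avro staging table; more buckets = more parallel writers
INSERT INTO TABLE target_orc
SELECT id, payload FROM staging_avro;
"
```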
06-13-2018
02:09 PM
Hi @rajat puchnanda, If by merging you mean doing a union, you can use the MergeContent processor, provided the two CSVs have the same structure. Best regards, Michel
06-13-2018
02:07 PM
Hi @Oleg Parkhomenko, You should be able to kill all the jobs waiting in the queue with this script:
for app in `yarn application -list | awk '$6 == "ACCEPTED" { print $1 }'`; do yarn application -kill "$app"; done
Just put it in a .sh script and run it with a user that is allowed to kill applications (a queue-filtered variant is sketched below). Best regards, Michel
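If you only want to clean one queue, here is a hedged variant (the queue name is an assumption, and the column positions are based on the default "yarn application -list" output, where column 5 is the queue and column 6 the state):

```sh
# kill only the applications still waiting (ACCEPTED) in one specific queue
for app in $(yarn application -list | awk '$5 == "myqueue" && $6 == "ACCEPTED" { print $1 }'); do
  yarn application -kill "$app"
done
```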
06-13-2018
02:01 PM
Hi, Usually timeouts happen because the cluster is undersized, there are no dedicated nodes for HBase, or the ingestion is so fast that HBase needs to do a lot of region splits.
- Do you manage a lot of data with HBase? If yes, did you pre-split your table (see the sketch below)?
- If I were you, I would also have a look at the CPU, memory and disk I/O usage. If you don't have any dedicated nodes for HBase, other Hadoop components like Spark, Hive, etc. can have an impact.
As a general best practice, you should have dedicated nodes for HBase with enough CPU and several disks. Best regards, Michel
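A minimal sketch of pre-splitting at creation time (table name, column family and split points are assumptions; pick split points that match your row key distribution):

```sh
hbase shell <<'EOF'
create 'events', 'cf', SPLITS => ['1000', '2000', '3000', '4000']
EOF
```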
12-10-2017
06:18 PM
Hi @Ashish Singh, Can you show the command that you used to submit your Spark application? Michel
11-15-2017
10:16 AM
@Arti Wadhwani Do you have the answer to your question? I'm trying to do the same thing, connecting with ZooKeeper discovery and specifying the Tez queue, but it doesn't work (a sketch of what I am trying is below).
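This is roughly what I am trying (hosts, ZooKeeper namespace and queue name are placeholders on my side, not a confirmed working example):

```sh
beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2?tez.queue.name=myqueue"
```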
11-06-2017
03:51 PM
Hi @Ennio Sisalli, Before running the query that saves the result in HDFS, can you try to set the following parameter: set hive.cli.print.header=true; (see the sketch below). Best regards, Michel
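One way to check it, as a minimal sketch (query, file name and HDFS path are assumptions): capture the CLI output, which includes the header once the parameter is set, and push the file to HDFS:

```sh
hive -e "set hive.cli.print.header=true; SELECT * FROM my_table;" > result.tsv
hdfs dfs -put result.tsv /user/hive/exports/result.tsv
```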
11-01-2017
03:19 PM
Hi @Simon Jespersen, Did you restart NiFi after you added the new NiFi property or modified the file? NiFi needs to be restarted in order to load the new parameter. Michel
10-17-2017
12:25 PM
Hello, I am trying to ingest data into Hive with NiFi
(from JSON data => ConvertJSONToSQL => PutHiveQL) and I get this error message from the PutHiveQL processor:
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.parse.ParseException:line 1:221 cannot recognize input near '?' ',' '?' in value row constructor
If I look at the input flowfile of the PutHiveQL, it has the correct insert query:
INSERT INTO nifilog (objectid, platform, bulletinid, bulletincategory, bulletingroupid, bulletinlevel, bulletinmessage, bulletinnodeid, bulletinsourceid, bulletinsourcename, bulletinsourcetype, bulletintimestamp) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
Each flowfile has all the needed attributes: sql.args.N.type & sql.args.N.value. Any idea how to debug/solve this?
Labels:
- Apache NiFi
10-17-2017
12:17 PM
The solution is to use the "SiteToSiteBulletinReportingTask" reporting task. It can send all the bulletins to a NiFi instance; it can be the same instance as the NiFi that generates them. It will send them to a specific input port as JSON, and then you will be able to process them. It has all the attributes needed. Here is an example:
[{"objectId":"9c8e75e6-eb5a-4a52-9d4a-a3d3b7f0c80f",
"platform":"nifi",
"bulletinId":305,
"bulletinCategory":"Log Message",
"bulletinGroupId":"24a8726b-015f-1000-ffff-ffffae66ea1c",
"bulletinLevel":"ERROR",
"bulletinMessage":"PutHDFS[id=24b463f8-015f-1000-ffff-ffffd09bd856] PutHDFS[id=24b463f8-015f-1000-ffff-ffffd09bd856] failed to invoke @OnScheduled method due to java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.; processor will not be scheduled to run for 30 seconds: java.lang.RuntimeException: Failed while executing one of processor's OnScheduled task.",
"bulletinNodeId":"ede4721c-30fe-4879-b22e-20bfe602c615",
"bulletinSourceId":"24b463f8-015f-1000-ffff-ffffd09bd856",
"bulletinSourceName":"PutHDFS",
"bulletinSourceType":"PROCESSOR",
"bulletinTimestamp":"2017-10-17T08:16:48.945Z"},
10-16-2017
12:50 PM
Hi @Abdelkrim Hadjidj, Thanks for your reply. My objective is to get the error message, which can be many things (host not found, parsing error, connection refused, etc.) for the same failure relationship. Michel
10-16-2017
12:33 PM
Hi @Gayathri Devi, I can't give you more ideas than in my previous comment, because it depends on the system specification that you have, the other load on the cluster, the size of the data, the size of each line, etc. The percentages that I gave you are based on benchmarks that I made in previous projects and blogs/forums that I read in the past. The best thing you can do is a test: I would recommend running one test with compression and another without, to see the impact it has on your environment. Moreover, be careful with Hive on top of HBase: you might get bad performance because it often starts with a full scan of the HBase table, which is an expensive operation. Michel
10-16-2017
12:25 PM
Hi, If a processor fails and routes the flowfile to the failure relationship, is there an "error" attribute? If some processors do have it, how can I tell which ones? For example, for PutHDFS I don't see anything in the documentation (doc puthdfs). Is there another way to have the reason for the failure attached to the flowfile? Thanks, Michel
Labels:
- Apache NiFi
10-13-2017
12:59 PM
1 Kudo
Hi @Gayathri Devi, The compression/decompression operations will increase the CPU load by around 5-10%. For the storage, it will decrease the disk space used by around 70%; moreover, since the size on disk is smaller, you will need fewer IOPS. Because of that, you should see a general improvement in your performance. Michel
10-06-2017
08:53 AM
Hi @Gobi Subramani, Is this normal in your code: String node = "x.x.x.x:6667"; ? I think it should be a real IP or hostname (see the sketch below). Michel
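A quick way to verify the broker address outside of your code, as a minimal sketch (the hostname and topic are assumptions; 6667 is the usual HDP broker port):

```sh
kafka-console-producer.sh --broker-list broker01.example.com:6667 --topic test
```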
09-27-2017
01:48 PM
@Hemant, You said that you were able to interact with HDFS from the host that has NiFi. How did you get the ticket to interact with HDFS? Are you able to create a ticket with the user and keytab mentioned in the configuration of the processor (see the sketch below)? Just to be sure that the keytab is working well.
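A minimal sketch of the check I mean (the principal and keytab path are assumptions; use the ones from your processor configuration):

```sh
# request a ticket manually from the NiFi host with the same keytab, then list it
kinit -kt /etc/security/keytabs/nifi.keytab nifi/nifi-host@MY_REALM
klist
```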
09-26-2017
03:25 PM
@Hemant, for the user, do you have this structure: hive/FQDN@MY_REALM ?
09-26-2017
02:55 PM
For info, I think that once you configure that property, you need to restart NiFi.
09-26-2017
02:54 PM
Hi @Hemant, Did you configure nifi.kerberos.krb5.file in your nifi.properties (see the sketch below)?
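A minimal sketch of the check (the NiFi install path is an assumption):

```sh
grep 'nifi.kerberos.krb5.file' /opt/nifi/conf/nifi.properties
# expected something like: nifi.kerberos.krb5.file=/etc/krb5.conf
```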
09-26-2017
02:39 PM
Hi @Hemant, No, NiFi doesn't need to be Kerberized, but you need to install the Kerberos client on the OS (where NiFi is installed) in order to be able to request a ticket (see the sketch below). Michel
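For example (the package manager depends on your OS, so treat this as a sketch):

```sh
yum install -y krb5-workstation      # RHEL / CentOS
# apt-get install -y krb5-user       # Debian / Ubuntu
```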
09-21-2017
07:44 AM
@nallen Is pcap_replay installed as a service by default with HCP 1.2? If not, how do I install it manually? Thanks
09-16-2017
12:20 PM
Hi @Rahul Gupta, Did you manage to solve this? If yes, can you accept the answer? 🙂 Thanks, Michel
09-16-2017
12:18 PM
Hi @n c, You are welcome! 🙂 I don't think there are other objects in Hive (but I am not sure). There are the UDFs: for those you need to export the jar that you use for the UDFs in your first cluster. May I ask you to accept my answer? 🙂 Thanks! Michel
09-15-2017
01:20 PM
Hi, I saw that it's possible to use the pycapa script in order to capture data and send it to Kafka. Do you know if there's an easy way to directly ingest a pcap file that has been generated by another system? Like a program that reads the pcap file and sends it to Kafka? Or another way to do it? Thanks, Michel
- Tags:
- CyberSecurity
- Metron
Labels:
- Apache Metron
09-14-2017
09:51 AM
Hi @Nagesh Gollapudi, Can you post a screenshot of the configuration of your different processors in NiFi? 🙂 Michel
09-14-2017
09:31 AM
Hi @Mrinmoy Choudhury, Does my answer reply to your question? If yes, may I ask you to accept it? 🙂
09-14-2017
09:29 AM
Hi @n c, You don't have to copy the metadata. Copy the folder structure with all the data to the new cluster, recreate the table, and don't forget to compute statistics on the table; this will recreate a lot of metadata so that the CBO works fine. There's no need for an export tool because you can directly copy the data from HDFS (see the sketch below). 🙂 Michel
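A minimal sketch of the whole flow (cluster names, warehouse paths and table name are assumptions):

```sh
# copy the data files between clusters
hadoop distcp \
  hdfs://old-cluster:8020/apps/hive/warehouse/mydb.db/mytable \
  hdfs://new-cluster:8020/apps/hive/warehouse/mydb.db/mytable

# after recreating the table on the new cluster, rebuild the statistics for the CBO
hive -e "
ANALYZE TABLE mydb.mytable COMPUTE STATISTICS;
ANALYZE TABLE mydb.mytable COMPUTE STATISTICS FOR COLUMNS;
"
```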