12-12-2016
08:01 PM
The data in the content repository should be controlled through these settings in nifi.properties:

# Content Repository
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=10 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=./content_repository
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false
nifi.content.viewer.url=/nifi-content-viewer/

Is the data in the archive folders (underneath each of the folders in the content_repository)? If so, you can turn off archiving with nifi.content.repository.archive.enabled, or reduce the thresholds that control how long archived data is kept (nifi.content.repository.archive.max.retention.period and nifi.content.repository.archive.max.usage.percentage). If the data is not in the archive folders, it is still considered active in the flow. Because multiple flowfiles are bundled together in one content claim, a claim can still have some active flowfiles, and the whole claim cannot be archived until none of its flowfiles remain in the flow.
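Why one live flowfile keeps a whole claim out of the archive can be shown with a toy model. This is a simplified illustration, not NiFi's `FileSystemRepository` code: it assumes a new claim is started once the current one reaches `nifi.content.claim.max.flow.files` flowfiles or `nifi.content.claim.max.appendable.size` bytes (10 MB taken as 10 × 1024 × 1024), and all class and method names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical values mirroring the nifi.properties settings above (10 MB assumed to be MiB).
MAX_APPENDABLE_BYTES = 10 * 1024 * 1024  # nifi.content.claim.max.appendable.size
MAX_FLOWFILES_PER_CLAIM = 100            # nifi.content.claim.max.flow.files


@dataclass
class ContentClaim:
    """Toy stand-in for one claim file shared by several flowfiles."""
    claim_id: int
    size: int = 0
    written: int = 0                        # flowfiles ever appended to this claim
    live: set = field(default_factory=set)  # flowfiles still in the flow

    def archivable(self) -> bool:
        # The whole claim can only be archived once no flowfile still uses it.
        return self.written > 0 and not self.live


class ToyContentRepository:
    """Appends flowfile content to the current claim, rolling over at the limits."""

    def __init__(self):
        self.claims = []
        self.claim_of = {}

    def write(self, flowfile_id: str, nbytes: int) -> ContentClaim:
        current = self.claims[-1] if self.claims else None
        if (current is None
                or current.size >= MAX_APPENDABLE_BYTES
                or current.written >= MAX_FLOWFILES_PER_CLAIM):
            current = ContentClaim(claim_id=len(self.claims))
            self.claims.append(current)
        current.size += nbytes
        current.written += 1
        current.live.add(flowfile_id)
        self.claim_of[flowfile_id] = current
        return current

    def drop(self, flowfile_id: str) -> None:
        # Called when a flowfile leaves the flow (e.g. auto-terminated or sent elsewhere).
        self.claim_of.pop(flowfile_id).live.discard(flowfile_id)

    def archivable_claims(self) -> list:
        return [c.claim_id for c in self.claims if c.archivable()]


repo = ToyContentRepository()
for i in range(150):
    repo.write(f"ff{i}", 1_000)   # 150 small flowfiles land in claims 0 and 1
for i in range(99):
    repo.drop(f"ff{i}")           # ff99 is still live, so claim 0 stays put
print(repo.archivable_claims())   # []
repo.drop("ff99")
print(repo.archivable_claims())   # [0]
```

In this model, claim 1 also stays unarchived as long as any of ff100 to ff149 remains in the flow, which mirrors the behaviour described above.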
12-14-2016
04:05 PM
@Hedart - Does this answer your question? If so, please accept my answer. If you have further questions, feel free to mention @Tom McCuch when you ask them so I get notified. Thanks. Tom
11-22-2017
05:34 AM
Hi, this issue was resolved with the following setting:
hadoop.proxyuser.root.hosts=*
You can also see the answer in this comment: https://community.hortonworks.com/comments/144449/view.html
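The `hadoop.proxyuser.root.hosts` property belongs in Hadoop's `core-site.xml`. As a minimal sketch, assuming you edit that file directly rather than through a cluster management UI, the hypothetical helper below adds the property or updates an existing value using only the Python standard library:

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(config_xml: str, name: str, value: str) -> str:
    """Add or update a <property> in a Hadoop *-site.xml <configuration> document."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No existing entry: append <property><name/><value/></property>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


core_site = "<configuration></configuration>"
print(set_hadoop_property(core_site, "hadoop.proxyuser.root.hosts", "*"))
```

ElementTree's default parser drops XML comments and the declaration line, so compare the result with the original file before replacing it.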
11-22-2016
11:00 PM
Hivemall: Machine Learning on Hive, Pig and Spark SQL
Install Hivemall: https://github.com/myui/hivemall/wiki/Installation
Pick the latest release: https://github.com/myui/hivemall/releases
# Set up your environment in $HOME/.hiverc
add jar /home/myui/tmp/hivemall-core-xxx-with-dependencies.jar;
source /home/myui/tmp/define-all.hive;
# Create a directory in HDFS for the JAR
hadoop fs -mkdir -p /apps/hivemall
hdfs dfs -chmod -R 777 /apps/hivemall
cp hivemall-core-0.4.2-rc.2-with-dependencies.jar hivemall-with-dependencies.jar
hdfs dfs -put hivemall-with-dependencies.jar /apps/hivemall/
hdfs dfs -put hivemall-with-dependencies.jar /apps/hive/warehouse/
hdfs dfs -put hivemall-core-0.4.2-rc.2-with-dependencies.jar /apps/hivemall
show functions "hivemall.*";
+-----------------------------------------+--+
| tab_name |
+-----------------------------------------+--+
| hivemall.add_bias |
| hivemall.add_feature_index |
| hivemall.amplify |
| hivemall.angular_distance |
| hivemall.angular_similarity |
| hivemall.argmin_kld |
| hivemall.array_avg |
...
| hivemall.x_rank |
| hivemall.zscore |
+-----------------------------------------+--+
149 rows selected (0.054 seconds)
Once installed, the hivemall database provides many useful functions for general processing as well as machine learning via SQL. One example is Base91 encoding, here applied to deflate-compressed text:
select hivemall.base91(hivemall.deflate('aaaaaaaaaaaaaaaabbbbccc'));
+----------------------+--+
| _c0 |
+----------------------+--+
| AA+=kaIM|WTt!+wbGAA |
+----------------------+--+
A more useful example: I ran tokenize on the messages in a Hive table where I store tweets.
select hivemall.tokenize(tweets.msg) from tweets limit 10;
| ["water","pipe","break","#TEST","#TEST","#WATERMAINBREAK","FakeMockTown","NJ","https","//t","co/hLYaJnvAdH"] |
| ["RT","@CNNNewsource","Main","water","pipe","break","causes","flooding","sinkhole","swallows","car","in","Hoboken","NJ","NE-009MO","https","//t","co/SDALHbs7kx"] |
| ["RT","@PaaSDev","#TEST","water","pipe","break","#TEST","Water","Main","Break","in","Fakeville","NJ","https","//t","co/ekbNXK1VgI"] |
| ["Water","break","on","a","mountain","run","tonight","#saopaulo","#correr","#run","sdfdf","https","//t","co/dvND6BkXl4"] |
| ["RT","@PaaSDev","water","pipe","break","#TEST","#TEST","#WATERMAINBREAK","FakeMockTown","NJ","https","//t","co/hLYaJnvAdH"] |
| ["Route","33","In","Wilton","Closed","Due","To","Water","Main","Break","https","//t","co/UQMksljRUm","https","//t","co/HRhin2QyOk"] |
| ["water","pipe","break","nj","#TEST","#TEST","#WATERMAINBREAK","https","//t","co/kvYNTG7wHf"] |
| ["water","pipe","break","nj","#TEST","test","https","//t","co/zjgjSaNvUz"] |
| ["#TEST","#watermainbreak","water","main","break","pipe","test","nj","https","//t","co/qZEdnhlgYG"] |
| ["Customers","of","Langley","Water","and","Sewer","District","under","boil","water","advisory","-","Aiken","Standard","https","//t","co/yh3COaC70M","https","//t","co/LPRHBrtaTA"] |
10 rows selected (4.848 seconds)
For more usage examples: https://github.com/myui/hivemall/wiki/webspam-dataset
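The token lists above appear to split on whitespace and punctuation such as `,`, `.` and `:` while keeping `#`, `@`, `-` and `/`, which is why every link becomes `https`, `//t` and `co/...`. The sketch below approximates that behaviour with a regular expression. The delimiter set is inferred from this output, not taken from Hivemall's source, and `tokenize` here is a hypothetical Python function, not the Hive UDF.

```python
import re

# Delimiters inferred from the sample output above (an assumption, not Hivemall's code):
# whitespace plus , . : ; ? ! [ ] and the apostrophe.
_DELIMS = re.compile(r"[\s,.:;?!\[\]']+")


def tokenize(text: str) -> list:
    """Split text on the inferred delimiter set, dropping empty tokens."""
    return [tok for tok in _DELIMS.split(text) if tok]


# A plausible original text for one of the rows above (reconstructed, not the stored tweet).
tweet = ("Route 33 In Wilton Closed Due To Water Main Break "
         "https://t.co/UQMksljRUm https://t.co/HRhin2QyOk")
print(tokenize(tweet))
```

For this reconstructed input the result lines up with the `Route 33 ...` row of the Hive output.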
I will be using Hivemall in future projects and expect to include it in a NiFi workflow for NLP processing and other machine learning operations.
The project has just joined Apache.
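The `hivemall.base91(hivemall.deflate(...))` query earlier in this post compresses text and then encodes the bytes as printable characters. For readers curious about the encoding step, here is a Python sketch of the basE91 scheme (Joachim Henke's algorithm). It is an assumption that Hivemall uses this exact alphabet and that its `deflate` matches Python's zlib framing, so this is not expected to reproduce `AA+=kaIM|WTt!+wbGAA`; the example only checks a round trip.

```python
import zlib

# Standard basE91 alphabet: 91 printable ASCII characters.
ALPHABET = ("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
            "0123456789!#$%&()*+,./:;<=>?@[]^_`{|}~\"")
DECODE = {ch: i for i, ch in enumerate(ALPHABET)}


def base91_encode(data: bytes) -> str:
    queue = nbits = 0
    out = []
    for byte in data:
        queue |= byte << nbits
        nbits += 8
        if nbits > 13:                 # enough bits for one 13- or 14-bit value
            val = queue & 8191
            if val > 88:               # 13 bits are enough
                queue >>= 13
                nbits -= 13
            else:                      # small value: take 14 bits instead
                val = queue & 16383
                queue >>= 14
                nbits -= 14
            out.append(ALPHABET[val % 91])
            out.append(ALPHABET[val // 91])
    if nbits:                          # flush the remaining bits
        out.append(ALPHABET[queue % 91])
        if nbits > 7 or queue > 90:
            out.append(ALPHABET[queue // 91])
    return "".join(out)


def base91_decode(text: str) -> bytes:
    val = -1
    queue = nbits = 0
    out = bytearray()
    for ch in text:
        d = DECODE.get(ch)
        if d is None:                  # ignore characters outside the alphabet
            continue
        if val == -1:
            val = d                    # first character of a pair
        else:
            val += d * 91
            queue |= val << nbits
            nbits += 13 if (val & 8191) > 88 else 14
            while True:                # emit every complete byte
                out.append(queue & 255)
                queue >>= 8
                nbits -= 8
                if nbits <= 7:
                    break
            val = -1
    if val != -1:                      # a trailing single character
        out.append((queue | val << nbits) & 255)
    return bytes(out)


text = b"aaaaaaaaaaaaaaaabbbbccc"
packed = base91_encode(zlib.compress(text))
print(packed)
print(zlib.decompress(base91_decode(packed)) == text)
```

Compared with Base64, basE91 packs 13 or 14 bits into every two characters, which is why it suits short compressed payloads stored as text.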
11-22-2016
10:44 PM
Download URLCrazy (http://www.morningstarsecurity.com/downloads/urlcrazy-0.5.tar.gz)
An Example Command Line Run for URLCrazy
[root@tspanndev13 security]# ./url.sh dataflowdeveloper.com
Typo Type,Typo,Valid,Pop,DNS-A,CC-A,Country-A,DNS-MX,Extn
Character Omission,daaflowdeveloper.com,true,,,?,,com,
Character Omission,dataflodeveloper.com,true,,,?,,com,
Character Omission,dataflowdeeloper.com,true,,,?,,com,
Character Omission,dataflowdeveloer.com,true,,,?,,com,
Character Omission,dataflowdevelope.com,true,,,?,,com,
Character Omission,dataflowdeveloper.cm,true,,,?,,cm,
Character Omission,dataflowdeveloper.co,false,,,?,,,
Character Omission,dataflowdeveloper.om,false,,,?,,,
Character Omission,dataflowdevelopercom,false,,,?,,,
...
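The Character Omission, Repeat and Swap rows above are simple string edits of the domain. Below is a rough Python approximation of those three generators. It is not URLCrazy's code (URLCrazy is a Ruby tool with more typo types plus DNS and popularity checks), its output will not necessarily match URLCrazy's list, and all function names are hypothetical.

```python
def omissions(domain: str) -> list:
    """Drop one character at each position (Character Omission)."""
    variants = {domain[:i] + domain[i + 1:] for i in range(len(domain))}
    return sorted(variants - {domain})


def repeats(domain: str) -> list:
    """Double one character at each position (Character Repeat)."""
    variants = {domain[:i] + domain[i] + domain[i:] for i in range(len(domain))}
    return sorted(variants - {domain})


def swaps(domain: str) -> list:
    """Swap each pair of adjacent, different characters (Character Swap)."""
    variants = {
        domain[:i] + domain[i + 1] + domain[i] + domain[i + 2:]
        for i in range(len(domain) - 1)
        if domain[i] != domain[i + 1]
    }
    return sorted(variants - {domain})


def typo_rows(domain: str) -> list:
    """(typo type, typo) pairs, similar in spirit to URLCrazy's first two CSV columns."""
    rows = []
    for label, generator in (("Character Omission", omissions),
                             ("Character Repeat", repeats),
                             ("Character Swap", swaps)):
        rows.extend((label, typo) for typo in generator(domain))
    return rows


for typo_type, typo in typo_rows("dataflowdeveloper.com")[:5]:
    print(f"{typo_type},{typo}")
```

Like URLCrazy, this also produces variants that break the TLD or the dot (for example `sparkdeveloper..com`), which is why URLCrazy reports a Valid column.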
Shell Script to Call From Apache NiFi
#!/bin/sh
/opt/demo/security/urlcrazy-0.5/urlcrazy -i -f csv -p "$@"
An Example Command Line Run for NSLookup
Non-authoritative answer:
sparkdeveloper.com text = "v=spf1 ip4:00.000.0.0/24 ip4:00.000.00.0/24 ip4:11.111.111.0/19 ?all"
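The TXT record returned above is an SPF policy. As a hedged sketch, the hypothetical function below evaluates only the `ip4:` and `all` mechanisms of such a record for a given sender IP and skips `a`, `mx`, `include` and modifiers, so it is not a full RFC 7208 implementation. The addresses above are masked, so the example uses the unmasked record from the `spf` attribute in the JSON output that follows.

```python
import ipaddress

# SPF qualifier prefixes and the result each one produces when its mechanism matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def spf_check_ip4(record: str, ip: str) -> str:
    """Simplified SPF check: only ip4 and all mechanisms are evaluated."""
    addr = ipaddress.ip_address(ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    for term in terms[1:]:
        qualifier = "+"                       # default qualifier is pass
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name == "ip4":
            if addr in ipaddress.ip_network(value, strict=False):
                return QUALIFIERS[qualifier]
        elif name == "all":
            return QUALIFIERS[qualifier]
        # Other mechanisms (a, mx, include, ...) and modifiers are ignored here.
    return "neutral"                          # nothing matched and no "all"


spf = "v=spf1 ip4:38.113.1.0/24 ip4:38.113.20.0/24 ip4:65.254.224.0/19 ?all"
print(spf_check_ip4(spf, "65.254.230.5"))
print(spf_check_ip4(spf, "10.42.1.20"))
```

Because this record ends in `?all`, any sender outside the listed ranges gets a neutral result rather than a fail.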
The Final JSON Output:
{
"path" : "./",
"execution.command" : "/opt/demo/security/url.sh",
"urlcrazy" : "Typo Type,Typo,Valid,Pop,DNS-A,CC-A,Country-A,DNS-MX,Extn\nCharacter Omission,sarkdeveloper.com,true,,,?,,com,\nCharacter Omission,spakdeveloper.com,true,,,?,,com,\nCharacter Omission,spardeveloper.com,true,,,?,,com,\nCharacter Omission,sparkdeeloper.com,true,,,?,,com,\nCharacter Omission,sparkdeveloer.com,true,,,?,,com,\nCharacter Omission,sparkdevelope.com,true,543,,?,,com,\nCharacter Omission,sparkdeveloper.cm,true,214000,,?,,cm,\nCharacter Omission,sparkdeveloper.co,false,,,?,,,\nCharacter Omission,sparkdeveloper.om,false,,,?,,,\nCharacter Omission,sparkdevelopercom,false,,,?,,,\nCharacter Omission,sparkdevelopr.com,true,,,?,,com,\nCharacter Omission,sparkdevelper.com,true,2190,,?,,com,\nCharacter Omission,sparkdeveoper.com,true,,,?,,com,\nCharacter Omission,sparkdevloper.com,true,2230,,?,,com,\nCharacter Omission,sparkdveloper.com,true,,,?,,com,\nCharacter Omission,sparkeveloper.com,true,,,?,,com,\nCharacter Omission,sprkdeveloper.com,true,,,?,,com,\nCharacter Repeat,spaarkdeveloper.com,true,,,?,,com,\nCharacter Repeat,sparkddeveloper.com,true,,,?,,com,\nCharacter Repeat,sparkdeeveloper.com,true,,,?,,com,\nCharacter Repeat,sparkdeveeloper.com,true,,,?,,com,\nCharacter Repeat,sparkdevelloper.com,true,,,?,,com,\nCharacter Repeat,sparkdevelooper.com,true,,,?,,com,\nCharacter Repeat,sparkdevelopeer.com,true,,,?,,com,\nCharacter Repeat,sparkdeveloper..com,false,,,?,,com,\nCharacter Repeat,sparkdeveloper.ccom,false,,,?,,,\nCharacter Repeat,sparkdeveloper.comm,false,,,?,,,\nCharacter Repeat,sparkdeveloper.coom,false,,,?,,,\nCharacter Repeat,sparkdeveloperr.com,true,2120,,?,,com,\nCharacter Repeat,sparkdevelopper.com,true,203,,?,,com,\nCharacter Repeat,sparkdevveloper.com,true,,,?,,com,\nCharacter Repeat,sparkkdeveloper.com,true,,,?,,com,\nCharacter Repeat,sparrkdeveloper.com,true,,,?,,com,\nCharacter Repeat,spparkdeveloper.com,true,,,?,,com,\nCharacter Repeat,ssparkdeveloper.com,true,,,?,,com,\nCharacter Swap,psarkdeveloper.com,true,,,?,,com,\nCharacter Swap,saprkdeveloper.com,true,,,?,,com,\nCharacter Swap,spakrdeveloper.com,true,,,?,,com,\nCharacter Swap,spardkeveloper.com,true,,,?,,com,\nCharacter Swap,sparkdeevloper.com,true,,,?,,com,\nCharacter Swap,sparkdeveloepr.com,true,,,?,,com,\nCharacter Swap,sparkdevelope.rcom,false,,,?,,,\nCharacter Swap,sparkdeveloper.cmo,false,,,?,,,\nCharacter Swap,sparkdeveloper.ocm,false,,,?,,,\nCharacter Swap,sparkdeveloperc.om,false,,,?,,,\nCharacter Swap,sparkdevelopre.com,true,,,?,,com,\nCharacter Swap,sparkdevelpoer.com,true,,,?,,com,\nCharacter Swap,sparkdeveolper.com,true,,,?,,com,\nCharacter Swap,sparkdevleoper.com,true,,,?,,com,\nCharacter Swap,sparkdveeloper.com,true,,,?,,com,\nCharacter Swap,sparkedveloper.com,true,,,?,,com,\nCharacter Swap,sprakdeveloper.com,true,18,,?,,com,\nCharacter Replacement,aparkdeveloper.com,true,129,,?,,com,\nCharacter Replacement,dparkdeveloper.com,true,,,?,,com,\nCharacter Replacement,soarkdeveloper.com,true,,,?,,com,\nCharacter Replacement,spaekdeveloper.com,true,,,?,,com,\nCharacter Replacement,sparjdeveloper.com,true,,,?,,com,\nCharacter Replacement,sparkdebeloper.com,true,,,?,,com,\nCharacter Replacement,sparkdeceloper.com,true,,,?,,com,\nCharacter Replacement,sparkdevekoper.com,true,,,?,,com,\nCharacter Replacement,sparkdeveliper.com,true,,,?,,com,\nCharacter Replacement,sparkdevelooer.com,true,92,,?,,com,\nCharacter Replacement,sparkdevelopee.com,true,,,?,,com,\nCharacter Replacement,sparkdeveloper.cim,false,,,?,,,\nCharacter Replacement,sparkdeveloper.con,false,,,?,,,\nCharacter Replacement,sparkdeveloper.cpm,false,,,?,,,\nCharacter Replacement,sparkdeveloper.vom,false,,,?,,,\nCharacter Replacement,sparkdeveloper.xom,false,,,?,,,\nCharacter Replacement,sparkdevelopet.com,true,,,?,,com,\nCharacter Replacement,sparkdeveloprr.com,true,,,?,,com,\nCharacter Replacement,sparkdevelopwr.com,true,,,?,,com,\nCharacter Replacement,sparkdevelpper.com,true,,,?,,com,\nCharacter Replacement,sparkdevrloper.com,true,,,?,,com,\nCharacter Replacement,sparkdevwloper.com,true,,,?,,com,\nCharacter Replacement,sparkdrveloper.com,true,,,?,,com,\nCharacter Replacement,sparkdwveloper.com,true,,,?,,com,\nCharacter Replacement,sparkfeveloper.com",
"filename" : "4963644600105857",
"execution.command.args" : "sparkdeveloper.com",
"execution.status" : "0",
"spf" : "Server:\t\t10.42.1.20\nAddress:\t10.42.1.20#53\n\nNon-authoritative answer:\nsparkdeveloper.com\ttext = \"v=spf1 ip4:38.113.1.0/24 ip4:38.113.20.0/24 ip4:65.254.224.0/19 ?all\"\n\nAuthoritative answers can be found from:\n\n",
"execution.error" : "",
"uuid" : "f13ca0f5-bac7-4da7-b5c3-8b1c145591bf",
"url" : "sparkdeveloper.com",
"enrich.dns.record0.group0" : "\"v=spf1 ip4:00.000.0.0/24 ip4:00.000.00.0/24 ip4:11.111.111.0/19 ?all\""
}
You can capture many different command-line and REST results to augment existing data, tools and feeds.
A URLCrazy report gives useful intelligence about look-alike domains close to yours that other people may be squatting on. Such domains are often used for spam, malware and other nefarious purposes.
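Because the `urlcrazy` attribute holds URLCrazy's CSV output as a single string, a downstream step can parse it to pick out the look-alike domains worth watching. A minimal Python sketch, assuming the flowfile attributes have been written out as JSON as shown above; the function name and the popularity threshold are illustrative, and the sample rows are copied from that output.

```python
import csv
import io
import json


def popular_typos(attributes_json: str, min_pop: int = 1) -> list:
    """Return (typo, popularity) for rows marked Valid whose Pop column is at least min_pop."""
    attributes = json.loads(attributes_json)
    reader = csv.DictReader(io.StringIO(attributes["urlcrazy"]))
    hits = []
    for row in reader:
        pop = (row.get("Pop") or "").strip()   # truncated rows yield None here
        if row.get("Valid") == "true" and pop.isdigit() and int(pop) >= min_pop:
            hits.append((row["Typo"], int(pop)))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# A few rows from the output above, including its truncated last row.
sample = json.dumps({
    "url": "sparkdeveloper.com",
    "urlcrazy": "\n".join([
        "Typo Type,Typo,Valid,Pop,DNS-A,CC-A,Country-A,DNS-MX,Extn",
        "Character Omission,sparkdevelope.com,true,543,,?,,com,",
        "Character Omission,sparkdeveloper.cm,true,214000,,?,,cm,",
        "Character Omission,sparkdeveloper.co,false,,,?,,,",
        "Character Replacement,sparkfeveloper.com",
    ]),
})
print(popular_typos(sample))
```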
11-20-2016
02:29 AM
Resolved my issue by removing Snappy, as suggested by JSS.
01-10-2017
04:50 PM
https://help.sumologic.com/Send_Data/Sources/02Sources_for_Hosted_Collectors/HTTP_Source
01-05-2017
03:07 PM
Hi, I was able to resolve the issue. On one of the nodes, disk utilization in the local directory (where the log and out files are created) was above the yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage setting.
I freed up some space and also set max-disk-utilization-per-disk-percentage to a much higher value. Thanks, Aparna
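To spot a directory approaching the `yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage` limit before the NodeManager marks it bad, you can compute each directory's filesystem utilization. This sketch uses Python's `shutil.disk_usage`. YARN's own calculation may differ in detail, the directory list is a placeholder for your `yarn.nodemanager.local-dirs` and `yarn.nodemanager.log-dirs`, and the 90.0 threshold is only an example to replace with your configured value.

```python
import shutil
import tempfile


def utilization_percent(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use (used / total)."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(paths, threshold_percent: float) -> dict:
    """Map each path whose filesystem utilization exceeds the threshold to that utilization."""
    report = {}
    for path in paths:
        pct = utilization_percent(path)
        if pct > threshold_percent:
            report[path] = round(pct, 1)
    return report


# Placeholder directories: substitute your NodeManager local and log directories.
dirs = [tempfile.gettempdir()]
print(over_threshold(dirs, threshold_percent=90.0))
```

An empty dict means every listed directory is below the threshold; any entry names a directory that would need space freed or a higher limit, as in the fix described above.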
11-21-2016
05:55 AM
Yes, I did. I had to change every eth0 in the Vora Manager UI as well. Vora is now running fine.