Member since: 05-23-2016
Posts: 30
Kudos Received: 5
Solutions: 2
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 195 | 07-29-2016 01:37 AM
 | 338 | 07-13-2016 01:59 AM
10-17-2016
09:06 AM
Thanks for sharing! These sound like good first steps; I'm wondering if there are any other best practices out there as well.
10-11-2016
09:07 AM
Hi, I am exploring ingesting data from an organization's existing data warehouse / databases into Hadoop (HDFS & Hive). One of the concerns is ensuring data integrity, e.g. the number of rows and the correctness of the data. Are there any best practices or approaches to assure our operational users that the data imported from their source databases into Hadoop is accurate and complete? A basic check along the lines sketched below is what I have in mind. Thanks!
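A minimal sketch of the kind of row-count reconciliation I mean, run from spark-shell (where sqlContext is predefined); the JDBC URL and table names are placeholders, not from any actual setup:

// count the rows in the source table directly over JDBC (placeholder URL/table)
val sourceCount = sqlContext.read.format("jdbc")
  .option("url", "jdbc:postgresql://dw-host:5432/dw")
  .option("dbtable", "source_schema.orders")
  .load()
  .count()
// count the rows in the Hive copy (placeholder database/table)
val hiveCount = sqlContext.sql("SELECT COUNT(*) FROM hive_db.orders").first().getLong(0)
assert(sourceCount == hiveCount, s"Row count mismatch: source=$sourceCount hive=$hiveCount")

Beyond plain counts, comparing column-level aggregates (min/max/sum or checksums) on both sides would give a stronger signal that the data itself, not just the row count, survived the transfer.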
08-19-2016
12:12 PM
Hi @Tamas Bihari, I think that by default the Control Plane must be given access to the cluster when the instances are being created. Otherwise, the Remote Access CIDR IP field is pointless right now. Alternatively, there needs to be an option to input multiple CIDR IPs. Thanks,
KC
08-19-2016
02:10 AM
Hi @Tamas Bihari, thanks for your assistance! It seems the error was caused by my limiting the Remote Access CIDR IP during setup to my own IP, which may have prevented Cloudbreak on the Control Plane from accessing the instances. This, though, appears to me to be a design flaw; please correct me if I am mistaken. Also, the error should be thrown much earlier, rather than after a four-hour wait for the job to fail. Let me know if this is the right place to highlight these issues or if there is another channel I should post them to. Best regards,
KC
08-17-2016
10:20 AM
Hi @Ashnee Sharma, the whole installation process is supposed to be automated by Hortonworks Cloud, so I'm confused about why there would be any firewall or SSH issues. What specifically should I check for, and is there a more detailed log regarding the error?
08-17-2016
01:40 AM
Great catch! Thanks a lot for your assistance, it is working now.
08-16-2016
11:00 PM
Hi all, I'm trying out the new Hortonworks Data Cloud but am encountering an error: "Infrastructure creation failed. Reason: Operation timed out. Could not reach ssh connection in time". Any advice on what is causing this issue? Also, is there a way to check / track the progress of the install? The information in the UI is quite limited, and it took nearly four hours before the job failed. Thanks!
08-16-2016
01:32 PM
Hi @Artem Ervits, I came across and tried the suggestion on that page, but I still encountered the same errors after commenting out the kill and shutdown statements... the error appears to occur during the submission / startup stage?
08-16-2016
08:03 AM
Hi @jeff, moving forward, is the plan to support more combinations and/or more flexibility in how combinations are set up?
08-16-2016
06:19 AM
I'm running a sample topology from storm-starter in local mode but am encountering an error:
14:16:24.038 [Thread-16] ERROR o.a.s.e.EventManagerImp - {} Error when processing event
java.lang.RuntimeException: java.io.IOException: Unable to delete file: C:\Users\<UserId>\AppData\Local\Temp\ef9408c7-6ad6-432f-928c-e01ded7f4c33\supervisor\tmp\b44d290f-ff15-46cf-b39e-ab02cf93a451\stormconf.ser
at org.apache.storm.daemon.supervisor.SyncSupervisorEvent.run(SyncSupervisorEvent.java:173) ~[storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.storm.event.EventManagerImp$1.run(EventManagerImp.java:54) [storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
Caused by: java.io.IOException: Unable to delete file: C:\Users\<UserId>\AppData\Local\Temp\ef9408c7-6ad6-432f-928c-e01ded7f4c33\supervisor\tmp\b44d290f-ff15-46cf-b39e-ab02cf93a451\stormconf.ser
at org.apache.storm.shade.org.apache.commons.io.FileUtils.forceDelete(FileUtils.java:2381) ~[storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.storm.shade.org.apache.commons.io.FileUtils.cleanDirectory(FileUtils.java:1679) ~[storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.storm.shade.org.apache.commons.io.FileUtils.deleteDirectory(FileUtils.java:1575) ~[storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.storm.shade.org.apache.commons.io.FileUtils.moveDirectory(FileUtils.java:2916) ~[storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.storm.daemon.supervisor.SyncSupervisorEvent.downloadLocalStormCode(SyncSupervisorEvent.java:354) ~[storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.storm.daemon.supervisor.SyncSupervisorEvent.downloadStormCode(SyncSupervisorEvent.java:326) ~[storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
at org.apache.storm.daemon.supervisor.SyncSupervisorEvent.run(SyncSupervisorEvent.java:122) ~[storm-core-2.0.0-SNAPSHOT.jar:2.0.0-SNAPSHOT]
... 1 more
I would appreciate any assistance in solving this. Thanks!
07-29-2016
01:37 AM
Ok, it turns out that core-site.xml, hbase-site.xml, and hdfs-site.xml are embedded in the Metron jars at /usr/metron/0.2.0BETA/lib, so I needed to update the hostname references in those files to get things working again.
07-27-2016
11:50 PM
I have Metron deployed on a single node on AWS. Recently I had to update the hostname to use the AWS private DNS instead of the public DNS (which changes with each reboot). I think I have most of the services working after the update, but I still have some issues with Storm.
A sample of the Storm worker logs is copied below. In particular, o.a.h.i.Client seems to still be referring to the old public EC2 domain name, but I have been unable to figure out where that config is specified. Could someone point me to where that particular variable is stored?
2016-07-27 06:41:21.625 s.k.ZkCoordinator [INFO] Task [1/1] Deleted partition managers: []
2016-07-27 06:41:21.625 s.k.ZkCoordinator [INFO] Task [1/1] New partition managers: []
2016-07-27 06:41:21.625 s.k.ZkCoordinator [INFO] Task [1/1] Finished refreshing
2016-07-27 06:41:22.253 b.s.m.n.Server [INFO] Getting metrics for server on port 6704
2016-07-27 06:41:24.037 o.a.h.i.Client [INFO] Retrying connect to server: ec2-54-213-184-142.us-west-2.compute.amazonaws.com/54.213.184.142:8020. Already tried 32 time(s); maxRetries=45
2016-07-27 06:41:44.058 o.a.h.i.Client [INFO] Retrying connect to server: ec2-54-213-184-142.us-west-2.compute.amazonaws.com/54.213.184.142:8020. Already tried 33 time(s); maxRetries=45
2016-07-27 06:42:04.078 o.a.h.i.Client [INFO] Retrying connect to server: ec2-54-213-184-142.us-west-2.compute.amazonaws.com/54.213.184.142:8020. Already tried 34 time(s); maxRetries=45
2016-07-27 06:42:21.626 s.k.ZkCoordinator [INFO] Task [1/1] Refreshing partition manager connections
2016-07-27 06:42:21.627 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=ip-10-0-0-21.us-west-2.compute.internal:6667}}
2016-07-27 06:42:21.627 s.k.KafkaUtils [INFO] Task [1/1] assigned [Partition{host=ip-10-0-0-21.us-west-2.compute.internal:6667, partition=0}]
2016-07-27 06:42:21.628 s.k.ZkCoordinator [INFO] Task [1/1] Deleted partition managers: []
2016-07-27 06:42:21.628 s.k.ZkCoordinator [INFO] Task [1/1] New partition managers: []
2016-07-27 06:42:21.628 s.k.ZkCoordinator [INFO] Task [1/1] Finished refreshing
2016-07-27 06:42:22.254 b.s.m.n.Server [INFO] Getting metrics for server on port 6704
2016-07-27 06:42:24.104 o.a.h.i.Client [INFO] Retrying connect to server: ec2-54-213-184-142.us-west-2.compute.amazonaws.com/54.213.184.142:8020. Already tried 35 time(s); maxRetries=45
2016-07-27 06:42:44.121 o.a.h.i.Client [INFO] Retrying connect to server: ec2-54-213-184-142.us-west-2.compute.amazonaws.com/54.213.184.142:8020. Already tried 36 time(s); maxRetries=45
2016-07-27 06:43:04.139 o.a.h.i.Client [INFO] Retrying connect to server: ec2-54-213-184-142.us-west-2.compute.amazonaws.com/54.213.184.142:8020. Already tried 37 time(s); maxRetries=45
2016-07-27 06:43:21.629 s.k.ZkCoordinator [INFO] Task [1/1] Refreshing partition manager connections
2016-07-27 06:43:21.630 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=ip-10-0-0-21.us-west-2.compute.internal:6667}}
2016-07-27 06:43:21.631 s.k.KafkaUtils [INFO] Task [1/1] assigned [Partition{host=ip-10-0-0-21.us-west-2.compute.internal:6667, partition=0}]
07-13-2016
04:11 AM
3 Kudos
You can also add an additional Storm worker in Ambari -> Storm -> Configs -> supervisor.slot.ports by appending an additional port to the list; see the example below.
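For instance, assuming the usual default four-port list, appending 6704 yields a fifth worker slot (each port in the list corresponds to one worker slot per supervisor node):

supervisor.slot.ports: [6700, 6701, 6702, 6703, 6704]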
07-13-2016
01:59 AM
1 Kudo
Managed to get this working by modifying three files.

/metron-deployment/amazon-ec2/playbook.yml:
- include: tasks/create-hosts.yml host_count=1 host_type=sensors,ambari_master,metron_kafka_topics,metron_hbase_tables,metron,ec2,pcap_server,ambari_slave,web,mysql,search,enrichment

/metron-deployment/amazon-ec2/defaults.yml:
instance_type: m4.4xlarge
cluster_type: single_node_vm

/metron-deployment/roles/ambari_config/var/single_node_vm.yml:
groups:
  hosts: "{{groups.ambari_master}}"
07-12-2016
12:56 AM
Thanks for the link, Mukesh. I have already followed it and set up the 10-node AWS cluster. I was wondering whether we could reduce the number of nodes in the cluster, which isn't covered in the instruction guide.
07-11-2016
09:03 AM
Hi, is it possible to deploy Metron on a single node (or at most a 3-node cluster) on AWS? We managed to follow the instructions to set up the 10-node AWS cluster, but for exploration purposes we don't need such a large setup. Simply changing the number of hosts / roles in the playbook.yml file doesn't seem to work either, as it breaks the installation at various points.
Tags: CyberSecurity, Metron
06-20-2016
06:12 AM
Thanks @Gangadhar Kadam, I understand that this is a possible workaround, but if I want the two columns kept separate, is that possible, or do I need to do the pivot twice and join the tables?
06-17-2016
08:43 AM
In the notebook where you are trying to run the %sh command, click on the gear icon, which will bring up the list of interpreter bindings. Verify that the %sh interpreter is selected and highlighted in blue.
06-17-2016
07:04 AM
Thanks! Finally got it working.
06-17-2016
01:40 AM
For example, I have a Spark DataFrame with three columns: 'Domain', 'ReturnCode', and 'RequestType'.

Example starting DataFrame:
www.google.com,200,GET
www.google.com,300,GET
www.espn.com,200,POST

I would like to pivot on Domain and get aggregate counts for the various ReturnCodes and RequestTypes. Do I have to pivot each column separately and then join the results back together, or is it possible to do it in one step?

Desired DataFrame:
Domain,200,300,GET,POST
www.google.com,1,1,2,0
www.espn.com,1,0,0,1

Example of pivot code with a join:
val dfa = df.groupBy("Domain").pivot("ReturnCode").count()
val dfb = df.groupBy("Domain").pivot("RequestType").count()
dfa.join(dfb, Seq("Domain")).show()
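
One idea I had for doing it in a single pass, though I haven't verified it (a sketch, assuming Spark 1.6's pivot support; the column stacking is the part I'm unsure about):

// stack both columns into one key column, then pivot once
val stacked = df.select(df("Domain"), df("ReturnCode").cast("string").as("key"))
  .unionAll(df.select(df("Domain"), df("RequestType").as("key")))
val oneStep = stacked.groupBy("Domain").pivot("key").count().na.fill(0)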
06-16-2016
01:35 AM
Thanks @Rajkumar Singh for the suggestion. Are you able to provide some instructions, or direct me to some resources, on how I should shade the jar?
06-15-2016
07:30 AM
@Rajkumar Singh yeah, I removed the jackson-databind jar in the Flink folder; the zeppelin-spark-dependencies jar is still there. How do I determine which version / jar is being used by Zeppelin / Spark at run-time? Something like the check below is what I'm after.
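(A sketch using the standard Java API, run from a notebook paragraph; it prints the jar the jackson-databind classes are actually loaded from, assuming the class itself resolves:)

// print the code source (i.e. the jar) that supplied jackson-databind
println(classOf[com.fasterxml.jackson.databind.ObjectMapper]
  .getProtectionDomain.getCodeSource.getLocation)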
06-15-2016
06:25 AM
Yeah, I have tried that but am still encountering the same error.
06-15-2016
05:52 AM
Hi @Rajkumar Singh, I believe the jar in question is jackson-databind. For Spark / Zeppelin, which classpaths / directories should I be looking at?
06-15-2016
03:32 AM
Hi, I'm experimenting with using Zeppelin / Spark to perform geo-location of IP addresses using the MaxMind GeoIP library. I am encountering a NoSuchMethodError which, from reading the forums, appears to be a dependency issue, with the method not existing in certain versions of the jackson lib. How can I go about identifying and resolving this dependency issue in Zeppelin? I load geoip2 via %dep and have removed the older versions of the jackson lib from zeppelin/lib to no avail. Thanks!

%dep
z.addRepo("geoip2").url("http://mvnrepository.com/artifact/com.maxmind.geoip2/geoip2/2.7.0")
z.load("com.maxmind.geoip2:geoip2:2.7.0")

The error:
java.lang.NoSuchMethodError: com.fasterxml.jackson.databind.node.ArrayNode.<init>(Lcom/fasterxml/jackson/databind/node/JsonNodeFactory;Ljava/util/List;)V
06-10-2016
02:49 AM
@Laurence Da Luz Thanks. I checked the inferred schema and verified the issue: it inferred a column as 'long' when some rows contained 'float' values.
06-09-2016
10:03 AM
@Laurence Da Luz Manually defining the schema worked, thanks for the suggestion. I am curious, though, what went wrong with the inference processor.
06-09-2016
09:52 AM
@Laurence Da Luz Yes, I am using the 'InferAvroSchema' processor. Is it possible to output and view the schema somewhere? I'll also try manually defining the schema.
06-09-2016
09:49 AM
@Pierre Villard nothing other than:
2016-06-09 05:50:43,503 WARN [Timer-Driven Process Thread-7] o.a.n.processors.kite.ConvertCSVToAvro ConvertCSVToAvro[id=2238d74a-0635-401d-b51c-45ca87a4cfb9] Failed to convert 1055/6031 records from CSV to Avro
06-09-2016
08:14 AM
1 Kudo
I'm using the NiFi ConvertCSVToAvro processor, with an InferAvroSchema processor upstream to obtain the schema. I'm getting the error "Failed to convert 1031/6058 records", so most of the records are converted successfully, but I'm unable to figure out why the remaining rows failed. How do I go about debugging / identifying the reasons for the failed rows, and are there typical reasons for this kind of failure?