Member since: 02-10-2016
Posts: 50
Kudos Received: 14
Solutions: 5
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1835 | 02-08-2017 05:53 AM |
| | 1769 | 02-02-2017 11:39 AM |
| | 5801 | 01-27-2017 06:17 PM |
| | 2107 | 01-27-2017 04:43 PM |
| | 2731 | 01-27-2017 01:57 PM |
02-09-2017
08:28 PM
No, it is not possible: "A pivot is an aggregation where one (or more in the general case) of the grouping columns has its distinct values transposed into individual columns" Source: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
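As a quick PySpark illustration of why (the data and column names below are made up): without an aggregate, pivot() leaves you with a GroupedData object rather than a DataFrame:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("pivot-check").getOrCreate()

# Hypothetical data for demonstration purposes.
df = spark.createDataFrame([("2016", "US", 1), ("2016", "EU", 2)],
                           ["year", "country", "amount"])

grouped = df.groupBy("year").pivot("country")
print(type(grouped))  # pyspark.sql.group.GroupedData -- not a DataFrame

# Only an aggregation turns the pivot back into a DataFrame:
grouped.sum("amount").show()
```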
02-08-2017
05:53 AM
1 Kudo
Very good question! Let's dig into Hadoop's source to find out. The audit log uses java.net.InetAddress's toString() method to obtain a text representation of the address: https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java#L7049 InetAddress.toString() returns the information in "hostname/ip" format. If the hostname is not resolvable (i.e. the reverse lookup fails), the string starts with a bare slash: http://docs.oracle.com/javase/7/docs/api/java/net/InetAddress.html#toString()
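You can mimic the same behavior outside Hadoop with a minimal Python sketch (the address below is a made-up documentation example, not from any real log):

```python
import socket

ip = "192.0.2.10"  # hypothetical address; substitute one from your audit log

try:
    # gethostbyaddr performs a reverse (PTR) lookup, the same kind of
    # resolution InetAddress relies on for the hostname part.
    hostname, _, _ = socket.gethostbyaddr(ip)
    print("%s/%s" % (hostname, ip))  # audit log form: "hostname/ip"
except socket.herror:
    # No PTR record: the rendered address starts with a bare slash.
    print("/%s" % ip)
```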
02-03-2017
01:56 PM
It really depends on your use case and latency requirements. If you need to store Storm's results in HDFS, you can use a Storm HDFS Bolt. If you only need to store the source data, I'd suggest writing it to HDFS directly from Kafka or Flume instead. That results in lower latency on the Storm topology and better decoupling.
02-02-2017
12:15 PM
In Storm's nomenclature, 'nimbus' is the cluster manager: http://storm.apache.org/releases/1.0.1/Setting-up-a-Storm-cluster.html Spark calls the cluster manager the 'master': http://spark.apache.org/docs/latest/spark-standalone.html
02-02-2017
11:39 AM
Hello, Both Storm and Spark support local mode. In Storm you need to create a LocalCluster instance and then submit your topology to it. You can find a description and an example at these links: http://storm.apache.org/releases/1.0.2/Local-mode.html https://github.com/apache/storm/blob/1.0.x-branch/examples/storm-starter/src/jvm/org/apache/storm/starter/WordCountTopology.java#L98 Spark's approach to local mode is somewhat different. The allocation is controlled through the Spark master setting, which can be set to local (or local[*], or local[N] where N is a number). If local is specified, executors are started on your own machine (see the sketch below). Both Storm and Spark have monitoring capabilities through a web interface. You can find details about them here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_storm-component-guide/content/using-storm-ui.html http://spark.apache.org/docs/latest/monitoring.html YARN is not a requirement but an option for distributed mode; both Spark and Storm are able to function on their own.
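As an illustration of Spark's local mode, here's a minimal PySpark sketch (the app name and data are made up); with the master set to local[*], everything runs in-process on your machine:

```python
from pyspark.sql import SparkSession

# local[*] uses all available cores; local[N] would use N threads,
# and plain "local" a single thread.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("local-mode-demo")
         .getOrCreate())

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
print(df.count())  # runs on in-process executor threads, no cluster needed

spark.stop()
```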
02-01-2017
08:49 AM
Currently your parsing logic is based on a state machine. That approach won't map well onto Spark's programming model. In Spark you'd need to load your data into a Dataset/DataFrame (or RDD) and perform operations through that data structure. I don't think anybody here will convert your code to Spark for you, and learning Spark would be inevitable anyway if you need to maintain the ported code. The lowest-hanging fruit for you would be to try the PyPy interpreter, which is more performant than CPython: http://pypy.org/ I've noticed in your code that you are reading the file content in one go: lines = file.readlines() It would be more efficient to iterate through the file line by line: for line in open("Virtual_Ports.log", "r") (see the runnable sketch below). I'd also suggest using a profiler to see where your hotspots are. Hope this helps, Tibor
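P.S. A minimal runnable version of the streaming approach (process() is a hypothetical stand-in for your state-machine parsing logic):

```python
def process(line):
    # Hypothetical placeholder for your existing state-machine logic.
    pass

# The file object is iterated lazily, one line at a time, so the whole
# log never has to fit in memory; "with" also closes the file for you.
with open("Virtual_Ports.log", "r") as log_file:
    for line in log_file:
        process(line)
```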
01-29-2017
07:32 AM
Hi Sachin, SmartSense is available to Hortonworks customers who have signed up for a support plan. It monitors server configurations and provides suggestions if any of the services are misconfigured. If you do not have a Hortonworks Support Contract you can disable it. It seems that your original problem has been resolved. I'd suggest closing this thread by choosing a 'best answer' from the answers you think solved your problem. If you have further questions, please post a new question so that others can easily learn from it (without needing to understand the whole history of this thread). Thanks, Tibor
01-27-2017
09:08 PM
I believe Zeppelin only supports setting spark.app.name per interpreter at the moment. As a workaround you can try duplicating the default Spark interpreter and giving a unique spark.app.name to each newly created interpreter.
01-27-2017
08:10 PM
1 Kudo
After pivoting you need to run an aggregate function (e.g. sum) to get back a DataFrame/Dataset. After aggregation you'll be able to show() the data (see the sketch below). You can find an excellent overview of pivoting in this blog post: https://databricks.com/blog/2016/02/09/reshaping-data-with-pivot-in-apache-spark.html
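A minimal PySpark sketch of the pivot-then-aggregate flow (the data and column names are made up for illustration):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("pivot-demo").getOrCreate()

# Hypothetical sales data: year, country, amount.
df = spark.createDataFrame(
    [("2016", "US", 10), ("2016", "EU", 20), ("2017", "US", 30)],
    ["year", "country", "amount"])

# pivot() alone yields a GroupedData object; the sum() aggregation is
# what turns it back into a DataFrame that show() can display.
df.groupBy("year").pivot("country").sum("amount").show()
```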
01-27-2017
07:59 PM
It seems you have the hdp-select package installed from the 2.3.4.7 release while you are trying to install 2.3.6.0. Please provide further info to bring this problem to a resolution: Are you trying to upgrade from 2.3.4.7 to 2.3.6.0? Or have you installed 2.3.4.7's hdp-select tool manually?