Member since
07-18-2016
94
Posts
94
Kudos Received
20
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2613 | 08-11-2017 06:04 PM | |
2468 | 08-02-2017 11:22 PM | |
10013 | 07-10-2017 03:36 PM | |
18053 | 03-17-2017 01:27 AM | |
14876 | 02-24-2017 05:35 PM |
01-02-2017
08:34 PM
1 Kudo
Hi @rathna mohan, for Phoenix 4.0 and above, you'll want to use this syntax: HADOOP_CLASSPATH=/path/to/hbase-protocol.jar:/path/to/hbase/conf hadoop jar /usr/hdp/2.4.2.0-258/phoenix/phoenix-4.4.0.2.4.2.0-258-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --view UDM_TRANS --input /home/hadoop1/sample.csv
Make sure to change the path to hbase-protocol.jar and the path to HBase conf to match your environment. Give this a try and let me know if it helps. Additional Link for Phoenix Bulk Loading: https://phoenix.apache.org/bulk_dataload.html
... View more
12-19-2016
02:23 PM
@Johnny Fugers In this context, "predicting purchase", could mean a few different things (and ways that we could go about it). For example, if you are interested in predicting whether person 1 will purchase product A, then you can look their purchase history and/or you can look at similar purchases across a segment of customers. In the first scenario, you are basically working with probabilities (i.e. If I buy peanut butter every time I go to the store, then there's a high probability that I'll buy it on my next visit). Your predictive model should also take in to consideration other factors such as time of day, month (seasonality), storesID, etc. If you create a model for every customer, this could get expensive from a compute standpoint, so that is why many organizations segment customers into groups/cohorts based on behavior similarities. Predictive models are then built against these segments. A second approach would be to use market basket analysis. For example, when customer A purchases cereal, how likely are they to purchase milk. This factors in purchases across a segment of customers to look for "baskets" of similar purchases.
... View more
12-16-2016
12:21 AM
1 Kudo
@Johnny Fugers This could go a few different ways depending on the business objective. Is there a reason why you are asking about a supervised approach? If you are open to any ML approach, here's what I'd recommend... Recommendation Engine: The first thing that comes to mind is to use a recommendation engine. In spark 1.6, there's a collaborative filtering algorithm. This algorithm makes a recommendation based on a user's behavior and their similarity to other users. Frequent Patterns: Another interesting option, would be to perform frequent pattern mining. This examples provides a way to identify frequent item sets and association rules. Clustering: There are several additional clustering algorithms in Spark, which you can leverage to identify similar customer cohorts. You could also look to identify the total value of these cohorts based on their overall spending, or you could try to identify who they are based on purchases (i.e. single adults, elderly, parents, etc.). You might also want to check out time series analysis / forecasting if you want to predict spending, growth/seasonal patterns, etc. So most of my recommendations are to use an unsupervised approach, but that is what is a good fit for this type of data and use cases similar to this. ..If you'd prefer to take a supervised approach, you could always use a decision tree, random forest, or similar and try to predict the purchase value based on store, time of day, day of week, customer, and maybe even through in a customer segment variable (which you can derive using the above approach 🙂 ).
... View more
10-27-2016
02:29 PM
@Roberto Sancho You're schema structure is close, but you need to make a few modifications, like this: import org.apache.spark.sql.types._
val data = sc.parallelize("""{"CAMPO1":"xxxx","CAMPO2":"xxx","VARIABLE":{"V1":"xxx"}}""" :: Nil)
val schema = (new StructType)
.add("CAMPO1", StringType)
.add("CAMPO2", StringType)
.add("VARIABLE", (new StructType)
.add("V1", StringType))
sqlContext.read.schema(schema).json(data).select("VARIABLE.V1").show()
Please let me know if this works for you. Thanks!
... View more
10-21-2016
12:57 PM
@jordi
You can use the PostHTTP processor to send json (or another format) to Grafana using their API. Depending on your use case, you may want to add Kafka in there as the intermediate message/routing bus, especially if you are sending that data to other endpoints/services in addition to Grafana. An alternative option would be to send the data into Solr (using the PutSolrContentStream NiFi processor). From Solr, you can use the Banana UI to create realtime dashboard. If this answers your question, can you please click the "Accept" response. 🙂 And if you have additional questions or would like more detail, please let me know.
... View more
10-21-2016
01:50 AM
2 Kudos
@jordi There are a few ways to do this within NiFi, but I would recommend using the ExecuteScript processor. You can use this processor to execute a python script (which is in the link you provided). The response from pubnub can be captured using python (as json) and the routed to the EvaluateJsonPath processor, where the json variables can be parsed and processed as part of your NiFi flow. NiFi can also route the json or parsed values to HDFS, HBase, Hive, etc using the corresponding processors. Let me know if this helps.
... View more
10-20-2016
03:21 PM
Here's example python code that you can run from within NiFi's ExecuteProcess processor: import json
import java.io
from org.apache.commons.io import IOUtils
# Get flowFile Session
flowFile = session.get()
# Open data.json file and parse json values
filename = flowFile.getAttribute('filename')
filepath = flowFile.getAttribute('absolute.path')
data = json.loads(open(filepath + filename, "r").read())
data_value1 = data["record"]["value1"]
# Calculate arbitrary new value within python
new_value = data_value1 * 100
# Add/Put values to flowFile as new attributes
if (flowFile != None):
flowFile = session.putAttribute(flowFile, "from_python_string", "python string example")
flowFile = session.putAttribute(flowFile, "from_python_number", str(new_value))
session.transfer(flowFile, REL_SUCCESS)
session.commit()
... View more
10-20-2016
03:08 PM
2 Kudos
@Aruna dadi I have an example NiFi flow that may be helpful: https://github.com/zaratsian/nifi_python_executescript It shows how to execute a python script within a NiFi flow using the ExecuteProcess processor that @Matt Burgess mentioned. There are ways to process both the flowfile and the attributes as part of your python program, but you need to add a java command to initialize the flowfile object within python. Also as @Matt Burgess mentioned, you can use python to make an HTTP request directly. I prefer to use the "requests" package, such as: import requests r = requests.get("http://yourURL.com") You can also you a POST , PUT, or DELETE, HEAD, OPTIONS...
... View more
10-18-2016
01:29 PM
@Vijay Kanth
I assume you are running scala/spark code? Also, just as an FYI, the link that you shared above is for the latest version of Spark (which is currently 2.0.1). HDP 2.3 is an older version of the Hortonworks platform and it runs Spark 1.4.1. If you want to use the latest version of Spark, you can upgrade to HDP 2.5. But you should not need to upgrade to create a dataframe within Spark. Here's spark/scala code that I used within HDP 2.3 to generate a sample dataframe: val df = sqlContext.createDataFrame(Seq(
("1111", 10000,"M"),
("2222", 20000,"F"),
("3333", 30000,"M"),
("4444", 40000,"F")
)).toDF("id", "income","gender")
df.show() Try this out and let me know if it works.
... View more
10-13-2016
04:23 PM
2 Kudos
@Deepak Subhramanian I got this to work in python 2.6 by following these steps: 1.) Download the graphframes-0.2.0-spark1.6-s_2.10.zip file from here 2.) Download the graphframes-0.2.0-spark1.6-s_2.10.jar file from here 3.) Unzip graphframes-0.2.0-spark1.6-s_2.10.zip 4.) Navigate to the python directory: cd ./graphframes-0.2.0-spark1.6-s_2.10/python
5.) Zip up the contents contained within this directory: zip mypyfiles.zip * -r 6. Launch pyspark: ./bin/pyspark --py-files mypyfiles.zip --jars graphframes-0.2.0-spark1.6-s_2.10.jar
Give that a shot - let me know how it goes.
... View more