Member since 10-14-2016 | 12 Posts | 1 Kudos Received | 1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2397 | 11-28-2016 01:47 PM |
10-20-2017
06:52 AM
For a standalone cluster, try spark-submit --master spark://<master-ip>:<spark-port> to submit the job.
10-20-2017
06:34 AM
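Try this code (it assumes emp.csv has exactly two comma-separated columns, dept and ctc):
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext

conf1 = SparkConf().setAppName('sort_desc')
sc1 = SparkContext(conf=conf1)
# Creating the SQLContext is what makes toDF() available on RDDs.
sql_context = SQLContext(sc1)

csv_file_path = 'emp.csv'
# Split each line into a (dept, ctc) tuple so sortByKey can use dept as the key.
employee_rdd = sc1.textFile(csv_file_path).map(lambda line: tuple(line.split(',')))
print(type(employee_rdd))
# Sort by key (dept) in descending order.
employee_rdd_sorted = employee_rdd.sortByKey(ascending=False)
employee_df = employee_rdd.toDF(['dept', 'ctc'])
employee_df_sorted = employee_rdd_sorted.toDF(['dept', 'ctc'])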
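To check the ordering you should expect from the snippet above without starting Spark, here is a minimal plain-Python sketch. The sample rows are made up (they are not from the real emp.csv), and the built-in sorted() only mirrors what sortByKey(ascending=False) does on a single machine; it is not a substitute for the Spark job.

```python
# Hypothetical sample rows standing in for the contents of emp.csv.
lines = ["sales,5000", "hr,4000", "it,7000"]

# Same split-on-comma parsing as the Spark snippet: each line becomes (dept, ctc).
rows = [tuple(line.split(",")) for line in lines]

# Descending sort on the first field, mirroring sortByKey(ascending=False).
rows_sorted = sorted(rows, key=lambda kv: kv[0], reverse=True)

for dept, ctc in rows_sorted:
    print(dept, ctc)
```

Note that the key here is dept and it is compared as a string. If you actually want to sort by ctc, the key has to be the numeric value, for example employee_rdd.sortBy(lambda kv: int(kv[1]), ascending=False).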
03-29-2017
06:21 AM
Hi, I have a query that contains many joins. I want to create a DataFrame or Dataset from that query (not from a single table) in Scala.
Labels: Apache Spark
03-16-2017
01:54 PM
Hi All, in my NiFi flow I have two processors: GetFTP and PutS3Object. Suppose there is one file, a.txt, on the FTP server. After it is transferred, a.txt has a timestamp of 12:00:00 in S3. Some time later another file, b.txt, is put on the FTP server. S3 then has both files, but the timestamp in S3 has changed for both a.txt and b.txt:
a.txt 12:01:00
b.txt 12:01:00
Labels: Apache NiFi
11-28-2016
01:47 PM
1 Kudo
Hi @Raf Mohammed, if you want to do real-time analysis on Twitter data, do not go with Hive or traditional reporting tools. Use Flume to pull the data, store it in Elasticsearch, and do the visualization in Kibana. If you want real-time analytics such as sentiment analysis, try Flume + Spark Streaming + Elasticsearch + Kibana.