Member since 06-18-2016
52 Posts
14 Kudos Received
0 Solutions
09-06-2016
08:49 AM
Hi, I have this data in a text file:
1 4
2 5
2 2
1 5
How can I use Spark and Scala to identify the rows in which a number is repeated within the same row? And how can I delete them? In this case I want to remove the third row...
Many thanks!
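One possible approach to the question above, as a minimal sketch: split each row on whitespace and compare the number of tokens with the number of distinct tokens. The names `RepeatedRows` and `hasRepeat` are hypothetical, and the sample rows are hard-coded here so that the example runs without a cluster. In Spark the same predicate could presumably be applied with `sc.textFile(...).filter(line => !hasRepeat(line))`.

```scala
object RepeatedRows {
  // True when any value occurs more than once in the row.
  def hasRepeat(row: String): Boolean = {
    val tokens = row.trim.split("\\s+")
    tokens.distinct.length != tokens.length
  }

  def main(args: Array[String]): Unit = {
    // Sample rows taken from the question; a real job would read them from a file.
    val lines = Seq("1 4", "2 5", "2 2", "1 5")

    // Keep only the rows without a repeated value.
    val kept = lines.filterNot(hasRepeat)
    kept.foreach(println)
  }
}
```

Rows for which `hasRepeat` returns true are the ones to report; filtering them out gives the cleaned data.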
Labels:
- Apache Spark
09-05-2016
02:09 PM
I'm trying to return this:
val output = vertices.map(_.split(" ")).toArray
09-05-2016
01:57 PM
I'm trying to save my Array to HDFS. For that I have this:
array.saveAsTextFile("PATH")
but when I submit this I'm getting this error:
error: value saveAsTextFile is not a member of Array[Array[String]]
Does anyone know how to solve this?
Many thanks!
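The error most likely occurs because `saveAsTextFile` is a method of Spark's RDD, not of a plain Scala `Array`. A minimal sketch of one way around it, assuming the inner arrays should be written as space-separated lines (the separator and the names `ArrayToLines` and `toLines` are assumptions): first turn each inner array into a line of text, then hand the result to `sc.parallelize` before saving.

```scala
object ArrayToLines {
  // Join each inner array into one space-separated line of text.
  def toLines(array: Array[Array[String]]): Array[String] =
    array.map(_.mkString(" "))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data standing in for the real array.
    val array = Array(Array("1", "4"), Array("2", "5"))
    toLines(array).foreach(println)

    // In spark-shell, where `sc` is the SparkContext, the lines could then be saved with:
    //   sc.parallelize(toLines(array)).saveAsTextFile("PATH")
  }
}
```

Joining with `mkString` first matters because saving an RDD of arrays directly would write each array's default `toString` rather than its contents.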
Labels:
- Apache Hadoop
- Apache Spark
08-25-2016
12:26 PM
Hi experts,
I have a .csv file stored in HDFS and I need to do 3 steps:
a) Create a Parquet file format
b) Load the data from the .csv into the Parquet file
c) Store the Parquet file in a new HDFS directory
I completed the first step using Apache Hive:
create external table parquet_file (ID BIGINT, Date TimeStamp, Size Int)
ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe'
STORED AS
INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat"
LOCATION '.../filedirectory';
How can I complete tasks b) and c)? Many thanks!
Labels:
- Apache Hive
08-22-2016
10:03 AM
Hi,
I need to build some graphs using PySpark for link analysis research. I have already seen this link:
http://kukuruku.co/hub/algorithms/social-network-analysis-spark-graphx
But this algorithm is implemented in Scala, which is much harder to understand.
Does anyone know of a white paper or tutorial that does link analysis using PySpark?
Thanks!
Labels:
- Apache Spark
08-10-2016
02:13 PM
Hi guys,
I'm very new to Apache Pig, and I have already seen a lot of scripts that use the GROUP statement without any operator (like SUM(X), A GROUP BY A). Why is it a good alternative to use the GROUP statement?
Thanks!
Labels:
- Apache Pig
08-08-2016
04:30 PM
Hi experts,
This is probably a dumb question (but since I have it 🙂).
I want to know how Pig reads the headers from the following dataset, which is stored as .csv:
ID,Name,Function
1,Johnny,Student
2,Peter,Engineer
3,Cloud,Teacher
4,Angel,Consultant
I want the first row to be treated as the header of my file. Do I need to write:
A = LOAD 'file' using PigStorage(',') as (ID:Int,....etc) ?
Or do I only need to write:
A = LOAD 'file' using PigStorage(',')
And with only this, does Apache Pig already know that the first line contains the headers of my table? Thanks!
Labels:
- Apache Pig
07-29-2016
10:29 AM
Hi experts,
I'm using Apache Pig to do some data transformation, but I need Java operations for some complex cleansing activities. I have already written the methods in Java and added the necessary code in Pig to register the Java code. However, I don't know which JARs I need to add to Eclipse to make the connection between Pig and Eclipse.
Is there a "dummies" tutorial for setting up this interaction?
Thanks!
Labels:
- Apache Pig
07-26-2016
03:43 PM
Sunile Manjee, many thanks! One more question: is it possible to create a variable and use it in an IF statement? Example:
A = Foreach X Generate A1,A2,A3;
--Create a variable
var = Concat(A1,A2);
Split A into B IF (var == "teste");
Is it possible to do this?