Member since: 06-09-2016
Posts: 34
Kudos Received: 2
Solutions: 0
07-20-2016
08:21 AM
Here's my source code 🙂 Many thanks Lester! src-data.txt
07-20-2016
07:47 AM
src-data.txt Here's my source code 🙂 Many thanks Lester!
07-19-2016
09:28 PM
I have the following code:
Source = LOAD '.../MyTextFiles' using PigStorage(' ');
Data = FOREACH Source GENERATE (chararray)$1 AS Detail_ID, (chararray)$2 AS Code_Time, (chararray)$3 AS BreakLine;
Transform = FOREACH Data GENERATE $1, ToUnixTime($2,'dd/MM/yyyyHH:mm:ss','GMT'), $3;
SPLIT Transform INTO Src31 IF ToDate($2,'yyyy-MM-dd')==ToDate('2013-12-31', 'yyyy-MM-dd'),
Src01 IF ToDate($2,'yyyy-MM-dd')==ToDate('2014-01-01', 'yyyy-MM-dd');
STORE Src31 INTO '.../31_06_2016' using PigStorage(' ');
STORE Src01 INTO '.../01_07_2016' using PigStorage(' ');
If I run the code without the STORE statements it runs successfully, but if I try to write the new data to new tables it gives me an error. Does anyone know why? Many thanks!
Labels: Apache Pig
07-01-2016
04:26 PM
Yes, I already split the files using Pig. What I want is to join all these directories into one table in Hive for data analysis.
07-01-2016
03:45 PM
But if I want one table that aggregates all the files, I think I will need the date as a column to query the table. Right?
07-01-2016
03:13 PM
I think my partition column is the date, but I need to include this information in my table too. However, when I create the external table with a Date column, I can't use that column in the partition clause...
07-01-2016
03:06 PM
Hi experts, I have multiple files distributed across different directories (by date) in HDFS. All these files follow the same schema, and only the first column (which holds the date) differentiates each text file.
I want to merge all these directories into one table in Hive, but I don't know which column I should use as the partition column. My text files have this schema:
- Date
- ID
- Investment
- Country
- City
The date column is what differentiates each file. Each directory holds multiple files for one date and is named after that day.
Which is the PARTITIONED column in this case?
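A minimal sketch of the layout described above, assuming an external table whose partition column is declared separately from the data columns (in Hive, a partition column cannot also appear in the regular column list, although it can be queried like one). The table name `investments`, the partition column `load_date`, the `/data/investments/...` paths, and the day_month_year directory naming are all hypothetical. The Python below only generates the `ALTER TABLE ... ADD PARTITION` statements that would register each day-named directory.

```python
from datetime import datetime

# Hypothetical names: the table and partition column are assumptions,
# not taken from the original post.
TABLE = "investments"
PARTITION_COLUMN = "load_date"


def partition_statement(directory: str, day_format: str = "%d_%m_%Y") -> str:
    """Build one ALTER TABLE ... ADD PARTITION statement for a day-named directory."""
    # The last path component is assumed to be the day, e.g. "01_07_2016".
    day_name = directory.rstrip("/").split("/")[-1]
    # Normalise to ISO form (yyyy-MM-dd) so partition values sort chronologically.
    day = datetime.strptime(day_name, day_format).date().isoformat()
    return (
        f"ALTER TABLE {TABLE} ADD IF NOT EXISTS "
        f"PARTITION ({PARTITION_COLUMN}='{day}') LOCATION '{directory}';"
    )


if __name__ == "__main__":
    # Hypothetical HDFS directories, one per day.
    for d in ["/data/investments/30_06_2016", "/data/investments/01_07_2016"]:
        print(partition_statement(d))
```

With partitions registered this way, a query filtering on `load_date` would only need to read the matching day's directory. The Date field stored inside the files would still need a column name different from the partition column.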
Labels: Apache Hadoop, Apache Hive
06-29-2016
01:33 PM
Hi experts,
I'm a beginner with Hadoop and have been reading a book that discusses storage formats. My source data is a set of text files, and my question is: do I need to (or can I) convert my files into SequenceFile, Avro, Parquet or Optimized Row Columnar (ORC)? Is there any advantage to using one of these instead of text files?
Many thanks!!
Labels: Apache Hadoop
06-09-2016
02:12 PM
From what I have read, it is a good choice to use the same tool for all the steps inside Hadoop. If NiFi gives me those advantages I will study more about it 🙂 Basically, in your opinion I should use:
- NiFi to load data into HDFS
- Spark to do some data transformation (or maybe load data into Hive)
Thanks! 🙂
06-09-2016
02:04 PM
Hello, I want to load some .csv files into HDFS. I have already decided that, as the next step, I want to do some data transformation with Spark. My question is: is there any advantage to using Pig instead of Spark to load data into HDFS? Many thanks!
Labels: Apache Hadoop, Apache Pig, Apache Spark