About m2014227

m2014227 · ‎07-20-2016

Here's my source code 🙂 Many thanks Lester! src-data.txt

m2014227 · ‎07-19-2016

I have the following code:

Source = LOAD '.../MyTextFiles' using PigStorage(' ');
Data = FOREACH Source GENERATE (chararray)$1 AS Detail_ID, (chararray)$2 AS Code_Time, (chararray)$3 AS BreakLine;
Transform = FOREACH Data GENERATE $1, ToUnixTime($2,'dd/MM/yyyyHH:mm:ss','GMT'), $3;
SPLIT Transform INTO Src31 IF ToDate($2,'yyyy-MM-dd')==ToDate('2013-12-31', 'yyyy-MM-dd'), Src01 IF ToDate($2,'yyyy-MM-dd')==ToDate('2014-01-01', 'yyyy-MM-dd');
STORE Src31 INTO '.../31_06_2016' using PigStorage(' ');
STORE Src01 INTO '.../01_07_2016' using PigStorage(' ');

If I run the code without the STORE statements it runs successfully, but if I try to store the new data into new tables it gives me an error. Does anyone know why? Many thanks!
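One way to reason about the script above is to replay its steps outside Pig. The Python sketch below is only an illustration, not the poster's code: the sample lines are hypothetical, and it assumes the timestamp is a space-separated field in the `dd/MM/yyyyHH:mm:ss` layout. It shows that once a timestamp has been converted to Unix time it is a plain number, so it has to be turned back into a calendar day before it can be compared with a date such as 2013-12-31.

```python
from datetime import datetime, timezone


def to_unix_time(text):
    """Parse 'dd/MM/yyyyHH:mm:ss' as GMT and return seconds since the epoch."""
    parsed = datetime.strptime(text, "%d/%m/%Y%H:%M:%S")
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def split_by_day(rows, days):
    """Group (detail_id, unix_time, break_line) rows by their GMT calendar day."""
    buckets = {day: [] for day in days}
    for detail_id, unix_time, break_line in rows:
        # The Unix time is a number; convert it back to a day before comparing.
        day = datetime.fromtimestamp(unix_time, tz=timezone.utc).strftime("%Y-%m-%d")
        if day in buckets:
            buckets[day].append((detail_id, unix_time, break_line))
    return buckets


# Hypothetical sample lines; the real input files are not shown in the post.
lines = [
    "x A1 31/12/201323:59:59 end",
    "x B2 01/01/201400:00:00 end",
]

# As in the Pig script, input column 0 is skipped and columns 1 to 3 are kept.
# Each resulting row has exactly three fields, at positions 0, 1 and 2.
rows = []
for line in lines:
    fields = line.split(" ")
    rows.append((fields[1], to_unix_time(fields[2]), fields[3]))

buckets = split_by_day(rows, ["2013-12-31", "2014-01-01"])
```

Each bucket could then be written to its own output location, which corresponds to the two STORE statements.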

m2014227 · ‎07-01-2016

Yes, I already split the files using Pig. What I want is to join all these directories into one table in Hive for data analysis.

m2014227 · ‎07-01-2016

But if I want one table that aggregates all the files, I think I will need the date as a column to query the table. Right?

m2014227 · ‎07-01-2016

I think my partition column is the date. But I need to include this information in my table too. However, when I create the external table with a Date column, I can't use it in the PARTITIONED BY clause...

m2014227 · ‎07-01-2016

Hi experts, I have multiple files distributed across different directories (one per date) in my HDFS. All these files follow the same schema, and only the first column (which represents the date) allows us to differentiate each text file. I want to merge all these directories into one table in Hive, but I don't know which column I can use as the partition column. My text files have this schema:
- Date
- ID
- Investment
- Country
- City
Each directory aggregates multiple files from one date and has the day as its name. Which is the partition column in this case?
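A common Hive pattern for a layout like the one described is to keep the date stored inside the files as an ordinary column and to declare a separate partition column whose value comes from the directory name. The sketch below generates the corresponding HiveQL with Python. The table name `investments`, the base path `/data/investments`, the column names and types, the comma delimiter and the partition column `day` are all assumptions made for illustration, since the post does not give them.

```python
def create_table_sql(table, base_dir):
    """Build a CREATE EXTERNAL TABLE statement partitioned by day.

    The partition column (day) is declared only in PARTITIONED BY and must
    not reuse the name of a column in the regular column list.
    """
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        "  record_date STRING,\n"   # the Date field stored inside each file
        "  id STRING,\n"
        "  investment DOUBLE,\n"    # assumed type
        "  country STRING,\n"
        "  city STRING\n"
        ")\n"
        "PARTITIONED BY (day STRING)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"  # assumed delimiter
        f"LOCATION '{base_dir}'"
    )


def add_partition_sql(table, base_dir, day):
    """Build an ALTER TABLE statement mapping one day directory to a partition."""
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION (day='{day}') LOCATION '{base_dir}/{day}'"
    )


# Hypothetical directory names, one per day.
days = ["2016-06-30", "2016-07-01"]
statements = [create_table_sql("investments", "/data/investments")]
statements += [add_partition_sql("investments", "/data/investments", d) for d in days]

for statement in statements:
    print(statement + ";")
```

With this layout, a query can filter on `day` to read only the matching directories, while `record_date` remains available as a regular column.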

m2014227 · ‎06-29-2016

Hi experts, I'm a beginner with Hadoop and have been reading a book that discusses storage formats. My source data are text files, and my question is: do I need to (or can I) convert my files into SequenceFile, Avro, Parquet, or Optimized Row Columnar (ORC)? Is there any advantage in using them instead of text files? Many thanks!!

m2014227 · ‎06-09-2016

From what I read, it is a good choice to use the same tool for all the steps inside Hadoop. If NiFi gives me those advantages, I will study more about it 🙂 Basically, in your opinion I should use:
- NiFi to load data into HDFS
- Spark to do some data transformation (or maybe load data into Hive)
Thanks! 🙂

m2014227 · ‎06-09-2016

Hello, I want to load some .csv files into HDFS. I have already decided that, in the next step, I want to do some data transformation with Spark. My question is: is there any advantage to using Pig instead of Spark to load data into HDFS? Many thanks!

Last Visited	‎10-02-2017 11:23 PM

Member Since	‎06-09-2016 01:58 PM
Posts	34
Kudos received	2

Cloudera Community

Re: Apache PIG - When insert STORE function it giv...

Apache PIG - When insert STORE function it gives m...

Re: Merge multiple directories into one table in H...

Merge multiple directories into one table in Hive

Storage format in HDFS

Re: Loading data to HDFS - Pig or Spark?

Loading data to HDFS - Pig or Spark?