Member since: 06-09-2016
Posts: 34
Kudos Received: 2
Solutions: 0
08-09-2016
01:50 PM
Hi,
I've been working with Hadoop and testing many components of its ecosystem. Now I'm doing a small project that consists of two phases:
a) Data Cleansing
b) KPI definition
I already do step a) in Apache Pig, and then I load the data into Apache Hive. As in all the other projects I have worked on, I only see Apache Hive as a data repository.
Basically, I just use Hive to load the data after the data cleansing step and then use it as a regular data source, nothing more.
Since I'm very new to the Big Data/Hadoop world, I would like to know what kinds of jobs/activities are normally done with Apache Hive.
Sorry for the ignorance :)
Thanks!
Labels: Apache Hive
08-08-2016
09:29 PM
João Souza, if you find an article, could you share it here? Many thanks!
08-08-2016
08:03 AM
Thanks Ravi 🙂 Do you recommend any article that explains some methodologies for applying data modeling in Big Data?
My dimensions are big, with a lot of columns...
08-07-2016
02:22 PM
Hi experts,
I have four .CSV files (three dimensions and one fact table) in my HDFS. I have already done some data cleansing in Apache Pig and I want to put them into Hive. My question is:
Is it a good idea to create the star schema in Hive, or is it better to create one big table?
I didn't find any good article that explains which is the better way to apply data modeling in Big Data.
Many thanks!
Labels: Apache Hive
08-02-2016
09:21 AM
Hi experts,
I have the following part of a script in Apache Pig:
....
A = foreach Source_Data generate (int) ID, ToString( ToDate((long) Time), 'yyyy-MM-dd hh:ss:mm') as date, (int) Code;
Store A into '.../newfile';
...
Now I want to create a new script using a Python UDF to guarantee that the Date column (#1) of my newfile only contains strings in the format 'yyyy-MM-dd hh:ss:mm'. Is it possible to do that?
Many thanks!
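A minimal sketch of the kind of check asked about here, written as plain Python. It only tests the shape of the string (four digits, two, two, then three two-digit groups) and does not check that the values are a real date and time. The names `DATE_SHAPE`, `is_valid_date_string` and `clean_date` are hypothetical, and registering the function with Pig as a UDF is not shown.

```python
import re

# Shape of a string such as '2016-08-02 09:21:00'.
# \Z anchors at the very end, so a trailing newline is rejected.
DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\Z")


def is_valid_date_string(value):
    """Return True only if value is a string with the expected shape."""
    if not isinstance(value, str):
        return False
    return DATE_SHAPE.match(value) is not None


def clean_date(value):
    """Keep well-formed values; replace anything else with None (null)."""
    return value if is_valid_date_string(value) else None
```

For example, `clean_date("02/08/2016")` gives `None`, while a value that already has the expected shape is returned unchanged.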
Labels: Apache Pig
07-27-2016
10:13 PM
Brilliant 🙂 Only one more question: how can I add a CASE statement (or an IF) to my X variable?
07-27-2016
04:35 PM
Hi experts,
I have a dataset with 4 columns and want to know whether column B contains only numbers. If the job detects a non-numeric value, I want to replace that value with null.
Can I do this in Pig, or must it be Python embedded in Pig?
Many thanks!
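The rule described above can be sketched in plain Python as follows. This is an illustration under assumptions, not a Pig script: "numeric" is assumed to mean an optional sign, digits, and an optional decimal part, column B is assumed to be the second of the four columns, and the names `numeric_or_none` and `clean_column_b` are hypothetical.

```python
import re

# Assumed definition of "numeric": optional sign, digits, optional decimals.
NUMERIC = re.compile(r"[+-]?\d+(\.\d+)?\Z")


def numeric_or_none(value):
    """Return the value as text if it looks numeric, otherwise None (null)."""
    if value is None:
        return None
    text = str(value).strip()
    return text if NUMERIC.match(text) else None


def clean_column_b(rows):
    """Apply the check to the second column of each 4-column row."""
    cleaned = []
    for a, b, c, d in rows:
        cleaned.append((a, numeric_or_none(b), c, d))
    return cleaned
```

The other three columns pass through untouched, so only the non-numeric values of column B become null.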
Labels: Apache Pig
07-22-2016
09:06 AM
Right Lester 🙂 Thanks!
07-21-2016
01:18 PM
Hi experts,
I have the following field:
1388481000000, which is the number of milliseconds elapsed since the Unix Epoch (1970-01-01 UTC)
How can I convert it to a Unix timestamp? I'm trying to use ToUnixTime(1388481000000,'dd/MM/yyyyHH:mm:ss','GMT') but it gives me an error...
Many thanks!
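A Unix timestamp is usually counted in seconds since the Epoch, so the arithmetic behind this conversion is an integer division by 1000. A minimal sketch in plain Python (not Pig), with hypothetical function names, assuming the value is in milliseconds and the result should be in UTC:

```python
from datetime import datetime, timezone


def millis_to_unix_seconds(millis):
    """Convert milliseconds since the Epoch to whole seconds since the Epoch."""
    return int(millis) // 1000


def millis_to_utc_string(millis):
    """Render the same instant as a dd/MM/yyyy HH:mm:ss string in UTC."""
    seconds = millis_to_unix_seconds(millis)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%d/%m/%Y %H:%M:%S")


print(millis_to_unix_seconds(1388481000000))  # 1388481000
print(millis_to_utc_string(1388481000000))    # 31/12/2013 09:10:00
```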
Labels: Apache Pig
07-21-2016
01:12 PM
Hi Suyog,
I already found the problem. I can't transform milliseconds into a Unix timestamp... I'm trying to convert this field...