Member since: 04-27-2016
Posts: 60
Kudos Received: 20
Solutions: 0
06-15-2016
08:42 AM
I already found a solution. Here is the code ('-tagFile' prepends the source file name to each row):
A = LOAD '/user/cloudera/Analytics/source/2013-11-01.txt' USING PigStorage(' ', '-tagFile');
STORE A INTO '/user/cloudera/Analytics/source/teste/2013-11-01.txt' USING PigStorage(' ');
😉
06-15-2016
06:26 AM
Hi experts, I'm trying to add a new column to my file: I want to add the file name to each row. The file name is 2016-06-15.txt. The schema of my file is:
A B C
7 8 13
I want to obtain:
Date A B C
2016-06-15 7 8 13
For that I'm using Pig with the following script:
A = LOAD 'user/cloudera/Analytics/source/file.txt' USING PigStorage(' ', '-tagPath');
DUMP A;
STORE A INTO 'user/cloudera/Analytics/source/file.txt' USING PigStorage(' ');
But I'm getting an error and I don't have any log available 😞 Can anyone help? Many thanks!
Tags: Pig
06-13-2016
01:13 PM
1 Kudo
Hi,
I have multiple files (in HDFS) with the same schema, and I want to aggregate all of them into a single Hive table. Each file represents a date, but I only have this information in the file title.
What is the best way to insert the file title (the date) as a new column in these files? Java? NiFi?
Thanks!
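For reference, a minimal sketch of one way to do this entirely in Hive, using the built-in INPUT__FILE__NAME virtual column (the table and column names here are hypothetical):

-- Hypothetical external table over the daily files (same schema, one directory).
CREATE EXTERNAL TABLE daily_raw (a INT, b INT, c INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
LOCATION '/user/cloudera/Analytics/source';

-- INPUT__FILE__NAME holds the full path of the file each row came from,
-- so the date can be extracted from the file title.
SELECT regexp_extract(INPUT__FILE__NAME, '(\\d{4}-\\d{2}-\\d{2})', 1) AS file_date,
       a, b, c
FROM daily_raw;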
06-09-2016
05:10 PM
Hi experts, in HDFS I have 80 text files... In the end I want to analyze some relationships between the files. But my point here is:
Before I use Spark to do some data transformation, I want to load these files into Hive to do some pre-analysis and data understanding. In your opinion (I only have experience with BI projects, not big data):
Should I insert these 80 text files into 80 different tables?
Should I aggregate these 80 text files into only one big table?
Like I said, after this step in Hive I will use Spark to do some data cleansing.
Thanks!
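In case it helps frame the options: if the single big table is chosen, a common pattern is to partition it by the date each file represents — a minimal sketch with hypothetical names:

-- One table for all 80 files, partitioned by date.
CREATE TABLE events (a INT, b INT, c INT)
PARTITIONED BY (file_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

-- Load each file into its own partition (repeated per file).
LOAD DATA INPATH '/user/cloudera/Analytics/source/2016-06-09.txt'
INTO TABLE events PARTITION (file_date = '2016-06-09');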
06-06-2016
12:48 PM
Hi Paul, thanks for your attention. My goal is to do some social analysis (find patterns, etc.); that's why I want SAS too.
The subject is the relationships within a company. I have the emails, telephone calls, etc.
What I have:
5 months of data collection (Aug, Sep, Oct, Nov and Dec)
Each text file corresponds to a day
Each type of communication has a specific ID (imagine email has ID 1, phone has ID 2, etc.)
Each line corresponds to an aggregation of multiple communications (separated by department and every 30 minutes)
The attributes are:
Communication ID
Time
Department
Email Code
Phone Code
Phone Duration
One possible line of the text file would be:
1 10:30:87 3 12 1 10:30:22 1 10:45:21 3 12 2 10:30:22 2 12 2 10:30:22 1 12 10:30:22
So as you can see, I can have multiple Communication IDs per line (that's one of my doubts for creating the Hive tables).
The size of the text files is 6 GB.
Many thanks for your help, Paul 🙂 I hope you can understand the problem. Thanks!
06-05-2016
04:21 PM
I'm planning to analyze some data using Hadoop. I have 200 text files to analyze.
I'm thinking:
- Use Spark to load the data into HDFS (are Pig or Sqoop better?)
- Create the structure in Hive, creating the tables (basically this first data model will have 200 tables; each table will be a text file)
- Load the data into Hive (all the files)
- Do some data cleansing with Spark (I will need Spark reading from Hive) and try to reduce the amount of data
- Create the new data model in Hive (now with a smaller amount of data after the cleansing in the previous step)
- Use an analytical tool (like SAS, Tableau, etc.) to do some analytical operations (in this tool I will put all the data returned in the previous step)
I believe this will not be the best way to analyze big data. My goal is, at the end of the process in Hadoop, to have a smaller data set in order to successfully integrate it into SAS, for example.
What is your opinion?
Many thanks!
05-28-2016
11:28 PM
Yes, when I think about Hadoop I mean storing the data in HDFS. I don't know what kind of advantage I can take from Spark. Data cleansing?
05-25-2016
07:12 PM
Is there some use case that shows how Hadoop and Spark work together? I've already read the theory, but I want to see something practical to get a better understanding.
Thanks!!!
05-20-2016
12:41 PM
I've downloaded the cloudera-quickstart-vm-5.7.0-0-virtualbox virtual machine to do my big data project. On my PC I have two zip files (2 GB each) that contain my source data (there are a lot of txt files). I need to upload these files to HDFS in the virtual machine; however, I'm having some trouble when I try to copy/drag the txt files into the virtual machine. I was thinking of loading the files directly into HDFS (without using Sqoop, for example), so my question is: is there a way I can load the source data from my local PC to HDFS? Java? Sqoop? There are a lot of txt files... Thanks!
05-18-2016
01:55 PM
I'm thinking of this process (high-level), with 3 steps:
- Get all the files from HDFS and store them into Hive (each file is a table)
- Get all the tables from Hive with Spark and do some data transformations (aggregations and cleanup jobs)
- Put the aggregated tables into Hive to provide access to my users.
05-18-2016
01:50 PM
Many thanks Lester 🙂 I was thinking of using Spark to get the data from Hive and do some aggregations, because I think I need to return only one table to analyze and get some insights.
05-18-2016
12:56 PM
1 Kudo
Hi,
I don't know very well how we can model the data in big data projects. I have thousands of files in HDFS.
The business is storing the records of visits to a hospital, and each file contains all the records for a specific day.
I want to export these files to Hive (because I know the structure of my data), but I have some doubts in this phase:
- Is each file a table in Hive? Probably not, because I will have thousands of files;
- Join all the files into one table (with the date as PK)? This way I can query all the data more easily;
- Build a star schema in Hive? Hmm... is this even advisable in a big data project?
What's your opinion?
Thanks!
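For reference, a minimal sketch of the single-table option, assuming the daily files sit in one directory per day (all names and paths here are hypothetical):

-- One external table for all visits, one partition per day.
CREATE EXTERNAL TABLE visits (record_id STRING, patient_id STRING, details STRING)
PARTITIONED BY (visit_date STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ';

-- Register each day's directory as a partition (repeated per day).
ALTER TABLE visits ADD PARTITION (visit_date = '2016-05-18')
LOCATION '/data/hospital/2016-05-18';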
05-17-2016
09:04 AM
Hi Lester, many thanks for your attention 🙂 I was thinking of using Sqoop to get the correct format for my data, but I think it will be better, in terms of simplicity and speed, to put the files directly on HDFS.
When I talk about segmentation, I was thinking of cluster analysis: basically dividing the data into smaller data sets. However, I think I can do that in Hive.
Many thanks!!!
05-16-2016
09:05 PM
Hello experts, I have two simple questions. In your opinion, what is the best way to load data into HDFS (my source data is txt files)? Pig, Sqoop, directly into HDFS, etc.?
The second question is: is Spark a good option for doing some data transformation and segmentation?
Thanks!
05-11-2016
12:26 PM
Sean, many thanks for your response. Does this machine have pySpark?
05-11-2016
12:09 PM
Hi, is there some free virtual machine for using Apache Hadoop and Spark? I need to do some tasks with HDFS and Hive, and then some analysis with Spark. Thanks!
04-28-2016
08:52 AM
Hi Abdelkrim, thanks for your response.
In this case I don't have much knowledge about the source data, so what I'm thinking is:
-> Put the data in HDFS
-> Get to know the data with Hive and Impala (simple queries, and create some new tables for segmentation)
-> Apply some analysis with Spark to identify patterns in the data
In your opinion, is this a good plan? :)
Thanks!
04-27-2016
08:30 PM
Hi Kirk, thank you for your brilliant response. So, the data cleansing strategy happens with Hive and Impala, and only then do we use Spark for analysis.
Thanks! 🙂
04-27-2016
05:28 PM
1 Kudo
Hi experts, I was used to the usual data warehousing process:
Source Data -> ETL
Now I'm using Hadoop and I'm a bit confused...
I have inserted the data into HDFS, but now I would like to understand the data better and apply some segmentations (by profile, for example). I would like to use Flume, Spark, Impala and Hive, but I am not able to combine the functions of each well, or tell when I should apply each of them.
Does anyone have any idea what the usual big data process is before applying any kind of analytics?
Many thanks!!!
04-07-2016
11:10 AM
Does anyone know a nice reporting platform/tool to use integrated with Hadoop?
I need to build a customer journey based on data that I have in Hadoop, and I don't know a good tool to do that (I tried QlikView, but it doesn't work very well with big data and maps).
04-07-2016
09:22 AM
Hello,
I have an Oracle DW which is connected to my core system to record the data. I'm planning to introduce the Hadoop ecosystem in my next release. What's the best option/architecture (if anyone knows some article that talks about this, it would be great if you could share):
Connect Hadoop directly to the core and record all the operational data; basically 2 steps (Operational Data -> Hadoop).
Connect Hadoop to my data warehouse and have 3 steps (Operational Data -> DW -> Hadoop).
Again, it would be great if anyone could share some articles related to this 🙂
Thanks! 🙂
02-24-2016
03:01 PM
1 Kudo
It's in a CSV... I have two different columns. Examples:
Column A: 20160224
Column B: 15:00:34
My plan is to put column A into a Date and column B into a Time. What I have so far is column A as a Date (as you can see) and column B as a Timestamp.
02-24-2016
02:28 PM
1 Kudo
I have this to transform it into a date:
to_date(CAST(CONCAT(SUBSTR(DATA,5,4), '-', SUBSTR(DATA,3,2), '-', SUBSTR(DATA,1,2), ' ', '00:00:00') AS TIMESTAMP)) as Date
How can I convert it into a time?
Sorry for my confusion... 😞
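For reference, a minimal sketch, assuming column A holds 'yyyyMMdd' strings (e.g. 20160224), column B holds 'HH:mm:ss' strings, and a hypothetical table name. Note that Hive has no separate TIME type, so the time of day either stays a STRING or gets folded into a TIMESTAMP together with the date:

SELECT
  -- Date from column A: '20160224' -> 2016-02-24.
  to_date(CAST(CONCAT(SUBSTR(A,1,4), '-', SUBSTR(A,5,2), '-', SUBSTR(A,7,2), ' 00:00:00') AS TIMESTAMP)) AS event_date,
  -- Full timestamp from both columns: 2016-02-24 15:00:34.
  CAST(CONCAT(SUBSTR(A,1,4), '-', SUBSTR(A,5,2), '-', SUBSTR(A,7,2), ' ', B) AS TIMESTAMP) AS event_ts,
  -- The time of day on its own can only stay a string.
  B AS event_time
FROM my_table;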
02-24-2016
02:17 PM
1 Kudo
But without a SerDe I can't convert to a date or a time 😞
02-24-2016
12:28 PM
1 Kudo
Thanks Neeraj :)
But is it possible to convert my column values to Date and Time? What I've searched for indicates using a SerDe, but I don't know how to apply dates and times to my table...
02-24-2016
12:16 PM
2 Kudos
I have a table created in Hive with the following structure: Table 1:
Field_A String,
Field_B String,
Field_C String
Using a SerDe, how can I get the following schema: Table 1:
Field_A Int,
Field_B Date,
Field_C TimeStamp
I'm getting confused about this, because I don't know if I need to create a SerDe class in Java to achieve this... Thanks!
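For what it's worth, a minimal sketch of one way to do this without writing a custom SerDe: keep the string table as a staging table and copy into a typed table with CASTs (the target table name and the date/timestamp formats here are assumptions):

-- Hypothetical typed target table.
CREATE TABLE table1_typed (
  Field_A INT,
  Field_B DATE,
  Field_C TIMESTAMP
);

-- Convert while copying; assumes Field_B looks like 'yyyy-MM-dd'
-- and Field_C like 'yyyy-MM-dd HH:mm:ss'.
INSERT OVERWRITE TABLE table1_typed
SELECT CAST(Field_A AS INT),
       CAST(Field_B AS DATE),
       CAST(Field_C AS TIMESTAMP)
FROM table1;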
02-23-2016
07:21 PM
2 Kudos
Hello, I created a table in Hive and I set all the fields to String. However, I want to transform them to other data types using a SerDe... but I don't know how to do it... Do I need to create a class in Java to define the SerDe properties (I'm not a programmer)? Can I use some SerDe properties that already exist? Where? In the script that creates the table? Sorry, but I'm very confused about this...