Member since: 06-18-2016
Posts: 52
Kudos Received: 14
Solutions: 0
07-26-2016
01:38 PM
Hi experts, I have the following field: ToString( ToDate((long) Time_Interval), 'yyyy-MM-dd hh:ss:mm') as Time. How can I obtain only the time (hh:ss:mm)? I have already tried:
ToString( ToDate(Time), 'HH:mm:ss.SSS')
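For comparison, here is a minimal Python sketch (not Pig) of extracting only the time of day from an epoch timestamp. It assumes Time_Interval holds milliseconds since the Unix epoch and that UTC is the wanted time zone; the function name `time_of_day` and the sample value are hypothetical. In Java-style date patterns, `HH` is the 24-hour hour, `mm` is minutes and `ss` is seconds, so the Python directives below correspond to a pattern of 'HH:mm:ss'.

```python
from datetime import datetime, timezone


def time_of_day(epoch_millis: int) -> str:
    """Return the time-of-day part of a timestamp given in epoch milliseconds (UTC)."""
    # Convert milliseconds to seconds before building the datetime.
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    # %H = 24-hour hour, %M = minutes, %S = seconds (fractions are dropped).
    return moment.strftime("%H:%M:%S")


if __name__ == "__main__":
    # Sample value only: a timestamp on 2016-07-26, in milliseconds.
    print(time_of_day(1469540285000))
```

The point of the sketch is that the time-only string is produced from the original timestamp value, not by re-parsing an already formatted string.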
Labels: Apache Pig
07-19-2016
07:47 AM
1 Kudo
Hi, I recently installed cloudera-quickstart-vm-5.7.0-0-virtualbox. I need to do a project using Pig and Hive, but I have been running into errors from the very beginning. For example:
- I need to restart the Hue service in Cloudera Manager every time I start a session in the VM;
- My Pig script (which is very simple) never gets past 0% progress.
I have included the errors that I see in Cloudera Manager and my VirtualBox settings. I don't know whether the poor Pig performance is related to this. Has anyone had this problem? It's urgent! Many thanks!!!
07-08-2016
10:14 AM
I'll need to install a notebook to use Spark and Python (is there a tutorial for that?). After that, I think I will use your idea 🙂
06-23-2016
09:38 AM
Hi, When I try to create a new directory, I get the following error:
Cannot perform operation. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup". SafeModeException: Cannot create directory /user/cloudera/Source_Data. Name node is in safe mode. The reported blocks 907 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 909. The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. (error 403)
What can I do to solve this?
Labels: Cloudera Hue, HDFS
06-21-2016
02:20 AM
Hi, My Cloudera virtual machine in VirtualBox has already crashed twice. As I am at the beginning of the project, it wasn't a very "tragic" problem. However, in the future I will have some big solutions with Hadoop (HDFS, Pig, Hive and Spark), so my question is: how can I back up this solution, and where can I save the backups so that I don't lose my work? Many thanks!
PS: The log when the VM crashes is this: The application had a problem and crashed. Unfortunately, the crash reporter is unable to submit a report for this crash. Detail: The application did not identify itself.
06-18-2016
04:47 AM
I have 45 text files with 5 columns, and I'm using Pig to add a new column to each file based on its filename.
First question: I uploaded all the files into HDFS manually. Do you think it would be better to upload a compressed file?
Second question: I have put my code below. In your opinion, is this the best way to add a new column to my files? I submitted this code and it has been processing for hours... All of my files are in the Data directory...
Data = LOAD '/user/data' USING PigStorage(' ', '-tagFile');
STORE Data INTO '/user/data/Data_Transformation/SourceFiles' USING PigStorage(' ');
Thanks!!!
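To illustrate the result the script above is aiming for, here is a small Python sketch (not Pig, and not a replacement for the script). It mirrors the effect of the -tagFile option, which prepends the source file name as the first field of each record. The function name `tag_with_filename` and the sample path and row are hypothetical.

```python
import os


def tag_with_filename(path, lines, delimiter=" "):
    """Prepend the file's base name as a new first column of every line."""
    name = os.path.basename(path)
    # Strip the trailing newline so each result is a clean record.
    return [f"{name}{delimiter}{line.rstrip(chr(10))}" for line in lines]


if __name__ == "__main__":
    # Hypothetical five-column row from a hypothetical file.
    rows = ["a b c d e\n"]
    for record in tag_with_filename("/user/data/file_01.txt", rows):
        print(record)
```

Each output record has six columns: the file name followed by the original five.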
Labels: Apache Hive, Apache Pig, HDFS
06-18-2016
04:29 AM
Hi experts, Is there a complete tutorial for Hadoop in the Cloudera environment that demonstrates how to use HDFS, Pig, Hive and Spark? I have seen a lot of guides, but they do not correspond to practical cases, and I have had some difficulty developing a solution. I am very new to the Hadoop ecosystem. I need to deliver a prototype of a Hadoop solution at the end of July, and I'm getting worried by the constant difficulties and doubts I have run into. I only want to use those components to do some data cleansing and transformation. I have already downloaded this virtual machine to use Spark: http://www.cloudera.com/downloads/quickstart_vms/5-7.html Can anyone help me? Many thanks 🙂
06-12-2016
08:53 PM
Hi experts, I have 100 text files in HDFS and I want to aggregate all of them into one big table in Hive (with the date as the key). How can I load these multiple files into one table created in Hive?
Thanks!
Labels: Apache Hive
06-09-2016
05:46 PM
Hi Benjamin,
Yes, I'm talking about data mining clustering. So, in your opinion, even if I know the schema, is Spark an excellent choice to achieve that?
06-09-2016
05:24 PM
Does it make sense to use Spark to divide a structured model (I know the schema of my data) into clusters?
I am asking because I don't know whether there is any advantage in using Python instead of SQL (Hive) to divide the data into clusters.
Labels: Apache Hive, Apache Spark