About Stewart12586

Stewart12586 · ‎07-26-2016

Hi experts, I have the following field: ToString(ToDate((long) Time_Interval), 'yyyy-MM-dd hh:ss:mm') as Time. How can I obtain only the time (hh:ss:mm)? I have already tried: ToString(ToDate(Time), 'HH:mm:ss.SSS')
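
One possible reading, offered as a hedged note rather than a confirmed answer: Pig's ToString takes Java-style pattern letters (as in SimpleDateFormat), where HH is the hour, mm the minute and ss the second, so 'hh:ss:mm' puts seconds where minutes belong. Formatting the original long directly, for example ToString(ToDate((long) Time_Interval), 'HH:mm:ss'), should give only the time. The Python sketch below shows the same idea outside Pig; the function name and the sample value are hypothetical, and UTC is assumed.

```python
from datetime import datetime, timezone


def time_of_day(epoch_millis: int) -> str:
    """Format epoch milliseconds as HH:MM:SS (24-hour clock, UTC assumed)."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    # %H = hour, %M = minute, %S = second, in hour:minute:second order.
    return moment.strftime("%H:%M:%S")


# Hypothetical sample: 2016-07-26 13:05:09 UTC expressed in milliseconds.
sample_millis = 1469538309000
print(time_of_day(sample_millis))  # prints 13:05:09
```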

Stewart12586 · ‎07-19-2016

Hi, I recently installed cloudera-quickstart-vm-5.7.0-0-virtualbox. I need to do a project using Pig and Hive, but I have run into errors from the beginning. For example:
- I need to restart the Hue service in Cloudera Manager every time I start a session in the VM;
- My Pig script (which is very simple) never gets past 0% progress.
I have attached the errors that I see in Cloudera Manager and my VirtualBox settings. I don't know whether the poor Pig performance is related to this. Has anyone had this problem? It's urgent! Many thanks!!!

Stewart12586 · ‎07-08-2016

I'll need to install a notebook to use Spark and Python (is there a tutorial for that?). After that I think I will use your idea 🙂

Stewart12586 · ‎06-23-2016

Hi, When I try to create a new directory, I get the following error:

Cannot perform operation. Note: you are a Hue admin but not a HDFS superuser, "hdfs" or part of HDFS supergroup, "supergroup". SafeModeException: Cannot create directory /user/cloudera/Source_Data. Name node is in safe mode. The reported blocks 907 needs additional 2 blocks to reach the threshold 0.9990 of total blocks 909. The number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached. (error 403)

What can I do to solve this?
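
The numbers in the error message can be checked by hand. The sketch below is a back-of-the-envelope calculation, not Hadoop's actual implementation: it assumes safe mode ends once the fraction of reported blocks reaches the threshold, and the function name is hypothetical.

```python
import math


def blocks_still_needed(reported: int, total: int, threshold: float) -> int:
    """Return how many more blocks must be reported to reach the threshold.

    Assumption: safe mode ends once reported / total >= threshold.
    """
    required = math.ceil(total * threshold)  # smallest count meeting the ratio
    return max(0, required - reported)


# Values taken from the error message above.
reported, total, threshold = 907, 909, 0.9990
print(f"reported ratio: {reported / total:.4f}")  # 0.9978, below 0.9990
print("blocks still needed:", blocks_still_needed(reported, total, threshold))  # 2
```

As the message itself says, safe mode is normally turned off automatically once the remaining blocks are reported; running hdfs dfsadmin -safemode get shows the current state.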

Stewart12586 · ‎06-21-2016

Hi, My Cloudera virtual machine in VirtualBox has already crashed twice. As I am at the beginning of the project, it wasn't a very "tragic" problem. However, in the future I will have some big solutions with Hadoop (HDFS, Pig, Hive and Spark), so my question is: how can I back up this solution, and where can I save the backups so that I don't lose my work? Many thanks! PS: The log when the VM crashes is this: The application had a problem and crashed. Unfortunately, the crash reporter is unable to submit a report for this crash. Detail: The application did not identify itself.

Stewart12586 · ‎06-18-2016

I have 45 text files with 5 columns, and I'm using Pig to add a new column to each file based on its filename.

First question: I uploaded all the files into HDFS manually. Do you think it is a better option to upload a compressed file?

Second question: I put my code below. In your opinion, is it the best way to add a new column to my files? I submitted this code and it has been processing for hours... All of my files are in the Data directory...

Data = LOAD '/user/data' USING PigStorage(' ', '-tagFile');
STORE Data INTO '/user/data/Data_Transformation/SourceFiles' USING PigStorage(' ');

Thanks!!!
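
For reference, PigStorage's -tagFile option prepends the source file name as the first field of every record. The sketch below mimics that behaviour in plain Python on local files, purely to show the expected output shape; it is not how Pig implements it, and the function name, file names and sample rows are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def tag_lines_with_filename(input_dir: Path, delimiter: str = " ") -> List[str]:
    """Prefix every line of every file in input_dir with that file's name."""
    tagged = []
    for path in sorted(input_dir.iterdir()):
        if not path.is_file():
            continue  # skip sub-directories such as an output folder
        for line in path.read_text().splitlines():
            tagged.append(f"{path.name}{delimiter}{line}")
    return tagged


# Tiny demonstration with two made-up files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    (data_dir / "a.txt").write_text("1 2 3 4 5\n")
    (data_dir / "b.txt").write_text("6 7 8 9 10\n")
    for row in tag_lines_with_filename(data_dir):
        print(row)
```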

Stewart12586 · ‎06-18-2016

Hi experts, Is there a complete tutorial for Hadoop in the Cloudera environment that demonstrates how to use HDFS, Pig, Hive and Spark? I have seen a lot of guides, but they do not correspond to practical cases, and I have had some difficulty developing a solution. I am very new to the Hadoop ecosystem. I need to deliver a prototype of a Hadoop solution at the end of July, and I'm getting worried by the constant difficulties and doubts I have run into. I only want to use those components to do some data cleansing and transformation. I have already downloaded this virtual machine to use Spark: http://www.cloudera.com/downloads/quickstart_vms/5-7.html Can anyone help me? Many thanks 🙂

Stewart12586 · ‎06-12-2016

Hi experts, I have 100 text files in HDFS and I want to aggregate all of them into one big table in Hive (with the date as the key). How can I load these multiple files into one table created in Hive? Thanks!

Stewart12586 · ‎06-09-2016

Hi Benjamin, Yes, I'm talking about data mining clustering. So, in your opinion, even if I know the schema, is Spark an excellent choice to achieve that?

Stewart12586 · ‎06-09-2016

Does it make sense to use Spark to divide a structured model (I know the schema of my data) into clusters? I ask because I don't know whether there is any advantage in using Python instead of SQL (Hive) to divide the data into clusters.

Last Visited	‎11-21-2018 01:05 PM

Member Since	‎06-18-2016 04:27 AM
Posts	52
Kudos received	14

Cloudera Community

Get Time from a String 'yyyy-MM-dd hh:ss:mm' fiel...

Cloudera QuickStart VM - Performance Issues - URGE...

Re: Pig Statement is taking a long time

HDFS Error when creating a new directory

QuickStart VM Cloudera - Hadoop Solution Backup

Pig Statement is taking a long time

Real Practical Tutorial for Hadoop using HDFS, Hive...

Aggregate multiple text files into one table in Hi...

Re: Spark and Structured Data

Spark and Structured Data