Member since: 06-18-2016
Posts: 52
Kudos Received: 14
Solutions: 0
08-08-2016
04:48 PM
1 Kudo
Try using CSVExcelStorage instead of the regular PigStorage; CSVExcelStorage has an option to keep or skip the header row. See: https://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
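A minimal sketch of skipping the header with CSVExcelStorage (the input path, jar location, and schema are assumptions; the loader options are from the linked javadoc):

```pig
-- Register the piggybank jar that ships with Pig (path may differ on your cluster)
REGISTER /usr/lib/pig/piggybank.jar;

-- 'SKIP_INPUT_HEADER' tells the loader to drop the first row of each input file
data = LOAD '/user/test/input.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
       AS (id:int, name:chararray);
DUMP data;
```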
07-29-2016
10:47 AM
2 Kudos
I'm assuming you are talking about Pig Java UDFs. Here are some good tutorials for that:
https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment
http://www.tutorialspoint.com/apache_pig/apache_pig_user_defined_functions.htm
http://www.hadooptpoint.com/how-to-write-pig-udf-example-in-java/
For Pig plugins: https://cwiki.apache.org/confluence/display/PIG/PigTools
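A minimal sketch of an eval UDF, in case it helps (the package and class names are hypothetical; this needs the Pig jar on the compile classpath, so it won't build standalone):

```java
package com.example.pig;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// An eval UDF that upper-cases its first argument.
public class ToUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }
}
```

Once compiled into a jar, you'd register and call it from Pig Latin along these lines (jar and relation names are assumptions):

```pig
REGISTER myudfs.jar;
B = FOREACH A GENERATE com.example.pig.ToUpper(name);
```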
07-26-2016
04:03 PM
1 Kudo
You have to use the bincond (ternary) operator, e.g. X = FOREACH A GENERATE f2, (f2 == 1 ? 1 : COUNT(B)); Or embed Pig in a wrapper script: http://pig.apache.org/docs/r0.11.0/cont.html#embed-python
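A small sketch of the bincond operator in context (input path, field names, and values are assumptions):

```pig
A = LOAD '/user/test/input' AS (f1:chararray, f2:int);
-- bincond (ternary): condition ? value_if_true : value_if_false
X = FOREACH A GENERATE f1, (f2 == 1 ? 'one' : 'other') AS label;
DUMP X;
```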
07-22-2016
05:42 AM
The first thing to look at is the amount of RAM allocated to the VM. If you are using Cloudera Manager, you need a minimum of 8 GB of RAM. Depending on what you are doing with the VM, you may need to go above the minimum.
07-09-2016
12:31 AM
2 Kudos
Dear Stewart, Here you can read about Spark notebooks: http://www.cloudera.com/documentation/enterprise/latest/topics/spark_ipython.html Best regards, Gabor
06-23-2016
10:40 AM
@Stewart12586, this thread may be of assistance in your situation. 🙂
06-21-2016
08:36 AM
VirtualBox has the ability to take snapshots of VMs that you can restore to at a later date.
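The same thing can be done from the command line with VBoxManage (the VM and snapshot names here are assumptions):

```shell
# Take a named snapshot of the VM
VBoxManage snapshot "Cloudera QuickStart" take "before-upgrade"

# List the snapshots that exist for the VM
VBoxManage snapshot "Cloudera QuickStart" list

# Restore the VM to that snapshot later (power the VM off first)
VBoxManage snapshot "Cloudera QuickStart" restore "before-upgrade"
```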
06-20-2016
03:40 PM
The QuickStart VM includes a tutorial that will walk you through a use case where you:
- ingest some data into HDFS from a relational database using Sqoop, and query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive, and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit access to the data and visualize its lineage
That sounds like it will cover most of what you're looking for.
06-12-2016
09:10 PM
4 Kudos
@Pedro Rodgers If the schema is the same across all 100 text files, it's better to create a Hive external table, since you already have those files on HDFS. Example: if you have all the files under the "/user/test/dummy/data" directory, run the command below to create the external Hive table and point it at the HDFS location. (Note that the partition column date must not also appear in the column list; Hive rejects a table whose partition column duplicates a data column.)

CREATE EXTERNAL TABLE user(
  userId BIGINT,
  type INT,
  level TINYINT
)
COMMENT 'User Information'
PARTITIONED BY (date String)
LOCATION '/user/test/dummy/data';

Then create the folder date=2011-11-11 inside /user/test/dummy/data/ and put the data files for date 2011-11-11 into that folder. Once that's done, you also need to add the partition to the Hive metastore:

ALTER TABLE user ADD PARTITION(date='2011-11-11');