Member since: 06-18-2016
Posts: 52
Kudos Received: 14
Solutions: 0
08-08-2016
04:48 PM
1 Kudo
Try using CSVExcelStorage instead of the regular PigStorage; CSVExcelStorage has an option to keep or skip the header row. See: https://pig.apache.org/docs/r0.12.0/api/org/apache/pig/piggybank/storage/CSVExcelStorage.html
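A minimal sketch of skipping the header with CSVExcelStorage (the input path, jar location, and schema are assumptions; the loader options are from the linked javadoc):

```pig
-- Register the piggybank jar that ships with Pig (path may differ on your cluster)
REGISTER /usr/lib/pig/piggybank.jar;

-- 'SKIP_INPUT_HEADER' tells the loader to drop the first row of each input file
data = LOAD '/user/test/input.csv'
       USING org.apache.pig.piggybank.storage.CSVExcelStorage(',', 'NO_MULTILINE', 'UNIX', 'SKIP_INPUT_HEADER')
       AS (id:int, name:chararray);
DUMP data;
```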
07-29-2016
10:47 AM
2 Kudos
I'm assuming you are talking about Pig Java UDFs. Here are some good tutorials for that:
https://cwiki.apache.org/confluence/display/PIG/How+to+set+up+Eclipse+environment
http://www.tutorialspoint.com/apache_pig/apache_pig_user_defined_functions.htm
http://www.hadooptpoint.com/how-to-write-pig-udf-example-in-java/
For Pig plugins: https://cwiki.apache.org/confluence/display/PIG/PigTools
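A minimal sketch of an eval UDF, in case it helps (the package and class names are hypothetical; this needs the Pig jar on the compile classpath, so it won't build standalone):

```java
package com.example.pig;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// An eval UDF that upper-cases its first argument.
public class ToUpper extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }
}
```

Once compiled into a jar, you'd register and call it from Pig Latin along these lines (jar and relation names are assumptions):

```pig
REGISTER myudfs.jar;
B = FOREACH A GENERATE com.example.pig.ToUpper(name);
```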
07-26-2016
04:03 PM
1 Kudo
You have to use the bincond (ternary) operator, e.g. X = FOREACH A GENERATE f2, (f2 == 1 ? 1 : COUNT(B)); Or embed Pig in a wrapper script: http://pig.apache.org/docs/r0.11.0/cont.html#embed-python
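A small sketch of the bincond operator in context (input path, field names, and values are assumptions):

```pig
A = LOAD '/user/test/input' AS (f1:chararray, f2:int);
-- bincond (ternary): condition ? value_if_true : value_if_false
X = FOREACH A GENERATE f1, (f2 == 1 ? 'one' : 'other') AS label;
DUMP X;
```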
07-22-2016
05:42 AM
The first thing to look at is the amount of RAM allocated to the VM. If you are using Cloudera Manager, you need a minimum of 8 GB of RAM. Depending on what you are doing with the VM, you may need to go above the minimum.
07-09-2016
12:31 AM
2 Kudos
Dear Stewart, Here you can read about Spark notebooks: http://www.cloudera.com/documentation/enterprise/latest/topics/spark_ipython.html Best regards, Gabor
06-23-2016
10:40 AM
@Stewart12586, this thread may be of assistance in your situation. 🙂
06-21-2016
08:36 AM
VirtualBox has the ability to take snapshots of VMs that you can restore to at a later date.
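The same thing can be done from the command line with VBoxManage (the VM and snapshot names here are assumptions):

```shell
# Take a named snapshot of the VM
VBoxManage snapshot "Cloudera QuickStart" take "before-upgrade"

# List the snapshots that exist for the VM
VBoxManage snapshot "Cloudera QuickStart" list

# Restore the VM to that snapshot later (power the VM off first)
VBoxManage snapshot "Cloudera QuickStart" restore "before-upgrade"
```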
06-20-2016
03:40 PM
The QuickStart VM includes a tutorial that will walk you through a use case where you:
- ingest some data into HDFS from a relational database using Sqoop, and query it with Impala
- ingest some data into HDFS from a batch of log files, ETL it with Hive, and query it with Impala
- ingest some data into HDFS from a live stream of logs and index it for searching with Solr
- perform link strength analysis on the data using Spark
- build a dashboard in Hue
- if you run the scripts to migrate to Cloudera Enterprise, also audit access to the data and visualize its lineage
That sounds like it will cover most of what you're looking for.
06-12-2016
09:10 PM
4 Kudos
@Pedro Rodgers If the schema is the same across all 100 text files, it's better to create a Hive external table, since you already have those files on HDFS. Example: if you have all the files under the "/user/test/dummy/data" directory, run the command below to create the external Hive table and point it at the HDFS location. (Note that the partition column date must not also appear in the column list; Hive rejects a table whose partition column duplicates a data column.)

CREATE EXTERNAL TABLE user(
  userId BIGINT,
  type INT,
  level TINYINT
)
COMMENT 'User Information'
PARTITIONED BY (date String)
LOCATION '/user/test/dummy/data';

Then create the folder date=2011-11-11 inside /user/test/dummy/data/ and put the data files for date 2011-11-11 into that folder. Once that's done, you also need to add the partition to the Hive metastore:

ALTER TABLE user ADD PARTITION(date='2011-11-11');