Member since: 09-24-2015
Posts: 527
Kudos Received: 136
Solutions: 19
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2201 | 06-30-2017 03:15 PM |
| | 3203 | 10-14-2016 10:08 AM |
| | 8467 | 09-07-2016 06:04 AM |
| | 10226 | 08-26-2016 11:27 AM |
| | 1501 | 08-23-2016 02:09 PM |
07-15-2019
03:31 PM
The free-form query for lastmodified is not correct, because you cannot manage updated rows properly, and you will also get duplicate rows, so I don't recommend this alternative.
03-19-2016
08:22 PM
3 Kudos
Good that you figured it out. You weren't using special characters in the original question. So yes, the parameter needs to be in quotes if the values are not plain letters, and you might also have to escape things sometimes, e.g.:

--hivevar "day=2016/3/01" --hivevar "regex=.*\\|"  (if you want the regex .*\| )

And if you use it in shell scripts you sometimes have to escape even more. I once needed 32 backslashes in Oozie to end up with 1 backslash in a Pig script.
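The layered escaping can be illustrated with a minimal shell sketch (plain `printf`, not tied to Hive itself): each interpreter that reads the string consumes one level of backslashes, which is why deeply nested setups like Oozie → shell → Pig can need so many.

```shell
# Inside double quotes the shell collapses \\ to \ before the
# program (hive, pig, ...) ever sees the argument:
printf '%s\n' "regex=.*\\|"   # the program receives: regex=.*\|

# Single quotes pass the string through untouched, so one
# backslash in the source is one backslash in the argument:
printf '%s\n' 'regex=.*\|'    # the program receives: regex=.*\|
```

Each additional layer (a shell script calling another shell, an Oozie action generating that script) roughly doubles the backslashes you have to write, which is how 32 of them can collapse down to one.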
03-16-2016
10:58 AM
7 Kudos
@Roberto Sancho - If you are using the embedded Oozie Derby DB, I think you can take a backup of the directory below and keep it as a tar.gz:

/hadoop/oozie/data/oozie-db/

To connect to the Derby shell:
1. Download derbytools-<version>.jar (preferably 10.10.1.1) and put it at /usr/hdp/<version>/oozie/libtools/
2. Stop the Oozie service
3. cd /usr/hdp/<version>/oozie/libtools/
4. export CLASSPATH=derbytools-10.10.1.1.jar:derby-10.10.1.1.jar
5. java org.apache.derby.tools.ij
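The backup step can be sketched as a small shell helper (a hedged sketch: `backup_oozie_db` and the example paths are illustrative, not part of any Oozie tooling; stop the Oozie service first so the Derby files are consistent):

```shell
# Tar up the embedded Derby data directory before touching it.
backup_oozie_db() {
  db_dir=$1    # e.g. /hadoop/oozie/data/oozie-db (HDP default layout)
  out=$2       # e.g. /tmp/oozie-db-backup.tar.gz
  # -C keeps the archive rooted at the directory name, not the full path
  tar czf "$out" -C "$(dirname "$db_dir")" "$(basename "$db_dir")"
}

# Typical invocation on an HDP node (adjust paths to your install):
# backup_oozie_db /hadoop/oozie/data/oozie-db /tmp/oozie-db-backup.tar.gz
```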
03-24-2016
07:16 PM
1 Kudo
HCatalog does not support writing into a bucketed table. HCat explicitly checks whether a table is bucketed and, if so, disables storing into it, to avoid writing to the table in a destructive way. From HCatOutputFormat:

if (sd.getBucketCols() != null && !sd.getBucketCols().isEmpty()) {
    throw new HCatException(ErrorType.ERROR_NOT_SUPPORTED,
        "Store into a partition with bucket definition from Pig/Mapreduce is not supported");
}
03-26-2016
08:13 AM
Finally I managed to insert with buckets like this:

CREATE EXTERNAL TABLE IF NOT EXISTS journey_importe_v2(
FECHAOPRCNF date,
codnrbeenf string,
codnrbeenf2 string,
CODTXF string,
FREQ BIGINT,
IMPORTE DECIMAL(9, 2)
)
CLUSTERED BY (codnrbeenf) INTO 25 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
stored as ORC
LOCATION '/RSI/tables/logs/importe_v2'
TBLPROPERTIES ("immutable"="false","transactional"="true");
create table IF NOT EXISTS temp_journey_importe_v2 (importe STRING);
LOAD DATA INPATH '/RSI/staging/output/journey_importe/${date}' OVERWRITE INTO TABLE temp_journey_importe_v2;
set hive.enforce.bucketing = true;
INSERT INTO TABLE journey_importe_v2
SELECT
regexp_extract(importe, '^(?:([^,]*)\,?){1}', 1) FECHAOPRCNF,
regexp_extract(importe, '^(?:([^,]*)\,?){2}', 1) codnrbeenf,
regexp_extract(importe, '^(?:([^,]*)\,?){3}', 1) codnrbeenf2,
regexp_extract(importe, '^(?:([^,]*)\,?){4}', 1) CODTXF,
regexp_extract(importe, '^(?:([^,]*)\,?){5}', 1) FREQ,
regexp_extract(importe, '^(?:([^,]*)\,?){6}', 1) IMPORTE
from temp_journey_importe_v2;
Is there a better way? How many buckets would you recommend I use?
03-04-2016
02:18 AM
@Roberto Sancho great question once again. What you're asking for is commonly known as "stop words". There are different ways of addressing the problem; instead of writing my own solution, here are some suggestions: write a map/reduce job with a stop-words collection; write a UDF in Python, Groovy, or Java, whichever is convenient for you (there are some examples here in Groovy and Python); I've done some work with Apache Crunch, and there's a stop-words example on its front page; and finally, here are a couple of suggestions for doing it in Pig. The last one is the simplest suggestion and I am curious to try it myself. It comes from Donald Miner, a well-known champion of Apache Pig.
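As a language-neutral illustration of the stop-words idea (a minimal sketch with made-up sample data, not one of the Pig/UDF approaches mentioned above): keep only the tokens that do not appear in a stop-word list.

```shell
# One token per line of input, one stop word per line of the list.
printf '%s\n' the quick brown fox jumps over the lazy dog > words.txt
printf '%s\n' the a an over > stopwords.txt

# -f: read patterns from file, -F: fixed strings, -w: whole words, -v: invert
grep -vwFf stopwords.txt words.txt
# prints: quick brown fox jumps lazy dog (one word per line)
```

The same filter-by-lookup shape is what the map/reduce and UDF variants implement, just with the stop-word set distributed to each task.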
03-02-2016
05:17 PM
@Roberto Sancho please see the following document and community article:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_cluster-planning-guide/content/ch_partitioning_chapter.html
https://community.hortonworks.com/content/kbentry/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html
03-03-2016
05:10 PM
1 Kudo
@Roberto Sancho You will be able to achieve much better performance by transforming the files with simple processing in Pig/Hive and creating ORC Hive tables on the transformed data.