Member since: 09-24-2015
Posts: 527
Kudos Received: 136
Solutions: 19
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2201 | 06-30-2017 03:15 PM |
| | 3203 | 10-14-2016 10:08 AM |
| | 8467 | 09-07-2016 06:04 AM |
| | 10226 | 08-26-2016 11:27 AM |
| | 1501 | 08-23-2016 02:09 PM |
07-15-2019
03:31 PM
The free-form query for lastmodified is not correct, because you cannot manage updated rows properly, and you will also get duplicate rows, so I don't recommend this alternative.
03-19-2016
08:22 PM
3 Kudos
Good that you figured it out. You weren't using special characters in the original question. So yes, the parameter needs to be in quotes if the values are not plain letters, and you might also have to escape things sometimes, e.g.:

--hivevar "day=2016/3/01" --hivevar "regex=.*\\|"  (if you want the regex .*\| )

And if you use it in shell scripts you sometimes have to escape even more. I once needed 32 backslashes in Oozie to end up with 1 backslash in a Pig script.
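The layered escaping can be illustrated with a minimal shell sketch (plain `printf`, not tied to Hive itself): each interpreter that reads the string consumes one level of backslashes, which is why deeply nested setups like Oozie → shell → Pig can need so many.

```shell
# Inside double quotes the shell collapses \\ to \ before the
# program (hive, pig, ...) ever sees the argument:
printf '%s\n' "regex=.*\\|"   # the program receives: regex=.*\|

# Single quotes pass the string through untouched, so one
# backslash in the source is one backslash in the argument:
printf '%s\n' 'regex=.*\|'    # the program receives: regex=.*\|
```

Each additional layer (a shell script calling another shell, an Oozie action generating that script) roughly doubles the backslashes you have to write, which is how 32 of them can collapse down to one.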
03-16-2016
10:58 AM
7 Kudos
@Roberto Sancho - If you are using the embedded Oozie Derby DB, I think you can take a backup of the directory below and keep it as a tar.gz:

/hadoop/oozie/data/oozie-db/

To connect to the Derby shell:
1. Download derbytools-<version>.jar (preferably 10.10.1.1) and put it at /usr/hdp/<version>/oozie/libtools/
2. Stop the Oozie service
3. cd /usr/hdp/<version>/oozie/libtools/
4. export CLASSPATH=derbytools-10.10.1.1.jar:derby-10.10.1.1.jar
5. java org.apache.derby.tools.ij
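The backup step can be sketched as a small shell helper (a hedged sketch: `backup_oozie_db` and the example paths are illustrative, not part of any Oozie tooling; stop the Oozie service first so the Derby files are consistent):

```shell
# Tar up the embedded Derby data directory before touching it.
backup_oozie_db() {
  db_dir=$1    # e.g. /hadoop/oozie/data/oozie-db (HDP default layout)
  out=$2       # e.g. /tmp/oozie-db-backup.tar.gz
  # -C keeps the archive rooted at the directory name, not the full path
  tar czf "$out" -C "$(dirname "$db_dir")" "$(basename "$db_dir")"
}

# Typical invocation on an HDP node (adjust paths to your install):
# backup_oozie_db /hadoop/oozie/data/oozie-db /tmp/oozie-db-backup.tar.gz
```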
03-24-2016
07:16 PM
1 Kudo
HCatalog does not support writing into a bucketed table. HCat explicitly checks whether a table is bucketed and, if so, disables storing into it, to avoid writing to the table in a destructive way. From HCatOutputFormat:

if (sd.getBucketCols() != null && !sd.getBucketCols().isEmpty()) {
    throw new HCatException(ErrorType.ERROR_NOT_SUPPORTED,
        "Store into a partition with bucket definition from Pig/Mapreduce is not supported");
}
03-26-2016
08:13 AM
Finally I managed to insert with buckets like this:

CREATE EXTERNAL TABLE IF NOT EXISTS journey_importe_v2(
FECHAOPRCNF date,
codnrbeenf string,
codnrbeenf2 string,
CODTXF string,
FREQ BIGINT,
IMPORTE DECIMAL(9, 2)
)
CLUSTERED BY (codnrbeenf) INTO 25 BUCKETS
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
stored as ORC
LOCATION '/RSI/tables/logs/importe_v2'
TBLPROPERTIES ("immutable"="false","transactional"="true");
create table IF NOT EXISTS temp_journey_importe_v2 (importe STRING);
LOAD DATA INPATH '/RSI/staging/output/journey_importe/${date}' OVERWRITE INTO TABLE temp_journey_importe_v2;
set hive.enforce.bucketing = true;
INSERT INTO TABLE journey_importe_v2
SELECT
regexp_extract(importe, '^(?:([^,]*)\,?){1}', 1) FECHAOPRCNF,
regexp_extract(importe, '^(?:([^,]*)\,?){2}', 1) codnrbeenf,
regexp_extract(importe, '^(?:([^,]*)\,?){3}', 1) codnrbeenf2,
regexp_extract(importe, '^(?:([^,]*)\,?){4}', 1) CODTXF,
regexp_extract(importe, '^(?:([^,]*)\,?){5}', 1) FREQ,
regexp_extract(importe, '^(?:([^,]*)\,?){6}', 1) IMPORTE
from temp_journey_importe_v2;
Is there a better way? How many buckets would you recommend I use?
03-04-2016
02:18 AM
@Roberto Sancho great question once again. What you're asking for is commonly known as "stop words". There are different ways of addressing the problem; instead of writing my own solution, here are some suggestions: write a map/reduce job with a stop-words collection; write a UDF in Python, Groovy, or Java, whichever is convenient for you (there are some examples here in Groovy and Python); I've done some work with Apache Crunch, and there's a stop-words example on its front page; and finally, here are a couple of suggestions for doing it in Pig. The last one is the simplest suggestion and I am curious to try it myself. It comes from Donald Miner, a well-known champion of Apache Pig.
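As a language-neutral illustration of the stop-words idea (a minimal sketch with made-up sample data, not one of the Pig/UDF approaches mentioned above): keep only the tokens that do not appear in a stop-word list.

```shell
# One token per line of input, one stop word per line of the list.
printf '%s\n' the quick brown fox jumps over the lazy dog > words.txt
printf '%s\n' the a an over > stopwords.txt

# -f: read patterns from file, -F: fixed strings, -w: whole words, -v: invert
grep -vwFf stopwords.txt words.txt
# prints: quick brown fox jumps lazy dog (one word per line)
```

The same filter-by-lookup shape is what the map/reduce and UDF variants implement, just with the stop-word set distributed to each task.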
03-02-2016
05:17 PM
@Roberto Sancho please see the following document and community article:
http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_cluster-planning-guide/content/ch_partitioning_chapter.html
https://community.hortonworks.com/content/kbentry/16763/cheat-sheet-and-tips-for-a-custom-install-of-horto.html
03-03-2016
05:10 PM
1 Kudo
@Roberto Sancho You will be able to achieve much better performance by transforming the files with simple processing in Pig/Hive and creating ORC Hive tables on the transformed data.