Member since 09-24-2015
527 Posts
136 Kudos Received
19 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2281 | 06-30-2017 03:15 PM |
| | 3306 | 10-14-2016 10:08 AM |
| | 8567 | 09-07-2016 06:04 AM |
| | 10354 | 08-26-2016 11:27 AM |
| | 1533 | 08-23-2016 02:09 PM |
02-23-2016
09:46 AM
1 Kudo
Hi: Thanks for the information. In the end I used this: `$0#'VARIABLE'#'IMP-NOMINAL-F'`. Many thanks.
02-23-2016
08:05 AM
2 Kudos
Hi: I need to read this JSON, but I can't print or access the multilevel fields; I can only read the first level. For example, I need to read the `TIPINC-F` field inside `VARIABLE`. My JSON:

```json
{"NUM-PARTICION-F":"001","NOMBRE-REGLA-F":"SAI_TIP_INC_TRN","FECHA-OPRCN-F":"2015-12-06 00:00:01","COD-NRBE-EN-F":"9998","COD-NRBE-EN-FSC-F":"9998","COD-INTERNO-UO-F":"0001","COD-INTERNO-UO-FSC-F":"0001","COD-CSB-OF-F":"0001","COD-CENT-UO-F":"","ID-INTERNO-TERM-TN-F":"A0299989","ID-INTERNO-EMPL-EP-F":"99999989","CANAL":"01","NUM-SEC-F":"764","COD-TX-F":"SAI01COU","COD-TX-DI-F":"TUX","ID-EMPL-AUT-F":"U028765","FECHA-CTBLE-F":"2015-12-07","COD-IDENTIFICACION-F":"","IDENTIFICACION-F":"","VALOR-IMP-F":"0.00","VARIABLE":{"TIPINC-F":"0","PERFIL-CAJ-F":"0","PERFIL-COM-F":"0","PERFIL-TAR-F":"0","RESPONSABLE-F":"0","RESP-EXCEP-F":"0","EXCEPCION-F":"0","STD-CHAR-01-F":"1","STD-DEC-1-F":"0","STD-DEC-2-F":"0"}}
```

My code:

```pig
A = LOAD '/RSI/staging/input/logs/log.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad');
B = FOREACH A GENERATE (CHARARRAY) $0#'FECHA-OPRCN-F' AS fecha,
    (CHARARRAY) $0#'COD-NRBE-EN-F' AS entidad,
    (CHARARRAY) $0#'COD-INTERNO-UO-FSC-F' AS ofi,
    (CHARARRAY) $0#'COD-TX-F' AS ope;
```

My output:

```
(2015-12-06 00:06:40,9998,0001,DVI82OOU,)
(2015-12-06 00:06:42,9998,0001,DVI95COU,)
(2015-12-06 00:06:49,3191,9204,BDPPM1ZJ,)
(2015-12-06 00:06:49,3076,9554,STR03CON,)
(2015-12-06 00:06:53,3008,9521,BDPPM1RJ,)
```
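With elephant-bird's `-nestedLoad` option, a nested map can usually be dereferenced by chaining the `#` operator, which matches the expression the poster reported working in a later reply. A minimal sketch, assuming the same loader and the field names from the JSON above:

```pig
-- assumes A was loaded with JsonLoader('-nestedLoad') as in the question
C = FOREACH A GENERATE
    (CHARARRAY) $0#'FECHA-OPRCN-F'       AS fecha,
    (CHARARRAY) $0#'VARIABLE'#'TIPINC-F' AS tipinc;  -- second-level field
DUMP C;
```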
Labels:
- Apache Pig
02-18-2016
02:36 PM
1 Kudo
I mean the files are here: /user/dangulo/tables_pig/year=2016/month=01 And the table I created was like this:

```sql
CREATE EXTERNAL TABLE journey_v4_externa(
  CODTF string,
  CODNRBEENF string,
  FECHAOPRCNF timestamp,
  FRECUENCIA int)
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS AVRO
LOCATION '/user/dangulo/tables_pig'
TBLPROPERTIES ("immutable"="false","avro.compress"="zlib");
```

So I don't know why there is no data in the table.
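One common cause, noted here as an assumption about this setup rather than a confirmed diagnosis: dropping and re-creating a partitioned table discards its partition metadata, so Hive sees no partitions (and therefore no rows) even though the files remain in HDFS. Re-registering the partitions from the directory layout may help:

```sql
MSCK REPAIR TABLE journey_v4_externa;
```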
02-18-2016
01:13 PM
Yes, I know; that is why, after the drop and create, I don't know why the table is empty, because the files are there. Thanks.
02-18-2016
12:27 PM
1 Kudo
Hi: I created an external table:

```sql
CREATE EXTERNAL TABLE journey_v4_externa(
  CODTF string,
  CODNRBEENF string,
  FECHAOPRCNF timestamp,
  FRECUENCIA int)
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS AVRO
LOCATION '/user/dangulo/tables_pig'
TBLPROPERTIES ("immutable"="false","avro.compress"="zlib");
```

Then I inserted the data, then I dropped the table, but the files are still in HDFS. Then I re-created the table, but the table is empty. What can I do to get the data back into the table? Thanks.
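A sketch of one way to re-attach the existing files to the re-created table, assuming the partition path mentioned elsewhere in the thread (`/user/dangulo/tables_pig/year=2016/month=01`); each existing partition directory would need a statement like this:

```sql
ALTER TABLE journey_v4_externa ADD IF NOT EXISTS
  PARTITION (year='2016', month='01')
  LOCATION '/user/dangulo/tables_pig/year=2016/month=01';
```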
02-18-2016
11:51 AM
2 Kudos
Hi: I want to delete one column from a Hive table. My table is like this:

```sql
CREATE TABLE journey_v4(
  CODTF string,
  CODNRBEENF string,
  FECHAOPRCNF timestamp,
  FRECUENCIA int)
PARTITIONED BY (year string, month string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\n'
STORED AS AVRO
TBLPROPERTIES ("immutable"="false","avro.compress"="zlib");
```

Then I added a new column:

```sql
ALTER TABLE journey_v4 ADD COLUMNS (EXTRA string);
```

Now I want to delete the column EXTRA to go back to the original table, but this has no effect:

```sql
ALTER TABLE journey_v4 REPLACE COLUMNS (CODTF string, CODNRBEENF string, FECHAOPRCNF timestamp, FRECUENCIA in);
```

Any suggestions? Thanks
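As an observation rather than a confirmed fix: the REPLACE COLUMNS statement above declares FRECUENCIA as `in`, which is not a Hive type and would cause the statement to fail. With the typo corrected it would read:

```sql
ALTER TABLE journey_v4 REPLACE COLUMNS (
  CODTF string,
  CODNRBEENF string,
  FECHAOPRCNF timestamp,
  FRECUENCIA int);
```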
Labels:
- Apache HCatalog
- Apache Hive
02-18-2016
10:20 AM
Hi: Yes, I used the compression, but the reducer tasks still take a long time. I think it is in the final merge task; the shuffle and sort look fine. I'll try the combiner class from R and let you know. Many thanks.
02-17-2016
06:24 PM
Hi: Yes, I used map-output compression (I forgot to mention it), but I still haven't used a combiner class. I'll try it and tell you. Many, many thanks.
02-17-2016
04:56 PM
Hi: I changed these parameters and now the job finishes after 32 minutes, but I still don't know why the reducer takes so long from 96% to 100%. Look:

```
16/02/17 17:46:32 INFO mapreduce.Job: Running job: job_1455727501370_0001
16/02/17 17:46:39 INFO mapreduce.Job: Job job_1455727501370_0001 running in uber mode : false
16/02/17 17:46:39 INFO mapreduce.Job: map 0% reduce 0%
...
16/02/17 17:53:29 INFO mapreduce.Job: map 100% reduce 92%
16/02/17 17:53:31 INFO mapreduce.Job: map 100% reduce 93%
16/02/17 17:53:46 INFO mapreduce.Job: map 100% reduce 96%
```

(and only after another 30 minutes did it finish)

The parameters I changed are:
- mapreduce.job.reduce.slowstart.completedmaps=0.8
- mapreduce.reduce.shuffle.parallelcopies
- mapreduce.reduce.shuffle.input.buffer.percent
- mapreduce.reduce.shuffle.merge.percent

From RStudio:

```r
rmr.options(backend.parameters = list(
  hadoop = list(D = "mapreduce.map.memory.mb=4096",
                D = "mapreduce.job.reduces=7",
                D = "mapreduce.reduce.memory.mb=5120")))
```

Are there any more parameters that could help me? Thanks
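A few other reduce-side properties that are sometimes tuned when the final merge is slow; the values below are illustrative assumptions for this kind of workload, not recommendations:

```properties
# compress map output to shrink shuffle traffic (the poster mentions already doing this)
mapreduce.map.output.compress=true
mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
# larger in-memory sort buffer reduces on-disk spill/merge rounds
mapreduce.task.io.sort.mb=512
# allow more merge streams per pass during the final merge
mapreduce.task.io.sort.factor=50
```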
02-17-2016
12:49 PM
This `-Dmapred.reduce.tasks=x` is for MapReduce v1; I am using MapReduce v2 with YARN, and I don't know how to change this parameter. Any suggestions? Thanks
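In MapReduce v2 the old `mapred.reduce.tasks` property was renamed to `mapreduce.job.reduces`, and it can be passed the same way on the command line. A sketch, where the jar and class names are placeholders, not taken from this thread:

```
hadoop jar my-job.jar MyMainClass -Dmapreduce.job.reduces=7 <input> <output>
```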