Member since 05-07-2018

331 Posts
45 Kudos Received
35 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 9611 | 09-12-2018 10:09 PM |
|  | 3738 | 09-10-2018 02:07 PM |
|  | 11506 | 09-08-2018 05:47 AM |
|  | 4089 | 09-08-2018 12:05 AM |
|  | 4931 | 08-15-2018 10:44 PM |
			
    
	
		
		
06-24-2018 08:44 AM
1 Kudo

Hey @Vinicius Leal! So we're namesakes, huh? Cool name, hehe 🙂 I guess you're from Brazil?

Regarding your issue, you can try using a LATERAL VIEW for the array type in Hive.

I took the liberty of running a test. Here it is:

1) Create the table in Hive

CREATE EXTERNAL TABLE `tb_pdde_teste`(
url STRING
,id STRING
,nome STRING
,nome_estendido STRING
,descricao STRING
,inicio STRING
,final STRING
,formatacao STRING
,data_atualizacao STRING
,aditividade STRING
,url_origem STRING
,tempo_aditividade STRING
,portal_dados_abertos STRING
,disponibilizacao struct<disponibilizacao:STRING, dias:STRING>
,estado struct<estado:STRING>
,fonte_gestora struct<fonte_gestora_url:STRING,fonte_gestora_id:STRING,fonte_gestora_nome:STRING,fonte_gestora_descricao:STRING
,fonte_gestora_tipo:STRING,orgao_primeiro_escalao:struct<fonte_gestora_orgao_nome:STRING,fonte_gestora_orgao_descricao:STRING>>
,fonte_provedora struct<fonte_provedora_url:STRING,fonte_provedora_id:STRING,fonte_provedora_nome:STRING,fonte_provedora_descricao:STRING,fonte_provedora_tipo:STRING, orgao_primeiro_escalao:struct<fonte_provedora_orgao_nome:STRING,fonte_provedora_orgao_descricao:STRING>>
,grupo_informacao struct<grupo_informacao_url:STRING, grupo_informacao_id:STRING, grupo_informacao_nome:STRING, grupo_informacao_palavras_chave:array<STRING>>
,base_territorial struct<base_territorial:STRING>
,periodicidade struct<periodicidade:STRING>
,multiplicador struct<multiplicador_id:STRING,multiplicador_nome:STRING>
,produto struct<produto_nome:STRING>
,publicacao struct<status_publicacao:STRING>
,unidade_medida struct<unidade_medida_url:STRING,unidade_medida_id:STRING,unidade_medida_nome:STRING,unidade_medida:STRING>
,orgao_primeiro_escalao struct<orgao_primeiro_escalao_nome:STRING,orgao_primeiro_escalao_descricao:STRING>
,valores array<struct<valor:STRING, municipio_ibge:STRING,ano:STRING>>
) 
COMMENT 'Dados do Programa Dinheiro Direto na Escola' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/dados_pig/pdde_teste';

2) Get the JSON and load it into HDFS

[hive@c1123-node3 ~]$ curl -X GET http://api.pgi.gov.br/api/1/serie/1678.json > 1678.json
[root@c1123-node3 ~]# su - hdfs
[hdfs@c1123-node3 ~]$ hdfs dfs -mkdir /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chown -R pig:hadoop /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chmod 777 /pig_scripts
[hdfs@c1123-node3 ~]$ exit
[root@c1123-node3 ~]# su - hive
[hive@c1123-node3 ~]$ hdfs dfs -put 1678.json /pig_scripts
3) Run the Pig job using HCatalog to load the values into the Hive table

[hive@c1123-node3 ~]$ pig -useHCatalog

grunt> A = LOAD '/pig_scripts/1678.json' Using JsonLoader('url: chararray, id :chararray,nome :chararray,nome_estendido:chararray,descricao:chararray,inicio:chararray,final:chararray, formatacao:chararray, data_atualizacao:chararray, aditividade:chararray,url_origem: chararray,tempo_aditividade:chararray,portal_dados_abertos: chararray, disponibilizacao:tuple(disponibilizacao:chararray,dias:chararray),estado:tuple(estado:chararray),fonte_gestora:tuple(fonte_gestora_url:chararray,fonte_gestora_id:chararray,fonte_gestora_nome:chararray,fonte_gestora_descricao: chararray, fonte_gestora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_gestora_orgao_nome:chararray,fonte_gestora_orgao_descricao:chararray)), fonte_provedora:tuple(fonte_provedora_url:chararray,fonte_provedora_id:chararray,fonte_provedora_nome:chararray,fonte_provedora_descricao: chararray, fonte_provedora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_provedora_orgao_nome:chararray,fonte_provedora_orgao_descricao:chararray)), grupo_informacao:tuple(grupo_informacao_url:chararray,grupo_informacao_id:chararray,grupo_informacao_nome:chararray,grupo_informacao_palavras_chave:{(chararray)}), base_territorial:tuple(base_territorial:chararray),periodicidade:tuple(periodicidade: chararray),multiplicador:tuple(multiplicador_id:chararray,multiplicador_nome:chararray ), produto:tuple(produto_nome:chararray),publicacao:tuple(status_publicacao:chararray), unidade_medida:tuple(unidade_medida_url:chararray,unidade_medida_id:chararray,unidade_medida_nome:chararray,unidade_medida:chararray),orgao_primeiro_escalao:tuple(orgao_primeiro_escalao_nome:chararray,orgao_primeiro_escalao_descricao:chararray),valores:{tuple(valor: chararray,municipio_ibge : chararray,ano : chararray)}');
STORE A INTO 'tb_pdde_teste' USING org.apache.hive.hcatalog.pig.HCatStorer();
4) Break out the values from the valores attribute in the query

hive> select x from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;
OK
{"valor":"40200","municipio_ibge":"120001","ano":"2003"}
Time taken: 0.162 seconds, Fetched: 1 row(s)
hive> select x.valor, x.municipio_ibge from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1; 
OK
40200	120001
Time taken: 0.11 seconds, Fetched: 1 row(s)

PS: I made some changes compared to your code:
- Added a tuple to the valores attribute after the {} array declaration.
- Added HCatStorer to save the result from Pig directly into Hive.
- Matched all fields from the JSON file and created the full DDL in Hive.
- Used a LATERAL VIEW because a single array position holds a lot of struct values inside the data.

Hope this helps!
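As a quick sanity check (not part of the original run above), queries along these lines should confirm that the HCatStorer load landed and that the nested valores array is queryable. This is only a sketch: the column names come from the DDL above, but the per-year aggregation is illustrative and was not verified in the original test.

-- Hedged sketch: count the rows written by the Pig job
SELECT COUNT(*) FROM tb_pdde_teste;

-- Flatten the array of structs and aggregate per year
SELECT x.ano, COUNT(*) AS registros, SUM(CAST(x.valor AS DOUBLE)) AS total_valor
FROM tb_pdde_teste
LATERAL VIEW explode(valores) v AS x
GROUP BY x.ano;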
						
					
06-22-2018 08:52 AM

What should the permissions be on the hdfs-site.xml and core-site.xml files that I'm supposed to copy into the /home/nifi folder? Also, I'm able to telnet to both the datanode and the namenode (ports 50075 and 50070).
						
					
06-20-2018 12:26 PM

The files are on an FTP server. I got them with ListFTP and FetchFTP. Then I use RouteText (I think) to filter them by name. After that I need to parse the data and split it into parts (I'll try SplitContent). Finally, I load the data into SQL Server with PutDatabaseRecord.
						
					
06-19-2018 02:24 AM

Awesome @Kishalay Biswas! Since the issue is resolved, it would also be great if you could mark this HCC thread as answered by clicking the "Accept" button. That way other HCC users can quickly find the solution when they encounter the same issue.
						
					
06-20-2018 06:18 PM

Hi @Suresh Dendukuri! Glad to hear that was helpful 🙂 Getting back to your new problem, I'd kindly ask you to open a new question on HCC (separating different issues helps other HCC users search for a specific problem) 🙂 But just so I understand your problem better: is the query listed not working? If so, does NiFi show an error, or is the result simply not what you expected? Thanks
						
					
04-11-2019 09:11 PM

This looks like a data skew issue, meaning your group-by key is skewed, resulting in unbalanced data between partitions. You can inspect your key distribution; if the skew is real, you need to change the key or add a salt to the group-by so the data can be distributed evenly.
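To make the salting idea concrete, here is a minimal two-stage aggregation sketch in Hive-style SQL. The table name events, the column group_key, and the bucket count of 16 are made up for illustration and would need to match your actual job.

-- Step 1: inspect the key distribution to confirm the skew (illustrative names)
SELECT group_key, COUNT(*) AS cnt
FROM events
GROUP BY group_key
ORDER BY cnt DESC
LIMIT 20;

-- Step 2: salt the keys, pre-aggregate per (key, salt), then merge per key.
-- This works for decomposable aggregates such as COUNT/SUM/MIN/MAX.
SELECT group_key, SUM(partial_cnt) AS total_cnt
FROM (
  SELECT group_key, salt, COUNT(*) AS partial_cnt
  FROM (
    SELECT group_key,
           CAST(FLOOR(RAND() * 16) AS INT) AS salt   -- 16 salt buckets, tune as needed
    FROM events
  ) salted
  GROUP BY group_key, salt
) partials
GROUP BY group_key;

The same pattern carries over to Spark: add a salt column, aggregate by (key, salt), then aggregate again by key.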
						
					
06-26-2018 05:39 AM

							 Thank you very much @Vinicius Higa Murakami 
						
					
06-13-2018 03:36 PM

Hi @Marc Vázquez! Could you check which command the Confluent stack runs for ZooKeeper?

[root@node2 ~]# ps -ef | grep -i zookeeper
1001      3802     1  0 Jun12 ?        00:00:56 /usr/jdk64/jdk1.8.0_112/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.log.file=zookeeper-zookeeper-server-node2.log -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp #Thousands of libs...   -Xmx1024m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/hdf/current/zookeeper-server/conf/zoo.cfg
PS: Did you set this cluster up using confluent.sh? If so, I did some digging in their code and there should be a directory for the ZooKeeper logs:
https://github.com/confluentinc/confluent-cli/blob/master/src/oss/confluent.sh#L414

Or, if you prefer, you can set it up manually. Here's my example log4j.properties for ZooKeeper:

[root@node2 ~]# cat /etc/zookeeper/conf/log4j.properties
#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
#
#
#
# ZooKeeper Logging Configuration
#
# DEFAULT: console appender only
log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE
#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
#    Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File=/var/log/zookeeper/zookeeper.log
# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
#    Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=zookeeper_trace.log
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L][%x] - %m%n
And don't forget to fill in zookeeper-env.sh:

[root@node2 ~]# cat /etc/zookeeper/conf/zookeeper-env.sh
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export ZOOKEEPER_HOME=/usr/hdf/current/zookeeper-server
export ZOO_LOG_DIR=/var/log/zookeeper
export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
export SERVER_JVMFLAGS=-Xmx1024m
export JAVA=$JAVA_HOME/bin/java
export CLASSPATH=$CLASSPATH:/usr/share/zookeeper/*
  Hope this helps! 
						
					
06-14-2018 07:56 AM

@Felix Albani I tried using PuTTY as well, but that doesn't work for me either. Regards, Jay.
						
					
06-12-2018 05:30 PM
1 Kudo

Hello @Rahul Kumar. Could you check your /etc/hosts? I saw you are using "localhost" as your broker/bootstrap address. Try changing it to your hostname instead of "localhost"; some software doesn't listen on the loopback interface by default. Besides that, if you are using the native Kafka from Hortonworks, the default broker port is 6667, or 6668 when using Kerberos, not 9092. I hope this works.
						
					