Member since: 05-07-2018
Posts: 331
Kudos Received: 45
Solutions: 35

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7036 | 09-12-2018 10:09 PM
 | 2736 | 09-10-2018 02:07 PM
 | 9319 | 09-08-2018 05:47 AM
 | 3078 | 09-08-2018 12:05 AM
 | 4103 | 08-15-2018 10:44 PM
06-24-2018
08:44 AM
1 Kudo
Hey @Vinicius Leal! So we're namesakes, huh? Cool name hehe 🙂 I guess you're from Brazil? Regarding your issue, you can try the LATERAL VIEW for the array type in Hive. I took the liberty of running a test; here it is:

1) Create the table in Hive:

CREATE EXTERNAL TABLE `tb_pdde_teste`(
url STRING
,id STRING
,nome STRING
,nome_estendido STRING
,descricao STRING
,inicio STRING
,final STRING
,formatacao STRING
,data_atualizacao STRING
,aditividade STRING
,url_origem STRING
,tempo_aditividade STRING
,portal_dados_abertos STRING
,disponibilizacao struct<disponibilizacao:STRING, dias:STRING>
,estado struct<estado:STRING>
,fonte_gestora struct<fonte_gestora_url:STRING,fonte_gestora_id:STRING,fonte_gestora_nome:STRING,fonte_gestora_descricao:STRING
,fonte_gestora_tipo:STRING,orgao_primeiro_escalao:struct<fonte_gestora_orgao_nome:STRING,fonte_gestora_orgao_descricao:STRING>>
,fonte_provedora struct<fonte_provedora_url:STRING,fonte_provedora_id:STRING,fonte_provedora_nome:STRING,fonte_provedora_descricao:STRING,fonte_provedora_tipo:STRING, orgao_primeiro_escalao:struct<fonte_provedora_orgao_nome:STRING,fonte_provedora_orgao_descricao:STRING>>
,grupo_informacao struct<grupo_informacao_url:STRING, grupo_informacao_id:STRING, grupo_informacao_nome:STRING, grupo_informacao_palavras_chave:array<STRING>>
,base_territorial struct<base_territorial:STRING>
,periodicidade struct<periodicidade:STRING>
,multiplicador struct<multiplicador_id:STRING,multiplicador_nome:STRING>
,produto struct<produto_nome:STRING>
,publicacao struct<status_publicacao:STRING>
,unidade_medida struct<unidade_medida_url:STRING,unidade_medida_id:STRING,unidade_medida_nome:STRING,unidade_medida:STRING>
,orgao_primeiro_escalao struct<orgao_primeiro_escalao_nome:STRING,orgao_primeiro_escalao_descricao:STRING>
,valores array<struct<valor:STRING, municipio_ibge:STRING,ano:STRING>>
)
COMMENT 'Dados do Programa Dinheiro Direto na Escola'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/dados_pig/pdde_teste';

2) Get the JSON and feed it into HDFS:

[hive@c1123-node3 ~]$ curl -X GET http://api.pgi.gov.br/api/1/serie/1678.json > 1678.json
[root@c1123-node3 ~]# su - hdfs
[hdfs@c1123-node3 ~]$ hdfs dfs -mkdir /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chown -R pig:hadoop /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chmod 777 /pig_scripts
[hdfs@c1123-node3 ~]$ exit
[root@c1123-node3 ~]# su - hive
[hive@c1123-node3 ~]$ hdfs dfs -put 1678.json /pig_scripts
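As an optional sanity check (not in the original reply, just a quick verification), you can confirm the file actually landed before firing up Pig:

[hive@c1123-node3 ~]$ hdfs dfs -ls /pig_scripts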
3) Run the Pig job using HCatalog to write the values into the Hive table:

[hive@c1123-node3 ~]$ pig -useHCatalog
grunt> A = LOAD '/pig_scripts/1678.json' USING JsonLoader('url: chararray, id :chararray,nome :chararray,nome_estendido:chararray,descricao:chararray,inicio:chararray,final:chararray, formatacao:chararray, data_atualizacao:chararray, aditividade:chararray,url_origem: chararray,tempo_aditividade:chararray,portal_dados_abertos: chararray, disponibilizacao:tuple(disponibilizacao:chararray,dias:chararray),estado:tuple(estado:chararray),fonte_gestora:tuple(fonte_gestora_url:chararray,fonte_gestora_id:chararray,fonte_gestora_nome:chararray,fonte_gestora_descricao: chararray, fonte_gestora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_gestora_orgao_nome:chararray,fonte_gestora_orgao_descricao:chararray)), fonte_provedora:tuple(fonte_provedora_url:chararray,fonte_provedora_id:chararray,fonte_provedora_nome:chararray,fonte_provedora_descricao: chararray, fonte_provedora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_provedora_orgao_nome:chararray,fonte_provedora_orgao_descricao:chararray)), grupo_informacao:tuple(grupo_informacao_url:chararray,grupo_informacao_id:chararray,grupo_informacao_nome:chararray,grupo_informacao_palavras_chave:{(chararray)}), base_territorial:tuple(base_territorial:chararray),periodicidade:tuple(periodicidade: chararray),multiplicador:tuple(multiplicador_id:chararray,multiplicador_nome:chararray), produto:tuple(produto_nome:chararray),publicacao:tuple(status_publicacao:chararray), unidade_medida:tuple(unidade_medida_url:chararray,unidade_medida_id:chararray,unidade_medida_nome:chararray,unidade_medida:chararray),orgao_primeiro_escalao:tuple(orgao_primeiro_escalao_nome:chararray,orgao_primeiro_escalao_descricao:chararray),valores:{tuple(valor: chararray,municipio_ibge : chararray,ano : chararray)}');
grunt> STORE A INTO 'tb_pdde_teste' USING org.apache.hive.hcatalog.pig.HCatStorer();
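As a quick check after the store (an extra step, not part of the original walkthrough), you can count what HCatStorer wrote into the table:

hive> select count(*) from tb_pdde_teste;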
4) Explode the values from the valores attribute in the query:

hive> select x from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;
OK
{"valor":"40200","municipio_ibge":"120001","ano":"2003"}
Time taken: 0.162 seconds, Fetched: 1 row(s)
hive> select x.valor, x.municipio_ibge from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;
OK
40200 120001
Time taken: 0.11 seconds, Fetched: 1 row(s)

PS: I made some changes compared to your code:
- Added a tuple to the valores attribute after the {} array declaration.
- Added HCatStorer to save the result from Pig directly into Hive.
- Matched all fields from the JSON file and created the full DDL in Hive.
- Used LATERAL VIEW because we're dealing with a single position in the array type holding a lot of struct values inside the data.
Hope this helps!
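As a small addendum (a variation on the queries above, not part of the original reply): the same lateral view exposes every field of the struct, so the ano column defined in the DDL can be pulled out the same way:

hive> select x.valor, x.municipio_ibge, x.ano from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;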
06-22-2018
08:52 AM
What should the permissions be on the hdfs-site.xml and core-site.xml files that I'm supposed to copy into the /home/nifi folder? Also, I'm able to telnet to both the datanode and the namenode (ports 50075 and 50070).
06-20-2018
12:26 PM
The files are on an FTP server. I got them with ListFTP and FetchFTP. Then I use RouteText (I think) to filter them by name. After that I need to parse the data and split it into parts (I'll try SplitContent). Finally, I load the data into SQL Server with PutDatabaseRecord.
06-19-2018
02:24 AM
Awesome @Kishalay Biswas! Since the issue is resolved, it would also be great if you could mark this HCC thread as answered by clicking the "Accept" button. That way other HCC users can quickly find the solution when they encounter the same issue.
06-20-2018
06:18 PM
Hi @Suresh Dendukuri! Glad to hear that was helpful 🙂 Getting back to your new problem, I'd kindly ask you to open a new question on HCC (keeping different issues separate helps other HCC users search for a specific problem) 🙂 But just to understand your problem better: is the listed query not working? If so, is NiFi showing an error, or is the result simply not what you expected? Thanks
04-11-2019
09:11 PM
This looks like a data skew issue, meaning your GROUP BY key is skewed, resulting in unbalanced data between partitions. You can inspect your key distribution; if the skew is real, you need to change the key or add a salt to the GROUP BY so the data can be evenly distributed.
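To illustrate the salting idea (a minimal sketch only; the table and column names below are hypothetical, not from the thread), the aggregation runs in two stages so the hot key is first spread across several salt buckets:

-- Stage 1: attach a random salt and pre-aggregate per (key, salt);
-- Stage 2: merge the partial counts back per key.
select k, sum(cnt) as cnt
from (
  select k, salt, count(*) as cnt
  from (
    select k, cast(rand() * 16 as int) as salt  -- my_skewed_table is a hypothetical name
    from my_skewed_table
  ) salted
  group by k, salt
) partials
group by k;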
06-26-2018
05:39 AM
Thank you very much @Vinicius Higa Murakami
06-13-2018
03:36 PM
Hi @Marc Vázquez! Could you check which command the Confluent stack runs for ZK?

[root@node2 ~]# ps -ef | grep -i zookeeper
1001 3802 1 0 Jun12 ? 00:00:56 /usr/jdk64/jdk1.8.0_112/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.log.file=zookeeper-zookeeper-server-node2.log -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp #Thousands of libs... -Xmx1024m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/hdf/current/zookeeper-server/conf/zoo.cfg
PS: Did you set this cluster up using confluent.sh? If so, I did some research in their code and there should be a directory for ZK logs: https://github.com/confluentinc/confluent-cli/blob/master/src/oss/confluent.sh#L414

Or, if you prefer, you can set it up manually; here's my example of log4j.properties for ZK:

[root@node2 ~]# cat /etc/zookeeper/conf/log4j.properties
#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#
#
#
#
# ZooKeeper Logging Configuration
#
# DEFAULT: console appender only
log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE
#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File=/var/log/zookeeper/zookeeper.log
# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
# Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=zookeeper_trace.log
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L][%x] - %m%n
And don't forget to fill in zookeeper-env.sh:

[root@node2 ~]# cat /etc/zookeeper/conf/zookeeper-env.sh
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export ZOOKEEPER_HOME=/usr/hdf/current/zookeeper-server
export ZOO_LOG_DIR=/var/log/zookeeper
export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
export SERVER_JVMFLAGS=-Xmx1024m
export JAVA=$JAVA_HOME/bin/java
export CLASSPATH=$CLASSPATH:/usr/share/zookeeper/*
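After applying both files, a restart plus a quick tail should confirm logging goes where expected (an extra verification step; the zkServer.sh location is an assumption based on the ZOOKEEPER_HOME shown above, and the log path is the one from the config):

[root@node2 ~]# /usr/hdf/current/zookeeper-server/bin/zkServer.sh restart
[root@node2 ~]# tail /var/log/zookeeper/zookeeper.log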
Hope this helps!
06-14-2018
07:56 AM
@Felix Albani I tried using PuTTY as well, but it doesn't work for me either. Regards, Jay.
06-12-2018
05:30 PM
1 Kudo
Hello @Rahul Kumar. Could you check your /etc/hosts? I saw you are using "localhost" as your broker/bootstrap address. Try changing it to your hostname instead of "localhost"; some software doesn't listen on the loopback interface by default. Besides that, if you are using the native Kafka from Hortonworks, the default port for brokers is 6667 (or 6668 when using Kerberos), not 9092. I hope this works.
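For a quick test (the host name and topic below are placeholders, not from the thread, and the path assumes an HDP install), the console producer that ships with Kafka can confirm the broker answers on 6667:

[kafka@node1 ~]$ /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list node1.example.com:6667 --topic test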