Member since 05-07-2018

331 Posts
45 Kudos Received
35 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
|  | 9611 | 09-12-2018 10:09 PM |
|  | 3738 | 09-10-2018 02:07 PM |
|  | 11506 | 09-08-2018 05:47 AM |
|  | 4089 | 09-08-2018 12:05 AM |
|  | 4931 | 08-15-2018 10:44 PM |
			
    
	
		
		
06-24-2018 08:44 AM
1 Kudo

Hey @Vinicius Leal! So we're namesakes, huh? Cool name, hehe 🙂 I guess you're from Brazil?

Regarding your issue, you can try using a LATERAL VIEW for the array type in Hive.

I took the liberty of running a test. Here it is:

1) Create the table in Hive

CREATE EXTERNAL TABLE `tb_pdde_teste`(
url STRING
,id STRING
,nome STRING
,nome_estendido STRING
,descricao STRING
,inicio STRING
,final STRING
,formatacao STRING
,data_atualizacao STRING
,aditividade STRING
,url_origem STRING
,tempo_aditividade STRING
,portal_dados_abertos STRING
,disponibilizacao struct<disponibilizacao:STRING, dias:STRING>
,estado struct<estado:STRING>
,fonte_gestora struct<fonte_gestora_url:STRING,fonte_gestora_id:STRING,fonte_gestora_nome:STRING,fonte_gestora_descricao:STRING
,fonte_gestora_tipo:STRING,orgao_primeiro_escalao:struct<fonte_gestora_orgao_nome:STRING,fonte_gestora_orgao_descricao:STRING>>
,fonte_provedora struct<fonte_provedora_url:STRING,fonte_provedora_id:STRING,fonte_provedora_nome:STRING,fonte_provedora_descricao:STRING,fonte_provedora_tipo:STRING, orgao_primeiro_escalao:struct<fonte_provedora_orgao_nome:STRING,fonte_provedora_orgao_descricao:STRING>>
,grupo_informacao struct<grupo_informacao_url:STRING, grupo_informacao_id:STRING, grupo_informacao_nome:STRING, grupo_informacao_palavras_chave:array<STRING>>
,base_territorial struct<base_territorial:STRING>
,periodicidade struct<periodicidade:STRING>
,multiplicador struct<multiplicador_id:STRING,multiplicador_nome:STRING>
,produto struct<produto_nome:STRING>
,publicacao struct<status_publicacao:STRING>
,unidade_medida struct<unidade_medida_url:STRING,unidade_medida_id:STRING,unidade_medida_nome:STRING,unidade_medida:STRING>
,orgao_primeiro_escalao struct<orgao_primeiro_escalao_nome:STRING,orgao_primeiro_escalao_descricao:STRING>
,valores array<struct<valor:STRING, municipio_ibge:STRING,ano:STRING>>
) 
COMMENT 'Dados do Programa Dinheiro Direto na Escola' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/dados_pig/pdde_teste';

2) Get the JSON and load it into HDFS

[hive@c1123-node3 ~]$ curl -X GET http://api.pgi.gov.br/api/1/serie/1678.json > 1678.json
[root@c1123-node3 ~]# su - hdfs
[hdfs@c1123-node3 ~]$ hdfs dfs -mkdir /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chown -R pig:hadoop /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chmod 777 /pig_scripts
[hdfs@c1123-node3 ~]$ exit
[root@c1123-node3 ~]# su - hive
[hive@c1123-node3 ~]$ hdfs dfs -put 1678.json /pig_scripts
3) Run the Pig job using HCatalog to load the values into the Hive table

[hive@c1123-node3 ~]$ pig -useHCatalog

grunt> A = LOAD '/pig_scripts/1678.json' Using JsonLoader('url: chararray, id :chararray,nome :chararray,nome_estendido:chararray,descricao:chararray,inicio:chararray,final:chararray, formatacao:chararray, data_atualizacao:chararray, aditividade:chararray,url_origem: chararray,tempo_aditividade:chararray,portal_dados_abertos: chararray, disponibilizacao:tuple(disponibilizacao:chararray,dias:chararray),estado:tuple(estado:chararray),fonte_gestora:tuple(fonte_gestora_url:chararray,fonte_gestora_id:chararray,fonte_gestora_nome:chararray,fonte_gestora_descricao: chararray, fonte_gestora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_gestora_orgao_nome:chararray,fonte_gestora_orgao_descricao:chararray)), fonte_provedora:tuple(fonte_provedora_url:chararray,fonte_provedora_id:chararray,fonte_provedora_nome:chararray,fonte_provedora_descricao: chararray, fonte_provedora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_provedora_orgao_nome:chararray,fonte_provedora_orgao_descricao:chararray)), grupo_informacao:tuple(grupo_informacao_url:chararray,grupo_informacao_id:chararray,grupo_informacao_nome:chararray,grupo_informacao_palavras_chave:{(chararray)}), base_territorial:tuple(base_territorial:chararray),periodicidade:tuple(periodicidade: chararray),multiplicador:tuple(multiplicador_id:chararray,multiplicador_nome:chararray ), produto:tuple(produto_nome:chararray),publicacao:tuple(status_publicacao:chararray), unidade_medida:tuple(unidade_medida_url:chararray,unidade_medida_id:chararray,unidade_medida_nome:chararray,unidade_medida:chararray),orgao_primeiro_escalao:tuple(orgao_primeiro_escalao_nome:chararray,orgao_primeiro_escalao_descricao:chararray),valores:{tuple(valor: chararray,municipio_ibge : chararray,ano : chararray)}');
STORE A INTO 'tb_pdde_teste' USING org.apache.hive.hcatalog.pig.HCatStorer();
4) Break out the values from the valores attribute in the query

hive> select x from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;
OK
{"valor":"40200","municipio_ibge":"120001","ano":"2003"}
Time taken: 0.162 seconds, Fetched: 1 row(s)
hive> select x.valor, x.municipio_ibge from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1; 
OK
40200	120001
Time taken: 0.11 seconds, Fetched: 1 row(s)

PS: I made some changes compared to your code:
- Added a tuple to the valores attribute after the {} array declaration.
- Added HCatStorer to save the result from Pig directly into Hive.
- Matched all fields from the JSON file and created the full DDL in Hive.
- Used a LATERAL VIEW because a single array position holds a lot of struct values inside the data.

Hope this helps!
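As a quick sanity check (not part of the original run above), queries along these lines should confirm that the HCatStorer load landed and that the nested valores array is queryable. This is only a sketch: the column names come from the DDL above, but the per-year aggregation is illustrative and was not verified in the original test.

-- Hedged sketch: count the rows written by the Pig job
SELECT COUNT(*) FROM tb_pdde_teste;

-- Flatten the array of structs and aggregate per year
SELECT x.ano, COUNT(*) AS registros, SUM(CAST(x.valor AS DOUBLE)) AS total_valor
FROM tb_pdde_teste
LATERAL VIEW explode(valores) v AS x
GROUP BY x.ano;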
						
					
06-22-2018 08:52 AM

What should the permissions be on the hdfs-site.xml and core-site.xml files that I'm supposed to copy into the /home/nifi folder? Also, I'm able to telnet to both the datanode and the namenode (ports 50075 and 50070).
						
					
06-20-2018 12:26 PM

The files are on an FTP server. I got them with ListFTP and FetchFTP. Then I use RouteText (I think) to filter them by name. After that I need to parse the data and split it into parts (I'll try SplitContent). Finally, I load the data into SQL Server with PutDatabaseRecord.
						
					
06-19-2018 02:24 AM

Awesome @Kishalay Biswas! Since the issue is resolved, it would also be great if you could mark this HCC thread as answered by clicking the "Accept" button. That way other HCC users can quickly find the solution when they encounter the same issue.
						
					
06-20-2018 06:18 PM

Hi @Suresh Dendukuri! Glad to hear that was helpful 🙂 Getting back to your new problem, I'd kindly ask you to open a new question on HCC (separating different issues helps other HCC users search for a specific problem) 🙂 But just so I understand your problem better: is the query listed not working? If so, does NiFi show an error, or is the result simply not what you expected? Thanks
						
					
04-11-2019 09:11 PM

This looks like a data skew issue, meaning your group-by key is skewed, resulting in unbalanced data between partitions. You can inspect your key distribution; if the skew is real, you need to change the key or add a salt to the group-by so the data can be distributed evenly.
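To make the salting idea concrete, here is a minimal two-stage aggregation sketch in Hive-style SQL. The table name events, the column group_key, and the bucket count of 16 are made up for illustration and would need to match your actual job.

-- Step 1: inspect the key distribution to confirm the skew (illustrative names)
SELECT group_key, COUNT(*) AS cnt
FROM events
GROUP BY group_key
ORDER BY cnt DESC
LIMIT 20;

-- Step 2: salt the keys, pre-aggregate per (key, salt), then merge per key.
-- This works for decomposable aggregates such as COUNT/SUM/MIN/MAX.
SELECT group_key, SUM(partial_cnt) AS total_cnt
FROM (
  SELECT group_key, salt, COUNT(*) AS partial_cnt
  FROM (
    SELECT group_key,
           CAST(FLOOR(RAND() * 16) AS INT) AS salt   -- 16 salt buckets, tune as needed
    FROM events
  ) salted
  GROUP BY group_key, salt
) partials
GROUP BY group_key;

The same pattern carries over to Spark: add a salt column, aggregate by (key, salt), then aggregate again by key.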
						
					
06-26-2018 05:39 AM

							 Thank you very much @Vinicius Higa Murakami 
						
					
06-13-2018 03:36 PM

Hi @Marc Vázquez! Could you check which command the Confluent stack runs for ZooKeeper?

[root@node2 ~]# ps -ef | grep -i zookeeper
1001      3802     1  0 Jun12 ?        00:00:56 /usr/jdk64/jdk1.8.0_112/bin/java -Dzookeeper.log.dir=/var/log/zookeeper -Dzookeeper.log.file=zookeeper-zookeeper-server-node2.log -Dzookeeper.root.logger=INFO,ROLLINGFILE -cp #Thousands of libs...   -Xmx1024m -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.local.only=false org.apache.zookeeper.server.quorum.QuorumPeerMain /usr/hdf/current/zookeeper-server/conf/zoo.cfg
PS: Did you set this cluster up using confluent.sh? If so, I did some digging in their code and there should be a directory for the ZooKeeper logs:
https://github.com/confluentinc/confluent-cli/blob/master/src/oss/confluent.sh#L414

Or, if you prefer, you can set it up manually. Here's my example log4j.properties for ZooKeeper:

[root@node2 ~]# cat /etc/zookeeper/conf/log4j.properties
#
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
#
#
#
# ZooKeeper Logging Configuration
#
# DEFAULT: console appender only
log4j.rootLogger=INFO, CONSOLE, ROLLINGFILE
# Example with rolling log file
#log4j.rootLogger=DEBUG, CONSOLE, ROLLINGFILE
# Example with rolling log file and tracing
#log4j.rootLogger=TRACE, CONSOLE, ROLLINGFILE, TRACEFILE
#
# Log INFO level and above messages to the console
#
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.Threshold=INFO
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add ROLLINGFILE to rootLogger to get log file output
#    Log DEBUG level and above messages to a log file
log4j.appender.ROLLINGFILE=org.apache.log4j.RollingFileAppender
log4j.appender.ROLLINGFILE.Threshold=DEBUG
log4j.appender.ROLLINGFILE.File=/var/log/zookeeper/zookeeper.log
# Max log file size of 10MB
log4j.appender.ROLLINGFILE.MaxFileSize=10MB
# uncomment the next line to limit number of backup files
#log4j.appender.ROLLINGFILE.MaxBackupIndex=10
log4j.appender.ROLLINGFILE.layout=org.apache.log4j.PatternLayout
log4j.appender.ROLLINGFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L] - %m%n
#
# Add TRACEFILE to rootLogger to get log file output
#    Log DEBUG level and above messages to a log file
log4j.appender.TRACEFILE=org.apache.log4j.FileAppender
log4j.appender.TRACEFILE.Threshold=TRACE
log4j.appender.TRACEFILE.File=zookeeper_trace.log
log4j.appender.TRACEFILE.layout=org.apache.log4j.PatternLayout
### Notice we are including log4j's NDC here (%x)
log4j.appender.TRACEFILE.layout.ConversionPattern=%d{ISO8601} - %-5p [%t:%C{1}@%L][%x] - %m%n
And don't forget to fill in zookeeper-env.sh:

[root@node2 ~]# cat /etc/zookeeper/conf/zookeeper-env.sh
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export ZOOKEEPER_HOME=/usr/hdf/current/zookeeper-server
export ZOO_LOG_DIR=/var/log/zookeeper
export ZOOPIDFILE=/var/run/zookeeper/zookeeper_server.pid
export SERVER_JVMFLAGS=-Xmx1024m
export JAVA=$JAVA_HOME/bin/java
export CLASSPATH=$CLASSPATH:/usr/share/zookeeper/*
  Hope this helps! 
						
					
06-14-2018 07:56 AM

@Felix Albani I tried using PuTTY as well, but that doesn't work for me either. Regards, Jay.
						
					
06-12-2018 05:30 PM
1 Kudo

Hello @Rahul Kumar. Could you check your /etc/hosts? I saw you are using "localhost" as your broker/bootstrap address. Try changing it to your hostname instead of "localhost"; some software doesn't listen on the loopback interface by default. Besides that, if you are using the native Kafka from Hortonworks, the default broker port is 6667, or 6668 when using Kerberos, not 9092. I hope this works.
						
					