Member since: 05-07-2018
Posts: 331
Kudos Received: 45
Solutions: 35

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 9636 | 09-12-2018 10:09 PM |
|  | 3761 | 09-10-2018 02:07 PM |
|  | 11554 | 09-08-2018 05:47 AM |
|  | 4098 | 09-08-2018 12:05 AM |
|  | 4942 | 08-15-2018 10:44 PM |
06-27-2018
11:46 PM
Hey @Mathi Murugan! Hm, could you share more details with us? Is your environment secured (Kerberos, Ranger, Knox, HDFS ACLs)? If it isn't, then try granting your user read/write permissions through chmod, or take ownership through chown. It would be something like this:
hdfs dfs -chmod -R 777 /apps/hive/warehouse/salaries            # TABLE SALARIES
hdfs dfs -chown -R hive:hadoop /apps/hive/warehouse/salaries    # changing the ownership from root to hive
Just remember that you'll need to run these commands as the hdfs user or with the same permissions as hdfs. Hope this helps!
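PS: if you don't have a shell as the hdfs user but do have sudo on the node, something like this should also work (just a sketch of the same two commands):
sudo -u hdfs hdfs dfs -chmod -R 777 /apps/hive/warehouse/salaries
sudo -u hdfs hdfs dfs -chown -R hive:hadoop /apps/hive/warehouse/salaries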
06-27-2018
11:33 PM
Hi @Prathamesh H! Instead of using show tblproperties, try to use:
hive> desc formatted salaries;
OK
# col_name data_type comment
gender string
age int
salary double
zip int
# Detailed Table Information
Database: default
Owner: root
CreateTime: Wed May 16 15:23:57 UTC 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
EXTERNAL TRUE
numFiles 1
numRows 50
rawDataSize 732
totalSize 781
transient_lastDdlTime 1529819960
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.609 seconds, Fetched: 36 row(s)
hive> exit;
[hive@node3 ~]$ hdfs dfs -ls hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries
Found 1 items
-rwxrwxrwx 3 root root 781 2018-05-16 04:53 hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries/salaries.txt
PS: it's always a good idea to run:
msck repair table <table>;                 -- if your table is external
analyze table <table> compute statistics;  -- especially if your table has a lot of inserts
More details at: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE
Hope this helps!
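PS2: for the salaries table above, that would be something like this (just a sketch; after the analyze, show tblproperties should reflect refreshed stats such as numRows):
hive> analyze table salaries compute statistics;
hive> show tblproperties salaries;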
06-27-2018
11:23 PM
1 Kudo
Hey @Souveek Ray! What about this?
select date_add(add_months(current_timestamp,-1),-9);
Hope this helps!
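PS: just to illustrate what that evaluates to (taking 2018-06-27 as today's date, for example): add_months(current_timestamp, -1) steps back one month to 2018-05-27, and date_add(..., -9) then steps back another 9 days, giving 2018-05-18.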
06-27-2018
06:09 AM
Hi @Sami Ahmad! Looking at your sqoop command, it seems that your issue comes from the different types used in --split-by (a numeric column) and --boundary-query (a timestamp column). If you take a look at the documentation (section 7.2.2, Selecting the Data to Import), https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html: "By default sqoop will use query select min(<split-by>), max(<split-by>) from <table name> to find out boundaries for creating splits. In some cases this query is not the most optimal so you can specify any arbitrary query returning two numeric columns using --boundary-query argument."
I guess if you change your --split-by column to the same column used in --boundary-query it should work (or vice versa) 🙂 Hope this helps!
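PS: roughly what I mean, with placeholder connection/table/column names (just a sketch):
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table MY_TABLE \
  --split-by ID \
  --boundary-query "SELECT MIN(ID), MAX(ID) FROM MY_TABLE" \
  --target-dir /user/sqoop/my_table
Here both --split-by and --boundary-query refer to the same numeric column (ID), so the generated splits line up.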
06-26-2018
05:47 PM
Hi @Raj ji! Not sure if I get you right, but if you need to call an sh script on a remote machine, you can use a Remote Process Group and, at that RPG, use ExecuteStreamCommand; here's an example.
PS: to enable the RPG, you'll need to configure site-to-site on NiFi; here's the documentation:
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_user-guide/content/configure-site-to-site-client-nifi-instance.html
Hope this helps!
06-26-2018
03:13 PM
You don't really need to turn off vectorization; you just need to clean up any date outside the range that vectorization supports 🙂
06-26-2018
03:12 PM
Hey @Jalender! Actually, it should work fine indeed. Basically, you will only face issues when your datetime value is outside the range specified in that link, 1677-09-20 to 2262-04-11. So if you have any datetime outside this range, you'll certainly hit that failure message. Hope this helps!
06-26-2018
05:21 AM
Good to know, @VISHAL SINGH! Please allow me to ask: if this issue got solved, it would also be great if you could mark this HCC thread as Answered by clicking the "Accept" button. That way other HCC users can quickly find the solution when they encounter the same issue. 🙂
06-26-2018
04:50 AM
1 Kudo
Hi @Jalender! Looking at your logs, it seems that Hive with Vectorization cannot process a date value. Guess you're hitting this -> https://issues.apache.org/jira/browse/HIVE-9862 A clearly and further explanation below (on Limitations part): https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution Could you confirm if there is any date out of the bounds? Hope this helps!
06-24-2018
08:44 AM
1 Kudo
Hey @Vinicius Leal! So we're namesakes, huh! Cool name hehe 🙂 Guess you're from Brazil? Regarding your issue, you can try to use LATERAL VIEW for the array type in Hive. So I took the liberty to make a test; here it is:
1) Create the table in Hive
CREATE EXTERNAL TABLE `tb_pdde_teste`(
url STRING
,id STRING
,nome STRING
,nome_estendido STRING
,descricao STRING
,inicio STRING
,final STRING
,formatacao STRING
,data_atualizacao STRING
,aditividade STRING
,url_origem STRING
,tempo_aditividade STRING
,portal_dados_abertos STRING
,disponibilizacao struct<disponibilizacao:STRING, dias:STRING>
,estado struct<estado:STRING>
,fonte_gestora struct<fonte_gestora_url:STRING,fonte_gestora_id:STRING,fonte_gestora_nome:STRING,fonte_gestora_descricao:STRING
,fonte_gestora_tipo:STRING,orgao_primeiro_escalao:struct<fonte_gestora_orgao_nome:STRING,fonte_gestora_orgao_descricao:STRING>>
,fonte_provedora struct<fonte_provedora_url:STRING,fonte_provedora_id:STRING,fonte_provedora_nome:STRING,fonte_provedora_descricao:STRING,fonte_provedora_tipo:STRING, orgao_primeiro_escalao:struct<fonte_provedora_orgao_nome:STRING,fonte_provedora_orgao_descricao:STRING>>
,grupo_informacao struct<grupo_informacao_url:STRING, grupo_informacao_id:STRING, grupo_informacao_nome:STRING, grupo_informacao_palavras_chave:array<STRING>>
,base_territorial struct<base_territorial:STRING>
,periodicidade struct<periodicidade:STRING>
,multiplicador struct<multiplicador_id:STRING,multiplicador_nome:STRING>
,produto struct<produto_nome:STRING>
,publicacao struct<status_publicacao:STRING>
,unidade_medida struct<unidade_medida_url:STRING,unidade_medida_id:STRING,unidade_medida_nome:STRING,unidade_medida:STRING>
,orgao_primeiro_escalao struct<orgao_primeiro_escalao_nome:STRING,orgao_primeiro_escalao_descricao:STRING>
,valores array<struct<valor:STRING, municipio_ibge:STRING,ano:STRING>>
)
COMMENT 'Dados do Programa Dinheiro Direto na Escola'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/dados_pig/pdde_teste';
2) Get the JSON and feed it into HDFS
[hive@c1123-node3 ~]$ curl -X GET http://api.pgi.gov.br/api/1/serie/1678.json > 1678.json
[root@c1123-node3 ~]# su - hdfs
[hdfs@c1123-node3 ~]$ hdfs dfs -mkdir /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chown -R pig:hadoop /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chmod 777 /pig_scripts
[hdfs@c1123-node3 ~]$ exit
[root@c1123-node3 ~]# su - hive
[hive@c1123-node3 ~]$ hdfs dfs -put 1678.json /pig_scripts
3) Run the Pig job, using HCatalog to load the values into the Hive table
[hive@c1123-node3 ~]$ pig -useHCatalog
grunt> A = LOAD '/pig_scripts/1678.json' Using JsonLoader('url: chararray, id :chararray,nome :chararray,nome_estendido:chararray,descricao:chararray,inicio:chararray,final:chararray, formatacao:chararray, data_atualizacao:chararray, aditividade:chararray,url_origem: chararray,tempo_aditividade:chararray,portal_dados_abertos: chararray, disponibilizacao:tuple(disponibilizacao:chararray,dias:chararray),estado:tuple(estado:chararray),fonte_gestora:tuple(fonte_gestora_url:chararray,fonte_gestora_id:chararray,fonte_gestora_nome:chararray,fonte_gestora_descricao: chararray, fonte_gestora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_gestora_orgao_nome:chararray,fonte_gestora_orgao_descricao:chararray)), fonte_provedora:tuple(fonte_provedora_url:chararray,fonte_provedora_id:chararray,fonte_provedora_nome:chararray,fonte_provedora_descricao: chararray, fonte_provedora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_provedora_orgao_nome:chararray,fonte_provedora_orgao_descricao:chararray)), grupo_informacao:tuple(grupo_informacao_url:chararray,grupo_informacao_id:chararray,grupo_informacao_nome:chararray,grupo_informacao_palavras_chave:{(chararray)}), base_territorial:tuple(base_territorial:chararray),periodicidade:tuple(periodicidade: chararray),multiplicador:tuple(multiplicador_id:chararray,multiplicador_nome:chararray ), produto:tuple(produto_nome:chararray),publicacao:tuple(status_publicacao:chararray), unidade_medida:tuple(unidade_medida_url:chararray,unidade_medida_id:chararray,unidade_medida_nome:chararray,unidade_medida:chararray),orgao_primeiro_escalao:tuple(orgao_primeiro_escalao_nome:chararray,orgao_primeiro_escalao_descricao:chararray),valores:{tuple(valor: chararray,municipio_ibge : chararray,ano : chararray)}');
STORE A INTO 'tb_pdde_teste' USING org.apache.hive.hcatalog.pig.HCatStorer();
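PS: if you want to sanity-check the load before running the STORE above, something like this works in grunt (just a sketch, not part of the run above):
grunt> B = LIMIT A 1;
grunt> DUMP B;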
4) Break out the values from the valores attribute in the query
hive> select x from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;
OK
{"valor":"40200","municipio_ibge":"120001","ano":"2003"}
Time taken: 0.162 seconds, Fetched: 1 row(s)
hive> select x.valor, x.municipio_ibge from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;
OK
40200 120001
Time taken: 0.11 seconds, Fetched: 1 row(s)
PS: I made some changes compared to your code:
- Added a tuple to the valores attribute after the {} array declaration.
- Added HCatStorer to save the result from Pig directly into Hive.
- Matched all the fields from the JSON file and created the full DDL in Hive.
- Used LATERAL VIEW because we're reading a single array position with a lot of struct values inside the data.
Hope this helps!
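PS2: the same LATERAL VIEW can also be filtered on the struct fields; something like this should work (just a sketch, reusing the ano value from the sample row above):
select x.ano, x.municipio_ibge, x.valor
from tb_pdde_teste
lateral view explode(valores) t as x
where x.ano = '2003'
limit 5;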