Member since: 05-07-2018
Posts: 331
Kudos Received: 45
Solutions: 35

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 9636 | 09-12-2018 10:09 PM |
|  | 3761 | 09-10-2018 02:07 PM |
|  | 11554 | 09-08-2018 05:47 AM |
|  | 4098 | 09-08-2018 12:05 AM |
|  | 4942 | 08-15-2018 10:44 PM |
06-27-2018
11:46 PM
Hey @Mathi Murugan! Hm, could you share more details with us? Is your environment secured (Kerberos, Ranger, Knox, HDFS ACLs)? If it isn't, then try granting your user read/write permissions through chmod, or take ownership through chown. It would be something like this:
hdfs dfs -chmod -R 777 /apps/hive/warehouse/salaries            # TABLE SALARIES
hdfs dfs -chown -R hive:hadoop /apps/hive/warehouse/salaries    # changing the ownership from root to hive
Just remember that you'll need to run these commands as the hdfs user or with the same permissions as hdfs. Hope this helps!
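PS: if you don't have a shell as the hdfs user but do have sudo on the node, something like this should also work (just a sketch of the same two commands):
sudo -u hdfs hdfs dfs -chmod -R 777 /apps/hive/warehouse/salaries
sudo -u hdfs hdfs dfs -chown -R hive:hadoop /apps/hive/warehouse/salaries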
06-27-2018
11:33 PM
Hi @Prathamesh H! Instead of using show tblproperties, try to use:
hive> desc formatted salaries;
OK
# col_name data_type comment
gender string
age int
salary double
zip int
# Detailed Table Information
Database: default
Owner: root
CreateTime: Wed May 16 15:23:57 UTC 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
EXTERNAL TRUE
numFiles 1
numRows 50
rawDataSize 732
totalSize 781
transient_lastDdlTime 1529819960
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim ,
serialization.format ,
Time taken: 0.609 seconds, Fetched: 36 row(s)
hive> exit;
[hive@node3 ~]$ hdfs dfs -ls hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries
Found 1 items
-rwxrwxrwx 3 root root 781 2018-05-16 04:53 hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries/salaries.txt
PS: it's always a good idea to run:
msck repair table <table>;                 -- if your table is external
analyze table <table> compute statistics;  -- especially if your table has a lot of inserts
More details at: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE
Hope this helps!
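PS2: for the salaries table above, that would be something like this (just a sketch; after the analyze, show tblproperties should reflect refreshed stats such as numRows):
hive> analyze table salaries compute statistics;
hive> show tblproperties salaries;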
06-27-2018
11:23 PM
1 Kudo
Hey @Souveek Ray! What about this?
select date_add(add_months(current_timestamp,-1),-9);
Hope this helps!
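PS: just to illustrate what that evaluates to (taking 2018-06-27 as today's date, for example): add_months(current_timestamp, -1) steps back one month to 2018-05-27, and date_add(..., -9) then steps back another 9 days, giving 2018-05-18.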
06-27-2018
06:09 AM
Hi @Sami Ahmad! Looking at your sqoop command, it seems that your issue comes from the different types used in --split-by (a numeric column) and --boundary-query (a timestamp column). If you take a look at the documentation (section 7.2.2, Selecting the Data to Import), https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html: "By default sqoop will use query select min(<split-by>), max(<split-by>) from <table name> to find out boundaries for creating splits. In some cases this query is not the most optimal so you can specify any arbitrary query returning two numeric columns using --boundary-query argument."
I guess if you change your --split-by column to the same column used in --boundary-query it should work (or vice versa) 🙂 Hope this helps!
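PS: roughly what I mean, with placeholder connection/table/column names (just a sketch):
sqoop import \
  --connect jdbc:mysql://dbhost:3306/mydb \
  --username myuser -P \
  --table MY_TABLE \
  --split-by ID \
  --boundary-query "SELECT MIN(ID), MAX(ID) FROM MY_TABLE" \
  --target-dir /user/sqoop/my_table
Here both --split-by and --boundary-query refer to the same numeric column (ID), so the generated splits line up.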
06-26-2018
05:47 PM
Hi @Raj ji! Not sure if I get you right, but if you need to call an sh script on a remote machine, you can use a Remote Process Group and, at that RPG, use ExecuteStreamCommand; here's an example.
PS: to enable the RPG, you'll need to configure site-to-site on NiFi; here's the documentation:
https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.1.1/bk_user-guide/content/configure-site-to-site-client-nifi-instance.html
Hope this helps!
06-26-2018
03:13 PM
You don't really need to turn off vectorization; you just need to clean up any date outside the range that vectorization supports 🙂
06-26-2018
03:12 PM
Hey @Jalender! Actually, it should work fine indeed. Basically, you will only face issues when your datetime value is outside the range specified in that link, 1677-09-20 to 2262-04-11. So if you have any datetime outside this range, you'll certainly hit that failure message. Hope this helps!
06-26-2018
05:21 AM
Good to know, @VISHAL SINGH! Please allow me to ask: if this issue got solved, it would also be great if you could mark this HCC thread as Answered by clicking the "Accept" button. That way other HCC users can quickly find the solution when they encounter the same issue. 🙂
06-26-2018
04:50 AM
1 Kudo
Hi @Jalender! Looking at your logs, it seems that Hive with Vectorization cannot process a date value. Guess you're hitting this -> https://issues.apache.org/jira/browse/HIVE-9862 A clearly and further explanation below (on Limitations part): https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution Could you confirm if there is any date out of the bounds? Hope this helps!
06-24-2018
08:44 AM
1 Kudo
Hey @Vinicius Leal! So we're namesakes, huh! Cool name hehe 🙂 Guess you're from Brazil? Regarding your issue, you can try to use LATERAL VIEW for the array type in Hive. So I took the liberty to make a test; here it is:
1) Create the table in Hive
CREATE EXTERNAL TABLE `tb_pdde_teste`(
url STRING
,id STRING
,nome STRING
,nome_estendido STRING
,descricao STRING
,inicio STRING
,final STRING
,formatacao STRING
,data_atualizacao STRING
,aditividade STRING
,url_origem STRING
,tempo_aditividade STRING
,portal_dados_abertos STRING
,disponibilizacao struct<disponibilizacao:STRING, dias:STRING>
,estado struct<estado:STRING>
,fonte_gestora struct<fonte_gestora_url:STRING,fonte_gestora_id:STRING,fonte_gestora_nome:STRING,fonte_gestora_descricao:STRING
,fonte_gestora_tipo:STRING,orgao_primeiro_escalao:struct<fonte_gestora_orgao_nome:STRING,fonte_gestora_orgao_descricao:STRING>>
,fonte_provedora struct<fonte_provedora_url:STRING,fonte_provedora_id:STRING,fonte_provedora_nome:STRING,fonte_provedora_descricao:STRING,fonte_provedora_tipo:STRING, orgao_primeiro_escalao:struct<fonte_provedora_orgao_nome:STRING,fonte_provedora_orgao_descricao:STRING>>
,grupo_informacao struct<grupo_informacao_url:STRING, grupo_informacao_id:STRING, grupo_informacao_nome:STRING, grupo_informacao_palavras_chave:array<STRING>>
,base_territorial struct<base_territorial:STRING>
,periodicidade struct<periodicidade:STRING>
,multiplicador struct<multiplicador_id:STRING,multiplicador_nome:STRING>
,produto struct<produto_nome:STRING>
,publicacao struct<status_publicacao:STRING>
,unidade_medida struct<unidade_medida_url:STRING,unidade_medida_id:STRING,unidade_medida_nome:STRING,unidade_medida:STRING>
,orgao_primeiro_escalao struct<orgao_primeiro_escalao_nome:STRING,orgao_primeiro_escalao_descricao:STRING>
,valores array<struct<valor:STRING, municipio_ibge:STRING,ano:STRING>>
)
COMMENT 'Dados do Programa Dinheiro Direto na Escola'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/dados_pig/pdde_teste';
2) Get the JSON and feed it into HDFS
[hive@c1123-node3 ~]$ curl -X GET http://api.pgi.gov.br/api/1/serie/1678.json > 1678.json
[root@c1123-node3 ~]# su - hdfs
[hdfs@c1123-node3 ~]$ hdfs dfs -mkdir /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chown -R pig:hadoop /pig_scripts/
[hdfs@c1123-node3 ~]$ hdfs dfs -chmod 777 /pig_scripts
[hdfs@c1123-node3 ~]$ exit
[root@c1123-node3 ~]# su - hive
[hive@c1123-node3 ~]$ hdfs dfs -put 1678.json /pig_scripts
3) Run the Pig job, using HCatalog to load the values into the Hive table
[hive@c1123-node3 ~]$ pig -useHCatalog
grunt> A = LOAD '/pig_scripts/1678.json' Using JsonLoader('url: chararray, id :chararray,nome :chararray,nome_estendido:chararray,descricao:chararray,inicio:chararray,final:chararray, formatacao:chararray, data_atualizacao:chararray, aditividade:chararray,url_origem: chararray,tempo_aditividade:chararray,portal_dados_abertos: chararray, disponibilizacao:tuple(disponibilizacao:chararray,dias:chararray),estado:tuple(estado:chararray),fonte_gestora:tuple(fonte_gestora_url:chararray,fonte_gestora_id:chararray,fonte_gestora_nome:chararray,fonte_gestora_descricao: chararray, fonte_gestora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_gestora_orgao_nome:chararray,fonte_gestora_orgao_descricao:chararray)), fonte_provedora:tuple(fonte_provedora_url:chararray,fonte_provedora_id:chararray,fonte_provedora_nome:chararray,fonte_provedora_descricao: chararray, fonte_provedora_tipo:chararray,orgao_primeiro_escalao:tuple(fonte_provedora_orgao_nome:chararray,fonte_provedora_orgao_descricao:chararray)), grupo_informacao:tuple(grupo_informacao_url:chararray,grupo_informacao_id:chararray,grupo_informacao_nome:chararray,grupo_informacao_palavras_chave:{(chararray)}), base_territorial:tuple(base_territorial:chararray),periodicidade:tuple(periodicidade: chararray),multiplicador:tuple(multiplicador_id:chararray,multiplicador_nome:chararray ), produto:tuple(produto_nome:chararray),publicacao:tuple(status_publicacao:chararray), unidade_medida:tuple(unidade_medida_url:chararray,unidade_medida_id:chararray,unidade_medida_nome:chararray,unidade_medida:chararray),orgao_primeiro_escalao:tuple(orgao_primeiro_escalao_nome:chararray,orgao_primeiro_escalao_descricao:chararray),valores:{tuple(valor: chararray,municipio_ibge : chararray,ano : chararray)}');
STORE A INTO 'tb_pdde_teste' USING org.apache.hive.hcatalog.pig.HCatStorer();
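PS: if you want to sanity-check the load before running the STORE above, something like this works in grunt (just a sketch, not part of the run above):
grunt> B = LIMIT A 1;
grunt> DUMP B;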
4) Break out the values from the valores attribute in the query
hive> select x from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;
OK
{"valor":"40200","municipio_ibge":"120001","ano":"2003"}
Time taken: 0.162 seconds, Fetched: 1 row(s)
hive> select x.valor, x.municipio_ibge from tb_pdde_teste lateral view explode(valores) tb_pdde_teste as x limit 1;
OK
40200 120001
Time taken: 0.11 seconds, Fetched: 1 row(s)
PS: I made some changes compared to your code:
- Added a tuple to the valores attribute after the {} array declaration.
- Added HCatStorer to save the result from Pig directly into Hive.
- Matched all the fields from the JSON file and created the full DDL in Hive.
- Used LATERAL VIEW because we're reading a single array position with a lot of struct values inside the data.
Hope this helps!
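PS2: the same LATERAL VIEW can also be filtered on the struct fields; something like this should work (just a sketch, reusing the ano value from the sample row above):
select x.ano, x.municipio_ibge, x.valor
from tb_pdde_teste
lateral view explode(valores) t as x
where x.ano = '2003'
limit 5;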