Member since: 12-15-2015
Posts: 49
Kudos Received: 20
Solutions: 0
07-17-2022
03:58 AM
Hi @colorsoflife Consider using a CTE if possible, with the sequence below for your reference. This is a Hive script incorporated into an Oozie workflow.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
with test_CTE as
(select * from table1),
testone_CTE as
(select col1, col2, col3, biz_dt from test_CTE)
insert into table mytablename partition(biz_dt)
-- with dynamic partitioning, the partition column (biz_dt) must be the last column selected
select col1 as name1, col2 as name2, col3 as name3, biz_dt from testone_CTE;
10-14-2017
08:19 PM
1 Kudo
@Mamta Chawla You cannot use the MIT Kerberos kadmin to create accounts in an Active Directory; that tool is only for use with the MIT KDC. To create accounts in an Active Directory, you will need to use Active Directory-specific tools. However, you can follow a process similar to the one Ambari uses to create accounts in an Active Directory and then manually build the keytab files. This requires the OpenLDAP ldapadd and ldapmodify tools as well as the ktutil command. See https://community.hortonworks.com/articles/82544/how-to-create-ad-principal-accounts-using-openldap.html. On top of this, you will need to make sure your krb5.conf file is correct in order to test it out. The krb5.conf file is not needed to create the AD account if you are using LDAP to do the work.
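As a rough sketch only (not the full procedure from the linked article): the LDAP URL, bind DN, OU, principal name, and keytab path below are placeholders, and AD typically also needs sAMAccountName, unicodePwd (set over LDAPS with ldapmodify) and userAccountControl before the account is usable.
# 1) Create the account object in AD over LDAPS (placeholder host and DNs)
ldapadd -x -H ldaps://ad.example.com:636 -D "cn=admin,ou=Users,dc=example,dc=com" -W -f myservice.ldif
# myservice.ldif (minimal placeholder content; see the article for the full attribute set)
dn: cn=myservice,ou=Hadoop,dc=example,dc=com
objectClass: user
cn: myservice
userPrincipalName: myservice/host1.example.com@EXAMPLE.COM
# 2) Build the keytab locally with ktutil, using the same password and encryption type set on the account
ktutil
addent -password -p myservice/host1.example.com@EXAMPLE.COM -k 1 -e aes256-cts
wkt /etc/security/keytabs/myservice.keytab
quit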
10-03-2017
01:41 AM
Hi @Mamta Chawla, Getting the table schema is easily achieved with the respective command-line utility (BTEQ, SQL*Plus, the mysql client, etc.), while Sqoop is handy for pulling larger volumes of data from an RDBMS into HDFS. However, if you want to store the schema information in Hadoop as files using Sqoop, this can be achieved by querying the database metadata tables. For instance:
select * from all_tab_columns where owner = '<OWNER_NAME>' and table_name = 'YOUR TABLE NAME HERE'; -- for Oracle
select * from DBC.columns where databasename = '<Database_Name>' and tablename = '<Table_name>'; -- for Teradata
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'tbl_name' AND table_schema = 'db_name'; -- for MySQL
... etc. Hope this helps!
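As a rough sketch of that idea (the JDBC URL, credentials, and target directory are placeholders; Sqoop's --query requires the $CONDITIONS token and either a single mapper or a --split-by column):
sqoop import \
--connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
--username <user> --password <password> \
--query "select column_name, data_type, data_length from all_tab_columns where owner = '<OWNER_NAME>' and \$CONDITIONS" \
--target-dir /user/<user>/schema_info \
-m 1
The query result then lands in HDFS as plain files under the target directory.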
07-10-2016
07:35 AM
@Mamta Chawla Here is the command to import data from mysql to local FS (documentation) : sqoop import -fs local -jt local --connect jdbc:mysql://<host>/sqoop --username <user> --password <password> --table <table-name>
05-26-2016
03:45 PM
Hi @Mamta Chawla, Please feel free to accept an answer which helped you, so that this thread can be closed. Thanks
05-12-2016
07:09 AM
My 2 cents to complement Marco Gaido's answer. Doing data validation entirely in Hive is not uncommon. Basically, some companies use Hive as an ETL solution the same way they used to with traditional databases:
1) Load the data into a first table ("staging") without doing any validation. The easiest way in your case would be a text-file-format table where all the columns are defined as "string", so that you don't get any NULL values when doing a select.
2) Create a second table (ideally ORC format) with the types correctly enforced (int, date, string...).
3) Have an HQL select to fill the second table from the first table. Some basic transformations might be done so the format adapts to the corresponding types. If some rows don't pass the validation, that field would be marked as NULL or another "ERROR" token. In that query, you might have to write some complex structures/conditions using regex, UDFs, etc.
4) Have another query to extract all the valid rows from the second table and insert them into a third table (the production table). All the lines with NULL or ERROR would go into a fourth table (the reporting table).
Note: there might be ways to merge steps 3 & 4 to reduce the number of queries by using Hive multi-inserts (see the sketch below).
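A minimal HiveQL sketch of steps 3 and 4, using made-up table and column names (staging_tbl, typed_tbl, prod_tbl, reporting_tbl); the regex is only an example of a validation condition:
-- step 3: cast into the typed (ORC) table, marking values that fail validation as NULL
insert overwrite table typed_tbl
select
  case when col1 rlike '^[0-9]+$' then cast(col1 as int) else null end as col1,
  col2
from staging_tbl;
-- steps 3 & 4 combined via a Hive multi-insert: one pass routes valid rows to the
-- production table and invalid rows to the reporting table
from typed_tbl t
insert into table prod_tbl      select t.col1, t.col2 where t.col1 is not null
insert into table reporting_tbl select t.col1, t.col2 where t.col1 is null;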
02-22-2018
01:08 PM
For me, the issue was due to some data already present at the external table's HDFS location. Since that data was not in Avro format, the table was not able to read it properly. I just removed it and the table worked fine.
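If someone hits the same thing, a quick way to check is to list the table's location and remove whatever is not Avro (the path below is a placeholder for your table's LOCATION):
hdfs dfs -ls /path/to/external/table/location
hdfs dfs -rm /path/to/external/table/location/not_an_avro_file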
04-22-2016
08:26 AM
1 Kudo
There seems to be a length issue. If I compress your Avro schema to one line, it works in Hive View and in Beeline:
DROP TABLE metro;
CREATE TABLE metro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{"namespace":"ly.stealth.xmlavro","protocol":"xml","type":"record","name":"MetroBillType","fields":[{"name":"BillDate","type":"string"},{"name":"BillTime","type":"string"},{"name":"Remit_CompanyName","type":"string"},{"name":"Remit_Addr","type":"string"},{"name":"Remit_CityStZip","type":"string"},{"name":"Remit_Phone","type":"string"},{"name":"Remit_Fax","type":"string"},{"name":"Remit_TaxID","type":"string"},{"name":"BillAcct_Break","type":{"type":"record","name":"BillAcct_BreakType","fields":[{"name":"BillAcct","type":"string"},{"name":"Invoice_Number","type":"int"},{"name":"Acct_Break","type":{"type":"record","name":"Acct_BreakType","fields":[{"name":"Acct","type":"string"},{"name":"Items","type":{"type":"record","name":"ItemsType","fields":[{"name":"Item","type":{"type":"array","items":{"type":"record","name":"ItemType","fields":[{"name":"Account","type":"string"},{"name":"Claim_Number","type":"string"},{"name":"Insured_Name","type":"string"},{"name":"Price","type":"float"},{"name":"Control_Number","type":"int"},{"name":"State","type":"string"},{"name":"Report_Type_Code","type":"string"},{"name":"Report_Type_Desc","type":"string"},{"name":"Policy_Number","type":"string"},{"name":"Date_of_Loss","type":"string"},{"name":"Date_Received","type":"string"},{"name":"Date_Closed","type":"string"},{"name":"Days_to_Fill","type":"int"},{"name":"Police_Dept","type":"string"},{"name":"Attention","type":"string"},{"name":"RequestID","type":"int"},{"name":"ForceDup","type":"string"},{"name":"BillAcct","type":"string"},{"name":"BillCode","type":"string"}]}}}]}},{"name":"Acct_Total","type":"float"},{"name":"Acct_Count","type":"int"}]}},{"name":"Bill_Total","type":"float"},{"name":"Bill_Count","type":"int"}]}},{"name":"Previous_Balance","type":"int"}]}');
SELECT * FROM metro;
+-----------------+-----------------+--------------------------+-------------------+------------------------+--------------------+------------------+--------------------+-----------------------+-------------------------+--+
| metro.billdate | metro.billtime | metro.remit_companyname | metro.remit_addr | metro.remit_citystzip | metro.remit_phone | metro.remit_fax | metro.remit_taxid | metro.billacct_break | metro.previous_balance |
+-----------------+-----------------+--------------------------+-------------------+------------------------+--------------------+------------------+--------------------+-----------------------+-------------------------+--+
+-----------------+-----------------+--------------------------+-------------------+------------------------+--------------------+------------------+--------------------+-----------------------+-------------------------+--+
With the nicely formatted Avro schema I receive "Unexpected end-of-input within/between ARRAY entries", which indicates that there is a length restriction for this parameter. Otherwise, try the avro.schema.url way (sketched below).
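A minimal sketch of the avro.schema.url alternative (the HDFS path is a placeholder; upload the pretty-printed schema file there first):
hdfs dfs -put metro.avsc /user/hive/schemas/metro.avsc
DROP TABLE metro;
CREATE TABLE metro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/metro.avsc');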
02-03-2016
06:35 PM
Replace <ok to="get_run_date" /> with <ok to="join-fork-actions" />. In general, each of the "subflows" should end with the join node (see the sketch below).
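For illustration only, a rough sketch of the fork/join shape this implies; all node names other than join-fork-actions and get_run_date are placeholders:
<fork name="fork-actions">
    <path start="subflow-a"/>
    <path start="subflow-b"/>
</fork>
<action name="subflow-a">
    <!-- action definition -->
    <ok to="join-fork-actions"/>
    <error to="kill"/>
</action>
<action name="subflow-b">
    <!-- action definition -->
    <ok to="join-fork-actions"/>
    <error to="kill"/>
</action>
<join name="join-fork-actions" to="get_run_date"/>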