Member since: 12-15-2015
Posts: 49
Kudos Received: 20
Solutions: 0
07-17-2022
03:58 AM
Hi @colorsoflife Consider using a CTE if possible, with the sequence below for your reference. This is a Hive script incorporated into an Oozie workflow.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
with test_CTE as
(select * from table1),
testone_CTE as
(select col1, col2, col3, biz_dt from test_CTE)
insert into table mytablename partition(biz_dt)
-- with dynamic partitioning, the partition column (biz_dt) must be the last column selected
select col1 as name1, col2 as name2, col3 as name3, biz_dt from testone_CTE;
10-14-2017
08:19 PM
1 Kudo
@Mamta Chawla You cannot use the MIT Kerberos kadmin to create accounts in an Active Directory; that tool is only for use with the MIT KDC. To create accounts in an Active Directory, you will need to use Active Directory-specific tools. However, you can follow a process similar to the one Ambari uses to create accounts in an Active Directory and then manually build the keytab files. This requires the OpenLDAP ldapadd and ldapmodify tools as well as the ktutil command. See https://community.hortonworks.com/articles/82544/how-to-create-ad-principal-accounts-using-openldap.html. On top of this, you will need to make sure your krb5.conf file is correct in order to test it out. The krb5.conf file is not needed to create the AD account if you are using LDAP to do the work.
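As a rough sketch only (not the full procedure from the linked article): the LDAP URL, bind DN, OU, principal name, and keytab path below are placeholders, and AD typically also needs sAMAccountName, unicodePwd (set over LDAPS with ldapmodify) and userAccountControl before the account is usable.
# 1) Create the account object in AD over LDAPS (placeholder host and DNs)
ldapadd -x -H ldaps://ad.example.com:636 -D "cn=admin,ou=Users,dc=example,dc=com" -W -f myservice.ldif
# myservice.ldif (minimal placeholder content; see the article for the full attribute set)
dn: cn=myservice,ou=Hadoop,dc=example,dc=com
objectClass: user
cn: myservice
userPrincipalName: myservice/host1.example.com@EXAMPLE.COM
# 2) Build the keytab locally with ktutil, using the same password and encryption type set on the account
ktutil
addent -password -p myservice/host1.example.com@EXAMPLE.COM -k 1 -e aes256-cts
wkt /etc/security/keytabs/myservice.keytab
quit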
10-03-2017
01:41 AM
Hi @Mamta Chawla, Getting the table schema is easily achieved with the respective command-line utility (BTEQ, SQL*Plus, the mysql client, etc.), while Sqoop is handy for pulling larger volumes of data from an RDBMS into HDFS. However, if you want to store the schema information in Hadoop as files using Sqoop, this can be achieved by querying the database metadata tables. For instance:
select * from all_tab_columns where owner = '<OWNER_NAME>' and table_name = 'YOUR TABLE NAME HERE'; -- for Oracle
select * from DBC.columns where databasename = '<Database_Name>' and tablename = '<Table_name>'; -- for Teradata
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'tbl_name' AND table_schema = 'db_name'; -- for MySQL
... etc. Hope this helps!
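As a rough sketch of that idea (the JDBC URL, credentials, and target directory are placeholders; Sqoop's --query requires the $CONDITIONS token and either a single mapper or a --split-by column):
sqoop import \
--connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
--username <user> --password <password> \
--query "select column_name, data_type, data_length from all_tab_columns where owner = '<OWNER_NAME>' and \$CONDITIONS" \
--target-dir /user/<user>/schema_info \
-m 1
The query result then lands in HDFS as plain files under the target directory.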
07-10-2016
07:35 AM
@Mamta Chawla Here is the command to import data from mysql to local FS (documentation) : sqoop import -fs local -jt local --connect jdbc:mysql://<host>/sqoop --username <user> --password <password> --table <table-name>
05-26-2016
03:45 PM
Hi @Mamta Chawla, Please feel free to accept an answer which helped you, so that this thread can be closed. Thanks
05-12-2016
07:09 AM
My 2 cents to complement Marco Gaido's answer. Doing data validation entirely in Hive is not uncommon. Basically, some companies use Hive as an ETL solution the same way they used to with traditional databases:
1) Load the data into a first table ("staging") without doing any validation. The easiest way in your case would be a text-file-format table where all the columns are defined as "string", so that you don't get any NULL values when doing a select.
2) Create a second table (ideally ORC format) with the types correctly enforced (int, date, string...).
3) Have an HQL select to fill the second table from the first table. Some basic transformations might be done so the format adapts to the corresponding types. If some rows don't pass the validation, that field would be marked as NULL or another "ERROR" token. In that query, you might have to write some complex structures/conditions using regex, UDFs, etc.
4) Have another query to extract all the valid rows from the second table and insert them into a third table (the production table). All the lines with NULL or ERROR would go into a fourth table (the reporting table).
Note: there might be ways to merge steps 3 & 4 to reduce the number of queries by using Hive multi-inserts (see the sketch below).
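A minimal HiveQL sketch of steps 3 and 4, using made-up table and column names (staging_tbl, typed_tbl, prod_tbl, reporting_tbl); the regex is only an example of a validation condition:
-- step 3: cast into the typed (ORC) table, marking values that fail validation as NULL
insert overwrite table typed_tbl
select
  case when col1 rlike '^[0-9]+$' then cast(col1 as int) else null end as col1,
  col2
from staging_tbl;
-- steps 3 & 4 combined via a Hive multi-insert: one pass routes valid rows to the
-- production table and invalid rows to the reporting table
from typed_tbl t
insert into table prod_tbl      select t.col1, t.col2 where t.col1 is not null
insert into table reporting_tbl select t.col1, t.col2 where t.col1 is null;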
02-22-2018
01:08 PM
For me, the issue was due to some data already present at the external table's HDFS location. Since that data was not in Avro format, the table was not able to read it properly. I just removed it and the table worked fine.
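If someone hits the same thing, a quick way to check is to list the table's location and remove whatever is not Avro (the path below is a placeholder for your table's LOCATION):
hdfs dfs -ls /path/to/external/table/location
hdfs dfs -rm /path/to/external/table/location/not_an_avro_file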
04-22-2016
08:26 AM
1 Kudo
There seems to be a length issue. If I compress your Avro schema to one line, it works in Hive View and in Beeline:
DROP TABLE metro;
CREATE TABLE metro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{"namespace":"ly.stealth.xmlavro","protocol":"xml","type":"record","name":"MetroBillType","fields":[{"name":"BillDate","type":"string"},{"name":"BillTime","type":"string"},{"name":"Remit_CompanyName","type":"string"},{"name":"Remit_Addr","type":"string"},{"name":"Remit_CityStZip","type":"string"},{"name":"Remit_Phone","type":"string"},{"name":"Remit_Fax","type":"string"},{"name":"Remit_TaxID","type":"string"},{"name":"BillAcct_Break","type":{"type":"record","name":"BillAcct_BreakType","fields":[{"name":"BillAcct","type":"string"},{"name":"Invoice_Number","type":"int"},{"name":"Acct_Break","type":{"type":"record","name":"Acct_BreakType","fields":[{"name":"Acct","type":"string"},{"name":"Items","type":{"type":"record","name":"ItemsType","fields":[{"name":"Item","type":{"type":"array","items":{"type":"record","name":"ItemType","fields":[{"name":"Account","type":"string"},{"name":"Claim_Number","type":"string"},{"name":"Insured_Name","type":"string"},{"name":"Price","type":"float"},{"name":"Control_Number","type":"int"},{"name":"State","type":"string"},{"name":"Report_Type_Code","type":"string"},{"name":"Report_Type_Desc","type":"string"},{"name":"Policy_Number","type":"string"},{"name":"Date_of_Loss","type":"string"},{"name":"Date_Received","type":"string"},{"name":"Date_Closed","type":"string"},{"name":"Days_to_Fill","type":"int"},{"name":"Police_Dept","type":"string"},{"name":"Attention","type":"string"},{"name":"RequestID","type":"int"},{"name":"ForceDup","type":"string"},{"name":"BillAcct","type":"string"},{"name":"BillCode","type":"string"}]}}}]}},{"name":"Acct_Total","type":"float"},{"name":"Acct_Count","type":"int"}]}},{"name":"Bill_Total","type":"float"},{"name":"Bill_Count","type":"int"}]}},{"name":"Previous_Balance","type":"int"}]}');
SELECT * FROM metro;
+-----------------+-----------------+--------------------------+-------------------+------------------------+--------------------+------------------+--------------------+-----------------------+-------------------------+--+
| metro.billdate | metro.billtime | metro.remit_companyname | metro.remit_addr | metro.remit_citystzip | metro.remit_phone | metro.remit_fax | metro.remit_taxid | metro.billacct_break | metro.previous_balance |
+-----------------+-----------------+--------------------------+-------------------+------------------------+--------------------+------------------+--------------------+-----------------------+-------------------------+--+
+-----------------+-----------------+--------------------------+-------------------+------------------------+--------------------+------------------+--------------------+-----------------------+-------------------------+--+
With the nicely formatted Avro schema I receive "Unexpected end-of-input within/between ARRAY entries", which indicates that there is a length restriction for this parameter. Otherwise, try the avro.schema.url way (sketched below).
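A minimal sketch of the avro.schema.url alternative (the HDFS path is a placeholder; upload the pretty-printed schema file there first):
hdfs dfs -put metro.avsc /user/hive/schemas/metro.avsc
DROP TABLE metro;
CREATE TABLE metro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.url'='hdfs:///user/hive/schemas/metro.avsc');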
02-03-2016
06:35 PM
Replace <ok to="get_run_date" /> with <ok to="join-fork-actions" />. In general, each of the "subflows" should end with the join node (see the sketch below).
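For illustration only, a rough sketch of the fork/join shape this implies; all node names other than join-fork-actions and get_run_date are placeholders:
<fork name="fork-actions">
    <path start="subflow-a"/>
    <path start="subflow-b"/>
</fork>
<action name="subflow-a">
    <!-- action definition -->
    <ok to="join-fork-actions"/>
    <error to="kill"/>
</action>
<action name="subflow-b">
    <!-- action definition -->
    <ok to="join-fork-actions"/>
    <error to="kill"/>
</action>
<join name="join-fork-actions" to="get_run_date"/>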