Member since 09-25-2015 · 112 Posts · 88 Kudos Received · 12 Solutions
01-24-2018
03:28 PM
Hi @Ron Lee. Thanks for this writeup. One question: on a kerberized cluster, will the keytabs need to be regenerated if the service account name is changed?
07-19-2017
07:54 PM
Thanks @Wynner!
07-19-2017
07:05 PM
Additional information from @Matt Clarke: Three files are crucial to a new node successfully joining an existing cluster: flow.xml.gz, users.xml, and authorizations.xml. All three must match before a node will be allowed to join.

The flow.xml.gz file contains everything you have added while interfacing with the UI, and all nodes must have matching flow.xml.gz files in order to join the cluster. All you need to do is copy flow.xml.gz from an original cluster node to the new node, make sure ownership is correct, and restart the new node. Normally the cluster will hand these files out to any new node that has none of them; however, if Ambari metrics are enabled and a flow.xml.gz does not exist, Ambari generates a flow.xml.gz that contains only the Ambari reporting task. Because of this, the new node's flow will not match and it will be unable to join the cluster. A NiFi cluster will never overwrite an existing flow.xml.gz on a new node with its own.

Secured NiFi clusters also require that the users.xml and authorizations.xml files match when file-based authorization is used; these two files only come into play when NiFi is secured and using the local file-based authorizer. If secured, the cluster will likewise only hand out the users and authorizations XML files if they don't already exist on the new node.

Bottom line: if you add a new NiFi host via Ambari, it will try to join the cluster. If it fails and shuts back down, copy the above files from one of the existing nodes to the new node and restart it via Ambari.
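The copy step can be sketched as a small helper. This is a sketch, not NiFi tooling: the conf-directory paths you pass in are assumptions for your install (on a real cluster you would first fetch the existing node's files, e.g. with scp), and `copy_cluster_state` is a hypothetical name.

```python
import shutil
from pathlib import Path

# The three cluster-state files that must match for a node to join.
CLUSTER_STATE_FILES = ["flow.xml.gz", "users.xml", "authorizations.xml"]

def copy_cluster_state(src_conf, dst_conf):
    """Copy the cluster-state files from an existing node's conf dir
    (already fetched locally) into the new node's conf dir.
    Returns the list of destination paths actually copied."""
    copied = []
    for name in CLUSTER_STATE_FILES:
        src = Path(src_conf) / name
        # users.xml/authorizations.xml only exist on secured clusters
        # using file-based authorization, so skip what isn't there.
        if src.exists():
            copied.append(shutil.copy2(src, Path(dst_conf) / name))
    return copied
```

After copying, remember the ownership fix and restart from the post above; this helper only handles the file transfer.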
11-08-2016
09:56 PM
Hi @vamsi valiveti. The example above uses the exact same source file in the exact same location for both external tables. Both test_csv_serde_using_CSV_Serde_reader and test_csv_serde read the external file(s) stored in the directory '/user/<uname>/elt/test_csvserde/'. The file I used was pipe delimited and contains 62,000,000 rows, so I didn't attach it. 😉 It would look like Option 2 above, but of course with 4 columns:

121|Hello World|4567|34345
232|Text|5678|78678
343|More Text|6789|342134
10-19-2016
01:31 PM
Hi @Laurent Edel - nice article! I do have a question: are there performance issues when using this method (HCatalog integration) to go from Sqoop directly to ORC format? In other words, does Option A perform as well as (or better than) Option B?

Option A: Sqoop -> directly to an ORC-format table via HCatalog integration
Option B: Sqoop -> text files/external Hive table -> Hive CTAS/insert into an ORC-format table

I'd like to ensure the best possible Sqoop performance. Thanks!
11-19-2015
12:30 AM
Excellent! Not only is it a great feature but it shows how quickly Ambari views are improving & adding functionality.
11-05-2015
12:49 AM
Nice writeup, and very timely too. My current client is looking for this info right now - will bring it to them tomorrow.
11-04-2015
09:46 PM
9 Kudos
I've seen some postings (including this one) where people are using CSVSerde for processing input data. CSVSerde is a magical piece of code, but it isn't meant to be used for all input CSVs.

tl;dr - Use CSVSerde only when you have quoted text or really strange delimiters (such as blanks) in your input data - otherwise you will take a rather substantial performance hit...

When to use CSVSerde:
For example, say we have a text file with the following data:

col1 col2 col3
----------------------
121 Hello World 4567
232 Text 5678
343 More Text 6789

Pipe delimited, it would look like:

121|Hello World|4567|
232|Text|5678|
343|More Text|6789|

but blank delimited with quoted text it would look like this (don't laugh - Progress database dumps are blank delimited and text quoted in this exact format):

121 'Hello World' 4567
232 Text 5678
343 'More Text' 6789

Notice that the text may or may not have quote marks around it - text only needs to be quoted if it contains a blank. This is a particularly nasty set of data. You need custom coding - unless you use CSVSerde, which can handle this data with ease. Blank-delimited, quoted-text files are parsed perfectly without any coding when you use the following table declaration:
CREATE TABLE my_table(col1 string, col2 string, col3 string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ( "separatorChar" = " ",
"quoteChar" = "'")
Performance Hit when Using CSVSerde on conventional CSV data
tl;dr - Using CSVSerde for conventional CSV files is about 3X slower...

The following shows timings encountered when processing a simple pipe-delimited CSV file. One Hive table definition uses conventional delimiter processing, and one uses CSVSerde. The timings were taken on a small cluster (28 data nodes). The file used for testing had 62,825,000 rows - again, rather small.

Table DDL using conventional delimiter definition:

CREATE external TABLE test_csv_serde (
`belnr` string,
`bukrs` string,
`budat` string,
`bstat` string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
location '/user/<uname>/elt/test_csvserde/';
-- Load the data one-time
insert overwrite table test_csv_serde
select * from <large table>;

Table DDL using CSVSerde (same file/source data as the other table):

CREATE external TABLE test_csv_serde_using_CSV_Serde_reader (
`belnr` string,
`bukrs` string,
`budat` string,
`bstat` string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "|")
location '/user/<uname>/elt/test_csvserde/';

Results:

hive> select count(*) from test_csv_serde;
Time taken: 8.683 seconds, Fetched: 1 row(s)
hive> select count(*) from test_csv_serde_using_CSV_Serde_reader;
Time taken: 27.442 seconds, Fetched: 1 row(s)
hive> select count(*) from test_csv_serde;
Time taken: 8.707 seconds, Fetched: 1 row(s)
hive> select count(*) from test_csv_serde_using_CSV_Serde_reader;
Time taken: 27.41 seconds, Fetched: 1 row(s)
hive> select min(belnr) from test_csv_serde;
Time taken: 10.267 seconds, Fetched: 1 row(s)
hive> select min(belnr) from test_csv_serde_using_CSV_Serde_reader;
Time taken: 29.271 seconds, Fetched: 1 row(s)
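As a quick sanity check, the "about 3X" figure follows directly from the timings above:

```python
# Timings (seconds) copied from the hive runs above.
delimited = {"count(*) run 1": 8.683, "count(*) run 2": 8.707, "min(belnr)": 10.267}
csv_serde = {"count(*) run 1": 27.442, "count(*) run 2": 27.41, "min(belnr)": 29.271}

# Ratios come out near 3x: ~3.16, ~3.15, and ~2.85 respectively.
for query in delimited:
    ratio = csv_serde[query] / delimited[query]
    print(f"{query}: {ratio:.2f}x slower with CSVSerde")
```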