Member since: 09-25-2015
Posts: 112
Kudos Received: 88
Solutions: 12
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 9621 | 03-15-2017 03:17 PM
 | 6031 | 02-17-2017 01:03 PM
 | 1779 | 01-02-2017 10:47 PM
 | 2680 | 11-16-2016 07:03 PM
 | 1072 | 10-07-2016 05:24 PM
01-24-2018
03:28 PM
Hi @Ron Lee. Thanks for this writeup. One question: on a kerberized cluster, will the keytabs need to be regenerated if the service account name is changed?
11-04-2017
12:19 PM
Hi @Jeff Watson. You are correct about SAS's use of String datatypes - good catch! One of my customers also had to deal with this. String datatype conversions can perform very poorly in SAS. With SAS/ACCESS to Hadoop you can set the libname option DBMAX_TEXT (added with the SAS 9.4M1 release) to globally restrict the character length of all columns read into SAS. However, for restricting column size SAS specifically recommends using the VARCHAR datatype in Hive whenever possible: http://support.sas.com/documentation/cdl/en/acreldb/67473/HTML/default/viewer.htm#n1aqglg4ftdj04n1eyvh2l3367ql.htm

Use Case - Large Table, All Columns of Type String: Table A stored in Hive has 40 columns, all of type String, with 500M rows. By default, SAS/ACCESS converts String to $32K, i.e. 32K characters per column, so the math for this table yields roughly a 1.2MB row length x 500M rows. This brings the system to a halt - far too large to store in LASR or WORK. The following techniques can be used to work around the challenge in SAS, and they all work:

- Use char and varchar in Hive instead of String.
- Set the libname option DBMAX_TEXT to globally restrict the character length of all columns read in.
- In Hive, use SET TBLPROPERTIES to add SASFMT formats to the schema (see the sketch below).
- Add formatting to the SAS code during inbound reads (for example: Sequence, Length 8, Informat 10., Format 10.).

I hope this helps.
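For the SASFMT technique mentioned above, here is a rough HiveQL sketch. The table and column names are hypothetical, and the exact property values should be verified against the SAS/ACCESS to Hadoop documentation:

-- Hypothetical example: tell SAS/ACCESS to read these String columns
-- as fixed-length CHAR instead of the default 32K length
ALTER TABLE table_a SET TBLPROPERTIES ('SASFMT:customer_name'='CHAR(40)');
ALTER TABLE table_a SET TBLPROPERTIES ('SASFMT:order_status'='CHAR(10)');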
10-23-2017
03:33 AM
Hi @tanmoy. I would recommend going the custom route - create UDF jars for Hive. That way you can use them anywhere you use Hive code. HPL/SQL is still not a GA feature, and in my opinion it needs a little more time to "bake" and become more robust.
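As a rough illustration (the jar path, class name, and function name below are hypothetical, not from the original question), registering a custom UDF jar in Hive typically looks like this:

-- Make the jar containing the compiled UDF class available to the Hive session
ADD JAR /tmp/my_hive_udfs.jar;
-- Register the UDF under a SQL-callable name
CREATE TEMPORARY FUNCTION clean_text AS 'com.example.hive.udf.CleanText';
-- Use it like any built-in function
SELECT clean_text(name) FROM some_table;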
09-09-2017
03:43 PM
2 Kudos
Hi @Bala Vignesh N V. I know this is an old question, but I have encountered this recently, and this answer may help someone else as well. The issue you had is most likely caused by specifying "COLLECTION ITEMS TERMINATED BY ','" - when the collection delimiter is the same as the field delimiter, the serialized struct members collide with the field boundaries, so the second struct member comes back NULL. When the table is defined like this (with COLLECTION ITEMS TERMINATED BY comma):

-- Create a dummy table to use in the insert query - like an Oracle DUAL table
create table dummy_TBL (col1 int) ;
insert into dummy_TBL (col1) values(1) ;
create table data_TBL (id int, name string, address struct<city:string,State:string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY ','
STORED AS TEXTFILE;
insert into table data_TBL
select 1,
'Bala',
named_struct('city','Tampa','State','FL')
from dummy_TBL limit 1;

The address.state value is NULL:

+--------------+----------------+--------------------------------+--+
| data_tbl.id | data_tbl.name | data_tbl.address |
+--------------+----------------+--------------------------------+--+
| 1 | Bala | {"city":"Tampa","state":null} |
+--------------+----------------+--------------------------------+--+

But when you define the table like this (without COLLECTION ITEMS TERMINATED BY comma):

create table data_TBL (id int, name string, address struct<city:string,State:string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
insert into table data_TBL
select 1,
'Bala',
named_struct('city','Tampa','State','FL')
from dummy_TBL limit 1;

The address.state value is correct:

+--------------+----------------+--------------------------------+--+
| data_tbl.id | data_tbl.name | data_tbl.address |
+--------------+----------------+--------------------------------+--+
| 1 | Bala | {"city":"Tampa","state":"FL"} |
+--------------+----------------+--------------------------------+--+
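If you do need an explicit collection delimiter (for example, to load delimited text files that contain struct data), a minimal sketch is to pick one that differs from the field delimiter. The table name below is hypothetical:

-- Hypothetical variant: '|' separates struct members, ',' separates fields
create table data_TBL_alt (id int, name string, address struct<city:string,State:string>)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '|'
STORED AS TEXTFILE;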
I hope this helps.
08-09-2017
09:56 AM
1 Kudo
Hi @avinash midatani. As mentioned in that other HCC post, this capability is not in Hive yet. The JIRA tracking the request is here: https://issues.apache.org/jira/browse/HIVE-10593

The Spark code from @Alexander Bij found in that HCC post accomplishes that functionality - creating the Hive table structure automatically based on the parquet file metadata: https://community.hortonworks.com/questions/5833/create-hive-table-to-read-parquet-files-from-parqu.html
08-08-2017
06:48 PM
Also read this HCC post for more information: https://community.hortonworks.com/questions/5833/create-hive-table-to-read-parquet-files-from-parqu.html
08-08-2017
06:45 PM
Hi @avinash midatani. I suspect the "LIKE PARQUET..." syntax is only valid in Impala.
Your CREATE TABLE syntax might have to look more like this (with explicit column definitions and without the "LIKE PARQUET" block):

CREATE EXTERNAL TABLE tbl_test (col1 datatype1, col2 datatype2, ..., coln datatypen)
STORED AS PARQUET
LOCATION '/test/kpi';

I hope this helps - a filled-in version of the same statement is sketched below.
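The column names and types in this version are purely illustrative and would need to match the actual schema of the parquet files:

CREATE EXTERNAL TABLE tbl_test (
  kpi_id INT,
  kpi_name STRING,
  kpi_value DOUBLE,
  event_date STRING
)
STORED AS PARQUET
LOCATION '/test/kpi';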
07-19-2017
07:54 PM
Thanks @Wynner!
07-19-2017
07:05 PM
Additional information from @Matt Clarke: There are three files crucial to a new node being able to successfully join an existing cluster: flow.xml.gz, users.xml, and authorizations.xml. All three of these files must match before a node will be allowed to join an existing cluster.

The flow.xml.gz file contains everything you have added while interfacing with the UI, and all nodes must have matching flow.xml.gz files in order to join the cluster. All you need to do is copy the flow.xml.gz file from an original cluster node to the new node, make sure ownership is correct, and restart the new node. Normally these files will be handed out by the cluster to any new node that has none of them; however, if Ambari metrics are enabled and a flow.xml.gz does not exist, Ambari generates a flow.xml.gz file that contains only the Ambari reporting task. Because of this, the new node will not match and will be unable to join the cluster. A NiFi cluster will never overwrite an existing flow.xml.gz file on a new node with its own.

Secured NiFi clusters also require that the users.xml and authorizations.xml files match if file-based authorization is used. The users and authorizations XML files only come into play when NiFi is secured and using the local file-based authorizer. If secured, the cluster will only hand out the users and authorizations XML files if they don't already exist on the new node.
Bottom line: if you add a new NiFi host via Ambari, it will try to join the cluster. If it fails and shuts back down, copy the files above from one of the existing nodes to the new node and restart via Ambari.
06-28-2017
11:17 AM
We encountered a similar issue when upgrading our Ambari from 2.4 to 2.5. Our Kafka brokers would not restart. Here was the error message:

/var/log/kafka/server.log.2017-06-27-19: java.lang.IllegalArgumentException: requirement failed: security.inter.broker.protocol must be a protocol in the configured set of advertised.listeners. The valid options based on currently configured protocols are Set(SASL_PLAINTEXT)

We had specified PLAINTEXTSASL as the SASL protocol in the configuration. To fix this we changed the following configuration in Custom kafka-broker:

security.inter.broker.protocol=SASL_PLAINTEXT