Member since: 07-19-2016
Posts: 26
Kudos Received: 7
Solutions: 1

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2983 | 12-19-2017 08:44 PM
10-03-2019
10:51 AM
Can you check the value of hive.llap.io.enabled? It should be checked (enabled). hive.llap.io.enabled and hive.llap.io.memory.mode are the only two settings that determine whether the cache is used. If both are set, make sure the queries you are running are actually hitting data and that pruning isn't skipping everything (e.g., if the query is select col from tbl where date="2019-01-01" and there isn't any data for 2019-01-01, then you won't ever get anything into your cache). James
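A quick way to confirm both settings from the client (a minimal sketch; the JDBC URL is a placeholder, and SET with no value simply echoes the current setting):

```bash
# Echo the two settings that control the LLAP IO cache; for caching you
# want hive.llap.io.enabled=true and hive.llap.io.memory.mode=cache:
beeline -u "jdbc:hive2://llap-host:10500/" \
  -e "SET hive.llap.io.enabled; SET hive.llap.io.memory.mode;"
```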
05-17-2019
02:11 AM
@Rajkumar Singh - this is a useful article and I've probably passed it on to 5-6 customers. One thing to note, though: this is no longer an issue with HDP 3. 🙂 It is still very applicable and necessary for HDP 2.x.
11-16-2018
12:42 AM
@carol elliott You need to set the client option for an unsupported terminal before launching via nohup:

export HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
nohup beeline -f foo.sql -u ${jdbcurl} >> nohup_beelineoutput.out &
07-03-2018
11:34 AM
We had to add this to custom hive-site (custom hive-env does not work). Once added to custom hive-site, this worked fine. James
05-28-2018
04:02 PM
@Sergey Shelukhin The "Set Recommended" picks 12288 MB for overhead. My Xmx is 32 GB and the cache is 16 GB. 6% would be 1.92 GB, so I was going to go with 2 GB based on this article. Maybe the overhead requirements have changed in 2.6.4 relative to this article? Thanks!!
12-19-2017
11:12 PM
You should be able to use show table extended ... partition to get info on a partition and avoid trying to open any that are zero bytes. Like this:

scala> var sqlCmd="show table extended from mydb like 'mytable' partition (date_time_date='2017-01-01')"
sqlCmd: String = show table extended from mydb like 'mytable' partition (date_time_date='2017-01-01')

scala> var partitionsList=sqlContext.sql(sqlCmd).collectAsList
partitionsList: java.util.List[org.apache.spark.sql.Row] = [[mydb,mytable,false,
Partition Values: [date_time_date=2017-01-01]
Location: hdfs://mycluster/apps/hive/warehouse/mydb.db/mytable/date_time_date=2017-01-01
Serde Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Storage Properties: [serialization.format=1]
Partition Parameters: {rawDataSize=441433136, numFiles=1, transient_lastDdlTime=1513597358, totalSize=4897483, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numRows=37825} ]]

Note the totalSize and numFiles partition parameters in the output: you can use those to skip the zero-byte partitions. Let me know if that works and you can avoid them, or if you still get the null pointer. James
12-19-2017
08:44 PM
2 Kudos
This is a way, if you don't have Ranger, to set permissions at the HDFS level for a database (and have them carry to all tables and files in that db) or a directory (and have them carry to all subdirectories and files in that directory). For example, after setting that parameter, you can do:

hadoop fs -chown myuser /apps/hive/warehouse/mydb.db
hadoop fs -chmod 700 /apps/hive/warehouse/mydb.db

and now myuser is the only one who can see or do anything with that db, because hive.warehouse.subdir.inherit.perms=true causes anything created underneath it to inherit the same permissions and ownership as the parent. While this works, Ranger is still the way to go.
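A minimal end-to-end sketch of the inheritance behavior described above (the database, user, and JDBC URL are illustrative):

```bash
# Lock the database directory down to one user:
hadoop fs -chown myuser /apps/hive/warehouse/mydb.db
hadoop fs -chmod 700 /apps/hive/warehouse/mydb.db

# With hive.warehouse.subdir.inherit.perms=true in hive-site.xml, a table
# created afterwards should show the same owner and mode (drwx------):
beeline -u "$jdbcurl" -e "CREATE TABLE mydb.t1 (id INT);"
hadoop fs -ls /apps/hive/warehouse/mydb.db
```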
12-19-2017
03:38 AM
@Benakaraj KS Go to 2.6.3 and set spark.sql.hive.convertMetastoreOrc=true and spark.sql.orc.enabled=true. You'll get a 3x speedup 🙂
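A hedged sketch of passing those two settings at launch time (spark-shell shown; any submit mechanism with --conf works the same way):

```bash
spark-shell \
  --conf spark.sql.hive.convertMetastoreOrc=true \
  --conf spark.sql.orc.enabled=true
```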
07-24-2017
10:28 PM
Hi Matt. With an SMB join, though, I believe we have to have both tables bucketed, and I want to partition first. Actually, I found something even better than sorting in the insert sub-select: I created the table partitioned by (date_time_date) clustered by (key) sorted by (key), and that looks to have shaved ~200 ms (I will run some more tests to confirm). This would be even more preferable because with hive.enforce.bucketing=true and hive.enforce.sorting=true you don't have to do a sort on your inserts, and we still retain the partitioning on the date. I will update the thread if it proves out with the additional testing; a sketch of the DDL follows.
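A minimal sketch of the table layout described above (the column names and bucket count are invented for illustration; the post doesn't give them):

```bash
beeline -u "$jdbcurl" -e "
  SET hive.enforce.bucketing=true;
  SET hive.enforce.sorting=true;
  SET hive.exec.dynamic.partition.mode=nonstrict;
  CREATE TABLE orc_table (key BIGINT, col1 STRING)
    PARTITIONED BY (date_time_date DATE)
    CLUSTERED BY (key) SORTED BY (key) INTO 32 BUCKETS
    STORED AS ORC;
  -- With enforce.bucketing/sorting on, no ORDER BY is needed on the insert:
  INSERT INTO TABLE orc_table PARTITION (date_time_date)
    SELECT key, col1, date_time_date FROM text_table;"
```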
07-24-2017
03:23 AM
@gopal, a colleague of yours and one of mine were discussing potential ways to reduce response time by using an order by (or sort by) in the select part of the insert sub-select when populating the ORC table. I.e., I partition by date, but I order by a column that's commonly used for joins. Is there a benefit to doing this? insert into orc_table select col1, col2, ..., coln, partition_col from text_table order by joinKey; Thanks.
Labels:
- Apache Hive
06-26-2017
04:42 PM
Thank you - very useful!
06-08-2017
01:14 PM
4 Kudos
I forgot to update this. Actually, LLAP and the cache were set up correctly. In the queries taken from Teradata, each of the 200k queries performs several joins, and one of the joins was on a column that was null, so the result set was null. A side benefit, though, was a nice gain in knowledge of all the knobs and levers of this product. And it is a very nice product. My experience is in Teradata/Netezza/HAWQ, and I've found LLAP to be a clear winner in replacing those. Very fast, very robust. A few notes on the things that mattered:
- Use ORC (duh)
- Get partitioned by / clustered by right (use hive --orcfiledump, as shown in Sergey's slides, to make sure you get ~100k records per ORC file; see the sketch below)
- Get the number of nodes / app masters right
- Use a local metastore
- Get the heaps right
- Get the YARN capacity scheduler settings right
- Increase executors
- (Probably not supported) Clone the IS2 JVMs for more throughput if needed

Pull up the Hive UI or Grafana, sit back, smile, and enjoy watching the transactions fly.
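A hedged sketch of the orcfiledump check from the notes above (the file path is illustrative; look at the per-stripe row counts in the output):

```bash
# Dump ORC metadata for one partition file; each "Stripe:" entry reports
# its row count, which per the notes above should be around 100k:
hive --orcfiledump /apps/hive/warehouse/mydb.db/mytable/date_time_date=2017-01-01/000000_0
```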
06-08-2017
01:04 PM
I agree. HDP 2.6 is very solid. We have thoroughly tested it at 200k queries/hr (out of the box it's about 40k/hr; you need to go to a local metastore, increase app masters, etc. to raise it). HDP 2.5.3 was solid too, though technically LLAP was a tech preview in that release.
03-25-2017
06:18 AM
@Benjamin Leonhardi, on slide 24 you note that a small stripe size indicates a memory problem during load. Do you know what memory problem that would be? I have ~3,500 records in the stripe and was just wondering where I should look. Thanks!
02-21-2017
11:13 PM
Yes, it is in Advanced hive-interactive-site. Also notice there is no Set Recommended button (the blue 3/4-circle arrow). Thanks again, Constantin.
02-21-2017
10:18 PM
I should also add that everything looks correct, starts correctly, and runs well. It just isn't using the IO cache. I do have millions of these messages in the YARN timeline server log, but from looking at the ORC writer code it doesn't look like the ORC writers use the timeline server:

2017-02-21 00:20:36,581 WARN timeline.EntityGroupFSTimelineStore (LogInfo.java:doParse(207)) - Error putting entity: dag_1487651061826_0016_921 (TEZ_DAG_ID): 6

Other than that, everything is clean. Thanks again, I appreciate it.
02-21-2017
10:12 PM
hivellapiomemorymode.png

Yeah, I thought so too. It was literally blank (see the attached screenshot from the first Ambari version after enabling Interactive Query). Also odd was hive.llap.daemon.vcpus.per.instance: it was a variable rather than a number, which generated a warning at start:

WARN conf.HiveConf: HiveConf hive.llap.daemon.vcpus.per.instance expects INT type value

So I put 32 in for the vcpus and put cache in for hive.llap.io.memory.mode. They saved correctly to hive-site.xml. The third anomaly is a message that says:

WARN conf.HiveConf: HiveConf of name hive.llap.daemon.allow.permanent.fns does not exist

This comes from Ambari; I can remove it from hive-site.xml, but Ambari will put it back. Everything else was as I would expect. hive.llap.io.enabled is true. According to Grafana, I have a 12 GB cache, but there isn't anything in it. Thanks for your help, I appreciate it.
02-21-2017
02:36 PM
Update for those thinking of an answer (and a cheat sheet of good parameters for those browsing): I have tested each change individually and have response time down to 630 ms and throughput up to 40 queries/sec on a 1 TB dataset, but I need to get the IO cache going to get to the next level and improve throughput (q/sec). The IO cache is created (12 GB) but is never used. I am using out-of-the-box LLAP settings except for the changes below, most of which were positive in driving down response time and increasing throughput (a few were neutral):
- Increased memory for YARN, LLAP heap, and LLAP concurrency
- Disabled CBO (it was adding time during the "creating and submitting plan" portion of the query)
- Set hive.llap.io.memory.mode=cache (out of the box this setting was null)
- Increased /proc/sys/net/core/somaxconn to 4096 (see the sketch below)
- Increased metastore heap
- Increased YARN threads (yarn.resourcemanager.amlauncher.thread-count)
- Increased the YARN Resource Manager heartbeat interval (tez.am-rm.heartbeat.interval-ms.max)
- Set hive.llap.daemon.vcpus.per.instance to 32; previously it was an unrecognized variable (Ambari bug) and gave this message at Ambari startup: "WARN conf.HiveConf: HiveConf hive.llap.daemon.vcpus.per.instance expects INT type value"
- Added an additional HS2 instance
- Added the 4 hiveconf options (hive.execution.mode, hive.llap.execution.mode, hive.llap.io.enabled, hive.llap.daemon.service.hosts) as additional properties in the HS2 custom site
- Increased HDFS heap

Version of HDP is 2.5.3.0-37. Version of Hive is 2.1.0.2.5.3.0-37.
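An illustrative sketch of the somaxconn change from the list above (persisting it via sysctl.d is an assumption; the post only gives the runtime value):

```bash
# Raise the kernel's listen-backlog cap at runtime, then persist it:
sysctl -w net.core.somaxconn=4096
echo "net.core.somaxconn = 4096" > /etc/sysctl.d/99-llap.conf
```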
02-21-2017
07:21 AM
1 Kudo
I've set up LLAP and it is working fine, but it is not using the IO cache. I've set the below in both the CLI and HS2, but Grafana shows no cache used (and the HDFS NameNode is very busy keeping up with the edits). Any ideas on what I might be missing?

--hiveconf hive.execution.mode=llap
--hiveconf hive.llap.execution.mode=all
--hiveconf hive.llap.io.enabled=true
--hiveconf hive.llap.daemon.service.hosts=@llap0
Tags:
- Data Processing
- llap
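For reference, the flags from the post assembled into one illustrative beeline invocation (the JDBC URL is a placeholder):

```bash
beeline -u "jdbc:hive2://llap-host:10500/" \
  --hiveconf hive.execution.mode=llap \
  --hiveconf hive.llap.execution.mode=all \
  --hiveconf hive.llap.io.enabled=true \
  --hiveconf hive.llap.daemon.service.hosts=@llap0
```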
07-19-2016
06:09 PM
Apologies for the dumb question: is there a command-line way to get the HBase files-local %? Or, alternatively, a way to refresh the widget in Ambari on demand? Thanks.
Labels:
- Apache Ambari
- Apache HBase