Member since: 07-19-2016
Posts: 26
Kudos Received: 7
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5300 | 12-19-2017 08:44 PM |
05-17-2019 02:11 AM
@Rajkumar Singh This is a useful article, and I've passed it on to five or six customers. One thing to note, though: this is no longer an issue with HDP 3. 🙂 It is still very applicable and necessary for HDP 2.x.
11-16-2018 12:42 AM
@carol elliott You need to set the client option for an unsupported terminal before launching via nohup:

export HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"
nohup beeline -f foo.sql -u ${jdbcurl} >> nohup_beelineoutput.out &
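As a commented sketch of the same invocation (the SQL file name and the `jdbcurl` variable are from the post; adjust both, and the stderr redirection is an addition for capturing errors):

```shell
# jline tries to take control of the terminal; under nohup/cron there is no
# TTY, so force the unsupported-terminal backend before launching beeline.
export HADOOP_CLIENT_OPTS="-Djline.terminal=jline.UnsupportedTerminal"

# foo.sql and ${jdbcurl} are placeholders from the original post.
# 2>&1 (added here) also captures errors in the same log file.
nohup beeline -f foo.sql -u "${jdbcurl}" >> nohup_beelineoutput.out 2>&1 &
```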
07-03-2018 11:34 AM
We had to add this to the custom hive-site (custom hive-env does not work). Once added to the custom hive-site, it worked fine. James
05-28-2018 04:02 PM
@Sergey Shelukhin "Set Recommended" picks 12288 MB for overhead. My max heap (Xmx) is 32 GB and the cache is 16 GB. 6% would be 1.92 GB, so I was going to go with 2 GB based on this article. Maybe overhead requirements have changed in 2.6.4 relative to this article? Thanks!!
12-19-2017 11:12 PM
You should be able to use show table extended ... partition to get info on each partition and skip any that is zero bytes. Like this:

scala> var sqlCmd="show table extended from mydb like 'mytable' partition (date_time_date='2017-01-01')"
sqlCmd: String = show table extended from mydb like 'mytable' partition (date_time_date='2017-01-01')

scala> var partitionsList=sqlContext.sql(sqlCmd).collectAsList
partitionsList: java.util.List[org.apache.spark.sql.Row] = [[mydb,mytable,false,Partition Values: [date_time_date=2017-01-01]
Location: hdfs://mycluster/apps/hive/warehouse/mydb.db/mytable/date_time_date=2017-01-01
Serde Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
Storage Properties: [serialization.format=1]
Partition Parameters: {rawDataSize=441433136, numFiles=1, transient_lastDdlTime=1513597358, totalSize=4897483, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numRows=37825} ]]

Let me know if that works and you can avoid the zero-byte partitions that way, or if you still get the null pointer. James
12-19-2017 08:44 PM
2 Kudos
This is a way, in lieu of having Ranger, to set permissions at the HDFS level for a database (and have them carry to all tables and files in that db) or a directory (and have them carry to all subdirectories and files in it). For example, after setting that parameter you can do:

hadoop fs -chown myuser /apps/hive/warehouse/mydb.db
hadoop fs -chmod 700 /apps/hive/warehouse/mydb.db

Now myuser is the only one who can see or do things with that db, because hive.warehouse.subdir.inherit.perms=true causes anything created underneath to inherit the permissions and ownership of the parent. While this works, Ranger is still the way to go.
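A quick way to confirm the inheritance took effect, sketched under the assumptions of the example above (the path and user are from the post; hive.warehouse.subdir.inherit.perms=true must already be set in hive-site):

```shell
# After Hive creates a new table under the locked-down database directory,
# list it: each new subdirectory should show owner myuser and mode 700,
# matching the parent, because of hive.warehouse.subdir.inherit.perms=true.
hadoop fs -ls /apps/hive/warehouse/mydb.db
```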
12-19-2017 03:38 AM
@Benakaraj KS Go to 2.6.3 and set spark.sql.hive.convertMetastoreOrc=true and spark.sql.orc.enabled=true. You'll get a 3x speedup 🙂
06-26-2017 04:42 PM
Thank you - very useful!
06-08-2017 01:14 PM
4 Kudos
I forgot to update this. Actually, LLAP and the cache were set up correctly. In the queries taken from Teradata, each of the 200k queries performs several joins, and one of the joins was on a column that was null, so the result set was null. A side benefit, though, was a nice gain in knowledge of all the knobs and levers of this product. And it is a very nice product. My experience is in Teradata/Netezza/HAWQ, and I've found LLAP to be a clear winner in replacing those. Very fast, very robust.

A couple of notes on the things that mattered:
- Use ORC (duh)
- Get Partitioned By / Clustered By right (use hive --orcfiledump, as shown in Sergey's slides, to make sure you get 100k records per ORC file)
- Get the number of nodes / app masters right
- Use a local metastore
- Get the heaps right
- Get the YARN capacity scheduler settings right
- Increase executors
- (probably not supported) Clone IS2 JVMs for more throughput if needed

Pull up the Hive UI or Grafana, sit back, smile, and enjoy watching the transactions fly.
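The orcfiledump check mentioned in the notes above can be run like this (the file path is illustrative, not from the post; point it at one of your own table's ORC files):

```shell
# Dump metadata for one ORC file: overall row count, stripe boundaries,
# and per-stripe row counts. The path below is a made-up example.
hive --orcfiledump /apps/hive/warehouse/mydb.db/mytable/part-00000

# In the output, check the "Rows:" total and the per-stripe rows to see
# whether your Partitioned By / Clustered By layout is landing roughly
# 100k records per file, as recommended above.
```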