Member since: 04-24-2017
Posts: 106
Kudos Received: 13
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1422 | 11-25-2019 12:49 AM |
| | 2509 | 11-14-2018 10:45 AM |
| | 2259 | 10-15-2018 03:44 PM |
| | 2127 | 09-25-2018 01:54 PM |
| | 1948 | 08-03-2018 09:47 AM |
08-07-2018
03:31 PM
1 Kudo
Have a look here: https://community.hortonworks.com/questions/88526/how-to-salt-row-key-in-hbase-table.html. Basically, it says that your prefix should be defined in a way that you can also calculate it at query time. In your (perhaps simplified) example, even numbers could get the prefix 000 and odd numbers the prefix 001.
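For illustration, a minimal sketch of that idea in Scala, assuming a hypothetical table "mytable" with column family "cf": the salt is a pure function of the key, so a reader can recompute the full row key at query time.

import org.apache.hadoop.hbase.client.{ConnectionFactory, Get, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}

// The salt is derived from the key itself: even ids get 000, odd ids 001.
def saltedKey(id: Long): Array[Byte] = {
  val prefix = if (id % 2 == 0) "000" else "001"
  Bytes.toBytes(s"$prefix-$id")
}

val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
val table = conn.getTable(TableName.valueOf("mytable"))

// Write with the salted key ...
table.put(new Put(saltedKey(42L))
  .addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v")))

// ... and read it back by recomputing the same prefix at query time.
val result = table.get(new Get(saltedKey(42L)))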
08-06-2018
03:42 PM
Does your data actually span all of the regions you created split points for? Or, when this finishes generating the HFile, does the client end up having to split the HFiles (and not just load them)? The only thing I can guess is that the HBaseStorageHandler isn't doing something right. Generating only one HFile when you have 10 regions is definitely suboptimal.
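As a quick check (a sketch; the table name is an assumption), you can print the table's region start keys and compare them against the key range your job actually generates:

import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}

val conn    = ConnectionFactory.createConnection(HBaseConfiguration.create())
val locator = conn.getRegionLocator(TableName.valueOf("mytable"))

// If all generated keys fall between two adjacent start keys, the job
// produces a single HFile even though the table has 10 regions.
locator.getStartKeys.foreach(k => println(Bytes.toStringBinary(k)))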
08-01-2018
08:12 PM
Good question. I'm just running an INSERT INTO ... SELECT * FROM ... statement, e.g. via Beeline or the Ambari Hive View (JDBC). Does this run in single-insert or batch mode?
07-19-2018
08:59 AM
@Daniel Müller
When merging Hive ORC files, the files are merged at the ORC stripe level rather than matched against the block size. The property that controls this is hive.merge.orcfile.stripe.level: when it is set to true, the merge happens at stripe level; when set to false, the files are merged at file level. The parameters that affect a file-level merge are:

hive.merge.tezfiles=true
hive.merge.mapfiles=true
hive.merge.size.per.task=256000000
hive.merge.smallfiles.avgsize=16000000

For more details, refer to the link. Also, there are some known limitations related to concatenation. Observe the behaviour and the file count when the concatenate is run in, say, 5 iterations.
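As a sketch of such a merge run in HiveQL (the table name is hypothetical; the values are the defaults quoted above):

SET hive.merge.orcfile.stripe.level=true;
SET hive.merge.tezfiles=true;
SET hive.merge.mapfiles=true;
SET hive.merge.size.per.task=256000000;
SET hive.merge.smallfiles.avgsize=16000000;

-- Concatenate the small ORC files of the table, then re-check the file count.
ALTER TABLE mytable CONCATENATE;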
07-10-2018
01:54 PM
Yes, I'm familiar with Spark. What I wondered about was the caching behavior. It really seems to know which HiveQL statement belongs to the cached data, and re-uses it automatically when the same query comes in:

// Cache the table for the first time => takes some time!
val df1_1 = sqlContext.sql("SELECT a, b FROM db.table limit 1000000")
val df1_2 = df1_1.cache()
df1_2.count()

// This re-uses the cached object, as the request is the same as before => very fast!
val df2_1 = sqlContext.sql("SELECT a, b FROM db.table limit 1000000")
val df2_2 = df2_1.cache()
df2_2.count()

// This caches the data, because the request is different (another limit clause) => takes some time!
val df3_1 = sqlContext.sql("SELECT a, b FROM db.table limit 10")
val df3_2 = df3_1.cache()
df3_2.count()
Thanks for your help @Felix Albani
01-12-2018
10:54 AM
1 Kudo
Yes, that was the solution! Thank you very much! Everything works fine with the following statement now:

SELECT SUM(menge) menge FROM mytable
08-27-2018
06:14 PM
@Daniel Muller, can you grep for "Safe mode is" in the HDFS NameNode log? That will tell you directly why the NameNode does not exit safemode.
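For example (the log path is an assumption and varies by installation):

grep "Safe mode is" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log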
07-06-2017
06:41 AM
1 Kudo
Removing the "hdp-1.novalocal" from the hosts list and using the hostname script for setting the public / private hostname did it for me! Thank you so much, I think you saved my whole week!
07-07-2017
08:34 AM
OK, easy solution here: I had used the hive-jdbc-<version>.jar file as the dependency, but I have to use hive-jdbc-<version>-standalone.jar. So changing /usr/hdp/current/hive-client/lib/hive-jdbc-1.2.1000.2.6.1.0-129.jar to /usr/hdp/2.6.1.0-129/hive2/jdbc/hive-jdbc-2.1.0.2.6.1.0-129-standalone.jar did it for me! You can find the hive-jdbc-standalone.jar with:

find / -name "hive-jdbc*standalone.jar"
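To verify the fix, a minimal sketch in Scala (host, port, and credentials are assumptions) that only succeeds once the standalone jar is on the classpath:

import java.sql.DriverManager

// Resolves only if the standalone jar (which bundles all dependencies) is on the classpath.
Class.forName("org.apache.hive.jdbc.HiveDriver")

val conn = DriverManager.getConnection(
  "jdbc:hive2://localhost:10000/default", "hive", "")
val rs = conn.createStatement().executeQuery("SELECT 1")
while (rs.next()) println(rs.getInt(1))
conn.close()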
04-24-2018
10:02 AM
Reading the Value with the XPath //@Type works fine.
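For reference, a minimal sketch in Scala (the XML document is made up) showing that //@Type selects the Type attribute anywhere in the document:

import java.io.StringReader
import javax.xml.xpath.XPathFactory
import org.xml.sax.InputSource

val xml   = """<root><item Type="sensor">42</item></root>"""
val xpath = XPathFactory.newInstance().newXPath()

// evaluate() returns the string value of the first matching node.
val value = xpath.evaluate("//@Type", new InputSource(new StringReader(xml)))
println(value)  // sensor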