Member since: 09-23-2015
Posts: 800
Kudos Received: 898
Solutions: 185
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 7357 | 08-12-2016 01:02 PM |
|  | 2708 | 08-08-2016 10:00 AM |
|  | 3670 | 08-03-2016 04:44 PM |
|  | 7210 | 08-03-2016 02:53 PM |
|  | 1863 | 08-01-2016 02:38 PM |
04-08-2016
06:17 PM
"submitted by user hive to unknown queue: default" — so does your ResourceManager actually have a default queue? You can check in the ResourceManager UI on port 8088.
04-08-2016
01:39 PM
2 Kudos
In an un-Kerberized HDP cluster the HBase znode is /hbase-unsecure; in a secured cluster it changes to /hbase-secure. In this question the user did the same thing and fixed it by adding the znode to the URL: "zkUrl", "sandbox:2181:/hbase-unsecure" — https://community.hortonworks.com/questions/18228/phoenix-hbase-problem-with-hdp-234-and-java.html I doubt adding it to the Spark config helps anything (for example, only parameters prefixed with spark. get serialized). sqlline needed the /hbase-unsecure suffix before, but in the newest version it seems to take the znode from hbase-site.xml if not otherwise configured. You can check in your hbase-site.xml which znode is needed.
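As a minimal sketch of the fix above: a small hypothetical helper that builds a Phoenix JDBC URL with the znode pinned explicitly, matching the "sandbox:2181:/hbase-unsecure" pattern. The helper name and the sandbox host are illustrative, not part of the Phoenix API.

```java
// Hypothetical helper: builds a Phoenix JDBC URL that pins the HBase znode
// explicitly, mirroring the "sandbox:2181:/hbase-unsecure" fix above.
public class PhoenixUrl {
    static String phoenixJdbcUrl(String zkQuorum, int zkPort, String znode) {
        return "jdbc:phoenix:" + zkQuorum + ":" + zkPort + ":" + znode;
    }

    public static void main(String[] args) {
        String url = phoenixJdbcUrl("sandbox", 2181, "/hbase-unsecure");
        System.out.println(url);
        // On a real cluster you would pass this to
        // java.sql.DriverManager.getConnection(url).
    }
}
```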
04-08-2016
01:17 PM
Depends what you plan to do.
- Aggregation queries and analytical reports: Hive (a simple JDBC connection is supported by BIRT and Pentaho, and you can also build servlets with JDBC pools, the whole shebang).
- Selecting one record at a time (like a dashboard that shows the data of one customer): HBase with the REST API from JavaScript might work, or HBase with the Java API from a servlet. If you prefer SQL, Apache Phoenix is a cool SQL layer on top of HBase: https://phoenix.apache.org/
- Interactive reports on thousands to millions of records (not billions): Apache Phoenix. It provides some good enhancements over base HBase from a performance perspective for anything that touches more than a row, and you can also do joins, aggregations, etc. pp.

(If you want HBase but have Kerberos set up, have a look at Knox: it is an SSL-capable proxy that strips away the Kerberos requirement and replaces it with a normal web authentication setting for the HBase API.)
04-08-2016
12:29 PM
2 Kudos
@Maharaj Muthusamy You need to set the hint right after the first statement, i.e. the UPSERT statement, not the SELECT, for it to work. Just tried it.

This works and results in a sort merge join:

explain upsert /*+ USE_SORT_MERGE_JOIN */ into productsales
select productsales.product, productsales.date, productsales.amount
from sales, productsales
where sales.product = productsales.product;

This doesn't:

explain upsert into productsales
select /*+ USE_SORT_MERGE_JOIN */ productsales.product, productsales.date, productsales.amount
from sales, productsales
where sales.product = productsales.product;
04-08-2016
11:32 AM
You can build your own frontend using Angular or whatever you want (Dojo has some nice charts): https://www.sitepen.com/blog/2008/06/06/a-beginners-guide-to-dojo-charting-part-1-of-2/ However, if you need only a lower level of flexibility, you could use BIRT (http://www.eclipse.org/birt/), Pentaho, or other reporting tools. That would be easier, and BIRT, for example, provides pretty flexible report creation capabilities. (It breaks down when you want a highly interactive frontend.)
04-08-2016
11:28 AM
Normally copy and paste works. Do you use PuTTY? Also, if FileZilla fails, you could use WinSCP. But normally it tells you more than just "critical transfer error".
04-07-2016
10:17 PM
1 Kudo
Hive expects a SASL wrapper from the client (empty in your case) and doesn't seem to get one, or gets one with a wrong status. Is it possible that the ODBC driver is old? Did you use the ODBC driver from here? http://hortonworks.com/hdp/addons/
04-07-2016
09:04 PM
I assume that if you run EXPLAIN, it shows both times that the hint is ignored. I found this link where the Squirrel JDBC client was removing hints, but it looks like you use sqlline, the link says it works there, and the usage looks identical to what you do. (Could you try it once without the UPSERT to see if that helps?) https://mail-archives.apache.org/mod_mbox/phoenix-user/201503.mbox/%3cfc15b78a902a1ad5c0d66efc4a5b2342@mail.gmail.com%3e The second possibility would obviously be to increase the hash cache: 100 MB is not that much in this day and age, and 3M rows is not the world. See phoenix.query.maxServerCacheBytes. I hope Phoenix is smart enough to only build a cache from the two columns of the right side it actually needs. But since it's a description column, that is presumably bigger: 200 bytes * 3M rows would be 600 MB of data.
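The sizing above as a back-of-the-envelope sketch. The 200-byte average row width is an assumed figure for illustration, not a measured value.

```java
// Back-of-the-envelope sizing for the Phoenix server-side hash cache,
// matching the numbers above. avgRowBytes (200) is an assumed average
// width for the cached right-hand-side columns, not a measured value.
public class HashCacheEstimate {
    static long estimatedCacheBytes(long rows, long avgRowBytes) {
        return rows * avgRowBytes;
    }

    public static void main(String[] args) {
        long needed = estimatedCacheBytes(3_000_000L, 200L);
        System.out.println(needed); // 600000000 bytes, i.e. roughly 600 MB
        // Compare this against phoenix.query.maxServerCacheBytes (~100 MB here)
        // to judge whether the hash-join cache would overflow.
    }
}
```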
04-07-2016
06:16 PM
Yeah, the first approach is simple and is what I did before, so I know it works. Scanning the whole data 8 times is a bit wasteful, but the operation should be very fast (you only parse the dataset once, and filters are quick). GroupBy might be more efficient for a large number of types, but you would need to somehow implement a file save for an array, and everything for one type ends up in the memory of one executor, I think. So more work and less robust. If you go that second way, an article here would be cool.
04-07-2016
05:16 PM
So the data is in the same stream? I.e., one row will have one format and the next one another? If you had 8 Kafka streams, I suppose you wouldn't ask. In that case you have two options:

- Make an identify function, apply it, then filter the RDD 8 times, once for each type, and each time do the correct parsing and persisting in SQL. As an illustration (pseudocode):

    val inputStream = ...
    val typedStream = inputStream.map(record => (identifyType(record), record))

    val type1Stream = typedStream.filter(_._1 == "type1")
    val type2Stream = typedStream.filter(_._1 == "type2")
    ...
    val parsed1 = type1Stream.map(record => myParse1Function(record._2))
    // persist parsed1 as a DataFrame in table1
    val parsed2 = type2Stream.map(record => myParse2Function(record._2))
    // persist parsed2 as a DataFrame in table2

- Make an identify function, apply it, and then group by the type somehow. The problem is how you save the grouped values: they will all end up in the same executor, I think. It would be a bit more work but more efficient, because above you filter the same stream 8 times.

Unfortunately there is no tee yet that could split a stream apart; that would be exactly what you need (if I understood the question correctly). https://issues.apache.org/jira/browse/SPARK-13378
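The identify-then-filter pattern above, sketched on a plain in-memory collection so it runs without a Spark cluster. identifyType and the JSON-vs-CSV heuristic are made-up examples; on a real DStream the same one-filter-per-type shape applies.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal, Spark-free sketch of the identify-then-filter pattern above.
// identifyType and the JSON-vs-CSV heuristic are made-up examples.
public class SplitByType {
    static String identifyType(String record) {
        return record.startsWith("{") ? "json" : "csv";
    }

    public static void main(String[] args) {
        List<String> records = List.of("{\"id\":1}", "2,foo", "{\"id\":3}");

        // Tag each record once via identifyType, then filter per type:
        // one pass per type, analogous to filtering the same stream
        // once per format.
        List<String> jsonRecords = records.stream()
                .filter(r -> identifyType(r).equals("json"))
                .collect(Collectors.toList());
        List<String> csvRecords = records.stream()
                .filter(r -> identifyType(r).equals("csv"))
                .collect(Collectors.toList());

        System.out.println(jsonRecords);
        System.out.println(csvRecords);
    }
}
```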