Member since: 07-01-2015
Posts: 460
Kudos Received: 78
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1346 | 11-26-2019 11:47 PM
 | 1304 | 11-25-2019 11:44 AM
 | 9471 | 08-07-2019 12:48 AM
 | 2175 | 04-17-2019 03:09 AM
 | 3484 | 02-18-2019 12:23 AM
11-26-2017
11:49 AM
Exactly. During parcel activation, the correct HADOOP_HOME is set to /opt/cloudera/parcels.. etc.
09-28-2017
04:06 PM
1 Kudo
How to set the DEBUG log level on the agent?
09-19-2017
08:12 AM
Hi Tomas79,

First, thanks for your contributions to this thread as well as your suggestions!

We do steer experienced users toward configuration files and the bootstrap-remote CLI command rather than the UI, because the UI would become complex if we tried to add a checkbox, field, or form for every Director feature. The Director server does have a full set of API endpoints that you can use to make updates to clusters that aren't easy or possible to do through the UI, so I recommend taking a look there. If you go to the /api-console URL for Director, there's an interactive facility for learning about the API and trying it out live. For example, there is an API endpoint for importing a configuration file directly into the server, documented here: https://www.cloudera.com/documentation/director/latest/topics/director_cluster_config.html#concept_lqt_2y1_x1b

We don't have a corresponding configuration file export API endpoint yet, but you are not the first to suggest it, so be assured that it's on our wish list. In the meantime, the API can help you if you're willing to work with it.

Director's single log is tough to navigate. Recent Director versions have added more context to log lines, which makes it feasible to filter out the relevant ones. We've got some techniques documented here: https://www.cloudera.com/documentation/director/latest/topics/director_troubleshoot.html

But I see room for more documentation. Specifically, at least as of 2.4:

- Each line includes a thread ID in square brackets. The ones starting with "p-" are for pipelines, Director's internal workflows, so you can follow one of those among all the other pipelines and other asynchronous tasks within Director.
- The fields following the thread ID are the unique (API) request ID, request method, and request URI that ultimately caused the activity being logged.
- You can work with the logback.xml file for the server to change the formatting, and perhaps even route logging to multiple files for easier comprehension (another ask that we've heard); see the sketch below.

Again, thanks for your feedback!
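To illustrate the logback.xml angle, here is a minimal sketch of a file appender whose pattern keeps the thread ID visible; the appender name, file path, and pattern are my assumptions, not Director's shipped configuration:

<configuration>
  <appender name="DIRECTOR_FILE" class="ch.qos.logback.core.FileAppender">
    <!-- hypothetical path; point this wherever your server logs should go -->
    <file>/var/log/cloudera-director-server/application.log</file>
    <encoder>
      <!-- [%thread] surfaces the "p-..." pipeline IDs so a single workflow can be followed -->
      <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="DIRECTOR_FILE"/>
  </root>
</configuration>

With that in place, a plain grep on the thread ID, e.g. grep "\[p-" against the log file, achieves the per-pipeline filtering described above.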
08-28-2017
09:03 AM
@hindog wrote: "Nor the browser column" -- What if the "browser" column WAS the primary key? Would increment be possible in similar fashion to upsert, the difference being that internally Kudu would read the existing value and add it to the increment value before saving?

The other Kudu limitation that this runs into is that row key columns cannot be modified. It would be somewhat easier to implement atomic increments on non-key columns, but that would mean doing a lot of updates, which is not something Kudu excels at: it supports modifying data, but the higher the update rate, the slower reads become.
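Since Kudu has no server-side increment today, a client has to do the read-modify-write itself. Here is a minimal sketch with the Kudu Java client from Scala, assuming a hypothetical table page_hits with a string key column browser and a long column count; note this is not atomic, so two racing clients can lose an update:

import org.apache.kudu.client.{KuduClient, KuduPredicate}
import org.apache.kudu.client.KuduPredicate.ComparisonOp

val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build() // hypothetical master address
val table = client.openTable("page_hits")                                 // hypothetical table
val session = client.newSession()

def increment(browser: String, delta: Long): Unit = {
  // First round trip: read the current counter value for this key
  val scanner = client.newScannerBuilder(table)
    .addPredicate(KuduPredicate.newComparisonPredicate(
      table.getSchema.getColumn("browser"), ComparisonOp.EQUAL, browser))
    .build()
  var current = 0L
  while (scanner.hasMoreRows) {
    val rows = scanner.nextRows()
    while (rows.hasNext) current = rows.next().getLong("count")
  }
  // Second round trip: write back the incremented value (no atomicity guarantee)
  val update = table.newUpdate()
  val row = update.getRow
  row.addString("browser", browser)
  row.addLong("count", current + delta)
  session.apply(update)
  session.flush()
}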
08-16-2017
09:30 AM
@Tomas79 wrote: Update: try Toad for Hadoop. It supports Impala and Hive with Kerberos; unfortunately, the combination LDAP + Kerberos is not supported (yet).

Not supported officially, but here is a trick for connecting to Impala under Kerberos + LDAP in Toad for Hadoop:
1) In Toad, first select Impala + LDAP only (without Kerberos) and check the connection. It fails, as expected.
2) Enable Kerberos in Toad (the LDAP auth options become unavailable, but the settings remain in effect). Check the connection again.
3) Success!
07-24-2017
08:23 AM
Can you give me the exact AMI IDs that you are trying out?
05-18-2017
02:07 AM
1 Kudo
Thanks for sharing the code of your solution. I've also found that just making the HiveContext variable lazy works, presumably because the lazy context is then not serialized into the streaming checkpoint but recreated on first use:

import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("StreamHDFSdata")
sparkConf.set("spark.dynamicAllocation.enabled", "false")
val ssc = new StreamingContext(sparkConf, Seconds(5))
ssc.checkpoint("/user/hdpuser/checkpoint")
val sc = ssc.sparkContext

val smDStream = ssc.textFileStream("/user/hdpuser/data")
val smSplitted = smDStream.map( x => x.split(";") ).map( x => Row.fromSeq( x ) )
...

// lazy defers construction of the (non-serializable) HiveContext until first use
lazy val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

smSplitted.foreachRDD( rdd => {
  // use sqlContext here
} )
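For illustration, one way the foreachRDD body might use the context; the column names, schema, and table name below are hypothetical, not from the original post:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical two-column schema matching the semicolon-split rows
val schema = StructType(Seq(
  StructField("col1", StringType),
  StructField("col2", StringType)
))

smSplitted.foreachRDD( rdd => {
  val df = sqlContext.createDataFrame(rdd, schema)   // the lazy context is realized here
  df.write.mode("append").saveAsTable("stream_data") // hypothetical Hive table name
} )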
03-28-2017
11:24 AM
In my case, OpenJDK was causing the issue. Once I removed it and installed the correct version of the JDK, the distribution completed successfully.
02-23-2017
06:47 PM
Tomas, you have a legitimate request and concern. First, there is no perfectly fool-proof solution, because resource consumption is somewhat dependent on what happens at runtime, and not all memory consumption is tracked by Impala (but most is). We are constantly making improvements in this area, though.

1. I'd recommend fixing num_scanner_threads for your queries. A different number of scanner threads can result in different memory consumption from run to run (and dependent on what else is going on in the system at the time).

2. The operators of a query do not run one by one. Some of them run concurrently (e.g., join builds may execute concurrently), so just looking at the highest peak in the exec summary is not enough. Taking the sum of the peaks over all operators is a safer bet, but it tends to overestimate the actual consumption.

Hope this helps!
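For point 1, the scanner thread count can be pinned per session with an Impala query option; the value 4 below is just an example, and the query is hypothetical:

-- Fix the scanner thread count so memory consumption is repeatable across runs
SET NUM_SCANNER_THREADS=4;
-- then run your actual workload, e.g.
SELECT count(*) FROM my_table;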
02-08-2017
05:26 AM
It looks like the problem really is in the timestamp field. Running a similar query on a table without a timestamp shows much better results in the new environment. Thanks for the help!