Member since: 07-01-2015
Posts: 460
Kudos Received: 78
Solutions: 43
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1346 | 11-26-2019 11:47 PM
 | 1304 | 11-25-2019 11:44 AM
 | 9471 | 08-07-2019 12:48 AM
 | 2175 | 04-17-2019 03:09 AM
 | 3484 | 02-18-2019 12:23 AM
11-26-2017
11:49 AM
Exactly. During parcel activation, the correct HADOOP_HOME is set to /opt/cloudera/parcels.. etc.
09-28-2017
04:06 PM
1 Kudo
How to set the DEBUG log level on the agent?
09-19-2017
08:12 AM
Hi Tomas79,

First, thanks for your contributions to this thread as well as your suggestions!

We do steer experienced users toward configuration files and the bootstrap-remote CLI command rather than the UI, because the UI would become complex if we tried to add a checkbox, field, or form for every Director feature. The Director server does have a full set of API endpoints that you can use to make updates to clusters that aren't easy or possible to do through the UI, so I recommend taking a look there. If you go to the /api-console URL for Director, there's an interactive facility for learning about the API and trying it out live. For example, there is an API endpoint for importing a configuration file directly into the server, documented here: https://www.cloudera.com/documentation/director/latest/topics/director_cluster_config.html#concept_lqt_2y1_x1b

We don't have a corresponding configuration file export API endpoint yet, but you are not the first to suggest it, so be assured that it's on our wish list. In the meantime, the API can help you if you're willing to work with it.

Director's single log is tough to navigate. Recent Director versions have added more context to log lines, which makes it feasible to filter out the relevant ones. We've got some techniques documented here: https://www.cloudera.com/documentation/director/latest/topics/director_troubleshoot.html

But I see room for more documentation. Specifically, at least as of 2.4:

- Each line includes a thread ID in square brackets. The ones starting with "p-" are for pipelines, Director's internal workflows, so you can follow one of those among all the other pipelines and other asynchronous tasks within Director.
- The fields following the thread ID are the unique (API) request ID, request method, and request URI that ultimately caused the activity being logged.
- You can work with the logback.xml file for the server to change the formatting, and perhaps even route logging to multiple files for easier comprehension (another ask that we've heard); see the sketch below.

Again, thanks for your feedback!
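To illustrate the logback.xml angle, here is a minimal sketch of a file appender whose pattern keeps the thread ID visible; the appender name, file path, and pattern are my assumptions, not Director's shipped configuration:

<configuration>
  <appender name="DIRECTOR_FILE" class="ch.qos.logback.core.FileAppender">
    <!-- hypothetical path; point this wherever your server logs should go -->
    <file>/var/log/cloudera-director-server/application.log</file>
    <encoder>
      <!-- [%thread] surfaces the "p-..." pipeline IDs so a single workflow can be followed -->
      <pattern>%d{yyyy-MM-dd HH:mm:ss} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="DIRECTOR_FILE"/>
  </root>
</configuration>

With that in place, a plain grep on the thread ID, e.g. grep "\[p-" against the log file, achieves the per-pipeline filtering described above.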
08-28-2017
09:03 AM
@hindog wrote: "Nor the browser column" -- What if the "browser" column WAS the primary key? Would increment be possible in similar fashion to upsert, the difference being that internally Kudu would read the existing value and add it to the increment value before saving?

The other Kudu limitation that this runs into is that row key columns cannot be modified. It would be somewhat easier to implement atomic increments on non-key columns, but that would mean doing a lot of updates, which is not something Kudu excels at: it supports modifying data, but the higher the update rate, the slower reads become.
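Since Kudu has no server-side increment today, a client has to do the read-modify-write itself. Here is a minimal sketch with the Kudu Java client from Scala, assuming a hypothetical table page_hits with a string key column browser and a long column count; note this is not atomic, so two racing clients can lose an update:

import org.apache.kudu.client.{KuduClient, KuduPredicate}
import org.apache.kudu.client.KuduPredicate.ComparisonOp

val client = new KuduClient.KuduClientBuilder("kudu-master:7051").build() // hypothetical master address
val table = client.openTable("page_hits")                                 // hypothetical table
val session = client.newSession()

def increment(browser: String, delta: Long): Unit = {
  // First round trip: read the current counter value for this key
  val scanner = client.newScannerBuilder(table)
    .addPredicate(KuduPredicate.newComparisonPredicate(
      table.getSchema.getColumn("browser"), ComparisonOp.EQUAL, browser))
    .build()
  var current = 0L
  while (scanner.hasMoreRows) {
    val rows = scanner.nextRows()
    while (rows.hasNext) current = rows.next().getLong("count")
  }
  // Second round trip: write back the incremented value (no atomicity guarantee)
  val update = table.newUpdate()
  val row = update.getRow
  row.addString("browser", browser)
  row.addLong("count", current + delta)
  session.apply(update)
  session.flush()
}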
08-16-2017
09:30 AM
@Tomas79 wrote: Update: try Toad for Hadoop. It supports Impala and Hive with Kerberos; unfortunately, the combination LDAP + Kerberos is not supported (yet).

Not supported officially, but here is a trick for connecting to Impala under Kerberos + LDAP in Toad for Hadoop:
1) In Toad, first select Impala + LDAP only (without Kerberos) and check the connection. It fails, as expected.
2) Enable Kerberos in Toad (the LDAP auth options become unavailable, but the settings remain in effect). Check the connection again.
3) Success!
07-24-2017
08:23 AM
Can you give me the exact AMI IDs that you are trying out?
05-18-2017
02:07 AM
1 Kudo
Thanks for sharing the code of your solution. I've also found that just making the HiveContext variable lazy works, presumably because the lazy context is then not serialized into the streaming checkpoint but recreated on first use:

import org.apache.spark.SparkConf
import org.apache.spark.sql.Row
import org.apache.spark.streaming.{Seconds, StreamingContext}

val sparkConf = new SparkConf().setAppName("StreamHDFSdata")
sparkConf.set("spark.dynamicAllocation.enabled", "false")
val ssc = new StreamingContext(sparkConf, Seconds(5))
ssc.checkpoint("/user/hdpuser/checkpoint")
val sc = ssc.sparkContext

val smDStream = ssc.textFileStream("/user/hdpuser/data")
val smSplitted = smDStream.map( x => x.split(";") ).map( x => Row.fromSeq( x ) )
...

// lazy defers construction of the (non-serializable) HiveContext until first use
lazy val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

smSplitted.foreachRDD( rdd => {
  // use sqlContext here
} )
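For illustration, one way the foreachRDD body might use the context; the column names, schema, and table name below are hypothetical, not from the original post:

import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Hypothetical two-column schema matching the semicolon-split rows
val schema = StructType(Seq(
  StructField("col1", StringType),
  StructField("col2", StringType)
))

smSplitted.foreachRDD( rdd => {
  val df = sqlContext.createDataFrame(rdd, schema)   // the lazy context is realized here
  df.write.mode("append").saveAsTable("stream_data") // hypothetical Hive table name
} )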
03-28-2017
11:24 AM
In my case, OpenJDK was causing the issue. Once I removed it and installed the correct version of the JDK, the distribution completed successfully.
02-23-2017
06:47 PM
Tomas, you have a legitimate request and concern. First, there is no perfectly fool-proof solution, because resource consumption is somewhat dependent on what happens at runtime, and not all memory consumption is tracked by Impala (but most is). We are constantly making improvements in this area, though.

1. I'd recommend fixing num_scanner_threads for your queries. A different number of scanner threads can result in different memory consumption from run to run (and dependent on what else is going on in the system at the time).

2. The operators of a query do not run one by one. Some of them run concurrently (e.g., join builds may execute concurrently), so just looking at the highest peak in the exec summary is not enough. Taking the sum of the peaks over all operators is a safer bet, but it tends to overestimate the actual consumption.

Hope this helps!
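For point 1, the scanner thread count can be pinned per session with an Impala query option; the value 4 below is just an example, and the query is hypothetical:

-- Fix the scanner thread count so memory consumption is repeatable across runs
SET NUM_SCANNER_THREADS=4;
-- then run your actual workload, e.g.
SELECT count(*) FROM my_table;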
02-08-2017
05:26 AM
It looks like the problem really is in the timestamp field. Running a similar query on a table without a timestamp shows much better results in the new environment. Thanks for the help!