Member since: 08-16-2016
Posts: 642
Kudos Received: 130
Solutions: 68
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2736 | 10-13-2017 09:42 PM
 | 4421 | 09-14-2017 11:15 AM
 | 2424 | 09-13-2017 10:35 PM
 | 3741 | 09-13-2017 10:25 PM
 | 4108 | 09-13-2017 10:05 PM
07-27-2017
10:35 AM
Hmm, I don't have an answer, but I do have a comment. I am using CDH 5.8.2 and I see this same behavior from HUE: some connections are not closed when the tab is closed. I don't recall the exact state, but we have idle_query_timeout and idle_session_timeout set to 1 hour, and those connections are closed after that time. So if idle_query_timeout isn't working, try idle_session_timeout. If that doesn't work, there may be something going on with your specific setup.
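For reference, a minimal sketch of the daemon-level flags (the 3600-second values are just examples matching our 1-hour setting; set these via the Impala Daemon command-line argument safety valve in Cloudera Manager or directly on impalad):

```
# impalad startup flags; values are in seconds
--idle_query_timeout=3600    # cancel queries idle for an hour
--idle_session_timeout=3600  # close sessions idle for an hour
```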
07-27-2017
10:31 AM
1 Kudo
Impala and Hive have idle and session timeouts. These can be set globally at the service level or per client, so HUE can have its own. The Quickstart VM is not the place or the method to test or compare performance. With that said, the statement below is all that is needed: if this is the usage pattern, then you should not use Hive. Impala will always be better for single-record lookups or column aggregations. "I want to fetch one particular record based on unique Id amoung 110GB data."
07-27-2017
10:27 AM
Does the CM server fail to start? Is this from the CM server logs? Can you get to the UI? The error is complaining that it can't reach the Service Monitor, which is a secondary service of CM. The CM server should still start. If you can get to the UI, check the Service Monitor logs for more information on why it is failing.
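If the UI is down, a hedged sketch of where to look on disk (default package-install paths; yours may differ):

```
# CM server log
tail -n 200 /var/log/cloudera-scm-server/cloudera-scm-server.log
# Service Monitor log (part of the Cloudera Management Service)
tail -n 200 /var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-SERVICEMONITOR-*.log.out
```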
07-27-2017
10:24 AM
What instance type are you using? Are they on-demand or reserved? What region and zone is this in? Did you use a placement group? It could be that you are requesting an instance size, like the d2.8xlarge, and there just isn't enough capacity in the region/zone to allocate them in time (20 minutes). It could also be that AWS black magic isn't fast enough in moving instances around if you used a placement group.
07-27-2017
09:28 AM
Let's take a step back. You mentioned that the cluster is Kerberized, and therefore Navigator needs some configs along those lines to interact with the cluster, but what is its auth mechanism? It can be different, and yes, it is. The external auth options for Navigator are AD (LDAP), OpenLDAP, and SAML. If you do not have one of these set up, then it will use the Cloudera Manager accounts (which could be internal or external). If it is internal, then you will need to use one of those accounts, with the proper Navigator role/group assigned, in the -u cmuser:cmpass switch. You won't need a Kerberos ticket in this case.
07-27-2017
09:22 AM
1 Kudo
On 2, I was talking more about Hadoop in general, without consideration that we are talking about the Quickstart VM. If I recall correctly, the VM itself already has Ubuntu installed and CM on top of that. The Quickstart itself comes in VirtualBox, VMware, KVM, and Docker flavors, so you will need to install one of those and get the image built for that specific one to run the VM. I don't recall all of the supported OSes, but I know VirtualBox and VMware work on Windows. As for memory, it has been some time, but I vaguely recall it not starting without at least 8 GB allocated to the VM. This could be reduced by eliminating services (or stopping them from starting on boot) and maybe some config changes. That, of course, would deviate from the base image, so you would need to manage or track the changes to make them repeatable in the event that you need to start over.
07-26-2017
05:54 PM
It should work with both --negotiate and -u:
curl -v --negotiate -u "username" http://www.blah.com
07-26-2017
01:58 PM
Did you restart CM and the CMS? If not, then it will not pick up the CSD file and it will not be available as a service to install. If you have, then for the cluster with the parcels distributed and activated, choose 'Add a Service' from the cluster action menu. Is it available in that list of services?
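A quick sketch of the server restart for a package-based install (use systemctl on newer OSes); the Cloudera Management Service is then restarted from the CM UI:

```
# restart the CM server so it picks up the new CSD
sudo service cloudera-scm-server restart
```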
07-26-2017
01:00 PM
1 Kudo
1. Yes. It is a quick and easy way to get it up and running so that you can start using and learning all of the services.
2. No. Some, I have been told, have gotten Hadoop to run on Windows (HDInsight may be the one I am thinking of), but it is not easy. Windows is not a supported OS for CDH.
3. There are a few differences, but it primarily centers on feature access and service access with support. The last two words are key, as you can run all of CDH for free and have access to all services without support. So if you will be running production workloads, you should be looking at Enterprise or above.
07-26-2017
12:51 PM
Hive has the limitation that it cannot set these values at runtime. They have to be either in core-site.xml or in the table definition (not positive on this one though). The former requires admin access to the CM UI. The latter can be done by the person creating the table: you. Set the location like so: s3a://ACCESS_KEY:SECRET_KEY@bucket/path/. This will not work if the secret key contains a '/' unless you have this patch: https://issues.apache.org/jira/browse/HADOOP-3733
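A hedged sketch of the table-definition route via beeline (table name, bucket, keys, and the JDBC URL are all placeholders for your own values):

```
# embed the S3 credentials in the table's LOCATION at create time
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE EXTERNAL TABLE my_s3_table (id INT, name STRING)
LOCATION 's3a://ACCESS_KEY:SECRET_KEY@my-bucket/path/';
"
```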
07-25-2017
11:37 AM
The log entries have to do with reading from HDFS. Normally, metadata operations like the ones you mentioned go through the Statestore, CatalogD, and HMS. I would check out the threads on each, and on the ImpalaD you are running the commands from, to see what else is running. It is possible that one of these services is slow; it is also possible, based on the log entries, that reading from HDFS is slow and the other threads are waiting on the one reading from HDFS.
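One way to eyeball those threads, assuming the default debug web UI ports (25000/25010/25020) have not been changed on your cluster:

```
# each Impala daemon exposes a debug web UI with a /threadz page
curl -s http://impalad-host:25000/threadz | head      # impalad
curl -s http://statestore-host:25010/threadz | head   # statestored
curl -s http://catalog-host:25020/threadz | head      # catalogd
```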
07-25-2017
11:34 AM
Have you looked at the idle query timeout setting in Impala itself? There is a session-level equivalent, QUERY_TIMEOUT_S, that you can set from within your JDBC connection. https://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_timeouts.html
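A minimal sketch of the session-level option, shown here from impala-shell (the same SET statement can be issued over JDBC; the host and the timeout value are placeholders):

```
# cancel any query in this session that sits idle for 10 minutes
impala-shell -i impalad-host -q "SET QUERY_TIMEOUT_S=600; SELECT 1;"
```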
07-25-2017
09:57 AM
Do you have the HDFS Gateway installed on the same host that spark2-shell is running on?
07-21-2017
11:26 PM
Try:
curl -H "Content-Type: application/json" --upload-file deployment.json -u admin:admin 'http://scmhost:7180/api/v17/cm/deployment?deleteCurrentDeployment=true'
07-21-2017
11:17 PM
@Fawze Your other questions have been answered, but I wanted to add this bit regarding "spark streaming." Spark2 comes with Structured Streaming, which is the new version of Spark Streaming. Currently Cloudera doesn't support it, viewing it as an experimental API. I haven't looked myself, but if that is the case, then you run the risk of building apps on it that could break with each upgrade of Spark2. Just a word of caution. I am still in the testing phase, but so far no issues with running Spark1 and Spark2 on the same cluster. I have the Spark History Servers on different hosts, but that is more to spread the load. They run on different ports and the configuration works out of the box. As mentioned, they are separate services with separate configs. I currently have the gateways on the same host.
07-21-2017
11:11 PM
It is looking for a list of hostnames. Reading the CM API docs, you need to build it as a JSON array: https://cloudera.github.io/cm_api/apidocs/v17/ns0_apiHostNameList.html Try:
decommHosts = ["host.name"]
cm_handle.hosts_decommission(decommHosts)
07-21-2017
02:52 PM
That warning indicates that something is talking to CM without using SSL. Did you change all of the agent config files to use_tls=1?

As for the truststore questions: first, there is a keystore and a truststore. The keystore stores the key and certificate for a service. This is sensitive, as it is the source of how a service identifies itself to another. The truststore just holds the signing certificate and is used by clients to trust any certs signed by the certs in it.

The path /usr/lib/jvm/java-7-oracle-cloudera/jre/lib/security/cacerts looks similar to the location where you would store a system-wide truststore. I think that location is right, and the name would be jssecacerts or something similar. This means that all Java-based programs will use it by default without needing to tell the app or client of its location. Now, you don't have to use it; you can create and use your own, and you can have as many as you want, although each app, service, or client can usually only be configured to use one at a time. Plus, since it is only storing the CA cert, why not have them all in one store to cut down the work? Note: with self-signed certs, the cert itself becomes the signing (CA) cert and must be put in the truststore.
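For illustration, a hedged sketch of loading a signing cert into a truststore with keytool (the alias, cert file path, store path, and password are all placeholders):

```
# import a CA (or self-signed) cert into a truststore; the store is
# created if it does not already exist
keytool -importcert -alias my-ca \
  -file /path/to/ca-cert.pem \
  -keystore /usr/lib/jvm/java-7-oracle-cloudera/jre/lib/security/jssecacerts \
  -storepass changeit
```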
07-21-2017
02:43 PM
Umm, you went from hive:hadoop:drwxrwx--- to hive:hdfs:drwx------. That is 770 to 700, which is more restrictive. Please review my previous post.
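A hedged sketch of getting back to the earlier, less restrictive state (the directory path is hypothetical; use the one from your error):

```
# restore the group ownership and 770 permissions
hdfs dfs -chown -R hive:hadoop /user/hive/warehouse/mydb.db
hdfs dfs -chmod -R 770 /user/hive/warehouse/mydb.db
```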
07-21-2017
02:39 PM
@Fawze I don't collect specific metrics, yet. I make an API call to get all Hive jobs between this time and that time (same for Impala) from the endpoints below. This data is then crunched to provide usage analysis for these specific types of jobs.
/clusters/{clusterName}/services/{serviceName}/yarnApplications
/clusters/{clusterName}/services/{serviceName}/impalaQueries
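A hedged sketch of one of those calls (the host, credentials, cluster/service names, and time window are placeholders; check the CM API docs for the full parameter list):

```
# pull Impala queries for a time window from the CM API
curl -u admin:admin \
  'http://scmhost:7180/api/v17/clusters/Cluster1/services/impala/impalaQueries?from=2017-07-20T00:00:00&to=2017-07-21T00:00:00'
```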
07-21-2017
02:35 PM
I would say Cloudera Support, if you have that for your cluster. They can then vet it against existing bugs and patches backported to your version. They can also tell you if a bug exists, and when a fix will be available and in which version. Failing all of that, they can open a new JIRA. Alternatively, you can open a JIRA account and create a ticket yourself, providing the CDH version, and ask the community how to proceed. They should have some guidelines as well, although I do not know them or have them handy.
07-21-2017
02:32 PM
1 Kudo
Based on some SO posts, the exception is most likely related to invalid JSON somewhere. This is the Spark History Server though, and I cannot think of any JSON files it would be using on a regular basis. On mine I see a redaction-rules.json. Are you using redaction? Oh wow, I think it was staring us in the face. It is trying to read a specific application log which has invalid JSON characters. Read that file and put its output into a JSON validator to see what is invalid. I would save it somewhere so it can be reviewed again if needed. Then remove it and try to run the job again. If it fails again, then something is causing it to create the invalid JSON in the application log.
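A hedged sketch of validating that log, assuming the event log lives in the default CDH location (the application ID is a placeholder; take the exact path from the stack trace). Spark event logs are one JSON object per line, so check each line:

```
# print any line of the event log that fails to parse as JSON
hdfs dfs -cat /user/spark/applicationHistory/application_1500000000000_0001 |
  while IFS= read -r line; do
    echo "$line" | python -m json.tool > /dev/null 2>&1 || echo "BAD: $line"
  done
```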
07-21-2017
02:21 PM
The mappers are not even getting off the ground. Just a sanity check, but are you able to run normal MapReduce jobs on the cluster? Have you tried just using hdfs dfs -cp hdfs://... s3a://...? This will do the copy through the client instead of launching mappers to run the tasks. I don't have enough info to pin it down, but I would start by narrowing down where the problem is and checking basic network connectivity.
07-21-2017
02:17 PM
1 Kudo
@MilesYao That may be. On that topic, I don't think it will happen anytime soon, as Cloudera does not support many features in Spark2 that it does for Spark 1.6. I suspect that sometime after CDH 6 we will see Spark 2.x supplant Spark 1.x as the only version of Spark in CDH. Ah, I checked out HDP and see what you are getting at. The difference is really trivial: Cloudera asks you to put a file on the CM host and configure a separate parcel, while HDP includes both Spark1 and Spark2 packages in the same repo.
07-20-2017
11:40 PM
Open up the Hive warehouse directory. Run hdfs dfs -chmod -R 1776 /user/hive/warehouse to make it readable by all, or hdfs dfs -chmod -R 1777 to open it up for both read and write.
07-20-2017
09:41 AM
Can you post the info in /var/run/cloudera-scm-agent/process/4229-impala-IMPALAD/hs_err_pid16825.log?
07-20-2017
08:27 AM
I would say to add the internal IPs to the hosts file for the DataNodes, as it seems that they are communicating over them, and the external IP for the NameNode. You could possibly even try the internal IP for the NameNode if the internal IPs are reachable from the other cluster.
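A hedged sketch of what that hosts file might look like (all hostnames and addresses here are hypothetical):

```
# /etc/hosts on the hosts of the other cluster
203.0.113.10  namenode.example.com    # NameNode via its external IP
10.0.0.11     datanode1.example.com   # DataNodes via their internal IPs
10.0.0.12     datanode2.example.com
```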
07-20-2017
08:21 AM
https://community.cloudera.com/t5/Cloudera-Manager-Installation/how-to-rollback-cloudera-manager-tls-configuration-without-UI/m-p/46484/highlight/true#M8455
07-20-2017
08:18 AM
Aww, I can work with "password must not be null". I assume that the keytool command did not prompt you for a password. This means that the Java keystore, and possibly the private key, are not password protected. Most services require that a password be set. The question here is whether you specified a password in the Cloudera Manager configs. If yes, and you recall it, you can recreate the key and cert in the JKS with that password and bring CM up. Note: the key and JKS passwords must be the same; CM assumes they are. To revert, you will need to log into the CM database and manually modify it. Let me track down those instructions.
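A hedged sketch of recreating the key pair with matching store and key passwords (the alias, DN, paths, and password are placeholders):

```
# generate a new key pair; -storepass and -keypass must match for CM
keytool -genkeypair -alias cmhost \
  -keyalg RSA -keysize 2048 \
  -dname "CN=cmhost.example.com" \
  -keystore /opt/cloudera/security/jks/cmhost.jks \
  -storepass MyStorePass1 -keypass MyStorePass1
```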
07-19-2017
08:28 AM
Get the container logs for the failed mappers to see if they have more information. An EOFException can mean a few things.
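If log aggregation is enabled, a quick sketch of pulling them (the application ID is a placeholder; take yours from the RM UI or the job output):

```
# fetch the aggregated container logs for the failed job
yarn logs -applicationId application_1500000000000_0001
```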