About mbigelow

mbigelow · ‎07-25-2017

Have you looked at the idle query timeout setting in Impala itself? There is also a session-level equivalent, QUERY_TIMEOUT_S, that you can try from within your JDBC connection. https://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_timeouts.html
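A minimal sketch of setting the option for one session, assuming your driver passes SET statements through to Impala. The helper name `set_query_timeout` and the cursor object are hypothetical; any connection object with an `execute` method would do, and from Java the same statement text would go through a regular JDBC statement.

```python
def set_query_timeout(cursor, seconds):
    """Send SET QUERY_TIMEOUT_S=<seconds> on an already-open cursor.

    `cursor` is assumed to be a DB-API style object with an execute() method.
    """
    seconds = int(seconds)
    if seconds < 0:
        raise ValueError("timeout must be zero or a positive number of seconds")
    statement = f"SET QUERY_TIMEOUT_S={seconds}"
    cursor.execute(statement)
    # Return the statement text so callers can log what was sent.
    return statement
```

Run it once right after opening the connection, since the option only applies to the session it was set in.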

mbigelow · ‎07-25-2017

Do you have the HDFS Gateway installed on the same host that spark2-shell is running on?

mbigelow · ‎07-21-2017

Try curl -H "Content-Type: application/json" --upload-file deploymnet.json -u admin:admin 'http://scmhost:7180/api/v17/cm/deployment?deleteCurrentDeployment=true'
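If you would rather script it than use curl, here is a rough Python equivalent of the same call. It only builds the request; the host, credentials, and default API version are copied from the curl line above, and `build_deployment_request` is a hypothetical helper name. curl's --upload-file sends an HTTP PUT, so the sketch does the same.

```python
import base64
import urllib.request


def build_deployment_request(host, payload, user, password,
                             api_version="v17", port=7180):
    """Build (but do not send) the PUT request that uploads a deployment JSON.

    `payload` is the raw bytes of the deployment file.
    """
    url = (f"http://{host}:{port}/api/{api_version}"
           "/cm/deployment?deleteCurrentDeployment=true")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, data=payload, method="PUT")
    request.add_header("Content-Type", "application/json")
    request.add_header("Authorization", f"Basic {token}")
    return request
```

To actually send it, read the file in binary mode, pass the bytes as `payload`, and hand the result to `urllib.request.urlopen`. Be careful with deleteCurrentDeployment=true on a cluster you care about.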

mbigelow · ‎07-21-2017

@Fawze Your other questions have been answered, but I wanted to add this bit regarding "spark streaming." Spark2 comes with Structured Streaming, which is the new version of Spark Streaming. Currently Cloudera doesn't support it because they view it as an experimental API. I haven't looked myself, but if it is, then you run the risk of building apps on it that could break with each upgrade of Spark2. Just a word of caution. I am still in the testing phase, but so far I have had no issues running Spark1 and Spark2 on the same cluster. I have the Spark History servers on different hosts, but that is more to spread the load. They run on different ports and the configuration works out of the box. As mentioned, they are separate services with separate configs. I currently have the gateway on the same host.

mbigelow · ‎07-21-2017

That warning indicates that something is talking to CM without using SSL. Did you change all of the agent config files to use_tls=1? As for the truststore questions: first, there is a keystore and a truststore. The keystore stores the key and certificate for a service. This is sensitive, as it is how a service identifies itself to another. The truststore just holds the signing certificate and is used by clients to trust any certs signed by the certs in it. The path /usr/lib/jvm/java-7-oracle-cloudera/jre/lib/security/cacerts looks similar to the location where you would store a system-wide truststore. I think that location is right and the name would be jssecacerts or something similar. This means that all Java-based programs will use it by default without needing to tell the app or client its location. Now, you don't have to use it; you can create and use your own. And you can have as many as you want, although each app, service, or client can usually only be configured to use one at a time. Plus, since it is only storing the CA cert, why not just have them all in one store to cut down on the work? Note: with self-signed certs, the cert itself becomes the signing or CA cert and must be put in the truststore.
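To check the use_tls setting across many agents, a small sketch like the one below could be run against each agent's config file contents. It assumes the setting sits in a [Security] section of the agent config.ini (an assumption; check your own file), and `agent_uses_tls` is a hypothetical helper name.

```python
import configparser


def agent_uses_tls(config_text, section="Security"):
    """Return True if the given agent config text has use_tls=1.

    The section name is an assumption; pass a different one if your
    config file is laid out differently.
    """
    # Interpolation is disabled so stray '%' characters in values are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    value = parser.get(section, "use_tls", fallback="0")
    return value.strip() == "1"
```

Any host where this returns False is a candidate for the agent that is still talking to CM without TLS.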

mbigelow · ‎07-21-2017

@Fawze I don't collect specific metrics, yet. I make an API call to get all Hive jobs between this time and that time (same for Impala) from... This data is then crunched to provide usage analysis for these specific types of jobs.
/clusters/{clusterName}/services/{serviceName}/yarnApplications
/clusters/{clusterName}/services/{serviceName}/impalaQueries
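As a rough illustration of the "between this time and that time" call, the sketch below only builds the URL for either endpoint. The base URL is whatever your CM API root is, the `from`, `to`, and `limit` query parameter names are my assumption about the CM API and should be checked against the API docs for your version, and `usage_query_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlencode


def usage_query_url(base, cluster, service, resource, start, end, limit=1000):
    """Build the URL for yarnApplications or impalaQueries in a time window.

    `resource` is "yarnApplications" or "impalaQueries"; `start` and `end`
    are ISO 8601 timestamp strings.
    """
    path = (f"/clusters/{quote(cluster, safe='')}"
            f"/services/{quote(service, safe='')}/{resource}")
    # Parameter names are assumed, not taken from the original post.
    query = urlencode({"from": start, "to": end, "limit": limit})
    return f"{base.rstrip('/')}{path}?{query}"
```

The JSON that comes back can then be loaded and aggregated however you like for the usage analysis.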

mbigelow · ‎07-21-2017

I would say Cloudera support, if you have that for your cluster. They can then vet it against existing bugs and patches backported to your version. They can also tell you if a bug exists, when a fix will be available, and in which version. And failing all of that, they can open a new JIRA. You can also open a JIRA account and create a ticket yourself, providing the CDH version and asking the community how to proceed. They should have some guidelines as well, although I do not know them or have them handy.

mbigelow · ‎07-21-2017

Based on some SO posts, the exception is most likely related to some invalid JSON somewhere. This is the Spark History server, though, and I cannot think of any JSON files it would be using on a regular basis. On mine I see a redaction-rules.json. Are you using redaction? Oh wow, I think it was staring us in the face. It is trying to read a specific application log which has invalid JSON characters. Read that file and put its output into a JSON validator to see what is invalid. I would save it somewhere so it can be reviewed again if needed. Then remove it and try to run the job again. If it fails again, then something is causing it to create the invalid JSON in the application log.
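If you don't have a validator handy, a few lines of Python will point at the bad spot. This is a sketch that assumes the application log holds one JSON object per line, which is how Spark event logs are normally written; `find_invalid_json_lines` is a hypothetical helper name.

```python
import json


def find_invalid_json_lines(lines):
    """Return (line_number, error_message) for every line that is not valid JSON.

    Blank lines are skipped; line numbers start at 1.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue
        try:
            json.loads(text)
        except json.JSONDecodeError as err:
            bad.append((number, str(err)))
    return bad
```

Open your saved copy of the log and pass the file object straight in, since iterating over it yields one line at a time. A truncated last line usually means the application was killed while writing.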

mbigelow · ‎07-21-2017

@MilesYao That may be. On that topic, I don't think it will happen anytime soon, as Cloudera does not yet support many features for Spark 2 that it does for Spark 1.6. I suspect that sometime post CDH 6 we will see Spark 2.x supplant Spark 1.x as the only version of Spark in CDH. Ah, I checked out HDP and see what you are getting at. The difference is really trivial. Cloudera asks you to put a file on the CM host and configure a separate parcel, while HDP includes both Spark1 and Spark2 packages in the same repo.

mbigelow · ‎07-20-2017

I would say to add the internal IPs to the hosts file for the Datanodes, as it seems that they are communicating over them, and the external IP for the Namenode. You could possibly even try the internal IP for the Namenode if the internal IPs are reachable by the other cluster.

Member Since	‎08-16-2016 08:51 PM
Last Visited	‎03-25-2019 05:55 PM
Posts	642
Kudos received	129

Cloudera Community

Re: Configuring the HDFS superuser in Kerberos

Re: Hive process crash

Re: Upgrade from CDH 5.11 Express to Enterprise

Re: Adding user to Cloudera Manager using REST AP...

Re: Running in non-interactive mode, and data appe...

Re: JDBC driver socket timeout is not working

Re: Having Spark 1.6.0 and 2.1 in the same CDH

Re: CM Json deployment Import Error

Re: Having Spark 1.6.0 and 2.1 in the same CDH

Re: issue with cloudera management services after ...

Re: How can I get some monitoring data from cloude...

Re: Hive operation logs are not released by hive i...

Re: Spark on Yarn - Unexpected end-of-input: was e...

Re: spark 2.2 parcel availability in CDH

Re: UnresolvedAddressException trying to distcp be...