Member since: 08-16-2016
Posts: 642
Kudos Received: 130
Solutions: 68
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2736 | 10-13-2017 09:42 PM
 | 4421 | 09-14-2017 11:15 AM
 | 2424 | 09-13-2017 10:35 PM
 | 3741 | 09-13-2017 10:25 PM
 | 4108 | 09-13-2017 10:05 PM
07-27-2017
10:35 AM
Hmm, I don't have an answer, but I do have a comment. I am using CDH 5.8.2 and I see this same behavior from HUE: some connections are not closed when the tab is closed. I don't recall the exact state, but we have idle_query_timeout and idle_session_timeout set to 1 hour, and those connections are closed after that time. So if idle_query_timeout isn't working, try idle_session_timeout. If that doesn't work, there may be something going on with your specific setup.
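For reference, a minimal sketch of the daemon-level flags (the 3600-second values are just examples matching our 1-hour setting; set these via the Impala Daemon command-line argument safety valve in Cloudera Manager or directly on impalad):

```
# impalad startup flags; values are in seconds
--idle_query_timeout=3600    # cancel queries idle for an hour
--idle_session_timeout=3600  # close sessions idle for an hour
```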
07-27-2017
10:31 AM
1 Kudo
Impala and Hive have idle and session timeouts. These can be set globally at the service level or per client, so HUE can have its own. The Quickstart VM is not the place or the method to test or compare performance. With that said, the statement below is all that is needed: if this is the usage pattern, then you should not use Hive. Impala will always be better for single-record lookups or column aggregations. "I want to fetch one particular record based on unique Id amoung 110GB data."
07-27-2017
10:27 AM
Does the CM server fail to start? Is this from the CM server logs? Can you get to the UI? The error is complaining that it can't reach the Service Monitor, which is a secondary service of CM. The CM server should still start. If you can get to the UI, check the Service Monitor logs for more information on why it is failing.
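If the UI is down, a hedged sketch of where to look on disk (default package-install paths; yours may differ):

```
# CM server log
tail -n 200 /var/log/cloudera-scm-server/cloudera-scm-server.log
# Service Monitor log (part of the Cloudera Management Service)
tail -n 200 /var/log/cloudera-scm-firehose/mgmt-cmf-mgmt-SERVICEMONITOR-*.log.out
```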
07-27-2017
10:24 AM
What instance type are you using? Are they on-demand or reserved? What region and zone is this in? Did you use a placement group? It could be that you are requesting an instance size, like the d2.8xlarge, and there just isn't enough capacity in the region/zone to allocate them in time (20 minutes). It could also be that AWS black magic isn't fast enough in moving instances around if you used a placement group.
07-27-2017
09:28 AM
Let's take a step back. You mentioned that the cluster is Kerberized, and therefore Navigator needs some configs along those lines to interact with the cluster, but what is its auth mechanism? It can be different, and yes, it is. The external auth options for Navigator are AD (LDAP), OpenLDAP, and SAML. If you do not have one of these set up, then it will use the Cloudera Manager accounts (which could be internal or external). If it is internal, then you will need to use one of those accounts, with the proper Navigator role/group assigned, in the -u cmuser:cmpass switch. You won't need a Kerberos ticket in this case.
07-27-2017
09:22 AM
1 Kudo
On 2, I was talking more about Hadoop in general, without consideration that we are talking about the Quickstart VM. If I recall correctly, the VM itself already has Ubuntu installed and CM on top of that. The Quickstart itself comes in VirtualBox, VMware, KVM, and Docker flavors, so you will need to install one of those and get the image built for that specific one to run the VM. I don't recall all of the supported OSes, but I know VirtualBox and VMware work on Windows. As for memory, it has been some time, but I vaguely recall it not starting without at least 8 GB allocated to the VM. This could be reduced by eliminating services (or stopping them from starting on boot) and maybe some config changes. That, of course, would deviate from the base image, so you would need to manage or track the changes to make them repeatable in the event that you need to start over.
07-26-2017
05:54 PM
It should work with both --negotiate and -u:
curl -v --negotiate -u "username" http://www.blah.com
07-26-2017
01:58 PM
Did you restart CM and the CMS? If not, then it will not pick up the CSD file and it will not be available as a service to install. If you have, then for the cluster with the parcels distributed and activated, choose 'Add a Service' from the cluster action menu. Is it available in that list of services?
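A quick sketch of the server restart for a package-based install (use systemctl on newer OSes); the Cloudera Management Service is then restarted from the CM UI:

```
# restart the CM server so it picks up the new CSD
sudo service cloudera-scm-server restart
```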
07-26-2017
01:00 PM
1 Kudo
1. Yes. It is a quick and easy way to get it up and running so that you can start using and learning all of the services.
2. No. Some, I have been told, have gotten Hadoop to run on Windows (HDInsight may be the one I am thinking of), but it is not easy. Windows is not a supported OS for CDH.
3. There are a few differences, but it primarily centers on feature access and service access with support. The last two words are key, as you can run all of CDH for free and have access to all services without support. So if you will be running production workloads, you should be looking at Enterprise or above.
07-26-2017
12:51 PM
Hive has the limitation that it cannot set these values at runtime. They have to be either in core-site.xml or in the table definition (not positive on this one though). The former requires admin access to the CM UI. The latter can be done by the person creating the table: you. Set the location like so: s3a://ACCESS_KEY:SECRET_KEY@bucket/path/. This will not work if the secret key contains a '/' unless you have this patch: https://issues.apache.org/jira/browse/HADOOP-3733
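A hedged sketch of the table-definition route via beeline (table name, bucket, keys, and the JDBC URL are all placeholders for your own values):

```
# embed the S3 credentials in the table's LOCATION at create time
beeline -u jdbc:hive2://localhost:10000 -e "
CREATE EXTERNAL TABLE my_s3_table (id INT, name STRING)
LOCATION 's3a://ACCESS_KEY:SECRET_KEY@my-bucket/path/';
"
```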
07-25-2017
11:37 AM
The log entries have to do with reading from HDFS. Normally, metadata operations like the ones you mentioned go through the Statestore, CatalogD, and HMS. I would check out the threads on each, and on the ImpalaD you are running the commands from, to see what else is running. It is possible that one of these services is slow; it is also possible, based on the log entries, that reading from HDFS is slow and the other threads are waiting on the one reading from HDFS.
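One way to eyeball those threads, assuming the default debug web UI ports (25000/25010/25020) have not been changed on your cluster:

```
# each Impala daemon exposes a debug web UI with a /threadz page
curl -s http://impalad-host:25000/threadz | head      # impalad
curl -s http://statestore-host:25010/threadz | head   # statestored
curl -s http://catalog-host:25020/threadz | head      # catalogd
```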
07-25-2017
11:34 AM
Have you looked at the idle query timeout setting in Impala itself? There is a session-level equivalent, QUERY_TIMEOUT_S, that you can set from within your JDBC connection. https://www.cloudera.com/documentation/enterprise/5-7-x/topics/impala_timeouts.html
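A minimal sketch of the session-level option, shown here from impala-shell (the same SET statement can be issued over JDBC; the host and the timeout value are placeholders):

```
# cancel any query in this session that sits idle for 10 minutes
impala-shell -i impalad-host -q "SET QUERY_TIMEOUT_S=600; SELECT 1;"
```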
07-25-2017
09:57 AM
Do you have the HDFS Gateway installed on the same host that spark2-shell is running on?
07-21-2017
11:26 PM
Try:
curl -H "Content-Type: application/json" --upload-file deployment.json -u admin:admin 'http://scmhost:7180/api/v17/cm/deployment?deleteCurrentDeployment=true'
07-21-2017
11:17 PM
@Fawze Your other questions have been answered, but I wanted to add this bit regarding "spark streaming." Spark2 comes with Structured Streaming, which is the new version of Spark Streaming. Currently Cloudera doesn't support it, viewing it as an experimental API. I haven't looked myself, but if that is the case, then you run the risk of building apps on it that could break with each upgrade of Spark2. Just a word of caution. I am still in the testing phase, but so far no issues with running Spark1 and Spark2 on the same cluster. I have the Spark History Servers on different hosts, but that is more to spread the load. They run on different ports and the configuration works out of the box. As mentioned, they are separate services with separate configs. I currently have the gateways on the same host.
07-21-2017
11:11 PM
It is looking for a list of hostnames. Reading the CM API docs, you need to build it as a JSON array: https://cloudera.github.io/cm_api/apidocs/v17/ns0_apiHostNameList.html Try:
decommHosts = ["host.name"]
cm_handle.hosts_decommission(decommHosts)
07-21-2017
02:52 PM
That warning indicates that something is talking to CM without using SSL. Did you change all of the agent config files to use_tls=1?

As for the truststore questions: first, there is a keystore and a truststore. The keystore stores the key and certificate for a service. This is sensitive, as it is the source of how a service identifies itself to another. The truststore just holds the signing certificate and is used by clients to trust any certs signed by the certs in it.

The path /usr/lib/jvm/java-7-oracle-cloudera/jre/lib/security/cacerts looks similar to the location where you would store a system-wide truststore. I think that location is right, and the name would be jssecacerts or something similar. This means that all Java-based programs will use it by default without needing to tell the app or client of its location. Now, you don't have to use it; you can create and use your own, and you can have as many as you want, although each app, service, or client can usually only be configured to use one at a time. Plus, since it is only storing the CA cert, why not have them all in one store to cut down the work? Note: with self-signed certs, the cert itself becomes the signing (CA) cert and must be put in the truststore.
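For illustration, a hedged sketch of loading a signing cert into a truststore with keytool (the alias, cert file path, store path, and password are all placeholders):

```
# import a CA (or self-signed) cert into a truststore; the store is
# created if it does not already exist
keytool -importcert -alias my-ca \
  -file /path/to/ca-cert.pem \
  -keystore /usr/lib/jvm/java-7-oracle-cloudera/jre/lib/security/jssecacerts \
  -storepass changeit
```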
07-21-2017
02:43 PM
Umm, you went from hive:hadoop:drwxrwx--- to hive:hdfs:drwx------. That is 770 to 700, which is more restrictive. Please review my previous post.
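A hedged sketch of getting back to the earlier, less restrictive state (the directory path is hypothetical; use the one from your error):

```
# restore the group ownership and 770 permissions
hdfs dfs -chown -R hive:hadoop /user/hive/warehouse/mydb.db
hdfs dfs -chmod -R 770 /user/hive/warehouse/mydb.db
```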
07-21-2017
02:39 PM
@Fawze I don't collect specific metrics, yet. I make an API call to get all Hive jobs between this time and that time (same for Impala) from the endpoints below. This data is then crunched to provide usage analysis for these specific types of jobs.
/clusters/{clusterName}/services/{serviceName}/yarnApplications
/clusters/{clusterName}/services/{serviceName}/impalaQueries
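A hedged sketch of one of those calls (the host, credentials, cluster/service names, and time window are placeholders; check the CM API docs for the full parameter list):

```
# pull Impala queries for a time window from the CM API
curl -u admin:admin \
  'http://scmhost:7180/api/v17/clusters/Cluster1/services/impala/impalaQueries?from=2017-07-20T00:00:00&to=2017-07-21T00:00:00'
```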
07-21-2017
02:35 PM
I would say Cloudera Support, if you have that for your cluster. They can then vet it against existing bugs and patches backported to your version. They can also tell you if a bug exists, and when a fix will be available and in which version. Failing all of that, they can open a new JIRA. Alternatively, you can open a JIRA account and create a ticket yourself, providing the CDH version, and ask the community how to proceed. They should have some guidelines as well, although I do not know them or have them handy.
07-21-2017
02:32 PM
1 Kudo
Based on some SO posts, the exception is most likely related to invalid JSON somewhere. This is the Spark History Server though, and I cannot think of any JSON files it would be using on a regular basis. On mine I see a redaction-rules.json. Are you using redaction? Oh wow, I think it was staring us in the face. It is trying to read a specific application log which has invalid JSON characters. Read that file and put its output into a JSON validator to see what is invalid. I would save it somewhere so it can be reviewed again if needed. Then remove it and try to run the job again. If it fails again, then something is causing it to create the invalid JSON in the application log.
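A hedged sketch of validating that log, assuming the event log lives in the default CDH location (the application ID is a placeholder; take the exact path from the stack trace). Spark event logs are one JSON object per line, so check each line:

```
# print any line of the event log that fails to parse as JSON
hdfs dfs -cat /user/spark/applicationHistory/application_1500000000000_0001 |
  while IFS= read -r line; do
    echo "$line" | python -m json.tool > /dev/null 2>&1 || echo "BAD: $line"
  done
```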
07-21-2017
02:21 PM
The mappers are not even getting off the ground. Just a sanity check, but are you able to run normal MapReduce jobs on the cluster? Have you tried just using hdfs dfs -cp hdfs://... s3a://...? This will do the copy through the client instead of launching mappers to run the tasks. I don't have enough info to pin it down, but I would start by narrowing down where the problem is and checking basic network connectivity.
07-21-2017
02:17 PM
1 Kudo
@MilesYao That may be. On that topic, I don't think it will happen anytime soon, as Cloudera does not support many features in Spark2 that it does for Spark 1.6. I suspect that sometime after CDH 6 we will see Spark 2.x supplant Spark 1.x as the only version of Spark in CDH. Ah, I checked out HDP and see what you are getting at. The difference is really trivial: Cloudera asks you to put a file on the CM host and configure a separate parcel, while HDP includes both Spark1 and Spark2 packages in the same repo.
07-20-2017
11:40 PM
Open up the Hive warehouse directory. Run hdfs dfs -chmod -R 1776 /user/hive/warehouse to make it readable by all, or hdfs dfs -chmod -R 1777 to open it up for both read and write.
07-20-2017
09:41 AM
Can you post the info in /var/run/cloudera-scm-agent/process/4229-impala-IMPALAD/hs_err_pid16825.log?
07-20-2017
08:27 AM
I would say to add the internal IPs to the hosts file for the DataNodes, as it seems that they are communicating over them, and the external IP for the NameNode. You could possibly even try the internal IP for the NameNode if the internal IPs are reachable from the other cluster.
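A hedged sketch of what that hosts file might look like (all hostnames and addresses here are hypothetical):

```
# /etc/hosts on the hosts of the other cluster
203.0.113.10  namenode.example.com    # NameNode via its external IP
10.0.0.11     datanode1.example.com   # DataNodes via their internal IPs
10.0.0.12     datanode2.example.com
```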
07-20-2017
08:21 AM
https://community.cloudera.com/t5/Cloudera-Manager-Installation/how-to-rollback-cloudera-manager-tls-configuration-without-UI/m-p/46484/highlight/true#M8455
07-20-2017
08:18 AM
Aww, I can work with "password must not be null". I assume that the keytool command did not prompt you for a password. This means that the Java keystore, and possibly the private key, are not password protected. Most services require that a password be set. The question here is whether you specified a password in the Cloudera Manager configs. If yes, and you recall it, you can recreate the key and cert in the JKS with that password and bring CM up. Note: the key and JKS passwords must be the same; CM assumes they are. To revert, you will need to log into the CM database and manually modify it. Let me track down those instructions.
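A hedged sketch of recreating the key pair with matching store and key passwords (the alias, DN, paths, and password are placeholders):

```
# generate a new key pair; -storepass and -keypass must match for CM
keytool -genkeypair -alias cmhost \
  -keyalg RSA -keysize 2048 \
  -dname "CN=cmhost.example.com" \
  -keystore /opt/cloudera/security/jks/cmhost.jks \
  -storepass MyStorePass1 -keypass MyStorePass1
```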
07-19-2017
08:28 AM
Get the container logs for the failed mappers to see if they have more information. An EOFException can mean a few things.
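If log aggregation is enabled, a quick sketch of pulling them (the application ID is a placeholder; take yours from the RM UI or the job output):

```
# fetch the aggregated container logs for the failed job
yarn logs -applicationId application_1500000000000_0001
```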