Member since: 08-16-2016
642 Posts
131 Kudos Received
68 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3976 | 10-13-2017 09:42 PM |
| | 7474 | 09-14-2017 11:15 AM |
| | 3798 | 09-13-2017 10:35 PM |
| | 6033 | 09-13-2017 10:25 PM |
| | 6601 | 09-13-2017 10:05 PM |
06-16-2017 11:09 AM
The semicolon is needed after each HQL statement, not at the end of the script name. Remove it and try again. Please share the contents of the script as well. If I recall correctly, if you don't use 'exit' in the HQL script it will dump you into the Hive shell. Without knowing the rest of the script, I'd say it is behaving as expected.
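As a hedged sketch (the script path and statements are made up), this is the shape that works for me: every statement inside the file ends with a semicolon, the command line carries none, and 'exit' closes the session at the end.

```python
# Hypothetical sketch: run an HQL script non-interactively with `hive -f`.
# Contents of /tmp/example.hql (every statement ends with a semicolon):
#   USE default;
#   SHOW TABLES;
#   exit;
# Note: no semicolon after the script name on the command line itself.
import subprocess

subprocess.run(["hive", "-f", "/tmp/example.hql"], check=True)
```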
06-16-2017 11:06 AM
Try taking off the './'. Distributing the file to the executors places it in the working directory of each one, so it can be referenced by its bare name. The './' notation means to run a file, not read a file.
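A hedged PySpark sketch of what I mean (the file and app names are hypothetical), assuming the file was shipped with spark-submit --files on YARN:

```python
# Hypothetical sketch: a file shipped via `spark-submit --files lookup.txt`
# is placed in each executor's working directory on YARN, so a task can open
# it by its bare name, without a './' prefix or an absolute path.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("files-demo").getOrCreate()

def read_first_line(_):
    # Runs on an executor; 'lookup.txt' resolves in its working directory.
    with open("lookup.txt") as f:
        return [f.readline().strip()]

print(spark.sparkContext.parallelize([0], 1).flatMap(read_first_line).collect())
```

Submitted as something like `spark-submit --master yarn --files lookup.txt files_demo.py`.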
06-16-2017 10:16 AM
A way to think about it is: "How will it know the location value?" It won't. You must tell it the value, per row, so it can partition dynamically as it loads the data. If the location value is the same for all of the data in the DF, then you may be better served by loading it statically: create the subfolder for the location value under the table's path and then write the DF out to that location.
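A rough PySpark sketch of the static route (the table path, partition column, and values here are assumptions based on this thread):

```python
# Hypothetical sketch: every row shares one 'location' value, so skip dynamic
# partitioning and write the DataFrame straight into that partition's
# subfolder under the table's path, then register the partition.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("static-partition")
         .enableHiveSupport()
         .getOrCreate())

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Write into the partition directory directly...
df.write.mode("append").parquet("/warehouse/mytable/location=us_east")
# ...and tell the metastore that the partition now exists.
spark.sql("ALTER TABLE mytable ADD IF NOT EXISTS PARTITION (location='us_east')")
```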
06-16-2017 10:14 AM
Your case class must include it so that it maps to the DF correctly. The error is because it is now looking for the location column in the DF and it doesn't exist. Make the change to your class and DF; then it should be good.
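The same idea as a PySpark sketch (the thread uses a Scala case class; the explicit schema below is its rough analog, and all the names are hypothetical):

```python
# Hypothetical sketch: the schema the rows are mapped onto (the analog of the
# Scala case class) must declare the partition column, or Spark will fail to
# find it in the DF at write time.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder.appName("schema-demo").getOrCreate()

schema = StructType([
    StructField("id", IntegerType()),
    StructField("val", StringType()),
    StructField("location", StringType()),  # the partition column must be here
])

df = spark.createDataFrame([(1, "a", "us_east"), (2, "b", "us_east")], schema)
df.printSchema()
```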
06-16-2017 10:12 AM
I don't know of anything that would allow it to read the configuration files from the other file. The difference between the mapred and mapreduce settings is the MR API. I think it is possible, then, that the MR app you have is using the older API, and maybe that is why the mapred settings were working before. You can check the configuration settings for each MR job through the RM UI; use that to verify the exact settings used in each run. On the resource usage, the number of maps is determined by the input. Are you positive that the same amount of data and blocks was used by the app in each run?
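For reference, a few of the old-API/new-API property pairs (a non-exhaustive sketch; the authoritative list is the deprecated-properties table for your Hadoop version):

```python
# A few old-API (mapred.*) property names and their new-API (mapreduce.*)
# equivalents. Jobs built on the old API honor the mapred.* keys, which would
# explain why those settings appeared to work before.
OLD_TO_NEW = {
    "mapred.map.tasks": "mapreduce.job.maps",
    "mapred.reduce.tasks": "mapreduce.job.reduces",
    "mapred.job.queue.name": "mapreduce.job.queuename",
}

for old, new in OLD_TO_NEW.items():
    print(f"{old:24} -> {new}")
```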
06-15-2017 02:16 PM
I would track down the logs for container container_e14_14XXXXXXXXXXX_XXXXX_01_000001. That should contain more details on the actual error.
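If the application has finished and log aggregation is enabled, the yarn logs CLI can pull them; a hedged sketch (the redacted ids from above are kept as placeholders):

```python
# Hypothetical sketch: fetch one container's logs with the `yarn logs` CLI.
# The application id is the container id minus the epoch ("e14") and the
# container suffix; the X placeholders mirror the redacted ids in this thread.
import subprocess

subprocess.run([
    "yarn", "logs",
    "-applicationId", "application_14XXXXXXXXXXX_XXXXX",
    "-containerId", "container_e14_14XXXXXXXXXXX_XXXXX_01_000001",
], check=True)
```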
06-15-2017 02:12 PM
Check the logs for the Jhist role. The stderr log should have the exception or error that caused it to fail to start. The issue with the jobs is that on the worker nodes the yarn.nodemanager.local-dirs directories (there can be more than one) do not have enough space. Check your config, and then check the space on those directories on the worker nodes.
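A quick sketch of the space check on one worker (the directory list is an assumption; take the real values from yarn.nodemanager.local-dirs in your config):

```python
# Hypothetical sketch: report free space for each yarn.nodemanager.local-dirs
# path on a worker node. The paths below are placeholders.
import shutil

local_dirs = ["/data/1/yarn/nm", "/data/2/yarn/nm"]

for d in local_dirs:
    usage = shutil.disk_usage(d)
    print(f"{d}: {usage.free / 1e9:.1f} GB free of {usage.total / 1e9:.1f} GB")
```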
06-15-2017 02:08 PM
My issue was that I didn't have that placement rule in place. I created the placement rule to allow it to be specified at runtime, and then my queries were assigned to the correct queue. Is that rule above (does it have a lower number than) the default pool rule? The other item to check is that the user you are running the queries as has access to submit to the pool.
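To illustrate why the order matters, a hedged sketch of the first-match evaluation (an illustration of the semantics only, not the scheduler's actual code):

```python
# Illustrative sketch: placement rules are evaluated top-down and the first
# match wins, so the "specified at runtime" rule must sit above the
# default-pool rule or requests never reach it.
rules = ["specified", "default"]  # rule 1, rule 2

def assign_pool(requested_pool):
    for rule in rules:
        if rule == "specified" and requested_pool:
            return requested_pool
        if rule == "default":
            return "root.default"

print(assign_pool("root.analysts"))  # -> root.analysts
print(assign_pool(None))             # -> root.default
```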
06-15-2017 01:32 PM
Make sure that you have the placement rule to allow pools to be specified at runtime. This got me one time: I kept getting the messages that it was set, but it would continue to run in the default queue. As for running it with -q, I haven't tried it, but I imagine it would be similar to Hive: impala-shell -i xxxx -q "set request_pool=new_pool; select ..."
06-15-2017 12:26 PM
1 Kudo
Try adding the saslQop config to the connection configuration. The actual value will need to match your cluster's Hive configuration: hive.connect('localhost', configuration={'hive.server2.thrift.sasl.qop': 'auth-conf'})
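Fleshed out as a hedged PyHive sketch (the host, port, and Kerberos details are assumptions for a cluster where the qop is auth-conf):

```python
# Hypothetical sketch using PyHive against a Kerberized HiveServer2 whose
# hive.server2.thrift.sasl.qop is set to auth-conf on the server side; the
# client-side value must match it.
from pyhive import hive

conn = hive.connect(
    host="localhost",
    port=10000,
    auth="KERBEROS",
    kerberos_service_name="hive",
    configuration={"hive.server2.thrift.sasl.qop": "auth-conf"},
)

cur = conn.cursor()
cur.execute("SELECT 1")
print(cur.fetchall())
```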