Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4517 | 07-12-2018 01:58 PM |
| | 9651 | 03-08-2018 10:44 AM |
| | 4840 | 06-24-2017 11:18 AM |
| | 25501 | 02-10-2017 04:54 PM |
| | 2745 | 01-19-2017 01:41 PM |
11-18-2016
12:04 PM
Thanks, @bpreachuk. Interestingly, I think we found a major Hive bug. When I run a query like the one above, it appears to enter an infinite loop. I even pared my data set down to three columns, counting three tables, and it still appears to spawn MR jobs indefinitely. I worked around it, but someone should take a look at the UNION operation. Happy Friday!
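For context, here is roughly the shape of query that appeared to loop. This is an assumed reconstruction based on the UNION approach discussed in this thread, not the exact statement I ran:

```sql
-- Assumed reconstruction (table names as in my original question below).
-- Each branch counts one table; the UNION ALL of the three counts is what
-- appeared to spawn MapReduce jobs indefinitely on this Hive version.
SELECT 'table_a' AS src, count(*) AS cnt FROM table_a
UNION ALL
SELECT 'table_b', count(*) FROM table_b
UNION ALL
SELECT 'table_c', count(*) FROM table_c;
```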
11-16-2016
06:56 PM
I am trying to accomplish something like this:

insert into table daily_counts values ( select count(*) from table_a, select count(*) from table_b ) ...etc.

I know the syntax works for known values. How can I insert the result of a sub-query? I currently accomplish this in Java via JDBC, by running all the queries individually and then assembling the final insert statement. But it MUST be possible with HiveQL, right?
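For reference, a minimal sketch of the pattern that generally works here, assuming daily_counts has one column per count (table and column names are illustrative):

```sql
-- INSERT ... VALUES cannot take subqueries, but INSERT ... SELECT can:
-- wrap each count in a single-row subquery and join the rows into one.
INSERT INTO TABLE daily_counts
SELECT a.cnt_a, b.cnt_b
FROM (SELECT count(*) AS cnt_a FROM table_a) a
CROSS JOIN (SELECT count(*) AS cnt_b FROM table_b) b;
```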
Labels:
- Apache Hive
11-11-2016
01:13 PM
Oh. Also need this in HDFS configs: fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
11-11-2016
01:11 PM
I figured it out - I needed to add fs.s3a.access.key and fs.s3a.secret.key values to my HDFS config in Ambari. I already had fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey, but those are apparently only used for s3:// URLs. So I had to do the following to get distcp to work on HDP 2.4.2:
- Add aws-java-sdk-s3-1.10.62.jar to hadoop/lib on the node running the command
- Add hadoop/lib/* to the classpath for MapReduce and YARN
- Add fs.s3a.access.key and fs.s3a.secret.key properties to the HDFS config in Ambari
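For reference, the properties this amounts to look roughly like this (added through Ambari to the HDFS configuration; the key values are placeholders):

```
# Sketch of the s3a properties added via Ambari; replace the values with real keys.
fs.s3a.access.key=YOUR_ACCESS_KEY
fs.s3a.secret.key=YOUR_SECRET_KEY
fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
```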
11-11-2016
11:29 AM
Thanks @Rajesh Balamohan. I see that I only had aws-java-sdk-s3*.jar under /usr/hdp/current/zeppelin/lib/lib, so I copied it to /usr/hdp/current/hadoop/lib and /usr/hdp/current/hadoop-mapreduce/lib, but when I try to run with the -Dfs.s3a.impl argument, I get the error below. I have the proper AWS credentials in my config, and I don't have credential-related issues if I try an s3n: URL, so I think this is really an issue with finding the right jars. Do I need to add that jar to a path somewhere? Any ideas?

16/11/11 06:25:41 ERROR tools.DistCp: Invalid arguments:
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:228)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Invalid arguments: Unable to load AWS credentials from any provider in the chain
11-10-2016
09:07 PM
I'm trying to use distcp to copy data to an S3 bucket, and experiencing nothing but pain.
I've tried something like this:

sudo -u hdfs hadoop distcp -Dhadoop.root.logger="DEBUG,console" -Dmapreduce.job.maxtaskfailures.per.tracker=1 -bandwidth 10 -i -log /user/hdfs/s3_staging/logging/distcp.log hdfs:///apps/hive/warehouse/my_db/my_table s3n://my_bucket/my_path

But I encounter the error described here: http://stackoverflow.com/questions/37868404/distcp-from-hadoop-to-s3-fails-with-no-space-available-in-any-of-the-local-dire

From what I've read, I might have more luck trying s3a instead of s3n, but when I run the same command with "s3a" in the URL, I get this error: "No FileSystem for scheme: S3a". Can someone please give me some insight into getting this working with either file system?
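For anyone searching later: once the s3a jars and credentials are in place (see the 11-11-2016 posts above), the working command looks roughly like this; the key values are placeholders:

```bash
# Sketch of the same copy over the s3a connector instead of s3n
# (paths as in the post; substitute real AWS keys).
sudo -u hdfs hadoop distcp \
  -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  -Dfs.s3a.access.key=YOUR_ACCESS_KEY \
  -Dfs.s3a.secret.key=YOUR_SECRET_KEY \
  -bandwidth 10 -i \
  -log /user/hdfs/s3_staging/logging/distcp.log \
  hdfs:///apps/hive/warehouse/my_db/my_table \
  s3a://my_bucket/my_path
```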
Labels:
- Apache Hadoop
11-01-2016
02:36 PM
Perfect. Thanks!
11-01-2016
02:15 PM
Thanks @Sunile Manjee. I have actually re-written a lot of this with NiFi, and it certainly provides a lot of flexibility. But for now, I also have to maintain the older, script-based version of this process. Any ideas about how to accomplish this with scripts?
11-01-2016
01:38 PM
I have a shell script that calls a series of hive .sql scripts. If a hive operation fails, I want the shell script to stop and exit. How can I accomplish this? Does "hive -f" have return codes that I can check in bash?
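For reference, a minimal sketch of the pattern that answers this, assuming hive -f returns a non-zero exit status when a statement fails (script names are hypothetical):

```bash
#!/bin/bash
# Stop the whole script as soon as any command (including hive -f) fails.
set -e

hive -f load_staging.sql
hive -f build_daily_counts.sql

# Equivalent explicit check, if set -e is not wanted:
# hive -f load_staging.sql || { echo "load_staging.sql failed" >&2; exit 1; }
```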
Labels:
- Apache Hive
10-26-2016
03:42 PM
When I enter "0 0 18 * * * ?", I get this error message:

"Scheduling Period '0 0 18 * * * ?' is not a valid cron expression: '?' can only be specified for Day-of-Month or Day-of-Week"

However, "0 0 18 * * ? *" seems to work. What is the difference between the meaning of "*" and "?"?