Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4517 | 07-12-2018 01:58 PM |
| | 9651 | 03-08-2018 10:44 AM |
| | 4840 | 06-24-2017 11:18 AM |
| | 25501 | 02-10-2017 04:54 PM |
| | 2745 | 01-19-2017 01:41 PM |
11-18-2016
12:04 PM
Thanks, @bpreachuk. Interestingly, I think we found a major Hive bug. When I run a query like the one above, it appears to enter an infinite loop. I even pared my data set down to three columns, counting three tables, and it still appears to spawn MR jobs indefinitely. I worked around it, but someone should take a look at the UNION operation. Happy Friday!
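For context, here is roughly the shape of query that appeared to loop. This is an assumed reconstruction based on the UNION approach discussed in this thread, not the exact statement I ran:

```sql
-- Assumed reconstruction (table names as in my original question below).
-- Each branch counts one table; the UNION ALL of the three counts is what
-- appeared to spawn MapReduce jobs indefinitely on this Hive version.
SELECT 'table_a' AS src, count(*) AS cnt FROM table_a
UNION ALL
SELECT 'table_b', count(*) FROM table_b
UNION ALL
SELECT 'table_c', count(*) FROM table_c;
```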
11-16-2016
06:56 PM
I am trying to accomplish something like this:

insert into table daily_counts values ( select count(*) from table_a, select count(*) from table_b ) ...etc.

I know the syntax works for known values. How can I insert the result of a sub-query? I currently accomplish this in Java via JDBC, by running all the queries individually and then assembling the final insert statement. But it MUST be possible with HiveQL, right?
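For reference, a minimal sketch of the pattern that generally works here, assuming daily_counts has one column per count (table and column names are illustrative):

```sql
-- INSERT ... VALUES cannot take subqueries, but INSERT ... SELECT can:
-- wrap each count in a single-row subquery and join the rows into one.
INSERT INTO TABLE daily_counts
SELECT a.cnt_a, b.cnt_b
FROM (SELECT count(*) AS cnt_a FROM table_a) a
CROSS JOIN (SELECT count(*) AS cnt_b FROM table_b) b;
```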
Labels:
- Apache Hive
11-11-2016
01:13 PM
Oh. Also need this in HDFS configs: fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
11-11-2016
01:11 PM
I figured it out - I needed to add fs.s3a.access.key and fs.s3a.secret.key values to my HDFS config in Ambari. I already had fs.s3.awsAccessKeyId and fs.s3.awsSecretAccessKey, but those are apparently only used for s3:// URLs. So I had to do the following to get distcp to work on HDP 2.4.2:
- Add aws-java-sdk-s3-1.10.62.jar to hadoop/lib on the node running the command
- Add hadoop/lib/* to the classpath for MapReduce and YARN
- Add fs.s3a.access.key and fs.s3a.secret.key properties to the HDFS config in Ambari
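For reference, the properties this amounts to look roughly like this (added through Ambari to the HDFS configuration; the key values are placeholders):

```
# Sketch of the s3a properties added via Ambari; replace the values with real keys.
fs.s3a.access.key=YOUR_ACCESS_KEY
fs.s3a.secret.key=YOUR_SECRET_KEY
fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem
```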
11-11-2016
11:29 AM
Thanks @Rajesh Balamohan. I see that I only had aws-java-sdk-s3*.jar under /usr/hdp/current/zeppelin/lib/lib, so I copied it to /usr/hdp/current/hadoop/lib and /usr/hdp/current/hadoop-mapreduce/lib, but when I try to run with the -Dfs.s3a.impl argument, I get the error below. I have the proper AWS credentials in my config, and I don't have credential-related issues if I try an s3n: URL, so I think this is really an issue with finding the right jars. Do I need to add that jar to a path somewhere? Any ideas?

16/11/11 06:25:41 ERROR tools.DistCp: Invalid arguments:
com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:228)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.tools.DistCp.setTargetPathExists(DistCp.java:216)
at org.apache.hadoop.tools.DistCp.run(DistCp.java:116)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.tools.DistCp.main(DistCp.java:430)
Invalid arguments: Unable to load AWS credentials from any provider in the chain
11-10-2016
09:07 PM
I'm trying to use distcp to copy data to an S3 bucket, and experiencing nothing but pain.
I've tried something like this:

sudo -u hdfs hadoop distcp -Dhadoop.root.logger="DEBUG,console" -Dmapreduce.job.maxtaskfailures.per.tracker=1 -bandwidth 10 -i -log /user/hdfs/s3_staging/logging/distcp.log hdfs:///apps/hive/warehouse/my_db/my_table s3n://my_bucket/my_path

But I encounter the error described here: http://stackoverflow.com/questions/37868404/distcp-from-hadoop-to-s3-fails-with-no-space-available-in-any-of-the-local-dire

From what I've read, I might have more luck trying s3a instead of s3n, but when I run the same command with "s3a" in the URL, I get this error: "No FileSystem for scheme: S3a". Can someone please give me some insight into getting this working with either file system?
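For anyone searching later: once the s3a jars and credentials are in place (see the 11-11-2016 posts above), the working command looks roughly like this; the key values are placeholders:

```bash
# Sketch of the same copy over the s3a connector instead of s3n
# (paths as in the post; substitute real AWS keys).
sudo -u hdfs hadoop distcp \
  -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  -Dfs.s3a.access.key=YOUR_ACCESS_KEY \
  -Dfs.s3a.secret.key=YOUR_SECRET_KEY \
  -bandwidth 10 -i \
  -log /user/hdfs/s3_staging/logging/distcp.log \
  hdfs:///apps/hive/warehouse/my_db/my_table \
  s3a://my_bucket/my_path
```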
Labels:
- Apache Hadoop
11-01-2016
02:36 PM
Perfect. Thanks!
11-01-2016
02:15 PM
Thanks @Sunile Manjee. I have actually re-written a lot of this with NiFi, and it certainly provides a lot of flexibility. But for now, I also have to maintain the older, script-based version of this process. Any ideas about how to accomplish this with scripts?
11-01-2016
01:38 PM
I have a shell script that calls a series of hive .sql scripts. If a hive operation fails, I want the shell script to stop and exit. How can I accomplish this? Does "hive -f" have return codes that I can check in bash?
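For reference, a minimal sketch of the pattern that answers this, assuming hive -f returns a non-zero exit status when a statement fails (script names are hypothetical):

```bash
#!/bin/bash
# Stop the whole script as soon as any command (including hive -f) fails.
set -e

hive -f load_staging.sql
hive -f build_daily_counts.sql

# Equivalent explicit check, if set -e is not wanted:
# hive -f load_staging.sql || { echo "load_staging.sql failed" >&2; exit 1; }
```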
Labels:
- Apache Hive
10-26-2016
03:42 PM
When I enter "0 0 18 * * * ?", I get this error message:

"Scheduling Period '0 0 18 * * * ?' is not a valid cron expression: '?' can only be specified for Day-of-Month or Day-of-Week"

However, "0 0 18 * * ? *" seems to work. What is the difference between the meaning of "*" and "?"?