Member since: 04-07-2016
Posts: 10
Kudos Received: 4
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 2023 | 05-04-2016 10:01 AM |
06-02-2016 10:14 AM
It was as simple as removing the /.snapshot from the path?! Thanks @Jitendra Yadav, clearly time for me to take a break...!
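For the record, the working form (same path and snapshot name as in my question below): deleteSnapshot takes the snapshottable directory itself plus the snapshot name, with no .snapshot component in the path:

hdfs dfs -deleteSnapshot /path s201605-17-115857.294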
06-02-2016 09:57 AM
Hello All! I'm working as the snapshot directory owner, but

hdfs dfs -deleteSnapshot /path/.snapshot s201605-17-115857.294

returns:

deleteSnapshot: Modification on a read-only snapshot is disallowed

Trying to change the permissions on the .snapshot directory also returns Modification...disallowed. What can I do?
Labels:
- Apache Hadoop
05-19-2016 11:17 AM
Hi all, odd question - I'm just starting out in Hadoop and am in the process of moving all my test work into production. However, I get a strange message on the prod system when working in Hive: "number of reduce tasks is set to 0 since there's no reduce operator". The queries are not failing (yet...?), and there are no strange records in any logs I have looked at. I don't know how to troubleshoot this, if indeed it is a problem at all. Any advice? Example:

hive> select * from myTable where daily_date='2015-12-29' limit 10;
Query ID = root_20160519113838_73d2b4dc-efb8-4ea6-b0a4-cdc4dc64c33a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1418226366907_2316, Tracking URL = http://hadoop-head01:8088/proxy/application_1418226366907_2316/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1418226366907_2316
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-19 11:39:07,038 Stage-1 map = 0%, reduce = 0%
2016-05-19 11:39:12,653 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.65 sec
MapReduce Total cumulative CPU time: 2 seconds 650 msec
Ended Job = job_1418226366907_2316
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.65 sec HDFS Read: 64722 HDFS Write: 831 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 650 msec
OK
[..... records ....]
Time taken: 15.876 seconds, Fetched: 10 row(s)
Hadoop version: 2.4.0.2.1.3.0-563. Hive version (I think): 0.13.0.2.1.3.0-563.
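For comparison, a sketch of the two query shapes (the second is hypothetical, not from my logs): the filter-plus-limit query has no reduce operator, while I'd expect the GROUP BY version to bring a reduce stage back:

hive -e "
-- map-only: filter + limit, no aggregation, hence 0 reducers
select * from myTable where daily_date='2015-12-29' limit 10;
-- hypothetical aggregation: the GROUP BY adds a reduce operator
select daily_date, count(*) from myTable group by daily_date;
"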
Labels:
- Apache Hive
05-04-2016 10:01 AM
Fixed!! The problem was mixing single quotes (around the date format) with double quotes (around the regexp_replace parameters). Although the two sets of quote marks are in different parts of the statement, it seems the quoting needs to be consistent across the whole statement. This works:

regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),0,7),'-','') as month

Lesson: when embedding Hive statements in bash scripts, be extra careful about quote consistency!
I'm sorry <simple question shame>!
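For reference, the full call in the wrapper script now looks like this (a minimal sketch, with the unrelated admin steps dropped):

#!/bin/bash
. /root/.bash_profile
# Single quotes inside the double-quoted hive -e string pass through to
# Hive untouched, so bash no longer strips them and Hive sees consistent
# string literals.
hive -e "
use my_db;
SELECT daily_date,
regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),0,7),'-','') as month
FROM staging_table;
"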
05-04-2016 09:43 AM
Hello @Predrag Minovic and thank you for your reply! Apologies, I should have been clearer. This expression does indeed work when called directly in Hive from the command line. Because this will be an automated process, what I am doing is running a bash shell script on the cmd, which calls Hive like this:

#!/bin/bash
. /root/.bash_profile
# local file naming, cleaning, and other admin stuff
hive -e "
use my_db;
SELECT daily_date,
regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),1,7),"-","") as month
FROM staging_table;
quit;
" The error is: FAILED: ParseException line [regexp:line] cannot recognize input near ',' ')' 'as' in expression specification This error is also generated when the / symbols in the date format are escaped (i.e. dd\/MM\/yyyy). Unfortunately I can't do anything about that nasty date format 01/01/2011 😞
05-03-2016 04:03 PM
Hello Hadoopers! I'm testing some stuff in the Sandbox, and I've come up against this problem. The regexp below works fine when run in the Hive view GUI in Ambari, but when run from a Hive shell on the command line it fails on the regexp line with the error "...cannot recognize input near ',' ')' 'as' in expression specification".

INSERT OVERWRITE TABLE my_table
SELECT trim(daily_date),
trim(name),
cast(trim(count) AS INT),
regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd/MM/yyyy')),0,7),"-","") as month
FROM staging_table
;

I have tried combinations of backticks and single quote marks within the expression - nothing works. But the puzzle is why Ambari/Hive view processes it without error (and with correct output!) while command-line Hive throws this error. I suspect it's to do with the odd characters in the expression and how the shell treats them in combination with the Hive environment... but I thought that everything run within the Hive shell would be treated as HiveQL, and therefore wouldn't have the escaping issues that the command line does. Puzzled, and completely stuck. Any help troubleshooting this would be very gratefully received!
Labels:
- Apache Hive
04-22-2016 04:24 PM
Thanks Benjamin, great information. About option a) - in this case a new (and query-able) meta-column called daily_date in the nice format would be created in the final table, wouldn't it? [Edit: just done it - yes it is.] To make this work as an automated process where hive -e is called from a shell script, I would just need to set the new daily_date as a variable somewhere before the hive call (I think) - see the sketch after this post. Then:

INSERT OVERWRITE TABLE final
PARTITION (daily_date=${nice_date})
SELECT facts, otherFact -- i.e. every column except daily_date
FROM staging
;
Should work! Thanks again.
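In case it helps a future reader, a sketch of that wrapper (the incoming date and table names are hypothetical; the yyyyMM value is cut out of the dd/MM/yyyy string in bash before Hive is called):

#!/bin/bash
raw_date="29/12/2015"                       # hypothetical incoming dd/MM/yyyy
nice_date="${raw_date:6:4}${raw_date:3:2}"  # -> 201512 (yyyyMM)
# bash expands ${nice_date} inside the double-quoted string, so Hive
# receives the literal partition value (quoted, since daily_date is a STRING).
hive -e "
use my_db;
INSERT OVERWRITE TABLE final PARTITION (daily_date='${nice_date}')
SELECT facts, otherFact
FROM staging;
"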
04-21-2016 02:35 PM
Hadoop experts! Say I have a table with:

daily_date STRING
fact1 STRING
fact2 STRING
fact_n STRING

where daily_date is of format dd/MM/yyyy (oh horror!). I want to partition the data by date (either yyyyMMdd or just yyyyMM), but the current date format is no good. How do I create a table (or multiple staging tables) to deal with this? At present my ingest script loads local data into an HDFS location, and into an un-partitioned Hive staging table containing the daily_date column and the fact columns. From there I do:

INSERT OVERWRITE TABLE {final table}
PARTITION (daily_date=${data_date})
SELECT fact1, fact_n
FROM {staging table}
WHERE daily_date={$data_date}
;

where ${data_date} is a variable defined as the daily_date in the incoming data (which only comes in one day at a time). But I can't get the partitioning bit to work because of the dodgy daily_date format. I wrote this to convert dd/MM/yyyy to yyyyMM, but I don't know where to use it:

regexp_replace(substring(from_unixtime(unix_timestamp(daily_date, 'dd/MM/yyyy')),0,7),"-","")

I'm a bit stuck with this; any help would be very gratefully received. (The next thing I'll try: load the data into a raw un-partitioned table, select into a new table with a new column nice_date, then select into a third table which is partitioned on nice_date.)
Labels:
- Apache Hadoop
- Apache Hive
- HDFS
04-07-2016 11:38 AM
4 Kudos
My Ambari version: 2.2.1.0; the problem is not fixed in version 2.2. Not Kerberised. A page refresh doesn't solve it. This works:

ambari-server restart