Member since: 04-07-2016
Posts: 10
Kudos Received: 4
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
| 1968 | 05-04-2016 10:01 AM |
06-02-2016
10:14 AM
It was as simple as removing the /.snapshot from the path?! Thanks @Jitendra Yadav, clearly time for me to take a break...!
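For the record, the shape of the fix (snapshot name as given in the question below; a sketch, not re-run):

# fails - the path points inside the read-only .snapshot directory:
hdfs dfs -deleteSnapshot /path/.snapshot s201605-17-115857.294
# works - -deleteSnapshot takes the snapshottable directory plus the snapshot name:
hdfs dfs -deleteSnapshot /path s201605-17-115857.294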
06-02-2016
09:57 AM
Hello All! I'm working as the snapshot directory owner, but

hdfs dfs -deleteSnapshot /path/.snapshot s201605-17-115857.294

returns:

deleteSnapshot: Modification on a read-only snapshot is disallowed

Trying to change the permissions on the .snapshot directory also returns Modification...disallowed. What can I do?
Labels:
- Apache Hadoop
05-19-2016
11:17 AM
Hi all, odd question: I'm just starting out in Hadoop and am in the process of moving all my test work into production. However, I get a strange message on the prod system when working in Hive: "number of reduce tasks is set to 0 since there's no reduce operator". The queries are not failing (yet...?), and there are no strange records in any logs I have looked at. I don't know how to troubleshoot this, if indeed it is a problem at all. Any advice?

Example:

hive> select * from myTable where daily_date='2015-12-29' limit 10;
Query ID = root_20160519113838_73d2b4dc-efb8-4ea6-b0a4-cdc4dc64c33a
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1418226366907_2316, Tracking URL = http://hadoop-head01:8088/proxy/application_1418226366907_2316/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1418226366907_2316
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-05-19 11:39:07,038 Stage-1 map = 0%, reduce = 0%
2016-05-19 11:39:12,653 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.65 sec
MapReduce Total cumulative CPU time: 2 seconds 650 msec
Ended Job = job_1418226366907_2316
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.65 sec HDFS Read: 64722 HDFS Write: 831 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 650 msec
OK
[..... records ....]
Time taken: 15.876 seconds, Fetched: 10 row(s)
Hadoop version: 2.4.0.2.1.3.0-563; Hive version (I think): 0.13.0.2.1.3.0-563
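For comparison (my assumption, not verified on this cluster): the message just means the job is map-only, and a query that actually needs a reduce phase, e.g. an aggregation over the same table, should report a non-zero reducer count:

hive> select daily_date, count(*) from myTable group by daily_date;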
Labels:
- Apache Hive
05-04-2016
10:01 AM
Fixed!! The problem was mixing single quotes (around the date format) with double quotes (around the regexp_replace parameters). Although the two sets of quote marks are in different places, it seems the statement as a whole needs to be consistent. This works:

regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),0,7),'-','') as month
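For reference, the full call with consistent quoting would look something like this (database and table names as in the thread; a sketch, with only the regexp line itself re-tested):

#!/bin/bash
. /root/.bash_profile
hive -e "
use my_db;
SELECT daily_date,
       regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),0,7),'-','') as month
FROM staging_table;
"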
Lesson: when embedding Hive statements in bash scripts, be extra careful about quote consistency!
I'm sorry <simple question shame>!
05-04-2016
09:43 AM
Hello @Predrag Minovic, and thank you for your reply! Apologies, I should have been clearer. This expression does indeed work when called directly in Hive from the command line. Because this will be an automated process, what I am doing is running a bash shell script, which calls Hive like this:

#!/bin/bash
. /root/.bash_profile
# local file naming, cleaning, and other admin stuff
hive -e "
use my_db;
SELECT daily_date,
regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd\/MM\/yyyy')),1,7),"-","") as month
FROM staging_table;
quit;
" The error is: FAILED: ParseException line [regexp:line] cannot recognize input near ',' ')' 'as' in expression specification This error is also generated when the / symbols in the date format are escaped (i.e. dd\/MM\/yyyy). Unfortunately I can't do anything about that nasty date format 01/01/2011 😞
05-03-2016
04:03 PM
Hello Hadoopers! I'm testing some stuff in the Sandbox, and I've come up against this problem. The regexp below works fine when run in the Hive view GUI in Ambari, but when run from a Hive shell on the command line, it fails with the error "....cannot recognize input near ',' ')' 'as' in expression specification" on the regexp line.

INSERT OVERWRITE TABLE my_table
SELECT trim(daily_date),
trim(name),
cast(trim(count) AS INT),
regexp_replace(substr(from_unixtime(unix_timestamp(trim(daily_date), 'dd/MM/yyyy')),0,7),"-","") as month
FROM staging_table
;

I have tried combinations of backticks and single quote marks within the expression - nothing works. But the puzzle is why the Ambari Hive view processes it without error (and with correct output!) while command-line Hive throws this error. I suspect it's to do with the odd characters in the expression and how the shell treats them in combination with the Hive environment... But I thought that all commands run within the Hive shell would be treated as HiveQL and therefore wouldn't have the escaping issues that the command line does... Puzzled, and completely stuck. Any help troubleshooting this would be very gratefully received!
Labels:
- Apache Hive
04-22-2016
04:24 PM
Thanks Benjamin, great information. About option a): in this case a new (and queryable) meta-column called daily_date in the nice format would be created in the final table, wouldn't it? [Edit: just done it - yes, it is.] To make this work as an automated process where hive -e is called in a shell script, I would just need to set the new daily_date as a variable somewhere before the hive call (I think). Then:

INSERT OVERWRITE TABLE final
PARTITION (daily_date=${nice_date})
SELECT facts, otherFact<exclude daily_date>
FROM staging
;
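Wrapped in the shell script, something like this, I think (a sketch; the variable value and database name are made up for illustration):

#!/bin/bash
# nice_date derived earlier in the script from the incoming file's date (format yyyyMM)
nice_date=201512
hive -e "
use my_db;
INSERT OVERWRITE TABLE final
PARTITION (daily_date='${nice_date}')
SELECT facts, otherFact  -- excluding the original daily_date column
FROM staging;
"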
Should work! Thanks again.
04-21-2016
02:35 PM
Hadoop experts! Say I have a table with:

daily_date STRING
fact1 STRING
fact2 STRING
fact_n STRING

where daily_date is of format dd/MM/yyyy (oh horror!). I want to partition the data by date (either yyyyMMdd or just yyyyMM), but the current date format is no good. How do I create a table (or multiple staging tables) to deal with this? At present my ingest script will load local data into an HDFS location, and into an un-partitioned Hive staging table containing the daily_date column and the fact columns. From there I do:

INSERT OVERWRITE TABLE {final table}
PARTITION (daily_date=${data_date})
SELECT fact1, fact_n
FROM {staging table}
WHERE daily_date=${data_date}
;

where ${data_date} is a variable defined as the daily_date in the incoming data (which only comes in one day at a time). But I can't get the partitioning bit to work because of the dodgy daily_date format. I wrote this to convert dd/MM/yyyy to yyyyMM, but I don't know where to use it:

regexp_replace(substring(from_unixtime(unix_timestamp(daily_date, 'dd/MM/yyyy')),0,7),"-","")

I'm a bit stuck with this; any help would be very gratefully received. (The next thing I'll try: load the data into a raw un-partitioned table, select into a new table with a new column nice_date, then select into a third table which is partitioned on nice_date.)
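The conversion itself seems right; a quick sanity check in the Hive shell (sample date made up to match the format):

hive> SELECT regexp_replace(substring(from_unixtime(unix_timestamp('29/12/2015', 'dd/MM/yyyy')),0,7),"-","");

which should return 201512 (from_unixtime gives '2015-12-29 00:00:00', the substring keeps '2015-12', and the replace strips the hyphen).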
Labels:
- Apache Hadoop
- Apache Hive
- HDFS
04-07-2016
11:38 AM
4 Kudos
My Ambari version: 2.2.1.0; the problem is not fixed in version 2.2. Not Kerberised.
A page refresh doesn't solve it. This works:

ambari-server restart