Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1989 | 07-09-2019 12:53 AM |
|  | 11943 | 06-23-2019 08:37 PM |
|  | 9197 | 06-18-2019 11:28 PM |
|  | 10190 | 05-23-2019 08:46 PM |
|  | 4611 | 05-20-2019 01:14 AM |
01-28-2016
12:38 AM
Assuming you are running CDH via Cloudera Manager (given you mention Gateways), this ideally shouldn't happen on a new setup. I can think of a couple of reasons, but it depends on the mode of installation you are using. If you are using parcels, ensure that no /usr/lib/hadoop* directories exist anymore on the machine. If any remain, they may confuse the classpath-automating scripts into not finding all the relevant jars required for the "hdfs://" scheme service discovery. What are your outputs for the commands "hadoop classpath" and "ls -ld /opt/cloudera/parcels/CDH"?
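For reference, the checks amount to something like the below (a sketch; the parcel path shown is the CDH default):

    # Stale packaged-install directories; on a parcel-only setup this should list nothing
    ls -d /usr/lib/hadoop* 2>/dev/null

    # The active parcel symlink; it should point at the installed CDH parcel version
    ls -ld /opt/cloudera/parcels/CDH

    # The resolved classpath; every entry should live under the parcel directory
    hadoop classpath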
01-28-2016
12:34 AM
3 Kudos
Consult has already posted some alternative solutions, but it's worth understanding why the operation fails. The effective issue is this: when you use 'fs -rm' with trash enabled, we move the file to the authenticated user's /user/{user.name}/.Trash sub-directory. For example, if the path being deleted is '/data/myapp/part-00000.gz', and the user you delete it as is 'hive', then the trash feature moves it to the directory '/user/hive/.Trash/Current/'.

When encryption zones come into play, HDFS disallows moving a file from one Encryption Zone to another, as well as from within an Encryption Zone to a non-encrypted area. This is for security reasons, and ties into how the encryption zone features of HDFS are managed globally within a directory (zone), vs. arbitrary files holding all of the necessary info independently. So if /data/ is an EZ, but /user/hive is not, or is a separate EZ, then the trash move will fail, as expected. But if / is the EZ, then the moves may work, since both paths fall under it.

What Consult proposes is a manual step (i.e. use 'hadoop fs -mv' instead of 'hadoop fs -rm'), keeping a manually created /data/.Trash directory to move the files into, followed by scripts to periodically clean it (i.e. Bring-Your-Own-Trash). It's not a great solution, but it's what may work if you need some data retention. Another option is to consider using limited and periodic snapshots (via BDR, etc.), which give you similar (but not exactly the same) data retention capabilities.
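A rough sketch of the Bring-Your-Own-Trash approach, assuming /data is the encryption zone (the paths and the cleanup schedule are placeholders):

    # One-time: create a trash directory inside the same encryption zone
    hadoop fs -mkdir -p /data/.Trash

    # Instead of 'fs -rm', move deletions into the in-zone trash
    hadoop fs -mv /data/myapp/part-00000.gz /data/.Trash/

    # Periodic cleanup (e.g. from cron): permanently remove everything trashed;
    # -skipTrash avoids attempting a second move into the user's own trash
    hadoop fs -rm -r -skipTrash '/data/.Trash/*'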
01-25-2016
01:43 AM
Regardless of whether an encryption zone is in use or not, a 'hadoop fs -rm' with -skipTrash will permanently remove a file unless you have a snapshot referencing it. If you want to use the trash facility, you need to use 'hadoop fs -rm' without -skipTrash. Encryption zones merely create the blocks with encrypted data and associate keys with them. Other HDFS behaviour remains the same, the exception being that you cannot move a file from one EZ to another, or move it outside of the EZ.
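For illustration (the path is a placeholder):

    # Moves the file into /user/<user>/.Trash; recoverable until the trash is emptied
    hadoop fs -rm /data/myapp/part-00000.gz

    # Deletes immediately; recoverable only if a snapshot still references the file
    hadoop fs -rm -skipTrash /data/myapp/part-00000.gz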
01-24-2016
06:08 PM
Could you post the entire exception and the stack trace, preferably in another community thread? This would be unrelated to the original issue, as email actions are executed directly on the server vs. via launcher tasks.
01-24-2016
02:59 PM
You may be running into OOZIE-2380, which will be fixed in CDH 5.5.2 onwards. You can apply the below change in your oozie-site.xml to work around this until the bug-fix update arrives:

    <property>
      <name>oozie.action.launcher.mapreduce.job.ubertask.enable</name>
      <value>false</value>
    </property>
01-23-2016
06:07 PM
1 Kudo
Any -D parameters for hadoop or hadoop-related CLI tools must be prefixed to the command, i.e. placed immediately after the tool and sub-command name, rather than anywhere else. That is, use the form below instead of the one you have (-D options up front, vs. suffixed or placed in the middle):

    sqoop export \
      -Dsqoop.export.records.per.statement=10000 \
      -Dsqoop.export.statements.per.transaction=100 \
      --direct \
      --connect jdbc:oracle:thin:@scaj43bda01:1521:orcl \
      --username bds --password bds \
      --table orcl_dpi \
      --export-dir /tmp/dpi \
      --input-fields-terminated-by ',' \
      --lines-terminated-by '\n' \
      -m 70 --batch
01-21-2016
02:25 AM
1 Kudo
Use the second one. See http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/ which explains the newer timestamped paths.
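To see the timestamped layout on your own cluster (the path below assumes the default Oozie sharelib location):

    # Each sharelib version lives under its own lib_<timestamp> directory
    hadoop fs -ls /user/oozie/share/lib/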
01-15-2016
06:06 PM
Could you also post the output of the below?

    ls -l /tmp/training/

Also try the below before you run your query:

    set hadoop.tmp.dir=.;

And the below before you start the Hive CLI:

    export HADOOP_CLIENT_OPTS="-Djava.io.tmpdir=."
01-15-2016
10:51 AM
Yes, blocks are not pre-allocated. They are a logical unit of division, so a file smaller than its block size consumes only the file's actual size on disk (times replication). Read https://wiki.apache.org/hadoop/FAQ#If_a_block_size_of_64MB_is_used_and_a_file_is_written_that_uses_less_than_64MB.2C_will_64MB_of_disk_space_be_consumed.3F
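A quick way to see this for yourself (the paths and the 128 MB block size are only for illustration):

    # Create a 1 MB local file and upload it with a 128 MB block size
    dd if=/dev/zero of=/tmp/small.bin bs=1M count=1
    hadoop fs -D dfs.blocksize=134217728 -put /tmp/small.bin /tmp/small.bin

    # Reports the actual ~1 MB file length, not a full 128 MB block
    hadoop fs -du /tmp/small.bin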
01-11-2016
09:32 PM
1 Kudo
The hadoop-xz project (on GitHub) does not require you to rebuild your CDH. Just build that project and use the produced jar with the suggested config change (add "io.sensesecure.hadoop.xz.XZCodec" to io.compression.codecs).
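As a sketch, the core-site.xml change could look like the below; the other codecs listed here are the common defaults and are an assumption on my part, so append the XZCodec entry to whatever list your cluster already carries:

    <property>
      <name>io.compression.codecs</name>
      <!-- Keep your existing codecs and append io.sensesecure.hadoop.xz.XZCodec -->
      <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,io.sensesecure.hadoop.xz.XZCodec</value>
    </property>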