Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1969 | 07-09-2019 12:53 AM |
|  | 11878 | 06-23-2019 08:37 PM |
|  | 9141 | 06-18-2019 11:28 PM |
|  | 10127 | 05-23-2019 08:46 PM |
|  | 4577 | 05-20-2019 01:14 AM |
07-19-2015
04:51 AM
We do almost all of our internal testing with EXT4, so we're pretty certain of its reliability and performance. At least a few years ago, XFS had numerous issues that impacted its use with Hadoop-style workloads. While I am sure the state has improved in more current versions, we don't have any formal XFS tuning recommendations to offer at the moment, and still recommend EXT4, which has been well tested over all these years. Does a normal/default allocsize give you better performance? I am not sure you should be setting a 128m allocsize.
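If you want to compare, one simple approach (a sketch only; the device names, mount points and the noatime option below are placeholders, not recommendations) is to mount one data disk without the allocsize override and benchmark the same write workload against both:

```
# /etc/fstab entries for two otherwise identical XFS data disks (illustrative):
/dev/sdb1  /data/1  xfs  defaults,noatime,allocsize=128m  0 0
/dev/sdc1  /data/2  xfs  defaults,noatime                 0 0   # default (dynamic) allocsize

# Current mount options, including any allocsize override, are visible in /proc/mounts:
grep xfs /proc/mounts
```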
07-19-2015
04:43 AM
The first byte is used as the sign indicator if the number requires more than one byte to encode. In your specific example with 172 and -172: 1000|1111, or 8f, is used to indicate positive numbers; 1000|0111, or 87, is used to indicate negative numbers. It also helps to look at the sources to answer such questions, i.e. at https://github.com/cloudera/hadoop-common/blob/cdh5.4.0-release/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/WritableUtils.java#L271-L298 (serialise) and https://github.com/cloudera/hadoop-common/blob/cdh5.4.0-release/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/WritableUtils.java#L307-L320 (deserialise). Also look at this: https://github.com/cloudera/hadoop-common/blob/cdh5.4.0-release/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/WritableUtils.java#L369-L371 Does this help?
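If you want to see those bytes for yourself, a small standalone sketch like this (assuming hadoop-common, e.g. the CDH client jars, is on the classpath) serialises both values with WritableUtils and prints the raw encoding:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import org.apache.hadoop.io.WritableUtils;

public class VIntDemo {
  public static void main(String[] args) throws Exception {
    for (int value : new int[] {172, -172}) {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      WritableUtils.writeVInt(new DataOutputStream(bytes), value);
      StringBuilder hex = new StringBuilder();
      for (byte b : bytes.toByteArray()) {
        hex.append(String.format("%02x ", b & 0xff));
      }
      // Prints "172 -> 8f ac" and "-172 -> 87 ab": the first byte carries the
      // sign and length, the remaining byte(s) carry the value (one's
      // complemented for negatives).
      System.out.println(value + " -> " + hex.toString().trim());
    }
  }
}
```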
07-17-2015
05:56 PM
Glad to hear the simpler approach worked. Please consider marking the post as the solution on the thread so others with the same issue can find a solved thread faster! Could you also let me know where the configuration for the other users' allowance was applied? Did you place it into the NM yarn-site.xml?
07-17-2015
08:06 AM
1 Kudo
The WARNing is meant to act as a guideline indicating that you may be facing hardware-level trouble. Your values aren't excessively high to be extremely worried about, but to explain each of the two warnings:

> Slow BlockReceiver write data to disk cost

This is measured as the time taken to write to disk when a data packet comes in for a block write. Java-wise, it's just the duration measurement behind the equivalent of a "FileOutputStream.write(…)" call, which in most setups may not even hit the disk and instead goes to the Linux buffer cache. It appears writes to these disks are too slow. I am not well versed with XFS, but we recommend use of EXT4. What is the meaning of "allocsize=128m" in your mount options, BTW? Note that we do not write entire blocks at once into an open file, but write packets in a streaming manner (building towards the block size), and packet sizes range between 64k and 128k each.

> Slow BlockReceiver write packet to mirror

This measures the duration taken to write to the next DN over a regular TCP socket, plus the time taken to flush the socket. We forward the same packet here (small sizes, as explained above). An increase in this typically indicates higher network latency, as Java-wise this is a pure SocketOutputStream.write() + SocketOutputStream.flush() cost.

Hopefully these facts help you tune your configuration better. It's more likely to be a tuning issue than anything else, given the new hardware.
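To make the first measurement concrete, here is a rough sketch of what the DataNode times around each incoming packet (this is not the actual BlockReceiver code; the names and the threshold are illustrative):

```java
import java.io.FileOutputStream;
import java.io.IOException;

// Illustrative sketch only: the DataNode wraps a timer around the per-packet write.
public class SlowWriteSketch {
  // Placeholder value; in real deployments the warning threshold is configurable.
  private static final long SLOW_WRITE_THRESHOLD_MS = 300;

  static void writePacket(FileOutputStream blockFile, byte[] packet) throws IOException {
    long begin = System.nanoTime();
    blockFile.write(packet);  // usually lands in the Linux buffer cache, not directly on the platter
    long durationMs = (System.nanoTime() - begin) / 1_000_000;
    if (durationMs > SLOW_WRITE_THRESHOLD_MS) {
      System.err.println("Slow write of " + packet.length + " bytes took " + durationMs + "ms");
    }
  }
}
```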
07-17-2015
05:01 AM
Are you using the Hive CLI or Beeline+HS2 for this? Have you tried setting the properties in the configuration file instead; does that work? The property appears to be set correctly, but the check is likely failing because the default session configuration, and not the query configuration, is checked for the transaction manager instance.
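As a sketch of the configuration-file route (assuming the property in question is hive.txn.manager; substitute whichever property you are actually setting), the entry would go into hive-site.xml so that it is part of the default session configuration:

```xml
<!-- hive-site.xml (sketch): make the transaction manager part of the default configuration -->
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
```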
07-16-2015
09:32 PM
Sqoop will use Kite if the Parquet output format is requested. However, I am unable to reproduce your issue on a fresh 5.4.2 Oozie install (the Sqoop action import was done from a MySQL source, but Kite gets used whenever you specify --as-parquetfile, independent of the source). Was your CDH upgraded from another release, or is it a new cluster installation? Can you check if your Oozie system share-lib is perhaps carrying older kite jars? If it does look older, can you run the Oozie -> Actions -> Update ShareLib command from CM, and then retry?
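A quick way to inspect this (the host and paths below are illustrative; adjust them to your environment) is to list the active share-lib and look at the kite jar versions shipped in its sqoop directory:

```
# List the sqoop component of the currently active Oozie share-lib
oozie admin -oozie http://oozie-host:11000/oozie -shareliblist sqoop

# Or look directly at the HDFS directory and check the kite jar versions
hadoop fs -ls /user/oozie/share/lib/lib_*/sqoop | grep -i kite
```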
07-16-2015
09:00 PM
For all of these steps, you can make use of CM's API and other extension features.

The CM API lets you use a REST API (via curl CLI commands, Python or Java programs, etc.) to set configuration, restart clusters, services or instances, and a lot more. This is documented with some examples at the CM API website: http://cloudera.github.io/cm_api/

For deploying custom jars easily, you can also consider writing a custom parcel (if you already use parcels and not RPM/DEB packages to run CDH in CM). Documentation on what parcels are and how to write one is at https://github.com/cloudera/cm_ext/wiki/Parcels:-What-and-Why%3F, https://github.com/cloudera/cm_ext/wiki/The-parcel-format and https://github.com/cloudera/cm_ext/wiki/Building-a-parcel

One example Oozie-related parcel in my personal repo installs the extjs UI libraries automatically. You can reference it for building your own custom jar deployment parcel: https://github.com/QwertyManiac/extjs-parcel

Does this help?
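As a tiny illustration of the REST side (the host, port and credentials are placeholders, and the /api/vN version should match what your CM reports), a curl call can explore or drive the API:

```
# Discover the highest API version your CM server supports
curl -u admin:admin 'http://cm-host.example.com:7180/api/version'

# List clusters (substitute the version returned above, e.g. v10 for CM 5.4)
curl -u admin:admin 'http://cm-host.example.com:7180/api/v10/clusters'
```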
07-16-2015
08:54 PM
The share-lib format did change in CDH5. Could you take a look at http://blog.cloudera.com/blog/2014/05/how-to-use-the-sharelib-in-apache-oozie-cdh-5/ and let us know if it helps you resolve your issue?
07-16-2015
06:38 AM
1 Kudo
Typically, users use output files on HDFS paths to share data or values between jobs. The capture-output size is limited because we store it inside the Oozie DB before we transfer it over to the next action instance. The max size is therefore limited by the size the RDBMS supports for CHAR/VARCHAR columns, 64k for MySQL for example.
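For small values, the usual pattern looks roughly like the sketch below (the action names and the key are illustrative): the action emits key=value lines on stdout, and the next action reads them through wf:actionData. Anything larger than the DB column limit should be written to an HDFS path and read from there instead.

```xml
<!-- Sketch: a shell action capturing a small key=value output -->
<action name="get-date">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>echo</exec>
    <argument>run_date=2015-07-16</argument>
    <capture-output/>
  </shell>
  <ok to="use-date"/>
  <error to="fail"/>
</action>

<!-- Downstream actions can then reference ${wf:actionData('get-date')['run_date']} -->
```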
07-16-2015
06:36 AM
1 Kudo
> is not getting written to job.properties file

The job.properties file is just for WF variable resolution and some Oozie submission control properties. Those properties aren't used for actual job configuration, however.

> Where should define parameter and its value in Hue UI for a workflow?

Within your Action's configuration view, look for "Properties" to see a Key=Value style field. Enter your configuration properties there. If you are using new-API values, you may also want to read the requirements at https://cwiki.apache.org/confluence/display/OOZIE/Map+Reduce+Cookbook (search the page for "new-api"); a small sketch of such entries is below.

> Do I need to upload .jar in lib directory of the workflow and what is the need of that?

You will need to upload the application and any non-hadoop dependencies, because Oozie runs a remote launcher job to run your classes, and the jars need to exist within the workflow lib directory for Oozie to ship them properly via distributed-cache.

Does this help?
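For the new-API case specifically, the Properties field entries would look something like this (a sketch; the class names are placeholders for your own mapper/reducer):

```
mapred.mapper.new-api=true
mapred.reducer.new-api=true
mapreduce.job.map.class=com.example.MyMapper
mapreduce.job.reduce.class=com.example.MyReducer
```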