Member since: 07-31-2013
Posts: 1924
Kudos Received: 461
Solutions: 311
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1269 | 07-09-2019 12:53 AM
 | 6582 | 06-23-2019 08:37 PM
 | 7163 | 06-18-2019 11:28 PM
 | 7471 | 05-23-2019 08:46 PM
 | 2735 | 05-20-2019 01:14 AM
08-14-2016
03:43 PM
The error "Temporary failure in name resolution" comes out of the DNS lookup sub-system on your OS, and likely indicates a fault of some sort when accessing one or more of your nameservers (defined in /etc/resolv.conf). If this is a repeating yet intermittent problem, I'd recommend contacting the DNS maintainers to find out if there are maintenance events or other downtime related issues ongoing with their servers. You can also check your /var/log/messages or "dmesg" contents for more clues about this lower-env trouble. The RM and other alerts you see coming out as a result of this failure is an avalanche effect. The agent polls metrics and states from the roles it runs, by contacting their webserver end-points. Since that's failing to resolve (its really a local address, shouldn't have to go through DNS if your /etc/nsswitch.conf is setup right) the alert gets flagged too. Its worth also running a local nameservice caching daemon (Such as nscd, etc.) to help cushion such effects to a certain degree and also to prevent overloading the DNS with too many queries which could also cause this potentially.
08-14-2016
03:28 PM
1 Kudo
"No such sqoop tool: sqoop. See 'sqoop help'."

Your Sqoop action's command should begin with just "import", and not include "sqoop" as its first argument, i.e. it should look like this:

<command>import --connect …</command>

And not like this, which is how you've currently specified it:

<command>sqoop import --connect …</command>

The Sqoop action of Oozie is documented with an example at http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SqoopActionExtension.html
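Put together, a minimal Sqoop action element would look roughly like the below (the JDBC URL, table name and target directory are placeholders, not values from your workflow):

```xml
<action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Note: the command begins with "import", not "sqoop import" -->
        <command>import --connect jdbc:mysql://db.example.com/mydb --table MY_TABLE --target-dir /user/foo/mytable -m 1</command>
    </sqoop>
    <ok to="end"/>
    <error to="fail"/>
</action>
```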
08-14-2016
11:47 AM
1 Kudo
I'd recommend not using versions for anything but data that naturally ages out. If the data you're looking to store via versions does not naturally age out (such as via TTL or via version limits), it's better to store it as defined columns instead. Since your reads are going to be specific, going wider per row with a growing number of columns shouldn't be a problem. Of course the key design could also be thought about, but that depends on your primary read case. Scans let you grab specific time-range slices easily, so a timestamp-carrying key may be a good option too, but you'll then need to think separately about serving the profile information.
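As a rough sketch of the wide-row idea in the HBase shell (the table, family and qualifier names are made up for illustration, not from your schema):

```
# One qualifier per event, suffixed with its epoch time, rather than
# stacking many versions onto a single cell
put 'user_events', 'user123', 'e:login_1471190400', 'v1'
put 'user_events', 'user123', 'e:login_1471276800', 'v2'

# Reads stay specific: fetch just the row, or only the qualifiers you need
get 'user_events', 'user123', {COLUMN => 'e'}
```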
08-14-2016
11:35 AM
1 Kudo
Parquet support across the stack is documented at http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_parquet.html Impala's support for Parquet is ahead of Hive's at this moment, while https://issues.apache.org/jira/browse/HIVE-8950 will help it catch up in future. In Hive you will still need to specify the columns manually, but you may alternatively create the table in Impala and then use it in Hive. Parquet's loader in Pig supports reading the schema off the file [1] [2], as does Spark's Parquet support [3]. None of the ecosystem approaches use an external schema file, as was the case with Avro storage.

[1] - https://github.com/Parquet/parquet-mr/blob/master/parquet-pig/src/main/java/parquet/pig/ParquetLoader.java#L90-L95
[2] - https://github.com/Parquet/parquet-mr/blob/master/parquet-pig/src/test/java/parquet/pig/TestParquetLoader.java#L94-L97
[3] - http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
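For instance, from a CDH 5-era spark-shell the schema comes straight out of the file footers (the path below is a placeholder):

```scala
// Parquet carries its schema in the file footers, so no external
// schema file is needed to read it back
val df = sqlContext.read.parquet("/user/foo/parquet_dir")
df.printSchema()
df.registerTempTable("events")
sqlContext.sql("SELECT COUNT(*) FROM events").show()
```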
08-14-2016
03:24 AM
Impala lets you create a Parquet table from an example data file, but there's no separate schema-file concept in the Parquet storage implementation today. The CREATE TABLE ... LIKE PARQUET feature is described further at https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html#parquet_ddl, after which, if you want to evolve the schema, you can read on at https://www.cloudera.com/documentation/enterprise/latest/topics/impala_parquet.html#parquet_schema_evolution
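A minimal sketch of that DDL (the paths and names below are placeholders):

```sql
-- Derive the column definitions from an existing Parquet data file
CREATE TABLE my_parquet_table
  LIKE PARQUET '/user/foo/sample/data_file.parq'
  STORED AS PARQUET;

-- Subsequent schema evolution is then handled with ALTER TABLE, e.g.
ALTER TABLE my_parquet_table ADD COLUMNS (new_col STRING);
```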
08-14-2016
01:03 AM
1 Kudo
Have you placed all the JDBC jars required to connect to Informix under /var/lib/sqoop/ on the host you are invoking this command from?
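For example (the jar name below is an assumption about what the Informix driver ships as; use whatever file your vendor provides):

```sh
# Copy the vendor JDBC driver into Sqoop's lib directory on the client host
sudo cp /path/to/ifxjdbc.jar /var/lib/sqoop/
sudo chmod 644 /var/lib/sqoop/ifxjdbc.jar
```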
08-13-2016
05:43 AM
1 Kudo
The RHBase libraries are very dated and advise the use of Thrift 0.8: https://github.com/RevolutionAnalytics/RHadoop/wiki/Installing-RHadoop-on-RHEL#installing-rhbase The installation proceeds smoothly for me, as per those instructions, if I use Thrift 0.8. Note that Thrift major version upgrades are not guaranteed to be compatible with existing clients, so since rhbase was written against the older version you will most likely need to stick to it, to ensure everything it calls is available in the installed library and headers. I installed it and tested a simple table creation and existence check, which worked OK against the CDH 5.7 HBase Thrift service; I didn't check it extensively beyond that.
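The check was roughly along these lines (a minimal sketch using the rhbase calls described on the RHadoop wiki; the Thrift host/port and table name are assumptions):

```r
library(rhbase)

# Connect to the HBase Thrift service (host/port are assumptions)
hb.init(host = "localhost", port = 9090)

# Create a small table and confirm it shows up in the table listing
hb.new.table("rhbase_smoke_test", "cf")
hb.list.tables()

# Clean up afterwards
hb.delete.table("rhbase_smoke_test")
```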
08-13-2016
04:40 AM
1 Kudo
ExportSnapshot is an MR job, and as a result it will run across your NodeManager hosts. Providing its destination as a local filesystem URI, such as your file:///local_linux_fs_dir, will only work if that path is visible, with the same consistent content, across all your cluster hosts. You could achieve this by mounting the same NFS export on all hosts and then using controlled ExportSnapshot parallelism so you don't overload it (limit the number of maps to something low enough). If that's not desirable, you can instead run the MR job in local mode, which would still be parallel but to a limited degree, by passing -Dmapreduce.framework.name=local to ExportSnapshot before any other option.
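For illustration, the two variants look roughly like the below (the snapshot name and destination paths are placeholders):

```sh
# Cluster MR job with a low mapper count, writing to an NFS path that is
# mounted identically on every NodeManager host
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -snapshot my_table_snapshot \
  -copy-to file:///mnt/shared_nfs/hbase_backups \
  -mappers 4

# Or the same job in limited-parallelism local mode, so only the
# submitting host needs to see the destination directory
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot \
  -Dmapreduce.framework.name=local \
  -snapshot my_table_snapshot \
  -copy-to file:///local_linux_fs_dir
```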
08-12-2016
03:22 PM
2 Kudos
@jh070784 - Please read my previous response. Symlink support right now is disabled because the design/implementation is incomplete, not because it is unavailable. The API was added before it was later disabled, and calling it will yield an error, as the source shows: https://github.com/cloudera/hadoop-common/blob/cdh5.5.1-release/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileContext.java#L1404-L1406

@michaelthoward - I am not aware of any major pick-up of the work to redo symlinks, but if you're interested in reading the history and contributing changes, please start at https://issues.apache.org/jira/browse/HADOOP-10019, which is the parent JIRA listing all the problems faced with its design and current implementation.
08-12-2016
12:03 AM
You do not need to set these env-vars manually; they should be set automatically for you in CDH. The env-var, if you do choose to set it manually for some reason, must point to a single directory of configs, not to multiple directories in classpath style as it appears in your output. Can you retry with HADOOP_CONF_DIR unset, since it's handled automatically and your current value is preventing the automatic location from being used? Does /etc/hadoop/conf/ exist now with usable files under it?
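For example, from the same shell session (a quick check; nothing beyond the default CDH client-config path is assumed):

```sh
# Drop the manually-set value and let the CDH wrappers pick the default
unset HADOOP_CONF_DIR

# Confirm the auto-deployed client configs exist
ls -l /etc/hadoop/conf/

# Then retry the original command from this same shell
```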