Member since 01-09-2019

401 Posts | 163 Kudos Received | 80 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2596 | 06-21-2017 03:53 PM |
| | 4294 | 03-14-2017 01:24 PM |
| | 2388 | 01-25-2017 03:36 PM |
| | 3840 | 12-20-2016 06:19 PM |
| | 2101 | 12-14-2016 05:24 PM |
07-07-2016 12:20 PM
I don't see a reason for the first insert to be a text/uncompressed Avro file. Using HCatalog, you can import from Sqoop directly into a Hive table stored as ORC, which saves you a lot of space because of compression. Once the initial import is in Hive as ORC, you can still transform the data as necessary. If the reason for writing as text is access from Pig and MapReduce, an HCatalog table can also be accessed from Pig/MR.
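A minimal sketch of what such a Sqoop-to-ORC import could look like. All connection details, database, and table names below are placeholders, not values from the thread; the command is only printed, since running it needs a live cluster.

```shell
# Sketch: Sqoop import straight into a Hive ORC table via HCatalog.
# jdbc URL, credentials, and table names are invented placeholders.
SQOOP_CMD="sqoop import \
  --connect jdbc:mysql://dbhost/source_db \
  --username etl_user -P \
  --table customers \
  --hcatalog-database default \
  --hcatalog-table customers_orc \
  --create-hcatalog-table \
  --hcatalog-storage-stanza 'stored as orcfile'"
# Print the command instead of executing it.
echo "$SQOOP_CMD"
```

The `--hcatalog-storage-stanza` is what makes the created table ORC rather than text.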
07-07-2016 12:36 AM
1 Kudo
If you already have a running cluster, exporting the blueprint from Ambari, editing the relevant entries, and using that to create a DR cluster works. Another approach is to create your first cluster with a blueprint as well; that makes creating the second cluster easier. I don't know of any other custom tools that can create an almost identical cluster.
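The blueprint round trip can be sketched with the Ambari REST API. Hostnames, cluster names, file names, and credentials below are placeholders; the commands are printed rather than executed, since they need a live Ambari server.

```shell
# Sketch of export -> register -> create with Ambari blueprints.
AMBARI="http://ambari-host:8080/api/v1"
AUTH="-u admin:admin -H X-Requested-By:ambari"

# 1) Export the running cluster as a blueprint.
EXPORT="curl $AUTH $AMBARI/clusters/prod?format=blueprint -o prod-blueprint.json"
# 2) Register the (edited) blueprint under a new name.
REGISTER="curl $AUTH -X POST -d @prod-blueprint.json $AMBARI/blueprints/dr-blueprint"
# 3) Create the DR cluster from the blueprint plus a host-mapping template.
CREATE="curl $AUTH -X POST -d @dr-cluster-template.json $AMBARI/clusters/dr"

# Print only; running these requires a live Ambari server.
printf '%s\n' "$EXPORT" "$REGISTER" "$CREATE"
```

The host-mapping template (step 3) is where the DR cluster's own hostnames go, which is the main thing to edit between the two clusters.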
07-06-2016 06:32 PM
1 Kudo
A quick workaround is to disable Ranger authorization under Hive -> Authorization in the Ambari UI and restart Hive. However, if Ranger is part of your tests and you want to keep it enabled, this is not the solution.
07-06-2016 06:16 PM
A quick workaround is to disable Ranger authorization under Hive -> Authorization in the Ambari UI and restart Hive. However, if Ranger is part of your tests and you want to keep it enabled, this is not the solution.
07-06-2016 03:13 PM
1 Kudo
Checkpointing is the process of merging edit logs with the base fsimage; the result is stored in the NameNode metadata directories. It is not the same as the edit log, since the edit log holds the changes that you make to HDFS.
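For a non-HA NameNode, a checkpoint can also be forced by hand; a sketch of the steps, run as the hdfs user. The commands are printed rather than executed, since they need a running cluster.

```shell
# Sketch: manually forcing a checkpoint on a non-HA NameNode.
CHECKPOINT="hdfs dfsadmin -safemode enter   # block writes so the image is consistent
hdfs dfsadmin -saveNamespace    # merge edits into a fresh fsimage on disk
hdfs dfsadmin -safemode leave   # resume normal operation"
# Print only; these need a live HDFS cluster and hdfs superuser rights.
echo "$CHECKPOINT"
```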
07-05-2016 04:07 PM
@bganesan Now that https://issues.apache.org/jira/browse/RANGER-205 is fixed, can we use the REST API instead of the DB script?
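If the REST route works, creating a policy would look roughly like the following. The host, credentials, and policy JSON are placeholders, and the public endpoint path (v1 vs v2 API) should be verified against your Ranger version; the command is only printed.

```shell
# Sketch: creating a Ranger policy over REST instead of the DB script.
# ranger-host, admin credentials, and policy.json are placeholders.
RANGER="http://ranger-host:6080"
CREATE_POLICY="curl -u admin:admin -X POST \
  -H 'Content-Type: application/json' \
  -d @policy.json \
  $RANGER/service/public/v2/api/policy"
# Print only; running it needs a live Ranger admin server.
echo "$CREATE_POLICY"
```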
07-01-2016 09:07 PM
1 Kudo
When you click on the OVA to import it, you will see a Guest OS Type. What do you see there, and has it been changed? You should see Red Hat (64-bit). By default you don't change this; if you do change it, the import is not going to work.
07-01-2016 03:47 PM
If you want to use the files as is, then yes. But do you have the files already split by date? In that case, you will need the date as both a column and a partition (with different names). You may be better off reorganizing these files into ORC for better lookup speeds. To do that, create a second table stored as ORC and do an insert overwrite.
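A sketch of that text-to-ORC reorganization. The table and column names are invented, since the thread gives no schema; the HiveQL is only printed here and would be run with beeline/hive on a real cluster.

```shell
# Sketch: copy a text table into a partitioned ORC table.
# events_text, events_orc, and the columns are made-up names.
HQL="
CREATE TABLE events_orc (id BIGINT, payload STRING, event_date STRING)
PARTITIONED BY (dt STRING)
STORED AS ORC;

SET hive.exec.dynamic.partition.mode=nonstrict;
-- event_date stays as a plain column; dt carries the same value as the partition key
INSERT OVERWRITE TABLE events_orc PARTITION (dt)
SELECT id, payload, event_date, event_date AS dt
FROM events_text;
"
echo "$HQL"
```

Note the date appears twice under different names (`event_date` as a column, `dt` as the partition), matching the constraint above.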
07-01-2016 03:29 PM
Why do you need to include the date information as a column? If you are creating a merge using Pig (or a Hive query), you can move the date field from a column into a partition.
07-01-2016 03:08 PM
1 Kudo
It is difficult to say what your PARTITION should be from this information alone. The best way to find a partition is to look at future query patterns. If you know most queries will have a WHERE clause on a value that is not high-cardinality, that could be your partition. If your queries mostly hit a date range, partition by date.
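A small sketch of why a date partition pays off: the WHERE clause below prunes the scan to the matching partitions instead of reading the whole table. The names are invented, and the HiveQL is only printed here.

```shell
# Sketch: date-partitioned table plus a range query that benefits from pruning.
# sales, item, amount, sale_date are made-up names.
HQL="
CREATE TABLE sales (item STRING, amount DOUBLE)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

-- Only partitions inside the range are scanned:
SELECT SUM(amount)
FROM sales
WHERE sale_date BETWEEN '2016-06-01' AND '2016-06-30';
"
echo "$HQL"
```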