Member since: 06-07-2016
Posts: 923
Kudos Received: 322
Solutions: 115
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 4132 | 10-18-2017 10:19 PM |
|  | 4363 | 10-18-2017 09:51 PM |
|  | 14923 | 09-21-2017 01:35 PM |
|  | 1863 | 08-04-2017 02:00 PM |
|  | 2433 | 07-31-2017 03:02 PM |
08-25-2016
03:48 PM
@Sami Ahmad Can you try distcp2 instead? hadoop distcp2 hdfs:///user/sami/ hdfs:///user/zhang
08-25-2016
02:10 AM
2 Kudos
@Michel Sumbul When you talk about encryption in HBase, you encrypt the HFiles and the WAL. You cannot encrypt only some columns and not others; when you encrypt the HFile, all of its cells are encrypted. Please check the following link on how to implement this: https://hbase.apache.org/book.html#hbase.encryption.server You can also create an HDFS-level encryption zone for the /hbase directory, and your data will be encrypted at rest. Please check the following link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/hbase-with-hdfs-encr.html
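To make the first option concrete, a minimal sketch of the server-side configuration described in the HBase book (the keystore path and password are placeholders for your environment):

```xml
<!-- hbase-site.xml: keystore path and password are placeholders -->
<property>
  <name>hbase.crypto.keyprovider</name>
  <value>org.apache.hadoop.hbase.io.crypto.KeyStoreKeyProvider</value>
</property>
<property>
  <name>hbase.crypto.keyprovider.parameters</name>
  <value>jceks:///path/to/hbase/conf/hbase.jks?password=****</value>
</property>
<property>
  <name>hbase.crypto.master.key.name</name>
  <value>hbase</value>
</property>
```

With that in place, encryption is enabled per column family, e.g. create 'mytable', {NAME => 'cf', ENCRYPTION => 'AES'} in the HBase shell (table and family names are hypothetical).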
08-24-2016
09:14 PM
1 Kudo
When creating table use the following: TBLPROPERTIES ('serialization.null.format'='') Then do INSERT INTO table_name (col1,col3, col5) select * from csvtable . Check the following. This should just work. https://community.hortonworks.com/questions/1216/techniques-for-dealing-with-malformed-data-hive.html
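A minimal sketch with hypothetical table and column names, assuming the CSV staging table has matching columns:

```sql
-- Empty strings in the underlying files are surfaced as NULL.
CREATE TABLE target_table (col1 STRING, col3 STRING, col5 STRING)
TBLPROPERTIES ('serialization.null.format'='');

INSERT INTO TABLE target_table
SELECT col1, col3, col5 FROM csvtable;
```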
08-24-2016
09:00 PM
3 Kudos
@Brandon Wilson
Theoretically, there isn't a limit on the number of snapshots, but, like everything, there is a price to pay. A snapshot, as you know, only captures metadata at a point in time. Now imagine you created one snapshot every minute (an extreme example to explain what will happen). Two hours later you have 120 snapshots.

HFiles are immutable. The moment a snapshot is taken, it holds references to the HFiles at that point in time. An HBase snapshot doesn't make any copies of the data; that only happens when you restore from the snapshot. But what do you think happens when a compaction or deletion is triggered? If a snapshot references those immutable HFiles, they are moved to an archive folder rather than really deleted, because you might decide to restore from that snapshot. If you have lots of compactions and updates, each snapshot might be pointing to different HFiles, which means your snapshots will affect your storage.

So, there is no theoretical limit on the number of snapshots, but snapshots used aggressively are not free of cost: you might end up using a significant amount of storage. Keeping them for 30 days, 6 months, or a year will require significant storage overhead.
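To make the lifecycle concrete, a quick sketch in the HBase shell (table and snapshot names are hypothetical):

```
snapshot 'mytable', 'mytable-snap-1'    # captures HFile references only, no data copy
list_snapshots                          # every listed snapshot pins its HFiles in the archive
delete_snapshot 'mytable-snap-1'        # unpins them so the cleaner can reclaim space
```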
08-24-2016
07:45 PM
@Gaurab D Do you have the Oracle driver in the Sqoop directory? Check /usr/hdp/current/sqoop-client/lib/. If it's not there, please put it there so it is on the classpath.
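A sketch; the exact ojdbc jar name and version depend on your Oracle download:

```bash
# copy the JDBC driver into Sqoop's lib so it lands on the classpath
cp ojdbc7.jar /usr/hdp/current/sqoop-client/lib/
ls /usr/hdp/current/sqoop-client/lib/ | grep -i ojdbc
```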
08-24-2016
05:55 PM
So you have two clusters on the same node? Is it possible that the two clusters have different block size settings? Can you please verify the dfs.blocksize setting on both clusters?
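A quick way to check, run against each cluster's configuration:

```bash
# prints the effective block size in bytes for the cluster whose config is loaded
hdfs getconf -confKey dfs.blocksize
```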
08-24-2016
05:49 PM
@Hans Feldmann One way I parsed my JSON was to convert it to Avro. So basically, after getting rid of special characters from the JSON using the ReplaceText processor, I sent it to InferAvroSchema, then used ConvertJSONToAvro with the inferred schema, and wrote the result to HDFS, where I had a table and read it in Hive. Another way is to use a JSON Hive SerDe; that's actually much easier. Check this out: https://github.com/rcongiu/Hive-JSON-Serde
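A sketch of the SerDe route using the linked project; the jar path, table, and columns are hypothetical and must mirror your JSON structure:

```sql
ADD JAR /tmp/json-serde-1.3.8-jar-with-dependencies.jar;

CREATE EXTERNAL TABLE json_events (id STRING, name STRING)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/hans/json_events';
```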
08-24-2016
04:03 AM
This is a connection issue. Can you connect to the cluster from the command line on this machine?
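For example, a quick sanity check from that machine (assuming an HDFS cluster; the path is just an example):

```bash
# if this hangs or fails, the problem is network/config, not the application
hdfs dfs -ls /
```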
08-24-2016
04:02 AM
So there is no credential cache. You need to do a kinit first; then run klist -A and it will show you the credential cache. After that it should work.
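A sketch with a placeholder principal and realm:

```bash
kinit user@EXAMPLE.COM   # obtain a Kerberos ticket (prompts for the password)
klist -A                 # list all credential caches; the new ticket should appear
```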