Member since: 05-02-2017
Posts: 360
Kudos Received: 65
Solutions: 22
My Accepted Solutions
Title | Views | Posted
---|---|---
| 13482 | 02-20-2018 12:33 PM
| 1531 | 02-19-2018 05:12 AM
| 1888 | 12-28-2017 06:13 AM
| 7187 | 09-28-2017 09:25 AM
| 12229 | 09-25-2017 11:19 AM
05-12-2017
07:49 AM
@mqureshi @Neeraj Sabharwal @Jay SenSharma Could anyone help me with this, please? Thanks in advance!
05-11-2017
03:59 PM
Hi @n c HCatalog holds a table's metadata: schema, indexes, roles, structure, bucketing, partition keys, columns, privileges, when the table was created, by whom it was created, and so on. However, it does not hold details such as how many records are stored in each table.
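For illustration, a minimal PySpark sketch of pulling that metadata (assuming a Hive-enabled Spark session; db.events is a placeholder table name):

```python
from pyspark.sql import SparkSession

# Hive-enabled session; "db.events" is a hypothetical table name.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Structural metadata (schema, partition keys, owner, creation time, storage info)
# comes from the metastore that HCatalog exposes.
spark.sql("DESCRIBE FORMATTED db.events").show(truncate=False)

# A record count is not part of that metadata; it has to be computed by a query.
print(spark.table("db.events").count())
```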
05-11-2017
01:56 PM
Consider I have a file of size 150 MB and I am loading it into a Hive table. The block size in HDFS is 128 MB. Now, how will the files be laid out underneath the Hive table? I believe the data will be split and loaded as 0000_0, 0000_1, etc. Why is it split into multiple chunks? Does each file correspond to the block size? Do the block size and the mapred split size have anything to do with the file size in Hive? If I alter the mapred split size, will the file sizes underneath Hive change? Do we have any control over the number of files created while loading a Hive table? I understand that through a merge MapReduce job we can reduce/increase it. Say I need only 10 files to be created, and not more than that, while loading a Hive table. Is that even possible? Thanks in advance.
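For illustration, the split arithmetic behind the 150 MB / 128 MB figures above (a small sketch, not part of the original environment):

```python
import math

# A 150 MB file with a 128 MB HDFS block size occupies ceil(150/128) = 2 blocks:
# one full 128 MB block plus one ~22 MB block. Each map task normally processes
# one split (roughly one block), so a plain load tends to leave one output file
# per map task, which is why multiple 0000_N files appear under the table.
file_mb, block_mb = 150, 128
print(math.ceil(file_mb / block_mb))  # 2
```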
Labels:
- Apache Hadoop
- Apache Hive
05-11-2017
12:23 PM
@Ashnee Sharma Good article. Thanks for sharing!
05-11-2017
11:53 AM
1 Kudo
Hi @Venkatesan G For 40-50 million files the block size is 256 MB, which is twice 128 MB. Naturally the number of blocks created per file decreases, and in turn the block details stored in the NameNode are also reduced. That's why only 24 GB is recommended. If you increase the block size further, the recommended size would decrease further still: the block size is inversely proportional to the recommended memory. I hope this helps.
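For a rough sense of the proportions, a back-of-the-envelope sketch; the ~150 bytes per NameNode object is a commonly quoted rule of thumb, and the 50 million files / 512 MB average file size are illustrative assumptions, not figures from this thread:

```python
import math

# Rule-of-thumb NameNode memory estimate: one ~150-byte object per file plus
# one per block. Doubling the block size halves the block objects per file.
BYTES_PER_OBJECT = 150  # rule of thumb, not an exact figure

def namenode_metadata_gb(num_files, avg_file_mb, block_mb):
    blocks_per_file = math.ceil(avg_file_mb / block_mb)
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT / 1024 ** 3

for block_mb in (128, 256, 512):
    print(f"{block_mb} MB blocks -> ~{namenode_metadata_gb(50_000_000, 512, block_mb):.1f} GB")
```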
05-10-2017
07:33 PM
1 Kudo
Hi @Ashnee Sharma As the log says, the provided credentials could not be identified. The cause may fall under any of the below:
- The user set the $KRB5_CONFIG environment variable to something other than the default value of /etc/krb5.conf. The HDFS client will not source $KRB5_CONFIG from the user's shell.
- /etc/gphd/hadoop/conf/hdfs-site.xml does not have the correct Kerberos configuration for the NameNode HDFS principal.
- DNS does not resolve the correct fully qualified domain name.
- The Kerberos client library version does not match the server.
- Kerberos encryption defaults differ between the client and the KDC.

Refer to this link for how to overcome it: https://discuss.pivotal.io/hc/en-us/articles/202210763-The-Secure-HDFS-Error-No-valid-credentials-provided-Displays-when-Running-HDFS-DFS-or-Hadoop-FS Hope it helps in solving your issue.
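For illustration, a few read-only sanity checks that map to the causes above (a sketch only; namenode.example.com is a placeholder hostname):

```python
import os
import socket
import subprocess

# 1. Is KRB5_CONFIG pointing somewhere non-default?
print("KRB5_CONFIG =", os.environ.get("KRB5_CONFIG", "/etc/krb5.conf (default)"))

# 2. Is there a valid, unexpired ticket? (klist is the standard MIT Kerberos tool.)
subprocess.run(["klist"], check=False)

# 3. Does DNS resolve the NameNode's fully qualified domain name?
print(socket.gethostbyname("namenode.example.com"))
```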
05-10-2017
07:23 PM
@Dinesh Chitlangia sort and orderBy are the same as far as Spark is concerned; they work in the same way. However, in Hive (or any other database) SORT BY and ORDER BY behave quite differently. If you want to know the differences in Hive, refer to this link: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
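A quick PySpark sketch of the point (hypothetical three-row DataFrame):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(3, "c"), (1, "a"), (2, "b")], ["id", "name"])

# In the DataFrame API, orderBy is simply an alias for sort: both give a total ordering.
df.sort("id").show()
df.orderBy("id").show()

# The Hive-style SORT BY (ordering within each reducer/partition only) corresponds
# to sortWithinPartitions instead.
df.sortWithinPartitions("id").show()
```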
05-09-2017
03:38 PM
@mÁRIO Rodrigues Yes, even though it shows up as .deflate, it is ORC in a compressed state. I think you will be able to read the data through the Hive table in Spark SQL, but you can't use the underlying files directly because they are compressed. If you want to read the files directly, load the Hive table without any compression and then Spark can make use of the files underneath.
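For illustration, a minimal sketch of the route suggested above, reading through the Hive table definition rather than the raw files (db.orc_table is a placeholder):

```python
from pyspark.sql import SparkSession

# Going through the table lets Spark SQL pick up the storage format and
# compression details from the metastore ("db.orc_table" is hypothetical).
spark = SparkSession.builder.enableHiveSupport().getOrCreate()
spark.table("db.orc_table").show(5)
```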
05-09-2017
01:17 PM
@Amol Kulkarni Yes, that's a known one. The hash function in Hive behaves like a hash algorithm or hash-sort logic in data structures: just as the modulo of any odd number by 2 is always 1, the two values you provided end up with the same hash value. To generate unique values, use the md5() function in Hive instead. However, I would suggest not using this logic to generate a primary key for a table, as the values coming out of md5() will be a total mess.
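For illustration, the same collision-versus-digest contrast using Spark's built-in functions (an analogy only: Spark's hash() is Murmur3, not Hive's hash(); the two sample values are hypothetical):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("valueA",), ("valueB",)], ["col"])

df.select(
    "col",
    F.hash("col").alias("int_hash"),  # 32-bit integer space, so collisions are possible
    F.md5("col").alias("md5_hex"),    # 128-bit digest, collisions practically never happen
).show(truncate=False)
```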
05-08-2017
06:28 AM
@mÁRIO Rodrigues Deflate is not a format; if a file is stored in a compressed state, its extension in HDFS will be shown as .deflate. As you stated, ORC performs better when loading the table; Parquet and Avro also serve their own purposes. When I tested a table with 3 billion records, the load times for the Hive table by format, in ascending order, were ORC, Avro, Parquet, with ORC taking the least time during loading. But if your file format is dynamic, it's better to go with Parquet/Avro.
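If you want to repeat that comparison on your own data, a rough sketch (the paths and source table are placeholders, the 3-billion-row timing above came from my own test, and format("avro") needs the spark-avro package on the classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
src = spark.table("db.source_table")  # hypothetical source table

# Write the same data in each format and time the runs yourself.
for fmt in ("orc", "parquet", "avro"):
    src.write.format(fmt).mode("overwrite").save(f"/tmp/format_test/{fmt}")
```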