Member since 05-02-2017
360 Posts
65 Kudos Received
22 Solutions
My Accepted Solutions
Views | Posted |
---|---|
13382 | 02-20-2018 12:33 PM |
1514 | 02-19-2018 05:12 AM |
1864 | 12-28-2017 06:13 AM |
7150 | 09-28-2017 09:25 AM |
12192 | 09-25-2017 11:19 AM |
05-12-2017
05:24 PM
The number of mappers in a job equals the number of input splits, and the number of input splits depends on your block size and file size. If the file size is 256 MB and the block size is 128 MB, the job will use 2 mappers. @Bala Vignesh N V
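The arithmetic can be sketched in a few lines of Python. This is only a rough estimate, assuming a single splittable file and a split size equal to the block size; the `estimate_mappers` helper is hypothetical and is not part of Hadoop.

```python
import math


def estimate_mappers(file_size_mb: int, block_size_mb: int) -> int:
    """Estimate the mapper count as the number of block-sized input splits.

    Assumption: split size == block size, one splittable input file.
    """
    if file_size_mb <= 0 or block_size_mb <= 0:
        raise ValueError("file_size_mb and block_size_mb must be positive")
    # A partial block at the end of the file still needs its own split.
    return math.ceil(file_size_mb / block_size_mb)


print(estimate_mappers(256, 128))  # 2, the case described above
print(estimate_mappers(300, 128))  # 3, the last split is only partly full
```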
05-11-2017
12:23 PM
@Ashnee Sharma Good article. Thanks for sharing!
05-05-2017
07:06 AM
@Vinay R Glad it helped you. If you think it solves your problem then please accept the answer.
04-20-2017
05:56 PM
Hi @Simran Kaur, There is no way for the first column to be treated as the column name. But if the structure changes, it is better to load the data into Hive as an Avro or Parquet file. Even if the structure changes, you do not need to change the old data, and new data can be inserted into the same Hive table.
Points to note:
1. An external table has to be used.
2. You might need a stage table before loading into the external Hive table, which should be in Avro/Parquet format.
Steps:
1. Create an external table, stored as Avro/Parquet, with the columns you have.
2. Load the CSV into the stage table, then load the stage data into the external Hive table.
3. If the columns change, drop the external table and re-create it with the additional fields.
4. Insert the new file by following steps 1-2.
This way no manual work is needed to modify the existing data, because Avro by default shows 'null' for columns that exist in the table but not in the file. The only manual work is to drop and re-create the table DDL. Let me know if you need any more details. And if you feel this answers your question, please accept the answer.
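The null-filling behaviour mentioned above can be pictured with a small Python sketch. It does not use Hive or Avro at all; the column names and the `project_record` helper are hypothetical and only mimic how an older record looks when read through a table definition that has gained a column.

```python
def project_record(record: dict, table_columns: list) -> dict:
    """Return the record as seen through the table's column list.

    Columns that the record does not contain come back as None,
    which is the Python stand-in for the NULL Hive would display.
    """
    return {col: record.get(col) for col in table_columns}


# Record written before the structure changed (hypothetical columns).
old_record = {"id": 1, "name": "a"}

# Column list after the table was dropped and re-created with one more field.
new_columns = ["id", "name", "country"]

print(project_record(old_record, new_columns))
# {'id': 1, 'name': 'a', 'country': None}
```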
04-16-2017
04:52 PM
1 Kudo
You can use one of the following:
regexp_replace(s, "\\[\\d*\\]", "");
regexp_replace(s, "\\[.*\\]", "");
The former works only on digits inside the brackets, the latter on any text. Escapes are required because both square brackets are special characters in regular expressions. For example:
hive> select regexp_replace("7 September 2015[456]", "\\[\\d*\\]", "");
7 September 2015
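To see the difference between the two patterns without a Hive session, here is a minimal Python sketch. Using Python's `re.sub` in place of Hive's `regexp_replace` is an assumption of this example; the two patterns behave the same way in both regex engines, and raw strings remove the need for the doubled backslashes. The helper names are hypothetical.

```python
import re


def strip_digit_brackets(s: str) -> str:
    """Remove bracketed groups that contain only digits, e.g. '[456]'."""
    return re.sub(r"\[\d*\]", "", s)


def strip_any_brackets(s: str) -> str:
    """Remove bracketed text of any kind (greedy match between '[' and ']')."""
    return re.sub(r"\[.*\]", "", s)


print(strip_digit_brackets("7 September 2015[456]"))  # 7 September 2015
print(strip_digit_brackets("7 September 2015[abc]"))  # 7 September 2015[abc]
print(strip_any_brackets("7 September 2015[abc]"))    # 7 September 2015
```

Note that the second pattern is greedy: if a string holds several bracketed groups, everything from the first `[` to the last `]` is removed in one match.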
04-13-2017
03:05 PM
@Michael Young Thanks! That worked like a charm. I still have no idea why it doesn't let me upload using the HDFS UI, so if you know why, I would love to know.
04-18-2017
11:30 AM
This is a longer regex; it assumes the log_entry contains 2 IP addresses.
04-02-2017
01:56 PM
Thanks @Scott Shaw. Does it mean I have to update the metadata each time after I truncate the partition? Even if the metadata exists it should not display wrong results. In my case select distinct country from mytable should display only India.
04-03-2017
06:18 PM
Are these tables external tables? In the case of external tables, you would have to clean the folders manually by removing the files and folders that are referenced by the table (using the hadoop fs -rm command).