Member since: 08-01-2016
Posts: 14
Kudos Received: 1
Solutions: 0
03-30-2023
12:37 AM
Where can I get the client ID?
02-01-2018
05:19 AM
You can check for the relevant issue in the Hive JIRA.
11-16-2018
01:06 PM
Article content updated to reflect the new provenance implementation recommendation and the change in the JVM garbage collector recommendation.
08-28-2016
03:37 PM
1 Kudo
It sounds like things are working as expected. Please consider a few things that may not be clear to you with regard to the number of underlying files.

First, when you do a subsequent insert (or load) into a (non-bucketed) table with existing data, the contents will NOT be merged into a single file. You can test this by loading the same simple file of 10 or so rows multiple times: on the 2nd and 3rd insert/load you will see an identical 2nd and then 3rd file in the underlying Hive table's HDFS folder.

Second, for a new bucketed table that you add data to, there is no real guarantee that the number of files will match the number of buckets. With bucket hashing occurring on the CLUSTERED BY field, you can end up with fewer files if the data doesn't hash across all buckets. To see that in practice, create a table with 32 buckets and load a file with only 10 records into it: at most you'll have 10 files (again, possibly fewer). Additionally, if the incoming data has enough rows for a particular bucket to exceed the file block size, you'll actually get more than one file for that bucket.

So... what is happening on subsequent inserts/loads is that you are just creating new files, aligned to how the new data is bucketed, that sit alongside the files that are already there. Hive can still benefit from bucketing by lining up more than one file per bucket against the joining table's bucketed data (yes, it may have multiple files for that same bucket, too). If you want as few files as possible (just one for each bucket, provided each bucket's data fits within the block size), then you're right: you'll need to load the contents of this table into another table, possibly using an ITAS or CTAS strategy, as sketched below.
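A minimal sketch of that test and the ITAS-style compaction, assuming a HiveServer2 endpoint reachable via PyHive; the host, database, and table names (clicks_bkt, clicks_stage) are hypothetical, not from the original post:

    # Sketch: watch a bucketed table accumulate files per insert, then compact it.
    # Host, database, and table names are placeholders.
    from pyhive import hive

    conn = hive.Connection(host="hs2.example.com", port=10000, database="demo")
    cur = conn.cursor()

    # Needed on Hive 1.x so inserts honor the bucket definition;
    # Hive 2.x and later always enforce bucketing.
    cur.execute("SET hive.enforce.bucketing=true")

    # 32 buckets, hashed on user_id; an insert writes at most one file
    # per bucket that its data actually touches.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS clicks_bkt (user_id INT, url STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Insert the same small staging table twice. Nothing is merged in place:
    # the second insert's files land alongside the first insert's files, and
    # with only ~10 distinct user_ids you see at most 10 files per insert.
    for _ in range(2):
        cur.execute("INSERT INTO TABLE clicks_bkt SELECT user_id, url FROM clicks_stage")

    # ITAS-style compaction: rewrite everything through one insert so each bucket
    # ends up with as few files as possible (one, if its data fits in a block).
    cur.execute("CREATE TABLE clicks_bkt_compact LIKE clicks_bkt")
    cur.execute("INSERT OVERWRITE TABLE clicks_bkt_compact SELECT * FROM clicks_bkt")

You can compare the before/after file counts from a shell with hdfs dfs -ls -R against the table's warehouse directory.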
09-29-2017
03:20 PM
Hi, Hanu V. Can you please share an example of the attributes or a flow file showing how ExtractText can assign the entire row as an attribute? I have been searching for this for a while.
08-15-2016
05:56 PM
In that scenario there is always going to be something you have to set that is specific to the user. I think the best approach might be to use the REST API to change the value of GetFile's directory property after the user imports the template, setting it to that user's input directory.
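A rough sketch of that REST call with Python's requests, assuming an unsecured NiFi 1.x instance; the base URL, processor id, and directory are placeholders, and it relies on GetFile exposing its directory under the "Input Directory" property name:

    # Sketch: repoint an imported GetFile processor at the user's own directory.
    # Base URL, processor id, and path are placeholders.
    import requests

    NIFI = "http://nifi.example.com:8080/nifi-api"
    PROC_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # GetFile processor from the template
    NEW_DIR = "/data/inbox/alice"                     # the user's input directory

    # Read the processor first: a PUT must echo back the current revision,
    # and the processor has to be stopped before its properties can change.
    entity = requests.get(f"{NIFI}/processors/{PROC_ID}").json()

    update = {
        "revision": entity["revision"],
        "component": {
            "id": PROC_ID,
            "config": {"properties": {"Input Directory": NEW_DIR}},
        },
    }
    requests.put(f"{NIFI}/processors/{PROC_ID}", json=update).raise_for_status()

A secured instance would additionally need an access token in the Authorization header.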
08-25-2016
07:13 PM
It is not working for me. Can you let me know if I'm doing anything wrong? test4 is a table partitioned on lname and stored as ORC; the partition I'm trying to merge has just 2 small files. ALTER TABLE test4 PARTITION (lname='vr') CONCATENATE;