Member since: 08-01-2016
Posts: 14
Kudos Received: 1
Solutions: 0
03-30-2023
12:37 AM
Where can I get the client ID?
02-01-2018
05:19 AM
You can check for the relevant issue in the Hive JIRA.
11-16-2018
01:06 PM
Article content updated to reflect the new provenance implementation recommendation and the change in the JVM garbage collector recommendation.
08-28-2016
03:37 PM
1 Kudo
It sounds like things are working as expected. Please consider a few things that may not be clear to you with regard to the number of underlying files.

First, when you do a subsequent insert (or load) into a (non-bucketed) table with existing data, the contents will NOT be merged into a single file. You can test this by loading the same simple file of 10 or so rows multiple times: on the 2nd and 3rd insert/load you will see an identical 2nd and then 3rd file in the underlying Hive table's HDFS folder.

Second, for a new bucketed table that you add data to, there is no real guarantee that the number of files will match the number of buckets. With bucket hashing occurring on the CLUSTERED BY field, you can end up with fewer files if the data doesn't hash across all buckets. To see that in practice, create a table with 32 buckets and load a file with only 10 records into it: at most you'll have 10 files (again, possibly fewer). Additionally, if the incoming data has enough rows for a particular bucket to exceed the file block size, you'll actually get more than one file for that bucket.

So... what is happening on subsequent inserts/loads is that you are just creating new files, aligned to how the new data is bucketed, that sit alongside the files that are already there. Hive can still benefit from bucketing by lining up more than one file per bucket against the joining table's bucketed data (yes, it may have multiple files for that same bucket, too). If you want as few files as possible (just one for each bucket, provided each bucket's data fits within the block size), then you're right: you'll need to load the contents of this table into another table, possibly using an ITAS or CTAS strategy, as sketched below.
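A minimal sketch of that test and the ITAS-style compaction, assuming a HiveServer2 endpoint reachable via PyHive; the host, database, and table names (clicks_bkt, clicks_stage) are hypothetical, not from the original post:

    # Sketch: watch a bucketed table accumulate files per insert, then compact it.
    # Host, database, and table names are placeholders.
    from pyhive import hive

    conn = hive.Connection(host="hs2.example.com", port=10000, database="demo")
    cur = conn.cursor()

    # Needed on Hive 1.x so inserts honor the bucket definition;
    # Hive 2.x and later always enforce bucketing.
    cur.execute("SET hive.enforce.bucketing=true")

    # 32 buckets, hashed on user_id; an insert writes at most one file
    # per bucket that its data actually touches.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS clicks_bkt (user_id INT, url STRING)
        CLUSTERED BY (user_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Insert the same small staging table twice. Nothing is merged in place:
    # the second insert's files land alongside the first insert's files, and
    # with only ~10 distinct user_ids you see at most 10 files per insert.
    for _ in range(2):
        cur.execute("INSERT INTO TABLE clicks_bkt SELECT user_id, url FROM clicks_stage")

    # ITAS-style compaction: rewrite everything through one insert so each bucket
    # ends up with as few files as possible (one, if its data fits in a block).
    cur.execute("CREATE TABLE clicks_bkt_compact LIKE clicks_bkt")
    cur.execute("INSERT OVERWRITE TABLE clicks_bkt_compact SELECT * FROM clicks_bkt")

You can compare the before/after file counts from a shell with hdfs dfs -ls -R against the table's warehouse directory.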
09-29-2017
03:20 PM
Hi, Hanu V. Can you please share an example of the attributes or a flow file showing how ExtractText can assign the entire row as an attribute? I have been searching for this for a while.
08-15-2016
05:56 PM
In that scenario there is always going to be something you have to set that is specific to the user. I think the best approach might be to use the REST API to change the value of GetFile's directory property after the user imports the template, setting it to that user's input directory.
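A rough sketch of that REST call with Python's requests, assuming an unsecured NiFi 1.x instance; the base URL, processor id, and directory are placeholders, and it relies on GetFile exposing its directory under the "Input Directory" property name:

    # Sketch: repoint an imported GetFile processor at the user's own directory.
    # Base URL, processor id, and path are placeholders.
    import requests

    NIFI = "http://nifi.example.com:8080/nifi-api"
    PROC_ID = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"  # GetFile processor from the template
    NEW_DIR = "/data/inbox/alice"                     # the user's input directory

    # Read the processor first: a PUT must echo back the current revision,
    # and the processor has to be stopped before its properties can change.
    entity = requests.get(f"{NIFI}/processors/{PROC_ID}").json()

    update = {
        "revision": entity["revision"],
        "component": {
            "id": PROC_ID,
            "config": {"properties": {"Input Directory": NEW_DIR}},
        },
    }
    requests.put(f"{NIFI}/processors/{PROC_ID}", json=update).raise_for_status()

A secured instance would additionally need an access token in the Authorization header.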
08-25-2016
07:13 PM
It is not working for me. Can you let me know if I'm doing anything wrong? test4 is a table partitioned on lname and stored as ORC; the partition I'm trying to merge has just 2 small files. ALTER TABLE test4 PARTITION (lname='vr') CONCATENATE;