Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11217 | 04-15-2020 05:01 PM |
| | 7117 | 10-15-2019 08:12 PM |
| | 3103 | 10-12-2019 08:29 PM |
| | 11468 | 09-21-2019 10:04 AM |
| | 4331 | 09-19-2019 07:11 AM |
06-11-2019
04:26 AM
@jingyong zou The issue is with the format of the flowfile passed to the processor. ConvertAvroToJSON accepts only Avro-formatted input, but it looks like you are passing JSON content to the processor, which causes the java.io.IOException: Not a data file error.
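One quick way to check whether your input is really an Avro data file is to look at its first bytes: Avro object container files start with the 4-byte magic `Obj\x01`. A minimal sketch (the file names here are made up for the demo):

```python
# Avro object container files begin with the 4-byte magic b"Obj\x01".
# A JSON payload fails this check, which is exactly the case that makes
# ConvertAvroToJSON raise "Not a data file".

def is_avro_data_file(path):
    with open(path, "rb") as f:
        return f.read(4) == b"Obj\x01"

# Hypothetical sample: a JSON file masquerading as processor input.
with open("sample.json", "wb") as f:
    f.write(b'{"id": 1}')

print(is_avro_data_file("sample.json"))  # False -> this input would fail
```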
06-11-2019
04:08 AM
@Jean Paul Barddal Tez initializes a session when we run a Hive job, and based on the Tez configuration one YARN application is started (as you can see in the Resource Manager). The same Tez session is reused for all the queries we run until the session is closed; in the Tez view you will see the same application ID for every query, but each query gets its own DAG_ID. With MapReduce, resources are not held between queries the way Tez holds them; instead a separate application is launched for each query. - If the answer is helpful to resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂
06-05-2019
03:07 AM
@3nomis The ConvertRecord processor is still running, as you can see from the active thread count (1) at the top-right corner of the processor. If it runs forever (or you are not satisfied with the performance), try splitting the 2 GB file into smaller chunks with a SplitRecord processor, then merge them back with a MergeRecord processor using Defragment as the merge strategy.
06-04-2019
03:04 AM
@Haijin Li Try this and this approach. Also, to test the Regex SerDe functionality, create a new file with the vi editor in your shell, move it to the HDFS directory, and create a table on top of that directory.
06-04-2019
02:58 AM
@Patrick Hochstenbach Yes, a custom processor or custom script is the easier way to do this: compute the array size and add it to the flowfile as an attribute.
06-01-2019
03:57 PM
@Haijin Li Use the Hive Regex SerDe; your matching regex will be (.{22})(.{1})(.{12})(.*)
(.{22}) -> 1st capture group matches 22 characters
(.{1}) -> 2nd matches 1 character
(.{12}) -> 3rd matches 12 characters
(.*) -> 4th capture group matches the rest of the row.
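You can sanity-check the pattern outside Hive before wiring it into the SerDe. A small Python sketch (the sample line is made up; only the column widths matter):

```python
import re

# The fixed-width pattern from the answer: 22 chars, 1 char, 12 chars,
# then the rest of the row as the 4th capture group.
pattern = re.compile(r"(.{22})(.{1})(.{12})(.*)")

# Hypothetical fixed-width record built to match the column widths.
line = "A" * 22 + "B" + "C" * 12 + "rest of the row"
m = pattern.match(line)
print(m.groups())
```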
06-01-2019
03:42 PM
@Prathamesh H Instead of select count(*), use select count(<column_name>) from my_table; this command will display the number of rows in the Hive-HBase table. - If the answer is helpful to resolve the issue, click the Accept button below to close this thread. This will help other community users find answers quickly :-).
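One caveat worth knowing: count(column) counts only non-NULL values, so pick a column that is never NULL (in a Hive-HBase table, the row key is a safe choice). A SQLite sketch of the difference (table and column names are made up for the demo):

```python
import sqlite3

# count(*) counts rows; count(column) skips NULLs in that column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, note TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(1, "a"), (2, None), (3, "c")])

total_rows = conn.execute("SELECT count(*) FROM my_table").fetchone()[0]
non_null_notes = conn.execute("SELECT count(note) FROM my_table").fetchone()[0]
print(total_rows, non_null_notes)  # 3 2
```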
05-30-2019
02:11 AM
@jingyong zou Could you change the split strategy once and then run your select statement again? SET hive.exec.orc.split.strategy=ETL; The available options are BI, ETL, and HYBRID.
05-30-2019
01:59 AM
@Patrick Hochstenbach You can use the SplitJson processor to split the array ($.*) into individual flowfiles; SplitJson adds a fragment.count attribute to each flowfile, which equals the array size. Use an UpdateAttribute processor to rename the attribute, then a MergeContent processor to merge the content back using the Defragment strategy.
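Conceptually, the split step turns one array into N flowfiles, each carrying fragment metadata. A plain-Python stand-in (not the NiFi API; the payload is invented) of what SplitJson produces:

```python
import json

# Split a JSON array into one "flowfile" per element, each carrying
# fragment.index and fragment.count attributes, mirroring SplitJson.
payload = json.loads('[{"id": 1}, {"id": 2}, {"id": 3}]')

flowfiles = [
    {"content": element,
     "attributes": {"fragment.index": i, "fragment.count": len(payload)}}
    for i, element in enumerate(payload)
]

print(flowfiles[0]["attributes"]["fragment.count"])  # 3
```

The fragment.count attribute is what gives you the array size downstream; MergeContent's Defragment strategy uses the same metadata to reassemble the original array.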
05-30-2019
01:41 AM
@OS The FetchFile processor accepts incoming connections, but we need to pass the fully qualified filename to this processor to fetch files from the directory. To get the fully qualified filename, use an ExecuteStreamCommand processor to run shell commands. Refer to this link; I have given a detailed answer there on how to get the fully qualified name in the middle of the flow. Let us know if you have any issues!