Member since: 06-08-2017
Posts: 1049
Kudos Received: 518
Solutions: 312
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11217 | 04-15-2020 05:01 PM |
| | 7117 | 10-15-2019 08:12 PM |
| | 3103 | 10-12-2019 08:29 PM |
| | 11468 | 09-21-2019 10:04 AM |
| | 4331 | 09-19-2019 07:11 AM |
06-11-2019
04:26 AM
@jingyong zou The issue is with the format of the flowfile passed to the processor. ConvertAvroToJSON accepts only Avro-formatted input, but it looks like you are passing JSON content to the processor, which causes the java.io.IOException: Not a data file error.
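One quick way to check whether your input is really an Avro data file is to look at its first bytes: Avro object container files start with the 4-byte magic `Obj\x01`. A minimal sketch (the file names here are made up for the demo):

```python
# Avro object container files begin with the 4-byte magic b"Obj\x01".
# A JSON payload fails this check, which is exactly the case that makes
# ConvertAvroToJSON raise "Not a data file".

def is_avro_data_file(path):
    with open(path, "rb") as f:
        return f.read(4) == b"Obj\x01"

# Hypothetical sample: a JSON file masquerading as processor input.
with open("sample.json", "wb") as f:
    f.write(b'{"id": 1}')

print(is_avro_data_file("sample.json"))  # False -> this input would fail
```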
06-11-2019
04:08 AM
@Jean Paul Barddal Tez initializes a session when we run a Hive job, and based on the Tez configuration one YARN application is started (as you can see in the Resource Manager). The same Tez session is reused for all the queries we run until the session is closed; in the Tez view you will see the same application ID for every query, but each query gets its own DAG_ID. With MapReduce, resources are not held between queries the way Tez holds them; instead a separate application is launched for each query. - If the answer is helpful to resolve the issue, log in and click the Accept button below to close this thread. This will help other community users find answers quickly 🙂
06-05-2019
03:07 AM
@3nomis The ConvertRecord processor is still running, as you can see from the active thread count (1) at the top-right corner of the processor. If it runs forever (or you are not satisfied with the performance), try splitting the 2 GB file into smaller chunks with a SplitRecord processor, then merge them back with a MergeRecord processor using Defragment as the merge strategy.
06-04-2019
03:04 AM
@Haijin Li Try this and this approach. Also, to test the Regex SerDe functionality, create a new file with the vi editor in your shell, move it to the HDFS directory, and create a table on top of that directory.
06-04-2019
02:58 AM
@Patrick Hochstenbach Yes, a custom processor or custom script is the easier way to do this: compute the array size and add it to the flowfile as an attribute.
06-01-2019
03:57 PM
@Haijin Li Use the Hive Regex SerDe; your matching regex will be (.{22})(.{1})(.{12})(.*)
(.{22}) -> 1st capture group matches 22 characters
(.{1}) -> 2nd matches 1 character
(.{12}) -> 3rd matches 12 characters
(.*) -> 4th capture group matches the rest of the row.
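You can sanity-check the pattern outside Hive before wiring it into the SerDe. A small Python sketch (the sample line is made up; only the column widths matter):

```python
import re

# The fixed-width pattern from the answer: 22 chars, 1 char, 12 chars,
# then the rest of the row as the 4th capture group.
pattern = re.compile(r"(.{22})(.{1})(.{12})(.*)")

# Hypothetical fixed-width record built to match the column widths.
line = "A" * 22 + "B" + "C" * 12 + "rest of the row"
m = pattern.match(line)
print(m.groups())
```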
06-01-2019
03:42 PM
@Prathamesh H Instead of select count(*), use select count(<column_name>) from my_table; this command will display the number of rows in the Hive-HBase table. - If the answer is helpful to resolve the issue, click the Accept button below to close this thread. This will help other community users find answers quickly :-).
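One caveat worth knowing: count(column) counts only non-NULL values, so pick a column that is never NULL (in a Hive-HBase table, the row key is a safe choice). A SQLite sketch of the difference (table and column names are made up for the demo):

```python
import sqlite3

# count(*) counts rows; count(column) skips NULLs in that column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, note TEXT)")
conn.executemany("INSERT INTO my_table VALUES (?, ?)",
                 [(1, "a"), (2, None), (3, "c")])

total_rows = conn.execute("SELECT count(*) FROM my_table").fetchone()[0]
non_null_notes = conn.execute("SELECT count(note) FROM my_table").fetchone()[0]
print(total_rows, non_null_notes)  # 3 2
```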
05-30-2019
02:11 AM
@jingyong zou Could you change the split strategy once and then run your select statement again? SET hive.exec.orc.split.strategy=ETL; The available options are BI, ETL, and HYBRID.
05-30-2019
01:59 AM
@Patrick Hochstenbach You can use the SplitJson processor to split the array ($.*) into individual flowfiles; SplitJson adds a fragment.count attribute to each flowfile, which equals the array size. Use an UpdateAttribute processor to rename the attribute, then a MergeContent processor to merge the content back using the Defragment strategy.
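Conceptually, the split step turns one array into N flowfiles, each carrying fragment metadata. A plain-Python stand-in (not the NiFi API; the payload is invented) of what SplitJson produces:

```python
import json

# Split a JSON array into one "flowfile" per element, each carrying
# fragment.index and fragment.count attributes, mirroring SplitJson.
payload = json.loads('[{"id": 1}, {"id": 2}, {"id": 3}]')

flowfiles = [
    {"content": element,
     "attributes": {"fragment.index": i, "fragment.count": len(payload)}}
    for i, element in enumerate(payload)
]

print(flowfiles[0]["attributes"]["fragment.count"])  # 3
```

The fragment.count attribute is what gives you the array size downstream; MergeContent's Defragment strategy uses the same metadata to reassemble the original array.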
05-30-2019
01:41 AM
@OS The FetchFile processor accepts incoming connections, but we need to pass the fully qualified filename to this processor to fetch files from the directory. To get the fully qualified filename, use an ExecuteStreamCommand processor to run shell commands. Refer to this link; I have given a detailed answer there on how to get the fully qualified name in the middle of the flow. Let us know if you have any issues!