Support Questions

sachin_32 · ‎03-01-2022

I have the Flow File which has Duplicate Column i want to pick the Column threw index Number,
is it possible to do with Query Record or any Processor

Note: Column Will change with every new Flow File coming

araujo · ‎03-02-2022

Here's the flow template for those who have older nifi versions

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

View solution in original post

araujo · ‎03-01-2022

Hi, @sachin_32 ,

I guess this is coming as a CSV file, right?

You can achieve what you want with the following approach:

Configure your CSV Reader to ignore and skip the header line (if any)
Configure your CSV Read to use the following schema:

{
  "type": "record",
  "name": "SensorReading",
  "namespace": "com.cloudera.example",
  "doc": "This is a sample sensor reading",
  "fields": [
    { "name": "c1", "type": "string" },
    { "name": "c2", "type": "string" },
    { "name": "c3", "type": "string" }
  ]
}

Ensure you use a schema with the exact number of columns that your input file has.

In your QueryRecord you can then refer to the columns as c1, c2, etc...:

select c1, c2, c3
from flowfile

Cheers,

André

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

sachin_32 · ‎03-01-2022

Thanks for your Suggestion but in this case i don't have any Exact Number of columns it will keep changing with incoming flow file it completely depends on the Flowfile And the scenario is i have few columns which can directly pick by giving the name of column but for some column which is coming more than one for that i need to setup like indexing and it's around 10-15 files which has this kind of issues so can you suggest for that ?

araujo · ‎03-01-2022

The number of columns in the schema doesn't actually need to be exact if you're happy to ignore the ones after the last one specified in the schema.

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

sachin_32 · ‎03-01-2022

ok

araujo · ‎03-02-2022

@sachin_32 ,

Here one different attempt. You can send your CSV flowfile to a ReplaceText processor with the following configuration:

The Search Value is the following regular expression:

(?s)^([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*),([^,\n]*)(.*$)

And the Replacement Value is:

$1,$2,$3,$4,col_a$6

Each capture group ([^,\n]*) will match the name of one column. If you want to keep the name of that column you just replace it with $x, where x is the position of the column.

If you want to replace the column with another name, e.g. col_a, you just type the name of the new column name in the replacement instead.

The last capture group (.*), will match the remaining of the first line. This way you don't need to match every single column, only the ones up to the position you want to replace.

As an example, for this input:

A,B,C,D,A
1,2,3,4,5
2,3,4,5,6

The above replacement will generate this output:

A,B,C,D,col_a
1,2,3,4,5
2,3,4,5,6

HTH,

André

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

sachin_32 · ‎03-02-2022

Hello @araujo Thank you so much for your Help
Last Question if I have my attribute like :-

INDEX

1,5,3,10
now what will be replacement value and in this case i have around 40 columns I want to rename only those which is present in my index Attribute and i want my column like
1,B,3,D,5---10,f,-- till 40
is there any way so that it don't depend on my all column name it's just replace the name as per the element in my INDEX attribute as it is and keep all columns without changing name??

araujo · ‎03-02-2022

@Sachin

Here's another attempt at this (hopefully the last one 🙂 😞

I created the attached example that gets a flowfile and aattribute INDEX as you described above.

It then uses an UpdateAttribute to convert the INDEX attribute into a FILTER that we can use in the QueryRecord processor.

The QueryRecord process uses a fixed schema that has 100 columns. It's ok if your CSV has less columns. If the CSV can have more than 100 columns you need to update the schema to the maximum of columns you expect to receive in any CSV.

The output is a flowfile with the exact columns that were specified in the INDEX attribute.

Hope this helps.

Cheers,

Andre

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

araujo · ‎03-02-2022

Here's the flow template for those who have older nifi versions

--
Was your question answered? Please take some time to click on "Accept as Solution" below this post.
If you find a reply useful, say thanks by clicking on the thumbs up button.

sachin_32 · ‎03-02-2022

Thanks for the Help 🙂

Cloudera Community

Support Questions

Pick Column Based on Index Number

Phoenix Index Basics - Part 1

How to pick number of executors , cores for each e...

Hive is rounding the number columns automatically.

Number of columns each table has in Hive

Art of Phoenix Secondary Indexes

Phoenix Index Lifecycle

Hbase indexing to Solr with HDP Search

How to change column Type in SparkSQL?

Indexing Oracle tables into Apache Solr

Solr Indexing the database tables :