About arald

arald · ‎08-21-2018

Try /homedir/*/inbox/* as the filename, or perhaps better, /homedir as the directory and */inbox/* as the filename.

arald · ‎08-20-2018

What have you configured as "docker.trusted.registries"? You will need to configure it according to your setup. In a typical setup you create your own Docker registry and list it as trusted. If you don't already have one, you can use this image to run a registry: https://hub.docker.com/_/registry/

arald · ‎08-20-2018

Whether the PK should be part of the column family depends on your data. In most cases, if it is just a sequential number without additional information, you will not need it as a column; it will be used as the rowkey in HBase. And yes, you will have to list all columns. They don't actually have the same name: in the Hive definition it is just 'HJMPTS', while in HBase it is 'cf:HJMPTS'. This is important, because you could now add a new column family that also contains a column HJMPTS. The column name without the column family name isn't necessarily unique in HBase. In your case it is unique, as you have been migrating an Oracle table.
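
To illustrate why the qualifier alone is not unique, here is a minimal Python sketch that models one HBase row as a dictionary keyed by (column family, qualifier). It is only a toy model, not HBase client code; the second family name `cf2` and all cell values are hypothetical.

```python
# Toy model of a single HBase row: cells are addressed by (family, qualifier).
# 'cf2' and all values are made up for illustration.
row = {
    ("cf", "HJMPTS"): "1",
    ("cf", "CREATEDTS"): "2018-08-20",
    ("cf2", "HJMPTS"): "42",  # same qualifier, different column family
}


def families_for(row, qualifier):
    """Return the sorted column families that contain the given qualifier."""
    return sorted(family for (family, name) in row if name == qualifier)


def full_names(row):
    """Return the fully qualified 'family:qualifier' names of all cells."""
    return sorted(f"{family}:{name}" for (family, name) in row)
```

In this model `families_for(row, "HJMPTS")` finds the qualifier in both families, which is why a mapping has to name the family as well.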

arald · ‎08-20-2018

Yes, you can process this with two ListFile processors. So in /homepath all subdirs belong to a customer? Or will there be subdirs not related to customers that you don't want to scan? And for all customer subdirs you want to scan/process the inbox subdir for new files? Assuming you have only customer dirs, your dir pattern can be: /homedir/*/inbox
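
If the processor setting expects a regular expression rather than a shell-style glob (worth checking in the processor documentation), the pattern above can be written as a regex. The Python sketch below only demonstrates which paths such a pattern would select; the customer directory names used with it are hypothetical.

```python
import re

# Regex counterpart of the glob /homedir/*/inbox:
# exactly one directory level between /homedir and inbox.
INBOX_DIR = re.compile(r"/homedir/[^/]+/inbox")


def is_customer_inbox(path):
    """True if path is the inbox directory directly under one /homedir subdir."""
    return INBOX_DIR.fullmatch(path) is not None


def inbox_dirs(paths):
    """Keep only the paths that are customer inbox directories."""
    return [p for p in paths if is_customer_inbox(p)]
```

For example, `/homedir/customer_A/inbox` is selected, while `/homedir/customer_A/outbox` and a deeper path such as `/homedir/a/b/inbox` are not.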

arald · ‎08-20-2018

Your column mapping is wrong, as stated in the error message. The list in the column mapping must match the list of columns in your external table definition. You simply list all columns in the form "columnFamilyName:columnName". You seem to have only one column family, 'cf', and I assume the Oracle columns have all been migrated to columns of the same name in the column family cf. In that case the mapping needs to be: "hbase.columns.mapping"="cf:HJMPTS, cf:CREATEDTS, cf:MODIFIEDDTS, ... , cf:PROPTS, cf:P_ISHOMEADDRESS"
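
With many columns, writing the mapping by hand is error-prone. The following Python sketch generates the string from the list of Hive column names. The function name and the optional `rowkey_column` parameter are hypothetical helpers, not part of Hive; the `:key` entry for the rowkey is included on the assumption that one Hive column holds the HBase rowkey. Entries are joined without spaces as a cautious choice, since whitespace in this property may be taken as part of a column name.

```python
def build_hbase_columns_mapping(hive_columns, column_family="cf", rowkey_column=None):
    """Build a value for "hbase.columns.mapping" from Hive column names.

    hive_columns  -- column names in the order of the Hive table definition
    column_family -- HBase column family holding all non-key columns (assumed single)
    rowkey_column -- optional Hive column that maps to the HBase rowkey (":key")
    """
    entries = []
    for name in hive_columns:
        if name == rowkey_column:
            entries.append(":key")  # rowkey has no column family
        else:
            entries.append(f"{column_family}:{name}")
    return ",".join(entries)
```

The result has exactly one entry per Hive column, in the same order as the table definition, which is the property the error message complains about.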

arald · ‎08-20-2018

You can use a regular expression for the filename in the ListFile processor, so something like "/homepath/customer_[ABC]/*" should be possible. But you will need a pattern that distinguishes customer dirs from other dirs and that will also match potential additional customer dirs.
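
As a rough illustration of that caveat, the Python sketch below compares a fixed character class with a more open pattern. The directory names `customer_D` and `archive` are hypothetical, and whether the processor applies the pattern to the full path or only to part of it is an assumption to verify.

```python
import re

# Matches files only for the three customers known today.
FIXED = re.compile(r"/homepath/customer_[ABC]/[^/]+")

# Matches files for any directory that follows the customer_ naming
# convention, so newly added customers are picked up as well.
OPEN = re.compile(r"/homepath/customer_[^/]+/[^/]+")


def matches(pattern, path):
    """True if the whole path matches the compiled pattern."""
    return pattern.fullmatch(path) is not None
```

With these patterns a file under `/homepath/customer_D/` is missed by `FIXED` but found by `OPEN`, while a file under `/homepath/archive/` is rejected by both.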

arald · ‎08-16-2018

It defines a column name in the filter condition. So in your case it means nothing other than the column with the name Age.

arald · ‎08-15-2018

Have a look here: https://www.crackinghadoop.com/email-spark-dataset-html-format/ It focuses on sending datasets, but you should be able to strip the example down to your needs.

arald · ‎08-15-2018

I am not completely sure, but I think I came across information that the LIMIT statement causes a repartitioning, so using it has a significant performance impact. Instead you should use TABLESAMPLE, or rewrite the query if it matters which rows you get (and not only how many).

arald · ‎08-15-2018

Are you running the same query from both clients, connected to the same server (Hive)? Depending on the SQL you are running, a table name in front of a column name can be required, so if the queries differ, that might be the reason.

Last Visited	‎08-19-2019 03:23 AM

Member Since	‎06-28-2017 06:04 AM
Posts	279
Kudos received	43

Cloudera Community

Re: secured nifi cluster must import a cert to bro...

Re: Nifi Epoch conversion not working?

Re: Scenario when we store data in HBase and acce...

Re: Setup environment variables in NiFi cluster se...

Re: CREATE EXTERNAL HIVE TABLE on existing HBASE T...

Re: how to read multiple sub folders files thru li...

Re: Not able to run docker container on yarn even ...

Re: CREATE EXTERNAL HIVE TABLE on existing HBASE T...

Re: how to read multiple sub folders files thru li...

Re: CREATE EXTERNAL HIVE TABLE on existing HBASE T...

Re: how to read multiple sub folders files thru li...

Re: What is the significance of $ in this filter s...

Re: need help on sending an email using spark scal...

Re: SparkSQL: Hive sub-query leads to full table s...

Re: Remove qualifier from column name