Member since 06-28-2017
279 Posts
43 Kudos Received
24 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1957 | 12-24-2018 08:34 AM |
| | 5304 | 12-24-2018 08:21 AM |
| | 2159 | 08-23-2018 07:09 AM |
| | 9439 | 08-21-2018 05:50 PM |
| | 5054 | 08-20-2018 10:59 AM |
08-21-2018
05:59 AM
Try /homedir/*/inbox/* as the filename, or maybe better, /homedir as the directory and */inbox/* as the filename.
08-20-2018
05:40 PM
1 Kudo
What have you configured as "docker.trusted.registries"? You will need to configure it according to your setup. In a typical setup you create your own Docker registry and list it as trusted. If you don't already have one, you can use this image to run a registry: https://hub.docker.com/_/registry/
08-20-2018
01:57 PM
Whether the PK should be part of the column family depends. In most cases, if it is just a sequential number without additional information, you will not need it there, because it will be used as the rowkey in HBase. And yes, you will have to list all columns. Actually they don't have the same name: in the Hive definition it is just 'HJMPTS', while in HBase it is 'cf:HJMPTS'. This is important, as you could later add a new column family that also contains a column HJMPTS. The column name without the column family name isn't necessarily unique in HBase. In your case it is unique, as you have been migrating an Oracle table.
08-20-2018
12:12 PM
Yes, you can process this with two ListFile processors. So in /homepath all subdirs belong to a customer? Or will there be subdirs not related to customers that you don't want to scan? And for all customer subdirs you want to scan/process the inbox subdir for new files? Assuming you have only customer dirs, your dir pattern can be: /homedir/*/inbox
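To check which directories a pattern like this would pick up before wiring it into the flow, here is a minimal Python sketch. It is not NiFi code: the sample paths are made up, and reading `*` as "exactly one directory level" is my assumption about what is intended. As far as I know the ListFile filter properties take Java regular expressions rather than shell globs, so the regex form below is probably closer to what you would actually enter, but please verify against your NiFi version.

```python
import re

# Assumed intent of /homedir/*/inbox: exactly one directory level
# (the customer dir) between /homedir and inbox.
INBOX_DIR = re.compile(r"^/homedir/[^/]+/inbox$")

# Hypothetical sample directories, for illustration only.
sample_dirs = [
    "/homedir/customer_A/inbox",
    "/homedir/customer_B/inbox",
    "/homedir/customer_A/outbox",
    "/homedir/customer_A/archive/inbox",
    "/homedir/inbox",
]


def inbox_dirs(dirs):
    """Return only the directories that look like a customer inbox."""
    return [d for d in dirs if INBOX_DIR.match(d)]


matched = inbox_dirs(sample_dirs)
print(matched)
```

If non-customer subdirs exist under the home directory, the `[^/]+` part is where a stricter naming pattern would go.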
08-20-2018
10:59 AM
1 Kudo
Your column mapping is wrong, as stated in the error message. The list in the columns mapping must match the list of columns in your external table definition. You simply list all columns in the form "columnFamilyName:columnName". You seem to have only one column family, 'cf', and I assume the Oracle columns have all been migrated under the same column names within that column family. In that case you will need the mapping to be: "hbase.columns.mapping"="cf:HJMPTS, cf:CREATEDTS, cf:MODIFIEDDTS, ... , cf:PROPTS, cf:P_ISHOMEADDRESS"
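With a long column list it is easy to miss one by hand, so here is a small Python sketch that generates the mapping string from the Hive column names. The helper name `build_columns_mapping` is hypothetical, the list contains only the columns named above (the elided ones stay elided), and it assumes every column lives in the single family 'cf'. It joins entries with a plain comma and no spaces, which is a choice of this sketch, and it does not handle the rowkey column, which as far as I know is mapped with the special `:key` entry.

```python
def build_columns_mapping(hive_columns, family="cf"):
    """Build the value for "hbase.columns.mapping" from Hive column names.

    Each Hive column NAME becomes FAMILY:NAME, in the same order as the
    columns appear in the external table definition.
    """
    return ",".join(f"{family}:{name}" for name in hive_columns)


# Only the columns named in the post; this is not the full table.
hive_columns = ["HJMPTS", "CREATEDTS", "MODIFIEDDTS", "PROPTS", "P_ISHOMEADDRESS"]

mapping = build_columns_mapping(hive_columns)
print(mapping)
```

The point is only that the mapping has one entry per Hive column, in the same order, each prefixed with its column family.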
08-20-2018
09:31 AM
You can use a regular expression for the filename in the ListFile processor, so something like "/homepath/customer_[ABC]/*" should be possible. But you will need a pattern that distinguishes customer dirs from other dirs and that will also match potential additional customer dirs.
08-16-2018
10:04 AM
It defines a column name in the filter condition. So in your case it means nothing other than the column with the name Age.
08-15-2018
07:32 PM
Have a look here: https://www.crackinghadoop.com/email-spark-dataset-html-format/ It focuses on sending datasets, but you should be able to strip the example down to your needs.
08-15-2018
10:00 AM
I am not completely sure, but I think I came across information that the LIMIT clause causes a repartitioning, so you get a significant performance impact by using it. Instead you should use TABLESAMPLE, or rewrite the query if it matters which rows you get (and not only how many).
08-15-2018
06:57 AM
Are you running the same query from both clients, connected to the same server (Hive)? Depending on the SQL you are running, a table name in front of a column name can be required, so if the queries differ, that might be the reason.