Member since: 03-03-2017
Posts: 74
Kudos Received: 9
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2577 | 06-13-2018 12:02 PM
 | 4634 | 11-28-2017 10:32 AM
09-14-2017 09:06 AM
The processor PublishKafka does not support those two additional properties.
09-13-2017 11:54 AM
Hi, I am trying to publish to a Kafka message queue using NiFi and the PublishKafka processor. We are running Kerberos security on our cluster, and the processor times out (see my log, and the connectivity sketch after it). I have the following settings in the processor:

Kafka Brokers: sktudv01hdp01.ccta.dk:2181,sktudv01hdp03.ccta.dk:2181,sktudv01hdp02.ccta.dk:2181
Security Protocol: PLAINTEXT
Kerberos Service Name: kafka/_HOST@CCTA.DK
SSL Context Service: No value set
Topic Name: simonkafka
Delivery Guarantee: Best Effort
Kafka Key: No value set
Key Attribute Encoding: UTF-8 Encoded

In my Ambari Kafka config I have the following:

Kafka Broker host: sktudv01hdp03.ccta.dk
zookeeper.connect: sktudv01hdp01.ccta.dk:2181,sktudv01hdp03.ccta.dk:2181,sktudv01hdp02.ccta.dk:2181
listeners: PLAINTEXT://localhost:6667

The Kafka service itself seems to run without problems in my Hadoop environment. My NiFi log:

[w20960@sktudv01hdf01 nifi]$ tail -f nifi-app.log | grep kafka
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
sasl.kerberos.service.name = kafka/_HOST@CCTA.DK
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
2017-09-13 13:44:34,113 WARN [Timer-Driven Process Thread-6] o.a.k.clients.producer.ProducerConfig The configuration sasl.kerberos.service.name = kafka/_HOST@CCTA.DK was supplied but isn't a known config.
2017-09-13 13:44:34,113 INFO [Timer-Driven Process Thread-6] o.a.kafka.common.utils.AppInfoParser Kafka version : 0.9.0.1
2017-09-13 13:44:34,113 INFO [Timer-Driven Process Thread-6] o.a.kafka.common.utils.AppInfoParser Kafka commitId : 23c69d62a0cabf06
2017-09-13 13:44:39,113 ERROR [Timer-Driven Process Thread-6] o.a.n.p.kafka.pubsub.PublishKafka PublishKafka[id=7a067740-015e-1000-ffff-ffffaeaac0ec] Failed to send all message for StandardFlowFileRecord[uuid=1402ef55-e3db-42e3-901e-28ebcf240224,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1505294789599-13484, container=default, section=172], offset=40948, length=17],offset=0,name=3131919623882367,size=17] to Kafka; routing to failure due to org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.
2017-09-13 13:44:39,113 ERROR [Timer-Driven Process Thread-6] o.a.n.p.kafka.pubsub.PublishKafka
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.
2017-09-13 13:44:39,113 INFO [Timer-Driven Process Thread-6] o.a.kafka.clients.producer.KafkaProducer Closing the Kafka producer with timeoutMillis = 5000 ms.
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
sasl.kerberos.service.name = kafka/_HOST@CCTA.DK
value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
2017-09-13 13:44:39,115 WARN [Timer-Driven Process Thread-6] o.a.k.clients.producer.ProducerConfig The configuration sasl.kerberos.service.name = kafka/_HOST@CCTA.DK was supplied but isn't a known config.
2017-09-13 13:44:39,115 INFO [Timer-Driven Process Thread-6] o.a.kafka.common.utils.AppInfoParser Kafka version : 0.9.0.1
2017-09-13 13:44:39,115 INFO [Timer-Driven Process Thread-6] o.a.kafka.common.utils.AppInfoParser Kafka commitId : 23c69d62a0cabf06
2017-09-13 13:44:44,115 ERROR [Timer-Driven Process Thread-6] o.a.n.p.kafka.pubsub.PublishKafka PublishKafka[id=7a067740-015e-1000-ffff-ffffaeaac0ec] Failed to send all message for StandardFlowFileRecord[uuid=88645193-ca0d-441e-a79b-487870bdf401,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1505294789599-13484, container=default, section=172], offset=41210, length=17],offset=0,name=3131979624448053,size=17] to Kafka; routing to failure due to org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.: org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.
2017-09-13 13:44:44,116 ERROR [Timer-Driven Process Thread-6] o.a.n.p.kafka.pubsub.PublishKafka
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 5000 ms.
2017-09-13 13:44:44,116 INFO [Timer-Driven Process Thread-6] o.a.kafka.clients.producer.KafkaProducer Closing the Kafka producer with timeoutMillis = 5000 ms.
^C
[w20960@sktudv01hdf01 nifi]$
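For reference, here is a minimal shell sketch of how I test whether the broker itself is reachable from the NiFi host. The script path is an assumption for an HDP-style install, and port 6667 is taken from the listeners value above; a timeout here too would suggest the broker/listener configuration rather than the processor.

# Assumption: Kafka CLI tools live under /usr/hdp/current/kafka-broker/bin
# (adjust the path for your install). This sends one test message straight
# to the broker port, bypassing NiFi entirely.
echo "test message" | /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
    --broker-list sktudv01hdp03.ccta.dk:6667 \
    --topic simonkafka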
Labels:
- Apache Kafka
- Apache NiFi
08-18-2017 08:19 AM
@nramanaiah Hi, yes, I found the same; it seems to happen when running Hive through JDBC and ODBC. My problem is that I am running this in the NiFi PutHiveQL processor.
08-17-2017 08:50 AM
Hi, I am doing some ETL with HiveQL.
I am creating new tables based on external Hive tables, but when the external Hive table is empty, no (empty) table is created. How do I create an empty table when there are no records in the external table I am selecting from?
I am running the code from the NiFi HiveQL processor, and NiFi 2.x does not support multiple HQL statements, so a one-liner would be beautiful.
Here is an example:
DROP TABLE IF EXISTS myname_t1;
CREATE TABLE myname_t1 STORED AS ORC LOCATION '/archive/data/myname/T1' AS
select *, INPUT__FILE__NAME ETL_FILENAME from myname;
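For context, here is a minimal shell sketch of the fallback shape I am considering if a one-liner is not possible: create the empty target first so it exists even when the source has no rows, then load whatever rows there are. It is run with the hive CLI, and for brevity it keeps the source storage format and leaves out the ORC conversion and the ETL_FILENAME column from the example above.

# Assumption: the hive CLI is on PATH; table and path names are the ones
# from the example. CREATE TABLE ... LIKE copies the schema of the source
# table, so the target exists (empty) even when myname has no rows; the
# INSERT then loads whatever rows the source has, possibly none.
hive -e "
  DROP TABLE IF EXISTS myname_t1;
  CREATE TABLE myname_t1 LIKE myname LOCATION '/archive/data/myname/T1';
  INSERT OVERWRITE TABLE myname_t1 SELECT * FROM myname;
"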
Labels:
- Apache Hive
06-19-2017 09:23 AM
Hi Matt, we are running HDF 2.1.2 and planning to upgrade ASAP. Thanks for your answer.
06-15-2017 08:15 PM
I am generating my HQL dynamically, but PutHiveQL can only execute one statement at a time, so I have to split my HQL up into separate flowfiles. I do that by splitting the following HQL code on semicolons, which gives me each HQL step in its own flowfile, which I direct to the PutHiveQL processor. My problem is that the flowfiles do not arrive in the order the original HQL steps were written: it tries to create table1_t2 before it creates table1_his and errors out. Can I do something to make sure that my HQL flowfiles come in the right order? (See the shell sketch after the HQL for the run order I need.) My HQL:
CREATE EXTERNAL TABLE IF NOT EXISTS table1
(
col1 STRING,
col2 STRING,
col3 STRING,
col4 STRING
)
COMMENT 'some Data '
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\011'
STORED AS TEXTFILE
location '/datapath';
drop table if exists table1_his;
create table archive_shared_csrp.table1_his STORED AS ORC location '/databank/work/work_ekapital/' as
select cpr_nr
,csrp_koersel_dto
,lag (csrp_koersel_dto,1,date '9999-12-31') over w as ETL_EXPIRATION_DATE
,case when row_number () over w = 1 then 'yes' else 'no' end as ETL_ACTIVE
from table1
window w as (partition by cpr_nr order by csrp_koersel_dto desc)
;
drop table if exists table1_t2;
create table table1_t2 STORED AS ORC location '/datapath/T2' as
select csrp.* ,his.ETL_EXPIRATION_DATE,ETL_ACTIVE,cast(TO_DATE(FROM_UNIXTIME( UNIX_TIMESTAMP() ) ) as string) as ETL_LOAD_DATE, csrp.INPUT__FILE__NAME AS ETL_FILENAME
from table1 csrp
inner join archive_shared_csrp.table1_his his on his.cpr_nr = csrp.cpr_nr and his.csrp_koersel_dto = csrp.csrp_koersel_dto;
drop table if exists table1_t1;
create table table1_t1 STORED AS ORC location '/datapath/T1' as
select *
from table1_t2 csrp_t2
where ETL_EXPIRATION_DATE = '9999-12-31' and ETL_ACTIVE = 'yes';
DROP TABLE IF EXISTS table1_delta;
CREATE EXTERNAL TABLE table1_delta
(
col1 STRING,
col2 STRING,
col3 STRING,
col4 STRING
)
COMMENT 'Data fra CSRP '
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\011'
STORED AS TEXTFILE
location '/datapath/current';
drop table if exists table1_delta_base;
create table table1_delta_base STORED AS ORC location '/datapath/DELTA' as
select *
,cast(TO_DATE(FROM_UNIXTIME( UNIX_TIMESTAMP() ) ) as string) as ETL_LOAD_DATE
,INPUT__FILE__NAME AS ETL_FILENAME
from table1_delta
(Screenshots attached: part of the flow, SplitContent configuration.)
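For reference, a minimal shell sketch (assuming the generated HQL sits in a local file etl.hql, a hypothetical name) of the order the fragments have to run in after splitting on ';': the external table and table1_his must exist before the CTAS for table1_t2 joins against them, which is exactly the ordering that gets lost for me.

# Assumption: etl.hql holds the generated HQL above and every statement ends
# with ';' at the end of a line. csplit writes numbered fragments
# (stmt_00.hql, stmt_01.hql, ...) in the original statement order, which is
# roughly what SplitContent produces as flowfiles.
csplit --quiet --prefix=stmt_ --suffix-format='%02d.hql' etl.hql '/;$/+1' '{*}'

# Running the fragments strictly in that numeric order keeps the dependency
# chain intact: external table, then table1_his, then table1_t2, and so on.
for f in stmt_*.hql; do
    hive -f "$f" || break
done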
Labels:
- Apache NiFi
06-14-2017 01:01 PM
Hi Dennis, thank you very much, that worked.
06-13-2017 09:39 AM
When importing data from Sybase, all (NULL) values are written as the string "null" in the HDFS files. I really want Sqoop to write nothing at all when the source contains (NULL) values. Is that possible? (See the sketch with the null-handling arguments after the full command below.) My command:
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
--driver net.sourceforge.jtds.jdbc.Driver \
--connect jdbc:jtds:sybase://127.0.0.1:4000/BAS_csrp \
--username xxxx \
-P \
--query "select cpr_nr ,
csrp_koersel_dto ,
haendelse_kod ,
nyt_cpr_nr ,
omplart_kod ,
status_kod ,
umynd_kod ,
umynd_ins_koersel_dto ,
umynd_del_koersel_dto ,
pers_nvn ,
flyt_dto ,
adr_aendr_aarsag_kod ,
adr_bskyt_kod ,
adr_art_kod ,
bopael_kom_kod ,
vej_kod ,
tsr_nr ,
co_nvn ,
adr1 ,
vej_nvn ,
hus_nr ,
hus_bogst ,
etage_nr ,
side_nr ,
doer_nr ,
by_nvn ,
post_nr ,
post_by ,
aegtf_fra_dto ,
aegtf_til_dto ,
aegtf_cpr_nr ,
civst_fra_dto ,
civst_til_dto ,
civst_kod ,
doed_kod ,
doed_dto ,
genopliv_kod ,
genopliv_dto ,
kirke_fra_dto ,
kirke_til_dto ,
kirke_skat_kod ,
far_cpr_nr ,
mor_cpr_nr ,
pbs1_fra_dto ,
pbs1_til_dto ,
pbs2_fra_dto ,
pbs2_til_dto ,
udl_adr1 ,
udl_adr2 ,
udl_adr3 ,
udl_adr4 ,
udl_adr5 ,
kontakt_adr1 ,
kontakt_adr2 ,
kontakt_adr3 ,
kontakt_adr4 ,
kontakt_adr5 ,
supp_adr1 ,
supp_adr2 ,
supp_adr3 ,
emailadr ,
mobiltlfnr ,
stabokod ,
motagdto_sag ,
efterbskdto ,
notattkst ,
ibertperskod ,
senr_1 ,
pernr_1 ,
fradto_1 ,
tildto_1 ,
foskod_1 ,
ibertperskod_1 ,
senr_2 ,
pernr_2 ,
fradto_2 ,
tildto_2 ,
foskod_2 ,
ibertperskod_2a ,
senr_3 ,
pernr_3 ,
fradto_3 ,
tildto_3 ,
foskod_3 ,
ibertperskod_3a ,
pers_pernr ,
pers_omvalgdto ,
pers_fradto ,
pers_tildto ,
pers_fosskapct ,
pers_ibertperskod ,
fri_pernr ,
fri_fradto ,
fri_tildto ,
fri_ambifriko ,
fri_ibertperskod ,
uden_ordnnr_1 ,
uden_pernr_1 ,
uden_fradto_1 ,
uden_til_dto_1 ,
uden_ibertperskod_1b ,
uden_notat_tekst_1 ,
uden_ordnnr_2 ,
uden_pernr_2 ,
uden_fradto_2 ,
uden_til_dto_2 ,
uden_ibertperskod_2b ,
uden_notat_tekst_2 ,
uden_ordnnr_3 ,
uden_pernr_3 ,
uden_fradto_3 ,
uden_til_dto_3 ,
uden_ibertperskod_3b ,
uden_notat_tekst_3 ,
pal_ordnnr ,
pal_pernr ,
pal_fradto ,
pal_tildto ,
ibertperskod_2 ,
notattkst_1 ,
pernr_fripens_1 ,
se_nr_fripens_1 ,
policektonr_1 ,
fri_fra_dto_1 ,
fri_til_dto_1 ,
skatfri_1 ,
ibertperkod_fri_1 ,
notat_fri_1 ,
pernr_fripens_2 ,
se_nr_fripens_2 ,
policektonr_2 ,
fri_fra_dto_2 ,
fri_til_dto_2 ,
skatfri_2 ,
ibertperkod_fri_2 ,
notat_fri_2 ,
notattkst_3 ,
pernr_fripens_3 ,
policektonr_3 ,
fri_fra_dto_3 ,
fri_til_dto_3 ,
skatfri_3 ,
ibertperkod_fri_3 ,
notat_fri_3 ,
vandre_fra_dto ,
vandre_til_dto ,
ibertperkod_vandre ,
notat_vandre ,
digital_fra_dto ,
digital_til_dto from csrp_total_bas where CAST(csrp_koersel_dto as DATE)<'2017-04-07' AND \$CONDITIONS" \
--fields-terminated-by '\t' \
--split-by cpr_nr \
--target-dir /tmp/test/sqoop/csrp/
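For reference, a minimal sketch of the same import with Sqoop's --null-string and --null-non-string import arguments, which control what Sqoop writes to HDFS when a source column is NULL. The connection details are illustrative and the column list is shortened for brevity; whether an empty value here actually covers the Sybase (NULL) case is exactly what I am asking.

# Assumption: same driver and connection as above, column list shortened.
# --null-string applies to string columns, --null-non-string to the rest;
# an empty value makes Sqoop write nothing for NULLs instead of "null".
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
  --driver net.sourceforge.jtds.jdbc.Driver \
  --connect jdbc:jtds:sybase://127.0.0.1:4000/BAS_csrp \
  --username xxxx -P \
  --query "select cpr_nr, csrp_koersel_dto from csrp_total_bas where CAST(csrp_koersel_dto as DATE)<'2017-04-07' AND \$CONDITIONS" \
  --null-string '' \
  --null-non-string '' \
  --fields-terminated-by '\t' \
  --split-by cpr_nr \
  --target-dir /tmp/test/sqoop/csrp/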
Labels:
- Apache Sqoop
05-19-2017 08:25 AM
@Greg Keys Thank you for your reply. I am still facing odd behaviour and losing data in the input stream; I think it is related to how writing to the output stream works. When I split and merge without ExecuteScript everything is OK; when I put my script in between, the number of output lines seems random, sometimes fewer and sometimes more than in the original flowfile. I think it is something related to memory in the data stream. Anyway, I will go about doing this with Hive instead, it is easier.
05-19-2017 08:20 AM
@Sindhu Thank you very much.