Member since: 05-20-2016
Posts: 13
Kudos Received: 4
Solutions: 0
09-20-2017
08:28 PM
Hi, is it possible to collect logs from different Windows and Linux servers by using NiFi and MiNiFi? If not, what is the best practice? The logs are in different paths. Thanks in advance,
Labels:
- Apache MiNiFi
- Apache NiFi
04-21-2017
08:29 PM
Hi Josh, one more question: is it possible to join Phoenix and Hive tables by using Spark SQL?
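A minimal sketch of what such a join could look like, assuming the phoenix-spark connector is on the classpath and the Spark 2.x SparkSession API (on HDP 2.4's Spark 1.6 the HiveContext equivalent would apply); the table names, join column, and ZooKeeper URL are placeholders borrowed from the other posts, not a tested setup:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PhoenixHiveJoin {
    public static void main(String[] args) {
        // Spark session with Hive support so Hive tables are visible to Spark SQL
        SparkSession spark = SparkSession.builder()
                .appName("PhoenixHiveJoin")
                .enableHiveSupport()
                .getOrCreate();

        // Load the Phoenix table as a DataFrame via the phoenix-spark connector
        Dataset<Row> phoenixDf = spark.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TB.TABLE1")                              // placeholder Phoenix table
                .option("zkUrl", "lat01bigdatahwdn:2181:/hbase-unsecure")  // placeholder quorum
                .load();
        phoenixDf.createOrReplaceTempView("phoenix_table1");

        // Join the Hive table and the Phoenix-backed temp view in one Spark SQL statement
        Dataset<Row> joined = spark.sql(
                "SELECT h.*, p.COL5 FROM ODS.HIVE_TABLE_NAME_ORC h "
              + "JOIN phoenix_table1 p ON h.COL3 = p.COL3");
        joined.show();
    }
}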
04-03-2017
04:51 PM
Hi all, The environment is HDP 2.4. Sqoop transfers all rows from the RDBMS to HDFS in approximately 1 minute, while CsvBulkLoadTool takes 9 minutes to upsert the same rows into the table. There is no secondary index. CSV import path:
hdfs dfs -ls /user/xxsqoop/TABLE1_NC2
Found 5 items
-rw-r--r-- 3 bigdata hdfs 0 2017-04-03 19:13 /user/xxsqoop/TABLE1_NC2/_SUCCESS
-rw-r--r-- 3 bigdata hdfs 365,036,758 2017-04-03 19:13 /user/xxsqoop/TABLE1_NC2/part-m-00000
-rw-r--r-- 3 bigdata hdfs 188,504,177 2017-04-03 19:12 /user/xxsqoop/TABLE1_NC2/part-m-00001
-rw-r--r-- 3 bigdata hdfs 340,190,219 2017-04-03 19:13 /user/xxsqoop/TABLE1_NC2/part-m-00002
-rw-r--r-- 3 bigdata hdfs 256,850,726 2017-04-03 19:13 /user/xxsqoop/TABLE1_NC2/part-m-00003
Phoenix import command:
HADOOP_CLASSPATH=/etc/hbase/conf:$(hbase mapredcp) hadoop jar
/usr/hdp/2.4.0.0-169/phoenix/phoenix-4.4.0.2.4.0.0-169-client.jar
org.apache.phoenix.mapreduce.CsvBulkLoadTool -Dmapreduce.job.reduces=4
--table TB.TABLE1 --input /user/xxsqoop/TABLE1_NC2 --zookeeper lat01bigdatahwdn:2181:/hbase-unsecure --delimiter '^A'
...
...
17/04/03 19:24:08 INFO mapreduce.HFileOutputFormat2: Configuring 1 reduce partitions to match current region count
Phoenix table DDL:
CREATE TABLE TB.TABLE1
(
COL1 DATE NOT NULL
,COL2 SMALLINT NOT NULL
,COL3 INTEGER NOT NULL
,COL4 SMALLINT NOT NULL
,COL5 VARCHAR(8)
...
...
,CONSTRAINT pk PRIMARY KEY (COL1,COL2,COL3,COL4)
)
DATA_BLOCK_ENCODING='FAST_DIFF', TTL=604800, COMPRESSION='SNAPPY';
CsvBulkLoadTool always runs with a single reducer, which makes the CSV import very slow, even though I set -Dmapreduce.job.reduces=4 (the log above shows it configuring 1 reduce partition to match the current region count). I expected the import to get 4 reduce tasks.
Labels:
- Apache Phoenix
03-22-2017
05:39 PM
1 Kudo
Hi all, We have tried to connect to Phoenix (on HDP 2.4) from WebSphere Application Server (version 8.5.5.10, a JEE6 container) using both the Phoenix JDBC thin and thick clients. The same error occurred for both:
ClassCastException: org.apache.phoenix.queryserver.client.Driver incompatible with javax.sql.ConnectionPoolDataSource
ClassCastException: org.apache.phoenix.jdbc.PhoenixDriver incompatible with javax.sql.ConnectionPoolDataSource
What should we do? Thanks,
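Both driver classes implement java.sql.Driver rather than javax.sql.ConnectionPoolDataSource, which is why the cast to a pooled data source fails. A minimal sketch of obtaining a connection directly through DriverManager with the thick driver, bypassing WebSphere's pooled data source; the ZooKeeper quorum and table name are placeholders:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixConnectTest {
    public static void main(String[] args) throws Exception {
        // The thick client registers org.apache.phoenix.jdbc.PhoenixDriver as a java.sql.Driver
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

        // jdbc:phoenix:<zk quorum>:<port>:<znode>  -- placeholder host and znode
        try (Connection conn = DriverManager.getConnection(
                "jdbc:phoenix:lat01bigdatahwdn:2181:/hbase-unsecure");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM TB.TABLE1")) {
            while (rs.next()) {
                System.out.println("row count = " + rs.getLong(1));
            }
        }
    }
}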
Labels:
- Apache Phoenix
03-21-2017
05:42 PM
Hi all, what is wrong here? Environment: HDP 2.4, Phoenix.
CREATE TABLE TB.TABLE1
(
COL1 VARCHAR(3) NOT NULL
,COL2 VARCHAR(20)
,CONSTRAINT pk PRIMARY KEY (COL1)
)
SELECT T.* FROM (
SELECT * FROM TB.TABLE1
) T
SQL Error [1001] [42I01]: ERROR 1001 (42I01): Undefined column family. familyName=T.null
org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=T.null
Thanks,
Labels:
- Apache Phoenix
03-14-2017
03:22 PM
1 Kudo
Hi all, We have to ingest a very big table from an RDBMS into Phoenix. According to the following thread, it is not possible to go directly from an RDBMS to Phoenix by using Sqoop: https://community.hortonworks.com/questions/41848/sqoop-import-to-a-phoenix-table.html The Sqoop JIRA for Sqoop-Phoenix integration, SQOOP-2649, is targeted at Sqoop 1.4.7: https://issues.apache.org/jira/browse/SQOOP-2649 The RDBMS table contains not only varchar columns but also some numeric and date fields. As far as I understand, HBase serialization differs from Phoenix serialization, so some numeric data types written by HBase cannot be read correctly through Phoenix JDBC.
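A small illustration of that serialization mismatch, assuming Phoenix 4.5+ where the type classes live under org.apache.phoenix.schema.types; it only shows that the two encodings of the same integer differ and is not part of any ingestion path:
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.phoenix.schema.types.PInteger;

public class EncodingCompare {
    public static void main(String[] args) {
        int value = 42;

        // Plain HBase encoding: big-endian two's complement
        byte[] hbaseBytes = Bytes.toBytes(value);

        // Phoenix encoding: the sign bit is flipped so values sort correctly as unsigned bytes
        byte[] phoenixBytes = PInteger.INSTANCE.toBytes(value);

        System.out.println("HBase   : " + Bytes.toStringBinary(hbaseBytes));
        System.out.println("Phoenix : " + Bytes.toStringBinary(phoenixBytes));
        // The two byte arrays differ, which is why a Phoenix table/view over
        // HBase-written numeric cells can raise "ERROR 201 (22000): Illegal data".
    }
}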
I found a JDBC driver somewhere on the internet and used it in the following experiments: phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar (50,118 KB). I tried two different methods to ingest data from the RDBMS into Phoenix, but did not succeed with either.
Experiment 1
1.1. Create the HBase table and the Phoenix table/view
1.2. Transfer data from the RDBMS to HBase
1.3. Read the HBase table from Phoenix
1.1. Create the HBase table and the Phoenix table/view
hbase > create 'ODS.HBASE_TABLE', 'defcolfam'
CREATE TABLE ODS.HBASE_TABLE (
"defcolfam.COL1" DATE NOT NULL
, "defcolfam.COL2" INTEGER NOT NULL
, "defcolfam.COL3" SMALLINT NOT NULL
, "defcolfam.COL4" SMALLINT NOT NULL
, "defcolfam.COL5" VARCHAR(8)
, "defcolfam.COL6" VARCHAR(3)
, "defcolfam.COL7" INTEGER
, CONSTRAINT pk PRIMARY KEY ("defcolfam.COL1","defcolfam.COL2","defcolfam.COL3","defcolfam.COL4")) ;
1.2. Transfer data from the RDBMS to HBase:
sqoop import -connect jdbc:oracle:thin:@111.11.1.111:3043:DBNAME -username XXX -password XXXXXX \
--query "SELECT COL1,COL2,COL3,COL4,COL5,COL6,COL7 FROM RDBMS_TABLE WHERE COL7 ='99995' AND \$CONDITIONS" \
--hbase-table 'ODS.HBASE_TABLE' --column-family 'defcolfam' \
--split-by COL1
1.3. Read the HBase table from Phoenix, with the Phoenix JDBC driver or with ./sqlline.py:
SELECT * FROM ODS.HBASE_TABLE;
0: jdbc:phoenix:localhost> SELECT * FROM ODS.HBASE_TABLE
java.lang.RuntimeException: java.sql.SQLException: ERROR 201 (22000): Illegal data. Expected length of at least 48 bytes, but had 28
at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
at sqlline.TableOutputFormat.print(TableOutputFormat.java:33)
at sqlline.SqlLine.print(SqlLine.java:1653)
Numeric fields also look different from the HBase shell.
Reason: HBase and Phoenix encode floats, integers and some other data types differently. https://community.hortonworks.com/questions/15381/created-phoenix-view-to-map-existing-hbase-table-b.html
Experiment 2
2.1. Create the Hive table
2.2. Transfer data from the RDBMS to the Hive table
2.3. Transfer from the Hive table to Phoenix by using Sqoop, with the same Phoenix JDBC driver (phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar, 50,118 KB)
2.1. Create the Hive table. Hive table definition:
CREATE TABLE ODS.HIVE_TABLE_NAME_ORC (
COL1 DATE
,COL2 INT
,COL3 SMALLINT
,COL4 DOUBLE
,COL5 VARCHAR(8)
,COL6 VARCHAR(3)
,COL7 INT
) STORED as ORCFILE;
2.2. Transfer data from the RDBMS to the Hive table:
sqoop import -connect jdbc:oracle:thin:@111.11.1.111:3043:DBNAME -username XXX -password XXXXXX \
--query "SELECT COL1,COL2,COL3,COL4,COL5,COL6,COL7 FROM RDBMS_TABLE_NAME WHERE COL1 = '9995' AND \$CONDITIONS" \
--split-by COL1 \
--hcatalog-table HIVE_TABLE_NAME_ORC \
--hcatalog-database "ODS"
2.3. Transfer from the Hive table to Phoenix by using the Phoenix JDBC driver (phoenix-4.3.0-clabs-phoenix-1.0.0-SNAPSHOT-client.jar, 50,118 KB). Phoenix table definition:
CREATE TABLE ODS.PHOENIX_TABLE_NAME
(
COL1 DATE NOT NULL
,COL2 INTEGER NOT NULL
,COL3 SMALLINT NOT NULL
,COL4 SMALLINT NOT NULL
,COL5 VARCHAR(8)
,COL6 VARCHAR(3)
,COL7 INTEGER
,CONSTRAINT pk PRIMARY KEY (COL1,COL2,COL3,COL4)
)
sqoop export --connect jdbc:phoenix:lat01bigdatahwdn:2181:/hbase-unsecure --driver org.apache.phoenix.jdbc.PhoenixDriver \
-username none -password none --table PHOENIX_TABLE_NAME --hcatalog-table HIVE_TABLE_NAME_ORC --hcatalog-database "ODS"
Sqoop generates the following SQL statement,
17/03/13 18:41:25 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM ODS.PHOENIX_TABLE_NAME AS t WHERE 1=0
and the following error occurs:
org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=T.null
at org.apache.phoenix.schema.PTableImpl.getColumnFamily(PTableImpl.java:747)
On the ./sqlline command line I get the same error:
SELECT t.* FROM TB.HAREKET_FROM_HIVE AS t WHERE 1=0
0: jdbc:phoenix:localhost> SELECT t.* FROM TB.HAREKET_FROM_HIVE AS t WHERE 1=0;
Error: ERROR 1001 (42I01): Undefined column family. familyName=T.null (state=42I01,code=1001)
org.apache.phoenix.schema.ColumnFamilyNotFoundException: ERROR 1001 (42I01): Undefined column family. familyName=T.null
Solution 3. Am I doing something wrong in Experiment 1 or Experiment 2?
Solution 4. Should we use Kafka? Some producers could read from the RDBMS in parallel, like Sqoop does, while consumer threads read from Kafka and upsert into Phoenix (see the sketch after this list). Does that make sense?
Solution 5. From the RDBMS to CSV on HDFS by using Sqoop, and then import the CSV into Phoenix?
Solution 6. Could NiFi and Kafka be used together in some way?
Solution 7. Any other solutions?
What would be the best practice and the fastest solution for such an ingestion?
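A minimal sketch of the consumer side of Solution 4, assuming the new Kafka consumer API (0.9+) and the Phoenix thick JDBC driver; the topic name, CSV message format, column list, and connection URL are placeholders rather than a tested setup:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToPhoenix {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "lat01bigdatahwdn:6667");   // placeholder broker
        props.put("group.id", "phoenix-upsert");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection conn = DriverManager.getConnection(
                     "jdbc:phoenix:lat01bigdatahwdn:2181:/hbase-unsecure")) {
            consumer.subscribe(Collections.singletonList("rdbms-rows"));   // placeholder topic
            conn.setAutoCommit(false);

            // Placeholder column list; Phoenix uses UPSERT rather than INSERT
            PreparedStatement upsert = conn.prepareStatement(
                    "UPSERT INTO ODS.PHOENIX_TABLE_NAME (COL1, COL2, COL3, COL4, COL5) "
                  + "VALUES (?, ?, ?, ?, ?)");

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000L);
                for (ConsumerRecord<String, String> record : records) {
                    String[] cols = record.value().split(",");            // assumes CSV messages
                    upsert.setDate(1, java.sql.Date.valueOf(cols[0]));    // assumes yyyy-MM-dd
                    upsert.setShort(2, Short.parseShort(cols[1]));
                    upsert.setInt(3, Integer.parseInt(cols[2]));
                    upsert.setShort(4, Short.parseShort(cols[3]));
                    upsert.setString(5, cols[4]);
                    upsert.executeUpdate();
                }
                conn.commit();   // Phoenix buffers the mutations until commit
            }
        }
    }
}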
Labels:
- Apache HBase
- Apache Sqoop
03-09-2017
01:33 PM
Is it possible to join Phoenix and Hive tables in one SQL statement? We are planning our ODS layer on Phoenix because of CRUD operations, and our Data Warehouse layer on Hive (insert-only operations). For example:
TableHive is in Hive, TablePhoenix is in Phoenix.
Select * from TableHive H, TablePhoenix P where H.id = P.id;
Is that possible? Thanks in advance,
Tags:
- Data Processing
- Hive
- join
- Phoenix
Labels:
- Apache Hive
- Apache Phoenix
03-03-2017
07:45 PM
Hi all, We want to write a simple Kafka producer on our HDP 2.4 cluster (1 master, 3 data nodes). The for loop runs, but the messages are not sent to the Kafka topic. I have tried several different values for the bootstrap.servers parameter, but it is still not working, and there is no ERROR or WARNING about it in the log files. The console producer and consumer are working:
../bin/kafka-console-producer.sh --broker-list lat01bigdatahwdn:6667 --topic testtopic
../bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic testtopic --from-beginning
With props.put("bootstrap.servers", "localhost:9092"); the producer works on MS Windows. The code, server.properties, and producer.xml are attached: server-properties-part-01.png server-properties-part-02.png producer-xml.png
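For comparison, a minimal standalone producer sketch assuming the new producer API that ships with HDP 2.4's Kafka; the broker address and topic are taken from the console commands above. A common cause of "nothing arrives and no error is logged" is that the program exits before the asynchronous sends are flushed, so flush()/close() are called explicitly:
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // HDP brokers listen on 6667 by default, not 9092
        props.put("bootstrap.servers", "lat01bigdatahwdn:6667");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Producer<String, String> producer = new KafkaProducer<>(props);
        for (int i = 0; i < 10; i++) {
            // send() is asynchronous; records are buffered and sent in the background
            producer.send(new ProducerRecord<>("testtopic", Integer.toString(i), "message-" + i));
        }
        // Without flush()/close(), a short-lived program can exit before anything is delivered
        producer.flush();
        producer.close();
    }
}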
Labels:
- Apache Kafka
02-27-2017
12:07 PM
We have to read a text file and transfer its rows to Kafka. The TailFile scheduling is as follows: Scheduling Strategy => Timer driven, Run Schedule => 10 sec (we also tried 1 min). When we stop and start the TailFile processor it works fine, but NiFi does not schedule the TailFile job (maybe all processors) by itself. Where should this be controlled?
Tags:
- Data Processing
- NiFi
Labels:
- Apache NiFi
02-25-2017
07:04 PM
NiFi is running on host nifihost (host1). Some log text files are stored on texfilehost (host2). These are different hosts, and TailFile only works on the local system (nifihost). Is it possible to ingest data from the files on the other host (texfilehost / host2) with NiFi?
Labels:
- Apache NiFi
02-24-2017
04:50 PM
1 Kudo
Our use case: data from some mission-critical applications arrives in text files; this is not click-stream data or anything like that.
We have to capture every row without losing data (see the producer-settings sketch after this post).
Initially, approximately 15,000,000 rows per day are expected, roughly 30,000 rows/minute. Somehow we have to use Kafka to store the data; some consumers then take the data from the Kafka topics and write it to HBase or Phoenix. That part is clear to us. The most important thing is that every row in these text files must be read, no matter what.
Question 1. Which solution is the best practice for that?
1. Flume & Kafka?
2. Spark Streaming & Kafka?
3. Only Spark Streaming?
4. Storm & Kafka?
5. Flume --> HBase or Phoenix?
6. Any other solutions?
Question 2. Can we use the best-practice solution with NiFi?
Thanks in advance,
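Whichever of the options above feeds Kafka, the producer-side delivery settings are what determine whether rows can be lost. A minimal sketch of those settings, shown as a standalone Java producer with a placeholder broker and topic; it illustrates the standard configs only and is not a recommendation of any one architecture:
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ReliableProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "lat01bigdatahwdn:6667");   // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Durability settings: wait for all in-sync replicas and retry transient failures
        props.put("acks", "all");
        props.put("retries", "10");
        // Keep ordering intact while retrying
        props.put("max.in.flight.requests.per.connection", "1");

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("textfile-rows", "row-1"));   // placeholder topic
            producer.flush();
        }
    }
}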
02-24-2017
10:01 AM
You are right about that. The following Maven group and artifact ids do not work:
<!-- phoenix/phoenix-server-client -->
<dependency>
<groupId>org.apache.phoenix</groupId>
<artifactId>phoenix-server-client</artifactId>
<version>4.7.0-HBase-1.1</version>
</dependency>
How can the right Phoenix JDBC driver and its related jars be found for connecting to Phoenix from Java?
Is it possible to connect to Phoenix from any Java application?
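For reference, a minimal sketch of connecting through the Phoenix Query Server (thin) driver, which is the driver class an artifact like the one above is meant to provide; the host, port 8765, and PROTOBUF serialization are the usual defaults for Phoenix 4.7+ (earlier query servers used JSON) and should be treated as assumptions for this cluster:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixThinClientTest {
    public static void main(String[] args) throws Exception {
        // Thin client driver shipped in the query server client jar
        Class.forName("org.apache.phoenix.queryserver.client.Driver");

        // Phoenix Query Server usually listens on port 8765; host is a placeholder
        String url = "jdbc:phoenix:thin:url=http://lat01bigdatahwdn:8765;serialization=PROTOBUF";

        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}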
02-23-2017
07:59 PM
1 Kudo
How can Phoenix JDBC drivers be found in a Maven repository, either by group and artifact id or as a direct phoenix_jdbc_bla_bla.jar, like the ojdbc6.jar Oracle JDBC driver?
Thanks in advance
Labels:
- Apache Phoenix