Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Join two tables into one HBase table

Solved Go to solution
Highlighted

Join two tables into one HBase table

I have two telecom csv files : one is for CDR(Call Data Records) and one for CRM(Customer Relationship Management).

The CDR file has the following columns : Anumber, ContractCode, Aoperator, Bnumber, Boperator, Direction, Type, Month, Category, numberOfCalls, duration, longitude, latitude.

The CRM file has the following columns : Anumber, age, day_c, month_c, year_c, zip_code, offerType, offer, gender.

I want to create a HBase table joining the data from the CRM table to the corresponding rows, for machine learning purposes.

Any ideas ? Thanks !

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted

Re: Join two tables into one HBase table

You can create a table with some column family of interest and run the ImportTsv on the both the files separately by specifying the Anumber field as HBASE_ROW_KEY then columns in the same Anumber will be combined into single row. Is this something you are looking?

Ex:

 HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:age,cf:day_c,cf:month_c,cf:year_c,cf:zip_code,cf:offerType,cf:offer,cf:gender -Dimporttsv.bulk.output=hdfs://storefileoutput datatsv hdfs://inputfile
 HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ContractCode,cf:Aoperator,cf:Bnumber, cf:Boperator, cf:Direction, cf:Type, cf:Month, cf:Category, cf:numberOfCalls, cf:duration, cf:longitude, cf:latitude -Dimporttsv.bulk.output=hdfs://storefileoutput datatsv hdfs://inputfile

View solution in original post

5 REPLIES 5
Highlighted

Re: Join two tables into one HBase table

@Houssem Alayet is Anumber common and going to be row key for the table?

Highlighted

Re: Join two tables into one HBase table

Anumber can be found in multiple rows, it is the phone number of the caller and yes I want to use it as row key

Highlighted

Re: Join two tables into one HBase table

You can create a table with some column family of interest and run the ImportTsv on the both the files separately by specifying the Anumber field as HBASE_ROW_KEY then columns in the same Anumber will be combined into single row. Is this something you are looking?

Ex:

 HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:age,cf:day_c,cf:month_c,cf:year_c,cf:zip_code,cf:offerType,cf:offer,cf:gender -Dimporttsv.bulk.output=hdfs://storefileoutput datatsv hdfs://inputfile
 HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-VERSION.jar importtsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:ContractCode,cf:Aoperator,cf:Bnumber, cf:Boperator, cf:Direction, cf:Type, cf:Month, cf:Category, cf:numberOfCalls, cf:duration, cf:longitude, cf:latitude -Dimporttsv.bulk.output=hdfs://storefileoutput datatsv hdfs://inputfile

View solution in original post

Re: Join two tables into one HBase table

In the CDR file, a caller could make multiple calls to different numbers (Bnumber), what I want is to add the corresponding CRM information to each row and not combining them all into one. Do you see what I am after ?

Highlighted

Re: Join two tables into one HBase table

Ya got it. Just combining that from raw files is not possible. Instead you need to create two tables for CDR data and CRM data and you can write MR job or java client based on data size with following steps.

1) Scan CDR table and get Bnumber

2) Call get on CRM table to get the corresponding details.

3) Prepared puts from the get call on CRM and add them to the ROW get from CDR and write the CDR.

or else you can use Apache Phoenix so that you can utilise UPSERT SELECT features which simplify things.

http://phoenix.apache.org/

http://phoenix.apache.org/language/index.html#upsert_select

Don't have an account?
Coming from Hortonworks? Activate your account here