Member since 02-15-2017 | 7 Posts | 0 Kudos Received | 0 Solutions
05-12-2017
01:57 PM
I have a table with multiple columns, and I want to write a Python script that reads the value of the "age" column and adds another column to the table containing a membership string: "Kid" if the age is between 2 and 11, "Teen" if it is between 12 and 19, and so on. Any help?
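One way to approach the age brackets described above is a sorted list of lower bounds plus a binary search. This is a minimal sketch, not a definitive answer: only "Kid" (2 to 11) and "Teen" (12 to 19) come from the post, while the "Adult" and "Senior" brackets, their bounds, and the names `LOWER_BOUNDS`, `LABELS` and `age_group` are assumptions made for illustration.

```python
from bisect import bisect_right

# Lower bound of each bracket. "Kid" and "Teen" follow the post;
# "Adult" (20+) and "Senior" (65+) are hypothetical placeholders.
LOWER_BOUNDS = [2, 12, 20, 65]
LABELS = ["Kid", "Teen", "Adult", "Senior"]


def age_group(age):
    """Return the membership label for an age, or None if it is missing or below 2."""
    # `age != age` is only true for NaN, which is how pandas marks missing numbers.
    if age is None or age != age:
        return None
    index = bisect_right(LOWER_BOUNDS, age)
    return LABELS[index - 1] if index > 0 else None


# Made-up rows standing in for the real table.
rows = [{"age": 7}, {"age": 12}, {"age": 19}, {"age": 40}]
for row in rows:
    row["age_group"] = age_group(row["age"])
```

If the table is already loaded in a pandas DataFrame, the same function could probably be applied with `df["AGE_GROUP"] = df["AGE"].map(age_group)`; `pandas.cut` is another option worth looking at for this kind of binning.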
04-20-2017
05:57 PM
I have a Phoenix table in which I have combined the CDR (Call Detail Records) and CRM (Customer Relationship Management) data. My table has 18 columns: number, operator_a, operator_b, call_direction, call_type, call_category, number_calls, duration, longitude, latitude, age, day, month, year, zip_code, type_offer, offer, gender. I am a beginner in machine learning and I am having trouble loading the table into my Python script for clustering purposes. This is what I have so far; I am not sure how to proceed or whether I am doing it wrong:
from matplotlib import pyplot as plt
import pandas as pd
from scipy.cluster.hierarchy import dendrogram, linkage
import numpy as np
import phoenixdb
from sklearn import preprocessing
from sklearn.cluster import KMeans
database_url = 'http://localhost:8765/'
conn = phoenixdb.connect(database_url, autocommit=True)
cursor = conn.cursor()
cursor.execute("SELECT * FROM X LIMIT 45000")
df = pd.DataFrame(cursor.fetchall())
df.columns = [i[0] for i in cursor.description]
# Drop incomplete rows first so that missing values are not encoded as labels.
df = df.dropna()
# Encode every column as integer labels, using a fresh encoder per column.
columns_to_encode = [
    "ANUMBER", "AOPERATOR", "BOPERATOR", "DIRECTION", "TYPE", "CAT",
    "NBR", "DUREE", "LON", "LAT", "AGE", "DAY", "MONTH", "YEAR",
    "ZIP", "TOFFER", "OFFER", "GENDER",
]
for column in columns_to_encode:
    df[column] = preprocessing.LabelEncoder().fit_transform(df[column])
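A note on the encoding step in the script above: `LabelEncoder` replaces each distinct value with an integer label in sorted order. For numeric columns such as DUREE, LON or AGE this turns magnitudes into ranks, and for nominal columns such as GENDER it introduces an order that a distance-based method like KMeans would treat as meaningful. A common alternative is to one-hot encode the categorical columns and rescale the numeric ones. The sketch below shows that idea in plain Python; the helper names `one_hot` and `min_max`, the choice of which columns are categorical, and the sample rows are all assumptions for illustration.

```python
def one_hot(values):
    """Map each category to a 0/1 vector, one position per distinct category."""
    categories = sorted(set(values))
    return [[1.0 if value == category else 0.0 for category in categories]
            for value in values]


def min_max(values):
    """Rescale numbers to the range [0, 1]; a constant column becomes all zeros."""
    low, high = min(values), max(values)
    span = high - low
    return [(value - low) / span if span else 0.0 for value in values]


# Made-up rows; the column names follow the script above.
rows = [
    {"GENDER": "F", "AGE": 20, "DUREE": 30.0},
    {"GENDER": "M", "AGE": 40, "DUREE": 90.0},
    {"GENDER": "F", "AGE": 60, "DUREE": 60.0},
]
categorical = ["GENDER"]       # assumed nominal
numeric = ["AGE", "DUREE"]     # assumed numeric

# Build one feature vector per row: one-hot columns first, then scaled numbers.
features = [[] for _ in rows]
for column in categorical:
    for vector, encoded in zip(features, one_hot([row[column] for row in rows])):
        vector.extend(encoded)
for column in numeric:
    for vector, scaled in zip(features, min_max([row[column] for row in rows])):
        vector.append(scaled)
```

With pandas and scikit-learn, `pd.get_dummies` and `preprocessing.MinMaxScaler` (or `StandardScaler`) cover the same two steps, and the resulting matrix can then be passed to `KMeans(...).fit(...)`.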
03-23-2017
10:36 AM
In the CDR file, a caller can make multiple calls to different numbers (Bnumber). What I want is to add the corresponding CRM information to each row, rather than combining them all into one row. Do you see what I am after?
03-23-2017
10:22 AM
Anumber can be found in multiple rows; it is the phone number of the caller, and yes, I want to use it as the row key.
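One caveat with this plan: in HBase a row key identifies exactly one row, so several call records written under the same Anumber would not be stored as separate rows. A composite key that starts with Anumber is a common workaround. The sketch below assumes that Anumber, Bnumber, Month and Type together identify a record, which the post does not confirm; `make_row_key`, the `|` separator, and the sample values are hypothetical.

```python
def make_row_key(record, fields=("Anumber", "Bnumber", "Month", "Type"), sep="|"):
    """Join the chosen fields into one row key, with the caller number first."""
    return sep.join(str(record[field]) for field in fields)


# Made-up record using column names from the CDR file.
record = {"Anumber": "21650000001", "Bnumber": "21650000002",
          "Month": "3", "Type": "voice"}
row_key = make_row_key(record)
```

Because HBase sorts rows by key, keeping Anumber as the prefix keeps all rows of one caller next to each other, so they can still be fetched together with a prefix scan. In Phoenix, the equivalent is a composite primary key over the same columns.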
03-23-2017
08:41 AM
I have two telecom CSV files: one for CDR (Call Detail Records) and one for CRM (Customer Relationship Management). The CDR file has the following columns: Anumber, ContractCode, Aoperator, Bnumber, Boperator, Direction, Type, Month, Category, numberOfCalls, duration, longitude, latitude. The CRM file has the following columns: Anumber, age, day_c, month_c, year_c, zip_code, offerType, offer, gender. I want to create an HBase table that joins the data from the CRM file to the corresponding CDR rows, for machine learning purposes. Any ideas? Thanks!
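The join described above can be sketched in Python before loading anything into HBase: index the CRM rows by Anumber, then append the matching customer fields to every CDR row. This is a minimal sketch under the assumption that the CRM file has at most one row per Anumber; `join_cdr_with_crm` is a hypothetical helper and the sample data is made up and uses only a subset of the columns.

```python
import csv
import io


def join_cdr_with_crm(cdr_file, crm_file, key="Anumber"):
    """Return one merged dict per CDR row, with that caller's CRM fields added."""
    # Index the CRM rows by caller number for constant-time lookups.
    crm_by_number = {row[key]: row for row in csv.DictReader(crm_file)}
    joined = []
    for call in csv.DictReader(cdr_file):
        # Callers without a CRM entry keep only their CDR fields (a left join).
        customer = crm_by_number.get(call[key], {})
        joined.append({**call, **customer})
    return joined


# Made-up sample data standing in for the two CSV files.
cdr_csv = io.StringIO(
    "Anumber,Bnumber,duration\n"
    "111,222,30\n"
    "111,333,45\n"
    "444,222,10\n"
)
crm_csv = io.StringIO(
    "Anumber,age,gender\n"
    "111,34,F\n"
)
rows = join_cdr_with_crm(cdr_csv, crm_csv)
```

Each call by the same caller keeps its own row and receives the same CRM fields. With pandas, `cdr.merge(crm, on="Anumber", how="left")` expresses the same left join.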
03-07-2017
05:17 PM
I used the following:
[ahoussem@namenode ~]$ hadoop jar /usr/hdp/2.5.3.0-37/phoenix/phoenix-4.7.0.2.5.3.0-37-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table c_values --input /user/houssem/c_values.csv
and I got this:
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:os.version=2.6.32-642.13.1.el6.x86_64
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:user.name=ahoussem
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/ahoussem
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Client environment:user.dir=/home/ahoussem
17/03/07 16:38:49 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=90000 watcher=org.apache.hadoop.hbase.zookeeper.PendingWatcher@247e2ef3
17/03/07 16:38:49 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
17/03/07 16:38:49 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
17/03/07 16:38:49 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x25aa85b5762013f, negotiated timeout = 40000
17/03/07 16:38:49 INFO client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
17/03/07 16:38:49 INFO metrics.Metrics: Initializing metrics system: phoenix
17/03/07 16:38:49 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
17/03/07 16:38:49 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
17/03/07 16:38:49 INFO impl.MetricsSystemImpl: phoenix metrics system started
Exception in thread "main" java.sql.SQLException: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2590)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2327)
at org.apache.phoenix.util.PhoenixContextExecutor.call(PhoenixContextExecutor.java:78)
at org.apache.phoenix.query.ConnectionQueryServicesImpl.init(ConnectionQueryServicesImpl.java:2327)
at org.apache.phoenix.jdbc.PhoenixDriver.getConnectionQueryServices(PhoenixDriver.java:233)
at org.apache.phoenix.jdbc.PhoenixEmbeddedDriver.createConnection(PhoenixEmbeddedDriver.java:142)
at org.apache.phoenix.jdbc.PhoenixDriver.connect(PhoenixDriver.java:202)
at java.sql.DriverManager.getConnection(DriverManager.java:664)
at java.sql.DriverManager.getConnection(DriverManager.java:208)
at org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:305)
at org.apache.phoenix.util.QueryUtil.getConnection(QueryUtil.java:296)
at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.loadData(AbstractBulkLoadTool.java:209)
at org.apache.phoenix.mapreduce.AbstractBulkLoadTool.run(AbstractBulkLoadTool.java:183)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.RuntimeException: java.lang.NullPointerException
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208)
at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:327)
at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:302)
at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:167)
at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:162)
at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:794)
at org.apache.hadoop.hbase.MetaTableAccessor.fullScan(MetaTableAccessor.java:602)
at org.apache.hadoop.hbase.MetaTableAccessor.tableExists(MetaTableAccessor.java:366)
at org.apache.hadoop.hbase.client.HBaseAdmin.tableExists(HBaseAdmin.java:405)
at org.apache.phoenix.query.ConnectionQueryServicesImpl$13.call(ConnectionQueryServicesImpl.java:2358)
... 21 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.getMetaReplicaNodes(ZooKeeperWatcher.java:395)
at org.apache.hadoop.hbase.zookeeper.MetaTableLocator.blockUntilAvailable(MetaTableLocator.java:562)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getMetaRegionLocation(ZooKeeperRegistry.java:61)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateMeta(ConnectionManager.java:1192)
at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1159)
at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:300)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:156)
at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:60)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
... 30 more