Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Phoenix Bulk Load on Ctrl-A delimiter (error code 143)


Getting the following error when trying to bulk load HDFS data into Phoenix. The data is separated by the Ctrl-A delimiter.

Command used is as follows:

hadoop jar /usr/hdp/2.3.4.0-3485/phoenix/phoenix-4.4.0.2.3.4.0-3485-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table INTM.TEST_DATA --input /data/test_data/20160329145829/HDFS_TEST_DATA.csv --zookeeper localhost:2181:/hbase

Error message:

16/04/04 08:00:46 INFO mapreduce.Job: Task Id : attempt_1459451088217_0193_m_000008_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: Error on record, CSV record does not have enough values (has 1, but needs 14), record =[2808976522139491A0301939984852009-08-22 08:49:46.961000UMEMCVSNRAIL2009-08-22 08:49:46.961000UMEMCVSNRAILNative\etlload2016-03-29 14:58:31.763751]
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:176)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.RuntimeException: Error on record, CSV record does not have enough values (has 1, but needs 14), record =[2808976522139491A0301939984852009-08-22 08:49:46.961000UMEMCVSNRAIL2009-08-22 08:49:46.961000UMEMCVSNRAILNative\etlload2016-03-29 14:58:31.763751]
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper$MapperUpsertListener.errorOnRecord(CsvToKeyValueMapper.java:261)
    at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:168)
    at org.apache.phoenix.util.csv.CsvUpsertExecutor.execute(CsvUpsertExecutor.java:136)
    at org.apache.phoenix.mapreduce.CsvToKeyValueMapper.map(CsvToKeyValueMapper.java:157)
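The "has 1, but needs 14" message indicates the parser never split the record, because the tool was parsing on its default comma delimiter while the data uses Ctrl-A. A quick sketch (hypothetical sample data, not the actual file) shows the effect:

```shell
# A 3-field record delimited by the Ctrl-A byte (0x01):
printf 'a\001b\001c\n' > /tmp/ctrl_a_demo.txt

# Parsed on commas (the CsvBulkLoadTool default), everything is one field:
awk -F',' '{print NF}' /tmp/ctrl_a_demo.txt      # prints 1

# Parsed on the Ctrl-A byte, the three fields reappear:
awk -F$'\001' '{print NF}' /tmp/ctrl_a_demo.txt  # prints 3
```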

1 ACCEPTED SOLUTION


It was a permission issue in Ranger. The issue has been resolved. Thank you very much for your help.


6 REPLIES

Master Guru

Hi @Ram Veer, CsvBulkLoadTool already supports a custom delimiter via the '-d' option. To set Ctrl-A, add this at the end of your command:

-d '^A' ... inside the quotes, press Ctrl-V followed by Ctrl-A; the literal '^A' character will appear.
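If typing Ctrl-V Ctrl-A interactively is awkward (e.g. inside a script), Bash's ANSI-C quoting should produce the same literal 0x01 byte; this is a sketch of the idea, not the exact invocation from the thread:

```shell
# $'\001' expands to the literal Ctrl-A (SOH, 0x01) byte in Bash:
DELIM=$'\001'

# Verify it is a single 0x01 byte before using it:
printf '%s' "$DELIM" | od -An -tx1   # prints " 01"

# Hypothetical use with the bulk-load command from the question:
# hadoop jar /usr/hdp/2.3.4.0-3485/phoenix/phoenix-4.4.0.2.3.4.0-3485-client.jar \
#   org.apache.phoenix.mapreduce.CsvBulkLoadTool --table INTM.TEST_DATA \
#   --input /data/test_data/20160329145829/HDFS_TEST_DATA.csv \
#   -d "$DELIM" --zookeeper localhost:2181:/hbase
```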


Thank you for your response. I am able to parse the data based on your suggestion. However, the loader fails to load it into the Phoenix table.

16/04/05 10:16:15 INFO mapreduce.CsvBulkLoadTool: Loading HFiles from /tmp/c0fdbfa0-383d-4f7a-bb0a-d41c58f3742b/INTM.EQUIP_KEY
16/04/05 10:16:15 WARN mapreduce.LoadIncrementalHFiles: managed connection cannot be used for bulkload. Creating unmanaged connection.
16/04/05 10:16:15 INFO zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x30022bf5 connecting to ZooKeeper ensemble=ucschdpdev01.railinc.com:2181
16/04/05 10:16:15 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=ucschdpdev01.railinc.com:2181 sessionTimeout=90000 watcher=hconnection-0x30022bf50x0, quorum=ucschdpdev01.railinc.com:2181, baseZNode=/hbase
16/04/05 10:16:15 INFO zookeeper.ClientCnxn: Opening socket connection to server ucschdpdev01.railinc.com/10.160.230.141:2181. Will not attempt to authenticate using SASL (unknown error)
16/04/05 10:16:15 INFO zookeeper.ClientCnxn: Socket connection established to ucschdpdev01.railinc.com/10.160.230.141:2181, initiating session
16/04/05 10:16:15 INFO zookeeper.ClientCnxn: Session establishment complete on server ucschdpdev01.railinc.com/10.160.230.141:2181, sessionid = 0x153c6f93c7f1f5a, negotiated timeout = 40000
16/04/05 10:16:15 WARN mapreduce.LoadIncrementalHFiles: Skipping non-directory hdfs://ucschdpdev01.railinc.com:8020/tmp/c0fdbfa0-383d-4f7a-bb0a-d41c58f3742b/INTM.EQUIP_KEY/_SUCCESS
16/04/05 10:16:15 INFO hfile.CacheConfig: CacheConfig:disabled
16/04/05 10:16:16 INFO mapreduce.LoadIncrementalHFiles: Trying to load hfile=hdfs://ucschdpdev01.railinc.com:8020/tmp/c0fdbfa0-383d-4f7a-bb0a-d41c58f3742b/INTM.EQUIP_KEY/0/344df66eb2644474b3ac5da7ecbe767c first=1 last=999999
16/04/05 10:27:29 INFO client.RpcRetryingCaller: Call exception, tries=10, retries=35, started=673701 ms ago, cancelled=false, msg=row '' on table 'INTM.EQUIP_KEY' at region=INTM.EQUIP_KEY,,1459834472681.60a4b8f4fad454e419242d08a51660ad., hostname=ucschdpdev03.railinc.com,16020,1459451121171, seqNum=2

Master Guru

It failed after 11 minutes, so there may be a permission issue on the HFiles (see the details on the CsvBulkLoadTool page). Can you try adding this to your command and retrying:

-Dfs.permissions.umask-mode=000

or if possible run the command as hbase user.
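The umask suggestion works because files a process creates are normally stripped of group/other permissions, so the hbase user may be unable to read HFiles written by the submitting user; umask 000 leaves the full requested mode in place. A minimal local sketch of the effect (file name is arbitrary, GNU stat assumed):

```shell
# With a typical umask of 022 a new file gets mode 644; with umask 000
# it keeps the full requested mode for files, 666 (rw-rw-rw-).
tmpfile=$(mktemp)
rm -f "$tmpfile"                 # recreate it under the relaxed umask
(umask 000; touch "$tmpfile")
stat -c '%a' "$tmpfile"          # prints 666
rm -f "$tmpfile"
```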


It was a permission issue in Ranger. The issue has been resolved. Thank you very much for your help.

Master Guru

Hi @Ram Veer, great news! Please consider accepting/up-voting my answer above. Thanks!

Master Guru

Well, I thought my answer of Apr. 5 ... 🙂