Member since: 07-28-2016
Posts: 27
Kudos Received: 8
Solutions: 0
11-22-2018
10:52 AM
Using the PUT command, you need to submit curl twice. With the "--negotiate" option, curl does the same in a single submission: curl --negotiate -u : -L "http://namenode:50070/webhdfs/v1/user/username/余宗阳视频审核稿-1024.docx?op=CREATE&user.name=username" -T 余宗阳视频审核稿-1024.docx
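For reference, a minimal sketch of the two-step flow mentioned above, assuming an unsecured cluster (hostname, path, file name, and user name are placeholders): the first PUT to the NameNode returns a 307 redirect whose Location header points at a DataNode, and the second PUT uploads the data there.

# Step 1: ask the NameNode where to write; no data is sent yet.
# The response is a 307 redirect whose Location header points to a DataNode.
LOCATION=$(curl -s -i -X PUT \
  "http://namenode:50070/webhdfs/v1/user/username/file.docx?op=CREATE&user.name=username" \
  | awk -F': ' 'tolower($1)=="location" {print $2}' | tr -d '\r')

# Step 2: upload the file to the DataNode URL returned in step 1.
curl -i -X PUT -T file.docx "$LOCATION"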
11-22-2018
10:47 AM
I found the root cause of the issue: the request should go to the NameNode on port 50070. I was using the edge node, hence the failure. Thanks!
11-21-2018
12:50 PM
After encoding, it is working for me. But the first command still throws the error "illegal character found at index 62"; 62 is where the filename starts in the destination path. I checked $LANG and it is UTF-8. What was the exact output for you when you executed the first curl without encoding?
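For anyone hitting the same error, a minimal sketch of the encoding step referred to above (host, port, and user name are placeholders): percent-encode the multi-byte UTF-8 filename before placing it in the WebHDFS URL, so the URL contains only ASCII characters.

# Percent-encode the UTF-8 filename for use inside the URL.
FILE='余宗阳视频审核稿-1024.docx'
ENCODED=$(python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))' "$FILE")

curl -i -H 'content-type:application/octet-stream' -X PUT -T "$FILE" \
  "http://hostname:14000/webhdfs/v1/user/username/${ENCODED}?op=CREATE&data=true&user.name=username&overwrite=true"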
11-21-2018
09:54 AM
@Jagadeesan A S That's working, thanks! I am trying to PUT the same file to HDFS using curl via WebHDFS and getting the error "HTTP Status 500 - Illegal character in path at index": curl -i -H 'content-type:application/octet-stream' -H 'charset:UTF-8' -X PUT -T '余宗阳视频审核稿-1024.docx' 'http://hostname:14000/webhdfs/v1/user/username/余宗阳视频审核稿-1024.docx?op=CREATE&data=true&user.name=username&overwrite=true' Is there any other header to be passed so that the Chinese characters are recognized here?
11-20-2018
09:00 AM
I am trying to put a file into Hadoop with a filename in Chinese characters. File: 余宗阳视频审核稿-1024.docx, but the filename shows up garbled in Hadoop as Óà×ÚÑôÊÓÆµÉóºË¸å-1024.docx. Any hints to solve this issue?
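A minimal sketch of one thing worth checking, assuming the upload is done with the hdfs CLI (locale value and target path are placeholders): the client JVM derives its filename encoding from the shell locale, so a non-UTF-8 locale on the client machine can produce exactly this kind of garbling.

# Check the client locale first; it should report a UTF-8 charset.
locale

# If it does not, export a UTF-8 locale before running the upload.
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8

hdfs dfs -put 余宗阳视频审核稿-1024.docx /user/username/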
Tags:
- files
- Hadoop Core
- HDFS
Labels:
- Apache Hadoop
07-17-2018
01:36 PM
Matt, if you have a detailed document on importing data from Salesforce to Hadoop using NiFi, please share it.
08-06-2017
07:07 PM
A Sqoop incremental update with --merge-key is failing with "Error: java.lang.RuntimeException: Can't parse input data:". The first run of the job was successful, but it has been failing since the second run. Below is the Sqoop job used.

Sqoop job:

sqoop job -Dhadoop.security.credential.provider.path=jceks://hdfs/user/username/username.password.jceks --create temp_xxxx_update --meta-connect jdbc:hsqldb:hsql://xxxxxx.xx.xxxx.com:16000/sqoop -- import --connect "jdbc:oracle:thin:@yyyy.yyy.yyyy:yyyy" --username username --password-alias PWD --target-dir /hdfs/xxxx/staging/temp_xxxx_update --split-by ID --merge-key ID --table TEMP_xxxx --incremental lastmodified --check-column LAST_MODIFIED_COL --last-value '2017-08-04 05:53:10.0' --input-lines-terminated-by '\n' --input-null-string "\\\\N" --input-null-non-string "\\\\N" --direct --fetch-size 1000 -m 4

Error:

17/08/06 13:50:44 INFO mapreduce.Job: Task Id : attempt_1501783998019_3059_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: Can't parse input data: '5821C57D7-4471-61E9-E040-7DE36F600F65<SYS>2010-03-18 19:10:56<SYS>2017-07-25 18:09:4411D2010-03-18 19:15:25IPMX2010-03-18 19:10:56004OK2010-03-18 19:10:56Material no=<AZ5409>0nullZPE_MATMAS04650MERGE\N4file=../down/ZPE_MATMAS04/20100318/1268953856729_0.zipnull\Nnull'
at TEMP_FSMSG.__loadFromFields(TEMP_FSMSG.java:1706)
at TEMP_FSMSG.parse(TEMP_FSMSG.java:1504)
at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:53)
at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.NumberFormatException
at java.math.BigDecimal.<init>(BigDecimal.java:550)
at java.math.BigDecimal.<init>(BigDecimal.java:383)
at java.math.BigDecimal.<init>(BigDecimal.java:806)
at TEMP_FSMSG.__loadFromFields(TEMP_FSMSG.java:1553)
... 11 more
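One possible cause worth ruling out, offered as a sketch rather than a confirmed diagnosis: the merge step re-parses the text files written by the previous run, so the output delimiter and null-string options should match the --input-* options, and free-text columns that contain the delimiter or embedded newlines can also break parsing. A hedged example of keeping the two sides aligned (connection details and paths reuse the placeholders from the post above):

# Hedged sketch: keep the write-side and read-side delimiter/null options
# aligned so the merge mapper can re-parse records written by earlier runs.
sqoop import \
  --connect "jdbc:oracle:thin:@yyyy.yyy.yyyy:yyyy" \
  --username username --password-alias PWD \
  --table TEMP_xxxx \
  --incremental lastmodified --check-column LAST_MODIFIED_COL \
  --merge-key ID --split-by ID \
  --fields-terminated-by '\001' --input-fields-terminated-by '\001' \
  --lines-terminated-by '\n' --input-lines-terminated-by '\n' \
  --null-string '\\N' --input-null-string '\\N' \
  --null-non-string '\\N' --input-null-non-string '\\N' \
  --target-dir /hdfs/xxxx/staging/temp_xxxx_update \
  -m 4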
Labels:
- Apache Sqoop
04-17-2017
05:04 AM
> I have millions of records in each table and hundreds of tables, so the first option might not be optimal for big tables.
> I will try out the other options. Thank you!
04-13-2017
09:07 AM
Hi, is there a way to compare the entire data of a table in Hive with the same table in Oracle?
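In case a rough starting point helps, a hedged sketch (table, column, and connection details are placeholders): compare row counts and a few order-independent aggregates on both sides before attempting a full row-by-row diff.

# Oracle side: run the aggregates through sqoop eval (placeholders throughout).
sqoop eval \
  --connect "jdbc:oracle:thin:@host:1521:SID" \
  --username username --password-alias PWD \
  --query "SELECT COUNT(*), MIN(ID), MAX(ID), SUM(ID) FROM MY_TABLE"

# Hive side: the same aggregates on the imported table.
hive -e "SELECT COUNT(*), MIN(id), MAX(id), SUM(id) FROM mydb.my_table;"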
12-02-2016
11:20 AM
A Sqoop import is failing with "exception: Java Heap space" when there are no records in the Oracle source table. I used a fetch size of 75000 in the sqoop import. The import ran successfully when I removed --fetch-size 75000, even though the source table has no records. I am standardizing this Sqoop import job so it can be reused for many other tables, and in production the job should not fail if a table happens to have no records. How can I avoid this situation, and why does it fail with the bigger fetch size? Thanks.
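A hedged sketch of the two knobs usually involved here, not a confirmed fix (values, names, and connection details are illustrative): either use a more modest --fetch-size, or give the mappers more heap, since the Oracle JDBC driver's row buffers can scale with the fetch size regardless of how many rows actually come back.

# Option 1: a smaller fetch size (placeholder value).
sqoop import --connect "jdbc:oracle:thin:@host:port:SID" \
  --username username --password-alias PWD \
  --table MY_TABLE --fetch-size 10000 --target-dir /staging/my_table -m 4

# Option 2: keep the large fetch size but give each mapper more memory.
sqoop import \
  -Dmapreduce.map.memory.mb=4096 \
  -Dmapreduce.map.java.opts=-Xmx3276m \
  --connect "jdbc:oracle:thin:@host:port:SID" \
  --username username --password-alias PWD \
  --table MY_TABLE --fetch-size 75000 --target-dir /staging/my_table -m 4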
Labels:
- Apache Sqoop
11-25-2016
05:26 AM
Yes, that worked. After granting the SELECT_CATALOG_ROLE privilege, the --direct option is working.
11-23-2016
07:06 AM
Hi,
I am getting an error when trying to import a table from an Oracle database into Hadoop using Sqoop with the --direct option. The error is: "ERROR manager.SqlManager: Error executing statement: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist". When I take --direct out of the sqoop statement, the data starts importing. Is there any other property to be added to the Sqoop statement when using --direct? Thanks!
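For completeness, a sketch of the fix mentioned in the later reply above, i.e. granting SELECT_CATALOG_ROLE to the Sqoop database user (the user name is a placeholder and the grant would be run by a DBA):

# Run by a DBA on the Oracle side; the Sqoop user name is a placeholder.
sqlplus / as sysdba <<'EOF'
GRANT SELECT_CATALOG_ROLE TO SQOOP_USER;
EOF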
Labels:
- Apache Sqoop
10-27-2016
04:35 PM
1 Kudo
Try giving it just after sqoop job, e.g.: sqoop job "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" -- import ...
10-24-2016
08:24 AM
We are not using Sqoop 2. Does the security guide apply to Sqoop as well?
10-21-2016
11:42 AM
I'm using HDP 2.5 and Sqoop 1.4.6. Full log:
$ sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" --connect jdbc:oracle:thin:@xxxxxx.xx.xxx.xxx:1111:XXXXXXX --table tablename --username <username> -password <password> --hive-import --hive-table <hivetable> --split-by <col> -m 8
Warning: /usr/hdp/2.5.0.0-1245/accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME -- --
16/10/21 07:25:01 INFO oracle.OraOopManagerFactory: Data Connector for Oracle and Hadoop is disabled.
16/10/21 07:25:01 INFO manager.SqlManager: Using default fetchSize of 1000
16/10/21 07:25:01 INFO tool.CodeGenTool: Beginning code generation
16/10/21 07:25:03 INFO manager.OracleManager: Time zone has been set to GMT
16/10/21 07:25:03 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "db"."tablename" t WHERE 1=0
16/10/21 07:25:05 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/hdp/2.5.0.0-1245/hadoop-mapreduce
Note: /tmp/sqoop-<username>/compile/163383944ed0d448144da421e24c5571/tablename.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/10/21 07:25:06 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-<username>/compile/163383944ed0d448144da421e24c5571/db.tablename.jar
16/10/21 07:25:06 INFO mapreduce.ImportJobBase: Beginning import of db.tablename
16/10/21 07:25:06 INFO manager.OracleManager: Time zone has been set to GMT
16/10/21 07:25:08 INFO impl.TimelineClientImpl: Timeline service address: http://xxxxxx.xx.xx.xxx:8188/ws/v1/timeline/
16/10/21 07:25:08 INFO client.AHSProxy: Connecting to Application History server at xxxxxxx.xx.xxxxx.xxx/ipno:10200
16/10/21 07:25:08 WARN ipc.Client: Failed to connect to server: xxxxxxx.xx.xxxxx.xxx/ipno:8032: retries get failed --
--
at org.apache.hadoop.ipc.Client.call(Client.java:1449)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy23.getNewApplication(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:221)
-- -- --
http://xxxxxxx.xx.xxxxx.xxx:8088/proxy/application_1476174512012_0126/
16/10/21 07:25:12 INFO mapreduce.Job: Running job: job_1476174512012_0126
16/10/21 07:25:18 INFO mapreduce.Job: Job job_1476174512012_0126 running in uber mode : false
16/10/21 07:25:18 INFO mapreduce.Job: map 0% reduce 0%
16/10/21 07:25:25 INFO mapreduce.Job: map 10% reduce 0%
16/10/21 07:25:26 INFO mapreduce.Job: map 70% reduce 0%
16/10/21 07:25:27 INFO mapreduce.Job: map 90% reduce 0%
16/10/21 07:25:51 INFO mapreduce.Job: map 100% reduce 0%
16/10/21 07:25:51 INFO mapreduce.Job: Job job_1476174512012_0126 completed successfully
16/10/21 07:25:51 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=1676345
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1483
HDFS: Number of bytes written=32451988
HDFS: Number of read operations=40
HDFS: Number of large read operations=0
HDFS: Number of write operations=20
Job Counters
Launched map tasks=10
Other local map tasks=10
Total time spent by all maps in occupied slots (ms)=81510
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=81510
Total vcore-milliseconds taken by all map tasks=81510
Total megabyte-milliseconds taken by all map tasks=333864960
Map-Reduce Framework
Map input records=116058
Map output records=116058
Input split bytes=1483
Spilled Records=0
--
GC time elapsed (ms)=769
CPU time spent (ms)=27350
Physical memory (bytes) snapshot=4567121920
Virtual memory (bytes) snapshot=56302190592
Total committed heap usage (bytes)=5829558272
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=32451988
16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Transferred 30.9486 MB in 42.8346 seconds (739.8552 KB/sec)
16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Retrieved 116058 records.
16/10/21 07:25:51 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners
--
16/10/21 07:25:51 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "db"."tablename" t WHERE 1=0
16/10/21 07:25:52 WARN hive.TableDefWriter: Column col1 had to be cast to a less precise type in Hive
16/10/21 07:25:52 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in jar:file:/usr/hdp/2.5.0.0-1245/hive/lib/hive-common-1.2.1000.2.5.0.0-1245.jar!/hive-log4j.properties
OK
Time taken: 1.168 seconds
Loading data to table hivedb.hivetable
Failed with exception java.util.ConcurrentModificationException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
10-21-2016
11:17 AM
Yes, I can access Oracle, and with Sqoop I can import into an HDFS directory by specifying --target-dir in the sqoop import. I can access Hive too; I created a database and a table. In our cluster the Hive warehouse dir is /apps/hive/warehouse. Why would the username come into the warehouse directory? I can't see any user IDs under the warehouse directory.
10-21-2016
11:03 AM
Yes, I do have access to that table. I tried "insert overwrite table <managed_table> select * from ext_table;" and that worked. But I also tried loading data from the HDFS path (the same path the external table in the previous query points to) into the managed table, and that failed with the same error.
10-19-2016
08:53 AM
I want to Kerberize the Sqoop job. What is the process, and what needs to be taken care of to run a Sqoop job in a Kerberos environment? I didn't find any documentation on this. Your help is much appreciated.
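A minimal sketch of the usual prerequisite, offered as a starting point (principal, keytab path, realm, and connection details are placeholders): obtain a Kerberos ticket for the submitting user, for example from a keytab in scheduled jobs, before running the Sqoop command.

# Obtain a ticket for the submitting user (placeholders throughout),
# then run the Sqoop command as usual; it picks up the ticket cache.
kinit -kt /etc/security/keytabs/username.keytab username@EXAMPLE.COM
klist   # verify the ticket

sqoop import --connect "jdbc:oracle:thin:@host:port:SID" \
  --username dbuser --password-alias PWD \
  --table MY_TABLE --target-dir /user/username/my_table -m 4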
Labels:
- Apache Sqoop
10-17-2016
10:20 AM
I am trying to import an Oracle table into an HDFS directory but getting the error "Generating splits for a textual index column allowed only in case of "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" property passed as a parameter". I fixed the import by passing "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" to the sqoop import. But why do we need to set this property? I imported other tables without setting it. When should this property be set?
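A hedged sketch of the two usual choices (table, column, and connection details are placeholders): pass the property when the --split-by column is a character/string column, or split on a numeric column so the property is not needed at all.

# Choice 1: keep the string split column and allow text splitting explicitly.
sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
  --connect "jdbc:oracle:thin:@host:port:SID" \
  --username username -P \
  --table MY_TABLE --split-by TEXT_COL --target-dir /staging/my_table -m 8

# Choice 2: split on a numeric column instead; no extra property needed.
sqoop import \
  --connect "jdbc:oracle:thin:@host:port:SID" \
  --username username -P \
  --table MY_TABLE --split-by NUMERIC_ID --target-dir /staging/my_table -m 8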
Labels:
- Apache Sqoop
10-17-2016
10:14 AM
1 Kudo
I am trying to import an Oracle table into Hive using the Sqoop --hive-import option. The import itself went fine, but at the end it errored out with "Failed with exception java.util.ConcurrentModificationException
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask". When I opened the Hive terminal, I could see the table created in the Hive database, but no records were inserted. Below is the code:

sqoop import "-Dorg.apache.sqoop.splitter.allow_text_splitter=true" \
--connect <jdbc:oracle:thin:@connectionstring:portno> \
--table tablename --username <username> -password <Password> \
--hive-import \
--hive-table <hivedb.hivetable> \
--split-by <column> \
-m 8

Do I need to set any parameters, or do Hive internal tables have such issues?
Labels:
- Apache Hive
- Apache Sqoop
09-26-2016
01:41 AM
2 Kudos
I am creating a common data ingestion framework using Oozie, making my workflow.xml a standard, parameterized workflow whose parameters are passed through a job.properties file. Now the challenge is: to follow the coding standards, I can't add comments for a particular table/project in workflow.xml since it is a standard file, so can I add them in the job.properties file, which varies from table to table? If you have any other approaches, please let me know.
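For what it's worth, a minimal sketch (property names and hosts are illustrative): job.properties is read as a standard Java properties file, so lines beginning with # are treated as comments and are ignored when the job is submitted.

# job.properties is parsed as a Java properties file, so '#' lines are comments.
cat > job.properties <<'EOF'
# Ingestion parameters for the CUSTOMER table (this comment line is ignored)
nameNode=hdfs://namenode:8020
jobTracker=resourcemanager:8050
tableName=CUSTOMER
targetDir=/staging/customer
oozie.wf.application.path=${nameNode}/apps/ingestion/workflow.xml
EOF

oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run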
Labels:
- Apache Oozie
09-20-2016
11:40 AM
1 Kudo
@njayakumar I passed the generic arguments first to the sqoop job and now it's working fine. Thanks!
09-20-2016
10:30 AM
3 Kudos
I am using a password encryption method in a Sqoop job for data ingestion into Hadoop, using hadoop.security.credential.provider.path to encrypt the password. But when I try to create the Sqoop job on the CLI, it is unable to parse the arguments. The code I used and the error I got are below.

CODE

sqoop job --create password-test --meta-connect jdbc:hsqldb:hsql://<hostname>:<port>/sqoop -- import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/<username>/<username>.password.jceks --connect "jdbc:oracle:thin:<hostname>:<Port>:<sid>" --username <username> --table <tablename> --password-alias <password-alias-name> --fields-terminated-by '\001' --null-string '\N' --null-non-string '\N' --lines-terminated-by '\n' --target-dir '/user/<username>/<staging loc>' --incremental append --check-column <colname> --last-value <value> --num-mappers 8

ERROR
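As the follow-up reply in this thread notes, the problem was resolved by passing the generic argument first; a sketch of the corrected ordering, using the same placeholders as the post above: the -D option goes immediately after sqoop job, before --create and before the tool-specific arguments.

# Generic Hadoop options (-D...) go right after 'sqoop job', before --create
# and before the '-- import' tool arguments (placeholders as in the original post).
sqoop job \
  -Dhadoop.security.credential.provider.path=jceks://hdfs/user/<username>/<username>.password.jceks \
  --create password-test \
  --meta-connect jdbc:hsqldb:hsql://<hostname>:<port>/sqoop \
  -- import \
  --connect "jdbc:oracle:thin:<hostname>:<Port>:<sid>" \
  --username <username> --password-alias <password-alias-name> \
  --table <tablename> \
  --fields-terminated-by '\001' --lines-terminated-by '\n' \
  --null-string '\N' --null-non-string '\N' \
  --target-dir '/user/<username>/<staging loc>' \
  --incremental append --check-column <colname> --last-value <value> \
  --num-mappers 8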
Labels: