Member since
04-11-2016
174
Posts
29
Kudos Received
6
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
3307 | 06-28-2017 12:24 PM | |
2494 | 06-09-2017 07:20 AM | |
6983 | 08-18-2016 11:39 AM | |
5048 | 08-12-2016 09:05 AM | |
5244 | 08-09-2016 09:24 AM |
05-13-2016
09:11 AM
Stack : Installed HDP-2.3.2.0-2950 using Ambari 2.1 The source DB schema is on sql server and it contains several tables which either have primary key as : A varchar Composite - two varchar columns or one varchar + one int column or two int columns. There is a large table with ? rows which has three columns in the PK one int + two varchar columns As per the Sqoop documentation : Sqoop cannot currently split on multi-column indices. If your table has no index column, or has a multi-column key, then you must also manually choose a splitting column. The first question is : What is expected by 'manually choose a splitting column' - how can I sacrifice the pk and just use one column or am I missing some concept ? I proceeded with the import, the source table has 7909097 records : sqoop import --connect 'jdbc:sqlserver://somedbserver;database=somedb' --username someusname --password somepass --as-textfile --fields-terminated-by '|&|' --table ChassiECU --num-mappers 8 --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --verbose The worrisome warnings and the incorrect mapper inputs and records : 16/05/13 10:59:04 WARN manager.CatalogQueryManager: The table ChassiECU contains a multi-column primary key. Sqoop will default to the column ChassiNo only for this job.
16/05/13 10:59:08 WARN db.TextSplitter: Generating splits for a textual index column.
16/05/13 10:59:08 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
16/05/13 10:59:08 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
16/05/13 10:59:38 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=1168400
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1128
HDFS: Number of bytes written=209961941
HDFS: Number of read operations=32
HDFS: Number of large read operations=0
HDFS: Number of write operations=16
Job Counters
Launched map tasks=8
Other local map tasks=8
Total time spent by all maps in occupied slots (ms)=62785
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=62785
Total vcore-seconds taken by all map tasks=62785
Total megabyte-seconds taken by all map tasks=128583680
Map-Reduce Framework
Map input records=15818167
Map output records=15818167
Input split bytes=1128
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=780
CPU time spent (ms)=45280
Physical memory (bytes) snapshot=2219433984
Virtual memory (bytes) snapshot=20014182400
Total committed heap usage (bytes)=9394716672
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=209961941
16/05/13 10:59:38 INFO mapreduce.ImportJobBase: Transferred 200.2353 MB in 32.6994 seconds (6.1235 MB/sec)
16/05/13 10:59:38 INFO mapreduce.ImportJobBase: Retrieved 15818167 records. Created table : CREATE EXTERNAL TABLE IF NOT EXISTS ChassiECU(`ChassiNo` varchar(8),
`ECU_Name` varchar(15)) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LOCATION '/dataload/tohdfs/reio/odpdw/may2016/ChassiECU'; Awful result(without errors) --PROBLEM : 15818167 vs 7909097(sql server) records: > select count(1) from ChassiECU;
Query ID = hive_20160513110313_8e294d83-78aa-4e52-b90f-b5640268b8ac
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1446726117927_0059)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 14 14 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 6.12 s
--------------------------------------------------------------------------------
OK
_c0
15818167 Surprisingly, I got either accurate or a mismatch of less than 10 records if the composite key consisted of an int(which was used for splitting) but I am still apprehensive about those as well ! How shall I proceed ?
... View more
Labels:
- Labels:
-
Apache Hive
05-12-2016
12:13 PM
1 Kudo
Stack : Installed HDP-2.3.2.0-2950 using Ambari 2.1 sqoop import
sqoop import --connect 'jdbc:sqlserver://dbserver;database=dbname' --username username --password password --as-textfile --fields-terminated-by '|' --table DimECU --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --verbose create external table CREATE EXTERNAL TABLE IF NOT EXISTS DimECU (`ECU_ID` int,`ECU_Name` varchar(15),`ECU_FAMILY_NAME` varchar(15),`INSERTED_BY`varchar(64),`INSERTION_DATE` timestamp) ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|' STORED AS ORC LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimECU'; Can't select the data : hive (odp_dw_may2016_orc)>
>
> select * from DimECU limit 5;
OK
dimecu.ecu_id dimecu.ecu_name dimecu.ecu_family_name dimecu.inserted_by dimecu.insertion_date
Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
Time taken: 0.074 seconds The exception is : 2016-05-12 13:17:26,334 ERROR [main]: CliDriver (SessionState.java:printError(960)) - Failed with exception java.io.IOException:org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
java.io.IOException: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1672)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.hive.ql.io.FileFormatException: Malformed ORC file hdfs://l1031lab.sss.se.scania.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimECU/part-m-00000. Invalid postscript.
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.ensureOrcFooter(ReaderImpl.java:251)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:376)
at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:317)
at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:237)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getReader(OrcInputFormat.java:1208)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getRecordReader(OrcInputFormat.java:1117)
at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:674)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:324)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:446)
... 15 more
2016-05-12 13:17:26,334 INFO [main]: exec.TableScanOperator (Operator.java:close(613)) - 0 finished. closing...
2016-05-12 13:17:26,334 INFO [main]: exec.SelectOperator (Operator.java:close(613)) - 1 finished. closing...
2016-05-12 13:17:26,334 INFO [main]: exec.LimitOperator (Operator.java:close(613)) - 2 finished. closing...
2016-05-12 13:17:26,334 INFO [main]: exec.ListSinkOperator (Operator.java:close(613)) - 4 finished. closing...
2016-05-12 13:17:26,334 INFO [main]: exec.ListSinkOperator (Operator.java:close(635)) - 4 Close done
2016-05-12 13:17:26,334 INFO [main]: exec.LimitOperator (Operator.java:close(635)) - 2 Close done
2016-05-12 13:17:26,335 INFO [main]: exec.SelectOperator (Operator.java:close(635)) - 1 Close done
2016-05-12 13:17:26,335 INFO [main]: exec.TableScanOperator (Operator.java:close(635)) - 0 Close done
2016-05-12 13:17:26,352 INFO [Atlas Logger 0]: security.SecureClientUtils (SecureClientUtils.java:getClientConnectionHandler(91)) - Real User: hive (auth:SIMPLE), is from ticket cache? false
2016-05-12 13:17:26,353 INFO [Atlas Logger 0]: security.SecureClientUtils (SecureClientUtils.java:getClientConnectionHandler(94)) - doAsUser: hive
2016-05-12 13:17:26,356 INFO [main]: CliDriver (SessionState.java:printInfo(951)) - Time taken: 0.065 seconds
2016-05-12 13:17:26,356 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 13:17:26,356 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1463051846356 end=1463051846356 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2016-05-12 13:17:26,989 INFO [Atlas Logger 0]: hook.HiveHook (HiveHook.java:run(168)) - Atlas hook failed
org.apache.atlas.AtlasServiceException: Metadata service API SEARCH_GREMLIN failed with status 400(Bad Request) Response Body ({"error":"javax.script.ScriptException: javax.script.ScriptException: com.thinkaurelius.titan.core.TitanException: Could not start new transaction","stackTrace":"org.apache.atlas.discovery.DiscoveryException: javax.script.ScriptException: javax.script.ScriptException: com.thinkaurelius.titan.core.TitanException: Could not start new transaction\n\tat org.apache.atlas.discovery.graph.GraphBackedDiscoveryService.searchByGremlin(GraphBackedDiscoveryService.java:175)\n\tat org.apache.atlas.GraphTransactionInterceptor.invoke(GraphTransactionInterceptor.java:41)\n\tat org.apache.atlas.web.resources.MetadataDiscoveryResource.searchUsingGremlinQuery(MetadataDiscoveryResource.java:155)\n\tat sun.reflect.GeneratedMethodAccessor74.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)\n\tat com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)\n\tat com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)\n\tat com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)\n\tat com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)\n\tat com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)\n\tat com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)\n\tat com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)\n\tat com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)\n\tat com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)\n\tat com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:820)\n\tat com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)\n\tat com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)\n\tat com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)\n\tat com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)\n\tat com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)\n\tat org.apache.atlas.web.filters.AuditFilter.doFilter(AuditFilter.java:67)\n\tat com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)\n\tat com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)\n\tat com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)\n\tat com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)\n\tat com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)\n\tat com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)\n\tat org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)\n\tat org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)\n\tat org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)\n\tat org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)\n\tat org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)\n\tat org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)\n\tat org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)\n\tat org.mortbay.jetty.Server.handle(Server.java:326)\n\tat org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)\n\tat org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)\n\tat org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)\n\tat org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)\n\tat org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)\n\tat org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)\n\tat org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)\nCaused by: javax.script.ScriptException: javax.script.ScriptException: com.thinkaurelius.titan.core.TitanException: Could not start new transaction\n\tat com.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:94)\n\tat javax.script.AbstractScriptEngine.eval(AbstractScriptEngine.java:233)\n\tat org.apache.atlas.discovery.graph.GraphBackedDiscoveryService.searchByGremlin(GraphBackedDiscoveryService.java:172)\n\t... 48 more\nCaused by: javax.script.ScriptException: com.thinkaurelius.titan.core.TitanException: Could not start new transaction\n\tat com.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:221)\n\tat com.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:90)\n\t... 50 more\nCaused by: com.thinkaurelius.titan.core.TitanException: Could not start new transaction\n\tat com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.newTransaction(StandardTitanGraph.java:276)\n\tat com.thinkaurelius.titan.graphdb.transaction.StandardTransactionBuilder.start(StandardTransactionBuilder.java:220)\n\tat com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.newThreadBoundTransaction(StandardTitanGraph.java:265)\n\tat com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.getAutoStartTx(TitanBlueprintsGraph.java:104)\n\tat com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.query(TitanBlueprintsGraph.java:225)\n\tat com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.query(TitanBlueprintsGraph.java:27)\n\tat com.tinkerpop.pipes.transform.GraphQueryPipe.processNextStart(GraphQueryPipe.java:34)\n\tat com.tinkerpop.pipes.transform.GraphQueryPipe.processNextStart(GraphQueryPipe.java:17)\n\tat com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)\n\tat com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)\n\tat com.tinkerpop.pipes.AbstractPipe.next(AbstractPipe.java:89)\n\tat com.tinkerpop.pipes.IdentityPipe.processNextStart(IdentityPipe.java:19)\n\tat com.tinkerpop.pipes.AbstractPipe.hasNext(AbstractPipe.java:98)\n\tat com.tinkerpop.pipes.util.Pipeline.hasNext(Pipeline.java:105)\n\tat org.codehaus.groovy.runtime.DefaultGroovyMethods.toList(DefaultGroovyMethods.java:1946)\n\tat org.codehaus.groovy.runtime.dgm$836.invoke(Unknown Source)\n\tat org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite$PojoMetaMethodSiteNoUnwrapNoCoerce.invoke(PojoMetaMethodSite.java:271)\n\tat org.codehaus.groovy.runtime.callsite.PojoMetaMethodSite.call(PojoMetaMethodSite.java:53)\n\tat org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:42)\n\tat org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)\n\tat org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:112)\n\tat Script180.run(Script180.groovy:1)\n\tat com.tinkerpop.gremlin.groovy.jsr223.GremlinGroovyScriptEngine.eval(GremlinGroovyScriptEngine.java:219)\n\t... 51 more\nCaused by: com.thinkaurelius.titan.diskstorage.PermanentBackendException: Could not start BerkeleyJE transaction\n\tat com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:144)\n\tat com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:34)\n\tat com.thinkaurelius.titan.diskstorage.keycolumnvalue.keyvalue.OrderedKeyValueStoreManagerAdapter.beginTransaction(OrderedKeyValueStoreManagerAdapter.java:52)\n\tat com.thinkaurelius.titan.diskstorage.Backend.beginTransaction(Backend.java:465)\n\tat com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.openBackendTransaction(StandardTitanGraph.java:282)\n\tat com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.newTransaction(StandardTitanGraph.java:272)\n\t... 73 more\nCaused by: com.sleepycat.je.LogWriteException: (JE 5.0.73) Environment must be closed, caused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE 5.0.73) \/var\/lib\/atlas\/data\/berkeley java.io.IOException: No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must be closed.\n\tat com.sleepycat.je.LogWriteException.wrapSelf(LogWriteException.java:72)\n\tat com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1512)\n\tat com.sleepycat.je.Environment.checkEnv(Environment.java:2185)\n\tat com.sleepycat.je.Environment.beginTransactionInternal(Environment.java:1313)\n\tat com.sleepycat.je.Environment.beginTransaction(Environment.java:1284)\n\tat com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJEStoreManager.beginTransaction(BerkeleyJEStoreManager.java:134)\n\t... 78 more\nCaused by: com.sleepycat.je.LogWriteException: Environment invalid because of previous exception: (JE 5.0.73) \/var\/lib\/atlas\/data\/berkeley java.io.IOException: No space left on device LOG_WRITE: IOException on write, log is likely incomplete. Environment is invalid and must be closed.\n\tat com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1652)\n\tat com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:260)\n\tat com.sleepycat.je.log.LogBufferPool.writeCompleted(LogBufferPool.java:345)\n\tat com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:716)\n\tat com.sleepycat.je.log.LogManager.serialLogInternal(LogManager.java:493)\n\tat com.sleepycat.je.log.SyncedLogManager.serialLog(SyncedLogManager.java:42)\n\tat com.sleepycat.je.log.LogManager.multiLog(LogManager.java:395)\n\tat com.sleepycat.je.log.LogManager.log(LogManager.java:335)\n\tat com.sleepycat.je.txn.Txn.logCommitEntry(Txn.java:957)\n\tat com.sleepycat.je.txn.Txn.commit(Txn.java:719)\n\tat com.sleepycat.je.txn.Txn.commit(Txn.java:584)\n\tat com.sleepycat.je.Transaction.commit(Transaction.java:317)\n\tat com.thinkaurelius.titan.diskstorage.berkeleyje.BerkeleyJETx.commit(BerkeleyJETx.java:81)\n\tat com.thinkaurelius.titan.diskstorage.keycolumnvalue.cache.CacheTransaction.commit(CacheTransaction.java:198)\n\tat com.thinkaurelius.titan.diskstorage.BackendTransaction.commitStorage(BackendTransaction.java:117)\n\tat com.thinkaurelius.titan.graphdb.database.StandardTitanGraph.commit(StandardTitanGraph.java:670)\n\tat com.thinkaurelius.titan.graphdb.transaction.StandardTitanTx.commit(StandardTitanTx.java:1337)\n\tat com.thinkaurelius.titan.graphdb.blueprints.TitanBlueprintsGraph.commit(TitanBlueprintsGraph.java:60)\n\tat org.apache.atlas.GraphTransactionInterceptor.invoke(GraphTransactionInterceptor.java:42)\n\tat org.apache.atlas.services.DefaultMetadataService.createEntity(DefaultMetadataService.java:231)\n\tat org.apache.atlas.web.resources.EntityResource.submit(EntityResource.java:96)\n\tat sun.reflect.GeneratedMethodAccessor41.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)\n\tat com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$ResponseOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:205)\n\tat com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)\n\tat com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)\n\tat com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)\n\tat com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)\n\tat com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349)\n\tat com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339)\n\tat com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:409)\n\tat com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:558)\n\tat com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:733)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:820)\n\tat com.google.inject.servlet.ServletDefinition.doServiceImpl(ServletDefinition.java:287)\n\tat com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:277)\n\tat com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:182)\n\tat com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)\n\tat com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:85)\n\tat org.apache.atlas.web.filters.AuditFilter.doFilter(AuditFilter.java:67)\n\tat com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:82)\n\tat com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:119)\n\tat com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:133)\n\tat com.google.inject.servlet.GuiceFilter$1.call(GuiceFilter.java:130)\n\tat com.google.inject.servlet.GuiceFilter$Context.call(GuiceFilter.java:203)\n\tat com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:130)\n\tat org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)\n\tat org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)\n\tat org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)\n\tat org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)\n\tat org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)\n\tat org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)\n\tat org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)\n\tat org.mortbay.jetty.Server.handle(Server.java:326)\n\tat org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)\n\tat org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)\n\tat org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)\n\tat org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)\n\t... 3 more\nCaused by: java.io.IOException: No space left on device\n\tat java.io.RandomAccessFile.writeBytes0(Native Method)\n\tat java.io.RandomAccessFile.writeBytes(RandomAccessFile.java:520)\n\tat java.io.RandomAccessFile.write(RandomAccessFile.java:550)\n\tat com.sleepycat.je.log.FileManager.writeToFile(FileManager.java:1757)\n\tat com.sleepycat.je.log.FileManager.writeLogBuffer(FileManager.java:1637)\n\t... 65 more\n"})
at org.apache.atlas.AtlasClient.callAPIWithResource(AtlasClient.java:365)
at org.apache.atlas.AtlasClient.callAPIWithResource(AtlasClient.java:346)
at org.apache.atlas.AtlasClient.searchByGremlin(AtlasClient.java:294)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.getEntityReferenceFromGremlin(HiveMetaStoreBridge.java:227)
at org.apache.atlas.hive.bridge.HiveMetaStoreBridge.getProcessReference(HiveMetaStoreBridge.java:183)
at org.apache.atlas.hive.hook.HiveHook.registerProcess(HiveHook.java:297)
at org.apache.atlas.hive.hook.HiveHook.fireAndForget(HiveHook.java:202)
at org.apache.atlas.hive.hook.HiveHook.access$200(HiveHook.java:54)
at org.apache.atlas.hive.hook.HiveHook$2.run(HiveHook.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
... View more
Labels:
- Labels:
-
Apache Hive
05-10-2016
08:38 AM
Edited my original post with Sqoop and Hive outputs - the nvarchar is auto. converted to string - can you help ?
... View more
05-10-2016
08:00 AM
I am novice to data loading and Avro so I can't verify the avro schema etc. provided in the post but I did face the same issue (java.io.IOException: java.io.IOException: Not a data file). I am listing below the steps I carried out to load a sql server table to avro and then to a Hive external table, maybe, it will provide some pointers : --create dir. for sqoop import [hdfs@l1031lab root]$ hadoop fs -mkdir -p /dataload/tohdfs/reio/odpdw/may2016/
[hdfs@l1031lab root]$ --grant permissions to sqoop [hdfs@l1031lab root]$ hadoop fs -chown -R sqoop /dataload
[hdfs@l1031lab root]$
[hdfs@l1031lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/
[hdfs@l1031lab root]$ --sqoop import sqoop import --connect 'jdbc:sqlserver://dbserver;database=dbname' --username someusername --password somepassword --as-avrodatafile --num-mappers 8 --table DimSampleDesc --warehouse-dir /dataload/tohdfs/reio/odpdw/may2016 --verbose An exception was thrown but the avro files were created : Writing Avro schema file: /tmp/sqoop-sqoop/compile/e64596608ce0247bf2233353991b20fd/DimSampleDesc.avsc
16/05/09 13:09:00 DEBUG mapreduce.DataDrivenImportJob: Could not move Avro schema file to code output directory.
java.io.FileNotFoundException: Destination directory '.' does not exist [createDestDir=true]
at org.apache.commons.io.FileUtils.moveFileToDirectory(FileUtils.java:2865)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.writeAvroSchema(DataDrivenImportJob.java:146)
at org.apache.sqoop.mapreduce.DataDrivenImportJob.configureMapper(DataDrivenImportJob.java:92)
at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:260)
at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
at org.apache.sqoop.manager.SQLServerManager.importTable(SQLServerManager.java:163)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:148)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:184)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:226)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:235)
at org.apache.sqoop.Sqoop.main(Sqoop.java:244) error in moving the schema file, hence, do manually. If this step is skipped, the table will still get created but when you do a 'select' on the table, you get the 'Not a data file' error message. [sqoop@l1038lab root]$ hadoop fs -copyFromLocal /tmp/sqoop-sqoop/compile/e64596608ce0247bf2233353991b20fd/* /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/ to allow hive to write, give write permissions to all [sqoop@l1038lab root]$ hadoop fs -chmod -R a+w /dataload/
[sqoop@l1038lab root]$ view the warehouse-dir [sqoop@l1038lab root]$ hadoop fs -ls /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc
Found 15 items
-rw-rw-rw- 3 sqoop hdfs 3659 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
-rw-rw-rw- 3 sqoop hdfs 28540 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.class
-rw-rw-rw- 3 sqoop hdfs 9568 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.jar
-rw-rw-rw- 3 sqoop hdfs 61005 2016-05-09 16:18 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.java
-rw-rw-rw- 3 sqoop hdfs 0 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/_SUCCESS
-rw-rw-rw- 3 sqoop hdfs 2660 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro
-rw-rw-rw- 3 sqoop hdfs 1465072 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00001.avro
-rw-rw-rw- 3 sqoop hdfs 3577459 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00002.avro
-rw-rw-rw- 3 sqoop hdfs 1203520 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00003.avro
-rw-rw-rw- 3 sqoop hdfs 236282 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00004.avro
-rw-rw-rw- 3 sqoop hdfs 638532 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00005.avro
-rw-rw-rw- 3 sqoop hdfs 850454 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00006.avro
-rw-rw-rw- 3 sqoop hdfs 257025 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00007.avro
-rw-rw-rw- 3 sqoop hdfs 341204 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00008.avro
-rw-rw-rw- 3 sqoop hdfs 4792 2016-05-09 16:10 /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00009.avro
[sqoop@l1038lab root]$ view the schema file [sqoop@l1038lab root]$ hadoop fs -cat /dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
{
"type" : "record",
"name" : "DimSampleDesc",
"doc" : "Sqoop import of DimSampleDesc",
"fields" : [ {
"name" : "SmapiName_ver",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiName_ver",
"sqlType" : "12"
}, {
"name" : "SmapiColName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiColName",
"sqlType" : "12"
}, {
"name" : "ChartType",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "ChartType",
"sqlType" : "4"
}, {
"name" : "X_Indexet",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "X_Indexet",
"sqlType" : "4"
}, {
"name" : "Y_Indexet",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "Y_Indexet",
"sqlType" : "4"
}, {
"name" : "X_Tick",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_Tick",
"sqlType" : "-9"
}, {
"name" : "Y_Tick",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_Tick",
"sqlType" : "-9"
}, {
"name" : "X_TickRange",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRange",
"sqlType" : "-9"
}, {
"name" : "X_TickRangeFrom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRangeFrom",
"sqlType" : "-9"
}, {
"name" : "X_TickRangeTom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_TickRangeTom",
"sqlType" : "-9"
}, {
"name" : "Y_TickRange",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRange",
"sqlType" : "-9"
}, {
"name" : "Y_TickRangeFrom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRangeFrom",
"sqlType" : "-9"
}, {
"name" : "Y_TickRangeTom",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_TickRangeTom",
"sqlType" : "-9"
}, {
"name" : "IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "IndexCount",
"sqlType" : "4"
}, {
"name" : "X_IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "X_IndexCount",
"sqlType" : "4"
}, {
"name" : "Y_IndexCount",
"type" : [ "null", "int" ],
"default" : null,
"columnName" : "Y_IndexCount",
"sqlType" : "4"
}, {
"name" : "X_Symbol",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_Symbol",
"sqlType" : "-9"
}, {
"name" : "X_SymbolName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_SymbolName",
"sqlType" : "-9"
}, {
"name" : "X_SymbolDescr",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "X_SymbolDescr",
"sqlType" : "-9"
}, {
"name" : "Y_Symbol",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_Symbol",
"sqlType" : "-9"
}, {
"name" : "Y_SymbolName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_SymbolName",
"sqlType" : "-9"
}, {
"name" : "Y_SymbolDescr",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "Y_SymbolDescr",
"sqlType" : "-9"
}, {
"name" : "SmapiName",
"type" : [ "null", "string" ],
"default" : null,
"columnName" : "SmapiName",
"sqlType" : "12"
}, {
"name" : "Incorrect_Ver_FL",
"type" : [ "null", "boolean" ],
"default" : null,
"columnName" : "Incorrect_Ver_FL",
"sqlType" : "-7"
} ],
"tableName" : "DimSampleDesc"
}[sqoop@l1038lab root]$ create the hive table in the schema i.e first do a 'use database' on the hive prompt and then create the table : hive>
> CREATE EXTERNAL TABLE DimSampleDesc ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc');
OK
Time taken: 0.37 seconds
hive> --describe the table hive>
>
> describe formatted DimSampleDesc;
OK
# col_name data_type comment
smapiname_ver string
smapicolname string
charttype int
x_indexet int
y_indexet int
x_tick string
y_tick string
x_tickrange string
x_tickrangefrom string
x_tickrangetom string
y_tickrange string
y_tickrangefrom string
y_tickrangetom string
indexcount int
x_indexcount int
y_indexcount int
x_symbol string
x_symbolname string
x_symboldescr string
y_symbol string
y_symbolname string
y_symboldescr string
smapiname string
incorrect_ver_fl boolean
# Detailed Table Information
Database: odp_dw_may2016
Owner: hive
CreateTime: Mon May 09 16:37:08 CEST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://l1031lab.sss.se.com:8020/apps/hive/warehouse/odp_dw_may2016.db/dimsampledesc
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
avro.schema.url hdfs://l1031lab.sss.se.com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/DimSampleDesc.avsc
numFiles 6
numRows -1
rawDataSize -1
totalSize 8566342
transient_lastDdlTime 1462804628
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.avro.AvroSerDe
InputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
serialization.format 1
Time taken: 0.431 seconds, Fetched: 56 row(s)
hive> --execute some select queries hive>
> select * from DimSampleDesc limit 5;
OK
Abnormal_ShutDown_ct_1 Abnormal_ShutDown_ct_1_000 7 0 0 1 1 1 times counts - Abnormal_ShutDown_ct NULL
Abnormal_ShutDown_ct_2 Abnormal_ShutDown_ct_2_000 7 0 0 1 1 1 % percent - Abnormal_ShutDown_ct NULL
ABS_ActivCOO_ct_1 ABS_ActivCOO_ct_1_000 7 0 0 1 1 1 times counts - ABS_ActivCOO_ct NULL
ABS_ActivCOO_ct_2 ABS_ActivCOO_ct_2_000 1 0 0 0 0 0 0 3 3 0 times counts - ABS_ActivCOO_ct NULL
ABS_ActivCOO_ct_2 ABS_ActivCOO_ct_2_001 1 1 0 1 1 1 1 3 3 0 times counts - ABS_ActivCOO_ct NULL
Time taken: 0.095 seconds, Fetched: 5 row(s)
hive> The core is to have the avsc and avro files in place and ensure proper permissions. I hope that helps.
... View more
05-09-2016
11:39 AM
I proceeded with varchar, what is your opinion ? Shall I use String or VARCHAR ?
... View more
05-09-2016
11:12 AM
I think I have found the root cause - the .avsc file is created in the tmp dir. and results in error while moving but I will put this in a separate post and once that is solved, will resume this.
... View more
05-09-2016
08:21 AM
Yeah even I share your opinion. I am uncomfortable with installing any new tools on the cluster machines, I did find jars like /usr/hdp/2.3.2.0-2950/sqoop/lib/avro-mapred-1.7.5-hadoop2.jar and /usr/hdp/2.3.2.0-2950/pig/lib/avro-tools-1.7.5-nodeps.jar but I am unable to find the classes/commands to be used merely to view the avro files already present on HDFS.
... View more
05-06-2016
03:18 PM
@Predag Minovic Couldn't comment on the post despite several attempts, hence, this 'answer' Changed the permissions for Hive to have write access [sqoop@l1038lab root]$ hadoop fs -chmod -R a+w /dataload/[sqoop@l1038lab root]$ The table gets created but all fields are NULL : hive> > > CREATE EXTERNAL TABLE DimSampleDesc ( SmapiName_ver varchar(30) ,SmapiColName varchar(35),ChartType int,X_Indexet int,Y_Indexet int,X_Tick varchar(20),Y_Tick varchar(20),X_TickRange varchar(40),X_TickRangeFrom varchar(20),X_TickRangeTom varchar(20),Y_TickRange varchar(40),Y_TickRangeFrom varchar(20),Y_TickRangeTom varchar(20),IndexCount int,X_IndexCount int,Y_IndexCount int,X_Symbol varchar(10),X_SymbolName varchar(40),X_SymbolDescr varchar(40),Y_Symbol varchar(10),Y_SymbolName varchar(40),Y_SymbolDescr varchar(40),SmapiName varchar(30),Incorrect_Ver_FL boolean ) COMMENT 'EXTERNAL, ORC table' STORED AS Avro LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/';OKTime taken: 0.206 secondshive> hive> > select * from DimSampleDesc limit 10 ;OKNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLTime taken: 0.099 seconds, Fetched: 10 row(s)hive> > Even providing all options doesn't help : hive> > > drop table DimSampleDesc;OKTime taken: 0.422 secondshive> > > CREATE EXTERNAL TABLE DimSampleDesc ( SmapiName_ver varchar(30) ,SmapiColName varchar(35),ChartType int,X_Indexet int,Y_Indexet int,X_Tick varchar(20),Y_Tick varchar(20),X_TickRange varchar(40),X_TickRangeFrom varchar(20),X_TickRangeTom varchar(20),Y_TickRange varchar(40),Y_TickRangeFrom varchar(20),Y_TickRangeTom varchar(20),IndexCount int,X_IndexCount int,Y_IndexCount int,X_Symbol varchar(10),X_SymbolName varchar(40),X_SymbolDescr varchar(40),Y_Symbol varchar(10),Y_SymbolName varchar(40),Y_SymbolDescr varchar(40),SmapiName varchar(30),Incorrect_Ver_FL boolean ) COMMENT 'EXTERNAL, AVRO table' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/';OKTime taken: 0.169 secondshive> > select * from DimSampleDesc limit 10 ;OKNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLNULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULL NULLTime taken: 0.107 seconds, Fetched: 10 row(s)hive> This NULL is bothering me - I didn't get any errors in Sqoop while importing the file onto HDFS. Is there an issue - in data import in the first place or this Hive table creation ?
... View more
05-06-2016
11:25 AM
Another update :
The empty table is getting created but data load is failing :
hive>
>
> CREATE EXTERNAL TABLE DimSampleDesc ( SmapiName_ver varchar(30) ,SmapiColName varchar(35),ChartType int,X_Indexet int,Y_Indexet int,X_Tick varchar(20),Y_Tick varchar(20),X_TickRange varchar(40),X_TickRangeFrom varchar(20),X_TickRangeTom varchar(20),Y_TickRange varchar(40),Y_TickRangeFrom varchar(20),Y_TickRangeTom varchar(20),IndexCount int,X_IndexCount int,Y_IndexCount int,X_Symbol varchar(10),X_SymbolName varchar(40),X_SymbolDescr varchar(40),Y_Symbol varchar(10),Y_SymbolName varchar(40),Y_SymbolDescr varchar(40),SmapiName varchar(30),Incorrect_Ver_FL boolean ) COMMENT 'EXTERNAL, ORC table' ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS ORC LOCATION '/datastore/hdfstohive/reio/odpdw/may2016';
OK
Time taken: 0.187 seconds
hive>
> select * from DimSampleDesc;
OK
Time taken: 0.367 seconds
hive>
hive>
> LOAD DATA INPATH '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/*.avro' INTO TABLE DimSampleDesc;
Loading data to table odp_dw_may2016.dimsampledesc
Failed with exception Unable to move source hdfs://l1031lab.sss.se..com:8020/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc/part-m-00000.avro to destination hdfs://l1031lab.sss.se.com:8020/datastore/hdfstohive/reio/odpdw/may2016/part-m-00000.avro
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
hive>
>
... View more
05-06-2016
11:08 AM
I partially understood the solution that you are suggesting : Does it mean that there are two tables that I need to create - first say DimSampleDesc_avro(external table) and then say DimSampleDesc_orc(MANAGED table) and then drop DimSampleDesc_avro. Doesn't it defeat the original purpose of creating an external table, moreover, are two tables mandatory ? Is it possible to provide Hive another location to write, I mean will it meddle with '/dataload/tohdfs/reio/odpdw/may2016/DimSampleDesc' or ? As shown in the pic. below, I have created another location where I wish Hive to store the tables
... View more
- « Previous
- Next »