Member since
01-23-2016
51
Posts
41
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
2058 | 02-18-2016 04:34 PM |
07-27-2023
07:42 AM
is this solution is fit in streaming more than puthive3ql for about 10 GB during the day???
... View more
03-21-2018
02:01 PM
I ended up not using NiFi for this. Looking back I tried forcing a solution out of NiFi thst wasn’t a good fit. I spent several weeks and entirely too long trying to solve the most simple case of this project (formatting some text and dumping it to a db). I could certainly see NiFi being useful for moving source data files around from the folders I’m working with (copying, moving etc.) but doing any amount of logic or manipulation of anything but a happy path is extremely tedious and seemingly difficult to do. Knowing that I was going to have to do a lot more work on the data to make it even close to usable, I just scrapped NiFi and implement it in Python. After dealing with this data and running into edge cases over and over again that I wasn’t even aware about when I wrote this topic, the data IMO was just too dirty and had too many exceptions to deal with, with NiFi. On top of that this wasn’t just the import of the data, not even using it so I would have had to have another tool to actually process the data to put it into a usable form anyways. Appreciate the response. You took the time to respond so I figured it was reasonable to respond even though I didn’t end up using the solution.
... View more
01-04-2018
05:25 PM
1 Kudo
Thanks! That seems to work correctly. I'll mark this as the answer as it produces the answer I'm looking for.
... View more
11-28-2017
05:05 PM
I have faced same issue .Please increase the memory before running the hive query .But if you are not able to do grep on xml then you have to split your file on the basis of tags by using gawk. set mapreduce.map.memory.mb=9000; set mapreduce.map.java.opts=-Xmx7200m;
set mapreduce.reduce.memory.mb=9000; set mapreduce.reduce.java.opts=-Xmx7200m;
... View more
03-05-2016
02:46 AM
1 Kudo
I can't seem to reply to your last comment but that was exactly the problem.
... View more
03-04-2016
05:47 PM
1 Kudo
I'm going to accept your answer for this question as I ended up writing a UDF to solve the potential slow issue doing all the XPaths multiple times. But the general gist of the thread still applies just different problems.
I ended up partially "solving" the issue with having 300 columns (in HiveCLI) in a table by disabling Apache Atlas in HDP. Apparently Atlas was intercepting the queries and blowing up when the query become too long. I would venture to guess this is a bug in Atlas. After fixing that, I worked on writing the UDF and making it permanent so it could be used by the application using an ODBC connection. I used the CREATE FUNCTION statement and that works....except it only made the function permanent in the HiveCLI context, an ODBC or even Hue context the function doesn't exist. Ended up having to just run the CREATE FUNCTION statement in the Hue/ODBC Application context. Unless im missing a configuration setting that I'm not aware of I assume this is another bug. Once I did that I was able to get the HiveCLI to work with all 400+ columns with the UDF. I thought I was done but unfortunately, ran into another issue when I tried to run the same query that worked in the HiveCLI in Hue/ODBC App. This issue is a similar issue with the first error...if I only have ~250 columns in the query it works in Hue/ODBC application. Currently investigating this problem. But these are examples of the original sentiment of the original post. 2016-03-04 10:47:55,417 WARN [HiveServer2-HttpHandler-Pool: Thread-34]: thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(681)) - Error fetching results:
org.apache.hive.service.cli.HiveSQLException: Expected state FINISHED, but found ERROR
at org.apache.hive.service.cli.operation.Operation.assertState(Operation.java:161)
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:334)
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:221)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:685)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy19.fetchResults(Unknown Source)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:171)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:349)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:449)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:925)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
... View more
02-25-2016
02:12 PM
@Neeraj Sabharwal There are two errors I've been fighting with on getting access to all of these columns in the same query. The second one I *thought* I had a work around for by disabling security (unchecking the security box in Ambari for Hive) but it keeps showing back up. Here is the defect which I think I'm running into for the FULL head issue. UPDATE: I'm about 99.99% sure I figured out the problem! I started looking further into the ERROR logs. This line here "at org.apache.atlas.security.SecureClientUtils$1$1.run(SecureClientUtils.java:103)" tipped me off that in some way ATLAS was being interacted with. I disabled ATLAS by turning off the atlas service and removing hive.exec.failure.hooks=org.apache.atlas.hive.hook.HiveHook I ran my entire query and it worked without issue! I would venture to say that this is an issue with ATLAS not being able to handle really long queries. https://issues.apache.org/jira/browse/HIVE-11720 1. Error writing to server https://gist.github.com/kur1j/513e5a1499eef6c727a1 2. FULL head https://gist.github.com/kur1j/217eae2065c7953d9cf7
... View more
02-18-2016
05:45 PM
Actually slightly different issue now though. some other error.
... View more
02-12-2016
05:10 AM
4 Kudos
@Kevin Vasko Knox provides REST API, so just consider Knox to be an https or http site and you want to access it using GET, PUT, POST etc with simple authentication. You can use C# classes like WebRequest and NetworkCredentials, here is one good looking sample. Try first to list a directory (GET). And note that file upload is a 2-step PUT process.
... View more
02-02-2016
05:10 PM
@Kevin Vasko has this been resolved? Can you accept the best answer or provide your own solution?
... View more