Member since: 01-23-2016
Posts: 51
Kudos Received: 41
Solutions: 1

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 466 | 02-18-2016 04:34 PM |
05-09-2018
03:27 PM
I found this https://stackoverflow.com/questions/43991845/kafka10-1-heartbeat-interval-ms-session-timeout-ms-and-max-poll-interval-ms?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa and it gives a description of the three settings, but I don't quite understand why I'm still getting timeouts. In the context of my code, I don't understand why I have to increase session_timeout_ms; I would expect to have to change max_poll_interval_ms instead. I have a consumer:

def my_consumer(self):
    consumer = KafkaConsumer("topic",
                             bootstrap_servers=self.kafka_server + ":" + str(self.kafka_port),
                             enable_auto_commit=False,
                             group_id="topic_group",
                             session_timeout_ms=300000)
    for msg in consumer:
        record = json.loads(msg.value.decode('UTF-8'))
        self.lots_of_work(record["id"])  # this can last for 5-15 minutes
        consumer.commit()

I start up my consumer and it starts working on stuff. It will process a few messages, and then when it hits a message whose processing runs over the default 10s session_timeout_ms, it throws an error stating it is getting kicked out of the group. If I increase session_timeout_ms it works (until processing goes past the new, longer timeout). The heartbeat thread should be in its own thread started by the KafkaConsumer call, should it not? Where is the heartbeat thread being started and maintained? What can I check to figure out why the heartbeat is timing out? I could completely understand increasing max_poll_interval_ms, since that covers "my" thread (e.g. the lots_of_work), but I don't quite get why session_timeout_ms would need to be changed.
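For reference, this is roughly the configuration I expected to need instead. It is only a sketch, assuming a kafka-python version new enough to support max_poll_interval_ms and the background heartbeat thread (1.4.0+, as I understand it); the topic name and the lots_of_work handler are placeholders for my real code:

# Sketch only: assumes a recent kafka-python where heartbeats are sent from a
# background thread and max_poll_interval_ms covers slow message handlers.
import json
from kafka import KafkaConsumer

def consume(kafka_server, kafka_port, lots_of_work):
    consumer = KafkaConsumer(
        "topic",
        bootstrap_servers="%s:%s" % (kafka_server, kafka_port),
        enable_auto_commit=False,
        group_id="topic_group",
        # session_timeout_ms stays at its default; only the allowed gap
        # between poll() calls is raised to cover the long processing.
        max_poll_interval_ms=20 * 60 * 1000,  # lots_of_work can take 5-15 minutes
    )
    for msg in consumer:
        record = json.loads(msg.value.decode("UTF-8"))
        lots_of_work(record["id"])
        consumer.commit()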
04-03-2018
08:52 PM
I am looking for suggestions on how to avoid a script-soup mess when setting up a pipeline to send files through processing. The unstructured files in my case are binary files, specifically proprietary CAD files. A 30k-foot view of the current situation: the CAD files are tracked in a PostgreSQL database that has tables with all of the metadata and a file pointer to the actual file on the file system. What this pipeline needs to do is:
1. Read each record and do a conversion to a standard (non-proprietary) format (we already have a tool to do the actual conversion of the file).
2. Insert a record into another table with references to the part on the FS along with all of its metadata (filename, hash, log of the conversion, etc.).
3. Take each record from the part table and do CAD extraction on the data (again, the tools to extract the data we want are already built; they just need to be called through the pipeline).
4. Insert the extracted data into the database (keeping all references intact through the process).
5. Lots of other extraction/conversion/calculations that I won't go into.
Once I can get an architecture/pattern working, I feel adding on to this will be fairly trivial. We are doing all of this in Python and it unfortunately has to be on Windows. I built a pipeline with Luigi to do the conversion before we had the database, and it worked, but after about 5-10k tasks it started to choke under its own weight. Once we started putting the metadata into the database, I figured out that it is a LOT harder to write Luigi tasks that deal with the database. I don't think NiFi gives me any benefit. Would something like Kafka provide the structure for the kind of pipeline I want? After reading about it, it seems like I could still end up with a bit of script soup. For example, for step 1 I was thinking of having a producer send each record to a Kafka "step 1 topic"; on the other end, I could have a Python script (is there something that could be used to manage these scripts?) as the consumer, reading the topic and processing each record as it hits the queue, roughly as sketched below. From that point it is just sending messages, with producers and consumers processing items off their respective topics. Would there be a better way of doing this? Of managing this?
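A rough sketch of what I imagine one stage would look like, just to make the question concrete. It assumes kafka-python and psycopg2; the topic names, table names, and the convert_cad_file() call are placeholders for our existing in-house tools, not real code:

# Sketch of one pipeline stage (step 1 -> step 2 hand-off), not an implementation.
import json

import psycopg2
from kafka import KafkaConsumer, KafkaProducer

def run_convert_stage(bootstrap, pg_dsn, convert_cad_file):
    consumer = KafkaConsumer("step-1-convert", bootstrap_servers=bootstrap,
                             group_id="step-1-workers", enable_auto_commit=False)
    producer = KafkaProducer(bootstrap_servers=bootstrap,
                             value_serializer=lambda d: json.dumps(d).encode("utf-8"))
    db = psycopg2.connect(pg_dsn)
    for msg in consumer:
        record = json.loads(msg.value.decode("utf-8"))
        # Step 1: convert the proprietary CAD file (existing tool, passed in here).
        converted_path, conversion_log = convert_cad_file(record["file_path"])
        # Step 2: record the converted part and its metadata.
        with db, db.cursor() as cur:
            cur.execute("INSERT INTO converted_parts (source_id, path, log) "
                        "VALUES (%s, %s, %s) RETURNING id",
                        (record["id"], converted_path, conversion_log))
            part_id = cur.fetchone()[0]
        # Hand the part off to the extraction stage.
        producer.send("step-3-extract", {"part_id": part_id, "path": converted_path})
        consumer.commit()

Each later step would be another consumer/producer pair like this, which is exactly the part that feels like it could turn back into script soup.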
03-21-2018
02:01 PM
I ended up not using NiFi for this. Looking back, I tried forcing a solution out of NiFi that wasn't a good fit. I spent several weeks, entirely too long, trying to solve the most simple case of this project (formatting some text and dumping it to a DB). I could certainly see NiFi being useful for moving the source data files around between the folders I'm working with (copying, moving, etc.), but doing any amount of logic or manipulation beyond the happy path is extremely tedious and seemingly difficult to do. Knowing that I was going to have to do a lot more work on the data to make it even close to usable, I just scrapped NiFi and implemented it in Python. After dealing with this data and running into edge cases over and over again that I wasn't even aware of when I wrote this topic, the data IMO was just too dirty and had too many exceptions to handle in NiFi. On top of that, this covered only the import of the data, not actually using it, so I would have needed another tool to process the data into a usable form anyway. Appreciate the response. You took the time to respond, so I figured it was reasonable to reply even though I didn't end up using the solution.
01-05-2018
08:51 PM
I've googled everywhere for this, and everything I run across is super complicated; it should be relatively simple to do. The recommendation is to look at the "Example_With_CSV.xml" template from https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates, but it doesn't have anything in there related to actually getting each column value out. So given a flowfile that's a CSV line:
2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"
I need:
$date = 2017-09-20 23:49:38.637
$id = 162929511757
...
$instanceid = 36095
$comment = "Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"
OR
$csv.date = ...
$csv.id = ...
...
$csv.instanceid = ...
$csv.comment = ...
Is there another, easier option to do this besides regex? I can't stand doing anything with regex given how unreadable and overly complicated regexes are. To me there should be a significantly easier way of doing this than with regex.
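To show what I mean by "easier than regex", this is the parsing I'm after in plain Python. It is only a sketch (the attribute names are examples), and inside NiFi it would presumably have to live behind something like ExecuteScript or ExecuteStreamCommand rather than a regex-based processor:

# Sketch: turn one CSV line into named attributes without any regex.
# Python's csv module already understands the quoted, comma-containing last field.
import csv

def line_to_attributes(line):
    fields = next(csv.reader([line]))
    # Example attribute names; the remaining columns would be named the same way.
    return {
        "csv.date": fields[0],
        "csv.id": fields[1],
        "csv.instanceid": fields[3],
        "csv.comment": fields[-1],
    }

attrs = line_to_attributes(
    '2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,'
    '"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"')
print(attrs["csv.comment"])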
01-05-2018
08:34 PM
There is no example in the "Working_With_CSV" template of how to extract each individual field into attributes.
01-04-2018
07:36 PM
I'm struggling with setting up data ingestion/ETL/processing pipelines and architectures. I've had MANY tasks over the past few years that deal with ETLing data and getting it into a usable form, and I'm struggling with setting up a reasonable process that is general enough, easy enough, and reusable enough to take care of the tasks I throw at it. These projects are normally completely separate and have NO overlap (project 1 does not necessarily have anything to do with project 2 and can be completely unrelated, with a completely different set of circumstances, restrictions, etc.).
In the past, for most of these projects, I've written an application in Python (normally) to do everything from pulling the files from their source (XML, CSV, other databases), to processing the data accordingly, to pushing it to the respective destination. This works, and while having to write everything in Python, from pulling the data, to processing it, to pushing it to the destination, is tedious at times, it provides an EXTREMELY high level of flexibility to handle all of the "issues" that arise when you actually get into the middle of the data, the ones you can't foresee when someone says "hey, can you capture this data and make it available for us?". I have a lot of confidence that with Python and its libraries I can handle almost any exception without hacks and tons of inefficient work.
So take this work as an example. I have a CSV that is messy. A few of the "translations" that need to take place (see the sketch at the end of this post):
1. Extract the date from the filename and prepend it to all of the rows of the CSV.
2. Do some massaging of each record to "fix" each record's "string" issue (each record has a string with commas in the string itself). The only constant is that I know there are 25 "columns" in the CSV and the 26th is the string. The string can contain multiple commas.
3. Parse out the content of the string itself. The records are not all the same: each record may have some type of "key" in one of the fields, and I have to look for that in all of the records; if it is there, I need to extract/parse information out of that field and have it stored into the database. This is not complex, and the example of the extraction I have to work from is about 300 lines of MATLAB code that I'm basically going to be implementing again to get what they want. It has a decent amount of logic.
Using NiFi, Shu (much appreciated) from my example thread was able to help me do #1 and #2. But #3 (which I didn't ask about) is not going to be feasible at all. Sure, I can call an external processing script from NiFi, but then that gets into the area of having all of these "random" scripts inside of NiFi that each handle random different parts of the process, which makes things difficult to test and maintain. It almost feels like I need a "programmable" NiFi library/framework: I tell the framework to simply "go tail these sets of files with X parameters" on this schedule, and when it runs I get the context of that information back, similar to NiFi, but available directly in Python (it could be any language, this is just an example), where I can use pandas or other libraries to manipulate the data as I see fit, do my processing on it, and then call other "NiFi-like" processors from within Python directly to put a file, insert into a database, pass it to a message queue, or take some other action. Is there something like this that makes it easier from a programmatic sense? What are my options?
Am I missing what I should be doing with NiFi for ingesting/processing/shipping data? The only reasonable solutions I can think of using NiFi would be:
1. Use NiFi to pull data, do as little processing/manipulation of the source data as possible, put it into a staging storage system, and then use Python (or some other language) to pull the data from the staging system, process it as needed, and insert it into a result storage system (database, processed CSV, etc.).
2. Use NiFi to pull the data, do basically no processing on it, send it over some messaging queue, consume it with Python, do the more complex manipulation of the data in an environment more appropriate to actual data processing, and then write that out to a permanent data store (DB, etc.).
I'm struggling to find a robust, maintainable, readable, reusable (within reason), testable, programmable, organized process for pipeline/ETL/processing/data-shipping architecture. Any suggestions would be appreciated.
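To make the three "translations" above concrete, this is roughly what steps 1 and 2 look like in plain Python. It is a sketch only, using the assumptions stated above (25 fixed columns before the free-text string, and a date embedded in a filename like ExampleFile_2017-09-20.LOG from my earlier thread); the names are illustrative:

# Sketch of translations 1 and 2 from the post above.
import os

FIXED_COLUMNS = 25  # known number of columns before the free-text string

def massage_rows(path):
    # Translation 1: pull the date out of a filename like ExampleFile_2017-09-20.LOG.
    file_date = os.path.basename(path).rsplit("_", 1)[-1].rsplit(".", 1)[0]
    with open(path) as handle:
        for line in handle:
            # Translation 2: split only on the first 25 commas; everything after
            # them is the string column, embedded commas and all.
            parts = line.rstrip("\n").split(",", FIXED_COLUMNS)
            if len(parts) <= FIXED_COLUMNS:
                continue  # skip blank or malformed lines
            fields, free_text = parts[:FIXED_COLUMNS], parts[FIXED_COLUMNS]
            # Prepend the file date to the time in the first column.
            yield [file_date + " " + fields[0]] + fields[1:] + [free_text]

Translation 3 is then just an ordinary function applied to free_text before the row is written to the database, which is the part that is painful to express in NiFi.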
01-04-2018
05:25 PM
1 Kudo
Thanks! That seems to work correctly. I'll mark this as the answer since it produces the result I'm looking for.
01-03-2018
10:25 PM
2 Kudos
@Shu Thank you for the great detailed response. The first part does work, but I don't think the regex will work for my case. (Side note, no fault of yours: I just absolutely despise regex, as it's unreadable to me and extremely difficult to debug, if it can be debugged at all.) I should have mentioned this, but the only thing I know about the CSV file is that there are X number of columns before the string. So I could see something like:

23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,Failed to fit max, attempts,(1=>3), fit failing entrely,(FitFailure=True),

The only things I know are that there are 13 columns (commas) before the string, and that the string will always have a trailing "," (it has always been the last column in the row from what I have seen). The other issue is that I tried doing

(.*),

for all of the columns so I could then put them into a database query to insert the data, but the regex seems to blow up and not function with that many columns (the original data has about 150 columns in it; I just truncated it down here).
01-03-2018
04:46 PM
1 Kudo
I have a CSV file that is messy. I need to:
1. Get the date from the filename, use that as my date, and prepend it to one of the columns.
2. Parse the CSV file to get the columns; the very last column is a string which contains the separator "," inside the string itself.
The data looks like this.
Filename: ExampleFile_2017-09-20.LOG
Content:
23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True),
23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,Command Measure, Targets complete - Elapsed: 76064 ms,
The following is what will need to be inserted into the database:
2017-09-20 23:49:38.637,162929511757,$009389BF,36095,,,,,,,,,,"Failed to fit max attempts (1=>3), fit failing entirely (Fit Failure=True)"
2017-09-20 23:49:38.638,162929512814,$008EE9F6,-16777208,,,,,,,,,,"Command Measure, Targets complete - Elapsed: 76064 ms"
Would I need to do this inside of NiFi, or with some external script called via something like ExecuteScript?
10-17-2017
02:57 PM
So the picture is getting quite blurry between all of the pipeline/ETL tools available. Specifically:
* NiFi
* StreamSets
* Kafka (?)
* Luigi
* Airflow
* Falcon
* Oozie
* A Microsoft solution?
I've got several projects where I could see a use for a pipeline/flow tool and where ETLing is the point of the entire project. So what are the strengths and weaknesses of each? Where should I be using one or the other? Where does one shine where another would be difficult to manage or be overkill for the project? Which would be the most lightweight of the tools? I have several projects, but two stick out in my mind. They are completely unrelated to each other and do NOT overlap at all.
1) The first project is a simple ETL for XML data. In simple terms, 20 or so machines write XML log data to their local drives, which are shared on the network. A Python application connects to each machine's share and copies the data to the local system for archival of the raw data. The same application reads the XML data from the files, extracts all of the relevant content, and stores it into a Microsoft SQL Server database. Currently the application runs every 20 minutes through a Huey cron-style task in Python to look for new data on the shares (roughly the pattern sketched at the end of this post). This is a Windows-only application/ecosystem, so using something in the MS world isn't out of the question either (hence why I included it).
2) The second project is more of a "pipeline". We have about 2 million files that will need to run through a process of: a) original format --> b) converted to an industry-standard format --> c) data massaged to fit our needs --> d) data converted --> e) intermediate results written out to disk --> f) data used to train a deep learning model. For inference on a file, steps a) through f) would be performed, except that step f) would be replaced with inference against the model and would then pass results down to g) (another application). This is initially going to be done on Linux, but they (potentially) want to end up on Windows, so that could be a consideration.
So for these two items, what would you end up choosing? From everything I have read and researched, NiFi would be able to handle the get and put of the data files easily, but how would NiFi handle calling the Python code that extracts the data and puts it in the database? It also looks to me like NiFi/StreamSets are a lot more heavyweight and usually operate within the Hadoop ecosystem, and I'm not working with Hadoop/HDFS in either of these two applications. Any input on the strengths/weaknesses/specific use cases for these examples would be greatly appreciated!
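For context, this is roughly how project 1 is scheduled today. It is a sketch only; the share paths, archive location, and *.xml pattern are made-up examples, and it assumes Huey's crontab/periodic_task API:

# Sketch of the existing project-1 scheduling (placeholder paths and names).
from pathlib import Path

from huey import RedisHuey, crontab

huey = RedisHuey("xml-etl")

MACHINE_SHARES = [Path(r"\\machine01\logs"), Path(r"\\machine02\logs")]  # ~20 of these
ARCHIVE = Path(r"C:\data\raw_xml")

@huey.periodic_task(crontab(minute="*/20"))
def pull_xml_logs():
    for share in MACHINE_SHARES:
        for xml_path in share.glob("*.xml"):
            target = ARCHIVE / xml_path.name
            if target.exists():
                continue  # already archived on a previous run
            target.write_bytes(xml_path.read_bytes())  # archive the raw file
            # ...then parse `target` and insert the extracted fields into SQL Server.

The parsing and the SQL Server inserts are still plain Python after this, which is exactly the part I am trying to figure out whether a tool like NiFi should own.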
11-16-2016
08:10 PM
Earlier in the day the NN ran out of disk space (someone copied large files onto the server and filled up the drive). I restarted the NN, and apparently the DN didn't come back for some reason. I saw this in the DN log:

2016-11-16 18:18:25,556 FATAL datanode.DataNode (BPServiceActor.java:run(833)) - Initialization failed for Block pool <registering> (Datanode Uuid a3e170b3-a1f1-405f-9627-751eba7973

Googling around, I saw that the recommendation was to change the clusterID of the DNs to the same value as the NN's. I'm speculating that this is what caused it, but I have no idea how to get it back. NN: https://gist.githubusercontent.com/vaskokj/0fe9879d6d1d9919c65dbfa819e68476/raw/415485732c3c9aba507306a5fa91a14bea0620f8/gistfile1.txt DN: https://gist.githubusercontent.com/vaskokj/a09a948ccfcb41f5967c4d4f3cd65038/raw/3dc34c0d19e7e3e6f0064e857bcc20ffac17144f/gistfile1.txt
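In case it helps to see what I am comparing: this is how I am checking the clusterIDs on both sides, a sketch that assumes the standard current/VERSION files under dfs.namenode.name.dir and dfs.datanode.data.dir (the two paths below are just examples for my layout):

# Sketch: compare the clusterID recorded by the NameNode and a DataNode.
def read_version(path):
    props = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if "=" in line and not line.startswith("#"):
                key, value = line.split("=", 1)
                props[key] = value
    return props

nn = read_version("/hadoop/hdfs/namenode/current/VERSION")  # example NN dir
dn = read_version("/mnt/hdfs/data/current/VERSION")         # example DN dir
print("NN clusterID:", nn.get("clusterID"))
print("DN clusterID:", dn.get("clusterID"))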
11-16-2016
07:10 PM
1 Kudo
Using Ambari 2.2.0.0
1 Master node running
NameNode
SecondaryNameNode
JournalNode
Ambari
ZooKeeper
Hive Metastore
HiveServer
YARN Timeline Server
MapReduce History Server
ResourceManager
Oozie
2 Slaves running
DataNode
NodeManager
On my slaves, dfs.datanode.data.dir was originally /mnt/hdfs/data. I changed it to add more disk space (just added a mount point): /mtn/hdfs/data,/data/0. Then I restarted HDFS.
After doing that, there is nothing in HDFS. All the data is gone. I did some digging, and on the datanodes there are two folders:
BP-1957273147-192.168.1.12-1479316206670/
BP-456812458-192.168.1.12-1469823616747/
How did these get created? I did not format my namenode, and I did not change my datanodes; I just restarted HDFS. How can I fix it?
- Tags:
- Hadoop Core
- HDFS
06-01-2016
01:32 PM
@mbalakrishnan I'll try that, but it looks to be just a different method of adding parameters to the OdbcCommand object. At this point I am thinking this is an issue with the Hive ODBC driver itself. I cannot find any documentation related to this subject for the Hive ODBC driver, and in the things I did find, people are having the same issue.
05-31-2016
08:43 PM
@mbalakrishnan
Thanks but we tried it both ways and it still does not work.
05-31-2016
05:50 PM
1 Kudo
I want to run a parameterized query against Hive from a C# application:

OdbcCommand cmd = conn.CreateCommand();
cmd.CommandText = "SELECT * FROM user WHERE id = ?";
cmd.Parameters.Add("?id", OdbcType.Int).Value = 4;
OdbcDataReader reader = cmd.ExecuteReader();

But I end up getting an error from the ODBC driver:

ERROR [HY000]
[Hortonworks][HiveODBC] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: Error while compiling statement: FAILED: ParseException line 1:42 cannot recognize input near '?' '<EOF>' in expression specification
05-25-2016
05:34 AM
What do I need to set in hive-env.sh? It seems that anything I touch gets overwritten. This has to be a bug in Ambari where it won't save the hive.heapsize value. How can I get it to persist?
05-25-2016
04:49 AM
The hive.heapsize configuration does not exist in my hive-site.xml for some reason, and whenever I add it to the file it keeps getting overwritten.
05-25-2016
04:09 AM
@Divakar Annapureddy Correct, but if you look at my comments, I posted a picture and it shows it is changed to 12GB in the UI. The services have been restarted (the complete server has been restarted).
05-25-2016
03:27 AM
hive 24964 0.2 1.7 2094636 566148 ? Sl 17:03 0:56 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.91.x86_64/bin/java -Xmx1024m -Dhdp.version=2.3.2.0-2950 -Djava.net.preferIPv4Stack=true -Dhdp.version=2.3.2.0-2950 -Dhadoop.log.dir=/var/log/hadoop/hive -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/usr/hdp/2.3.2.0-2950/hadoop -Dhadoop.id.str=hive -Dhadoop.root.logger=INFO,console -Djava.library.path=:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64:/usr/hdp/2.3.2.0-2950/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xmx1024m -XX:MaxPermSize=512m -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /usr/hdp/2.3.2.0-2950/hive/lib/hive-service-1.2.1.2.3.2.0-2950.jar org.apache.hive.service.server.HiveServer2 --hiveconf hive.aux.jars.path=file:///usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar -hiveconf hive.metastore.uris= -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=/var/log/hive

So I can see that the process is running with -Xmx1024m, even though it is set to a much larger value in the UI: http://imgur.com/3oXfpPj
05-25-2016
01:48 AM
I am issuing a command that executes about 1500 XPaths on a single XML file (it is about 10MB in size), and I am getting the error in the title. I have tried increasing just about every configuration setting I know of related to Hive/Tez's Java heap space, e.g. https://community.hortonworks.com/questions/5780/hive-on-tez-query-map-output-outofmemoryerror-java.html. Nothing seems to work. I restart the server after every configuration change. I also went and changed hive-env.sh to -Xmx8g and it still doesn't seem to fix the issue. I ran -verbose:gc and see that the GC stops at ~1000MB. Why wouldn't that go up to 8G if I changed -Xmx to 8g? Is there any way to tell whether it is the client that is breaking and needing more heap, or the map jobs?
03-14-2016
03:51 AM
3 Kudos
I have Knox set up and authenticating against my test Windows AD server. Right now I can use a domain account to authenticate, but I would like to be able to do it with an x509 cert. Is that even possible right now?
03-05-2016
02:46 AM
1 Kudo
I can't seem to reply to your last comment but that was exactly the problem.
03-05-2016
02:45 AM
1 Kudo
Thanks, I found it; it was already set to true, so that still wasn't the issue. I went into Hue and ran the CREATE FUNCTION command (the same command as I ran in the HiveCLI), and it worked and I was able to run the function within Hue. This looks to me like some type of context issue where the persistent function added in the CLI doesn't work in the other contexts (ODBC and Hue). I have no idea how to solve that.
03-04-2016
05:47 PM
1 Kudo
I'm going to accept your answer for this question, as I ended up writing a UDF to solve the potential slowness of doing all the XPaths multiple times. But the general gist of the thread still applies, just with different problems.
I ended up partially "solving" the issue of having 300 columns (in the HiveCLI) in a table by disabling Apache Atlas in HDP. Apparently Atlas was intercepting the queries and blowing up when the query became too long; I would venture to guess this is a bug in Atlas. After fixing that, I worked on writing the UDF and making it permanent so it could be used by the application over an ODBC connection. I used the CREATE FUNCTION statement and that works... except it only made the function permanent in the HiveCLI context; in an ODBC or even Hue context the function doesn't exist. I ended up having to just run the CREATE FUNCTION statement in the Hue/ODBC application context. Unless I'm missing a configuration setting that I'm not aware of, I assume this is another bug. Once I did that, I was able to get the HiveCLI to work with all 400+ columns with the UDF. I thought I was done, but unfortunately I ran into another issue when I tried to run the same query that worked in the HiveCLI in Hue/the ODBC app. This issue is similar to the first error: if I only have ~250 columns in the query, it works in Hue/the ODBC application. I am currently investigating this problem. But these are examples of the original sentiment of the original post.

2016-03-04 10:47:55,417 WARN [HiveServer2-HttpHandler-Pool: Thread-34]: thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(681)) - Error fetching results:
org.apache.hive.service.cli.HiveSQLException: Expected state FINISHED, but found ERROR
at org.apache.hive.service.cli.operation.Operation.assertState(Operation.java:161)
at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:334)
at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:221)
at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:685)
at sun.reflect.GeneratedMethodAccessor31.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy19.fetchResults(Unknown Source)
at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454)
at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TServlet.doPost(TServlet.java:83)
at org.apache.hive.service.cli.thrift.ThriftHttpServlet.doPost(ThriftHttpServlet.java:171)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:565)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:479)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:349)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:449)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:925)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:76)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:609)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:45)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
03-04-2016
03:21 PM
1 Kudo
It was not created with a specific database. If I run SHOW FUNCTIONS within the HiveCLI, it shows up as default.<myfunction>. If I run SHOW FUNCTIONS in Hue, the function does NOT show up, even though I'm using the "default" database. Is there a way I can make it not be under "default." and just be "<function>"? Hue/the app using ODBC has no problem using other functions (e.g. count()). If I add the jar file in Hue (on the left sidebar) along with the function/class information, it all works.
03-04-2016
03:19 PM
1 Kudo
I don't even see hive.server2.enable.doAs. Would it be under the Hive configuration settings?
03-01-2016
03:55 AM
1 Kudo
I'll check this. I am using HDP 2.3.2 (sandbox) which I believe comes with Hive 1.2.1 so that defect *shouldn't* be the problem.
02-26-2016
10:56 PM
1 Kudo
Logging in with the same username in Hue as I am in the HiveCLI. Getting this error:

Error occurred executing hive query: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:155 Invalid function
02-26-2016
10:51 PM
1 Kudo
Sorry, yeah, using Hue or going through my ODBC application it says it can't find the function. I'm logging into the application in Hue with the same username I am using through the HiveCLI. To be specific:

Error occurred executing hive query: Error while compiling statement: FAILED: SemanticException [Error 10011]: Line 1:155 Invalid function