Member since: 08-16-2016
Posts: 642
Kudos Received: 131
Solutions: 68
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3432 | 10-13-2017 09:42 PM |
| | 6184 | 09-14-2017 11:15 AM |
| | 3176 | 09-13-2017 10:35 PM |
| | 5100 | 09-13-2017 10:25 PM |
| | 5733 | 09-13-2017 10:05 PM |
01-19-2017
09:12 AM
1 Kudo
Ah, that will do it, as all new tables inherit the DB path unless a location is specified in the CREATE TABLE statement. There is no way to alter it through Hive/Impala. You will need to log into the metastore DB and change it there. You can find it in <metastore_db_name>.DBS; in the stock metastore schema the column is DB_LOCATION_URI and the id column is DB_ID. Find the DB_ID for the default DB and run something like: update DBS set DB_LOCATION_URI = 'hdfs://NN_URI:8020/user/hive/warehouse' where DB_ID = <default_db_id>;
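A minimal sketch of that lookup-then-update sequence, in Python. It runs against an in-memory sqlite3 table that only stands in for the metastore; a real metastore normally lives in MySQL or PostgreSQL and has more columns. The column names DB_ID, NAME and DB_LOCATION_URI follow the stock Hive metastore schema, while the function name, the old location value and the cut-down table are assumptions made for illustration.

```python
import sqlite3


def relocate_default_db(conn, new_location):
    """Point the 'default' database at new_location and return its DB_ID."""
    row = conn.execute(
        "SELECT DB_ID, DB_LOCATION_URI FROM DBS WHERE NAME = ?", ("default",)
    ).fetchone()
    if row is None:
        raise LookupError("no 'default' row found in DBS")
    db_id, old_location = row
    print(f"DB_ID {db_id}: {old_location} -> {new_location}")
    conn.execute(
        "UPDATE DBS SET DB_LOCATION_URI = ? WHERE DB_ID = ?",
        (new_location, db_id),
    )
    conn.commit()
    return db_id


def demo():
    # Stand-in for the metastore: a cut-down DBS table in memory.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE DBS ("
        "DB_ID INTEGER PRIMARY KEY, NAME TEXT, DB_LOCATION_URI TEXT)"
    )
    # Placeholder for the wrong location; not a value from the thread.
    conn.execute(
        "INSERT INTO DBS VALUES (1, 'default', 'hdfs://OLD_NN:8020/tmp/wrong')"
    )
    relocate_default_db(conn, "hdfs://NN_URI:8020/user/hive/warehouse")
    return conn.execute(
        "SELECT DB_LOCATION_URI FROM DBS WHERE DB_ID = 1"
    ).fetchone()[0]
```

Selecting the row first and printing the old value gives you something to roll back to. Taking a dump of the metastore DB before running the update is a sensible precaution.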
01-19-2017
09:08 AM
Does the user 'administrator' exist on the HS2 node, and preferably on the rest of the nodes? Does the user have an HDFS user directory, /user/administrator, with full access to it? These are what users need in order to access the cluster and run jobs, regardless of the means of authentication.
01-19-2017
08:47 AM
This is going to be rough. You could manually copy the data from the CM server over to each node. You could also deploy a new cluster to those same nodes. I have a feeling that either way the old configs will no longer be present. Before doing anything, I would try to take a backup of the cluster using the CM API. Then you can try to restore the configs from that if you end up with a new cluster with default configs. https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_intro_api.html
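A rough sketch of that backup step in Python, using only the standard library. It assumes the CM API's /cm/deployment export endpoint, HTTP basic auth, the default CM port 7180, and API version v13. Check /api/version on your CM server for the version it actually supports. The host name, credentials and function names are placeholders.

```python
import base64
import urllib.request


def deployment_url(cm_host, api_version="v13", port=7180):
    """Build the URL of the full-deployment export endpoint."""
    return f"http://{cm_host}:{port}/api/{api_version}/cm/deployment"


def export_deployment(cm_host, user, password, out_path, api_version="v13"):
    """Fetch the deployment description as JSON and save it to out_path."""
    request = urllib.request.Request(deployment_url(cm_host, api_version))
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(request, timeout=60) as response:
        body = response.read()
    with open(out_path, "wb") as handle:
        handle.write(body)
    return len(body)  # number of bytes written
```

Usage would be something like export_deployment("cm.example.com", "admin", "secret", "cm-deployment.json"). Keep the resulting file off the cluster nodes, since those are the machines being rebuilt.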
01-19-2017
08:42 AM
This may be a silly question, but does the test table exist prior to running the CTAS statement?
01-19-2017
08:35 AM
It sounds like Hive impersonation is not turned on. Can you verify? Do you have this same issue from Beeline or other JDBC connections? The setting to check is hive.server2.enable.doAs=true; see https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-Impersonation
01-17-2017
10:59 PM
I was wondering if stats were needed to have describe extended output the actual file size. I recall something like that.
01-17-2017
11:50 AM
On the setting changes: stats, as stated, will help with counts, as that info is precalculated and stored in the metadata. The CBO and stats also help a lot with joins. It is possible that the OS cache has more to do with the improvement if this was a subsequent run with little other activity. You could look at Hive on Spark for more consistent performance: set hive.execution.engine=spark; On the times, the big impact between job submission and start is the scheduler. That is a deep topic. It is best if you read up on the schedulers, review your settings, and ask any specific questions that come up, preferably in a new topic. The other factor, not captured in the job stats, is the time it takes to return the results to the client. This will vary depending on the client and there isn't much to do about it. In general, small result sets can be handled by the Hive CLI. You can increase the client heap if needed. Otherwise use HS2 connections like Beeline or Hue.
01-16-2017
11:08 PM
Yeah, that is expected behavior. Each batch writes to the staging directory and, when it is done, the data is moved to the actual table/partition directory. I have experienced these same staging directories being left behind. In general, if the data is successfully moved, there will be no data left in them. I ended up having a separate process that would check for entries since the last run (regular Spark jobs, not streaming), then check whether each directory was empty, remove it if so, and repeat. I also used these directory checks to see if something had gone wrong in a job, as the data would remain.
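A minimal sketch of that kind of sweep, in Python. It walks a local directory tree as a stand-in for HDFS; on a real cluster you would do the listing and removal through the hdfs CLI or a client library. It also assumes the staging directories start with ".hive-staging", the default for hive.exec.stagingdir, and treats a directory as empty when no files remain anywhere beneath it. The "since the last run" filter is left out, and the function and path names are made up for illustration.

```python
import os
import shutil
import tempfile

STAGING_PREFIX = ".hive-staging"  # assumed default staging dir prefix


def sweep_staging_dirs(table_root):
    """Remove empty staging dirs under table_root.

    Returns (removed, leftover): leftover dirs still hold files, which
    suggests the job that wrote them did not finish its move.
    """
    removed, leftover = [], []
    for current, subdirs, _files in os.walk(table_root):
        for name in list(subdirs):
            if not name.startswith(STAGING_PREFIX):
                continue
            subdirs.remove(name)  # do not descend into staging dirs
            path = os.path.join(current, name)
            has_files = any(files for _, _, files in os.walk(path))
            if has_files:
                leftover.append(path)
            else:
                shutil.rmtree(path)
                removed.append(path)
    return removed, leftover


def demo():
    with tempfile.TemporaryDirectory() as root:
        part = os.path.join(root, "dt=2017-01-16")
        empty = os.path.join(part, ".hive-staging_hive_a")
        stuck = os.path.join(part, ".hive-staging_hive_b")
        os.makedirs(empty)
        os.makedirs(stuck)
        with open(os.path.join(stuck, "part-00000"), "w") as handle:
            handle.write("row\n")
        removed, leftover = sweep_staging_dirs(root)
        return (
            [os.path.basename(p) for p in removed],
            [os.path.basename(p) for p in leftover],
            os.path.exists(empty),
            os.path.exists(stuck),
        )
```

The leftover list doubles as the "did something go wrong" signal described above: anything in it is a staging directory whose data never made it to the table.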
01-16-2017
10:47 PM
The configuration file it is set in is hive-site.xml. CM provides Advanced Configuration Snippets (ACS) where this can be added. The trick is getting it in the right one. I don't know for sure, as I haven't tested it. The setting would apply to the specific jobs being launched through Hive. I would think that at a minimum you need it on the Gateway and also on HiveServer2. You could play it safe: filter by Advanced in CM, search for hive-site.xml, and add it to every ACS that comes up.
01-16-2017
10:43 PM
Disclaimer: I haven't done this at all. Did you change the service.sdl so that HDFS wasn't a requirement, or so that Isilon is one? I don't think that matters for what you are experiencing, as the dependencies and everything else in service.sdl come into play when you go through the Add a Service wizard in CM. The parcel.json should define what comes in the parcel. I would think that being listed in the packages/components is what is needed for it to show up as a service that can be added. Did it create the NiFi user? Did it unpack the parcel on all nodes? Those are a couple of items that may point to whether the parcel deployed correctly. "users": { "nifi": { "longname" : "NiFi", "home" : "/var/lib/nifi", "shell" : "/bin/bash", "extra_groups": [] } }
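As a quick way to see what a parcel.json actually declares, here is a small Python sketch that lists its users and components. The sample document is trimmed to the users block quoted above plus a hypothetical one-entry components list. A real parcel.json (normally under meta/ inside the parcel) carries more fields, and the function name is made up for illustration.

```python
import json

# Trimmed sample: the users block from the post plus a hypothetical
# components entry. Real files have more keys (name, version, packages...).
SAMPLE = """
{
  "components": [{"name": "nifi"}],
  "users": {
    "nifi": {
      "longname": "NiFi",
      "home": "/var/lib/nifi",
      "shell": "/bin/bash",
      "extra_groups": []
    }
  }
}
"""


def summarize_parcel(parcel_json_text):
    """Return the user names and component names a parcel.json declares."""
    meta = json.loads(parcel_json_text)  # raises ValueError on malformed JSON
    return {
        "users": sorted(meta.get("users", {})),
        "components": sorted(
            item.get("name", "") for item in meta.get("components", [])
        ),
    }
```

Running this against the file in the deployed parcel has a side benefit: json.loads fails loudly on an unbalanced brace, which is an easy mistake to make when hand-editing the users block.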