Member since: 07-20-2018
Posts: 35
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 848 | 10-01-2018 10:34 PM
 | 956 | 09-07-2018 07:52 PM
11-27-2018
10:05 PM
1 Kudo
You can use NiFi: it can read data from Hive and then execute the CQL statements with PutCassandraQL to post to Cassandra.
11-27-2018
03:43 AM
I saw the "could not write to" error in your logs and figured it would be worth confirming; if the hive user or the end user can read/write to that location, it's probably not the problem. There are some gateway-related errors in there as well — is this happening via the Knox gateway?
11-26-2018
11:39 PM
Try writing to that HDFS folder as each of these users (with hdfs dfs -put, for example); whichever user Hive impersonates to access HDFS likely lacks permissions on that folder. To get around this I usually just run hdfs dfs -chmod 777 on that folder. You may not want open permissions on that file/folder long term, but it's a good way to confirm the issue really is file permissions. When impersonation is disabled, the user that needs access to the folder is 'hive'; when enabled, it is the user logged in to Ambari — so check your impersonation settings for Hive Views and for Hive itself.
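A minimal sketch of that permission check — the warehouse path and test file here are placeholders, substitute your own:

```shell
# Check current permissions on the target folder
hdfs dfs -ls -d /apps/hive/warehouse/mydb.db/mytable

# Try writing as the user Hive impersonates (run as that OS user)
sudo -u hive hdfs dfs -put /tmp/testfile /apps/hive/warehouse/mydb.db/mytable/

# To confirm the diagnosis, temporarily open permissions (undo afterwards)
sudo -u hdfs hdfs dfs -chmod 777 /apps/hive/warehouse/mydb.db/mytable
```

If the put succeeds only after the chmod, file permissions were the problem.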
11-26-2018
10:13 PM
You also need to add the Ambari admin user as a proxy user (the same way as root), or ensure that admin itself has read/write access in that HDFS location. The error comes down to this: the user Hive executes as on HDFS doesn't have read/write access to the files. That user could be hive (if impersonation is disabled) or the end user you are signed in to Ambari as (if impersonation is enabled).
11-26-2018
10:06 PM
As an alternative to the Sandbox, you can use Cloudbreak to deploy on virtual infrastructure in AWS or Azure; recent releases of HDP don't do as well on smaller/single-node instances. https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.7.0/content/index.html#get-started
11-16-2018
08:51 PM
Another option could be to use Ambari Log Search.
10-02-2018
10:58 PM
Using the Hive Web UI (Hive View) does not mimic the Pentaho doAs command correctly. Hive View executes the doAs as the "admin" user while impersonating the end user (the user logged in to Ambari), and "admin" would by default have the privilege to do this. You need to test this on the command line using the beeline utility, specifically with a JDBC connection that invokes the impersonation command on behalf of the user that Pentaho is configured to connect as (if the Pentaho processor has a specific connection string, you can use that as your JDBC connection string in beeline). The exercise here is to connect to Hive exactly the same way that Pentaho would; using Hive View does not (necessarily) do that. An example of a Kerberos-authenticated Hive user impersonating user "testuser": jdbc:hive2://HiveHost:10001/default;principal=hive/_HOST@HOST1.COM;hive.server2.proxy.user=testuser For more information, see this article on impersonation in the Zeppelin notebook interface: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_zeppelin-component-guide/content/config-hive-access.html
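A sketch of that beeline test — the host, realm, superuser principal, and proxied user are placeholders carried over from the example connection string:

```shell
# Authenticate as the superuser that Pentaho connects as
kinit admin@HOST1.COM

# Connect via beeline, impersonating testuser through the proxy-user setting
beeline -u "jdbc:hive2://HiveHost:10001/default;principal=hive/_HOST@HOST1.COM;hive.server2.proxy.user=testuser"
```

If doAs is not permitted for that superuser, the connection itself should fail with the real HiveServer2 error rather than a Pentaho-wrapped one.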
10-02-2018
04:52 PM
There's some info on this thread that should point you in the right direction, let me know if you get stuck: https://community.hortonworks.com/questions/67900/startstop-processor-via-nifi-api.html
10-02-2018
09:16 AM
You can generate a flow file that contains your SQL commands and then execute it with PutHiveQL. I've done this by replacing the flow file content in an existing flow with SQL commands (a replace processor, with the SQL typed straight into the processor config). If you start your flow with the processor that creates the flow file of SQL commands, it will just re-execute continuously. You can also read an external file containing your SQL using GetFile, and then execute that with PutHiveQL.
10-01-2018
10:46 PM
How large is the cluster? How many nodes, and how much memory? I've seen similar issues on smaller clusters with Hive Views, because Hive Views does not reuse YARN containers; if a new query is executed before previous containers get cleaned up, queries can remain in the ACCEPTED state indefinitely.
10-01-2018
10:39 PM
Have you tried executing the same insert using beeline with the same credentials? It looks like the Kettle engine is invoking doAs (execute a command on behalf of user x while logged in as some superuser); confirm that doAs is in fact enabled/possible for that admin user. You can invoke the same doAs in your beeline connection when testing — in doing this you should see the actual Hive error (if any) that's happening.
10-01-2018
10:34 PM
You should be able to call the NiFi API from a shell-executed script running within NiFi itself.
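A sketch of that kind of API call using curl against the NiFi REST API — the host and processor id are placeholders, and the revision version must match the processor's current revision, which is why it's fetched first:

```shell
# Placeholder NiFi host and processor id
NIFI=http://localhost:8080/nifi-api
PROC_ID=01234567-89ab-cdef-0123-456789abcdef

# Read the processor entity to get its current revision version
VERSION=$(curl -s "$NIFI/processors/$PROC_ID" \
  | python -c 'import sys, json; print(json.load(sys.stdin)["revision"]["version"])')

# Start the processor (use "STOPPED" instead of "RUNNING" to stop it)
curl -s -X PUT -H 'Content-Type: application/json' \
  -d "{\"revision\":{\"version\":$VERSION},\"state\":\"RUNNING\"}" \
  "$NIFI/processors/$PROC_ID/run-status"
```

On a secured cluster you would also need to pass authentication (a token or client certificate) with each request.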
09-18-2018
03:01 AM
In HDP 3.0 and above there should be no significant impact on tables that have ACID v2 applied; in fact, the standard upgrade process to HDP 3.0 will enable this globally by default. Keep in mind that tables with significant volumes of updates/deletes issued against them may see some performance degradation in various scenarios; please review the 3.0 documentation for more detail on this. The ability to update/delete on a given table is controlled by Ranger, and users will require the appropriate Ranger permissions in order to use these capabilities.
09-10-2018
06:38 PM
Install the LDAP utilities via yum (you may need to Google or run yum whatprovides for ldapsearch — I believe the package is openldap-clients), or you can test from another server that already has ldapsearch installed. If you're not familiar with the ldapsearch syntax, you will have to do a bit of research on how to structure the group membership filter correctly.
09-10-2018
06:33 PM
This would depend on your Ranger configuration; your group DN and the membership/memberOf attributes need to be correct for your AD setup. Try testing your configuration using ldapsearch commands with the parameters you've configured, to ensure you are getting the proper membership response from your config.
09-07-2018
07:52 PM
The problem here could be that the external table isn't structured to make the filter/split of this file optimal. For example, "WHERE department = 'xxx' AND time='yyyy';" executed against a non-partitioned external table causes a complete file scan for each statement, so you're reading the entire 10 GB every time. You may want to read the file into a NiFi flow file with a configured buffer, as actual data, instead of taking the external table approach. Alternatively, you can use an intermediate ORC table that inserts the entire external file in some sorted manner before splitting it into multiple tables based on some filter (which you would optimize for in your intermediate table structure). I'd personally recommend the first approach, though.
09-07-2018
06:38 PM
Then yes, that is supported as far as I'm aware
09-07-2018
06:00 PM
How large is this cluster? There's another thread on here discussing similar behavior on small clusters.
09-07-2018
05:52 PM
NiFi can absolutely do this, but you may want to look at skipping the HDFS layer and going directly from the RDBMS step to a Hive-managed ORC table: rdbms --(via data in NiFi flow file)--> ORC table --> R model layer --> model output Hive tables? The reporting layer should probably be independent and access the Hive layer on its own.
09-07-2018
05:41 PM
You should be able to use Ranger group policies in this scenario. Are your end users using any edge nodes to access the HDP environment? They would still need to kinit (which would require AD credentials) on those edge nodes, so the local host account may not be AD-authorized, but access into any HDP service would be.
08-21-2018
01:42 AM
In 2.6, ACID needs to be enabled on a per-table basis in addition to enabling ACID transactions globally in Ambari. It's worth noting that you should consider your workload before enabling ACID on a table: for tables with large-volume updates/deletes, this could cause performance issues. HDP 3.0 features ACID v2 for Hive with significant improvements; when upgrading to 3.0, ACID v2 is enabled globally, since these performance impacts have been reduced to negligible levels.
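A sketch of the per-table step in HDP 2.6, run through beeline — the connection string, table name, and bucket count are illustrative (Hive 1.x ACID tables must be bucketed ORC with the transactional property set):

```shell
# Create a transactional (ACID) table; assumes ACID is already enabled globally
beeline -u "jdbc:hive2://hivehost:10000/default" -e "
CREATE TABLE acid_demo (id INT, val STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');"
```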
08-21-2018
01:36 AM
1 Kudo
It would depend on how the CLI is invoked; as long as consecutive queries happen within the same session, the existing container should be reused, which is essentially the behavior you would see in beeline. Do you recall whether the first query from Hive View succeeded and subsequent queries never got executed (just submitted)? That would confirm the lack of available memory for new containers.
08-17-2018
05:39 PM
See https://community.hortonworks.com/questions/212611/hivepartitionssmall-filesconcatenate.html
08-17-2018
05:37 PM
Try to manually SSH between the Ambari host and the new host using the private/public key pair in a terminal; in some cases a first-time connection needs to be established to add the host to the known_hosts file.
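A sketch of that check — the hostname and key path are placeholders, run it from the Ambari server:

```shell
# First-time connection: accept the host key prompt when asked
ssh -i /root/.ssh/id_rsa root@newhost.example.com exit

# Confirm the host key was recorded
grep newhost.example.com ~/.ssh/known_hosts
```

Once the key is in known_hosts, the Ambari agent bootstrap should no longer be blocked by the host-key confirmation prompt.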
08-17-2018
05:35 PM
I've had success in the past by first using all the Ambari-required details to run an ldapsearch query in a terminal. Do this from the host where you are configuring Ambari; if there are any issues with the credentials or any of the configuration parameters, the ldapsearch query should highlight them (the OpenLDAP utilities need to be installed to access ldapsearch).
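An ldapsearch sketch of that kind of check — the server, bind DN, base DN, and username are placeholders, substitute the values from your Ambari LDAP configuration:

```shell
# Bind with the manager credentials Ambari will use and look up a user,
# returning the attributes Ambari needs (name and group membership)
ldapsearch -H ldap://ad.example.com:389 \
  -D "cn=ambari-bind,ou=service,dc=example,dc=com" -W \
  -b "ou=users,dc=example,dc=com" \
  "(sAMAccountName=jdoe)" cn memberOf
```

A bind failure points at the manager DN/password; zero results with a successful bind points at the base DN or filter attributes.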
08-17-2018
05:25 PM
Yes, as long as the appropriate clients are installed on the slave node. If you also have the /etc/config populated with the correct details for connecting to your instance, then no connection parameters need to be specified for the clients (this is populated automatically if the slave node is deployed/configured by Ambari). In that case you submit the job exactly as you would on any other node.
08-17-2018
05:23 PM
As far as I'm aware, the Hive database engine uses the metastore to build the query plan; this is also where the optimization process happens. I'm curious as to why the specifics of this would impact your use case.
08-17-2018
05:20 PM
Can maxSplitSize be set globally for the cluster, to allow a size large enough to combine those two files?
08-09-2018
11:05 PM
I agree that's the expected behavior; I'm merely suggesting some troubleshooting to dig into exactly why the issue is happening.
08-09-2018
11:04 PM
1 Kudo
I've seen this happen on smaller clusters because the Ambari Hive View (and Hive View v2) doesn't reuse containers; it requires a new container for each query while previous containers haven't been completely released yet. Do you see the same behavior in beeline, for example? In some cases, reserving a number of Tez containers for Hive can help, assuming the resources for the containers are typically available.