Member since: 05-22-2019
Posts: 70
Kudos Received: 24
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 1147 | 12-12-2018 09:05 PM |
|  | 1166 | 10-30-2018 06:48 PM |
|  | 1602 | 08-23-2018 11:17 PM |
|  | 7314 | 10-07-2016 07:54 PM |
|  | 1931 | 08-18-2016 05:55 PM |
02-12-2020
11:06 PM
Hi @vignesh_radhakr, you can access Hive from Python simply by creating a connection like this:

conn = hive.Connection(host="masterIP", port=10000, username="cdh123")

Note: pass the master node's IP (or hostname) along with the HiveServer2 port, which is 10000 by default.
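For reference, here is a slightly fuller sketch using the PyHive client; the host, username, and database values are placeholders to adapt to your cluster, not values from this thread:

from pyhive import hive

# Connect to HiveServer2 (placeholder host/credentials; adjust for your cluster)
conn = hive.Connection(host="masterIP", port=10000,
                       username="cdh123", database="default")

cursor = conn.cursor()
cursor.execute("SHOW TABLES")
for (table_name,) in cursor.fetchall():
    print(table_name)

cursor.close()
conn.close()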
Thanks,
HadoopHelp
08-23-2018
11:17 PM
After some research and help, I found that I had incorrectly set the nifi.remote.input.host property. It should be set as follows in the Advanced nifi properties:

nifi.remote.input.host={{nifi_node_host}}

Each node should have a different value for nifi.remote.input.host; it is the hostname the current node advertises for site-to-site (S2S) communications, so each node should advertise its own address. If you set the same value on every node, they all advertise the same hostname and all data goes to the same host. You still have to set the other multi-threading parameters, such as "Maximum Timer Driven Thread Count" in controller settings and "Concurrent Tasks" in the appropriate processors, but this is what gets multiple nodes listening for the RPG requests.
12-31-2017
11:57 PM
Hive is very powerful, but sometimes you need to add some procedural code for a special circumstance such as complex parsing of a field. Hive provides the ability to easily create User Defined Table Functions (UDTFs). These allow you to transform your Hive results, pass them through the UDTF, and return data as a set of rows that can then be used like any other Hive result set. They can be written in Java or Python; we will use Python for this article, but the techniques apply to both with some syntax changes.

There are a lot of great articles on building these, such as https://community.hortonworks.com/articles/72414/how-to-create-a-custom-udf-for-hive-using-python.html. They pretty much work as advertised, but they don't get into how to debug or troubleshoot your code. And when you get into any significant logic (and sometimes not so significant!), you are likely to create a few bugs. This can be a problem in your parsing logic or, in the case of Python, even a syntax error in your code! So we are going to look at two techniques that can be used to debug these UDTFs.

The problem

When Hive encounters an error in the UDTF it simply blows up with a pretty confusing error. The underlying error is hidden, and you are left scratching your head. The error will probably look something like:

Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.

Or from a Yarn perspective:

Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

NOT VERY HELPFUL!!!

Our test scenario

We need to parse json text stored in a column of a Hive table into columns. Hive serdes can handle most json formats, but there are still a few outlier situations where you need a more flexible json parser.

Quick UDTF Review

First, two important things to remember about Hive UDTFs:

- Hive sends data in to your UDTF through stdin as strings of column values separated by tabs
- Your UDTF sends rows back to Hive through stdout as column values separated by tabs

This means that if you are getting an error in your UDTF, you can't just print debug statements to stdout. Hive is expecting its output there and won't print them; instead, they would cause a format error.
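To make that contract concrete, here is a bare-bones skeleton (a minimal sketch, not the parser we build below): a UDTF is just a loop that reads tab-separated columns from stdin and writes tab-separated rows to stdout.

import sys

# Minimal skeleton of the Hive TRANSFORM contract:
# read tab-separated input columns from stdin, write tab-separated output rows to stdout.
for line in sys.stdin:
    columns = line.rstrip('\n').split('\t')
    # ... transform the input columns into one or more output rows here ...
    sys.stdout.write('\t'.join(columns) + '\n')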
The Table

CREATE EXTERNAL TABLE `default.events`(`json_text` string)
STORED AS TEXTFILE
LOCATION '/tmp/events';

The Data

{ "deviceId": "13a46b21-9528-4eb1-93bd-303a3b3e6b6a", "events": [ { "Process_Started": { "timestamp": "2017-06-01T18:26:24.444Z" } }, { "Process_Stopped": { "timestamp": "2017-06-01T18:26:24.444Z", "errorReason": "-1", "errorMsg": "The operation couldn’t be completed." } } ] }
{ "deviceId": "9cd57d50-4d0e-457e-9fd3-05b9e56644e6", "events": [ { "Process_Started": { "timestamp": "2017-06-02T00:20:20.400Z" } }, { "Process_Completed": { "timestamp": "2017-06-02T02:20:29.020" } } ] }

The Query

We will save this in select_json.hql:

DELETE FILE /home/<your id>/parse_events.py;
ADD FILE /home/<your id>/parse_events.py;
SELECT TRANSFORM (json_text)
USING 'python parse_events.py'
AS deviceId, eventType, eventTime, errorReason, errorMsg
FROM default.events;

The UDTF

#!/usr/bin/python
##################################################################################################
# Hive UDTF to parse json data
##################################################################################################
import sys
import json

reload(sys)
sys.setdefaultencoding('utf8')

def parse_json(json_string):
    j = json.loads(json_string)
    deviceId=j["deviceId"]
    events=j["events"]
    # Force a stupid error!
    x=1
    y=0
    z=x/y
    # Flatten Events Array
    for evt in events:
        try:
            eventType = evt.keys()[0]
            e = evt[eventType]
            edata = []
            edata.append(deviceId)
            edata.append(eventType)
            edata.append(e.get("timestamp",u''))
            edata.append(e.get("errorReason",u''))
            edata.append(e.get("errorMsg",u''))
            # Send a tab-separated string back to Hive
            print u'\t'.join(edata)
        except Exception as ex:
            sys.stderr.write('AN ERROR OCCURRED IN PYTHON UDTF\n %s\n' % ex.message)

def main(argv):
    # Parse each line sent from Hive (note we are only receiving 1 column, so no split needed)
    for line in sys.stdin:
        parse_json(line)

if __name__ == "__main__":
    main(sys.argv[1:])

Let's Run It!

Here's a hint: Python should throw an error, "ZeroDivisionError: integer division or modulo by zero". Assuming you have saved the query in select_json.hql, this would go something like this:

hive -f select_json.hql
...
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
...
Task with the most failures(4):
-----
Task ID:
task_1514310228021_3433_m_000000
-----
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:210)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 20003]: An error occurred when trying to close the Operator running your custom script.
at org.apache.hadoop.hive.ql.exec.ScriptOperator.close(ScriptOperator.java:560)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:631)
at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:192)
... 8 more

Nothing about divide by zero anywhere. Ugh!

TECHNIQUE 1 - Forget Hive!

You are writing a Hive UDTF, but you are also just writing a program that reads from stdin and writes to stdout. So, it is a great idea to develop your logic completely outside of Hive; once you have adequately tested it, you can plug it in and continue development. The easiest way to do this, which also lets you test later with no changes, is to pull out the data that Hive would send your UDTF and feed it to stdin. Given our table, it could be done like this:

hive -e "INSERT OVERWRITE LOCAL DIRECTORY '/tmp/events'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
SELECT json_text FROM default.events;"
-bash-4.1$ ls -l /tmp/events
total 4
-rw-r--r-- 1 screamingweasel screamingweasel 486 Dec 31 23:02 000000_0
cat /tmp/events/*
{ "deviceId": "13a46b21-9528-4eb1-93bd-303a3b3e6b6a", "events": [ { "Process_Started": { "timestamp": "2017-06-01T18:26:24.444Z" } }, { "Process_Stopped": { "timestamp": "2017-06-01T18:26:24.444Z", "errorReason": "-1", "errorMsg": "The operation couldn’t be completed." } } ] }
{ "deviceId": "9cd57d50-4d0e-457e-9fd3-05b9e56644e6", "events": [ { "Process_Started": { "timestamp": "2017-06-02T00:20:20.400Z" } }, { "Process_Completed": { "timestamp": "2017-06-02T02:20:29.020" } } ] }
# DO THE ACTUAL TEST (note there may be >1 file in the directory)
cat /tmp/events/* | python parse_events.py
Traceback (most recent call last):
File "parse_events.py", line 42, in <module>
main(sys.argv[1:])
File "parse_events.py", line 39, in main
parse_json(line)
File "parse_events.py", line 18, in parse_json
z=x/y
ZeroDivisionError: integer division or modulo by zero

Simple as that! Export the columns you will be passing to the UDTF to a tab-separated file and pipe it into your UDTF. This simulates Hive calling your UDTF, but doesn't bury any error messages. In addition, you can print whatever debug messages you like to stdout or stderr to help in debugging.

TECHNIQUE 2 - stderr is your friend!

As noted, Hive expects the results from the UDTF on stdout, but stderr is fair game for writing debug statements. This is pretty old-school debugging, but it's still effective: print out values and locations in your code to help determine where the error occurs or what your variables contain at certain points. For example, you might add the following to the UDTF script to help identify where the issue is happening:

sys.stderr.write("Before stupid error\n")
x=1
y=0
z=x/y
sys.stderr.write("After stupid error!\n")

The trick is to find these in the logs when running on a Yarn cluster. These scripts are set to use MapReduce, which makes it a little easier: find the Yarn job, drill down into one of the failed containers, and examine its stderr. Attached are some screen prints from the Yarn RM showing this process. Winner, winner, here are our debugging statements!
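If you use this pattern a lot, it can be handy to wrap it in a small helper so the debug output can be switched on and off. This is a minimal sketch (not part of the original article) that assumes a hypothetical UDTF_DEBUG environment variable:

import os
import sys

# Hypothetical switch: only write to stderr when UDTF_DEBUG=1 is set in the environment.
DEBUG = os.environ.get("UDTF_DEBUG") == "1"

def debug(msg):
    if DEBUG:
        sys.stderr.write("DEBUG: %s\n" % msg)

debug("Before stupid error")
x = 1
y = 0
z = x / y
debug("After stupid error")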
SUPPORTING FILES FOR THIS ARTICLE ARE AVAILABLE ON GITHUB AT https://github.com/screamingweasel/articles/tree/master/udtf_debugging
10-07-2017
05:52 PM
1 Kudo
Thanks! Very subtle difference, but obviously important to Spark! For everyone's reference, this tar command can be used to create a tar.gz with the jars in the root of the archive:

cd /usr/hdp/current/spark2-client/jars/
tar -zcvf /tmp/spark2-hdp-yarn-archive.tar.gz *
# List the files in the archive. Note that they are in the root!
tar -tvf /tmp/spark2-hdp-yarn-archive.tar.gz
-rw-r--r-- root/root 69409 2016-11-30 03:31 activation-1.1.1.jar
-rw-r--r-- root/root 445288 2016-11-30 03:31 antlr-2.7.7.jar
-rw-r--r-- root/root 302248 2016-11-30 03:31 antlr4-runtime-4.5.3.jar
-rw-r--r-- root/root 164368 2016-11-30 03:31 antlr-runtime-3.4.jar
...
# Then upload to hdfs, fix ownership and permissions if needed, and good to go!
09-30-2017
05:28 AM
Bridging the Process Time – Event Time gap with Hive (Part 1)

Synopsis

Reconciling the difference between event time and collection/processing time is critical to understand for any system that analyzes event data. This is important whether events are processed in batch or in near real-time streams. This post focuses on batch processing with Hive and demonstrates easily replicable mechanisms for bridging this gap. We will look at the issues surrounding this and present two repeatable solution patterns using Hive and Hive ACID. This first post will look at the issue and present the solution using Hive only, and the follow-up article will introduce Hive ACID and a solution using that technology.

Overview

One of the most common big data ingestion cases is event data, and as IoT becomes more important, so does this use case. This is one of the most common Hadoop use cases, but I have not found many detailed step-by-step patterns for implementing it. In addition, I think it is important to understand some of the thinking around events, and specifically the gap between event time and processing time. One of the key considerations in event analysis is the difference between data collection time (process time) and the time that the event occurred (event time). A more formal definition might be:

- Event Time – The time that the event occurred
- Processing Time – The time that the event was observed in the processing system

In an ideal world, these two times would be the same or very close. However, in the real world there is always some time lag or “skew”. This skew may be significant, and it exists whether you are processing events in batches or in near real-time. The skew can be caused by many different factors, including:

- Resource Limitations – Bandwidth, CPU, etc. may not allow events to be immediately forwarded and processed.
- Software Features/Limitations – Software may be intentionally programmed to queue events and send them at predetermined times. For example, cable TV boxes that report information once or twice a day, or fitness trackers that send some information, such as sleep data, only daily.
- Network Discontinuity – Any mobile application needs to plan for disruptions in Internet connectivity. Whether because of dead spots in wireless coverage, airplane mode, or dead batteries, these interruptions can happen regularly. To mitigate them, any good mobile app will queue event messages for sending the next time a connection is available, which may be minutes or months!

Time Windows
Much of the current interest is around near real-time ingestion of event data. There are many advantages to this, but a lot of use cases only require event data to be processed in larger windows. That is the focus of the remainder of this article. I was surprised to find a lack of posts about the mechanics of dealing with event skew and reporting by event time in batch systems, so I wanted to lay out some repeatable patterns that can be used for this. As you probably know, event streams are essentially unbounded streams of logs. We often deal with this as a series of bounded datasets, each representing some time period. Our main consideration here is a batched process that deals with large windows (15 min to 1 hour), but it applies down to any level, since we almost always analyze event data by time in the target system.

The Problems

There are two main issues in dealing with this—Completeness and Restatement.

Completeness—When event data can come in for some time past the end of a time window, it is very difficult to assess the completeness of the data. Most of the data may arrive within a period (hour or day) of the time window. However, data may continue to trickle in for quite some time afterwards. This presents issues of:

- Processing and combining data that arrives over time
- Determining a cutoff when data is considered complete

As we can see in this figure, most event data is received in the few windows after the event time. However, data continues to trickle in, and in fact, 100% theoretical completeness may never be achieved! So, if we were to report on the event data at day 3 and at day 7, the results would be very different.

Restatement—By this we mean the restatement of data that has arrived and been grouped by process time into our desired dimension of event time. This would not be an issue if we could simply scan through all the data each time we want to analyze it, but that becomes unworkable as the historical data grows. We need to find a way to process just the newly arrived data and combine it with the existing data.

Other Requirements

In addition to dealing with our two main issues, we want a solution that will:

- Be Scalable – Any solution must be able to scale to large volumes of data, particularly as event history grows over time. Any solution that relies on completely reprocessing the entire set of data will quickly become unworkable.
- Provide the ability to reprocess data – Restating event data by event time is pretty straightforward if everything goes right. However, if we determine that source data was corrupt or needs to be reloaded for any reason, things get messy. In that case, we potentially have data from multiple processing periods co-mingled for the same event time partition. So, to reprocess a process period, we need to separate out the rows for that process period and replace them, while leaving the other rows in the partition intact. Not always an easy task with HDFS!

As an aside, to reprocess data, you need to keep the source around for a while. Pretty obvious, but just saying!

Sample Use Case and Data

For an example use case we will use events arriving from a mobile device representing streaming video viewing events. For this use case, we will receive a set of files hourly and place them in a landing folder in HDFS with an external Hive table laid on top. The processing (collection) time is stamped into the filename using the format YYYYMMDDHH-nnnn.txt. This external table will contain one period’s data at a time and serves as an initial landing zone. We are also going to assume that we need to save this data in detail, and that analysis will be done directly on the detailed data. Thus, we need to restate the data by event time in the detail store.

Raw Input Source Format

Of particular interest is the event_time column, which is an ISO timestamp in the form YYYY-MM-DDTHH:MM:SS.sssZ.

CREATE EXTERNAL TABLE video_events_stg (
device_id string,
event_type string,
event_time string,
play_time_ms bigint,
buffer_time_ms bigint)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/landing/video_events_stg';
https://raw.githubusercontent.com/screamingweasel/sample-data/master/schema/video_events_stg.hql

Detailed Table Format

CREATE TABLE video_events (
device_id string,
event_type string,
event_time string,
play_time_ms bigint,
buffer_time_ms bigint)
PARTITIONED BY (
event_year string,
event_month string,
event_day string,
event_hour string,
process_time string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
wget https://raw.githubusercontent.com/screamingweasel/sample-data/master/schema/video_events.hql

Sample Data

I have put together three files, each containing one hour of processing data. You can pull them from GitHub and load the first hour into hdfs.

mkdir -p /tmp/video
cd /tmp/video
wget https://raw.githubusercontent.com/screamingweasel/sample-data/master/video/2017011200-00001.txt
wget https://raw.githubusercontent.com/screamingweasel/sample-data/master/video/2017011201-00001.txt
wget https://raw.githubusercontent.com/screamingweasel/sample-data/master/video/2017011202-00001.txt
hadoop fs -rm -r /landing/video_events_stg
hadoop fs -mkdir -p /landing/video_events_stg
hadoop fs -put /tmp/video/2017011200-00001.txt /landing/video_events_stg/

Solutions

Let’s look at two possible solutions that meet our criteria above. The first utilizes Hive without the newer ACID features. The second post in this series details how to solve this using Hive ACID. Per our requirements, both will have to restate the data as it is ingested into the detailed Hive table and both must support reprocessing of data.

Solution 1

This solution uses pure Hive and does not rely on the newer ACID transaction feature. As noted, one hour’s worth of raw input may contain data from any number of event times. We want to reorganize this and store it in the detailed table partitioned by event time for easy reporting. This can be visualized as:

Loading Restatement

We are going to achieve this through Hive dynamic partitioning. Later versions of Hive (0.13+) support efficient dynamic partitioning that can accomplish this. Dynamic partitioning is, unfortunately, a bit slower than inserting into a static fixed partition. Our approach of incrementally ingesting should mitigate this, but you would need to benchmark it with your volume.

set hive.exec.dynamic.partition.mode=nonstrict;
set hive.optimize.sort.dynamic.partition=true;
INSERT INTO TABLE default.video_events
PARTITION (event_year, event_month, event_day, event_hour, process_time)
SELECT device_id,event_type,
CAST(regexp_replace(regexp_replace(event_time,'Z',''),'T',' ') as timestamp) as event_time,
play_time_ms,
buffer_time_ms,
substr(event_time,1,4) AS event_year,
substr(event_time,6,2) AS event_month,
substr(event_time,9,2) AS event_day,
substr(event_time,12,2) AS event_hour,
substr(regexp_extract(input__file__name, '.*\/(.*)', 1),1,10) AS process_time
FROM default.video_events_stg;
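To make the string handling in that SELECT concrete, here is a small Python sketch (not part of the original article) showing how the same partition values are derived from an event_time value and the staging file name (format YYYYMMDDHH-nnnn.txt). The function name and example values are illustrative only:

import os

def derive_partition(event_time, input_file_name):
    # event_time looks like "2017-01-11T21:15:30.000Z"
    # input_file_name looks like ".../landing/video_events_stg/2017011200-00001.txt"
    return {
        "event_year":   event_time[0:4],    # substr(event_time,1,4)
        "event_month":  event_time[5:7],    # substr(event_time,6,2)
        "event_day":    event_time[8:10],   # substr(event_time,9,2)
        "event_hour":   event_time[11:13],  # substr(event_time,12,2)
        # first 10 chars of the file name = processing hour (YYYYMMDDHH)
        "process_time": os.path.basename(input_file_name)[0:10],
    }

print(derive_partition("2017-01-11T21:15:30.000Z",
                       "/landing/video_events_stg/2017011200-00001.txt"))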
You can see from a “show partitions” that three partitions were created, one for each event time period.

Show partitions default.video_events;
event_year=2017/event_month=01/event_day=11/event_hour=21/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=22/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011200
Now let’s process the rest of the data and see the results:

hadoop fs -rm -skipTrash /landing/video_events_stg/*
hadoop fs -put /tmp/video/2017011201-00001.txt /landing/video_events_stg/
hive -f video_events_insert.hql
hadoop fs -rm -skipTrash /landing/video_events_stg/*
hadoop fs -put /tmp/video/2017011202-00001.txt /landing/video_events_stg/
hive -f video_events_insert.hql

show partitions default.video_events;
event_year=2017/event_month=01/event_day=11/event_hour=21/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=22/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=22/process_time=2017011201
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011201
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011202
event_year=2017/event_month=01/event_day=12/event_hour=00/process_time=2017011201
event_year=2017/event_month=01/event_day=12/event_hour=00/process_time=2017011202
event_year=2017/event_month=01/event_day=12/event_hour=01/process_time=2017011202

select count(*) from default.video_events;
3000

So, we can see that our new data is being nicely added by event time. Note that now there are multiple partitions for the event hour, each corresponding to a processing event. We will see how that is used in the next section.

Reprocessing

In order to reprocess input data for a specific process period, we need to be able to identify that data in the restated detail and remove it before reprocessing. The approach we are going to take here is to keep the process period as part of the partition scheme, so that those partitions can be easily identified. In this case, the partitioning would be:
Event Year
Event Month
Event Day
Event Hour
Process Timestamp (concatenated)

Ex.
year=2017/month=01/day=10/hour=01/process_date=2017011202
year=2017/month=01/day=12/hour=01/process_date=2017011202
year=2017/month=01/day=12/hour=02/process_date=2017011202

This makes it fairly simple to reprocess a period of source data:
1. List all the partitions of the table and identify the ones from the specific processing hour to be reprocessed (see the sketch after this list).
2. Manually drop those partitions.
3. Restore the input data and reprocess it as normal.
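As a rough illustration of step 1 (not from the original article), a small Python helper could pull the partition list from Hive and generate the DROP PARTITION statements for a given process_time. The table and partition layout follow the example above; treat the helper itself as an assumption to adapt:

import subprocess

def partitions_to_drop(table, process_time):
    # Ask Hive for the partition list (same format as the "show partitions" output above).
    out = subprocess.check_output(["hive", "-e", "show partitions %s;" % table])
    stmts = []
    for line in out.decode("utf-8").splitlines():
        line = line.strip()
        if not line or not line.endswith("process_time=" + process_time):
            continue
        # Turn "event_year=2017/event_month=01/..." into "event_year='2017',event_month='01',..."
        spec = ",".join("%s='%s'" % tuple(kv.split("=", 1)) for kv in line.split("/"))
        stmts.append("ALTER TABLE %s DROP PARTITION (%s);" % (table, spec))
    return stmts

for stmt in partitions_to_drop("default.video_events", "2017011201"):
    print(stmt)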
Let’s assume that the data for hour 2017-01-12 01 was incorrect and needs to be reprocessed. From the show partitions statement, we can see that there are three partitions containing data from that processing time.

event_year=2017/event_month=01/event_day=11/event_hour=22/process_time=2017011201
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011201
event_year=2017/event_month=01/event_day=12/event_hour=00/process_time=2017011201

Let’s drop ‘em and see what we get:

ALTER TABLE default.video_events DROP PARTITION (event_year='2017',event_month='01',event_day='11',event_hour='22',process_time='2017011201');
ALTER TABLE default.video_events DROP PARTITION (event_year='2017',event_month='01',event_day='11',event_hour='23',process_time='2017011201');
ALTER TABLE default.video_events DROP PARTITION (event_year='2017',event_month='01',event_day='12',event_hour='00',process_time='2017011201');

show partitions video_events;
event_year=2017/event_month=01/event_day=11/event_hour=21/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=22/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011202
event_year=2017/event_month=01/event_day=12/event_hour=00/process_time=2017011202
event_year=2017/event_month=01/event_day=12/event_hour=01/process_time=2017011202

select count(*) from default.video_events;
2000

Now, finally, let’s put that data back and reprocess it.

hadoop fs -rm -skipTrash /landing/video_events_stg/*
hadoop fs -put /tmp/video/2017011201-00001.txt /landing/video_events_stg/
hive -f video_events_insert.hql

show partitions default.video_events;
event_year=2017/event_month=01/event_day=11/event_hour=21/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=22/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=22/process_time=2017011201
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011200
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011201
event_year=2017/event_month=01/event_day=11/event_hour=23/process_time=2017011202
event_year=2017/event_month=01/event_day=12/event_hour=00/process_time=2017011201
event_year=2017/event_month=01/event_day=12/event_hour=00/process_time=2017011202
event_year=2017/event_month=01/event_day=12/event_hour=01/process_time=2017011202

select count(*) from default.video_events;
3000

Comments on this Solution

One drawback of this solution is that you may end up with small files as events trickle in for older event times. For example, if you only get a handful of events that come in 4 weeks after the event time, you are going to get some very small files, indeed! Our next solution will overcome that issue by using Hive ACID.

Conclusion

When handling event data, we must always be aware of the skew between event time and processing time in order to provide accurate analytics. Our solution for restating the data in terms of event time must be scalable, performant, and allow for reprocessing of data. We looked at one solution using plain Hive and partitioning. In the next part of this series we will look at Hive ACID transactions to develop a more advanced and simpler solution.

Accompanying files can be found at: https://github.com/screamingweasel/articles/tree/master/event_processing_part_1
02-23-2017
04:46 PM
Yes, if you want to be more restrictive you could use the user hdfs or @hadoop to indicate any user in the hadoop group.
12-18-2016
07:42 AM
1 Kudo
Hi @jbarnett,

In order to run the HDFS Balancer, the new conf dfs.internal.nameservices, which distinguishes internal and remote clusters, needs to be set so that Balancer will use it to locate the local file system. Alternatively, Balancer and distcp need not share the same conf, since distcp may be used for multiple remote clusters. When adding a new remote cluster, we need to add it to the distcp conf; however, it does not make sense to change the Balancer conf. If we are going to use a separate conf for Balancer, we may put only one file system (i.e. the local fs but not the remote fs) in dfs.nameservices.

In summary, there are two ways to fix the conf:
1. Set all the local and the remote file systems in dfs.nameservices and then set the local file system in dfs.internal.nameservices. This conf will work for both distcp and Balancer.
2. Set only the local file system in dfs.nameservices in the Balancer conf, and use a different conf for distcp.

Hope it helps.
09-06-2017
04:19 PM
"as per above my understanding is any user needs to have full permissions on the system tables while connecting to sqlline for the first time and then just granting read access on the system tables should help him re-establish the session." -- correct. "Also can you please point me to document that can provide information around restricting access via Ranger for Phoenix." -- I'd suggest you ask a new question for help on using Ranger. I am not familiar with the project.
05-16-2018
08:36 PM
So I identified that this is a bug which they will fix only in HDP 3. It looks like there is a workaround that worked on HDP 2.6.0, but it stopped working in 2.6.1. I upgraded my stack to 2.6.2 and it works fine now.
05-01-2018
07:09 AM
I'm having this same problem. I recently moved our cluster to Ubuntu; when using the previous CentOS install it was working fine. I have tried the case conversion options with no luck. I can, however, access everything if I add the user to Ranger instead of the group.