Member since: 06-13-2016
Posts: 34
Kudos Received: 9
Solutions: 0
07-11-2018
05:45 PM
It's been a while since this question was asked. Does Knox now "play nicer" with Kafka? Regardless of volume, it would sure be nice to insulate sources that want to push data into the HDP/HDF ecosystem from Kerberos.
12-26-2017
10:10 PM
Anyone find an answer to this? We're going to need to do something similar....
12-22-2017
04:36 PM
Is it practical/possible to have an Ambari-managed client/edge node that can talk to two different Hadoop clusters on different versions of HDP? The scenario: a 3rd-party ETL product is being used for ingestion on a few edge nodes talking to cluster A, running HDP 2.4.2 on RHEL 6. We're standing up a new cluster B running HDP 2.6.x on RHEL 7 and need to "move" data ingest feeds from A to B. To minimize the provisioning of extra edge nodes (and perhaps additional upstream infrastructure), the preference would be to have the edge nodes write to both clusters for a period of time. Anyone been through something similar? Know of online resources describing the options? Thoughts/suggestions?
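One pattern we're considering (a sketch only, with hypothetical paths; it assumes cluster B's client configs have been copied onto the edge node) is to keep a second client-config directory and select it per invocation with the hadoop/hdfs --config option:
-bash-4.1$ hdfs dfs -ls /landing                                        # default configs, talks to cluster A
-bash-4.1$ hdfs --config /etc/hadoop/clusterB-conf dfs -ls /landing     # same node, talks to cluster B
The open question is the client binaries themselves: an HDP 2.4.2 client talking to an HDP 2.6.x cluster (or vice versa) isn't something we'd assume works without testing, and Ambari only manages one stack version per host.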
05-08-2017
07:50 PM
1 Kudo
In a nutshell, few were using it, and it was really just a somewhat easier-to-use skin over other tools (primarily Oozie); as features were added, it started looking more and more like the underlying tools, so.... It will be interesting to see how the marketing/messaging shifts over the coming months. This related question and answer seem to suggest Atlas isn't going anywhere, and Atlas and Oozie were (I think) the primary underpinnings of Falcon: https://community.hortonworks.com/questions/97570/apache-falcon-in-hdp-30.html
04-04-2017
08:06 PM
I'll reach out to my account team, but it still seems odd to announce a component as deprecated when you can't point to an announced product that will take on its functionality. The whole point of deprecation is normally to give folks a heads-up that they should start moving to the replacement....
04-03-2017
09:22 PM
1 Kudo
Documentation for HDP 2.6.0 was recently posted, and Falcon is marked as deprecated as of 2.6.0, with 3.0.0 listed as the Target Version for Removal (link to deprecation notice in release notes). There are no comments indicating where Falcon's functionality is moving, so I'm wondering: what are the plans for Falcon's functionality?...
03-22-2017
07:21 PM
Looks like openscoring also offers jpmml under a BSD license for a fee; see below. Unfortunately, it appears there's a gray area between "we just want to use the software" and "we want to redistribute proprietary software based on this code." The wording of the attached blurb from openscoring suggests they think "use" of AGPL code is fine, even though the FSF stance seems to be that the GNU AGPL is only compatible with the GPL: https://www.gnu.org/licenses/why-affero-gpl.en.html
03-20-2017
01:40 PM
@ibalaji9, have you found a solution for redirecting the sqoop working/scratch directory? We're running into issues where users moving a lot of data w/ sqoop are bumping into personal quotas in their /user/ directory even though there is plenty of space in /tmp and in the destination folder where they're trying to land data. Thx, -Vince
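Two things we're looking at in the meantime (a sketch with hypothetical names, not a confirmed fix): first, confirming which quota is actually being hit; second, whether sqoop's --target-dir option redirects enough of the output off the user's home directory:
-bash-4.1$ hdfs dfs -count -q /user/someuser        # shows name/space quotas and what remains
-bash-4.1$ sqoop import --connect jdbc:oracle:thin:@db.example.com:1521/ORCL \
    --table MYTABLE --target-dir /tmp/sqoop-staging/MYTABLE
Whether every code path (e.g. --hive-import staging) honors this is exactly what we still need to verify.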
03-17-2017
08:49 PM
hive.exec.scratchdir on this cluster is /tmp/hive. Don't know why the user appears to be exceeding quota on a personal directory.
01-30-2017
08:38 PM
Using HDP 2.4.2, Ambari 2.2.2.0, I see nimbus.authorizer: org.apache.ranger.authorization.storm.authorizer.RangerStormAuthorizer
01-30-2017
07:47 PM
Do nimbus.supervisor.users and nimbus.admins need to be added manually even when Ranger is being used? I'm in a group that has the following permissions: Submit Topology, File Upload, Get Nimbus Conf, Get Cluster Info, File Download, Kill Topology, Rebalance, Activate, Deactivate, Get Topology Conf, Get Topology, Get User Topology, and Get Topology Info. And 'Delegate Admin' is checked.
01-30-2017
06:59 PM
I was suspecting configs were not correct because trying to run Hortonworks' WordCountTopology sample was not working. Here is what I was seeing:
-bash-4.1$ storm jar storm-starter-0.0.1-storm-0.9.0.1.jar storm.starter.WordCountTopology WordCount
Running: /opt/java/hotspot/7/64_bit/jdk1.7.0_79/bin/java -client -Ddaemon.name= -Dstorm.options= -Dstorm.home=/usr/hdp/2.4.2.0-258/storm -Dstorm.log.dir=/var/hadoop/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib -Dstorm.conf.file= -cp /usr/hdp/2.4.2.0-258/storm/lib/cheshire-5.3.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/hadoop-auth-2.7.1.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/storm/lib/clojure-1.6.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/clj-stacktrace-0.2.7.jar:/usr/hdp/2.4.2.0-258/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.4.2.0-258/storm/lib/oncrpc-1.0.7.jar:/usr/hdp/2.4.2.0-258/storm/lib/jackson-core-2.3.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/clout-1.0.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-servlet-1.3.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-json-0.3.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/kryo-2.21.jar:/usr/hdp/2.4.2.0-258/storm/lib/jline-0.9.94.jar:/usr/hdp/2.4.2.0-258/storm/lib/tigris-0.1.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/reflectasm-1.07-shaded.jar:/usr/hdp/2.4.2.0-258/storm/lib/tools.namespace-0.2.4.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-devel-1.3.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/java.classpath-0.2.2.jar:/usr/hdp/2.4.2.0-258/storm/lib/javax.servlet-2.5.0.v201103041518.jar:/usr/hdp/2.4.2.0-258/storm/lib/compojure-1.1.3.jar:/usr/hdp/2.4.2.0-258/storm/lib/core.incubator-0.1.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-core-1.1.5.jar:/usr/hdp/2.4.2.0-258/storm/lib/gmetric4j-1.0.7.jar:/usr/hdp/2.4.2.0-258/storm/lib/ns-tracker-0.2.2.jar:/usr/hdp/2.4.2.0-258/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.4.2.0-258/storm/lib/commons-codec-1.6.jar:/usr/hdp/2.4.2.0-258/storm/lib/disruptor-2.10.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/asm-4.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/zookeeper.jar:/usr/hdp/2.4.2.0-258/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/storm-core-0.10.0.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/storm/lib/tools.logging-0.2.3.jar:/usr/hdp/2.4.2.0-258/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/hiccup-0.3.6.jar:/usr/hdp/2.4.2.0-258/storm/lib/minlog-1.2.jar:/usr/hdp/2.4.2.0-258/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.4.2.0-258/storm/lib/jackson-dataformat-smile-2.3.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-jetty-adapter-1.3.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/clj-time-0.8.0.jar:storm-starter-0.0.1-storm-0.9.0.1.jar:/usr/hdp/current/storm-supervisor/conf:/usr/hdp/2.4.2.0-258/storm/bin -Dstorm.jar=storm-starter-0.0.1-storm-0.9.0.1.jar storm.starter.WordCountTopology WordCount
18:49:19.481 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
18:49:19.551 [main] INFO b.s.u.Utils - Using storm.yaml from resources
18:49:19.611 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
18:49:19.631 [main] INFO b.s.u.Utils - Using storm.yaml from resources
18:49:19.648 [main] INFO b.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -6595191808170807148:-7705041539986139533
18:49:19.649 [main] INFO b.s.s.a.AuthUtils - Got AutoCreds []
18:49:19.664 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
18:49:19.723 [main] WARN b.s.s.a.k.ClientCallbackHandler - Could not login: the client is being asked for a password, but the client code does not currently support obtaining a password from the user. Make sure that the client is configured to use a ticket cache (using the JAAS configuration setting 'useTicketCache=true)' and restart the client. If you still get this message after that, the TGT in the ticket cache has expired and must be manually refreshed. To do so, first determine if you are using a password or a keytab. If the former, run kinit in a Unix shell in the environment of the user who is running this client using the command 'kinit <princ>' (where <princ> is the name of the client's Kerberos principal). If the latter, do 'kinit -k -t <keytab> <princ>' (where <princ> is the name of the Kerberos principal, and <keytab> is the location of the keytab file). After manually refreshing your cache, restart this client. If you continue to see this message after manually refreshing your cache, ensure that your KDC host's clock is in sync with this host's clock.
18:49:19.725 [main] ERROR b.s.s.a.k.KerberosSaslTransportPlugin - Server failed to login in principal:javax.security.auth.login.LoginException: No password provided
javax.security.auth.login.LoginException: No password provided ....
Now that I've tried passing the path to the client_jaas.conf on the command line, things get a little further, and there is a different error:
-bash-4.1$ storm jar storm-starter-0.0.1-storm-0.9.0.1.jar storm.starter.WordCountTopology WordCount -c java.security.auth.login.config=/etc/storm/conf/client_jaas.conf
Running: /opt/java/hotspot/7/64_bit/jdk1.7.0_79/bin/java -client -Ddaemon.name= -Dstorm.options=java.security.auth.login.config%3D%2Fetc%2Fstorm%2Fconf%2Fclient_jaas.conf -Dstorm.home=/usr/hdp/2.4.2.0-258/storm -Dstorm.log.dir=/var/hadoop/log/storm -Djava.library.path=/usr/local/lib:/opt/local/lib:/usr/lib:/usr/hdp/current/storm-client/lib -Dstorm.conf.file= -cp /usr/hdp/2.4.2.0-258/storm/lib/cheshire-5.3.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/log4j-core-2.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/hadoop-auth-2.7.1.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/storm/lib/clojure-1.6.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/clj-stacktrace-0.2.7.jar:/usr/hdp/2.4.2.0-258/storm/lib/servlet-api-2.5.jar:/usr/hdp/2.4.2.0-258/storm/lib/oncrpc-1.0.7.jar:/usr/hdp/2.4.2.0-258/storm/lib/jackson-core-2.3.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/clout-1.0.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-servlet-1.3.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-json-0.3.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/kryo-2.21.jar:/usr/hdp/2.4.2.0-258/storm/lib/jline-0.9.94.jar:/usr/hdp/2.4.2.0-258/storm/lib/tigris-0.1.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/reflectasm-1.07-shaded.jar:/usr/hdp/2.4.2.0-258/storm/lib/tools.namespace-0.2.4.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-devel-1.3.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/java.classpath-0.2.2.jar:/usr/hdp/2.4.2.0-258/storm/lib/javax.servlet-2.5.0.v201103041518.jar:/usr/hdp/2.4.2.0-258/storm/lib/compojure-1.1.3.jar:/usr/hdp/2.4.2.0-258/storm/lib/core.incubator-0.1.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-core-1.1.5.jar:/usr/hdp/2.4.2.0-258/storm/lib/gmetric4j-1.0.7.jar:/usr/hdp/2.4.2.0-258/storm/lib/ns-tracker-0.2.2.jar:/usr/hdp/2.4.2.0-258/storm/lib/log4j-over-slf4j-1.6.6.jar:/usr/hdp/2.4.2.0-258/storm/lib/commons-codec-1.6.jar:/usr/hdp/2.4.2.0-258/storm/lib/disruptor-2.10.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/asm-4.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/zookeeper.jar:/usr/hdp/2.4.2.0-258/storm/lib/log4j-slf4j-impl-2.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/storm-core-0.10.0.2.4.2.0-258.jar:/usr/hdp/2.4.2.0-258/storm/lib/tools.logging-0.2.3.jar:/usr/hdp/2.4.2.0-258/storm/lib/log4j-api-2.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/hiccup-0.3.6.jar:/usr/hdp/2.4.2.0-258/storm/lib/minlog-1.2.jar:/usr/hdp/2.4.2.0-258/storm/lib/slf4j-api-1.7.7.jar:/usr/hdp/2.4.2.0-258/storm/lib/jackson-dataformat-smile-2.3.1.jar:/usr/hdp/2.4.2.0-258/storm/lib/ring-jetty-adapter-1.3.0.jar:/usr/hdp/2.4.2.0-258/storm/lib/clj-time-0.8.0.jar:storm-starter-0.0.1-storm-0.9.0.1.jar:/usr/hdp/current/storm-supervisor/conf:/usr/hdp/2.4.2.0-258/storm/bin -Dstorm.jar=storm-starter-0.0.1-storm-0.9.0.1.jar storm.starter.WordCountTopology WordCount
18:54:08.293 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
18:54:08.362 [main] INFO b.s.u.Utils - Using storm.yaml from resources
18:54:08.421 [main] INFO b.s.u.Utils - Using defaults.yaml from resources
18:54:08.441 [main] INFO b.s.u.Utils - Using storm.yaml from resources
18:54:08.458 [main] INFO b.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -6645485375203566088:-8607446551035289369
18:54:08.459 [main] INFO b.s.s.a.AuthUtils - Got AutoCreds []
18:54:08.474 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
18:54:08.532 [main] INFO o.a.s.z.Login - successfully logged in.
18:54:08.782 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
18:54:08.783 [main] INFO o.a.s.z.Login - successfully logged in.
18:54:08.895 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
18:54:08.896 [main] INFO o.a.s.z.Login - successfully logged in.
18:54:09.015 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
18:54:09.017 [main] INFO o.a.s.z.Login - successfully logged in.
18:54:09.137 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
18:54:09.138 [main] INFO o.a.s.z.Login - successfully logged in.
18:54:09.261 [main] INFO b.s.u.StormBoundedExponentialBackoffRetry - The baseSleepTimeMs [2000] the maxSleepTimeMs [60000] the maxRetries [5]
18:54:09.262 [main] INFO o.a.s.z.Login - successfully logged in.
Exception in thread "main" java.lang.RuntimeException: AuthorizationException(msg:fileUpload is not authorized)
at backtype.storm.StormSubmitter.submitJarAs(StormSubmitter.java:399)
at backtype.storm.StormSubmitter.submitTopologyAs(StormSubmitter.java:229)
at backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:271)
at backtype.storm.StormSubmitter.submitTopology(StormSubmitter.java:157)
at storm.starter.WordCountTopology.main(WordCountTopology.java:77)
Caused by: AuthorizationException(msg:fileUpload is not authorized)
at backtype.storm.generated.Nimbus$beginFileUpload_result$beginFileUpload_resultStandardScheme.read(Nimbus.java:13616)
at backtype.storm.generated.Nimbus$beginFileUpload_result$beginFileUpload_resultStandardScheme.read(Nimbus.java:13594)
at backtype.storm.generated.Nimbus$beginFileUpload_result.read(Nimbus.java:13536)
at org.apache.thrift7.TServiceClient.receiveBase(TServiceClient.java:78)
at backtype.storm.generated.Nimbus$Client.recv_beginFileUpload(Nimbus.java:462)
at backtype.storm.generated.Nimbus$Client.beginFileUpload(Nimbus.java:450)
at backtype.storm.StormSubmitter.submitJarAs(StormSubmitter.java:370)
... 4 more
01-30-2017
06:36 PM
Based on additional digging since I posted this question, it looks like the HWX docs are wrong to refer to port 6667; 6627 appears to be the default, and it's what is currently set on my cluster. I'm still fuzzy on whether I should change java.security.auth.login.config in storm.yaml to point to client_jaas.conf rather than storm_jaas.conf, and whether that would mean the node where the change is made would ONLY function as a client.
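For anyone else wanting to check their own cluster, this is how we confirmed the port (output illustrative):
-bash-4.1$ grep 'nimbus.thrift.port' /etc/storm/conf/storm.yaml
nimbus.thrift.port : 6627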
01-30-2017
04:07 PM
I've read the documentation https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-storm-ambari/content/ch_secure-storm-designating-node.html, but am confused. The "Use an Existing Storm Node" section in the link above mentions creating a .storm directory for each user, but doesn't say whether anything should be put in the new directory. Also, the next step talks about adding settings to /etc/storm/conf/storm.yaml, but it looks like all of those settings already exist, and two have different values:

setting                            current value                                               docs suggest
nimbus.thrift.port                 6627                                                        6667
java.security.auth.login.config    '/usr/hdp/current/storm-supervisor/conf/storm_jaas.conf'   "/etc/storm/conf/client_jaas.conf"

It's unclear to me whether I should change the settings to match what the docs suggest. The port difference could just be how it happens to be set on my cluster, or it could be that a different port is used by the client vs. the Storm processes communicating with each other. If I make the java.security change, will this node only be usable as a client?
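On the empty .storm directory question, one interpretation we're testing (a sketch with a hypothetical nimbus host; the storm client picks up a per-user ~/.storm/storm.yaml, which overrides /etc/storm/conf/storm.yaml) is that client-side overrides are meant to go there, leaving the cluster-managed storm.yaml alone:
-bash-4.1$ mkdir -p ~/.storm
-bash-4.1$ cat > ~/.storm/storm.yaml <<'EOF'
nimbus.host: "nimbus.example.com"
java.security.auth.login.config: "/etc/storm/conf/client_jaas.conf"
EOF
If that works, the node's /etc/storm/conf/storm.yaml (and the daemons using storm_jaas.conf) wouldn't need to change at all.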
01-23-2017
06:19 PM
I think @Eric Brosch's question is around multi-tenancy... I found the following link, but none of the answers really get into the details of running topologies in an enterprise multi-tenant environment: https://community.hortonworks.com/questions/1705/storm-multi-tenancy-best-practices.html The primary recommendations seem to be that one must 1. have a secure cluster and 2. set supervisor.run.worker.as.user to true. In the docs I've seen, it's not clear whether there's a good way to have groups of users who can manage topologies within their own group but can't mess with topologies belonging to another group.
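A quick way to check where a given node stands on recommendation 2 (output illustrative):
-bash-4.1$ grep 'run.worker.as.user' /etc/storm/conf/storm.yaml
supervisor.run.worker.as.user : true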
10-31-2016
02:58 PM
I thought those two settings pre-dated the introduction of ACID tables. I can understand the "External tables cannot be ACID tables..." part, but I would think those settings could still be used to let users take an exclusive lock on an external table, to prevent reading it through Hive while external jobs manipulate the underlying files....
10-27-2016
07:51 PM
From what I'm hearing from other sources, this answer was inaccurate and totally fails to take into consideration how our cluster is being used. I disagree with it being tagged "best answer".
10-04-2016
06:18 PM
Appreciate the quick response, but if that jira is out there, I'm not seeing it.
10-04-2016
05:43 PM
1 Kudo
Beeline has had an issue with column widths, as noted in https://issues.apache.org/jira/browse/HIVE-14135. This can make a simple dump of settings take 53k+ because each line is over 33k characters long. I'm wondering whether this jira has already been incorporated into a released version of HDP, and if not, when it can be expected. Thx, -Vince
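In the meantime, a workaround that sidesteps the padded table output entirely (host/port hypothetical) is to switch beeline's output format, since tsv2 doesn't pad columns to a fixed width:
-bash-4.1$ beeline -u 'jdbc:hive2://hiveserver.example.com:10000' --outputformat=tsv2 --silent=true -e 'set;' > hive-settings.tsv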
09-07-2016
07:37 PM
Looks like the features I was after have been back-ported. It gets confusing when the primary documentation Hortonworks points users to is the Apache docs, which state these features are in 1.6... I don't understand why HWX seems to be avoiding 1.6.
09-06-2016
07:38 PM
Any word on when Flume 1.6 is likely to be rolled into HDP?
I haven't switched to 2.5.0 yet, but was disappointed to see the release notes indicate Flume 1.5.2 is still the latest in HDP 2.5, even though Flume 1.6 has been out for over a year....
09-01-2016
09:57 PM
1 Kudo
We seem to be getting bitten by HIVE-10809, where pig scripts using org.apache.hive.hcatalog.pig.HCatStorer leave behind empty _scratch directories with every run. Does anyone have a suggested workaround, or does someone in the HWX community working on Pig have an update on this jira? (It has a status of Patch Available, but hasn't been updated since May 2015.)
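The interim cleanup we're considering is just sweeping the empty leftovers (a sketch with a hypothetical warehouse path; -rmdir only removes empty directories, so anything with data is left alone):
-bash-4.1$ hdfs dfs -find /apps/hive/warehouse/mydb.db/mytable -name '_scratch*' | xargs -r -n1 hdfs dfs -rmdir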
09-01-2016
05:47 PM
2 Kudos
I've been asked to set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager and hive.support.concurrency = true, because a subset of users is concerned about dirty reads on an external table while an external job runs to consolidate small files within a partition, so they want to take an exclusive lock during the consolidation.... Anyone know of a reason I should be wary of the above settings? Is there potential for performance impacts on other jobs/users that had no need for the above settings? I guess another question would be: does "lock table" even work on an external table? Thx, -Vince
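This is the empirical check we plan to run (table name hypothetical; whether an explicit lock is even accepted, or honored, on an external table under these settings is precisely the open question above):
-bash-4.1$ hive -e "lock table mydb.ext_orders exclusive;"
-bash-4.1$ hive -e "show locks mydb.ext_orders;"      # from a second session, to see whether the lock shows up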
09-01-2016
05:33 PM
I think the question above says it all, but to add more background: we have pig scripts landing data in ORC files hourly and are ending up with numerous small files in a partition directory. alter table partition concatenate appears to run (we end up with different file names in the partition folder), but doesn't seem to actually concatenate into fewer files. I'm wondering if the problem is that these are external tables... but the fact that files within the partition are getting renamed suggests it's not. Thoughts?
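For reference, this is the pattern we're running (names/paths hypothetical), checking file counts before and after:
-bash-4.1$ hdfs dfs -ls /data/ext/mytable/dt=2016-08-31 | wc -l     # many small ORC files before
-bash-4.1$ hive -e "alter table mydb.mytable partition (dt='2016-08-31') concatenate;"
-bash-4.1$ hdfs dfs -ls /data/ext/mytable/dt=2016-08-31 | wc -l     # expected fewer, larger files, but the count doesn't drop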
09-01-2016
05:08 PM
I'm interested in the telecom use case, too.... We're dealing with hourly ingests that result in a number of small files we'd like regularly compacted...
08-23-2016
05:04 PM
I think our hive-env template is missing the section related to setting the heapsize.... We're reaching out to support for a fresh copy of that.
08-23-2016
04:44 PM
1 Kudo
Ambari's hive config includes an entry for "Metastore Heap Size", and the context help says this corresponds to hive.metastore.heapsize, but I can't find any other reference to this parameter in Apache or any Hadoop vendor's documentation. Is this actually a parameter? Where is it set? The value we set in Ambari does not appear to affect the actual heap being used on our server. (Ambari 2.2.2.0)
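One way to see what heap the metastore JVM actually got, regardless of what Ambari thinks it set (output illustrative):
-bash-4.1$ ps -ef | grep -i hivemetastore | grep -o -- '-Xmx[^ ]*'
-Xmx1024m
Our working theory is that hive.metastore.heapsize is an Ambari-side value substituted into the hive-env template rather than a parameter Hive itself reads at runtime, which would explain why it's absent from the Apache docs.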
08-16-2016
01:30 PM
1 Kudo
Unfortunately, my group was using external tables as an easier way to deal with quotas in a "multi-tenant" cluster and to impose some governance on Hive. (i.e., most users/groups can only create external tables, and the files need to be landed in their assigned folder in HDFS; DBAs control internal tables in Hive.) Somewhere, we missed the "basic premise" that the data in external tables won't change....
08-15-2016
09:00 PM
1 Kudo
A user recently asked about locking Hive tables to make sure reads are consistent, and that led me to the Apache documentation on Hive transactions, where I saw the following: "External tables cannot be made ACID tables since the changes on external tables are beyond the control of the compactor (HIVE-13175)." This leads me to wonder whether updated/comprehensive documentation exists on the differences between internal and external tables in Hive. Traditionally, the explanation has been that Hive maintains both the data and the metadata for internal tables, so dropping an internal table drops the data and the metadata, while dropping an external table drops only the metadata; otherwise, they're functionally equivalent. The note above regarding ACID/transactions suggests internal and external table capabilities/features are diverging.... Thoughts? Thanks in advance!
- Tags:
- Data Processing
- Hive
08-11-2016
04:28 PM
@Vamsi Jonnadula, did this get resolved? We're seeing issues with small ORC files within a partition as well.