Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5129 | 09-21-2018 09:54 PM |
| | 6495 | 03-31-2018 03:59 AM |
| | 1969 | 03-31-2018 03:55 AM |
| | 2179 | 03-31-2018 03:31 AM |
| | 4833 | 03-27-2018 03:46 PM |
09-01-2016
06:18 PM
@deepak sharma Thanks. I am aware that audit to HDFS is enabled for the Ranger Kafka plugin; the audit to HDFS is failing for Kafka, which is how we extracted the exception from the Kafka logs. Let me check whether symlinking hdfs-site.xml into the Kafka conf directory does it. Stay tuned.
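A minimal sketch of the symlink check mentioned above; the paths below follow a typical HDP layout and are assumptions to verify on your broker hosts.

```bash
# Typical HDP locations; verify before creating the links.
ls -l /etc/kafka/conf/hdfs-site.xml 2>/dev/null \
  || ln -s /etc/hadoop/conf/hdfs-site.xml /etc/kafka/conf/hdfs-site.xml

# The Ranger HDFS audit destination usually needs core-site.xml as well.
ls -l /etc/kafka/conf/core-site.xml 2>/dev/null \
  || ln -s /etc/hadoop/conf/core-site.xml /etc/kafka/conf/core-site.xml
```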
09-01-2016
02:29 AM
@Praneender Vuppala Thanks. Got it. Let's go step by step.

1. Check that the agent and server are up and running. If they are not started, try to start them, then attempt your scenario again so that we can capture some logged errors at step 2. Write down the approximate time of your action so that you can go through the logs in a more time-oriented way.

2. Check the ambari-server and ambari-agent logs on the specific hosts. You can find them under /var/log/ambari-server and /var/log/ambari-agent, respectively: cat ambari-server.log | grep ERROR and cat ambari-agent.log | grep ERROR. If that is too much output, you could use something like tail: tail -10 ambari-server.log | grep ERROR. If your error occurred on a different day than the current one, you would need to grep through that day's log; in your case, if you re-attempted the failed action, that should not be necessary and the error should be in the current day's log.

Post the findings to this question and we will take it from there.
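A small sketch of the step-2 checks above; the log locations are the Ambari defaults and the timestamp string is a hypothetical example of the time you noted, so adjust both to your cluster.

```bash
# Default Ambari log locations; adjust the paths if your install differs.
grep ERROR /var/log/ambari-server/ambari-server.log | tail -20
grep ERROR /var/log/ambari-agent/ambari-agent.log | tail -20

# Narrow down to the approximate time of the failed action
# (the timestamp below is hypothetical; check one line of your log for the actual format).
grep '2016-09-01 14:' /var/log/ambari-server/ambari-server.log | grep ERROR
```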
08-31-2016
11:48 PM
@Praneender Vuppala It is not clear from your description whether you can ssh from ambari-server to the other host. You stated "I was able to ssh and telnet on port 8080 to the ambari-server from other host", but that is not the relevant direction; what matters is the other direction, from ambari-server to that host.
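A quick way to test the direction that matters, run from the ambari-server host; the user and key path are assumptions based on a typical password-less root setup.

```bash
# Run this on the ambari-server host, replacing agent-host with the host you are registering.
ssh -o BatchMode=yes -i /root/.ssh/id_rsa root@agent-host 'hostname -f'
# A password prompt or a non-zero exit code means password-less ssh
# from ambari-server to that host is not set up yet.
```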
08-31-2016
09:03 PM
2 Kudos
Question: For the issue described below: Does the data go to trash because the node is unavailable? What could cause this exception in the context of recently enabling Kerberos on the cluster?

Issue Description

Here is the issue an organization is facing with Kafka after recently enabling Kerberos in an HDP 2.4.2 cluster. They are trying to build a pipeline from their data center to HDFS. The data is first mirrored to the cluster using MirrorMaker 0.8, as the data center uses Kafka 0.8. The data is then Avro-serialized by a Flume agent and dumped into HDFS through the Confluent HDFS connector. However, they notice that only about half of the data is mirrored by MirrorMaker. Since Kerberos was enabled on their cluster, they are seeing the following error in the Kafka logs:

[2016-08-29 16:51:28,479] INFO Returning HDFS Filesystem Config: Configuration: core-default.xml, core-site.xml, hdfs-default.xml, hdfs-site.xml (org.apache.ranger.audit.destination.HDFSAuditDestination)
[2016-08-29 16:51:28,496] ERROR Error writing to log file. (org.apache.ranger.audit.provider.BaseAuditHandler)
java.lang.IllegalArgumentException: java.net.UnknownHostException: xyzlphdpd1
    at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:406)
    at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:311)
    at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.ranger.audit.destination.HDFSAuditDestination.getLogFileStream(HDFSAuditDestination.java:221)
    at org.apache.ranger.audit.destination.HDFSAuditDestination.logJSON(HDFSAuditDestination.java:123)
    at org.apache.ranger.audit.queue.AuditFileSpool.sendEvent(AuditFileSpool.java:890)
    at org.apache.ranger.audit.queue.AuditFileSpool.runDoAs(AuditFileSpool.java:838)
    at org.apache.ranger.audit.queue.AuditFileSpool$2.run(AuditFileSpool.java:759)
    at org.apache.ranger.audit.queue.AuditFileSpool$2.run(AuditFileSpool.java:757)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637)
    at org.apache.ranger.audit.queue.AuditFileSpool.run(AuditFileSpool.java:765)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.UnknownHostException: xyzlphdpd1
    ... 22 more
[2016-08-29 16:51:28,496] ERROR Error sending logs to consumer. provider=kafka.async.summary.multi_dest.batch, consumer=kafka.async.summary.multi_dest.batch.hdfs (org.apache.ranger.audit.queue.AuditFileSpool)
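A hedged sketch of checks that could help narrow down the UnknownHostException above, run on an affected Kafka broker host; the config paths are typical HDP defaults and xyzlphdpd1 is simply the host name taken from the log.

```bash
# Can the broker host resolve the name used by the Ranger HDFS audit destination?
getent hosts xyzlphdpd1

# Are the HDFS client configs (fs.defaultFS, NameNode HA aliases) visible to Kafka?
ls -l /etc/kafka/conf/core-site.xml /etc/kafka/conf/hdfs-site.xml
grep -A1 'fs.defaultFS' /etc/hadoop/conf/core-site.xml
```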
Labels:
- Apache Kafka
- Apache Ranger
08-31-2016
08:34 PM
2 Kudos
@sandrine G. Hive 2.1 is part of HDP 2.5 as a technical preview only. It can be added from Ambari. HDP 2.5 was launched today. Please don't forget to vote for/accept the response that addressed your question.
08-31-2016
08:30 PM
3 Kudos
@Jaime On behalf of product marketing, management, and research and development, we are excited to announce that Hortonworks Data Platform 2.5 has been released. This new release continues to deliver against our accelerated release strategy, bringing community-driven innovations that enable organizations to maximize the value of data-at-rest and data-in-motion. Hortonworks Data Platform's new capabilities around dynamic security, enterprise Spark at scale, governance, and streamlined operations empower teams to further scale their big-data projects and modern data applications.

Here is the "What's New in HDP 2.5" information:
- What's New in HDP 2.5 presentation: https://hortonworks.box.com/s/d7ytvajew0dao2d7ozh59m02rqqffwus
- The Value of HDP 2.5 feature matrix: https://hortonworks.box.com/s/zciulzhzlyy9w68kb8w0yi5u3k4nfqy6

As part of this new release, the marketing team has completed a top-to-bottom website and content refresh around our new capabilities. Please check out the following updated website material:
- What's new section of the HDP page: http://hortonworks.com/products/data-center/hdp/
- Spark page: http://hortonworks.com/apache/spark
- Zeppelin page: http://hortonworks.com/apache/zeppelin/
- HBase: http://hortonworks.com/apache/hbase/#section_4
- Hive: http://hortonworks.com/apache/hive/#section_3
- Ambari: http://hortonworks.com/apache/ambari/#section_3
- Atlas: http://hortonworks.com/apache/atlas/#section_4
- Atlas/Ranger integration: http://hortonworks.com/solutions/security-and-governance/
- SmartSense: http://hortonworks.com/products/subscriptions/smartsense/

The team has also published a series of informative blogs on the new release. Please leverage the following material:
- Integration of Apache Atlas and Apache Ranger to drive dynamic classification-based security
- Open and comprehensive approach to data governance with cross-component lineage
- What is a Business Catalog and why do you need one?
- What's new in Apache Spark 2.0?
- Apache Zeppelin: The Road Ahead
- Run Apache Hive Query 25x Faster and More
- Apache Hive with LLAP enables sub-second SQL on Hadoop (Technical Preview)
- Advanced Visualization and Dashboarding with Apache Ambari
- Incremental Backup and Restore for Apache HBase and Apache Phoenix

Please don't forget to VOTE and ACCEPT the best answer.
08-31-2016
08:29 PM
1 Kudo
@shashi cheppela On behalf of product marketing, management, and research and development, we are excited to announce that Hortonworks Data Platform 2.5 has been released. This new release continues to deliver against our accelerated release strategy, bringing community-driven innovations that enable organizations to maximize the value of data-at-rest and data-in-motion. Hortonworks Data Platform's new capabilities around dynamic security, enterprise Spark at scale, governance, and streamlined operations empower teams to further scale their big-data projects and modern data applications.

- What's new section of the HDP page: http://hortonworks.com/products/data-center/hdp/
- Spark page: http://hortonworks.com/apache/spark
- Zeppelin page: http://hortonworks.com/apache/zeppelin/
- HBase: http://hortonworks.com/apache/hbase/#section_4
- Hive: http://hortonworks.com/apache/hive/#section_3
- Ambari: http://hortonworks.com/apache/ambari/#section_3
- Atlas: http://hortonworks.com/apache/atlas/#section_4
- Atlas/Ranger integration: http://hortonworks.com/solutions/security-and-governance/
- SmartSense: http://hortonworks.com/products/subscriptions/smartsense/

The team has also published a series of informative blogs on the new release. Please leverage the following material:
- Integration of Apache Atlas and Apache Ranger to drive dynamic classification-based security
- Open and comprehensive approach to data governance with cross-component lineage
- What is a Business Catalog and why do you need one?
- What's new in Apache Spark 2.0?
- Apache Zeppelin: The Road Ahead
- Run Apache Hive Query 25x Faster and More
- Apache Hive with LLAP enables sub-second SQL on Hadoop (Technical Preview)
- Advanced Visualization and Dashboarding with Apache Ambari
- Incremental Backup and Restore for Apache HBase and Apache Phoenix

Please don't forget to VOTE and ACCEPT the best answer.
08-30-2016
04:55 PM
3 Kudos
@Sami Ahmad It would have helped if you could provide some sample data, but as I understand it, you have a data payload with a key value and a variable number of parameters associated with that key. This is a common situation with JSON or XML data formats, and also with various logs stored as text.

There is an important question to ask yourself: "How do I plan to access the data after I store it?" Hive is an option, but a variable number of attributes is more appropriate for a columnar database like HBase. However, Hive allows you to store the payload as JSON or Avro in a text field, and if you know how to parse it you can still achieve your goals. If you want each attribute to be a column and do not want to deal with JSON or Avro parsing, then the HBase columnar store is another option, and you can use Apache Phoenix for SQL on top of HBase. It depends on what type of queries you plan to execute.

Anyhow, your question is about parsing and making sense of the data, even before storing it, so let's focus on that. Assuming that your data is plain text with the structure described above, you have many ways to parse and format it; however, Hortonworks DataFlow includes Apache NiFi, which is an awesome tool to take your file, split it by line, and convert it to JSON, for example. That will include the key as well as the variable payload. Once you have the data formatted as JSON, you can use another processor available in NiFi to post it to Hive or HBase.

To learn more about NiFi: http://hortonworks.com/apache/nifi/
To see all available processors: https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.2.0.1/bk_Overview/content/index.html

In your specific case, assuming your file is text, you would build a template and use processors like FetchFile, SplitText, etc., and once you have the data in the proper format you can use PutHiveQL, PutHBaseJson, and so on. Look at all the processors to see how much productivity you can gain without programming; at most you would have to use regex.

Getting started: https://nifi.apache.org/docs/nifi-docs/html/getting-started.html
Look at the following tutorials: http://hortonworks.com/apache/nifi/#tutorials. In your case, the log data tutorials seem like a close enough match. You can also import several NiFi templates from https://github.com/hortonworks-gallery/nifi-templates/tree/master/templates and learn even more.

If this response was helpful, please vote/accept the answer.
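To make the target format concrete, here is a minimal sketch (outside NiFi) of what the per-line JSON conversion described above might produce; the input layout "key attr1=val1 attr2=val2 ..." and the file name sample.txt are hypothetical stand-ins for your actual data.

```bash
# Hypothetical input: one record per line, a key followed by a variable
# number of attr=value pairs, e.g. "k42 colA=10 colB=foo".
awk '{
  printf "{\"key\":\"%s\"", $1
  for (i = 2; i <= NF; i++) {
    split($i, kv, "=")
    printf ",\"%s\":\"%s\"", kv[1], kv[2]
  }
  print "}"
}' sample.txt
# Example output: {"key":"k42","colA":"10","colB":"foo"}
```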
08-26-2016
07:23 PM
3 Kudos
@Kumar Veerappan Your question caption asked about dependent components, while your question description asked for the list of jobs that currently use Spark. I assume you actually meant the Spark applications (a.k.a. jobs) running on the cluster. If you have access to Ambari, you can click on the YARN service, then on Quick Links, and then on ResourceManager UI; this assumes your Spark runs over YARN. Otherwise, you can go directly to the ResourceManager UI. You would need to know the IP address of the server where the ResourceManager runs, as well as the port; the default is 8088.
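If you also have shell access to a cluster node, the YARN CLI gives the same list; this is a small sketch assuming Spark runs on YARN (on a Kerberized cluster, kinit first).

```bash
# List the Spark applications currently running on YARN.
yarn application -list -appTypes SPARK -appStates RUNNING

# Or open the ResourceManager UI directly (default port 8088):
#   http://<resourcemanager-host>:8088/cluster/apps/RUNNING
```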