Member since: 09-18-2015
Posts: 3274
Kudos Received: 1159
Solutions: 426
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2569 | 11-01-2016 05:43 PM |
| | 8507 | 11-01-2016 05:36 PM |
| | 4870 | 07-01-2016 03:20 PM |
| | 8186 | 05-25-2016 11:36 AM |
| | 4338 | 05-24-2016 05:27 PM |
01-10-2018
02:04 PM
@Rupinder Singh Can you please elaborate on the exact solution to this problem? I am facing the same issue.
10-23-2017
10:03 AM
"Follow that path a bit further and you'll find that the impact on the
Namenode is exacerbated with all of the intermediate files generated by
the mapper for the shuffle/sort phases". Is the Namenode even aware of the files generated in you local filesystem during the shuffle short phase ???
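For context, a minimal way to see where those intermediate files actually live (the property name is standard YARN configuration; the paths below are assumed HDP defaults, not something stated in the thread):

```bash
# Map-side spill/shuffle output goes to the NodeManager's local disks,
# so the NameNode never sees or tracks these files.
grep -A1 "yarn.nodemanager.local-dirs" /etc/hadoop/conf/yarn-site.xml

# While a MapReduce job runs, its intermediate data shows up under the local dirs:
ls /hadoop/yarn/local/usercache/*/appcache/
```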
02-07-2016
01:54 PM
2 Kudos
Customer: Is Hadoop enterprise ready?

Me (standing next to the whiteboard): Yes, and that's why we use the term "Enterprise Ready Data Lake." Imagine that there are 3 points:

Point 1 -> You need to prove your identity to get access to the lake, and then you need permissions or authority to access the data.
Point 2 -> Once you have proven your identity, the next demand is to manage the lifecycle of data from requirement to retirement as an automated process.
Point 3 -> The lifecycle management process needs to be integrated with a governance solution to manage data about data ("metadata"), data lineage, auditing, and more to fulfill security and compliance requirements.

Point 1 --> Entry point: You must have strong authentication in place to get into the system, and more users will be coming in to access data as we move away from silos of data toward a centralized repository. Access management must be easy to administer, i.e., the security solution should have a centralized place to create, define, and manage security policies. Once users are in and have access, we need to track their actions, and that is auditing. Finally, data encryption in motion and at rest.

Point 2 --> Security is in place, and now we know that data ingestion is occurring with full security. Now the business wants to manage the lifecycle of data in one common place: data replication, retention, handling of late-arriving data, data mirroring, and visualization of the complete data pipeline.

Point 3 --> Once data lifecycle management is in place, we will be generating more metadata, and there is existing legacy metadata that needs to be exchanged with the Hadoop system. This creates the requirement for a data governance solution, which should provide complete data lineage, exchange, and search functionality.

Customer: Yes, this is exactly what we are looking for. All of this must be well integrated, and please provide it as a 100% open source but enterprise-ready solution.

Solution: Security, Data Lifecycle Management, Data Governance. Happy Hadooping! Kerberos is a must in production.
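As a minimal sketch of Point 1 on a kerberized cluster (the principal, realm, and data path below are made-up placeholders, not taken from the post):

```bash
# Hypothetical principal and data path, shown only to illustrate the flow.
kinit analyst@EXAMPLE.COM    # authentication: prove identity to the KDC
klist                        # inspect the ticket that was granted
hdfs dfs -ls /data/lake      # authorization: access is then checked against HDFS/Ranger policies
```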
02-07-2016
01:58 PM
@Rupinder Singh Adding to this:
sudo su - hdfs
hdfs dfs -mkdir -p /user/root
hdfs dfs -chown -R root:hdfs /user/root
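A quick follow-up check, assuming the commands above succeeded:

```bash
# Verify the new home directory exists and is owned by root:hdfs
hdfs dfs -ls -d /user/root
```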
02-07-2016
12:20 AM
@Ned Shawa See this slideshare
06-16-2016
04:09 PM
Thanks for your comment. I just solved the problem after two days of struggling. The reason was the proxy settings set on my machine by the company I work for. I just added the 'sandbox.hortonworks.com' domain name to the proxy bypass list. Also, in order to make a WebHDFS connection to the sandbox from another CentOS VM, I added 'sandbox.hortonworks.com' to the no_proxy variable in /etc/bashrc on the CentOS VM, and it worked! Thanks 🙂
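For reference, the /etc/bashrc addition described above would look roughly like this (a sketch; the other entries in the list are common defaults, not taken from the post):

```bash
# Let WebHDFS calls to the sandbox bypass the corporate proxy
export no_proxy="localhost,127.0.0.1,sandbox.hortonworks.com"
```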
02-06-2016
08:59 PM
Problem:
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'yarn resourcemanager -format-state-store' returned 255.
15/10/26 16:11:16 INFO resourcemanager.ResourceManager: STARTUP_MSG:
15/10/26 16:11:17 INFO recovery.ZKRMStateStore: org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread thread interrupted! Exiting!
15/10/26 16:11:17 INFO zookeeper.ZooKeeper: Session: 0x150a4b3429b0002 closed
15/10/26 16:11:17 FATAL resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:125)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.recursiveDeleteWithRetriesHelper(ZKRMStateStore.java:1049)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.recursiveDeleteWithRetriesHelper(ZKRMStateStore.java:1045)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.access$500(ZKRMStateStore.java:89)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$10.run(ZKRMStateStore.java:1032)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$10.run(ZKRMStateStore.java:1029)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithCheck(ZKRMStateStore.java:1104)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$ZKAction.runWithRetries(ZKRMStateStore.java:1125)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.deleteWithRetries(ZKRMStateStore.java:1029)
    at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.deleteStore(ZKRMStateStore.java:825)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.deleteRMStateStore(ResourceManager.java:1267)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1190)
15/10/26 16:11:17 INFO zookeeper.ClientCnxn: EventThread shut down
15/10/26 16:11:17 INFO resourcemanager.ResourceManager: SHUTDOWN_MSG:
Solution:
Error details:
FATAL resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /rmstore/ZKRMStateRoot/RMAppRoot

Please see this. In my case, I have all the application data sitting under that particular location:

[zk: localhost:2181(CONNECTED) 2] ls /rmstore/ZKRMStateRoot/RMAppRoot
[application_1445593412630_0002, application_1445593412630_0001, application_1445366030467_0002, application_1445366030467_0001, application_1445366030467_0004, application_1445366030467_0003, application_1445593412630_0006, application_1445366030467_0005, application_1445593412630_0005, application_1445593412630_0004, application_1445593412630_0003, application_1445173693339_0006, application_1445173693339_0005, application_1445173693339_0004, application_1445173693339_0003, application_1445173693339_0002, application_1445173693339_0001, application_1445394313024_0004, application_1445394313024_0003, application_1445394313024_0002, application_1445394313024_0001, application_1445394313024_0008, application_1445394313024_0007, application_1445394313024_0006, application_1445394313024_0005]
[zk: localhost:2181(CONNECTED) 3] rmr /rmstore/ZKRMStateRoot/RMAppRoot
[zk: localhost:2181(CONNECTED) 4] ls /rmstore/ZKRMStateRoot/RMAppRoot
Node does not exist: /rmstore/ZKRMStateRoot/RMAppRoot
quit
Quitting...

Restart YARN and I got the location back:

[zk: localhost:2181(CONNECTED) 6] ls /rmstore/ZKRMStateRoot/RMAppRoot
[]
[zk: localhost:2181(CONNECTED) 7] ls /rmstore/ZKRMStateRoot
[AMRMTokenSecretManagerRoot, RMAppRoot, EpochNode, RMDTSecretManagerRoot, RMVersionNode]

You can try this, but if you are not sure or it's production, open a support ticket. "Consult support before doing this in production."
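Pulled together, the recovery steps above look roughly like this (a sketch; the zkCli.sh path is an assumed HDP location, and as noted, consult support before doing this in production):

```bash
# Clear the ResourceManager state store in ZooKeeper, then restart YARN.
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181 <<'EOF'
ls /rmstore/ZKRMStateRoot/RMAppRoot
rmr /rmstore/ZKRMStateRoot/RMAppRoot
quit
EOF
# Restart YARN (e.g. via Ambari); the ResourceManager recreates an empty RMAppRoot znode on startup.
```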
02-07-2016
11:46 AM
@Rainer Geissendoerfer That's the easy fix, and it's because somewhere in the code it may be looking for localhost:2181.
02-06-2016
06:11 PM
10 Kudos
OLAP (Online Analytical Processing) is the technology behind many Business Intelligence (BI) applications. It is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive "what if" scenario (budget, forecast) planning. OLAP performs multidimensional analysis of business data and provides the capability for complex calculations, trend analysis, and sophisticated data modeling. It is the foundation for many kinds of business applications: Business Performance Management, Planning, Budgeting, Forecasting, Financial Reporting, Analysis, Simulation Models, Knowledge Discovery, and Data Warehouse Reporting. OLAP enables end users to perform ad hoc analysis of data in multiple dimensions, thereby providing the insight and understanding they need for better decision making. Source

OLAP solutions
Open source
Apache Kylin http://kylin.apache.org/
Apache Kylin™ is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop for extremely large datasets, originally contributed by eBay Inc.
- Extremely Fast OLAP Engine at Scale: Kylin is designed to reduce query latency on Hadoop for 10+ billion rows of data
- ANSI SQL Interface on Hadoop: Kylin offers ANSI SQL on Hadoop and supports most ANSI SQL query functions
- Interactive Query Capability: Users can interact with Hadoop data via Kylin at sub-second latency, better than Hive queries for the same dataset
- MOLAP Cube: Users can define a data model and pre-build it in Kylin with more than 10+ billion raw data records
- Seamless Integration with BI Tools: Kylin currently offers integration capability with BI tools like Tableau. Integration with MicroStrategy and Excel is coming soon
- Other Highlights:
  - Job Management and Monitoring
  - Compression and Encoding Support
  - Incremental Refresh of Cubes
  - Leverage HBase Coprocessor for query latency
  - Approximate Query Capability for distinct Count (HyperLogLog)
  - Easy Web interface to manage, build, monitor and query cubes
  - Security capability to set ACL at Cube/Project Level
  - Support LDAP Integration

Druid http://druid.io/druid.html
Druid is an open source data store designed for OLAP queries on event data. This page is meant to provide readers with a high level overview of how Druid stores data, and the architecture of a Druid cluster. A Druid data set is composed of three distinct components. If you are acquainted with OLAP terminology, the following concepts should be familiar.
- Timestamp column: We treat the timestamp separately because all of our queries center around the time axis.
- Dimension columns: Dimensions are string attributes of an event, and the columns most commonly used in filtering the data. We have four dimensions in our example data set: publisher, advertiser, gender, and country. They each represent an axis of the data that we've chosen to slice across.
- Metric columns: Metrics are columns used in aggregations and computations. In our example, the metrics are clicks and price. Metrics are usually numeric values, and computations include operations such as count, sum, and mean. They are also known as measures in standard OLAP terminology.

Commercial

AtScale http://www.atscale.com/
AtScale turns your Hadoop cluster into a scale-out OLAP server. Now you can use your BI tool of choice, from Tableau to MicroStrategy to Microsoft Excel, to connect to and query data in Hadoop, with no extra layers in between.
- Dynamic, virtual cubes present complex data as simple measures and dimensions
- Support for virtually any BI tool that can talk SQL or MDX
- Analyze billions of rows of data directly on your Hadoop cluster
- Eliminate the need for costly data marts, extracts, and custom cubes
- Consistent metric definitions across all users, regardless of BI tool

Kyvos Insights http://www.kyvosinsights.com/solution
The cubes Kyvos can build and run on Hadoop are orders of magnitude bigger than what could be built on traditional OLAP gear. Instead of getting rid of the granular level of detail that would ordinarily be summarized or aggregated in a traditional OLAP setup, Kyvos can build a specific dimension for each column or field, whether it's an individual customer or an individual SKU (stock keeping unit). Source

Cloud option (Source)
With Altiscale Data Cloud, the AtScale Intelligence Platform runs on top of enterprise-grade Hadoop in the cloud, reducing time to value, lowering costs, and eliminating implementation risk. Since Altiscale runs a complete Hadoop ecosystem for its customers, it also eliminates one of Hadoop's greatest challenges: ongoing operational risk. This allows customers to focus on their business goals without losing time and effort to the ongoing burden of Hadoop management.
02-12-2016
05:39 PM
1 Kudo
@Juan Manuel Perez Log in as root on your server, then:
su - hdfs
hdfs dfs -mkdir -p /user/root
hdfs dfs -chown -R root:hdfs /user/root