Member since: 05-05-2016
Posts: 147
Kudos Received: 223
Solutions: 18
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3600 | 12-28-2018 08:05 AM
 | 3547 | 07-29-2016 08:01 AM
 | 2905 | 07-29-2016 07:45 AM
 | 6762 | 07-26-2016 11:25 AM
 | 1339 | 07-18-2016 06:29 AM
07-20-2016
03:14 PM
I wonder if they were just having system errors at the time you tried to log in; I just logged in fine. If you are still having the issue, you will need to contact PSI. They will be able to assist: http://it.psionline.com/contact-us/support/
07-25-2016
08:04 AM
1 Kudo
I advise using CentOS 6.
07-19-2016
08:44 AM
3 Kudos
Apache Shiro's design is an intuitive and simple way to secure an application. For more detail on the Apache Shiro project, see http://shiro.apache.org/what-is-shiro.html

Software design is generally driven by user stories, that is, by how users interact with the system, and the user interface or service API is designed around those interactions. For example, a user story might say that after a user logs in, a button is shown to view personal account information, and if the user is not registered, a registration button is shown instead. The user story captures the main needs the application has to satisfy. Even when the "user" is not a person but a third-party system, the code still treats whatever interacts with the system as a user. Apache Shiro reflects this idea in its design by exposing these intuitive notions directly to developers, which makes Shiro easy to use in almost any application.

Overview: Shiro has three main top-level concepts: Subject, SecurityManager, and Realm. The diagram in the architecture documentation (linked below) describes the interactions between these concepts, and each is introduced in turn.

Subject: A Subject is a security-specific view of the current user. "User" usually implies a person, but a Subject can be a person, a third-party service, a daemon account, a cron job, or anything else that interacts with the system. Every Subject instance must be bound to a SecurityManager, so interacting with a Subject really means interacting with the SecurityManager associated with that Subject.

SecurityManager: The SecurityManager is the core of the Shiro framework. It acts as an "umbrella" object that coordinates its internal security components into an object graph. Once the SecurityManager and its internal objects are configured for the application, the SecurityManager fades into the background and developers spend most of their time with the Subject API. To restate the point above: when you interact with a Subject, it is really the SecurityManager hidden behind the Subject that performs the security operations. This is also reflected in the diagram.

Realms: A Realm acts as a bridge, or connector, between Shiro and the application's security data sources. When user account data is needed for authentication (login) or authorization (access control), Shiro looks up the Realm (one or more) configured in the application for that work and obtains the required security data from it. In this sense, a Realm is essentially a security-related DAO: it encapsulates the details of connecting to the data source and provides data to Shiro as needed. When configuring Shiro, you must provide at least one Realm for authentication and authorization; you can configure multiple Realms, but at least one is required. Shiro ships with a number of Realms that connect to common security data sources such as LDAP, relational databases (JDBC), and INI-style text configuration and properties files. If the built-in Realms do not meet your needs, you can implement a custom Realm backed by your own data source. Like the other internal components, the SecurityManager manages how Realms are used to obtain the security and identity data associated with Subjects.

A more detailed diagram of the core Shiro components, with brief descriptions of each, is available at the original link: http://shiro.apache.org/architecture.html
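As a quick illustration of how the three concepts fit together (my own minimal sketch, not from the Shiro documentation; the account name, password, and role are made up, and the built-in SimpleAccountRealm stands in for a real data source):

import org.apache.shiro.SecurityUtils;
import org.apache.shiro.authc.UsernamePasswordToken;
import org.apache.shiro.mgt.DefaultSecurityManager;
import org.apache.shiro.realm.SimpleAccountRealm;
import org.apache.shiro.subject.Subject;

public class ShiroQuickstart {
  public static void main(String[] args) {
    // Realm: the security "DAO" that supplies account data to Shiro.
    SimpleAccountRealm realm = new SimpleAccountRealm();
    realm.addAccount("alice", "secret", "admin");

    // SecurityManager: the umbrella object coordinating Shiro's internal components.
    DefaultSecurityManager securityManager = new DefaultSecurityManager(realm);
    SecurityUtils.setSecurityManager(securityManager);

    // Subject: the current "user" (a person, a third-party service, a cron job, ...).
    Subject currentUser = SecurityUtils.getSubject();
    currentUser.login(new UsernamePasswordToken("alice", "secret"));

    System.out.println("Authenticated: " + currentUser.isAuthenticated());
    System.out.println("Has role admin: " + currentUser.hasRole("admin"));

    currentUser.logout();
  }
}

Behind currentUser.login(...) it is the SecurityManager, consulting the configured Realm, that actually performs the authentication.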
07-19-2016
12:25 AM
3 Kudos
This article is the first in a series of three; the upcoming articles will include some code and describe the mechanism in the latest version of HBase that supports HBase Replication.

HBase Replication

The HBase Replication solution can address cluster safety, data safety, read/write separation, operations and maintenance, and operator errors, and it is easy to manage and configure, providing powerful support for online applications. HBase replication is still rarely used in industry, for several reasons: HDFS already keeps multiple replicas of each block, which protects the underlying HBase data; relatively few companies run clusters at a scale where replication matters; and in many cases the data is not critically important, for example a logging system, or a secondary warehouse of historical data used to absorb a large number of read requests, where lost data is either still available or backed up somewhere else (such as a database cluster). In such cases a slave replication cluster becomes dispensable and its value is never realized. So if your HBase deployment has low safety requirements and does not host essential services, the following discussion of replication clusters may not be worth your time.

Today very important applications run on HBase, both online and offline, so the safety of HBase data matters a great deal. The problems that commonly arise with a single cluster are the following:

- Operator failures, such as irreversible DDL operations.
- Corruption of underlying HDFS file blocks.
- Excessive short-term read pressure on the cluster; adding servers to handle such spikes is largely a waste of resources.
- System upgrades, maintenance, and problem diagnosis increase the time the cluster is unavailable.
- Atomicity of double writes is difficult to guarantee.
- Unpredictable events (e.g., data-center power loss, large-scale hardware damage, network disconnection).
- Offline MapReduce computation interfering with online reads and writes, causing higher latency.

If the above problems worry you, then a replication cluster alongside the main cluster is a good choice, and we have done some simple research in this area, including the problems we encountered in practice and the approaches we took.

Comparison of popular online backup approaches

For backing up to a redundant data center there are several angles to analyze, such as consistency, transactionality, latency, throughput, data loss, and failover. We currently have several options:

- Simple backup: the cluster is dumped on a schedule, usually via a snapshot at a set timestamp. This can be designed elegantly, with little or no interference to the online data center. The disadvantage is that if an unexpected event occurs just before the snapshot point, the data for that entire interval is inevitably lost, which many people cannot accept.
- Master-slave mode (Master-Slave): this model has many advantages over simple backup. It can ensure eventual consistency of the data, latency from the primary cluster to the standby cluster is low, asynchronous writes put at most a minimal pressure on the primary cluster's performance, little data is lost when an incident occurs, and the standby cluster also acts as a guarantee for the primary. It is usually implemented with a well-built log (WAL) system plus checkpoints, and it enables read/write separation: the primary cluster serves both reads and writes, while the standby cluster generally bears only read traffic.
- Master-master mode (Master-Master): the principle is similar to master-slave mode overall; the difference is that the two clusters back up each other and both can serve reads and writes.
- Two-phase commit: schemes of this kind ensure strong consistency and transactions; when the server returns success to the client, the data has definitely been backed up, so no data is lost, and each server can serve reads and writes. The disadvantage is higher latency and lower overall cluster throughput.
- Paxos-based: implementations based on the Paxos strong-consistency algorithm ensure that clients connected to different servers see consistent data. The disadvantages are complexity, and latency and throughput that vary across the clustered servers.

For HBase, the simple backup mode is relatively easy to handle if the table is offline: you can copy the table, use distcp, or snapshot the table. If the table is online and cannot be taken offline, only the snapshot scheme can back up an online table.

HBase Replication in master-slave mode sends HLog data asynchronously to a designated standby cluster; it has essentially no performance impact on the primary cluster, and the data delay is short. The primary cluster provides read and write services, while the standby cluster provides read service. If the primary cluster fails, you can quickly switch to the backup cluster.
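As a concrete sketch of setting up this master-slave mode (my own illustration, not from the article; it assumes the HBase 2.x Java Admin API, whereas HBase 1.x uses ReplicationAdmin, and the peer id and ZooKeeper quorum below are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;

public class AddReplicationPeer {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // Cluster key of the standby cluster: zk-quorum:zk-port:znode-parent (placeholder values).
      ReplicationPeerConfig peer = ReplicationPeerConfig.newBuilder()
          .setClusterKey("standby-zk1,standby-zk2,standby-zk3:2181:/hbase")
          .build();
      // Register the standby cluster as peer "1"; HLog edits are then shipped asynchronously.
      admin.addReplicationPeer("1", peer);
    }
  }
}

The column families you want replicated also need REPLICATION_SCOPE => 1 set on the source table for their edits to be shipped to the peer.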
Looking back at the state of HBase backup: HBase can offer online and offline backup through the three modes above — simple backup, master-slave, and master-master. In HBase Replication's master-master mode, the two clusters back up each other, both provide read and write services, and reads and writes can be separated. By comparison, our overall opinion is that the HBase Replication solution can address cluster safety, data safety, read/write separation, operations and maintenance, and operator errors, and that it is easy to manage and configure while providing powerful support for online applications.

To be continued...
07-15-2016
11:35 AM
8 Kudos
Apache Kylin origins

In today's era of big data, Hadoop has become the de facto standard, and a large number of tools have been built around the Hadoop platform to address the needs of different scenarios. For example, Hive is a data warehouse tool for Hadoop: data files stored on the HDFS distributed file system can be mapped to database tables and queried with SQL. Hive's execution engine converts the SQL into MapReduce jobs to run, which makes it well suited to data warehouse analysis.
HBase, as another example, is a Hadoop-based, highly available, high-performance, column-oriented, scalable distributed storage system; the Hadoop HDFS layer provides highly reliable underlying storage for HBase. Existing business analytics tools such as Tableau have significant limitations here, largely because a Hadoop-based distributed analysis engine has been missing: they are difficult to scale horizontally, cannot handle very large data volumes, and lack support for Hadoop. Apache Kylin (Chinese name: Kirin) appeared to solve these problems on top of Hadoop. Apache Kylin is an open source distributed analytical engine originally developed at eBay and contributed to the open source community. It provides a SQL query interface and multidimensional analysis (OLAP) capability on top of Hadoop for large-scale data; it can handle TB- and even PB-scale analysis tasks, query a huge Hive table in sub-second time, and support high concurrency.
Apache Kylin scenarios
(1) Your data resides in the Hadoop HDFS distributed file system, you use Hive to build a data warehouse system on top of HDFS and analyze the data there, and the amount of data is huge, e.g., at the TB level. (2) You also use HBase on the Hadoop platform for data storage and rely on HBase row keys for fast data queries. (3) Your Hadoop platform accumulates a huge amount of data daily and you would like to do multidimensional analysis on it. If your application is similar to the above, Apache Kylin is very well suited to multidimensional analysis of large amounts of data.
The core idea of Apache Kylin is to trade space for time: multidimensional query results are precomputed and stored in HBase so that data queries are fast. Because Apache Kylin also provides a variety of flexible policies for querying and for further improving space utilization, this trade-off is worthwhile in practice.

Apache Kylin development history
Apache Kylin was open-sourced on GitHub in October 2014, joined the Apache Incubator in November 2014, and officially graduated to a top-level Apache project in November 2015, becoming the first top-level Apache project designed and developed by an entirely Chinese team. The official Apache Kylin website is:
http://kylin.apache.org
In March 2016, core developers of Apache Kylin founded the company Kyligence in Shanghai to better promote the rapid development of the project and its community. The company's official website is: http://kyligence.io
To support further development, in April 2016 the big data company Kyligence received a multi-million-dollar angel investment round.
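As a small illustration of the SQL interface (a sketch of my own, not from the article; it assumes Kylin's JDBC driver is on the classpath and queries the kylin_sales sample table that ships with Kylin, while the host, port, project, and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class KylinQueryExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.kylin.jdbc.Driver");
    Properties props = new Properties();
    props.put("user", "ADMIN");      // placeholder credentials
    props.put("password", "KYLIN");
    try (Connection conn = DriverManager.getConnection(
             "jdbc:kylin://kylin-host:7070/my_project", props);
         Statement stmt = conn.createStatement();
         // An aggregate query that Kylin answers from the precomputed cube stored in HBase.
         ResultSet rs = stmt.executeQuery(
             "SELECT part_dt, SUM(price) AS total FROM kylin_sales GROUP BY part_dt")) {
      while (rs.next()) {
        System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
      }
    }
  }
}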
07-12-2016
08:38 AM
5 Kudos
I have done data analysis for one of my projects using the approach below, and hopefully it helps you understand the underlying subject. Soon I'll post my project on data analysis with a detailed description of the technologies used: Python (web scraping for data collection), Hadoop, Spark, and R.

Data analysis is a highly iterative and non-linear process, better reflected as a series of cycles in which information is learned at each step, which then informs whether (and how) to refine and redo the step that was just performed, or whether (and how) to proceed to the next step.

Setting the Scene: A data analysis starts from a question. While a study also includes developing and executing a plan for collecting data, a data analysis presumes the data have already been collected. More specifically, a study includes the development of a hypothesis or question, the design of the data collection process (or study protocol), the collection of the data, and the analysis and interpretation of the data.

Activities of Data Analysis: There are 5 core activities of data analysis: 1. Stating and refining the question 2. Exploring the data 3. Building formal statistical models 4. Interpreting the results 5. Communicating the results

1. Stating and Refining the Question: Doing data analysis requires quite a bit of thinking, and we believe that when you've completed a good data analysis, you've spent more time thinking than doing. The thinking begins before you even look at a dataset, and it's well worth devoting careful thought to your question. This point cannot be over-emphasized, as many of the "fatal" pitfalls of a data analysis can be avoided by expending the mental energy to get your question right.

Types of Questions:
Descriptive: A descriptive question is one that seeks to summarize a characteristic of a set of data. Examples include determining the proportion of males, the mean number of servings of fresh fruits and vegetables per day, or the frequency of viral illnesses in a set of data collected from a group of individuals.
Exploratory: An exploratory question is one in which you analyze the data to see if there are patterns, trends, or relationships between variables. These types of analyses are also called "hypothesis-generating" analyses, because rather than testing a hypothesis, as would be done with an inferential, causal, or mechanistic question, you are looking for patterns that would support proposing a hypothesis.
Inferential: An inferential question would be a restatement of a proposed hypothesis as a question, and would be answered by analyzing a different set of data.
Predictive: A predictive question would be one where you ask what types of people will eat a diet high in fresh fruits and vegetables during the next year. In this type of question you are less interested in what causes someone to eat a certain diet, just in what predicts whether someone will eat it. For example, higher income may be one of the final set of predictors, and you may not know (or even care) why people with higher incomes are more likely to eat a diet high in fresh fruits and vegetables; what matters most is that income is a factor that predicts this behavior.
Mechanistic: A question that asks how a diet high in fresh fruits and vegetables leads to a reduction in the number of viral illnesses would be a mechanistic question. It leads to an answer that tells us, if the diet does indeed cause a reduction in the number of viral illnesses, how the diet leads to that reduction.

2. Exploratory Data Analysis: Exploratory data analysis is the process of exploring your data, and it typically includes examining the structure and components of your dataset, the distributions of individual variables, and the relationships between two or more variables. The most heavily relied upon tool for exploratory data analysis is visualizing the data using a graphical representation. There are several goals of exploratory data analysis: 1. To determine if there are any problems with your dataset. 2. To determine whether the question you are asking can be answered by the data that you have. 3. To develop a sketch of the answer to your question.

3. Using Models to Explore Your Data: In a very general sense, a model is something we construct to help us understand the real world. A simple summary statistic, such as the mean of a set of numbers, is not enough to formulate a model; a statistical model must also impose some structure on the data. At its core, a statistical model provides a description of how the world works and how the data were generated. The model is essentially an expectation of the relationships between various factors in the real world and in your dataset. What makes a model a statistical model is that it allows for some randomness in generating the data.

4. Comparing Model Expectations to Reality: Inference is one of many possible goals in data analysis, so it's worth discussing what exactly the act of making inference is: 1. Describe the sampling process. 2. Describe a model for the population (the data are a sample from that population). Drawing a fake picture: to begin with, we can make some pictures, like a histogram of the data. Reacting to Data: Refining Our Expectations — if the model and the data don't match very well, as such a histogram can indicate, what do we do? We can either 1. get a different model, or 2. get different data.

5. Interpreting Your Results and Communication — Conclusion: Communication is fundamental to good data analysis. You gather data by communicating your results, and the responses you receive from your audience should inform the next steps in your data analysis. The types of responses you receive include not only answers to specific questions, but also commentary and questions your audience has in response to your report.

References, additional information: MyBlogSite
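As a minimal sketch of "comparing model expectations to reality" (my own illustration with made-up numbers, not from the post): compute a summary of the data, draw a crude histogram, and compare it with what an assumed model predicts.

import java.util.Arrays;

public class ModelVsData {
  public static void main(String[] args) {
    // Made-up data: viral illnesses reported per person in a year.
    int[] illnesses = {0, 1, 1, 2, 0, 3, 1, 0, 2, 1, 4, 0, 1, 2, 1};

    // Model expectation (an assumption for illustration): a Poisson-style model with mean 1.0.
    double modelMean = 1.0;
    double dataMean = Arrays.stream(illnesses).average().orElse(Double.NaN);

    // Crude text histogram of the data -- the kind of picture used to react to the data.
    int max = Arrays.stream(illnesses).max().orElse(0);
    for (int k = 0; k <= max; k++) {
      final int value = k;
      long count = Arrays.stream(illnesses).filter(x -> x == value).count();
      System.out.printf("%d | %s%n", k, "#".repeat((int) count));
    }

    // If the data mean is far from the model mean, refine the model or get different data.
    System.out.printf("Model mean = %.2f, data mean = %.2f%n", modelMean, dataMean);
  }
}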
06-26-2017
08:41 PM
Hi, I'm also trying to deploy Metron on AWS, but I'm getting errors. Could you please help me out?
07-11-2016
04:39 PM
For composite keys: <LEVELNAME>_<ENTITYNAME>_Key
Note: if multiple keys are needed, put multiple keys in the fact tables.
Oozie job naming: <VENDOR>_<ENTITY>_<LEVELNAME>_<FREQUENCY>_[<CALC>|<AGRT>|<DownStream>].xml
File extensions for Hadoop: HQL files ".hql", Java files ".java", property files ".properties", shell scripts ".sh", Oozie config files ".xml", data definition files ".ddl"
07-07-2016
06:03 PM
4 Kudos
We have a Hive table over an HBase table, and let's say there are a few columns with an INT datatype, with the data loaded from Hive. If we now want to delete data based on the values in one of those INT columns, it is not possible directly: the values are stored in binary, and even the HBase API filter (SingleColumnValueFilter) returns wrong results if we query that column's values from HBase.

Problem to solve: how do we purge data for a Hive INT-datatype column from HBase? This is the first, text-only part of the series containing the resolution of this problem. In the next part I'll record a short video running the code and cover other datatypes too. In this scenario we can't use the standard API and are unable to apply filters on binary column values; the solution is the JRuby program below.

You have probably already heard many advantages of storing data in HBase (especially in binary block format) and creating a Hive table on top of it to query the data. I am not going to explain the use case for this, or why we needed HBase under Hive; the simple reason is better visibility and representation of the data in tabular format. I ran into this problem a few days ago when we had to purge HBase data after its retention period expired, and we got stuck deleting data from the HBase table using the HBase APIs and filters when a particular column (or columns) is of INT datatype in Hive.

Below is a sample use case. There are two storage types for Hive data in HBase: 1. Binary 2. String. Storing data in binary blocks in HBase has its own advantages. Below are the scripts to create sample tables in both HBase and Hive:

HBase:
create 'tiny_hbase_table1', 'ck', 'o', {NUMREGIONS => 16, SPLITALGO => 'UniformSplit'}

Hive:
CREATE EXTERNAL TABLE orgdata (
key INT,
kingdom STRING,
kingdomkey INT,
kongo STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key#b,o:kingdom#s,o:kingdomKey#b,o:kongo#b")
TBLPROPERTIES(
"hbase.table.name" = "tiny_hbase_table1",
"hbase.table.default.storage.type" = "binary"
);
insert into orgdata values(1,'London',1001,'victoria secret');
insert into orgdata values(2,'India',1001,'Indira secret');
insert into orgdata values(3,'Saudi Arabia',1001,'Muqrin');
insert into orgdata values(4,'Swaziland',1001,'King Mswati');
hbase(main):080:0> scan 'tiny_hbase_table1'
ROW COLUMN+CELL
\x00\x00\x00\x01 column=o:kingdom, timestamp=1467806798430, value=Swaziland
\x00\x00\x00\x01 column=o:kingdomKey, timestamp=1467806798430, value=\x00\x00\x03\xE9
\x00\x00\x00\x02 column=o:kingdom, timestamp=1467806928329, value=India
\x00\x00\x00\x02 column=o:kingdomKey, timestamp=1467806928329, value=\x00\x00\x03\xE9
\x00\x00\x00\x03 column=o:kingdom, timestamp=1467806933574, value=Saudi Arabia
\x00\x00\x00\x03 column=o:kingdomKey, timestamp=1467806933574, value=\x00\x00\x03\xE9
\x00\x00\x00\x04 column=o:kingdom, timestamp=1467807030737, value=Swaziland
\x00\x00\x00\x04 column=o:kingdomKey, timestamp=1467807030737, value=\x00\x00\x03\xE9
4 row(s) in 0.0690 seconds
Now let's apply an HBase filter; we get no result:

hbase(main):001:0> scan 'tiny_hbase_table1', {FILTER => "PrefixFilter('\x00\x00\x00\x01')"}
hbase(main):002:0> scan 'tiny_hbase_table1', {FILTER => "PrefixFilter('1')"}

If we don't know the binary equivalent of an INT column value such as kingdomkey, it is not possible to apply a filter. As you can see below, we get wrong results, and SingleColumnValueFilter fails in this scenario:

import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.SubstringComparator
import org.apache.hadoop.hbase.util.Bytes
scan 'tiny_hbase_table1', {LIMIT => 10, FILTER => SingleColumnValueFilter.new(Bytes.toBytes('o'), Bytes.toBytes('kingdomKey'), CompareFilter::CompareOp.valueOf('EQUAL'), Bytes.toBytes('1001')), COLUMNS => 'o:kingdom' }
ROW COLUMN+CELL
\x00\x00\x00\x01 column=o:kingdom, timestamp=1467806798430, value=Swaziland
\x00\x00\x00\x02 column=o:kingdom, timestamp=1467806928329, value=India
\x00\x00\x00\x03 column=o:kingdom, timestamp=1467806933574, value=Saudi Arabia
\x00\x00\x00\x04 column=o:kingdom, timestamp=1467807030737, value=Swaziland
4 row(s) in 0.3640 seconds

The solution is the JRuby program below. With it you get proper results, and inside the program you can issue the deleteall HBase shell command to delete a candidate record as soon as you find it in the loop:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Get
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Result;
import java.util.ArrayList;
def delete_get_some()
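# Scan only the o:kingdomKey column of the binary-encoded table and use
# Bytes.toInt to turn the raw row keys and cell values back into readable integers.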
var_table = "tiny_hbase_table1"
htable = HTable.new(HBaseConfiguration.new, var_table)
rs = htable.getScanner(Bytes.toBytes("o"), Bytes.toBytes("kingdomKey"))
output = ArrayList.new
output.add "ROW\t\t\t\t\t\tCOLUMN\+CELL"
rs.each { |r| r.raw.each { |kv|
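# kv.getRow and kv.getValue return raw byte arrays; Bytes.toInt decodes the
# 4-byte binary encoding that Hive wrote, so the values become comparable ints.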
row = Bytes.toInt(kv.getRow)
fam = kv.getFamily
ql = Bytes.toString(kv.getQualifier)
ts = kv.getTimestamp
val = Bytes.toInt(kv.getValue)
rowval = Bytes.toInt(kv.getRow)
output.add "#{row} #{ql} #{val}"
}
}
output.each {|line| puts "#{line}\n"}
end
delete_get_some
ROW COLUMN+CELL
1 kingdomKey 1001
2 kingdomKey 1001
3 kingdomKey 1001
4 kingdomKey 1001
You can declare a variable, apply a custom filter on the decoded values, and delete the row key based on the readable values:

if val <= myVal and row.to_s.include? 'likeme^'
  output.add "#{val} #{row} <<<<<<<<<<<<<<<<<<<<<<<<<<- Candidate for deletion"
  deleteall var_table, row
end

Hope this solves a problem you are facing too. Let me know if you have any queries or suggestions.
07-06-2016
01:24 PM
1 Kudo
Thank you! Yes, a syntax error caused the issue, and now I am able to run the program successfully.