Member since
05-05-2016
147
Posts
223
Kudos Received
18
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3727 | 12-28-2018 08:05 AM
 | 3712 | 07-29-2016 08:01 AM
 | 3054 | 07-29-2016 07:45 AM
 | 7114 | 07-26-2016 11:25 AM
 | 1383 | 07-18-2016 06:29 AM
07-19-2016
09:08 AM
Just tweak the timeout values and see if the error persists. The command below sets agent.task.timeout and agent.package.install.task.timeout to 2700 in ambari.properties (restart ambari-server afterwards so the change takes effect): sed -e 's/\(agent.task.timeout=\).*/\12700/' \
-e 's/\(agent.package.install.task.timeout=\).*/\12700/' \
-i /etc/ambari-server/conf/ambari.properties
... View more
07-19-2016
08:44 AM
3 Kudos
Apache Shiro's design is an intuitive, straightforward way to secure an application. For more detail on the Apache Shiro project see http://shiro.apache.org/what-is-shiro.html

Software design is generally driven by user stories, that is, the user interface or service API is designed around how users interact with the system. For example, a user story might state that after a user logs in, a button is displayed for viewing personal account information, and if the user is not registered, a registration button is displayed instead. The user story captures the main needs the application has to fulfil. Even when the "user" is not a person but a third-party system, the interaction is still modelled as a "user" when coding. Apache Shiro reflects this idea in its design by exposing intuitive concepts to developers, which makes Shiro easy to use in almost any application.

Outline: Shiro has three main top-level concepts: Subject, SecurityManager, and Realms. The diagram in the original article describes the interactions between these concepts, and each is introduced briefly below.

Subject: A Subject is the security-specific view of the current user. "User" usually implies a person, but a Subject can be a person, a third-party service, a daemon account, a cron job, or anything else that interacts with the system. Every Subject instance must be bound to a SecurityManager, so that interacting with a Subject is in fact transformed into interaction with that Subject's associated SecurityManager.

SecurityManager: The SecurityManager is the core of the Shiro framework. It exists as an "umbrella" object that coordinates its internal security components into an object graph. Once the SecurityManager and its internal objects have been configured for the application, the SecurityManager takes a back seat and developers spend most of their time with the Subject API. To emphasise again: when you interact with a Subject, it is really the SecurityManager hidden behind the Subject that does the heavy lifting for all security operations. This is also reflected in the diagram.

Realms: Realms act as the bridge, or connector, between Shiro and the application's security data sources. When user account data is needed for authentication (login) or authorization (access control), Shiro looks up the Realm (one or more) configured for the application and obtains the required security data from it. In this sense a Realm is essentially a security-related DAO: it encapsulates the details of connecting to the data source and provides the data in the form Shiro needs. When configuring Shiro, you must supply at least one Realm for authentication and authorization; you can configure multiple Realms, but at least one is required. Shiro ships with a number of Realms that can connect to common security data sources such as LDAP, relational databases (JDBC), and text configuration resources such as INI and properties files. If the built-in Realms do not meet your needs, you can implement your own Realm against a custom data source. Like the other internal components, the SecurityManager manages how Realms are used to obtain the security and identity data associated with a Subject.

A figure in the original article shows the core concepts of the Shiro framework in more detail. Original link: http://shiro.apache.org/architecture.html
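To make the Subject / SecurityManager / Realm flow concrete, here is a minimal sketch along the lines of the standard Shiro 1.x quickstart; the shiro.ini file, the realm it defines, and the sample username, password and role are assumptions for illustration, not details from the original post.

```java
import org.apache.shiro.SecurityUtils;
import org.apache.shiro.authc.AuthenticationException;
import org.apache.shiro.authc.UsernamePasswordToken;
import org.apache.shiro.config.IniSecurityManagerFactory;
import org.apache.shiro.mgt.SecurityManager;
import org.apache.shiro.subject.Subject;
import org.apache.shiro.util.Factory;

public class ShiroQuickstart {
    public static void main(String[] args) {
        // Build a SecurityManager from an INI config that declares users/realms.
        // (classpath:shiro.ini is an assumed file name for this sketch.)
        Factory<SecurityManager> factory = new IniSecurityManagerFactory("classpath:shiro.ini");
        SecurityUtils.setSecurityManager(factory.getInstance());

        // The Subject is the security-specific view of the "current user".
        Subject currentUser = SecurityUtils.getSubject();

        if (!currentUser.isAuthenticated()) {
            // Sample credentials; in a real application these come from the login form.
            UsernamePasswordToken token = new UsernamePasswordToken("lonestarr", "vespa");
            try {
                // The Subject delegates to the SecurityManager, which in turn
                // consults the configured Realm(s) to authenticate the token.
                currentUser.login(token);
            } catch (AuthenticationException ae) {
                System.out.println("Login failed: " + ae.getMessage());
            }
        }

        // Authorization checks also flow Subject -> SecurityManager -> Realm.
        if (currentUser.hasRole("schwartz")) {
            System.out.println("Role check passed.");
        }
        currentUser.logout();
    }
}
```

The application code only ever talks to the Subject; the SecurityManager and the Realm(s) declared in the INI file do the actual authentication and authorization work behind it, which is exactly the division of responsibility described above.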
... View more
07-19-2016
07:19 AM
The link below gives a brief description of what you are asking about: https://community.hortonworks.com/articles/22756/quickly-enable-ssl-encryption-for-hadoop-component.html
... View more
07-19-2016
12:25 AM
3 Kudos
This article is the first in a series of three; the upcoming articles will include some code and describe the replication mechanism present in the latest versions of HBase.

HBase Replication
An HBase Replication setup can address cluster safety, data safety, read/write separation, operations and maintenance, and operator errors, and it is easy to manage and configure while providing strong support for online applications. HBase replication is still rarely used in industry, for several reasons: HDFS already keeps multiple replicas of every block, which protects the underlying HBase data to some degree, and relatively few companies run clusters at a scale where replication matters. Another reason is that the data is often not considered critical, for example logging systems, or a secondary warehouse of historical data used to absorb a large volume of read requests; such data either may be lost or is already backed up elsewhere (for instance in a database cluster). In those cases a slave replication cluster becomes dispensable and its value is never realised. So if your HBase deployment has low safety requirements and is not business-critical, the following discussion of replication clusters may not be worth your time.

Today there are very important applications running on HBase, both online and offline, so the safety of HBase data matters. Problems that commonly arise from running a single cluster include:

- Mistakes by data administrators, such as irreversible DDL operations.
- Corruption of the underlying HDFS file blocks.
- Handling short-term spikes in read pressure by adding servers, which wastes resources.
- System upgrades, maintenance, and problem diagnosis increase the time the cluster is unavailable.
- The atomicity of double writes is difficult to guarantee.
- Unpredictable events (for example data-centre power loss, large-scale hardware damage, network partition).
- Offline MapReduce computation adding noticeable delay to online reads and writes.

If the above problems worry you, then a replicated standby cluster is a good choice. We have done some simple research in this area, and below we describe the problems we encountered in using it and the approaches we took.
Comparison of popular online backup schemes
For backing up to a redundant data centre, the options can be analysed from several angles, such as consistency, transactionality, latency, throughput, data loss, and failover. The main options are:

- Simple backup: the cluster is dumped on a schedule, usually by taking a snapshot at a chosen timestamp. This can be designed elegantly so that it causes little or no interference to the online data centre. The disadvantage is that any unexpected event just before the snapshot point inevitably loses all data written since the previous snapshot, which many users cannot accept.
- Master-slave mode (Master-Slave): compared with simple backup this model has many more advantages. It guarantees eventual consistency of the data, data flows from the primary cluster to the standby cluster with low latency, and because writes are shipped asynchronously they put little or no pressure on the primary cluster's performance. Little data is lost when an incident occurs, and the primary cluster's data is also preserved in the standby cluster. It is usually implemented with a well-built log-shipping system plus checkpoints, and it supports read/write separation: the primary cluster serves both reads and writes, while the standby cluster generally serves only reads.
- Master-master mode (Master-Master): the principle is similar to master-slave overall, the difference being that the two clusters back each other up; both can serve reads and writes, and reads and writes can be separated between them.
- Two-phase commit: schemes of this kind guarantee strong consistency and transactions. When the server returns success to the client, the data is guaranteed to have been backed up, so no data is lost, and each server can serve reads and writes. The disadvantage is higher latency and lower overall cluster throughput.
- Paxos: schemes based on the strongly consistent Paxos algorithm ensure that clients connected to different servers see consistent data. The disadvantages are complexity, and latency and throughput that vary across the servers in the cluster.

For HBase, the simple backup mode is relatively easy if the table can be taken offline: you can use CopyTable, distcp, or a snapshot of the table. If the table must stay online and cannot be taken offline, only the snapshot scheme can back up an online table.

HBase Replication in master-slave mode ships HLog data asynchronously to a designated standby cluster, with essentially no performance impact on the primary cluster and a short data delay. The primary cluster serves reads and writes, while the standby cluster serves reads. If the primary cluster fails, you can quickly switch over to the backup cluster.
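As a concrete illustration of this master-slave setup, here is a minimal sketch using the HBase 2.x Java Admin API. The peer id, ZooKeeper cluster key, table name and column family are placeholders, and older HBase releases expose this through a different interface (ReplicationAdmin or the hbase shell's add_peer command), so treat this as an assumption-laden sketch rather than the exact procedure from the article.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;
import org.apache.hadoop.hbase.util.Bytes;

public class EnableReplicationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {

            // Register the standby cluster as a replication peer.
            // Placeholder cluster key: "<zk quorum>:<zk port>:<zk parent znode>" of the slave cluster.
            ReplicationPeerConfig peerConfig = ReplicationPeerConfig.newBuilder()
                    .setClusterKey("standby-zk1,standby-zk2,standby-zk3:2181:/hbase")
                    .build();
            admin.addReplicationPeer("1", peerConfig);

            // Mark a (placeholder) column family for replication so its WAL/HLog
            // edits are shipped to the peer (scope 1 = REPLICATION_SCOPE_GLOBAL).
            admin.modifyColumnFamily(TableName.valueOf("my_table"),
                    ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("cf"))
                            .setScope(HConstants.REPLICATION_SCOPE_GLOBAL)
                            .build());
        }
    }
}
```

Note that replication only ships new WAL edits: the table and column family must already exist on the standby cluster with a matching schema, and pre-existing data has to be copied separately (for example with CopyTable or a snapshot export).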
Looking back at the state of HBase backups: through the simple backup, master-slave, and master-master modes above, HBase can provide both online and offline backups. In HBase Replication master-master mode, two clusters back up each other, both provide read and write services, and reads and writes can be separated.

By comparison, the overall view is that the HBase Replication solution can address cluster safety, data safety, read/write separation, operations and maintenance, and operator errors, and it is easy to manage and configure while providing strong support for online applications. To be continued...
... View more
07-18-2016
06:40 AM
Please paste the output of the command above, i.e. the HostCleanup result in /var/lib/ambari-agent/data/hostcleanup.result, so we can see whether the command completed partially or successfully.
... View more
07-18-2016
06:29 AM
2 Kudos
Duplicate question! Please ask a question only once; suggestions for this question have already been posted here: https://community.hortonworks.com/questions/45601/how-i-can-connect-an-external-hive-table-to-an-int.html#answer-45610
... View more
07-18-2016
04:03 AM
2 Kudos
Below are the steps to configure Ranger with PostgreSQL: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_Ranger_Install_Guide/content/configuring_postgresql_for_ranger.html
... View more
07-18-2016
03:33 AM
1 Kudo
If you are running a multi-node cluster, check whether you are hitting the timeout issue below: https://issues.apache.org/jira/browse/ZOOKEEPER-1213 If you are running a single node, what is the result of ./zookeeper/bin/zkServer.sh start ?
... View more
07-15-2016
11:35 AM
8 Kudos
Apache Kylin origins
In today's era of big data, Hadoop has become the de facto standard, and a large number of tools have been built around the Hadoop platform to address the needs of different scenarios. For example, Hive is a data warehouse tool for Hadoop: data files stored in the HDFS distributed file system can be mapped to database tables and queried with SQL, and Hive's execution engine converts the SQL into MapReduce jobs, which makes it well suited to data warehouse analysis.
Another example is HBase, a highly available, high-performance, column-oriented, scalable distributed storage system built on Hadoop, with HDFS providing highly reliable underlying storage. Existing business analytics tools such as Tableau have significant limitations here because they lack a Hadoop-based distributed analysis engine: they are difficult to scale horizontally, cannot handle very large data sets, and lack Hadoop support. Apache Kylin (Chinese: Kirin) appeared to solve these problems on top of Hadoop. Apache Kylin is an open-source distributed analytics engine originally developed at eBay and contributed to the open-source community. It provides a SQL query interface and multidimensional analysis (OLAP) capability on top of Hadoop, supports very large data sets, can handle TB- and even PB-scale analysis tasks, can query huge Hive tables in sub-second time, and supports high concurrency.
Apache Kylin use cases
(1) Your data lives in the Hadoop HDFS distributed file system, you use Hive to build a data warehouse on top of HDFS and analyse it, and the data volume is huge, for example at the TB level. (2) You also use HBase on the Hadoop platform for data storage and rely on HBase row keys for fast queries. (3) Your Hadoop platform accumulates huge amounts of data every day and you would like to do multidimensional analysis on it. If your application looks like the above, Apache Kylin is a very good fit for multidimensional analysis over large amounts of data.
Apache Kylin's core idea is to trade space for time: multidimensional results are pre-computed and stored in HBase so that queries are fast. Kylin also applies a variety of flexible policies at query time and further improves space utilisation, so that this trade-off is worthwhile in practice.
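As a small illustration of the SQL interface and pre-computed cubes just described, here is a minimal sketch of querying Kylin through its JDBC driver. The host, port, project name, and the table and columns in the query are placeholders (loosely modelled on Kylin's bundled sample data), and the ADMIN/KYLIN credentials are Kylin's usual defaults; none of these details come from the original post.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class KylinQuerySketch {
    public static void main(String[] args) throws Exception {
        // Kylin ships a JDBC driver; host, port and project below are placeholders.
        Class.forName("org.apache.kylin.jdbc.Driver");

        Properties props = new Properties();
        props.setProperty("user", "ADMIN");   // default Kylin credentials, change in practice
        props.setProperty("password", "KYLIN");

        try (Connection conn = DriverManager.getConnection(
                 "jdbc:kylin://kylin-host:7070/my_project", props);
             Statement stmt = conn.createStatement();
             // Aggregation queries like this are answered from the pre-computed
             // cube stored in HBase rather than by scanning Hive.
             ResultSet rs = stmt.executeQuery(
                 "SELECT part_dt, SUM(price) FROM kylin_sales GROUP BY part_dt")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
            }
        }
    }
}
```

Because such an aggregation is served from the pre-computed cube rather than by scanning the raw Hive data, it typically returns in sub-second time, which is the space-for-time trade-off described above.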
Apache Kylin development history
Apache Kylin was open-sourced on GitHub in October 2014, joined the Apache Incubator in November 2014, and graduated to a top-level Apache project in November 2015, becoming the first Apache top-level project designed and developed entirely by a Chinese team. The Apache Kylin official website is:
http://kylin.apache.org
In March 2016 the core Apache Kylin developers founded the company Kyligence in Shanghai to better drive the rapid development of the project and its community. The company's website is: http://kyligence.io
To support further development, in April 2016 the big data company Kyligence received a multi-million-dollar angel investment round.
... View more
07-12-2016
08:38 AM
5 Kudos
I have done data analysis for one of my projects using the approach below, and hopefully it will help you understand the underlying subject. Soon I'll post my project on data analysis with a detailed description of the technologies used: Python (web scraping for data collection), Hadoop, Spark, and R.

Data analysis is a highly iterative and non-linear process, better reflected by a series of cycles in which information is learned at each step; that information then informs whether (and how) to refine and redo the step that was just performed, or whether (and how) to proceed to the next step.

Setting the Scene
A study includes the development of a hypothesis or question, the design of the data collection process (or study protocol), the collection of the data, and the analysis and interpretation of the data. While a study also covers developing and executing a plan for collecting data, a data analysis presumes the data have already been collected.

Activities of Data Analysis
There are 5 core activities of data analysis:
1. Stating and refining the question
2. Exploring the data
3. Building formal statistical models
4. Interpreting the results
5. Communicating the results

1. Stating and Refining the Question
Doing data analysis requires quite a bit of thinking, and we believe that when you've completed a good data analysis, you've spent more time thinking than doing. The thinking begins before you even look at a dataset, and it's well worth devoting careful thought to your question. This point cannot be over-emphasized, as many of the "fatal" pitfalls of a data analysis can be avoided by expending the mental energy to get your question right.

Types of questions:
- Descriptive: a descriptive question seeks to summarize a characteristic of a set of data. Examples include determining the proportion of males, the mean number of servings of fresh fruits and vegetables per day, or the frequency of viral illnesses in a set of data collected from a group of individuals.
- Exploratory: an exploratory question is one in which you analyze the data to see if there are patterns, trends, or relationships between variables. These analyses are also called "hypothesis-generating" analyses because, rather than testing a hypothesis as would be done with an inferential, causal, or mechanistic question, you are looking for patterns that would support proposing a hypothesis.
- Inferential: an inferential question would be a restatement of a proposed hypothesis as a question, and would be answered by analyzing a different set of data.
- Predictive: a predictive question asks, for example, what types of people will eat a diet high in fresh fruits and vegetables during the next year. Here you are less interested in what causes someone to eat a certain diet, only in what predicts whether someone will eat it. For example, higher income may be one of the final set of predictors, and you may not know (or even care) why people with higher incomes are more likely to eat a diet high in fresh fruits and vegetables; what matters most is that income is a factor that predicts this behavior.
- Mechanistic: a mechanistic question asks how something works. A question that asks how a diet high in fresh fruits and vegetables leads to a reduction in the number of viral illnesses would be mechanistic: the answer tells us, if the diet does indeed cause a reduction in viral illnesses, how it leads to that reduction.

2. Exploratory Data Analysis
Exploratory data analysis is the process of exploring your data, and it typically includes examining the structure and components of your dataset, the distributions of individual variables, and the relationships between two or more variables. The most heavily relied upon tool for exploratory data analysis is visualizing the data with a graphical representation. The goals of exploratory data analysis are:
1. To determine if there are any problems with your dataset.
2. To determine whether the question you are asking can be answered by the data that you have.
3. To develop a sketch of the answer to your question.

3. Using Models to Explore Your Data
In a very general sense, a model is something we construct to help us understand the real world. A simple summary statistic, such as the mean of a set of numbers, is not enough to formulate a model; a statistical model must also impose some structure on the data. At its core, a statistical model provides a description of how the world works and how the data were generated. The model is essentially an expectation of the relationships between various factors in the real world and in your dataset. What makes a model a statistical model is that it allows for some randomness in generating the data.

4. Comparing Model Expectations to Reality
Inference is one of many possible goals in data analysis, so it is worth discussing what the act of making an inference involves:
1. Describe the sampling process.
2. Describe a model for the population (the population of which your data are a sample).
Drawing a fake picture: to begin with we can make some pictures, like a histogram of the data.
Reacting to data, refining our expectations: if the model and the data don't match very well, as indicated by such a histogram, then what to do? We can either
1. get a different model, or
2. get different data.

5. Interpreting Your Results and Communicating
Conclusion: communication is fundamental to good data analysis. You communicate your results, and the responses you receive from your audience should inform the next steps in your data analysis. The responses you receive include not only answers to specific questions, but also commentary and questions your audience has in response to your report.

References, additional information: MyBlogSite
... View more