Member since 07-31-2019
346 Posts
259 Kudos Received
62 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2919 | 08-22-2018 06:02 PM |
| | 1688 | 03-26-2018 11:48 AM |
| | 4174 | 03-15-2018 01:25 PM |
| | 5072 | 03-01-2018 08:13 PM |
| | 1427 | 02-20-2018 01:05 PM |
12-21-2022
05:15 AM
@aravinth53, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
11-11-2021
07:40 AM
I have downloaded the Nagios .ova image, but it is still not working and displays the same output as before.
05-29-2021
02:37 PM
@Onedile wrote: "Yes this is possible. You need to kinit with the username that has been granted access to the SQL Server DB and tables. Integrated security passes your credentials to the SQL Server using Kerberos: jdbc:sqlserver://sername.domain.co.za:1433;integratedSecurity=true;databaseName=SCHEMA;authenticationScheme=JavaKerberos; This worked for me."

It does not work for me. It still fails with the latest MSSQL JDBC driver because the Kerberos tokens are lost when the mappers spawn (YARN transitions the job to its internal security subsystem):

21/05/29 19:00:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1616335290043_2743822
21/05/29 19:00:40 INFO mapreduce.JobSubmitter: Executing with tokens: [Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:nameservice1, Ident: (token for c795701: HDFS_DELEGATION_TOKEN owner=c795701@XX.XXXX.XXXXXXX.COM, renewer=yarn, realUser=, issueDate=1622314832608, maxDate=1622919632608, sequenceNumber=29194128, masterKeyId=1856)]
21/05/29 19:01:15 INFO mapreduce.Job: Task Id : attempt_1616335290043_2743822_m_000000_0, Status : FAILED
Error: java.lang.RuntimeException: java.lang.RuntimeException: com.microsoft.sqlserver.jdbc.SQLServerException: Integrated authentication failed. ClientConnectionId:53879236-81e7-4fc6-88b9-c7118c02e7be
Caused by: java.security.PrivilegedActionException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

Use the jTDS driver instead, as recommended here.
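For reference, a minimal connection check with the jTDS driver might look like the sketch below. The server, database, domain, user, and password values are placeholders; jTDS authenticates via NTLM domain credentials (the `domain` and `useNTLMv2` URL properties) rather than JavaKerberos, which is presumably why it is not affected by the Kerberos tokens being lost in the mappers.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JtdsConnectionCheck {
    public static void main(String[] args) throws Exception {
        // Register the jTDS driver (the jtds jar must be on the classpath,
        // e.g. in the Sqoop lib directory when used with Sqoop).
        Class.forName("net.sourceforge.jtds.jdbc.Driver");

        // Placeholder host/database/domain values -- replace with your own.
        // domain/useNTLMv2 switch authentication to NTLM instead of Kerberos.
        String url = "jdbc:jtds:sqlserver://servername.domain.co.za:1433/SCHEMA"
                   + ";domain=MYDOMAIN;useNTLMv2=true";

        try (Connection conn = DriverManager.getConnection(url, "svc_account", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            if (rs.next()) {
                System.out.println("Connected, SELECT 1 returned " + rs.getInt(1));
            }
        }
    }
}
```

With Sqoop, the same URL can typically be supplied via --connect together with --driver net.sourceforge.jtds.jdbc.Driver.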
03-23-2020
02:14 AM
Hello, this workaround didn't work for me. I configured the LDAP setup so that the BaseDN matches only one entry, but calling "ambari-server sync-ldap --existing" did not remove all existing LDAP users and groups; it deleted only two. Maybe I missed something, but after running the setup, do we need to restart ambari-server?

What should the expected behaviour be when running "ambari-server sync-ldap --all" with the BaseDN pointing to a single AD entry? The doc states the following for the '--existing' option: "Users will be removed from Ambari if they no longer exist in LDAP, and group membership in Ambari will be updated to match LDAP". Since the AD users still exist, that option would have no effect on removing the users even if the BaseDN points to a single entry.

What we are looking for (HDP 2.6.5) is to remove all LDAP-synced users other than those specified in --users users.txt and --groups group.txt. It looks like there is no such tool, so we would have to resort to the Ambari APIs manually (a rough sketch follows below). One thing I'm not sure about is how the lowercased aliases are handled, since during the first sync we had the default value 'true' to force lower case and have now changed it to 'false'. Looking forward to your insights.
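Since there appears to be no built-in cleanup, a rough sketch of removing users through the Ambari REST API is shown below. The /api/v1/users/<name> endpoint and the X-Requested-By header are standard Ambari REST conventions, but the host, credentials, and the list of users to delete are placeholders; verify the behaviour on a test instance first.

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;

public class AmbariUserCleanup {
    public static void main(String[] args) throws Exception {
        // Placeholder Ambari host and admin credentials.
        String ambari = "http://ambari-host.example.com:8080";
        String auth = Base64.getEncoder()
                .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));

        // Placeholder list -- in practice this would be every LDAP-synced user
        // that is NOT listed in users.txt / group.txt.
        List<String> usersToDelete = Arrays.asList("olduser1", "olduser2");

        for (String user : usersToDelete) {
            URL url = new URL(ambari + "/api/v1/users/" + user);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("DELETE");
            // Ambari requires this header on state-changing requests.
            conn.setRequestProperty("X-Requested-By", "ambari");
            conn.setRequestProperty("Authorization", "Basic " + auth);
            System.out.println("DELETE " + user + " -> HTTP " + conn.getResponseCode());
            conn.disconnect();
        }
    }
}
```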
02-10-2020
09:37 AM
I would also like to know this. Is Hive on Spark currently supported by Hortonworks? If so, since which version? Thanks in advance, David Resende
09-08-2019
04:58 PM
1 Kudo
I cannot find the archive section in the link.
03-27-2018
03:56 PM
8 Kudos
AD (After Druid)

In my opinion, Druid enables a new analytic service that I term the real-time EDW. In traditional EDWs, the process of getting from the EDW to an optimized OLAP cube could take hours depending on the size of the EDW. Having a cube build take six or more hours was not unusual for a lot of companies. The process also tended to be brittle and needed to be closely monitored.

In addition to real-time analytics, Druid also facilitates long-term analytics, which effectively provides a Lambda architecture for your data warehouse. Data streams into Druid and is held in memory for a configurable amount of time. While in memory, the data can be queried and visualized. After a period of time the data is passed to long-term (historical) storage as segments on HDFS. These segments can be part of the same visualization as the real-time data.

As mentioned previously, all data in Druid contains a timestamp. The other data elements consist of the same properties as traditional EDWs: dimensions and measures. The timestamp simplifies the aggregation, and Druid is completely denormalized into a single table. Remember, dimensions are descriptions or attributes and measures are always additive numbers. Since this is always true, it is easy for Druid to infer which data elements are dimensional attributes and which are measures. For each timestamp duration Druid can, in real time, aggregate facts along all dimensional attributes. This makes Druid ideal for topN, timeseries, and group-by queries, with group-by being the least performant.

The challenge with Druid and other NoSQL-type technologies like MongoDB lies in the visualization layer as well as in the architectural and storage complexities. Druid stores JSON data, and that JSON data can be difficult to manage and visualize in standard tools such as Tableau or Power BI. This is where the integration between Druid and Hive becomes most useful. There is a three-part series describing the integration:

Druid and Hive Part 1: https://hortonworks.com/blog/apache-hive-druid-part-1-3/
Druid and Hive Part 2: https://hortonworks.com/blog/sub-second-analytics-hive-druid/
Druid and Hive Part 3: https://hortonworks.com/blog/connect-tableau-druid-hive/

The integration provides a single pane of glass across real-time pre-aggregated cubes, standard Hive tables, and historical OLAP data. More importantly, the data can be accessed through standard ODBC and JDBC visualization tools as well as managed and secured through Ambari, Ranger, and Atlas. Druid provides an out-of-the-box Lambda architecture for time-series data and, coupled with Hive, we now provide the flexibility and ease of access associated with standard RDBMSs.
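To make the "single pane of glass" point concrete, here is a rough sketch (not part of the original article) of creating a Druid-backed table from Hive and querying it over plain JDBC. The HiveServer2 URL, table names, and column names are invented for illustration; the storage handler class and the `__time` column convention come from the Hive-Druid integration described in the linked posts.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveDruidExample {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 endpoint (Hive JDBC driver on the classpath).
        String url = "jdbc:hive2://hiveserver2.example.com:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {

            // Materialize a pre-aggregated, Druid-backed cube from a Hive table.
            // Druid requires the timestamp column to be exposed as `__time`.
            stmt.execute(
                "CREATE TABLE sales_druid " +
                "STORED BY 'org.apache.hadoop.hive.druid.DruidStorageHandler' " +
                "TBLPROPERTIES ('druid.segment.granularity' = 'DAY') AS " +
                "SELECT CAST(order_ts AS TIMESTAMP) AS `__time`, " +
                "       store_name, product_name, SUM(sales) AS total_sales " +
                "FROM sales_fact GROUP BY order_ts, store_name, product_name");

            // The same JDBC connection now serves Druid-backed aggregate queries.
            try (ResultSet rs = stmt.executeQuery(
                    "SELECT store_name, SUM(total_sales) AS sales " +
                    "FROM sales_druid GROUP BY store_name ORDER BY sales DESC LIMIT 10")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }
}
```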
03-27-2018
01:39 PM
8 Kudos
Druid is an OLAP solution for streaming event data as well as OLAP for long-term storage. All Druid data requires a timestamp, and Druid's storage architecture is organized around that timestamp, similar to how HBase stores by key. Some key benefits of Druid:

- Real-time EDW on event (time-series) data
- Long-term storage leveraging HDFS
- High availability
- Extremely performant querying over large data sets
- Aggregation and indexing
- High level of data compression
- Hive integration

Druid provides a specific solution for specific problems that could not be handled by any other technology. That being said, there are instances where Druid may not be a good fit:

- Data without a timestamp
- No need for real-time streaming
- Normalized (transactional) data (no joins in Druid)
- Small data sets
- No need for aggregating measures
- Non-BI queries such as Spark jobs or streaming lookups

Why Druid? BD (Before Druid)

In traditional EDWs, data is broken into dimension tables and fact tables. Dimensions describe an object. For example, a product dimension will have colors, sizes, names, and other descriptors of the product. Dimensions are always descriptors of something, whether it is a product, a store, or something that is part of every EDW: date. In addition to dimensions, EDWs have facts, or measures. Measures are always numbers that can be added. For example, the number 10 can be a measure but an average cannot; you can add 10 to another number, but adding two averages does not make numerical sense.

The reason for dimensions and facts is two-fold. Firstly, it was a means to denormalize the data and reduce joins; most EDWs are architected so that you will not need more than two joins to answer any question. Secondly, dimensions and facts map easily to business questions (see Agile Data Warehouse Design in the reference section). For example, take the following question: "How many of product x were purchased last month in store y?" We can dissect this sentence as follows: product, month, and store are all dimensions, while the "how many" is the fact, or measure. From that single question you can begin building your star schema.

Figure 1: Star Schema

The fact table will have a single row for each unique product sold in a particular store in a particular time frame. The difference between an EDW and OLAP is that an OLAP system will pre-aggregate this answer. Prior to the query, you run a process that anticipates the question and adds up all the sales totals for all the products across all time ranges. This is fundamentally why, in traditional EDW development, all possible questions needed to be fleshed out prior to building the schemas: the questions being asked define how the model is designed. This makes traditional EDW development extremely difficult, prone to errors, and expensive. Interviewing LOBs to find out what questions they might ask the system or, more likely, looking at existing reports and trying to reproduce that data in an EDW design was only the first step.

Once the EDW was built, you still had to work on what is called the "semantic layer". This is the point where you instruct the OLAP tool how to aggregate the data. Tools like SQL Server Analysis Services (SSAS) are complicated and require a deep understanding of OLAP concepts. They are based on the Kimball methodology and therefore, to some extent, require the schema to look as much like a star schema as possible.

Figure 2: SSAS

In these tools, the first thing you need to do is define hierarchies. The easiest hierarchy to define is date: it always follows the pattern year, month, day, hour, seconds. Other hierarchies include geography: country, state, county, city, zip code. Hierarchies are important in OLAP because they describe how the user will drill through the data and how the data will be aggregated at each level of the hierarchy. The semantic layer is also where you define what the analyst will actually see in their visualization tools; for example, exposing an EDW surrogate key would only confuse the analyst. In the Hadoop space the semantic layer is handled by vendors and software like Jethrodata, AtScale, Kyvos, and Kylin (open source).
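As an illustration of how the "how many of product x last month in store y" question maps onto a star schema, a hypothetical query over JDBC might look like the sketch below. All table and column names (sales_fact, dim_product, dim_store, dim_date, quantity_sold) are invented for the example; the point is that the dimensions become joins and filters while the measure is the additive column being summed.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class StarSchemaQuery {
    public static void main(String[] args) throws Exception {
        // Placeholder EDW connection -- any JDBC-accessible warehouse would do.
        String url = "jdbc:hive2://edw-host.example.com:10000/edw";

        // Dimensions (product, store, date) supply the joins and filters;
        // the fact table supplies the additive measure (quantity_sold).
        String sql =
            "SELECT SUM(f.quantity_sold) AS units " +
            "FROM sales_fact f " +
            "JOIN dim_product p ON f.product_key = p.product_key " +
            "JOIN dim_store   s ON f.store_key   = s.store_key " +
            "JOIN dim_date    d ON f.date_key    = d.date_key " +
            "WHERE p.product_name = ? AND s.store_name = ? AND d.month_name = ?";

        try (Connection conn = DriverManager.getConnection(url, "analyst", "");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, "product x");
            ps.setString(2, "store y");
            ps.setString(3, "January");
            try (ResultSet rs = ps.executeQuery()) {
                if (rs.next()) {
                    System.out.println("Units sold: " + rs.getLong("units"));
                }
            }
        }
    }
}
```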
03-01-2018
08:13 PM
Hi @Data Stocker, the short answer is no. The longer answer is that Hive currently has ACID (transactional) capabilities, but no construct like BEGIN...END TRANSACTION. There is work being done to provide this in a future release.
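To illustrate what is available today, here is a minimal sketch over the Hive JDBC driver, assuming ACID is enabled on the cluster. Each DML statement against a transactional table is its own auto-committed transaction; there is no way to group the two updates into a single BEGIN...END block. The connection details and table are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HiveAcidExample {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 endpoint.
        String url = "jdbc:hive2://hiveserver2.example.com:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement stmt = conn.createStatement()) {

            // An ACID table: ORC storage with the transactional property enabled
            // (older Hive versions additionally require bucketing, as shown here).
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS accounts (id INT, balance DECIMAL(10,2)) " +
                "CLUSTERED BY (id) INTO 4 BUCKETS " +
                "STORED AS ORC TBLPROPERTIES ('transactional'='true')");

            // Each statement below is auto-committed as its own transaction.
            // There is no construct to wrap both updates in one atomic unit.
            stmt.execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1");
            stmt.execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2");
        }
    }
}
```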
02-20-2018
05:37 PM
Thank you Scott