Member since: 07-31-2019
Posts: 346
Kudos Received: 259
Solutions: 62

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2870 | 08-22-2018 06:02 PM |
| | 1662 | 03-26-2018 11:48 AM |
| | 4099 | 03-15-2018 01:25 PM |
| | 5057 | 03-01-2018 08:13 PM |
| | 1415 | 02-20-2018 01:05 PM |
01-17-2017
06:03 PM
2 Kudos
Hi @Joe Harvy. You can create scripts that will create the databases and tables based on the Hive metastore. This blog walks you through the steps: https://sharebigdata.wordpress.com/2016/06/12/hive-metastore-internal-tables/
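For example, a rough sketch of the kind of metastore query that blog describes (run against the metastore backing database, e.g. MySQL; DBS and TBLS are standard metastore tables). You could feed the output into a script that issues SHOW CREATE TABLE for each entry:

SELECT d.NAME AS db_name, t.TBL_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME;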
01-03-2017
04:43 PM
@Sami Ahmad You'll want to use Beeline going forward since the Hive CLI will be deprecated. Beeline's JDBC connection provides a higher level of security than the Hive CLI.
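For reference, a typical Beeline connection looks something like this (the host and user are placeholders for your HiveServer2 endpoint and account):

beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -n myuser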
12-20-2016
02:19 PM
@Isa Mllr See this version matrix https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning
11-29-2016
05:16 PM
1 Kudo
Hi @Dagmawi Mengistu. Make sure you set up the Hive proxy user as described here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.4.2.0/bk_ambari-views/content/setup_HDFS_proxy_user.html
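For reference, the proxy user entries end up in core-site.xml and look roughly like this (this assumes the Ambari server runs as root; substitute your actual service account, and restrict the values rather than using * where your policy requires):

hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*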
11-21-2016
06:29 PM
1 Kudo
Hi @Adnan Alvee. Having already narrowed the performance issue down to a particular job, the primary thing you will want to look for within each stage is skew. Skew is when a small number of tasks take significantly longer to execute than the rest. Look specifically at task runtime to see if something is wrong. As you drill down into the slower-running tasks, focus on where they are slow. Is the slowness in writing data, reading data, or computation? This can narrow things down to a problem with a particular node, or perhaps you don't have enough disks to handle the scratch space for shuffles.

Keep in mind that Spark scales linearly, so your processing may be slow simply because there isn't enough hardware. Focus on how much memory and CPU you've allocated to your executors, as well as how many disks you have in each node. It also looks as if your number of executors is quite large; consider having fewer executors with more resources per executor. Also note that executor memory is memory per executor, and the number you have is a bit large. Try playing with those numbers and see if it makes a difference. Hope this helps.
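For illustration only (the numbers are placeholders to show which knobs to experiment with, not a recommendation, and your-app.jar stands in for your application):

spark-submit \
  --master yarn \
  --num-executors 10 \
  --executor-cores 4 \
  --executor-memory 8g \
  your-app.jar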
11-14-2016
04:45 PM
Hi @Clay McDonald. One of the reasons is MapReduce: Hive uses Tez, but Polybase is not yet compatible with Tez, and MapReduce is a batch data processing engine. You will also want to make sure your Hive tables are properly configured using best practices. Try implementing some of these rules where applicable: http://hortonworks.com/blog/5-ways-make-hive-queries-run-faster/. Also be aware of your cluster size. MapReduce (like other data processing engines) uses parallel processing, but if you don't have many nodes then you are not taking full advantage of that design. Not sure if it's applicable in your case, but you could also use multiple SQL Servers to parallelize your Polybase query: https://msdn.microsoft.com/en-us/library/mt607030.aspx
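As a rough sketch of the table layout those tips point toward (the table, columns, and partition key are made up for illustration), ORC storage plus partitioning on a commonly filtered column:

CREATE TABLE sales_orc (
  order_id BIGINT,
  amount   DOUBLE
)
PARTITIONED BY (order_date STRING)
STORED AS ORC
TBLPROPERTIES ('orc.compress'='ZLIB');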
11-09-2016
02:12 PM
Thanks @Andrew Grande! That worked! I feel like a noob 🙂 but appreciate all the help!
11-08-2016
09:12 PM
Here is the repeating error in nifi-app.log (attached as nifi-app.txt).
11-08-2016
09:02 PM
I apologize. The actual error is "File name is too long". The names of the files are COWFILE1.DAT and COWFILE1.ARC.
11-08-2016
08:21 PM
I have a couple of rather large hex files I need to convert to another format, with the intention of then stripping out certain attributes and storing the results in SQL Server and/or Hive. The files are 12.7 MB and 3.3 MB. When I use the code from this HCC answer https://community.hortonworks.com/questions/60597/hexdump-nifi-processor-nifi-hexdump-processor.html

import org.apache.nifi.processor.io.InputStreamCallback
import java.io.DataInputStream

// Get the incoming FlowFile; exit if there is nothing to process
def flowFile = session.get()
if (!flowFile) return

def attr = ''
def attr2 = ''
// Read the first 16 bytes (two longs) of the content as hex strings
session.read(flowFile, { inputStream ->
    def dis = new DataInputStream(inputStream)
    attr = Long.toHexString(dis.readLong())
    attr2 = Long.toHexString(dis.readLong())
} as InputStreamCallback)

// Store the hex as a FlowFile attribute and route to success
flowFile = session.putAttribute(flowFile, 'first16hex', attr + attr2)
session.transfer(flowFile, REL_SUCCESS)
in an ExecuteScript processor, I get a "File too large" error. I'm also aware of this JIRA but am looking for a good workaround: https://issues.apache.org/jira/browse/NIFI-2997
Labels:
- Apache NiFi
- Cloudera DataFlow (CDF)