Member since: 01-21-2018
Posts: 9
Kudos Received: 2
Solutions: 0
08-05-2018
07:05 AM
@Taylor Wilson: I am not sure if I understand what you are looking for. How is your reply related to the question being asked?
08-01-2018
05:10 PM
We are using the Hortonworks ODBC driver to connect to Hive from a C# application. Here is sample code that fires a query on Hive.
using System;
using System.Data.Odbc;

class HiveOdbcSample
{
    static void Main(string[] args)
    {
        // DSN configured for the Hortonworks Hive ODBC driver.
        string connectionString = "DSN=AzureHDP";
        string query = "SELECT * FROM schema_5218.demandquantity";
        int queryTimeout = 100; // seconds

        try
        {
            using (var connection = new OdbcConnection(connectionString))
            {
                Console.WriteLine("Opening Hive connection.");
                connection.Open();

                using (var cmd = connection.CreateCommand())
                {
                    cmd.CommandTimeout = queryTimeout;
                    cmd.CommandText = query;
                    Console.WriteLine("Executing Hive query.");
                    cmd.ExecuteNonQuery();
                    Console.WriteLine("Hive query executed successfully.");
                    Console.ReadKey(false);
                }
            }
        }
        catch (Exception e)
        {
            Console.WriteLine($"Encountered error during Hive query execution. [{e.Message}]");
            Console.ReadKey(false);
            throw;
        }
    }
}
The issue is that if the network connection drops after the query has been fired on Hive (before the query returns), the resulting exception crashes the application. The reason is that an AccessViolationException is thrown, and such exceptions are not delivered to the application's catch block. It therefore becomes an unhandled exception and the application crashes.

Is this due to a bug in the ODBC driver? Is there any workaround, or is a fix awaited? There are certain settings that can be used to catch such exceptions, but Microsoft does not recommend them, so I am not very keen to use them. Those settings are the [HandleProcessCorruptedStateExceptions] and [SecurityCritical] attributes and legacyCorruptedStateExceptionsPolicy.

Attaching a screenshot of the actual application code. You can see that the exception is thrown when the using clause closes the connection, and control does not reach the catch block.

Hortonworks Driver Version: 2.01.10.1014
.NET Version: 4.7
C# Version: 7.3
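For reference, here is a minimal sketch of the workaround I am reluctant to use, based on the corrupted-state-exception attributes from System.Runtime.ExceptionServices and System.Security (the class and method names are hypothetical, and this applies to .NET Framework only):

using System;
using System.Data.Odbc;
using System.Runtime.ExceptionServices;
using System.Security;

class CorruptedStateWorkaround
{
    // Opts this method in to catching corrupted-state exceptions such as
    // AccessViolationException. Microsoft advises against relying on this.
    [HandleProcessCorruptedStateExceptions]
    [SecurityCritical]
    static void RunQuery(string connectionString, string query)
    {
        try
        {
            using (var connection = new OdbcConnection(connectionString))
            {
                connection.Open();
                using (var cmd = connection.CreateCommand())
                {
                    cmd.CommandText = query;
                    cmd.ExecuteNonQuery();
                }
            }
        }
        catch (AccessViolationException e)
        {
            // Reached only because of the attributes above.
            Console.WriteLine($"Caught corrupted-state exception: [{e.Message}]");
        }
    }
}

Alternatively, legacyCorruptedStateExceptionsPolicy can be enabled process-wide under the <runtime> element in app.config, with the same caveats.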
07-20-2018
06:38 PM
Give this a try. On the Linux box from where you start the Hive CLI, create a user with the same name as the <ADLS Service Principal> name, then start the Hive CLI from this new user account.
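A minimal sketch of those steps, assuming a standard Linux box; the service principal name below is a placeholder to be replaced with the actual value:

# Replace the placeholder with the actual ADLS Service Principal name.
SP_NAME="<ADLS Service Principal>"
sudo useradd "$SP_NAME"    # create a local user with the matching name
sudo su - "$SP_NAME"       # switch to that user
hive                       # start the Hive CLI as the new user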
07-03-2018
01:24 PM
That works, thank you.
07-03-2018
08:05 AM
I am exporting Hive table data to CSV files in HDFS using queries like this:

FROM Table T1
INSERT OVERWRITE DIRECTORY '<HDFS Directory>'
SELECT *;

Hive is writing many small CSV files (1-2 MB) to the destination directory. Is there a way to control the number of files or the size of the CSV files?

Notes:
1) These CSV files are not used for creating tables out of them, so I cannot replace the query with INSERT INTO TABLE ...
2) I have already tried setting these values, to no avail (a sketch of how I applied them follows below):
hive.merge.mapfiles=true;
hive.merge.mapredfiles
hive.merge.smallfiles.avgsize
hive.merge.size.per.task
mapred.max.split.size
mapred.min.split.size

I have many tables in Hive of varying sizes. Some are very large and some are small. For large tables I am fine with many files being generated, as long as each file is larger than 16 MB. I don't want to explicitly set the number of mappers, because that would hamper query performance for large tables. TIA
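For reference, a sketch of how these settings were applied in the session before the export; the specific values are illustrative assumptions chosen around the 16 MB target, not the exact values from my session:

-- Illustrative values only; tune per workload.
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=true;
SET hive.merge.smallfiles.avgsize=16000000;   -- ~16 MB average triggers a merge pass
SET hive.merge.size.per.task=256000000;       -- target size of merged files
SET mapred.max.split.size=256000000;
SET mapred.min.split.size=16000000;

FROM Table T1
INSERT OVERWRITE DIRECTORY '<HDFS Directory>'
SELECT *;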
Tags: hdfs-blocks, Hive
Labels: Apache Hadoop, Apache Hive
04-21-2018
04:02 PM
1 Kudo
@Shu Thank you so much. The first approach worked.

INSERT INTO Target_table (col_1, col_2, col_3)
SELECT col_1, col_2, int(null) col_3
FROM Source_table;
04-21-2018
02:42 PM
1 Kudo
I have two tables in Hive.
CREATE TABLE Target_table(
col_1 timestamp,
col_2 int,
col_3 int) CLUSTERED BY (col_1) INTO 50 BUCKETS STORED AS ORC
TBLPROPERTIES('transactional'='true')
CREATE TABLE Source_table(
col_1 timestamp,
col_2 int)
I am trying to execute this query
INSERT INTO Target_table (col_1, col_2)
SELECT col_1, col_2 FROM Source_table;
Query runs successfully in Beeline.
The same query fails when executed via the Hortonworks ODBC Driver with the error:

ERROR [HY000] [Hortonworks][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: Error while compiling statement: FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into target table because column number/types are different 'Targer': Table insclause-0 has 3 columns, but query has 2 columns.
Looks like Hive is completely ignoring the column list in the Insert clause.
Other Details
Cluster: Azure HDInsight Cluster
Hortonworks Data Platform: HDP-2.6.2.25
OS: Windows 10
Language: C#
Any help is appreciated.
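For completeness, the rewrite that eventually resolved this (see the 04-21 04:02 PM reply above) lists all three target columns and supplies a typed NULL for the missing one:

INSERT INTO Target_table (col_1, col_2, col_3)
SELECT col_1, col_2, int(null) col_3
FROM Source_table;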
Tags: Data Processing, odbc
01-21-2018
04:30 AM
I am trying to deploy an HDP cluster on Azure which uses Azure Data Lake Storage. I followed the instructions here for the Ambari and cluster setup, and the instructions here for ADLS access. When I try to start Hive I get the following error:

Exception in thread "main" java.lang.RuntimeException: java.io.IOException: The ownership on the staging directory adl://home/tmp/hive/root/_tez_session_dir/c96088dd-444e-42e3-9293-251656b01b17 is not as expected. It is owned by <ADLS Service Principal>. The directory must be owned by the submitter root or by root
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:582)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
I am able to access ADLS via HDFS commands.
Permissions on the /tmp folder:
> hdfs dfs -ls /tmp
Found 1 items
drwxrwxrwt+ - <ADLS Service Principal> 0 2018-01-20 21:29 /tmp/hive
TIA
Tags: adls