Member since: 11-24-2015
Posts: 56
Kudos Received: 57
Solutions: 4

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1167 | 05-21-2016 02:32 PM |
 | 1701 | 04-26-2016 05:22 AM |
 | 2985 | 01-15-2016 06:23 PM |
 | 5966 | 12-24-2015 04:52 PM |
08-11-2020 01:40 AM
I did this as the root user: I found the file and changed it there. But how do I change it on each node?
12-01-2018 03:07 PM
Thank you for your answer. I wish I could buy you a beer. 🙂 My solution was slightly different but staring me right in the face, and your answer provided the perfect clue. In my case, there was no "universal" directory. However, I noticed there was a CA bundle file. It did NOT come preconfigured with the GeoTrust CA we have for the SSL certificate we purchased for our F5, though it did include some other GeoTrust CA and many others. I simply opened the cacerts.pem file, added that GeoTrust CA to the end, saved the file, and ran my test query. It then worked!
[mpetronic@vmwhnsqsrclnt01 ~]$ echo "show tables" | isql -d, -b -v f5 mpetronic $(cat ~/.pw.dat)
cro_capacity_extract_tmp
cro_capacity_ranked_tmp
cro_capacity_report_final
cro_efficiency_extract_tmp
cro_efficiency_hourly_tmp
cro_efficiency_report_final
j1_total_user_counts
san_data_2
test
Here is what my directory structure looks like for the ODBC driver version I am using:
[root@vmwhnsqsrclnt01 lib]# tree /usr/lib/hive/
hive
└── lib
└── native
├── hiveodbc
│ ├── ErrorMessages
│ │ └── en-US
│ │ ├── DSMessages.xml
│ │ ├── HiveODBCMessages.xml
│ │ ├── ODBCMessages.xml
│ │ ├── SQLEngineMessages.xml
│ │ └── ThriftExtensionMessages.xml
│ ├── EULA.txt
│ ├── Hortonworks\ Hive\ ODBC\ Driver\ User\ Guide.pdf
│ ├── Release\ Notes.txt
│ └── Setup
│ ├── odbc.ini
│ └── odbcinst.ini
└── Linux-amd64-64
├── api.prod.quasar.nadops.net.pem
├── cacerts.pem <<< Added GeoTrust CA to end of this file
├── cacerts.pem.orig
├── HiveODBC.did
├── hortonworks.hiveodbc.ini
└── libhortonworkshiveodbc64.so
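A minimal shell sketch of the fix described above, assuming the driver layout shown in the tree output; geotrust_ca.pem is a placeholder for whatever PEM file holds your CA certificate:
# back up the bundled CA file first
cp /usr/lib/hive/lib/native/Linux-amd64-64/cacerts.pem /usr/lib/hive/lib/native/Linux-amd64-64/cacerts.pem.orig
# append the PEM-encoded CA certificate to the end of the bundle
cat geotrust_ca.pem >> /usr/lib/hive/lib/native/Linux-amd64-64/cacerts.pem
# re-run the test query to confirm the SSL handshake now succeeds
echo "show tables" | isql -d, -b -v f5 mpetronic $(cat ~/.pw.dat)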
08-18-2017 09:11 PM
You should be able to export an environment variable and place the odbc.ini file anywhere you want - somewhere you do have write access. Can you try this and then run your test? I never tried making it work completely with just a connect string.
export ODBCINI=/path/to/your/odbc.ini
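A quick sketch of how that might look end to end, assuming unixODBC's isql is available; the DSN name, user, and password below are placeholders:
# point unixODBC at a user-writable odbc.ini
export ODBCINI=$HOME/odbc/odbc.ini
# test the DSN defined in that file
isql -v my_hive_dsn my_user my_password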
09-09-2016 10:18 PM
Thanks @Constantin Stanca. Parsing that log will work just fine for my use case. Appreciate your help!!!
09-01-2016 03:15 PM
We have lots of partitioned tables and need to write queries whose partition clauses include year, month, and day values that are not as simple as:
where year=2016 and month=8 and day between 7 and 14
Often they require non-contiguous ranges of days spanning different months and years, etc. So I am trying to come up with a way to help users craft those partition clauses more easily and programmatically (to the greatest extent possible). I don't want users to have to write wrapper scripts around queries or write queries into files first, since they can do something like this:
beeline 'connect-string' --hivevar part_string=$(./make_part.py 2015-12-25 2016-01-07) -f some.hql
where make_part.py might be a Python script that takes two dates and forms the full partition clause string, which can then simply be referenced in some.hql like this (which I know would work):
select * from table where ${hivevar:part_string};
What I would like to do is something "functionally" like this but from within beeline, so users can work more interactively in the beeline shell. For example, I wish you could do this from within beeline:
set hivevar:part_string=!sh ./make_part.py 2015-12-25 2016-01-07
and have the output of the !sh command become the value of the Hive variable. That does not work, of course. So I was wondering: is it possible to create a UDF that could be used in a WHERE clause and return the partition clause string, which Hive would then evaluate properly and work as expected - meaning proper partition pruning? Something like this:
select * from table where udf_make_part("2015-12-25", "2016-01-07");
where udf_make_part would do the same thing as make_part.py - take some date arguments and return a generated partition string. I've not worked with UDFs so far, but I'm wondering if they could be used in this context in the WHERE clause. Or does anyone have another useful approach for dealing with long, complicated partition clauses?
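To make the command-line workaround above concrete, here is a hedged sketch of how the pieces would fit together; make_part.py and $JDBC_URL are placeholders, and the example predicate is only illustrative:
# build the partition predicate outside of beeline
PART_STRING=$(./make_part.py 2015-12-25 2016-01-07)
echo "$PART_STRING"
# e.g. (year=2015 AND month=12 AND day BETWEEN 25 AND 31) OR (year=2016 AND month=1 AND day BETWEEN 1 AND 7)
# pass it in as a hivevar; some.hql references it as ${hivevar:part_string}
beeline -u "$JDBC_URL" --hivevar part_string="$PART_STRING" -f some.hql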
05-21-2016 02:32 PM
1 Kudo
I stumbled onto my own answer. It appears that you have to provide a partition spec to the alter command. I figured this out after poking around in MySQL to look at my Hive metastore to see if it would give me a clue. These queries are what made me think that the serde information is stored on a partition-by-partition basis:
mysql> select * from TBLS where TBL_NAME='rm'\G
*************************** 1. row ***************************
TBL_ID: 170
CREATE_TIME: 1463833647
DB_ID: 11
LAST_ACCESS_TIME: 0
OWNER: mpetronic
RETENTION: 0
SD_ID: 227
TBL_NAME: rm
TBL_TYPE: EXTERNAL_TABLE
VIEW_EXPANDED_TEXT: NULL
VIEW_ORIGINAL_TEXT: NULL
LINK_TARGET_ID: NULL
1 row in set (0.00 sec)
mysql> select * from SDS where CD_ID=170\G
*************************** 1. row ***************************
SD_ID: 227
CD_ID: 170
INPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
IS_COMPRESSED:
IS_STOREDASSUBDIRECTORIES:
LOCATION: hdfs://mpws:8020/jup1_stats/external_tables/rm
NUM_BUCKETS: -1
OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
SERDE_ID: 227
*************************** 2. row ***************************
SD_ID: 228
CD_ID: 170
INPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
IS_COMPRESSED:
IS_STOREDASSUBDIRECTORIES:
LOCATION: hdfs://mpws:8020/jup1_stats/external_tables/rm/year=2016/month=5/day=10
NUM_BUCKETS: -1
OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
SERDE_ID: 228
*************************** 3. row ***************************
SD_ID: 229
CD_ID: 170
INPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat
IS_COMPRESSED:
IS_STOREDASSUBDIRECTORIES:
LOCATION: hdfs://mpws:8020/jup1_stats/external_tables/rm/year=2016/month=5/day=11
NUM_BUCKETS: -1
OUTPUT_FORMAT: org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat
SERDE_ID: 229
Once I saw all those LOCATION values in the SDS table, I tried the following command to alter the table, and then the query worked. Interesting. Does this imply that you could have a different schema for each partition? Can anyone comment on why the Avro schema is tied to a partition and not simply to the whole table?
alter table rm partition (year=2016) set serdeproperties ('avro.schema.url' = 'hdfs://mpws:8020/jup1_stats/avro/rm_1.avsc');
Since all my data is in partitions under the "year=2016" partition, I was able to specify just that one partition and the change was applied to all partitions under it.
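A hedged sketch of how the same serde change could be scripted across several top-level partitions with beeline; the year list and $JDBC_URL are placeholders, so adjust them to the partitions your table actually has:
# apply the new avro.schema.url to each top-level year partition
for y in 2015 2016; do
  beeline -u "$JDBC_URL" -e "alter table rm partition (year=$y) set serdeproperties ('avro.schema.url'='hdfs://mpws:8020/jup1_stats/avro/rm_1.avsc');"
done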
05-20-2016 04:28 AM
The issue might be related to the missing blocks. Check the block report, and for any files with missing blocks, either delete them or upload them again.
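A hedged sketch of how to check for and clean up missing blocks from the command line (run as the HDFS superuser; the -delete step is destructive, so review the affected files first):
# cluster summary, including the missing-block count
hdfs dfsadmin -report
# list the files that have corrupt or missing blocks
hdfs fsck / -list-corruptfileblocks
# after reviewing, either re-upload the affected files or remove the corrupt ones
hdfs fsck / -delete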
06-20-2017 11:29 AM
I think you're missing some ranger-usersync libraries here.
05-26-2017 01:04 PM
And that said, I actually restarted Ambari as well - so I can't say for certain that the agent restart was sufficient; it may well have been the agents plus Ambari which did the trick.
04-25-2017 01:57 PM
This error is a huge problem. It needs to be patched and a warning check should be in place BEFORE starting an install.
04-25-2016 09:25 AM
You can definitely add multiple services in the Ranger UI. For example, I recently had to secure multiple SolrCloud clusters with one Ranger instance. Since every SolrCloud cluster was handling its own policies, I had to add one Ranger service for each SolrCloud cluster. I named the Ranger services solrcloud01, solrcloud02, and solrcloud03 (this was not done through Ambari!). Usually you have one Ranger service for each Hadoop service in your cluster (e.g. hive, hdfs, ...), but you could use the same Ranger instance for different clusters. For example, you could use one Ranger instance for mycluster_dev, mycluster_int, and mycluster_prd (not recommending this!) and manage all policies in one place. The naming convention <cluster>_<service> is only used when you enable the Ranger plugins through Ambari. When you enable the plugins manually (e.g. for Solr there is no Ambari support at the moment), you can choose your own name.
05-02-2016 09:00 PM
How big is this specific ORC file, and can it be shared with us? Can you also check whether this is hanging in one of the mappers (the one reading this ORC file) or before you even get into the application/mapper in YARN?
03-08-2016 10:55 PM
3 Kudos
First, spot-on by letting the ZK processes write to their own disks. As for letting the active/passive NNs write to the same physical disks as the JNs, I think you are OK with that approach. I say that as the edits are what are being written to continuously, but the fsimage files are only being read/recreated at key points such as checkpointing and startup. I probably pitched a bit of overkill in a blog I did last year on this topic of filesystems, but feel free to check it out at https://martin.atlassian.net/wiki/x/EoC3Ag if you need some help going to sleep at night. 😉 If you do check it out, you'll notice my very clear advice is that you should still make backups of the fsimage/edits files (even w/HA enabled) to avoid a potential "bunker scene" of your own. Having seen what happens first hand by losing this information (it was a configuration screw-up, not a h/w failure), I know I simply don't want to be there again.
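For reference, a minimal sketch of one way to take such a backup of the current fsimage from a running NameNode; the backup directory is a placeholder, and scheduling/retention are up to you:
# download the most recent fsimage from the NameNode to local disk
mkdir -p /backups/namenode/$(date +%Y%m%d)
hdfs dfsadmin -fetchImage /backups/namenode/$(date +%Y%m%d)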
03-08-2016 10:01 AM
@Mark Petronic Thanks for the feedback. I would check this and get it fixed as appropriate.
09-24-2018 02:53 PM
Request for @Ancil McBarnett (or anyone else who knows): Please flesh out a little on ... "You do not want Derby in your cluster."
01-18-2018 04:24 AM
Can you please answer this post: https://community.hortonworks.com/questions/162789/how-actually-namenode-ha-qjm-works.html
04-24-2018 12:24 PM
Heartbeats work fine from an ambari-agent host with this:
rpm -qa openssl
openssl-1.0.1e-51.el7_2.5.x86_64
But not with this:
rpm -qa openssl
openssl-1.0.2k-8.el7.x86_64
With this newer version of openssl, the Ambari agent attempts to connect to the Ambari server using https instead of http. In our setup, Ambari is restricted to just internal cluster users (admins) and therefore is not set up for https. This results in lost heartbeats. You can work around it by changing the default certificate-verification rule for Python on each agent host like this:
sed -i 's/verify=platform_default/verify=disable/' /etc/python/cert-verification.cfg
ambari-agent restart
I know this is not the best solution, because it changes the security default for Python host-wide. But as an interim fix, it works.
02-10-2016 03:18 AM
@Mark Petronic See if this makes sense
01-19-2016 08:41 PM
1 Kudo
Interestingly, I just upgraded Ambari from 2.1 to 2.2 as part of my upgrade plans and the Hive service check now passes. The stack trace does show Ambari running various command scripts that implement this check.
08-25-2016 07:13 PM
It is not working for me. Can you let me know if I'm doing anything wrong? test4 is a table partitioned on lname and stored as ORC. The partition I'm trying to merge has just 2 small files.
ALTER TABLE test4 PARTITION (lname='vr') CONCATENATE;
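One hedged way to confirm whether the CONCATENATE actually merged anything is to compare the file count in the partition directory before and after; the warehouse path below is a placeholder (DESCRIBE FORMATTED test4 PARTITION (lname='vr') will show the real location):
# count the files in the partition directory (placeholder path)
hdfs dfs -ls /apps/hive/warehouse/test4/lname=vr | wc -l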
11-21-2016 09:49 AM
FYI, YARN by default limits the number of concurrent applications to 10000. This parameter can be changed through the YARN config in Ambari (yarn.scheduler.capacity.SPECIFIC_QUEUE.maximum-applications).
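If it helps, a hedged way to see what limits are currently in effect on an Ambari-managed node; the config path assumes a standard HDP layout, and SPECIFIC_QUEUE above is a placeholder for your queue's full path (e.g. root.default):
# show the maximum-applications settings currently in effect
grep -A1 maximum-applications /etc/hadoop/conf/capacity-scheduler.xml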
01-15-2016 06:23 PM
@Mark Herring No, I haven't actually pursued this much beyond this post. My gut feeling, and the way I have implemented it, is to use the same name for all schema versions that are of the same "logical type". That just felt the most right to me. I still include the custom "version" field, as I noted in the post, both as a book-keeping feature (for human consumption) and for programmatic use. For example, I am coalescing thousands of smaller CSV files into one larger Avro file. But since I have multiple versions of a given CSV "logical type" in flight, say 3, I create up to three different Avro files and pack each one with all the CSV data that aligns with one of the three Avro schemas. In that case, my "packer" app uses the version field from the schema to build a filename that shows the file contains, say, version 2 data. This is probably more for me to 'see' it in directory listings, but it helps with debugging and monitoring. Since this is my first build on Hadoop, the more human-friendly the file names and paths are, the easier it is to confirm my design is working. I would rather see vsat_modc_stats__v2__20160114_123456.avro than 000000023_1. 🙂 The __v2__ is the schema version in this example.