Member since: 02-17-2015
Posts: 40
Kudos Received: 25
Solutions: 3

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1472 | 01-31-2017 04:47 AM |
 | 593 | 07-26-2016 05:46 PM |
 | 2503 | 05-02-2016 10:12 AM |
08-03-2017
06:37 AM
This was back in 2016; nowadays I would go for NiFi (open source) or StreamSets (free to use, pay for support). Flume is deprecated in Hortonworks now and will be removed in future 3.* releases: deprecations_HDP.
06-22-2017
11:51 AM
It was a while ago. Cloudera works fine with the Account Key (read + write). Cloudera cannot write using a SAS token; reading from a blob with a SAS token works fine. We tested the same situation with HDInsight (Hortonworks): add the storage account with a SAS token and it works (read, write). I could not update/replace the Azure jars because of breaking changes in the API (Cloudera Hadoop 2.6.0 vs Hortonworks Hadoop 2.7.x). To answer your question: I recorded the network stream and found the problem. I replayed the PUT request with a low-level network tool, with the SAS token appended to the URL in the x-ms-copy-source header, and then the request was successful. The problem is the code generating this request: the azure-storage library inside the old jar (still version 0.6.0) that works with Hadoop 2.6.x.
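For illustration, a minimal sketch of that replay using Python's requests library (the account, container, temp blob name and SAS token below are placeholders, not the real values):

import requests

account = "MYSTORAGEACCOUNT"
container = "CONTAINER"
sas_token = "sp=rwdl&sr=c&sv=2015-07-08&se=...&sig=..."        # placeholder SAS token
tmp_blob = "_$azuretmpfolder$/<uuid>test7.txt._COPYING_"       # placeholder temp blob name

copy_source = "https://%s.blob.core.windows.net/%s/%s" % (account, container, tmp_blob)

# The copy only succeeds when the SAS token is also appended to the copy source,
# so the service can authorize the read side of the server-side copy.
resp = requests.put(
    "https://%s.blob.core.windows.net/%s/test7.txt._COPYING_?%s" % (account, container, sas_token),
    headers={
        "x-ms-version": "2013-08-15",
        "x-ms-copy-source": copy_source + "?" + sas_token,
    },
)
print(resp.status_code)  # expect 202 Accepted instead of 404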
03-03-2017
12:59 AM
1 Kudo
We are facing the same problem. The service auto-restarts and creates multiple heap dumps of ~3 GB on /tmp/. Well spotted that the OOM error is occurring in PermGen. The default for the Navigator Metadata Server (Java Configuration Options) is -XX:MaxPermSize=196Mb; we bumped it up a bit to 256 MB and increased the heap as well. Here is a link to the Cloudera recommendation: Cloudera-documenations-MetadataServer-configuration. Formula: heap = (elements + relations) * 200 bytes. You can find these counts in the logs: grep NavServerUtil /pathtologs/cloudera-scm-navigator/*. In our case a 3 GB heap should be more than enough: elements = 1,496,854 and relations = 1,700,992, so (1,496,854 + 1,700,992) * 200 = 639,569,200 bytes, roughly 0.6 GB.
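A quick check of that formula with our counts (a small sketch; the 200-bytes-per-object factor is the one from the Cloudera recommendation above):

# Back-of-the-envelope heap estimate for the Navigator Metadata Server.
elements = 1496854   # count taken from the NavServerUtil log lines
relations = 1700992  # count taken from the NavServerUtil log lines
bytes_needed = (elements + relations) * 200        # ~200 bytes per object
print("%.2f GB" % (bytes_needed / 1024.0 ** 3))    # ~0.60 GB, so a 3 GB heap is ample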
02-15-2017
06:36 AM
This issue is nested in the Azure jars shipped with the distribution. Cloudera is using a very old azure-storage jar, version 0.6.0. The issue is not present in Hortonworks (and Azure HDInsight) because they use an up-to-date version. I tried to replace the jars with updated ones, but they are linked against hdfs-2.7.x while Cloudera uses hdfs-2.6.x, so it did not work in the end. Is an update of the azure-storage jars on the Cloudera roadmap?
01-31-2017
06:18 AM
I had a similar problem. I had enabled agent_tls, but the keystore field was not filled in or the file was in a different location, and the server did not start anymore. I needed to roll back the setting, thanks for your post. I used the mysql command-line tool to connect to the database as root and executed an update:

use scm;
update CONFIGS set VALUE='false' where ATTR='agent_tls';
Query OK, 1 row affected (0.05 sec)

After a restart of cloudera-scm-server, the server was working again and I could enter the UI.
01-31-2017
04:47 AM
When I used the FullyQualifiedDomainName (with a '.' in it), the repo works fine!

parcelRepositories: ["http://localrepo.cdh-cluster.internal/parcels/cdh5/", "http://localrepo.cdh-cluster.internal/parcels/spark2/"]
01-17-2017
01:53 PM
After some time I found a possible cause. The header x-ms-copy-source refers to the original blob to copy. When you suffix that URL with the SAS token, the PUT request works... Going to rest now... and sleep on it...
01-17-2017
10:02 AM
As I mentioned before, the command works with the Account Key but not with the SAS token. I did a tcpdump of both situations; below is the part where the SAS-token variant fails on the PUT that moves the file:

# ===== Account Key ======
PUT /CONTAINER/test6.txt._COPYING_?timeout=90 HTTP/1.1
Accept: application/xml
Accept-Charset: UTF-8
Content-Type:
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: c809fa4d-1f75-4acd-a2d5-9ddbb33d15b6
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/_$azuretmpfolder$/0495c6ef-5529-42cb-ae51-aa479c609493test6.txt._COPYING_
x-ms-date: Tue, 17 Jan 2017 17:42:49 GMT
Authorization: SharedKey MYSTORAGEACCOUNT:/8hrG9WRAjAAAlASkaQPHx3hDZF535lqnsSH18asD5M=
Host: MYSTORAGEACCOUNT.blob.core.windows.net
Connection: keep-alive
Content-Length: 0
HTTP/1.1 202 Accepted
Transfer-Encoding: chunked
Last-Modified: Tue, 17 Jan 2017 17:42:49 GMT
ETag: "0x8D43F00414594DE"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 4f36d344-0001-0059-68e9-701081000000
x-ms-version: 2013-08-15
x-ms-copy-id: dcdf5297-4570-43ff-920e-3bb1e3f0ce01
x-ms-copy-status: success
Date: Tue, 17 Jan 2017 17:42:48 GMT
# ========= SAS Token ============
PUT /CONTAINER/test7.txt._COPYING_?sp=rwdl&sr=c&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z&timeout=90&sig=YyX%2BL%2FTpXAAAGAi0vqfiipuD9iVM31F0Pjwup7tA%3D HTTP/1.1
Accept: application/xml
Accept-Charset: UTF-8
Content-Type:
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: dc55f745-482f-426e-96f2-c906d90ffb46
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/_$azuretmpfolder$/47c25af4-b68f-4239-b573-c51796fb2335test7.txt._COPYING_
Host: MYSTORAGEACCOUNT.blob.core.windows.net
Connection: keep-alive
Content-Length: 0
HTTP/1.1 404 The specified resource does not exist.
01-17-2017
08:48 AM
We want to write data to an Azure Blob Storage account. We are using the latest CDH5: 5.9.0-1.cdh5.9.0.p0.23. You have two options:

1. Use the Account Key (root key of the storage account).
2. Use a SAS token: limited amount of time, limited privileges, optional IP range.

You can add a key/value pair to core-site.xml to grant access to an Azure Storage account.

For option 1 use:
key: fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net
val: TheAccountKeyEndingOn==

For option 2 use:
key: fs.azure.sas.CONTAINER.ACCOUNTNAME.blob.core.windows.net
val: GeneratedSasToken
Example: sr=c&sp=rwdl&sig=YyX%2BL/TpX5sdadASD7fiipuD9iVM31F0Pjwup7tA%3D&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z

To be clear, option 1 works fine; the problem is about option 2! When we create a SAS token with all access rights, we are not able to upload a file.

# works: list
hdfs dfs -ls wasbs://mycontainer@myaccount.blob.core.windows.net/
# works: copy to hdfs
hdfs dfs -cp wasbs://mycontainer@myaccount.blob.core.windows.net/dummyfile.txt /tmp/
# fails: put to BlobStorage
hdfs dfs -put localfile.txt wasbs://mycontainer@myaccount.blob.core.windows.net/
# error: put: com.microsoft.windowsazure.storage.StorageException: The specified resource does not exist
# If you look in Azure Portal inside the BlobStore container there is a folder created:
# _$azuretmpfolder$
# with the file I wanted to copy:
# '13fe7e79-36b4-47b5-85d9-20f1f316e280localfile.txt._COPYING_'

How do we solve this problem? For the time being we configured the Account Key (option 1) to access the StorageAccount as a workaround.

Update: I looked at what happens under the hood with tcpdump; these are the highlights:

# >> = HTTP request
# << = HTTP response
# sometimes I put some headers below the request
hdfs dfs -put test.txt wasb://CONTAINER@MYSTORAGEACCOUNT.blob.core.windows.net/test5
# does the file exist?
>> HEAD /xml/test5.txt?SAS-token
<< HTTP/1.1 404
>> GET /xml?comp=list&sp=rwdl&sr=c&prefix=test5.txt%2&SAS-token
<< HTTP/1.1 200 OK
# does the ._COPYING_ file exist?
>> HEAD /xml/test5.txt._COPYING_?SAStoken
<< HTTP/1.1 404
# send the content.
>> PUT /xml/test5.txt._COPYING_?comp=blocklist&SAStoken
x-ms-client-request-id: 6738c38e-0a2c-4d9c-9c49-4f9657ff8eb0
x-ms-meta-hdi_permission: {"owner":"alexander","group":"supergroup","permissions":"rw-r--r--"}
x-ms-meta-hdi_tmpupload: _%24azuretmpfolder%24%2Ffa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_
>> PUT /xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_?blockid=AAAAALjq9uI%3D&comp=block&SAS-token
( content is sent here, with some XML )
# is the uploaded file there?
>> HEAD /xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_?SAS-token
<< HTTP/1.1 200 OK
# now move the file:
>> PUT /xml/test5.txt._COPYING_?SAS-token
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: 13c057d3-eebe-462a-869c-fb39429665dc
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_
# response with body:
<< HTTP/1.1 404
<?xml version="1.0" encoding="utf-8"?>
<Error><Code>CannotVerifyCopySource</Code>
<Message>The specified resource does not exist. RequestId:da53e5d1-0001-0109-39e3-7049dc000000 Time:2017-01-17T17:01:38.5004017Z</Message></Error>

Our script to generate a SAS token, based on https://github.com/Azure-Samples/hdinsight-dotnet-python-azure-storage-shared-access-signature/blob/master/Python/SASToken.py:

import time
import getpass
from azure.storage import AccessPolicy
from azure.storage.blob import BlockBlobService
from datetime import datetime, timedelta
def main():
    print("Going to generate a Container SAS-token.")
    conf = get_user_input()
    blob_service = get_blob_service(**conf)
    policies = get_policies(blob_service, conf["container_name"])
    if conf.get('policy_name') not in policies.keys():
        add_new_policy(blob_service, policies, **conf)
    generate_sas_token(blob_service, **conf)


def get_user_input():
    return {'account_name': raw_input('StorageAccount Name: '),
            'account_key': getpass.getpass('StorageAccount Key: '),
            'container_name': raw_input('Container: '),
            'permissions': raw_input('Permissions [rwdl] (default "rl"): ') or 'rl',
            'policy_name': raw_input('Policy name (default "readonly"): ') or 'readonly',
            'expiry_days': int(raw_input('Expiry days (default 365): ') or '365'),
            'ip_filter': raw_input('IP filter (default None): ') or None}


def get_blob_service(account_name=None, account_key=None, container_name=None, **unused):
    blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
    if not blob_service.exists(container_name):
        raise IOError(
            "Container '%s' does not exist in StorageAccount '%s'!" % (container_name, account_name))
    else:
        print('can access the container in that storage account.')
    return blob_service


def add_new_policy(blob_service, policies, container_name=None, policy_name=None,
                   expiry_days=None, permissions=None, **unused):
    expiry = datetime.utcnow() + timedelta(days=expiry_days)
    access_policy = AccessPolicy(permission=permissions, expiry=expiry)
    policies[policy_name] = access_policy
    print('adding new policy...')
    # Set the container to the updated list of identifiers (policies)
    blob_service.set_container_acl(container_name, signed_identifiers=policies)
    # Wait 3 seconds for the ACL to propagate
    time.sleep(3)
    print("new policy is added.")


def get_policies(blob_service, container_name):
    print('fetch current policies...')
    identifiers = blob_service.get_container_acl(container_name)
    for k, v in identifiers.items():
        print(" - '%s': permissions: %s, start: %s, expiry: %s" % (k, v.permission, v.start, v.expiry))
    return identifiers


def generate_sas_token(blob_service, container_name=None, policy_name=None, ip_filter=None,
                       account_name=None, permissions=None, expiry_days=None, **unused):
    print("generating new sas token...")
    # Generate a new Shared Access Signature token using the policy (by name)
    sas_token = blob_service.generate_container_shared_access_signature(
        container_name, id=policy_name, ip=ip_filter, protocol='https')
    # Recompute the expiry date for the description only (the policy holds the real expiry)
    expiry = datetime.utcnow() + timedelta(days=expiry_days)
    print('')
    print('Now add/update in ClouderaManager -> HDFS -> config: core-site.xml')
    print('')
    print('=== key ===')
    print('fs.azure.sas.%s.%s.blob.core.windows.net' % (container_name, account_name))
    print('=== value ===')
    print(sas_token)
    print('=== description ===')
    print('Token with permissions: "%s", expires "%s"' % (permissions, expiry.date()))
    print('')
    print('Now, restart HDFS and test with the command:')
    print('hdfs dfs -ls wasbs://%s@%s.blob.core.windows.net/' % (container_name, account_name))


if __name__ == "__main__":
    main()
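To sanity-check a generated token outside Hadoop, a small sketch using the same azure-storage SDK as the script above (the account name, container and token are placeholders):

from azure.storage.blob import BlockBlobService

# Hypothetical quick test: list the container using only the SAS token.
sas_service = BlockBlobService(account_name='ACCOUNTNAME', sas_token='GeneratedSasToken')
for blob in sas_service.list_blobs('CONTAINER'):
    print(blob.name)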
01-03-2017
07:38 AM
Hi Garry, I didn't check R-Studio yet. We run without openjdk-8-headless, only with Oracle Java 8. I'll post an update when we install and use it, somewhere this week.
12-30-2016
07:51 AM
2 Kudos
Hi Garry, I ran into the same problem. It relates to https://community.cloudera.com/t5/Cloudera-Manager-Installation/Problem-with-cloudera-agent/td-p/47698. Solution: remove the other Java versions. In my case I had to remove 'java-1.8.0-openjdk-headless', which was installed by the R package. Hope this helps.
12-14-2016
10:47 AM
I’ll try that out this week. And let you know! Thx for your advice.
12-13-2016
06:16 AM
I have the same problem. Looks like a bug in Cloudera Director in my opinion. Link to the issue I created: https://community.cloudera.com/t5/Cloudera-Director-Cloud-based/ClouderaDirector-2-2-0-failed-with-local-repository/td-p/48460
12-13-2016
02:41 AM
Local repo synced with the latest versions of:
- ClouderaDirector
- ClouderaManager

Also serving parcels:
- CDH
- spark2

Bootstrap config:

cloudera-manager {
    ...
    repository: "http://localrepo/cloudera-manager/"
    repositoryKeyUrl: "http://localrepo/cloudera-manager/RPM-GPG-KEY-cloudera"
}
...
cluster {
    products {
        CDH: 5
    }
    parcelRepositories: ["http://localrepo/parcels/cdh5/", "http://localrepo/parcels/spark2/"]
    ...
}

We start cloudera-director-client bootstrap-remote with this config file. Cloudera Director provisions: ClouderaManager, datanodes and masters are created. But the script fails at around step 870/900. There are no errors in the ClouderaManager logs; the error appears in the ClouderaDirector log, where it takes an element from an empty collection while building some repo list. Bootstrap remote with the config file ends in a failed state: /var/log/cloudera-director-server/application.log

[2016-12-13 10:00:53] INFO [pipeline-thread-31] - c.c.l.pipeline.util.PipelineRunner: >> BootstrapClouderaManagerAgent$HostInstall/4 [DeploymentContext{environment=Environment{name='DataLake-devtst', provider=InstanceProviderConfig{t ...
[2016-12-13 10:00:53] ERROR [pipeline-thread-31] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
java.util.NoSuchElementException: null
at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:154)
at com.google.common.collect.Iterators.getOnlyElement(Iterators.java:307)
at com.google.common.collect.Iterables.getOnlyElement(Iterables.java:284)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.getRepoUrl(BootstrapClouderaManagerAgent.java:325)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.newApiHostInstallArguments(BootstrapClouderaManagerAgent.java:307)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.access$200(BootstrapClouderaManagerAgent.java:63)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:162)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:112)

Is this a bug? Or am I doing something wrong? The local repo looks like this, and works fine for installing ClouderaDirector:

[root@localrepo mirror]# ls -ARls | grep /
./cloudera-director:
./cloudera-director/repodata:
./cloudera-director/RPMS:
./cloudera-director/RPMS/x86_64:
./cloudera-director/RPMS/x86_64/repodata:
./cloudera-manager:
./cloudera-manager/repodata:
./cloudera-manager/RPMS:
./cloudera-manager/RPMS/x86_64:
./cloudera-manager/RPMS/x86_64/repodata:
./parcels:
./parcels/cdh5:
./parcels/spark2:
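As a quick sanity check that the parcel repositories are reachable and actually serve a manifest, a small sketch (Python 2, matching the script earlier in this profile; the hostnames are the same placeholders as in the config above):

import urllib2

# Each parcel repository must serve a manifest.json next to the parcels.
for repo in ["http://localrepo/parcels/cdh5/", "http://localrepo/parcels/spark2/"]:
    try:
        print("%s -> HTTP %d" % (repo, urllib2.urlopen(repo + "manifest.json").getcode()))
    except urllib2.URLError as e:
        print("%s -> failed: %s" % (repo, e))

(The eventual fix, per the accepted answer of 01-31-2017 above, was to use a fully qualified domain name for the repo host.)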
10-25-2016
12:03 AM
I understand that Director can provision a cluster that uses a Postgres DB. Still, my second question stands: should I use one DB instance for ClouderaDirector and reuse it for ClouderaManager, HUE and Oozie? And back to my first question: if you check /etc/cloudera-director-server/application.properties I don't see Postgres mentioned. Can you confirm whether Cloudera Director itself can be hosted on a Postgres DB?

#
# Configurations for database connectivity.
#
# Optional database type (h2 or mysql) (defaults to h2)
# lp.database.type: mysql
10-24-2016
05:01 AM
Cloudera Director requires either a pre-installed DB or the embedded H2. In the config files I can change the db to an external MySQL database.

First question: does Cloudera Director support Postgres, and is there a sample config? The plan is to deploy the db on a separate node in the cluster, close to Cloudera Director and Cloudera Manager.

Second question: should I reuse the external db for ClouderaDirector, ClouderaManager, Hive, HUE and Oozie? Each service would get its own database but use the same MySQL / Postgres instance. That is easier to back up and to make highly available. If only MySQL is supported by Cloudera Director, then the other services have to use that db type as well.
09-07-2016
10:14 AM
1 Kudo
As @Jean-Philippe Player mentions, creating a table directly from a Parquet file is not yet supported by Hive. Source: http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_parquet.html. You are able to do it in Impala:

# Using Impala:
CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
STORED AS PARQUET
LOCATION '/user/etl/destination';
With some Spark/Scala code you can generate the CREATE TABLE statement based on a Parquet file:

spark.read.parquet("/user/etl/destination/datafile1.dat").registerTempTable("mytable")
val df = sqlContext.sql("describe mytable")
// "colname (space) data-type"
val columns = df.map(row => row(0) + " " + row(1)).collect()
// Print the Hive create table statement:
println("CREATE EXTERNAL TABLE mytable")
println(s" (${columns.mkString(", ")})")
println("STORED AS PARQUET ")
println("LOCATION '/user/etl/destination/datafile1.dat';")
08-31-2016
10:04 AM
# on a datanode:
sudo su - hdfs
jcmd $(pgrep -U hdfs -f ".*DataNode") GC.run

I can confirm this works; the alert also disappears (for a while) in Ambari. (95% -> 43% heap use)
08-31-2016
09:44 AM
I'm facing the same issues. I also bumped the datanode heap up from 1 to 2 GB, but I still get Ambari alerts and see no noticeable HDFS effects. Good to know it's a new feature which might need tuning. It could also depend on the JVM version; we are on Oracle Java 1.8. Some garbage-collection options from a running DataNode process (with the -XX: prefix): +UseConcMarkSweepGC ParallelGCThreads=4 NewSize=200M MaxNewSize=200M PermSize=128M
08-19-2016
06:18 AM
1 Kudo
We learned it the hard way: one of the disks containing the file-channel data dir crashed, which resulted in data loss.
- Make sure your storage is redundant!
- Tune your batch sizes.
- Monitor your disks; SMART can tell you that a disk is going to fail.
- Or (when the HDP stack includes Flume 1.6) use a KafkaChannel, not the KafkaSink. The message is only acknowledged to the source once it has been accepted on the Kafka topic.
08-09-2016
12:58 AM
Thanks, although I posted this more than a year ago; I assume it's working properly now. The docs explain the process very clearly: http://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_hive_udf.html. As noted at the bottom, a restart of HS2 is mandatory.
07-27-2016
11:47 AM
Hi @Junichi Oda, we have the same error in the Ranger log, even when the group names are filled:

ERROR LdapUserGroupBuilder [UnixUserSyncThread] - sink.addOrUpdateUser failed with exception: org/apache/commons/httpclient/URIException, for user: userX, groups: [groupX, groupY]

I have inspected the source code of ranger-0.6, which is part of HDP-2.4.3.0, our current version of the stack. Interestingly, all calls to the remote server inside LdapUserGroupBuilder.addOrUpdateUser(user, groups) are wrapped in a try-catch(Exception e): there is addUser, addUserGroupInfo and delXUserGroupInfo. But we don't see those in the log. The addOrUpdateUser call itself is wrapped with try-catch(Throwable t), so it looks like it's an Error, not an Exception! I found the RANGER-804 ticket referring to missing classes. I copied the jars into '/usr/hdp/current/ranger-usersync/lib' from another folder. The code runs, but I now have a certificate (PKI) error because we use LDAPS; still, this might get you further. Greetings, Alexander
07-26-2016
06:00 PM
Hi @Zaher, depending on your data you should care about the channel you choose. The memory channel is simple and easy, but data is lost when the Flume agent crashes (most likely OutOfMemory) or on power/hardware issues. There are channels with higher durability for your data; the file channel is very durable when the underlying storage is redundant as well. Take a look at the Flume channels and their configuration options. For your OutOfMemory problem you can decrease the transaction and batch capacity and increase the heap in the flume-env config in Ambari, as @Michael Miklavcic suggests.
07-26-2016
05:46 PM
2 Kudos
We manage our Flume agents in Ambari. We have 3 'data-ingress' nodes out of many nodes. These nodes are bundled in a ConfigGroup named 'dataLoaders', which sits at the top in Ambari > Flume > config. The default flume.conf is empty; for the config group 'dataLoaders' we override the default and add 2 agents:

1. Pulling data from a queue and putting it in Kafka + HDFS
2. Receiving JSON and placing it on a Kafka topic

Each host in the config group will run the 2 agents, which can be restarted separately from the Ambari Flume summary page. When you change the config, it is traceable/audited in Ambari, and a restart from Ambari will place the new config file for the Flume agents. The Ambari agent on the Flume host will check whether the process is running and alert you when it's dead. Ambari will also help you when upgrading the stack to the latest version(s).

Notes:
- You cannot put a host in multiple config groups (don't mix responsibilities).
- The configuration is plain text with no validation at all (start and check /var/log/flume/**.log).
- Rolling restart for a config group is not supported (restart the flume agents one by one).
- Ambari 'alive' checks are super simple: a locked-up agent is still running, but not working...
- Ambari Flume data-insight charts are too simple (Grafana is coming, or use JMXExporter -> Prometheus).
06-19-2016
03:04 PM
Thx for the reply. I forgot to mention that the settings you suggest are also present; unfortunately they are not applied after a server reboot, when Ambari starts automatically.
05-31-2016
08:01 AM
1 Kudo
We had a problem on datanodes that had a low open-files limit, which leads to exceptions when heavily using HDFS. After a machine reboot the ambari-agent service starts automatically as a child of the init process. The limits of the init process are 1024 (soft) / 4096 (hard), and any fork of this process copies those limits by default. Because the Ambari agent runs as root, the startup command does not switch to another user and therefore does not pick up the limits configured in /etc/security/limits.conf and /etc/security/limits.d/*.conf.

# Print AmbariAgent, HDFS and YARN limits on a DataNode
grep 'open files' /proc/$(ps aux | grep "[A]mbariAgent" | awk '{print $2}')/limits
grep 'open files' /proc/$(pgrep -U hdfs)/limits
grep 'open files' /proc/$(pgrep -u yarn java)/limits
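The grep commands above read the limits from /proc. The same check from inside a running process (here in Python) is a one-liner with the standard resource module (a small sketch):

import resource

# Show the open-files limits the current process inherited from its parent.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft=%d hard=%d" % (soft, hard))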
If you restart HDFS from the AmbariServer UI, the limits are still low. After restarting the AmbariAgent service, the limits of AmbariAgent are increased, and when you restart HDFS after the AmbariAgent the limits are correct. The problem still exists for nodes that are rebooted.

Our solution: change the /etc/init.d/ambari-agent script on all the nodes and add a few lines just before starting the Ambari agent:

case "$1" in
start)
# Start is a 'fork' of the init process and has 1024/4096 limits!
CURRENT_HARD_LIMIT=$(ulimit -Hn)
if [[ $CURRENT_HARD_LIMIT -lt 5000 ]]; then
ulimit -Hn 128000
fi
$command_prefx "/usr/sbin/ambari-agent $@"
;;
stop)
05-24-2016
01:27 PM
1 Kudo
Hi @Jonas Straub, we configured a secure SolrCloud cluster, with success.
There is one MAJOR issue: https://issues.apache.org/jira/browse/RANGER-678. The Ranger plugins (Hive, HDFS, Kafka, HBase, Solr) that generate audit logs are not able to send them to a secured Solr. The bug was reported 06/Oct/15 but has not been addressed yet. How do we get it addressed, so people can start using a secure Solr for audit logging? Greetings, Alexander
05-24-2016
01:00 PM
1 Kudo
Great article. When testing the connection to Solr from Ranger, as @Jonas Straub mentions, /var/log/ranger/admin/xa_portal.log shows the URL: it tries to access ${Solr URL}/admin/collections, so you should enter a URL ending with /solr. Then the log gives an Authentication Required 401. Now that Solr is Kerberos-secured, the request from Ranger to fetch the collections should also use a Kerberos ticket... Did anyone manage to make the lookup from Ranger to Solr (with Kerberos) work?
05-03-2016
06:57 AM
3 Kudos
We have a reliable Flume stream from a JMS source through a FileChannel to an HDFS sink. The FileChannel buffers data before writing to HDFS. One of these blocks (log-<number>) is not valid due to a hard reset of the machine. When Flume starts it tries to work through the data that is still in the FileChannel and not yet delivered to the sink, and the logs say the file is corrupt. The content of the data file is binary, but you can see the headers, key (uuid) and values (xml) in plain text. The bottom of the file looks incomplete, with no closing tags. A unix move command fails with an I/O error; after copying the file, the new copy is a little smaller in bytes. Flume also writes a log-<number>.meta next to it. I'm not sure how to bring the two files back in sync so Flume can process them. I want to restore most of the data of this ~1 GB data file. What is the best approach?