Member since: 02-17-2015
Posts: 40
Kudos Received: 25
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2673 | 01-31-2017 04:47 AM |
| | 2612 | 07-26-2016 05:46 PM |
| | 7602 | 05-02-2016 10:12 AM |
08-03-2017
06:37 AM
This was back in 2016; nowadays I would go for NiFi (open source) or StreamSets (free to use, pay for support). Flume is deprecated in Hortonworks now and will be removed in future 3.* releases: deprecations_HDP.
01-31-2017
06:18 AM
I had a similar problem: I had enabled agent_tls, but the keystore field was not filled in or the file was in a different location, and after that the server did not start anymore. I needed to roll back the setting, thanks for your post. I used the mysql command-line tool to connect as root to the MySQL db and executed an update:

use scm;
update CONFIGS set VALUE='false' where ATTR='agent_tls';
Query OK, 1 row affected (0.05 sec)

After a restart of cloudera-scm-server, the server was working again and I could enter the UI.
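For completeness, a minimal sketch of the full sequence (service and database names as in the post above; adjust the mysql credentials to your setup):

mysql -u root -p
mysql> use scm;
mysql> update CONFIGS set VALUE='false' where ATTR='agent_tls';
mysql> quit
sudo service cloudera-scm-server restart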
01-31-2017
04:47 AM
When I used the fully qualified domain name (with a '.' in it), the repo is working fine!

parcelRepositories: ["http://localrepo.cdh-cluster.internal/parcels/cdh5/", "http://localrepo.cdh-cluster.internal/parcels/spark2/"]
12-14-2016
10:47 AM
I'll try that out this week and let you know! Thanks for your advice.
12-13-2016
02:41 AM
Our local repo is synced to the latest versions of:
- Cloudera Director
- Cloudera Manager

It also serves the parcels:
- CDH
- spark2

Bootstrap config:

cloudera-manager {
  ...
  repository: "http://localrepo/cloudera-manager/"
  repositoryKeyUrl: "http://localrepo/cloudera-manager/RPM-GPG-KEY-cloudera"
}
...
cluster {
  products {
    CDH: 5
  }
  parcelRepositories: ["http://localrepo/parcels/cdh5/", "http://localrepo/parcels/spark2/"]
  ...
}

We start cloudera-director-client bootstrap-remote with this config file. Cloudera Director provisions Cloudera Manager, the datanodes and the masters, but the script fails at around step 870/900. There are no errors in the Cloudera Manager logs; the error appears in the Cloudera Director log, where it takes an element from an empty collection while building some repo list. The remote bootstrap with the config file ends in a failed state. From /var/log/cloudera-director-server/application.log:

[2016-12-13 10:00:53] INFO [pipeline-thread-31] - c.c.l.pipeline.util.PipelineRunner: >> BootstrapClouderaManagerAgent$HostInstall/4 [DeploymentContext{environment=Environment{name='DataLake-devtst', provider=InstanceProviderConfig{t ...
[2016-12-13 10:00:53] ERROR [pipeline-thread-31] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
java.util.NoSuchElementException: null
at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:154)
at com.google.common.collect.Iterators.getOnlyElement(Iterators.java:307)
at com.google.common.collect.Iterables.getOnlyElement(Iterables.java:284)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.getRepoUrl(BootstrapClouderaManagerAgent.java:325)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.newApiHostInstallArguments(BootstrapClouderaManagerAgent.java:307)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.access$200(BootstrapClouderaManagerAgent.java:63)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:162)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:112)

Is this a bug, or am I doing something wrong? The local repo looks like this, and works fine for installing Cloudera Director:

[root@localrepo mirror]# ls -ARls | grep /
./cloudera-director:
./cloudera-director/repodata:
./cloudera-director/RPMS:
./cloudera-director/RPMS/x86_64:
./cloudera-director/RPMS/x86_64/repodata:
./cloudera-manager:
./cloudera-manager/repodata:
./cloudera-manager/RPMS:
./cloudera-manager/RPMS/x86_64:
./cloudera-manager/RPMS/x86_64/repodata:
./parcels:
./parcels/cdh5:
./parcels/spark2:
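A quick sanity check worth running from one of the provisioned hosts (just a sketch; hostnames as in the config above) is whether the parcel directories and their manifest.json are reachable, since Cloudera Manager expects a manifest.json in each parcel repository directory:

curl -I http://localrepo/parcels/cdh5/manifest.json
curl -I http://localrepo/parcels/spark2/manifest.json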
09-07-2016
10:14 AM
1 Kudo
As @Jean-Philippe Player mentions, reading a Parquet directory as a table is not yet supported by Hive. Source: http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_parquet.html. You are able to do it in Impala:

-- Using Impala:
CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
STORED AS PARQUET
LOCATION '/user/etl/destination';
With some Spark/Scala code you can generate the CREATE TABLE statement based on a Parquet file:

// use the spark-shell sqlContext (a HiveContext on CDH) consistently
sqlContext.read.parquet("/user/etl/destination/datafile1.dat").registerTempTable("mytable")
val df = sqlContext.sql("describe mytable")
// each row is "colname (space) data-type"
val columns = df.map(row => row(0) + " " + row(1)).collect()
// Print the Hive create table statement:
println("CREATE EXTERNAL TABLE mytable")
println(s" (${columns.mkString(", ")})")
println("STORED AS PARQUET")
println("LOCATION '/user/etl/destination';") // LOCATION points at the directory, not the data file
07-27-2016
11:47 AM
Hi @Junichi Oda,

We have the same error in the Ranger log, even when the group names are filled:

ERROR LdapUserGroupBuilder [UnixUserSyncThread] - sink.addOrUpdateUser failed with exception: org/apache/commons/httpclient/URIException, for user: userX, groups: [groupX, groupY]

I have inspected the source code of ranger-0.6, which is part of HDP-2.4.3.0, our current version of the stack. Interestingly, all calls to the remote server inside LdapUserGroupBuilder.addOrUpdateUser(user, groups) are wrapped in a try-catch(Exception e): there is addUser, addUserGroupInfo and delXUserGroupInfo. But we don't see those catches in the log; only addOrUpdateUser itself is wrapped with a try-catch(Throwable t). So it looks like it is an Error, not an Exception: most likely a NoClassDefFoundError for the missing org/apache/commons/httpclient classes, which extends Error and therefore skips the catch(Exception) blocks. I found the RANGER-804 ticket referring to missing classes. I copied the jars into '/usr/hdp/current/ranger-usersync/lib' from another folder. The code runs, but at the moment I have a certificate (PKI) error because we use LDAPS; still, this might get you further.

Greetings, Alexander
07-26-2016
06:00 PM
Hi @Zaher,

Depending on your data, you should care about which channel you choose. The memory channel is simple and easy, but data is lost when the Flume agent crashes (most likely from an OutOfMemoryError) and also on power or hardware issues. There are channels with higher durability for your data: the file channel is very durable when the underlying storage is redundant as well. Take a look at the Flume channels and their configuration options. For your OutOfMemory problem you can decrease the transaction and batch capacity and increase the heap in the flume-env config in Ambari, as @Michael Miklavcic suggests.
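For illustration, a minimal file-channel sketch using the standard Flume file-channel properties (agent name, paths and capacities are just examples, not your actual config):

a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/lib/flume/checkpoint
a1.channels.c1.dataDirs = /var/lib/flume/data
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000

For the heap, something like export JAVA_OPTS="-Xmx2048m" in the flume-env template (the exact value depends on your load).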
07-26-2016
05:46 PM
2 Kudos
We manage our Flume agents in Ambari. We have 3 'data-ingress' nodes out of many nodes. These nodes are bundled in a config group named 'dataLoaders', which sits at the top of Ambari > Flume > Configs. The default flume.conf is empty; for the 'dataLoaders' config group we override the default and add 2 agents:
- one pulling data from a queue and putting it in Kafka + HDFS
- one receiving JSON and placing it on a Kafka topic

Each host in the config group runs the 2 agents, which can be restarted separately from the Ambari Flume summary page. When you change the config, it is traceable/audited in Ambari, and a restart from Ambari places the new config file for the Flume agents. The Ambari agent on the Flume host checks whether the process is running and alerts you when it is dead. Ambari also helps you when upgrading the stack to the latest version(s). A sketch of the two-agent flume.conf follows the notes.

Notes:
- You cannot put a host in multiple config groups (don't mix responsibilities).
- The configuration is plain text with no validation at all (start and check /var/log/flume/**.log).
- Rolling restart for a config group is not supported (restart the Flume agents one by one).
- The Ambari 'alive' checks are super simple: a locked-up agent is still running, but not working.
- The Ambari Flume data-insight charts are too simple (Grafana is coming, or use JMXExporter -> Prometheus).
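A minimal sketch of how the two agents can live in the one flume.conf for the config group (agent, source, channel and sink names are made up for illustration; the real config carries the full source/sink properties):

# agent 1: pull from the queue, replicate to Kafka and HDFS
queueLoader.sources  = jmsIn
queueLoader.channels = chKafka chHdfs
queueLoader.sinks    = toKafka toHdfs
queueLoader.sources.jmsIn.channels = chKafka chHdfs
queueLoader.sinks.toKafka.channel = chKafka
queueLoader.sinks.toHdfs.channel = chHdfs

# agent 2: receive JSON and put it on a Kafka topic
jsonReceiver.sources  = httpIn
jsonReceiver.channels = chJson
jsonReceiver.sinks    = toKafkaTopic
jsonReceiver.sources.httpIn.channels = chJson
jsonReceiver.sinks.toKafkaTopic.channel = chJson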