Member since: 02-17-2015
Posts: 40
Kudos Received: 25
Solutions: 3

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1472 | 01-31-2017 04:47 AM |
 | 593 | 07-26-2016 05:46 PM |
 | 2503 | 05-02-2016 10:12 AM |
08-03-2017
06:37 AM
This was back in 2016; nowadays I would go for NiFi (open source) or StreamSets (free to use, pay for support). Flume is deprecated in Hortonworks now and will be removed in future 3.* releases: deprecations_HDP.
06-22-2017
11:51 AM
It was a while ago. Cloudera works fine with the Account Key (read + write). Cloudera cannot write using a SAS token; reading from a blob with a SAS token works fine. We tested the same situation with HDInsight (Hortonworks): add the storage account with a SAS token and it works (read, write). I could not update/replace the Azure jars because of breaking changes in the API (Cloudera Hadoop 2.6.0 vs Hortonworks Hadoop 2.7.x). To answer your question: I recorded the network stream and found the problem. I replayed the PUT request with a low-level network tool, with the SAS token appended to the URL in the x-ms-copy-source header, and then the request was successful. The problem is the code generating this request: the azure-storage library inside the old jar (still version 0.6.0) that works with Hadoop 2.6.x.
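For illustration, a minimal sketch of that replay using Python's requests library (the account, container, temp blob name and SAS token below are placeholders, not the real values):

import requests

account = "MYSTORAGEACCOUNT"
container = "CONTAINER"
sas_token = "sp=rwdl&sr=c&sv=2015-07-08&se=...&sig=..."        # placeholder SAS token
tmp_blob = "_$azuretmpfolder$/<uuid>test7.txt._COPYING_"       # placeholder temp blob name

copy_source = "https://%s.blob.core.windows.net/%s/%s" % (account, container, tmp_blob)

# The copy only succeeds when the SAS token is also appended to the copy source,
# so the service can authorize the read side of the server-side copy.
resp = requests.put(
    "https://%s.blob.core.windows.net/%s/test7.txt._COPYING_?%s" % (account, container, sas_token),
    headers={
        "x-ms-version": "2013-08-15",
        "x-ms-copy-source": copy_source + "?" + sas_token,
    },
)
print(resp.status_code)  # expect 202 Accepted instead of 404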
03-03-2017
12:59 AM
1 Kudo
We are facing the same problem. The service auto-restarts and creates multiple heap dumps of ~3 GB on /tmp/. Well spotted that the OOM error is occurring in PermGen. The default for the Navigator Metadata Server (Java Configuration Options) is -XX:MaxPermSize=196Mb; we bumped it up a bit to 256 MB and increased the heap as well. Here is a link to the Cloudera recommendation: Cloudera-documenations-MetadataServer-configuration. Formula: heap = (elements + relations) * 200 bytes. You can find these counts in the logs: grep NavServerUtil /pathtologs/cloudera-scm-navigator/*. In our case a 3 GB heap should be more than enough: elements = 1,496,854 and relations = 1,700,992, so (1,496,854 + 1,700,992) * 200 = 639,569,200 bytes, roughly 0.6 GB.
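A quick check of that formula with our counts (a small sketch; the 200-bytes-per-object factor is the one from the Cloudera recommendation above):

# Back-of-the-envelope heap estimate for the Navigator Metadata Server.
elements = 1496854   # count taken from the NavServerUtil log lines
relations = 1700992  # count taken from the NavServerUtil log lines
bytes_needed = (elements + relations) * 200        # ~200 bytes per object
print("%.2f GB" % (bytes_needed / 1024.0 ** 3))    # ~0.60 GB, so a 3 GB heap is ample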
02-15-2017
06:36 AM
This issue is nested in the Azure jars shipped with the distribution. Cloudera is using a very old azure-storage jar, version 0.6.0. The issue is not present in Hortonworks (and Azure HDInsight) because they use an up-to-date version. I tried to replace the jars with updated ones, but they are linked against hdfs-2.7.x while Cloudera uses hdfs-2.6.x, so it did not work in the end. Is an update of the azure-storage jars on the Cloudera roadmap?
01-31-2017
06:18 AM
I had a similar problem. I had enabled agent_tls, but the keystore field was not filled in or the file was in a different location, and the server did not start anymore. I needed to roll back the setting, thanks for your post. I used the mysql command-line tool to connect to the database as root and executed an update:

use scm;
update CONFIGS set VALUE='false' where ATTR='agent_tls';
Query OK, 1 row affected (0.05 sec)

After a restart of cloudera-scm-server, the server was working again and I could enter the UI.
01-31-2017
04:47 AM
When I used the FullyQualifiedDomainName (with a '.' in it), the repo works fine!

parcelRepositories: ["http://localrepo.cdh-cluster.internal/parcels/cdh5/", "http://localrepo.cdh-cluster.internal/parcels/spark2/"]
01-17-2017
01:53 PM
After some time I found a possible cause. The header x-ms-copy-source refers to the original blob to copy. When you suffix that URL with the SAS token, the PUT request works... Going to rest now... and sleep on it...
01-17-2017
10:02 AM
As I mentioned before, the command works with the Account Key but not with the SAS token. I did a tcpdump of both situations; below is the part where the SAS-token variant fails on the PUT that moves the file:

# ===== Account Key ======
PUT /CONTAINER/test6.txt._COPYING_?timeout=90 HTTP/1.1
Accept: application/xml
Accept-Charset: UTF-8
Content-Type:
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: c809fa4d-1f75-4acd-a2d5-9ddbb33d15b6
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/_$azuretmpfolder$/0495c6ef-5529-42cb-ae51-aa479c609493test6.txt._COPYING_
x-ms-date: Tue, 17 Jan 2017 17:42:49 GMT
Authorization: SharedKey MYSTORAGEACCOUNT:/8hrG9WRAjAAAlASkaQPHx3hDZF535lqnsSH18asD5M=
Host: MYSTORAGEACCOUNT.blob.core.windows.net
Connection: keep-alive
Content-Length: 0
HTTP/1.1 202 Accepted
Transfer-Encoding: chunked
Last-Modified: Tue, 17 Jan 2017 17:42:49 GMT
ETag: "0x8D43F00414594DE"
Server: Windows-Azure-Blob/1.0 Microsoft-HTTPAPI/2.0
x-ms-request-id: 4f36d344-0001-0059-68e9-701081000000
x-ms-version: 2013-08-15
x-ms-copy-id: dcdf5297-4570-43ff-920e-3bb1e3f0ce01
x-ms-copy-status: success
Date: Tue, 17 Jan 2017 17:42:48 GMT
# ========= SAS Token ============
PUT /CONTAINER/test7.txt._COPYING_?sp=rwdl&sr=c&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z&timeout=90&sig=YyX%2BL%2FTpXAAAGAi0vqfiipuD9iVM31F0Pjwup7tA%3D HTTP/1.1
Accept: application/xml
Accept-Charset: UTF-8
Content-Type:
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: dc55f745-482f-426e-96f2-c906d90ffb46
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/CONTAINER/_$azuretmpfolder$/47c25af4-b68f-4239-b573-c51796fb2335test7.txt._COPYING_
Host: MYSTORAGEACCOUNT.blob.core.windows.net
Connection: keep-alive
Content-Length: 0
HTTP/1.1 404 The specified resource does not exist.
01-17-2017
08:48 AM
We want to write data to an Azure Blob Storage account. We are using the latest CDH5: 5.9.0-1.cdh5.9.0.p0.23. You have two options:

1. Use the Account Key (root key of the storage account).
2. Use a SAS token: limited amount of time, limited privileges, optional IP range.

You can add a key/value pair to core-site.xml to grant access to an Azure Storage account.

For option 1 use:
key: fs.azure.account.key.ACCOUNTNAME.blob.core.windows.net
val: TheAccountKeyEndingOn==

For option 2 use:
key: fs.azure.sas.CONTAINER.ACCOUNTNAME.blob.core.windows.net
val: GeneratedSasToken
Example: sr=c&sp=rwdl&sig=YyX%2BL/TpX5sdadASD7fiipuD9iVM31F0Pjwup7tA%3D&sv=2015-07-08&se=2017-02-20T11%3A30%3A49Z

To be clear, option 1 works fine; the problem is about option 2! When we create a SAS token with all access rights, we are not able to upload a file.

# works: list
hdfs dfs -ls wasbs://mycontainer@myaccount.blob.core.windows.net/
# works: copy to hdfs
hdfs dfs -cp wasbs://mycontainer@myaccount.blob.core.windows.net/dummyfile.txt /tmp/
# fails: put to BlobStorage
hdfs dfs -put localfile.txt wasbs://mycontainer@myaccount.blob.core.windows.net/
# error: put: com.microsoft.windowsazure.storage.StorageException: The specified resource does not exist
# If you look in Azure Portal inside the BlobStore container there is a folder created:
# _$azuretmpfolder$
# with the file I wanted to copy:
# '13fe7e79-36b4-47b5-85d9-20f1f316e280localfile.txt._COPYING_'

How do we solve this problem? For the time being we configured the Account Key (option 1) to access the StorageAccount as a workaround.

Update: I looked at what happens under the hood with tcpdump; these are the highlights:

# >> = HTTP request
# << = HTTP response
# sometimes I put some headers below the request
hdfs dfs -put test.txt wasb://CONTAINER@MYSTORAGEACCOUNT.blob.core.windows.net/test5
# does the file exist?
>> HEAD /xml/test5.txt?SAS-token
<< HTTP/1.1 404
>> GET /xml?comp=list&sp=rwdl&sr=c&prefix=test5.txt%2&SAS-token
<< HTTP/1.1 200 OK
# does the ._COPYING_ file exist?
>> HEAD /xml/test5.txt._COPYING_?SAStoken
<< HTTP/1.1 404
# send the content.
>> PUT /xml/test5.txt._COPYING_?comp=blocklist&SAStoken
x-ms-client-request-id: 6738c38e-0a2c-4d9c-9c49-4f9657ff8eb0
x-ms-meta-hdi_permission: {"owner":"alexander","group":"supergroup","permissions":"rw-r--r--"}
x-ms-meta-hdi_tmpupload: _%24azuretmpfolder%24%2Ffa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_
>> PUT /xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_?blockid=AAAAALjq9uI%3D&comp=block&SAS-token
( content is sent here, with some XML )
# is the uploaded file there?
>> HEAD /xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_?SAS-token
<< HTTP/1.1 200 OK
# now move the file:
>> PUT /xml/test5.txt._COPYING_?SAS-token
x-ms-version: 2013-08-15
User-Agent: WA-Storage/0.6.0 (JavaJRE 1.7.0_67; Linux 3.10.0-514.2.2.el7.x86_64)
x-ms-client-request-id: 13c057d3-eebe-462a-869c-fb39429665dc
x-ms-copy-source: http://MYSTORAGEACCOUNT.blob.core.windows.net/xml/_$azuretmpfolder$/fa03cbe2-8267-4963-8896-744b9c431525test5.txt._COPYING_
# response with body:
<< HTTP/1.1 404
<?xml version="1.0" encoding="utf-8"?>
<Error><Code>CannotVerifyCopySource</Code>
<Message>The specified resource does not exist. RequestId:da53e5d1-0001-0109-39e3-7049dc000000 Time:2017-01-17T17:01:38.5004017Z</Message></Error>

Our script to generate a SAS token, based on https://github.com/Azure-Samples/hdinsight-dotnet-python-azure-storage-shared-access-signature/blob/master/Python/SASToken.py:

import time
import getpass
from azure.storage import AccessPolicy
from azure.storage.blob import BlockBlobService
from datetime import datetime, timedelta
def main():
    print("Going to generate a Container SAS-token.")
    conf = get_user_input()
    blob_service = get_blob_service(**conf)
    policies = get_policies(blob_service, conf["container_name"])
    if conf.get('policy_name') not in policies.keys():
        add_new_policy(blob_service, policies, **conf)
    generate_sas_token(blob_service, **conf)


def get_user_input():
    return {'account_name': raw_input('StorageAccount Name: '),
            'account_key': getpass.getpass('StorageAccount Key: '),
            'container_name': raw_input('Container: '),
            'permissions': raw_input('Permissions [rwdl] (default "rl"): ') or 'rl',
            'policy_name': raw_input('Policy name (default "readonly"): ') or 'readonly',
            'expiry_days': int(raw_input('Expiry days (default 365): ') or '365'),
            'ip_filter': raw_input('IP filter (default None): ') or None}


def get_blob_service(account_name=None, account_key=None, container_name=None, **unused):
    blob_service = BlockBlobService(account_name=account_name, account_key=account_key)
    if not blob_service.exists(container_name):
        raise IOError(
            "Container '%s' does not exist in StorageAccount '%s'!" % (container_name, account_name))
    else:
        print('can access the container in that storage account.')
    return blob_service


def add_new_policy(blob_service, policies, container_name=None, policy_name=None,
                   expiry_days=None, permissions=None, **unused):
    expiry = datetime.utcnow() + timedelta(days=expiry_days)
    access_policy = AccessPolicy(permission=permissions, expiry=expiry)
    policies[policy_name] = access_policy
    print('adding new policy...')
    # Set the container to the updated list of identifiers (policies)
    blob_service.set_container_acl(container_name, signed_identifiers=policies)
    # Wait 3 seconds for the ACL to propagate
    time.sleep(3)
    print("new policy is added.")


def get_policies(blob_service, container_name):
    print('fetch current policies...')
    identifiers = blob_service.get_container_acl(container_name)
    for k, v in identifiers.items():
        print(" - '%s': permissions: %s, start: %s, expiry: %s" % (k, v.permission, v.start, v.expiry))
    return identifiers


def generate_sas_token(blob_service, container_name=None, policy_name=None, ip_filter=None,
                       account_name=None, permissions=None, expiry_days=None, **unused):
    print("generating new sas token...")
    # Generate a new Shared Access Signature token using the policy (by name)
    sas_token = blob_service.generate_container_shared_access_signature(
        container_name, id=policy_name, ip=ip_filter, protocol='https')
    # Recompute the expiry date for the description only (the policy holds the real expiry)
    expiry = datetime.utcnow() + timedelta(days=expiry_days)
    print('')
    print('Now add/update in ClouderaManager -> HDFS -> config: core-site.xml')
    print('')
    print('=== key ===')
    print('fs.azure.sas.%s.%s.blob.core.windows.net' % (container_name, account_name))
    print('=== value ===')
    print(sas_token)
    print('=== description ===')
    print('Token with permissions: "%s", expires "%s"' % (permissions, expiry.date()))
    print('')
    print('Now, restart HDFS and test with the command:')
    print('hdfs dfs -ls wasbs://%s@%s.blob.core.windows.net/' % (container_name, account_name))


if __name__ == "__main__":
    main()
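To sanity-check a generated token outside Hadoop, a small sketch using the same azure-storage SDK as the script above (the account name, container and token are placeholders):

from azure.storage.blob import BlockBlobService

# Hypothetical quick test: list the container using only the SAS token.
sas_service = BlockBlobService(account_name='ACCOUNTNAME', sas_token='GeneratedSasToken')
for blob in sas_service.list_blobs('CONTAINER'):
    print(blob.name)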
01-03-2017
07:38 AM
Hi Garry, I didn't check R-Studio yet. We run without openjdk-8-headless, only with Oracle Java 8. I'll post an update when we install and use it, somewhere this week.
12-30-2016
07:51 AM
2 Kudos
Hi Garry, I ran into the same problem. It relates to https://community.cloudera.com/t5/Cloudera-Manager-Installation/Problem-with-cloudera-agent/td-p/47698. Solution: remove the other Java versions. In my case I had to remove 'java-1.8.0-openjdk-headless', which was installed by the R package. Hope this helps.
12-14-2016
10:47 AM
I’ll try that out this week. And let you know! Thx for your advice.
12-13-2016
06:16 AM
I have the same problem. Looks like a bug in Cloudera Director in my opinion. Link to the issue I created: https://community.cloudera.com/t5/Cloudera-Director-Cloud-based/ClouderaDirector-2-2-0-failed-with-local-repository/td-p/48460
12-13-2016
02:41 AM
Local repo synced with the latest versions of:
- ClouderaDirector
- ClouderaManager

Also serving parcels:
- CDH
- spark2

Bootstrap config:

cloudera-manager {
    ...
    repository: "http://localrepo/cloudera-manager/"
    repositoryKeyUrl: "http://localrepo/cloudera-manager/RPM-GPG-KEY-cloudera"
}
...
cluster {
    products {
        CDH: 5
    }
    parcelRepositories: ["http://localrepo/parcels/cdh5/", "http://localrepo/parcels/spark2/"]
    ...
}

We start cloudera-director-client bootstrap-remote with this config file. Cloudera Director provisions: ClouderaManager, datanodes and masters are created. But the script fails at around step 870/900. There are no errors in the ClouderaManager logs; the error appears in the ClouderaDirector log, where it takes an element from an empty collection while building some repo list. Bootstrap remote with the config file ends in a failed state: /var/log/cloudera-director-server/application.log

[2016-12-13 10:00:53] INFO [pipeline-thread-31] - c.c.l.pipeline.util.PipelineRunner: >> BootstrapClouderaManagerAgent$HostInstall/4 [DeploymentContext{environment=Environment{name='DataLake-devtst', provider=InstanceProviderConfig{t ...
[2016-12-13 10:00:53] ERROR [pipeline-thread-31] - c.c.l.pipeline.util.PipelineRunner: Attempt to execute job failed
java.util.NoSuchElementException: null
at com.google.common.collect.AbstractIterator.next(AbstractIterator.java:154)
at com.google.common.collect.Iterators.getOnlyElement(Iterators.java:307)
at com.google.common.collect.Iterables.getOnlyElement(Iterables.java:284)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.getRepoUrl(BootstrapClouderaManagerAgent.java:325)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.newApiHostInstallArguments(BootstrapClouderaManagerAgent.java:307)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent.access$200(BootstrapClouderaManagerAgent.java:63)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:162)
at com.cloudera.launchpad.bootstrap.cluster.BootstrapClouderaManagerAgent$HostInstall.run(BootstrapClouderaManagerAgent.java:112)

Is this a bug? Or am I doing something wrong? The local repo looks like this, and works fine for installing ClouderaDirector:

[root@localrepo mirror]# ls -ARls | grep /
./cloudera-director:
./cloudera-director/repodata:
./cloudera-director/RPMS:
./cloudera-director/RPMS/x86_64:
./cloudera-director/RPMS/x86_64/repodata:
./cloudera-manager:
./cloudera-manager/repodata:
./cloudera-manager/RPMS:
./cloudera-manager/RPMS/x86_64:
./cloudera-manager/RPMS/x86_64/repodata:
./parcels:
./parcels/cdh5:
./parcels/spark2:
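As a quick sanity check that the parcel repositories are reachable and actually serve a manifest, a small sketch (Python 2, matching the script earlier in this profile; the hostnames are the same placeholders as in the config above):

import urllib2

# Each parcel repository must serve a manifest.json next to the parcels.
for repo in ["http://localrepo/parcels/cdh5/", "http://localrepo/parcels/spark2/"]:
    try:
        print("%s -> HTTP %d" % (repo, urllib2.urlopen(repo + "manifest.json").getcode()))
    except urllib2.URLError as e:
        print("%s -> failed: %s" % (repo, e))

(The eventual fix, per the accepted answer of 01-31-2017 above, was to use a fully qualified domain name for the repo host.)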
10-25-2016
12:03 AM
I understand that Director can provision a cluster that uses a Postgres DB. Still, my second question stands: should I use one DB instance for ClouderaDirector and reuse it for ClouderaManager, HUE and Oozie? And back to my first question: if you check /etc/cloudera-director-server/application.properties I don't see Postgres mentioned. Can you confirm whether Cloudera Director itself can be hosted on a Postgres DB?

#
# Configurations for database connectivity.
#
# Optional database type (h2 or mysql) (defaults to h2)
# lp.database.type: mysql
10-24-2016
05:01 AM
Cloudera Director requires either a pre-installed DB or the embedded H2. In the config files I can change the db to an external MySQL database.

First question: does Cloudera Director support Postgres, and is there a sample config? The plan is to deploy the db on a separate node in the cluster, close to Cloudera Director and Cloudera Manager.

Second question: should I reuse the external db for ClouderaDirector, ClouderaManager, Hive, HUE and Oozie? Each service would get its own database but use the same MySQL / Postgres instance. That is easier to back up and to make highly available. If only MySQL is supported by Cloudera Director, then the other services have to use that db type as well.
09-07-2016
10:14 AM
1 Kudo
As @Jean-Philippe Player mentions, creating a table directly from a Parquet file is not yet supported by Hive. Source: http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/impala_parquet.html. You are able to do it in Impala:

# Using Impala:
CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
STORED AS PARQUET
LOCATION '/user/etl/destination';
With some Spark/Scala code you can generate the CREATE TABLE statement based on a Parquet file:

spark.read.parquet("/user/etl/destination/datafile1.dat").registerTempTable("mytable")
val df = sqlContext.sql("describe mytable")
// "colname (space) data-type"
val columns = df.map(row => row(0) + " " + row(1)).collect()
// Print the Hive create table statement:
println("CREATE EXTERNAL TABLE mytable")
println(s" (${columns.mkString(", ")})")
println("STORED AS PARQUET ")
println("LOCATION '/user/etl/destination/datafile1.dat';")
08-31-2016
10:04 AM
# on a datanode:
sudo su - hdfs
jcmd $(pgrep -U hdfs -f ".*DataNode") GC.run

I can confirm this works; the alert also disappears (for a while) in Ambari. (95% -> 43% heap use)
08-31-2016
09:44 AM
I'm facing the same issues. I also bumped the datanode heap up from 1 to 2 GB, but I still get Ambari alerts and see no noticeable HDFS effects. Good to know it's a new feature which might need tuning. It could also depend on the JVM version; we are on Oracle Java 1.8. Some garbage-collection options from a running DataNode process (with the -XX: prefix): +UseConcMarkSweepGC ParallelGCThreads=4 NewSize=200M MaxNewSize=200M PermSize=128M
08-19-2016
06:18 AM
1 Kudo
We learned it the hard way: one of the disks containing the file-channel data dir crashed, which resulted in data loss.
- Make sure your storage is redundant!
- Tune your batch sizes.
- Monitor your disks; SMART can tell you that a disk is going to fail.
- Or (when the HDP stack includes Flume 1.6) use a KafkaChannel, not the KafkaSink. The message is only acknowledged to the source once it has been accepted on the Kafka topic.
08-09-2016
12:58 AM
Thanks, although I posted this more than a year ago; I assume it's working properly now. The docs explain the process very clearly: http://www.cloudera.com/documentation/enterprise/5-6-x/topics/cm_mc_hive_udf.html. As noted at the bottom, a restart of HS2 is mandatory.
07-27-2016
11:47 AM
Hi @Junichi Oda, we have the same error in the Ranger log, even when the group names are filled:

ERROR LdapUserGroupBuilder [UnixUserSyncThread] - sink.addOrUpdateUser failed with exception: org/apache/commons/httpclient/URIException, for user: userX, groups: [groupX, groupY]

I have inspected the source code of ranger-0.6, which is part of HDP-2.4.3.0, our current version of the stack. Interestingly, all calls to the remote server inside LdapUserGroupBuilder.addOrUpdateUser(user, groups) are wrapped in a try-catch(Exception e): there is addUser, addUserGroupInfo and delXUserGroupInfo. But we don't see those in the log. The addOrUpdateUser call itself is wrapped with try-catch(Throwable t), so it looks like it's an Error, not an Exception! I found the RANGER-804 ticket referring to missing classes. I copied the jars into '/usr/hdp/current/ranger-usersync/lib' from another folder. The code runs, but I now have a certificate (PKI) error because we use LDAPS; still, this might get you further. Greetings, Alexander
07-26-2016
06:00 PM
Hi @Zaher, depending on your data you should care about the channel you choose. The memory channel is simple and easy, but data is lost when the Flume agent crashes (most likely OutOfMemory) or on power/hardware issues. There are channels with higher durability for your data; the file channel is very durable when the underlying storage is redundant as well. Take a look at the Flume channels and their configuration options. For your OutOfMemory problem you can decrease the transaction and batch capacity and increase the heap in the flume-env config in Ambari, as @Michael Miklavcic suggests.
07-26-2016
05:46 PM
2 Kudos
We manage our Flume agents in Ambari. We have 3 'data-ingress' nodes out of many nodes. These nodes are bundled in a ConfigGroup named 'dataLoaders', which sits at the top in Ambari > Flume > config. The default flume.conf is empty; for the config group 'dataLoaders' we override the default and add 2 agents:

1. Pulling data from a queue and putting it in Kafka + HDFS
2. Receiving JSON and placing it on a Kafka topic

Each host in the config group will run the 2 agents, which can be restarted separately from the Ambari Flume summary page. When you change the config, it is traceable/audited in Ambari, and a restart from Ambari will place the new config file for the Flume agents. The Ambari agent on the Flume host will check whether the process is running and alert you when it's dead. Ambari will also help you when upgrading the stack to the latest version(s).

Notes:
- You cannot put a host in multiple config groups (don't mix responsibilities).
- The configuration is plain text with no validation at all (start and check /var/log/flume/**.log).
- Rolling restart for a config group is not supported (restart the flume agents one by one).
- Ambari 'alive' checks are super simple: a locked-up agent is still running, but not working...
- Ambari Flume data-insight charts are too simple (Grafana is coming, or use JMXExporter -> Prometheus).
06-19-2016
03:04 PM
Thx for the reply. I forgot to mention that the settings you suggest are also present; unfortunately they are not applied after a server reboot, when Ambari starts automatically.
05-31-2016
08:01 AM
1 Kudo
We had a problem on datanodes that had a low open-files limit, which leads to exceptions when heavily using HDFS. After a machine reboot the ambari-agent service starts automatically as a child of the init process. The limits of the init process are 1024 (soft) / 4096 (hard), and any fork of this process copies those limits by default. Because the Ambari agent runs as root, the startup command does not switch to another user and therefore does not pick up the limits configured in /etc/security/limits.conf and /etc/security/limits.d/*.conf.

# Print AmbariAgent, HDFS and YARN limits on a DataNode
grep 'open files' /proc/$(ps aux | grep "[A]mbariAgent" | awk '{print $2}')/limits
grep 'open files' /proc/$(pgrep -U hdfs)/limits
grep 'open files' /proc/$(pgrep -u yarn java)/limits
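The grep commands above read the limits from /proc. The same check from inside a running process (here in Python) is a one-liner with the standard resource module (a small sketch):

import resource

# Show the open-files limits the current process inherited from its parent.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("open files: soft=%d hard=%d" % (soft, hard))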
If you restart HDFS from the AmbariServer UI, the limits are still low. After restarting the AmbariAgent service, the limits of AmbariAgent are increased, and when you restart HDFS after the AmbariAgent the limits are correct. The problem still exists for nodes that are rebooted.

Our solution: change the /etc/init.d/ambari-agent script on all the nodes and add a few lines just before starting the Ambari agent:

case "$1" in
start)
# Start is a 'fork' of the init process and has 1024/4096 limits!
CURRENT_HARD_LIMIT=$(ulimit -Hn)
if [[ $CURRENT_HARD_LIMIT -lt 5000 ]]; then
ulimit -Hn 128000
fi
$command_prefx "/usr/sbin/ambari-agent $@"
;;
stop)
05-24-2016
01:27 PM
1 Kudo
Hi @Jonas Straub, we configured a secure SolrCloud cluster, with success.
There is one MAJOR issue: https://issues.apache.org/jira/browse/RANGER-678. The Ranger plugins (Hive, HDFS, Kafka, HBase, Solr) that generate audit logs are not able to send them to a secured Solr. The bug was reported 06/Oct/15 but has not been addressed yet. How do we get it addressed, so people can start using a secure Solr for audit logging? Greetings, Alexander
05-24-2016
01:00 PM
1 Kudo
Great article. When testing the connection to Solr from Ranger, as @Jonas Straub mentions, /var/log/ranger/admin/xa_portal.log shows the URL: it tries to access ${Solr URL}/admin/collections, so you should enter a URL ending with /solr. Then the log gives an Authentication Required 401. Now that Solr is Kerberos-secured, the request from Ranger to fetch the collections should also use a Kerberos ticket... Did anyone manage to make the lookup from Ranger to Solr (with Kerberos) work?
05-03-2016
06:57 AM
3 Kudos
We have a reliable Flume stream from a JMS source through a FileChannel to an HDFS sink. The FileChannel buffers data before writing to HDFS. One of these blocks (log-<number>) is not valid due to a hard reset of the machine. When Flume starts it tries to work through the data that is still in the FileChannel and not yet delivered to the sink, and the logs say the file is corrupt. The content of the data file is binary, but you can see the headers, key (uuid) and values (xml) in plain text. The bottom of the file looks incomplete, with no closing tags. A unix move command fails with an I/O error; after copying the file, the new copy is a little smaller in bytes. Flume also writes a log-<number>.meta next to it. I'm not sure how to bring the two files back in sync so Flume can process them. I want to restore most of the data of this ~1 GB data file. What is the best approach?