Member since: 11-19-2015
Posts: 158
Kudos Received: 25
Solutions: 21
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 11923 | 09-01-2018 01:27 AM |
| | 1145 | 09-01-2018 01:18 AM |
| | 3810 | 08-20-2018 09:39 PM |
| | 516 | 07-20-2018 04:51 PM |
| | 1527 | 07-16-2018 09:41 PM |
09-17-2018
07:45 PM
These are spam accounts, by the way. Look at all the "answers" from the other users for every question, and they all link back to dataflair's website.
09-10-2018
06:46 PM
I think you are asking about adding directories to DataNodes.
dfs.datanode.data.dir in the hdfs-site.xml file is a comma-delimited list of directories where the DataNode will store blocks for HDFS. See also https://community.hortonworks.com/questions/89786/file-uri-required-for-dfsdatanodedatadir.html
| Property | Default | Description |
|---|---|---|
| dfs.datanode.data.dir | file://${hadoop.tmp.dir}/dfs/data | Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. Directories that do not exist are ignored. |
Otherwise, I'm afraid your question doesn't make sense, other than running the hdfs dfs -mkdir command to "add a new directory in HDFS".
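If the goal really is to add a new disk for the DataNode to use, a minimal hdfs-site.xml sketch would look like the following (the mount points are hypothetical, and the affected DataNodes need a restart afterwards):
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- hypothetical mount points; list every directory the DataNode should store blocks in -->
  <value>file:///data01/hadoop/hdfs/data,file:///data02/hadoop/hdfs/data</value>
</property>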
09-04-2018
06:58 PM
@Manish Tiwari, perhaps you can look at https://docs.hortonworks.com/HDPDocuments/Cloudbreak/Cloudbreak-2.7.1/content/data-lake/index.html
Otherwise, you can search https://docs.hortonworks.com/ for the keywords you are looking for.
09-01-2018
01:27 AM
- Nagios / OpsView / Sensu are popular options I've seen.
- StatsD / CollectD / MetricBeat are daemon metric collectors that run on each server (MetricBeat is somewhat tied to an Elasticsearch cluster, though).
- Prometheus is a popular option nowadays that would scrape metrics exposed by the local service.
- I have played around a bit with netdata, though I'm not sure if it can be applied to Hadoop monitoring use cases.
- DataDog is a vendor that offers lots of integrations such as Hadoop, YARN, Kafka, ZooKeeper, etc.

Realistically, you need some JMX + system monitoring tool, and a bunch exist; a minimal JMX-scraping sketch follows below.
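As a concrete illustration of the JMX-scraping option, here is a minimal sketch of attaching the Prometheus jmx_exporter Java agent to a Hadoop daemon; the jar path, port, and rules file are assumptions, not something HDP ships:
# hadoop-env.sh -- expose NameNode JMX metrics over HTTP for Prometheus to scrape
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -javaagent:/opt/jmx_exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx_exporter/namenode-rules.yaml"
Prometheus would then scrape namenode-host:7071, and the same pattern applies to DataNodes, YARN daemons, and Kafka brokers.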
09-01-2018
01:18 AM
1 Kudo
A Data Lake is not tied to a platform or technology, and Hadoop is not a requirement for a data lake either. IMO, a "data lake project" should not be a project description or the end goal; you can say you got your data from "source X", using "code Y", transformed and analyzed using "framework Z", but the combinations of tools on the market that support such statements are so broad that it really depends on what business use cases you are trying to solve.

For example, S3 is replaceable with HDFS or GCS or Azure Storage. Redshift is replaceable with Postgres (and you really should use Athena anyway if the data you want to query is in S3, where Athena itself is replaceable by PrestoDB), and those can all be compared to Google BigQuery.

My suggestion would be not to tie yourself to a certain toolset, but if you are on AWS, their own documentation pages are very extensive. Since you are not asking a Hortonworks-specific question, I'm not sure what information you are looking for from this site.
08-24-2018
06:30 PM
You can enable JMX for metrics + Grafana for visualization, then Ambari Infra for log collection. However, you will not have visibility into consumer lag the way Confluent Control Center offers it, and you will need to find an external tool to do that for you, such as LinkedIn Burrow.

If you are not satisfied with that, Confluent Control Center can be added to an HDP cluster with manual setup: https://docs.confluent.io/current/control-center/docs/installation/install-apache-kafka.html You will need to copy the Confluent Metrics Reporter JARs from the Confluent Enterprise download over onto your HDP Kafka nodes under /usr/hdp/current/kafka; a hedged sketch of that broker-side change follows below.
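For the Control Center route, the broker-side change is roughly the following; treat it as a hedged sketch, since the exact path inside the Confluent Enterprise download and your HDP layout may differ:
# copy the Metrics Reporter JARs onto each HDP Kafka broker (the source path is an assumption)
cp confluent-*/share/java/confluent-metrics/*.jar /usr/hdp/current/kafka/libs/
# then add to each broker's server.properties and restart the broker
metric.reporters=io.confluent.metrics.reporter.ConfluentMetricsReporter
confluent.metrics.reporter.bootstrap.servers=broker1:9092
confluent.metrics.reporter.topic.replicas=1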
08-21-2018
07:53 PM
@Shobhna Dhami If it is not listed after "available connectors", then you have not set up the classpath correctly, as I linked to. In Kafka 0.10, you need to run
$ export CLASSPATH=/path/to/extracted-debezium-folder/*  # Replace with the real path; a bare * lets the JVM pick up every JAR in that directory
$ connect-distributed ...  # Start the Connect server
You can also perform a request to the /connector-plugins URL before sending any configuration, to verify the Debezium connector was correctly installed (a sample request is below).
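For example, against the Connect REST API (the host name is a placeholder; 8083 is the default REST port):
# list the connector plugins the worker has loaded; the Debezium connector class should appear here
curl http://connect-server:8083/connector-plugins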
08-21-2018
07:46 PM
@Vamshi Reddy Yes, "Confluent" is not some custom version of Apache Kafka. In fact, this process is very repeatable for all other Kafka Connect plugins:
1. Download the code
2. Build it against the Kafka version you run
3. Move the package to the Connect server
4. Extract the JAR files onto the Connect server CLASSPATH
5. Run/Restart Connect
08-20-2018
11:26 PM
From a non-Hadoop machine, install Java+Maven+Git
git clone https://github.com/confluentinc/kafka-connect-hdfs
cd kafka-connect-hdfs
git fetch --all --tags --prune
git checkout tags/v4.1.2 # This is a Confluent Release number, which corresponds to a Kafka release number
mvn clean install -DskipTests
This should generate some files under the target folder in that directory.
So, using the 4.1.2 example, I would
ZIP up the "target/kafka-connect-hdfs-4.1.2-package/share/java/" folder that was built, copy that archive to every HDP server that I want to run Kafka Connect on, and extract it there, for example under /opt/kafka-connect-hdfs/share/java
From there, you would find your "connect-distributed.properties" file and add a line for
plugin.path=/opt/kafka-connect-hdfs/share/java
Now, run something like this (I don't know the full location of the property files)
connect-distributed /usr/hdp/current/kafka/.../connect-distributed.properties
Once that starts, you can attempt to hit http://connect-server:8083/connector-plugins, and you should see an item for "io.confluent.connect.hdfs.HdfsSinkConnector"
If successful, continue to read the HDFS Connector documentation, then POST the JSON configuration body to the Connect Server endpoint (or use Landoop's Connect UI tool); a hedged example of that POST follows below.
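As a rough illustration of that POST (the topic name, HDFS URL, and flush size are placeholders, not values from your cluster):
curl -X POST -H "Content-Type: application/json" http://connect-server:8083/connectors -d '{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "test_topic",
    "hdfs.url": "hdfs://namenode:8020",
    "flush.size": "3"
  }
}'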
08-20-2018
09:39 PM
@Shobhna Dhami Somewhere under /usr/hdp/current/kafka there is a connect-distributed script. You run this and provide a connect-distributed.properties file. Assuming you are running a recent Kafka version (0.11.0 or above), you would add a "plugin.path" line to that properties file pointing at a directory that contains the extracted package of the Debezium connector (a minimal sketch follows below). As mentioned in the Debezium documentation: "Simply download the connector's plugin archive, extract the JARs into your Kafka Connect environment, and add the directory with the JARs to Kafka Connect's classpath. Restart your Kafka Connect process to pick up the new JARs."
Kafka Documentation - http://kafka.apache.org/documentation/#connect
Confluent Documentation - https://docs.confluent.io/current/connect/index.html (note: Confluent is not a "custom version" of Kafka, they just provide a stronger ecosystem around it)
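A minimal sketch of that properties change, assuming the Debezium archive was extracted under /opt/kafka-connect-plugins/debezium-connector-mysql (a hypothetical path):
# connect-distributed.properties -- point the worker at the directory that contains the extracted plugin folder
plugin.path=/opt/kafka-connect-plugins
After adding the line, restart the connect-distributed process so it picks up the plugin.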