Member since: 06-07-2016
Posts: 923
Kudos Received: 319
Solutions: 115
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1199 | 10-18-2017 10:19 PM
 | 1175 | 10-18-2017 09:51 PM
 | 4886 | 09-21-2017 01:35 PM
 | 304 | 08-04-2017 02:00 PM
 | 355 | 07-31-2017 03:02 PM
11-11-2020
01:20 AM
You can try this: ${message:unescapeXml()}. This function unescapes a string containing XML entity escapes to a string containing the actual Unicode characters corresponding to the escapes.
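For instance, assuming a flowfile attribute named message contains the escaped text (a made-up sample value):
&lt;status&gt;OK &amp; ready&lt;/status&gt;
then ${message:unescapeXml()} evaluates to:
<status>OK & ready</status>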
09-16-2020
03:27 PM
I believe this will fail if you stop your job today and run it tomorrow: 'now' will then resolve to a different day and you will miss the data in between.
08-11-2020
03:11 AM
Hi, you can also use https://onlinejsontools.org/ as a JSON validator and for beautify/minify operations, as well as conversions involving XML, YAML, CSV, BSON, plain text, Base64 and TSV. Do check out this site!
07-28-2020
11:31 PM
"I highly recommend skimming quickly over following slides, specially starting from slide 7. http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey" This slide is not there at the path
07-19-2020
07:37 AM
Here we have listed a few ETL tools, both traditional and open source. Have a look at them and see for yourself which one suits your use case.
1. Panoply: Panoply is a cloud ETL provider combined with a data warehouse. With 100+ data connectors, ETL and data ingestion are quick and simple, with only a couple of clicks and a login between you and your newly integrated data. Under the hood, Panoply actually uses an ELT approach (instead of traditional ETL), which makes data ingestion much faster and more robust, since you don't have to wait for transformations to finish before loading your data. And since Panoply builds a managed cloud data warehouse for each customer, you won't need to set up a separate destination to store all the data you pull in using Panoply's ELT process. If you'd rather use Panoply's rich set of data collectors to set up ETL pipelines into an existing data warehouse, Panoply can also manage ETL processes for your Azure SQL Data Warehouse.
2. Stitch: Stitch is a self-service ETL data pipeline. The Stitch API can replicate data from any source and handle bulk and incremental data refreshes. Stitch also provides a replication engine that relies on multiple strategies to deliver data to customers. Its REST API supports JSON or Transit, which enables automatic detection and normalization of nested document structures into relational schemas. Stitch can connect to Amazon Redshift, Google BigQuery and Postgres, and integrates with BI tools. Stitch is typically used to collect, transform and load Google Analytics data into its own system, to automatically provide business insights on raw data.
3. Sprinkle: Sprinkle is a SaaS platform providing an ETL tool for organisations. Its easy-to-use UX and code-free mode of operation make it easy for technical and non-technical users to ingest data from multiple data sources and drive real-time insights on the data. A free trial lets users try the platform first and pay only if it fulfils their requirements.
Some of the open source tools include:
1. Heka: Heka is an open source software framework for high-performance data gathering, analysis, monitoring and reporting. Its main component is a daemon program known as 'hekad' that provides the functionality of gathering, converting, evaluating, processing and delivering data. Heka is written in the Go programming language and has built-in plugins for inputting, decoding, filtering, encoding and outputting data. These plugins have different functionalities and can be used together to build a complete pipeline. Heka uses the Advanced Message Queuing Protocol (AMQP) or TCP to transport data from one location to another. It can be used to load and parse log files from a file system, or to perform real-time analysis, graphing and anomaly detection on a data stream.
2. Logstash: Logstash is an open source data processing pipeline that ingests data from numerous sources simultaneously, transforming the source data and storing events into Elasticsearch by default. Logstash is part of the ELK stack. The E stands for Elasticsearch, a JSON-based search and analytics engine, and the K stands for Kibana, which enables data visualization. Logstash is written in Ruby and provides a JSON-like structure with a clear separation between internal objects. It has a pluggable framework featuring more than 200 plugins, enabling you to mix, match and orchestrate different inputs, filters and outputs. The tool can be used for BI, or in data warehouses, with fetch, transform and store capabilities.
3. Singer: Singer's open source, command-line ETL tool allows users to build modular ETL pipelines using its "tap" and "target" modules. Rather than building a single, static ETL pipeline, Singer provides a backbone that allows users to connect data sources to storage destinations. With a large variety of pre-built taps, the scripts that collect datapoints from their original sources, and an extensive selection of pre-built targets, the scripts that transform and load data into pre-specified destinations, Singer allows users to write concise, single-line ETL processes that can be adapted on the fly by swapping taps and targets in and out.
07-07-2020
04:34 AM
Solr includes the terms of the specified files in an index. Indexing in Solr is similar to creating an index at the end of a book that lists the words appearing in that book and their locations: essentially we take an inventory of the words that appear in the book and of the pages where those words appear. That is, by including content in the index, we make that content available for search by Solr. This type of index, called an inverted index, is a way of structuring the information that will be retrieved by a search engine. You can find a longer explanation of the way information is stored and retrieved by Solr at https://www.solr-tutorial.com/indexing-with-solr.html
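As a toy illustration (the two documents below are made up, not taken from Solr), indexing
doc1: "solr indexes documents"
doc2: "solr searches documents"
produces an inverted index mapping each term to the documents that contain it:
solr -> doc1, doc2
indexes -> doc1
searches -> doc2
documents -> doc1, doc2
A query for "searches" then only needs to look up that single term to find doc2.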
07-05-2020
09:47 PM
Did anyone find a solution to this? I am not able to locate this jar with HDP 3.1.1.3.1.2.1-1.
06-12-2020
08:41 AM
Hello, does COBRIX support Python? I see only Scala APIs at https://github.com/AbsaOSS/cobrix. Please advise. Thanks, Sreedhar Y
03-31-2020
08:56 AM
For example, I want to transform the Zabbix payload from v4.0 to v4.4.
Zabbix JSON payload v4.0 (input):
{
"hosts": [
"Host B",
"Zabbix Server"
],
"groups": [
"Group X",
"Group Y",
"Group Z",
"Zabbix servers"
],
"tags": [
{
"tag": "availability",
"value": ""
},
{
"tag": "data center",
"value": "Riga"
}
],
"name": "Either Zabbix agent is unreachable",
"clock": 1519304285,
"ns": 123456789,
"eventid": 42,
"value": 1
}
The JOLT transform:
[
{
"operation": "shift",
"spec": {
"hosts": {
"*": [
"hosts.[&].host",
"hosts.[&].name"
]
},
"*": "&"
}
}
]
The result (Zabbix v4.4):
{
"hosts" : [ {
"host" : "Host B",
"name" : "Host B"
}, {
"host" : "Zabbix Server",
"name" : "Zabbix Server"
} ],
"groups" : [ "Group X", "Group Y", "Group Z", "Zabbix servers" ],
"tags" : [ {
"tag" : "availability",
"value" : ""
}, {
"tag" : "data center",
"value" : "Riga"
} ],
"name" : "Either Zabbix agent is unreachable",
"clock" : 1519304285,
"ns" : 123456789,
"eventid" : 42,
"value" : 1
}
03-03-2020
12:59 AM
Is there any news about this issue? I cannot change the values to lower case, so I cannot use skewed tables. Regards
03-02-2020
08:33 PM
@midhunxavier I have used the above code for my requirement, but I am having the issue below.
Output data format: ["TER0626974_achieved","TER0630327_achieved","TER0630520_achieved","TER0537124_achieved","TER0404705_achieved"]
Issue: the problem is in writing this data to Hive and reading it back. We are able to insert this result into Hive, but when we try to read it we get the error below.
> archive_data <- dbGetQuery(hivecon, "SELECT * from Table")
Error in .jcall(rp, "I", "fetch", stride, block) : org.apache.hive.service.cli.HiveSQLException: java.io.IOException: org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Start token not found where expected
I guess this is because the JSON should start with { and not with an array ( [ )? But I am not sure how to change the square brackets to {. I appreciate your support in resolving this issue. Thanks in advance,
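If that guess is right, then a worked example of the shape many Hive JSON SerDes expect, one JSON object per line, would be something like this (the wrapping key achievements is purely illustrative):
{"achievements":["TER0626974_achieved","TER0630327_achieved","TER0630520_achieved","TER0537124_achieved","TER0404705_achieved"]}
whereas a bare top-level array would be consistent with the "Start token not found" message.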
02-25-2020
01:41 PM
Hi @DLEdwards,
As this thread is older and was marked 'Solved' in 2016, you would have a better chance of receiving a resolution by starting a new thread. This will also give you the opportunity to provide details specific to your versions of HDP, Storm and Metron, which could help others give a more accurate answer to your question.
01-16-2020
04:52 AM
Your best resource would be to contact sales for the most up to date information.
11-22-2019
10:56 PM
Hi @mqureshi, you have explained this beautifully. But how will the replication of blocks impact this calculation? Please explain. Regards.
Tags: namenode heap
11-20-2019
08:48 AM
To clarify for others, I believe the reason the following doesn't work is that it was executed in a bash shell, i.e. using a typical terminal.
hive --hivevar rdate=112211 -e "select 9${hivevar:rdate}9"
In double-quoted strings, the '$' tells bash to treat "{hivevar:rdate}" as a variable, but it isn't defined, so it expands to an empty string before being passed to hive via the -e flag. That is, "9${hivevar:rdate}9" is evaluated to "99" even before it reaches hive. In contrast, within single quotes there is no substitution in bash, so '9${hivevar:rdate}9' is passed to hive as-is, and the following executes as the poster expected.
hive --hivevar rdate=112211 -e 'select 9${hivevar:rdate}9'
11-06-2019
10:09 AM
In order to run the balancer on only one DataNode:
hdfs balancer -include -f <hosts file with the DataNode name>
This would balance the data load on that particular DN.
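For example, assuming a hosts file at /tmp/balancer-hosts.txt (a hypothetical path) that contains the single DataNode hostname, the invocation would be:
hdfs balancer -include -f /tmp/balancer-hosts.txt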
10-12-2019
10:18 PM
I explained how to manually set a parameter at runtime in beeline. Before doing this you have to set something in the Ambari Hive settings. Please refer to: https://community.cloudera.com/t5/Support-Questions/params-that-are-allowed-to-be-modified-at-runtime-beeline/m-p/280063/highlight/true#M208647 Just use hive.security.authorization.sqlstd.confwhitelist.append=mapreduce.job.reduces
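For example, after appending that property to the whitelist in Ambari (a HiveServer2 restart is typically required for the change to take effect), a session-level override in beeline would look like this, where the value 4 is purely illustrative:
set mapreduce.job.reduces=4;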
08-21-2019
10:39 AM
@ankurkapoor_wor Hi, even I am facing the same issue as @mqureshi. I am trying to fetch data from SQL Server in Avro format through NiFi and load it into Redshift through a COPY command. But the generated Avro file is converting the date and timestamp datatypes to string, because of which the COPY command is loading all NULL values into the target table. So I tried to follow your approach: in my case I'm using the ExecuteSQLRecord processor to fetch the data from SQL Server and write it in JSON format, then trying to convert it to Avro format using the ConvertJsonToAvro processor, but I am unable to parse the record schema. Could you please help me resolve this issue as well? Thanks in advance! Anusha
10-31-2017
11:34 PM
@mquershi please note that this API is asynchronous. Here's the method doc:
public java.util.concurrent.Future<RecordMetadata> send(ProducerRecord<K,V> record)
Asynchronously send a record to a topic. Equivalent to send(record, null). See send(ProducerRecord, Callback) for details.
You should call .get() on the returned Future after send() to make sure the event is actually sent out.
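As a minimal sketch (the broker address, topic name and String serializers below are assumptions for illustration, not taken from the original thread), blocking on the returned Future makes the send effectively synchronous:
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class SyncSendExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // try-with-resources closes the producer and flushes any buffered records
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("test-topic", "key", "value"); // assumed topic
            RecordMetadata md = producer.send(record).get(); // .get() blocks until the broker acknowledges the write
            System.out.println("Sent to partition " + md.partition() + " at offset " + md.offset());
        }
    }
}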
10-17-2017
06:27 PM
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.imad.streamlio</groupId>
<artifactId>kafkastorm</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<name>kafkastorm</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-core</artifactId>
<version>1.1.0</version>
</dependency>
<dependency>
<groupId>org.apache.storm</groupId>
<artifactId>storm-kafka</artifactId>
<version>1.1.0</version>
</dependency>
</dependencies>
</project>
But when I try to import any packages, it sometimes doesn't show the packages, or sometimes I am able to import packages but when I create an object I am unable to see the methods that the object can call. For example, I see absolutely nothing for a TridentTopology object. <Ctrl>+<Space> doesn't do anything, not even for imports. I can also see the jars under "Maven Dependencies" in Eclipse. I have even installed a new version of Eclipse and have run "maven clean install". It seems like I am missing something trivial. Any ideas here?
10-11-2017
12:55 AM
@Abhijeet Rajput The error you are getting is related to not setting the tez.tez-ui.history-url.base value. In my case it is set to http://<hostname>:8080/#/main/view/TEZ/tez_cluster_instance, where hostname is the same as the Ambari hostname. I am guessing for you it will be localhost, but I could be wrong. Once you set this value under the Tez config, your error should go away.
10-11-2017
01:19 AM
@David Sheard Are you talking about the green button on the bottom right of the screen in your app, shown in my screenshot here? You just need to click it to run your app. Storm status is not shown in the Streaming Analytics app.
09-26-2017
06:58 PM
Hi @sally sally, if you are extracting only one value into an attribute then it's easy to use the ExtractText processor: add a new property to it with a regex like the one below.
<count>(.*)<\/count>
ExtractText processor configs:
This regex only captures the value inside the <count></count> element of the message and adds a count attribute to the flowfile.
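For example, assuming the incoming flowfile content is <count>42</count> (a made-up payload), an ExtractText property named count with the value <count>(.*)<\/count> would add a count attribute with the value 42 to the flowfile.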
09-19-2017
02:09 PM
@sally sally By setting your minimums (Min Num Entries and Min Group Size) to some large value, FlowFiles that are added to a bin will not qualify for merging right away. You should then set "Max Bin Age" to the amount of time you are willing to allow a bin to hang around before it is merged, regardless of the number of entries in that bin or that bin's size. As far as the number of bins goes, a new bin will be created for each unique filename found in the incoming queue. Should the MergeContent processor encounter more unique filenames than there are bins, it will force merging of the oldest bin to free a bin for the new filename. So it is important to have enough bins to accommodate the number of unique filenames you expect to pass through this processor during the configured "Max Bin Age" duration; otherwise, you could still end up with 1 FlowFile per merge. Thanks, Matt
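As a rough example of that setup (the property names are as they appear in recent NiFi versions; the values are purely illustrative, not recommendations):
Merge Strategy = Bin-Packing Algorithm
Correlation Attribute Name = filename
Minimum Number of Entries = 10000
Minimum Group Size = 100 MB
Maximum number of Bins = 50
Max Bin Age = 5 min
With these values, a bin for a given filename is held until it satisfies the minimums or until the 5 minute bin age expires, whichever comes first.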
09-25-2017
08:16 AM
@mqureshi This environment is our production environment and currently I cannot update it. But I'll do some tests in a sandbox or some other test environment and let you know the result. Thanks a lot.
04-03-2018
04:32 PM
@jiji big data Update Cassandra to listen for connections on the local machine’s IP address instead of on “localhost”. Something like this:
listen_address: 192.168.3.133
rpc_address: 192.168.3.133
where 192.168.3.133 is the address of the machine running Cassandra in my case. Use your machine’s IP address in your setup. Please revert.
09-19-2017
01:57 AM
Hey, even I am facing the same issue. I get a network error when I try to connect using PuTTY.
12-27-2017
02:48 PM
I deleted all the snapshots and data after getting a go-ahead from the developers...
07-25-2017
10:12 PM
@PJ These directories exist on the journal nodes, if that's what you are using, or on whatever disk you specify in Ambari for the NameNode when you do your install. I think you will find the following link helpful: https://hortonworks.com/blog/hdfs-metadata-directories-explained/