Member since: 09-01-2016
Posts: 44
Kudos Received: 3
Solutions: 0
04-20-2022
03:04 AM
Hello all, please reply to this ASAP. I am trying to install the VM on my PC, but the screen is stuck at the same point: "Extracting and loading the Hortonworks Sandbox". I have assigned 8 GB of RAM, and my laptop has 8 GB in total with a 7th-gen i5.
07-28-2018
06:28 AM
Refer to this thread: https://community.hortonworks.com/questions/1786/how-to-clean-up-purge-kafka-queues.html
09-19-2017
06:13 PM
@Gobi Subramani It sounds like what you are trying to do is auditing. If that is the case, create an audit table with columns such as source table name, target table name, records loaded, counts, and sums. Get the values from a SELECT statement and use them in an INSERT into the audit table; it should work fine, as sketched below. If it is not auditing, then you either have to append the data to a file, as I mentioned in the previous comment, or create a table and insert the data into the corresponding tables. Happy Hadooping!
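A minimal Hive sketch of that idea; audit_log, the staging/warehouse table names, and the amount column are hypothetical, so adjust them to your own schema:
-- Hypothetical audit table; add whatever metrics you need to track
CREATE TABLE IF NOT EXISTS audit_log (
  source_table   STRING,
  target_table   STRING,
  records_loaded BIGINT,
  total_amount   DOUBLE,
  load_time      TIMESTAMP
);
-- After each load, capture the counts and sums with a SELECT and insert them
INSERT INTO TABLE audit_log
SELECT 'staging.orders', 'warehouse.orders', COUNT(*), SUM(amount), current_timestamp()
FROM warehouse.orders;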
04-05-2017
01:14 PM
@Namit Maheshwari Yes, there is a pattern for creating partitions (yyyy-mm-dd). OK, so your idea is to run the command, store the result, and check for the existence of the partition? Is there any other simple way to check?
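For context, the check being discussed would look something like this (a minimal sketch only, assuming a hypothetical table named events partitioned by dt):
SHOW PARTITIONS events PARTITION (dt='2017-04-05');
This returns the partition if it exists and nothing otherwise.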
11-22-2016
03:15 PM
1 Kudo
@Gobi Subramani You are looking at it wrong. The Spark context is the main entry point into Spark: it is the connection to a Spark cluster and can be used to create RDDs, accumulators, etc. on that cluster. You can run in both cluster and local mode, and you define which one in the Spark context. The workers don't get the Spark context per se, but if you were to package your program into a jar, the cluster manager would be responsible for copying the jar file to the workers before it allocates tasks.
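A minimal Java sketch of that setup (the app name and master URL below are placeholders, not anything from your job):
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class ContextExample {
    public static void main(String[] args) {
        // "local[*]" runs everything inside this JVM; on a cluster you would
        // pass e.g. "yarn" or a spark://host:7077 URL as the master instead.
        SparkConf conf = new SparkConf()
                .setAppName("context-example")
                .setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // The driver holds the context; the tasks built from this RDD run on the workers.
        long count = sc.parallelize(Arrays.asList(1, 2, 3, 4)).count();
        System.out.println("count = " + count);

        sc.stop();
    }
}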
11-16-2016
07:23 AM
1 Kudo
This should be easy enough for you to test:
1. Insert values 1 to 40 for column user_id into table user_info_bucketed.
2. Insert around 400 more rows, with user_id values from 41 to 440.
3. Ideally, each bucket should then have about 19 rows, or around that.
4. You can then check which bucket file each row landed in with something like:
SELECT user_id, INPUT__FILE__NAME FROM user_info_bucketed WHERE user_id = 5;
SELECT user_id, INPUT__FILE__NAME FROM user_info_bucketed WHERE user_id = 50;
SELECT user_id, INPUT__FILE__NAME FROM user_info_bucketed WHERE user_id = 101;
SELECT user_id, INPUT__FILE__NAME FROM user_info_bucketed WHERE user_id = 160;
Or you can check the physical location of the files on HDFS to determine the line counts. A minimal setup for this kind of test is sketched below.
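For reference, a sketch of such a test, assuming a hypothetical definition of user_info_bucketed with, say, 24 buckets (your real table definition may differ):
-- Hypothetical bucketed table
CREATE TABLE user_info_bucketed (user_id INT, name STRING)
CLUSTERED BY (user_id) INTO 24 BUCKETS
STORED AS ORC;
-- Older Hive versions need bucketing enforced explicitly on insert
SET hive.enforce.bucketing = true;
INSERT INTO TABLE user_info_bucketed VALUES (1, 'user_1'), (2, 'user_2'), (3, 'user_3');
-- Each user_id is hashed into one of the 24 bucket files, which is exactly
-- what INPUT__FILE__NAME exposes in the SELECT statements above.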
11-14-2016
10:59 AM
@Gobi Subramani In case of memory pressure, Spark will automatically evict RDD partitions from the workers in an LRU manner, unless you have explicitly persisted them with a disk-backed storage level. Depending on the memory available on each worker, LRU eviction happens independently on each worker node.
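A small Java sketch of how to influence that behaviour, continuing from a JavaSparkContext named sc as in the earlier sketch (the data is a placeholder; the classes used are org.apache.spark.api.java.JavaRDD and org.apache.spark.storage.StorageLevel):
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4));
// With the default MEMORY_ONLY level, cached partitions evicted under memory
// pressure are recomputed from lineage when needed again; MEMORY_AND_DISK
// spills them to local disk on the worker instead of dropping them outright.
rdd.persist(StorageLevel.MEMORY_AND_DISK());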
11-08-2016
01:23 PM
1 Kudo
HDP 2.3+ packages Sqoop 1.4.6, which allows direct import to HDFS as Parquet files by using --as-parquetfile. See: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html. If you import directly into a Hive table (rather than HDFS), you may need to do this as a two-step process (https://community.hortonworks.com/questions/56847/parquet-files-sqoop-import.html).
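A minimal sketch of such an import (the JDBC URL, credentials, table name, and target directory below are placeholders, not values from the original question):
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username sqoop_user -P \
  --table orders \
  --target-dir /data/orders_parquet \
  --as-parquetfile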
11-07-2016
02:44 PM
Hi Gobi, In your KafkaProducer constructor, you instantiate the class with a set of Properties, which should include a list of brokers. This gives the Producer knowledge of more than one server. If you only have one server listed then, yes, if that server goes down, your Producer will be unable to send any more messages. However, this scenario is unlikely if you follow the best practice of running more than one broker in your cluster. One benefit of configuring your Producer with a list of servers is that you can send messages without having to worry about the IP address of the particular server that will receive them. The topic to which you send your messages is defined in the ProducerRecord, which can be achieved with something like this:
Properties props = new Properties();
props.put("bootstrap.servers", "192.168.86.10:9092,host2:port,host3:port");
// The producer also needs serializers for the record key and value
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
Producer<String, String> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<String, String>("test-topic", "hello distributed commit log"));
Have a great day, Brian
10-05-2016
07:38 PM
@Gobi Subramani I would suggest that you download and install HDF (NiFi). It can handle creating the data flow for you. Here's an example of it collecting logs. Instead of writing to an event bus, you could use the PutHDFS processor and it would write the data to HDFS for you. There isn't a lot of trickery to get the date folders to work: you just use ${now()} in place of the folder name to get the directory scheme you are looking for. If you look around, there are lots of walkthroughs and templates. I have included a pic of a simple flow that would likely solve your issue.
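As a rough example of that, the Directory property of PutHDFS could be set to an expression like the one below (the /data/logs base path is just a placeholder), so each flow file lands in a per-date folder without any extra processors:
/data/logs/${now():format('yyyy-MM-dd')}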