Member since: 01-20-2017
Posts: 6
Kudos Received: 0
Solutions: 0
07-23-2019
09:09 AM
I want to import data from Microsoft SQL Server into BigQuery using NiFi. I used the ExecuteSQL -> PutBigQueryBatch processors and the data was successfully loaded into BigQuery in Avro (the default) format. However, when I check the Avro schema, the timestamp column appears as {"name":"LastModifiedDate","type":["null",{"type":"long","logicalType":"timestamp-millis"}]}, which creates the column as an integer in the BigQuery table.

I want to know whether we can map the Avro column datatype to a user-defined datatype so that it can be used while creating the BigQuery table (say, LastModifiedDate as TIMESTAMP). I have seen a few suggestions to define a schema registry, but that would be specific to a single table. I am looking for a generic solution where we can define a centralized repository that can be used for all tables for such columns.

The ExecuteSQL processor configuration is as follows:
The PutBigQueryBatch processor configuration is as follows:

My question is quite similar to https://community.hortonworks.com/questions/202979/avro-schema-change-using-convertrecord-processor-i.html but it does not seem to answer what I am looking for.
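To make the idea concrete, here is a rough sketch (outside NiFi, plain Python, purely illustrative) of the kind of generic mapping I have in mind: take a field from the Avro schema and resolve its logical type to a BigQuery column type. The type map below is only my own assumption for illustration, not an existing NiFi or BigQuery feature.

import json

# assumed mapping from Avro (logical) types to BigQuery column types
AVRO_TO_BQ = {
    "timestamp-millis": "TIMESTAMP",
    "timestamp-micros": "TIMESTAMP",
    "date": "DATE",
    "decimal": "NUMERIC",
    "long": "INT64",
    "int": "INT64",
    "string": "STRING",
    "boolean": "BOOL",
    "double": "FLOAT64",
}

def bq_type(avro_type):
    # Resolve an Avro field type (possibly a union with "null") to a BigQuery type.
    if isinstance(avro_type, list):
        avro_type = next(t for t in avro_type if t != "null")
    if isinstance(avro_type, dict):
        key = avro_type.get("logicalType") or avro_type.get("type")
    else:
        key = avro_type
    return AVRO_TO_BQ.get(key, "STRING")

field = json.loads('{"name":"LastModifiedDate","type":["null",{"type":"long","logicalType":"timestamp-millis"}]}')
print(field["name"], bq_type(field["type"]))   # prints: LastModifiedDate TIMESTAMP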
Labels:
- Apache NiFi
10-16-2018
07:56 AM
I have several Sqoop jobs running to pull data from an Oracle data source. Normally these jobs run fine, but sometimes a few tables get stuck in the ACCEPTED stage even after sufficient resources have been allocated. The resource allocation is as follows: App type: MAPREDUCE, elapsedTime: 11h 12m 51s, Queue: ingestion, Queue usage percentage: 1.025 %, Allocated VCores: 2, Allocated Memory: 12.29 GB. This table does not have a large record count and generally completes in about 10 minutes, but today it ran for 11 hours with no progress and no data was fetched from the source. I am trying to debug and fix this issue and am looking for assistance.
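For what it's worth, this is roughly how I am checking for stuck applications, sketched with the ResourceManager REST API on the default port 8088 (the host name and queue filter below are placeholders, not the real cluster values):

import requests

RM = "http://resourcemanager.example.com:8088"   # placeholder ResourceManager address
resp = requests.get(f"{RM}/ws/v1/cluster/apps",
                    params={"states": "ACCEPTED", "queue": "ingestion"})
apps = (resp.json().get("apps") or {}).get("app") or []
for app in apps:
    # application id, queue, state, elapsed time (ms) and allocation as reported by the RM
    print(app["id"], app["queue"], app["state"],
          app["elapsedTime"], app["allocatedVCores"], app["allocatedMB"])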
Tags:
- capacity-scheduler
- container-allocation
- Data Ingestion & Streaming
- data-ingestion
- Sqoop
- yarn-scheduler
Labels:
- Apache Sqoop
- Apache YARN
12-20-2017
06:21 AM
I had a scenario where a Sqoop ingestion query failed on a Hive table due to a table lock issue. Does Hive maintain a lock history? Is there any way to check which user had acquired the lock after it has been released? From the log I am getting the message below.

FAILED: Error in acquiring locks: Lock acquisition for LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:db_name, tablename:table_name, operationType:NO_TXN)], txnid:0, user:user_name, hostname:myhost.org.com, agentInfo:user_20171218230532_32d0dac8-17d9-4f1c-af90-be7e6f8e0358) timed out after 5502640ms. LockResponse(lockid:28622126, state:WAITING)
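While a lock is still held I can list the current holders with something like the sketch below (assuming PyHive and HiveServer2 on the default port; the host, user, database and table names are placeholders taken from the log above), but this shows nothing once the lock has been released, which is really what I am after:

from pyhive import hive

conn = hive.Connection(host="myhost.org.com", port=10000,
                       username="user_name", database="db_name")
cursor = conn.cursor()
# lists currently held/waiting locks on the table, including the holding user and host
cursor.execute("SHOW LOCKS table_name EXTENDED")
for row in cursor.fetchall():
    print(row)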
Labels:
- Apache Hadoop
- Apache Hive
07-24-2017
05:00 AM
I am going through the Hortonworks documentation on Beeline vs. the Hive CLI and learned that we can connect to Hive with Beeline in two different modes:
- Embedded mode, e.g. !connect jdbc:hive2://
- Remote client with HiveServer2 TCP transport mode, e.g. !connect jdbc:hive2://<host>:<port>/<db>
From the documentation it is clear that remote transport mode supports authentication with LDAP and Kerberos, and that it also supports encryption with SSL.
Does this mean embedded mode does not support LDAP and Kerberos authentication? Which mode is better from a security point of view, and which mode should I use to connect to Hive with Beeline from a shell script? What are the differences between the two modes? I couldn't understand much from the documentation and am hoping for a detailed explanation. A sketch of the invocation I have in mind follows below.
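For context, this is roughly the kind of call I intend to make from the script, sketched with Python's subprocess just to show the exact arguments; the HiveServer2 host, port and Kerberos principal are placeholders, and the principal part only applies on a Kerberized cluster:

import subprocess

# remote mode: connect to HiveServer2 over TCP and run a single statement
jdbc_url = "jdbc:hive2://hs2-host.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"
subprocess.run(["beeline", "-u", jdbc_url, "-e", "SHOW TABLES"], check=True)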
Labels:
- Apache Hive
04-20-2017
11:46 AM
I am going through several blogs and tutorials, and everywhere I find that the JVM heap size should be set lower than the map and reduce memory that is defined. For example, suppose I have defined the following configuration in my mapred-site.xml file:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>

Each container will then run JVMs for the map and reduce tasks. The JVM heap size should be set lower than the map and reduce memory defined above so that it stays within the bounds of the container memory allocated by YARN. Therefore it should be something like this:

mapreduce.map.java.opts=-Xmx3072m
mapreduce.reduce.java.opts=-Xmx6144m

Why can't we define the JVM heap size to be the same as the mapper and reducer memory size? Are there any pros and cons to this? What will happen if I define the JVM heap size to be the same as the mapper and reducer memory size?
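To make the arithmetic explicit, here is a small sketch (plain Python, purely illustrative) that derives -Xmx from the container size using the commonly suggested ~75% factor; the factor itself is a convention I have seen recommended, not a hard rule:

def heap_opts(container_mb, factor=0.75):
    # keep the heap at roughly 75% of the container so JVM non-heap overhead still fits
    return f"-Xmx{int(container_mb * factor)}m"

print("mapreduce.map.java.opts    =", heap_opts(4096))   # -Xmx3072m
print("mapreduce.reduce.java.opts =", heap_opts(8192))   # -Xmx6144m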
Labels:
- Apache Hadoop
- Apache YARN
01-20-2017
11:16 AM
How are you capturing --last-value "2016-05-23 00:00:00"? Is it a hard-coded value, or are you capturing it from the database?