Member since: 01-20-2017
Posts: 6
Kudos Received: 0
Solutions: 0
07-23-2019
09:09 AM
I want to import data from Microsoft SQL Server to BigQuery using NiFi. I used ExecuteSQL -> PutBigQueryBatch processors and the data was successfully loaded into BigQuery in Avro (the default) format. However, when I check the Avro schema, the timestamp column appears as {"name":"LastModifiedDate","type":["null",{"type":"long","logicalType":"timestamp-millis"}]}, which creates this column as an integer in the BigQuery table.

I want to know whether we can map the Avro column datatype to a user-defined datatype so that it is used when creating the BigQuery table (say, LastModifiedDate as TIMESTAMP). I have seen a few suggestions to define a schema registry, but that would be specific to a single table. I am looking for a generic solution where we can define a centralized repository that can be used across all tables for such columns.

The ExecuteSQL processor configuration is as follows:

The PutBigQueryBatch processor configuration is as follows:

My question is pretty much similar to https://community.hortonworks.com/questions/202979/avro-schema-change-using-convertrecord-processor-i.html but it seems to be missing the answer I am looking for.
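For illustration, the end state I am after is roughly the following BigQuery table definition (the project, dataset, table and the Id column below are just placeholders; only LastModifiedDate comes from my actual schema):

-- Desired result: LastModifiedDate stored as TIMESTAMP rather than INTEGER
CREATE TABLE `my_project.my_dataset.my_table` (
  Id INT64,
  LastModifiedDate TIMESTAMP
);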
Labels:
- Apache NiFi
12-20-2017
06:21 AM
I had a scenario where a Sqoop ingestion query failed on a Hive table due to a table lock issue. Does Hive maintain lock history? Is there any way to check which user had acquired the lock after the lock has been released? From the log I am getting the message below:

FAILED: Error in acquiring locks: Lock acquisition for LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:db_name, tablename:table_name, operationType:NO_TXN)], txnid:0, user:user_name, hostname:myhost.org.com, agentInfo:user_20171218230532_32d0dac8-17d9-4f1c-af90-be7e6f8e0358) timed out after 5502640ms. LockResponse(lockid:28622126, state:WAITING)
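For context, while a lock is still held I can see it with something like the following (db_name and table_name are placeholders), but once the lock is released it no longer shows up, which is why I am asking about history:

-- Hive / Beeline: list the locks currently held on the table
USE db_name;
SHOW LOCKS table_name EXTENDED;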
Labels:
- Apache Hadoop
- Apache Hive
04-20-2017
11:46 AM
I am going through several blogs and tutorials, and everywhere I find that the JVM heap size should be set lower than the map and reduce memory defined. For example, suppose I have defined the following configuration in my mapred-site.xml file:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>

Each container will then run JVMs for the map and reduce tasks. The JVM heap size should be set lower than the map and reduce memory defined above, so that the JVMs stay within the bounds of the container memory allocated by YARN. Therefore it should be something like this:

mapreduce.map.java.opts=-Xmx3072m
mapreduce.reduce.java.opts=-Xmx6144m

Why can't we define the JVM heap size to be the same as the mapper and reducer memory size? Are there any pros and cons to this? What will happen if I define the JVM heap size the same as the mapper and reducer memory size?
Labels:
- Apache Hadoop
- Apache YARN
01-20-2017
11:16 AM
How are you capturing --last-value "2016-05-23 00:00:00"? Is it a hard-coded value, or are you capturing it from the database?