Member since: 01-20-2017
Posts: 6
Kudos Received: 0
Solutions: 0
07-23-2019
09:09 AM
I want to import data from Microsoft SQL Server into BigQuery using NiFi. I used the ExecuteSQL -> PutBigQueryBatch processors and the data was successfully loaded into BigQuery in Avro (the default) format. However, when I check the Avro schema, the timestamp column appears as {"name":"LastModifiedDate","type":["null",{"type":"long","logicalType":"timestamp-millis"}]}, which creates the column as an integer in the BigQuery table.

I want to know whether we can map the Avro column datatype to a user-defined datatype so that it can be used while creating the BigQuery table (say, LastModifiedDate as TIMESTAMP). I have seen a few suggestions to define a schema registry, but that would be specific to a single table. I am looking for a generic solution where we can define a centralized repository that can be used for all tables for such columns.

The ExecuteSQL processor configuration is as follows:
The PutBigQueryBatch processor configuration is as follows:

My question is quite similar to https://community.hortonworks.com/questions/202979/avro-schema-change-using-convertrecord-processor-i.html but it does not seem to answer what I am looking for.
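To make the idea concrete, here is a rough sketch (outside NiFi, plain Python, purely illustrative) of the kind of generic mapping I have in mind: take a field from the Avro schema and resolve its logical type to a BigQuery column type. The type map below is only my own assumption for illustration, not an existing NiFi or BigQuery feature.

import json

# assumed mapping from Avro (logical) types to BigQuery column types
AVRO_TO_BQ = {
    "timestamp-millis": "TIMESTAMP",
    "timestamp-micros": "TIMESTAMP",
    "date": "DATE",
    "decimal": "NUMERIC",
    "long": "INT64",
    "int": "INT64",
    "string": "STRING",
    "boolean": "BOOL",
    "double": "FLOAT64",
}

def bq_type(avro_type):
    # Resolve an Avro field type (possibly a union with "null") to a BigQuery type.
    if isinstance(avro_type, list):
        avro_type = next(t for t in avro_type if t != "null")
    if isinstance(avro_type, dict):
        key = avro_type.get("logicalType") or avro_type.get("type")
    else:
        key = avro_type
    return AVRO_TO_BQ.get(key, "STRING")

field = json.loads('{"name":"LastModifiedDate","type":["null",{"type":"long","logicalType":"timestamp-millis"}]}')
print(field["name"], bq_type(field["type"]))   # prints: LastModifiedDate TIMESTAMP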
Labels:
- Apache NiFi
10-16-2018
07:56 AM
I have several Sqoop jobs running to pull data from an Oracle data source. Normally these jobs run fine, but sometimes a few tables get stuck in the ACCEPTED stage even after sufficient resources have been allocated. The resource allocation is as follows: App type: MAPREDUCE, elapsedTime: 11h 12m 51s, Queue: ingestion, Queue usage percentage: 1.025 %, Allocated VCores: 2, Allocated Memory: 12.29 GB. This table does not have a large record count and generally completes in about 10 minutes, but today it ran for 11 hours with no progress and no data was fetched from the source. I am trying to debug and fix this issue and am looking for assistance.
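For what it's worth, this is roughly how I am checking for stuck applications, sketched with the ResourceManager REST API on the default port 8088 (the host name and queue filter below are placeholders, not the real cluster values):

import requests

RM = "http://resourcemanager.example.com:8088"   # placeholder ResourceManager address
resp = requests.get(f"{RM}/ws/v1/cluster/apps",
                    params={"states": "ACCEPTED", "queue": "ingestion"})
apps = (resp.json().get("apps") or {}).get("app") or []
for app in apps:
    # application id, queue, state, elapsed time (ms) and allocation as reported by the RM
    print(app["id"], app["queue"], app["state"],
          app["elapsedTime"], app["allocatedVCores"], app["allocatedMB"])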
Tags:
- capacity-scheduler
- container-allocation
- Data Ingestion & Streaming
- data-ingestion
- Sqoop
- yarn-scheduler
Labels:
- Apache Sqoop
- Apache YARN
12-20-2017
06:21 AM
I had a scenario where a Sqoop ingestion query failed on a Hive table due to a table lock issue. Does Hive maintain a lock history? Is there any way to check which user had acquired the lock after it has been released? From the log I am getting the message below.

FAILED: Error in acquiring locks: Lock acquisition for LockRequest(component:[LockComponent(type:EXCLUSIVE, level:TABLE, dbname:db_name, tablename:table_name, operationType:NO_TXN)], txnid:0, user:user_name, hostname:myhost.org.com, agentInfo:user_20171218230532_32d0dac8-17d9-4f1c-af90-be7e6f8e0358) timed out after 5502640ms. LockResponse(lockid:28622126, state:WAITING)
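While a lock is still held I can list the current holders with something like the sketch below (assuming PyHive and HiveServer2 on the default port; the host, user, database and table names are placeholders taken from the log above), but this shows nothing once the lock has been released, which is really what I am after:

from pyhive import hive

conn = hive.Connection(host="myhost.org.com", port=10000,
                       username="user_name", database="db_name")
cursor = conn.cursor()
# lists currently held/waiting locks on the table, including the holding user and host
cursor.execute("SHOW LOCKS table_name EXTENDED")
for row in cursor.fetchall():
    print(row)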
Labels:
- Apache Hadoop
- Apache Hive
07-24-2017
05:00 AM
I am going through the Hortonworks documentation on Beeline vs. the Hive CLI and learned that we can connect to Hive with Beeline in two different modes:
- Embedded mode, e.g. !connect jdbc:hive2://
- Remote client with HiveServer2 TCP transport mode, e.g. !connect jdbc:hive2://<host>:<port>/<db>
From the documentation it is clear that remote transport mode supports authentication with LDAP and Kerberos, and that it also supports encryption with SSL.
Does this mean embedded mode does not support LDAP and Kerberos authentication? Which mode is better from a security point of view, and which mode should I use to connect to Hive with Beeline from a shell script? What are the differences between the two modes? I couldn't understand much from the documentation and am hoping for a detailed explanation. A sketch of the invocation I have in mind follows below.
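For context, this is roughly the kind of call I intend to make from the script, sketched with Python's subprocess just to show the exact arguments; the HiveServer2 host, port and Kerberos principal are placeholders, and the principal part only applies on a Kerberized cluster:

import subprocess

# remote mode: connect to HiveServer2 over TCP and run a single statement
jdbc_url = "jdbc:hive2://hs2-host.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM"
subprocess.run(["beeline", "-u", jdbc_url, "-e", "SHOW TABLES"], check=True)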
Labels:
- Apache Hive
04-20-2017
11:46 AM
I am going through several blogs and tutorials, and everywhere I find that the JVM heap size should be set lower than the map and reduce memory that is defined. For example, suppose I have defined the following configuration in my mapred-site.xml file:

<property>
  <name>mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>8192</value>
</property>

Each container will then run JVMs for the map and reduce tasks. The JVM heap size should be set lower than the map and reduce memory defined above so that it stays within the bounds of the container memory allocated by YARN. Therefore it should be something like this:

mapreduce.map.java.opts=-Xmx3072m
mapreduce.reduce.java.opts=-Xmx6144m

Why can't we define the JVM heap size to be the same as the mapper and reducer memory size? Are there any pros and cons to this? What will happen if I define the JVM heap size to be the same as the mapper and reducer memory size?
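To make the arithmetic explicit, here is a small sketch (plain Python, purely illustrative) that derives -Xmx from the container size using the commonly suggested ~75% factor; the factor itself is a convention I have seen recommended, not a hard rule:

def heap_opts(container_mb, factor=0.75):
    # keep the heap at roughly 75% of the container so JVM non-heap overhead still fits
    return f"-Xmx{int(container_mb * factor)}m"

print("mapreduce.map.java.opts    =", heap_opts(4096))   # -Xmx3072m
print("mapreduce.reduce.java.opts =", heap_opts(8192))   # -Xmx6144m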
Labels:
- Apache Hadoop
- Apache YARN
01-20-2017
11:16 AM
How are you capturing --last-value "2016-05-23 00:00:00"? Is it a hard-coded value, or are you capturing it from the database?