Member since 08-03-2019
186 Posts
34 Kudos Received
26 Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2024 | 04-25-2018 08:37 PM |
 | 5951 | 04-01-2018 09:37 PM |
 | 1639 | 03-29-2018 05:15 PM |
 | 6857 | 03-27-2018 07:22 PM |
 | 2078 | 03-27-2018 06:14 PM |
03-26-2018
02:02 AM
@Adithya Sajjanam sys_extract_utc is not a valid Hive function. You are running your Oracle query as is, which is why you get this error. Refer to the date functions in the Hive documentation and translate the query from Oracle syntax to HiveQL accordingly; for example, Hive's to_utc_timestamp converts a timestamp in a given time zone to UTC.
03-25-2018
09:47 PM
@Sanjay Gurnani Looking at your issue in detail, it seems the problem occurs when committing while there is no data; when you have data, it works fine. I am not sure whether that is a bug, since you are able to persist the data with other formats, but I will check the source code to see why this behavior happens only with ORC. In the meantime, to get your code working, I recommend checking that your dataset has some records before committing to the sink. That way you also avoid any empty-file issues with formats such as Parquet, where the writeStream method works.
03-25-2018
09:39 PM
@Cathy Liu The table you are trying to create already exists 🙂 Either use the existing table, or drop it and recreate it if you want a fresh one! Let me know if you need any other help.
03-25-2018
07:05 PM
1 Kudo
Yes, the schema is available in the Avro data file, and you have to extract it to pass it to the Hive DDL. The Hive DDL expects either the schema path or the schema literal to establish the schema in the metastore.
03-25-2018
06:19 PM
@Kareem Amin There are two types of files when we talk about Avro.
Avro files - which hold the data
avsc files - Avro schema files
It looks like you have the Avro data files but not the Avro schema. The following steps will help you extract the Avro schema (the avsc file) from your data files and create a table on top of them.
# Take the first few kilobytes of your avro file (the schema is stored in the file header)
hdfs dfs -cat <your avro file name> | head --bytes 10K > $SAMPLE_FILE
# Extract the avro schema from your avro data file
java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema $SAMPLE_FILE > $AVRO_SCHEMA_FILE
# Upload the schema to hdfs
hdfs dfs -put $AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR
-- Create the hive table using the avro schema
CREATE EXTERNAL TABLE sample_table
STORED AS AVRO
LOCATION 'hdfs:///user/hive/'
TBLPROPERTIES ('avro.schema.url'='<your avro schema path here>');
PS - If you already have the Avro schema files, you can skip the schema extraction steps and simply use the last step to create your table. Let me know if that works for you.
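If avro-tools is not at hand, the same idea can be sketched in plain Python: an Avro object container file starts with the magic bytes `Obj\x01`, followed by a metadata map whose `avro.schema` entry holds the writer schema as JSON. This is only a minimal sketch of that header layout, not a replacement for avro-tools; the function names (`read_avro_schema`, `read_avro_schema_from_bytes`) are made up for illustration.

```python
import io
import json

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def _read_long(stream):
    """Decode one Avro long: a variable-length, zigzag-encoded integer."""
    shift = 0
    accum = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of Avro header")
        accum |= (chunk[0] & 0x7F) << shift
        if not (chunk[0] & 0x80):
            break
        shift += 7
    # Undo the zigzag encoding to recover the signed value
    return (accum >> 1) ^ -(accum & 1)


def _read_bytes(stream):
    """Read a length-prefixed byte string (Avro strings and bytes share this layout)."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_schema(stream):
    """Return the schema (as a dict) stored in the header of an Avro container file."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # a zero count terminates the metadata map
            break
        if count < 0:  # a negative count is followed by the block size in bytes
            count = -count
            _read_long(stream)
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            metadata[key] = _read_bytes(stream)
    return json.loads(metadata["avro.schema"].decode("utf-8"))


def read_avro_schema_from_bytes(data):
    """Convenience wrapper for a sample already held in memory."""
    return read_avro_schema(io.BytesIO(data))
```

To use it on the sample file from the first step, open that file in binary mode, pass the file object to `read_avro_schema`, and write the result out with `json.dump` to get the avsc file.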
03-24-2018
10:10 PM
@Christian Lunesa If you are talking about a Sqoop import (that's the only tag on your question :)), it is always highly recommended to use an integer column as the split-by column. Since you have only string/varchar columns in your data source, you can try one of the following options, depending on your data:
1. Add a surrogate integer primary key and use it as the split-by column.
2. Split your data manually using a custom query with a WHERE clause and run Sqoop a few times with num-mappers=1.
3. Apply a deterministic, non-aggregate integer function to your varchar column, for example cast(substr(...) as int), and use the result as the split column.
Let me know if you need any other help!
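The manual-split option can be sketched as a small Python helper that builds one Sqoop command per WHERE range, each running with a single mapper. This is only an illustration: the JDBC URL, table name, column name, and target directory are hypothetical, credentials are left out, and `build_sqoop_commands` is a made-up helper. It relies on Sqoop's requirement that a free-form `--query` contains the literal `$CONDITIONS` token and is paired with `--target-dir`.

```python
def build_sqoop_commands(jdbc_url, table, where_clauses, target_root):
    """Build one single-mapper 'sqoop import' argument list per WHERE clause."""
    commands = []
    for index, where in enumerate(where_clauses):
        # Sqoop replaces the literal $CONDITIONS token itself, so it must stay in the query
        query = "SELECT * FROM {} WHERE ({}) AND $CONDITIONS".format(table, where)
        commands.append([
            "sqoop", "import",
            "--connect", jdbc_url,
            "--query", query,
            "--target-dir", "{}/part_{}".format(target_root, index),
            "--num-mappers", "1",
        ])
    return commands


# Hypothetical example: split a customers table on the first letter of a varchar column
commands = build_sqoop_commands(
    "jdbc:mysql://dbhost/sales",          # assumed JDBC URL
    "customers",                          # assumed table name
    ["name < 'M'", "name >= 'M'"],        # assumed non-overlapping ranges
    "/user/hive/customers",               # assumed HDFS target root
)
for command in commands:
    print(" ".join(command))
```

Each argument list can be handed to `subprocess.run` as is; passing a list rather than a shell string keeps `$CONDITIONS` from being expanded by the shell. The ranges must be non-overlapping and together cover every row, including NULL values if the column allows them.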
03-24-2018
10:05 PM
@Santanu Ghosh This is a typo in the practice exam questions! You are not doing anything wrong 🙂 Keep your spirits up, and all the best for your exam. Let me know if you need any other help!
03-24-2018
10:03 PM
@Vincent van Oudenhoven Is there a specific reason for using ExecuteStreamCommand for this use case? I would recommend the ExecuteScript or InvokeScriptedProcessor processor instead; either can perform all the operations from your question. For example, a very basic script for the ExecuteScript processor, which reads a flow file and creates an empty flow file carrying all its attributes:
flowFile = session.get()
attrMap = flowFile.getAttributes()
session.remove(flowFile)
newflowFile = session.create()
newflowFile = session.putAllAttributes(newflowFile, attrMap)
session.transfer(newflowFile, REL_SUCCESS)
Or this Groovy script for the ExecuteScript processor, which reads the content of your flow files and routes them to downstream connections accordingly:
import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.InputStreamCallback
import java.nio.charset.StandardCharsets
flowFile = session.get()
if(!flowFile) return
def storeID = 0
// Read the content and take the third pipe-delimited field as the store ID
session.read(flowFile, {inputStream ->
    def text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
    storeID = text.tokenize("|")[2].trim().toInteger()
} as InputStreamCallback)
// Every flow file must be transferred, so anything outside 1 to 10 goes to failure
if(storeID >= 1 && storeID <= 10)
    session.transfer(flowFile, REL_SUCCESS)
else
    session.transfer(flowFile, REL_FAILURE)
You can also have an external script executed by ExecuteStreamCommand, but why maintain code outside NiFi when the built-in flow file handling of a processor like ExecuteScript gets you the same functionality more easily?
03-23-2018
07:45 PM
@Shantanu kumar You can use the following flow.
SplitText -> ExtractText -> RouteOnAttribute
Here is a short description of what each processor should do.
1. SplitText - Same as the detailed flow above. Split each line into a new flow file.
2. ExtractText - Create a new attribute holding the value of your StoreID column. Write a regex that reads your data, captures the StoreID column at the nth position, and stores it in a StoreID attribute.
3. RouteOnAttribute - Use Expression Language here to redirect your flow files. For example:
${StoreID:ge(1):and(${StoreID:le(10)})} //Route to the processor handling stores 1 to 10
${StoreID:ge(11):and(${StoreID:le(20)})} //Route to the processor handling stores 11 to 20
And so on. This should route your data as you need.
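To check the regex and the routing ranges before wiring them into NiFi, the ExtractText and RouteOnAttribute logic can be mimicked in a few lines of Python. This is only a sketch under assumptions: it supposes pipe-delimited lines with StoreID in the third column, and the route names and helper functions are hypothetical. The capture pattern itself uses only syntax that Java regex (which ExtractText uses) also accepts.

```python
import re

# Skip two pipe-terminated fields, then capture the third field (assumed to be StoreID)
STORE_ID_PATTERN = re.compile(r"^(?:[^|]*\|){2}([^|]*)")


def extract_store_id(line):
    """Mimic ExtractText: return the StoreID as an int, or None if it cannot be read."""
    match = STORE_ID_PATTERN.match(line)
    if match is None:
        return None
    try:
        return int(match.group(1))
    except ValueError:
        return None


def route(store_id):
    """Mimic RouteOnAttribute: map a StoreID to a (hypothetical) route name."""
    if store_id is None:
        return "unmatched"
    if 1 <= store_id <= 10:
        return "stores_1_to_10"
    if 11 <= store_id <= 20:
        return "stores_11_to_20"
    return "unmatched"


# Made-up sample line in the assumed layout: date|city|StoreID|amount
print(route(extract_store_id("2018-03-23|NY|7|12.50")))
```

Adjust the `{2}` repetition count to n-1 for a StoreID in the nth column, and swap the `|` for your actual delimiter.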
03-23-2018
06:22 AM
Sorry, but could you please describe the pattern a bit more clearly? It is hard to understand in the format you posted.