About buntu

KishoreBandaru · ‎10-08-2018

@buntu Try this:

hive> CREATE VIEW log_view PARTITIONED ON (pagename, year, month, day) AS SELECT uid, properties, pagename, year, month, day FROM log;

Reason: the columns named in PARTITIONED ON must appear at the end of the view's SELECT list, in the same order as they are listed in the partition clause.
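The ordering rule in this reply can be checked mechanically before running the DDL. Below is a minimal sketch, assuming you have the view's select-list column names and the PARTITIONED ON column names as plain lists. `PartitionedViewCheck` and `partitionColumnsAreTrailing` are hypothetical names for illustration and are not part of Hive. The comparison is case-sensitive for simplicity.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a Hive API): checks that the partition columns
// are the trailing columns of the view's select list, in the same order.
final class PartitionedViewCheck {

    static boolean partitionColumnsAreTrailing(List<String> selectCols,
                                               List<String> partitionCols) {
        int offset = selectCols.size() - partitionCols.size();
        if (offset < 0) {
            // More partition columns than selected columns: cannot match.
            return false;
        }
        // Compare the tail of the select list with the partition list.
        return selectCols.subList(offset, selectCols.size()).equals(partitionCols);
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("pagename", "year", "month", "day");

        // Select list from the corrected statement: partition columns come last.
        List<String> good = Arrays.asList(
                "uid", "properties", "pagename", "year", "month", "day");

        // Illustrative bad ordering: pagename is selected first, not in the tail.
        List<String> bad = Arrays.asList(
                "pagename", "uid", "properties", "year", "month", "day");

        System.out.println(partitionColumnsAreTrailing(good, partitions));
        System.out.println(partitionColumnsAreTrailing(bad, partitions));
    }
}
```

The first call reports that the ordering is acceptable; the second does not, because `pagename` is not among the last four selected columns.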

Tom · ‎04-12-2018

In fact, we can use Jackson to solve this problem, and the approach works for any JSON data.

morphlines: [
  {
    id: convertJsonToAvro
    importCommands: [ "org.kitesdk.**" ]
    commands: [
      # read the JSON blob
      { readJson: {} }

      # java code: copy every top-level JSON key into the record
      {
        java {
          imports : """
            import com.fasterxml.jackson.databind.ObjectMapper;
            import org.kitesdk.morphline.base.Fields;
            import java.io.IOException;
            import java.util.Set;
            import java.util.Map;
          """
          code : """
            String jsonStr = record.getFirstValue(Fields.ATTACHMENT_BODY).toString();
            ObjectMapper mapper = new ObjectMapper();
            Map<String, Object> map = null;
            try {
              map = (Map<String, Object>) mapper.readValue(jsonStr, Map.class);
            } catch (IOException e) {
              e.printStackTrace();
              // fail this record instead of hitting a NullPointerException below
              return false;
            }
            Set<String> keySet = map.keySet();
            for (String o : keySet) {
              record.put(o, map.get(o));
            }
            return child.process(record);
          """
        }
      }

      # convert the extracted fields to an avro object
      # described by the schema in this field
      { toAvro { schemaFile: /etc/flume/conf/a1/like_user_event_realtime.avsc } }

      #{ logInfo { format : "loginfo: {}", args : ["@{}"] } }

      # serialize the object as avro
      { writeAvroToByteArray: { format: containerlessBinary } }
    ]
  }
]
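To make the effect of the loop in the java command easier to see, here is a minimal standalone sketch using only the JDK. It assumes the JSON has already been parsed into a `Map<String, Object>` (as Jackson's `readValue(jsonStr, Map.class)` does) and uses a plain `Map<String, List<Object>>` as a stand-in for the morphline record. `TopLevelFieldCopy` and `copyTopLevelFields` are hypothetical names, and the sample event is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the morphline step: a record is modeled as
// field name -> list of values, since a record field can hold several values.
final class TopLevelFieldCopy {

    static Map<String, List<Object>> copyTopLevelFields(Map<String, Object> parsedJson) {
        Map<String, List<Object>> record = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : parsedJson.entrySet()) {
            // Each top-level key becomes one record field; the value is
            // appended unchanged, so a nested object stays a single Map.
            record.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                  .add(entry.getValue());
        }
        return record;
    }

    public static void main(String[] args) {
        // Made-up event with one scalar field and one nested object.
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("id", 7);

        Map<String, Object> event = new LinkedHashMap<>();
        event.put("type", "like");
        event.put("user", user);

        Map<String, List<Object>> record = copyTopLevelFields(event);
        System.out.println(record.keySet());
        System.out.println(record.get("user").get(0) instanceof Map);
    }
}
```

Only top-level keys become record fields in this sketch; nested objects are carried along as map values rather than being flattened.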

stsudhakara · ‎03-30-2017

Check that the node's IP address is listed in the file that yarn.resourcemanager.nodes.include-path points to (the path to the file listing nodes to include). Also make sure that you are starting the NodeManager with the correct user permissions.

buntu · ‎02-01-2017

OK, I do see the CDH 5.10 parcel; it requires Cloudera Manager to be updated before updating the CDH parcel.

surajacharya · ‎01-17-2017

Currently, Cloudera does not provide a parcel that includes R. If you are trying to run it with Spark, here is a good discussion about it: https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/SparkR-in-CDH-5-5/td-p/34602

buntu · ‎01-12-2017

Given the size of the dataset, I believe the data already fits in memory, so it's not providing any additional performance improvement. Thanks!

TimothySpann · ‎11-16-2017

incrementalstream-1.xml

buntu · ‎05-02-2017

Thanks, this is very useful. How would one go about getting the application name? Is it the app name or the app ID or something else? Thanks!

srowen · ‎02-22-2016

"Not supported" means you can't file support tickets for it. It's shipped and works though.

Harsh J · ‎09-18-2015

Glad to hear you were able to figure it out. In the spirit of https://xkcd.com/979/, please mark the thread solved with the solution post selected, so others with a similar problem can find their solution more quickly on the web.

Last Visited	‎10-18-2018 01:40 AM

Member Since	‎07-21-2014 02:20 PM
Posts	141
Kudos received	8

Cloudera Community

Re: CDH parcel repo URL

Re: NPE if kafka has null record key

Re: Flume metrics

Re: How to create partitioned view

Re: JSON to Avro, Sub-records in Avro

Re: Yarn NodeManager fails to start

Re: CDH parcel repo URL

Re: Installing R package on CDH cluster

Re: Impala performance with HDFS caching enabled

Re: Incrementally Streaming RDBMS Data to Your Had...

Re: Google Vision & Apache NiFi - Making Advanced ...

Re: Graphx in latest CDH

Re: Sqoop fails with "Error parsing arguments for ...