Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203
My Accepted Solutions
| Views | Posted |
|---|---|
| 6962 | 09-21-2018 09:54 PM |
| 8721 | 03-31-2018 03:59 AM |
| 2613 | 03-31-2018 03:55 AM |
| 2754 | 03-31-2018 03:31 AM |
| 6174 | 03-27-2018 03:46 PM |
02-22-2017
12:41 AM
6 Kudos
@Connor O'Neal When you delete a topic with the command line tools, the tool simply creates a ZooKeeper node that requests the deletion. Under normal circumstances the cluster acts on that request immediately. However, the command line tool has no way of knowing whether topic deletion is enabled in the cluster, so it files the request regardless, and if the property is not enabled the request just sits there pending. It is possible to remove these pending deletion requests. A topic is marked for deletion by creating a child ZooKeeper node named after the topic under /admin/delete_topics; deleting those child nodes (but not the parent /admin/delete_topics node itself) removes the pending requests.
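For example, a minimal sketch using the ZooKeeper CLI (assuming the zookeeper-client wrapper is on the path; zkCli.sh works the same way, and the /kafka-cluster chroot and the topic name mytopic are placeholders, not values from your cluster):
# zookeeper-client -server mykafka.abc.com:2181/kafka-cluster
ls /admin/delete_topics
rmr /admin/delete_topics/mytopic
The ls call lists the topics with pending delete requests, and rmr removes only the child node for the stuck topic, leaving the parent node in place.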
Once the pending requests are cleared, set delete.topic.enable=true and re-issue the topic delete requests. +++ Hopefully this response helped. Please vote/accept as best answer.
02-22-2017
12:39 AM
3 Kudos
@Connor O'Neal The import offsets tool is the opposite of the export tool. See my answer to your other question for the export step: https://community.hortonworks.com/questions/84827/is-there-a-tool-to-export-offsets.html#answer-84846 To import the offsets for the consumer group named “testconsumergroup” from a file named “offsets”:
kafka-run-class.sh kafka.tools.ImportZkOffsets --zkconnect mykafka.abc.com:2181/kafka-cluster --input-file offsets
A rough end-to-end sketch of the rewind workflow is below. +++ If this helped, please vote/accept as the best answer.
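As a sketch of the typical workflow (the partition lines and the target offset 0 below are placeholders based on the export example, not values from your cluster): export the current offsets, edit the file while the consumer group is stopped, then import it:
# cat offsets
/consumers/testconsumergroup/offsets/my-topic/0:0
/consumers/testconsumergroup/offsets/my-topic/1:0
# kafka-run-class.sh kafka.tools.ImportZkOffsets --zkconnect mykafka.abc.com:2181/kafka-cluster --input-file offsets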
02-22-2017
12:33 AM
3 Kudos
@Connor O'Neal There is no dedicated script to export offsets, but we are able to use the kafka-run-class.sh script to execute the underlying Java class for the tool in the proper environment. Exporting offsets produces a file that contains each topic partition for the group and its offset, in a defined format that the import tool can read. The file that is created has one topic partition per line, with the following format:
/consumers/GROUPNAME/offsets/TOPICNAME/PARTITIONID:OFFSET
For example, to export the offsets for the consumer group named “testconsumergroup” to a file named “offsets”:
# kafka-run-class.sh kafka.tools.ExportZkOffsets --zkconnect mykafka.abc.com:2181/kafka-cluster --group testconsumergroup --output-file offsets
# cat offsets
/consumers/testconsumergroup/offsets/my-topic/0:8905
/consumers/testconsumergroup/offsets/my-topic/1:8915
/consumers/testconsumergroup/offsets/my-topic/2:9845
/consumers/testconsumergroup/offsets/my-topic/3:8072
/consumers/testconsumergroup/offsets/my-topic/4:8008
/consumers/testconsumergroup/offsets/my-topic/5:8319
/consumers/testconsumergroup/offsets/my-topic/6:8102
/consumers/testconsumergroup/offsets/my-topic/7:12739
+++ If this helped, please vote/accept as the best answer.
02-22-2017
12:07 AM
9 Kudos
Introduction
Geospatial data is generated in huge volumes with the rise of the Internet of Things, and IoT sensor networks are pushing geospatial data rates even higher. There has been an explosion of sensor networks on the ground, mobile devices carried by people or mounted on vehicles, drones flying overhead, high-altitude balloons (such as Google's Project Loon), tethered aerostats, atmosats at high altitude, and microsats in orbit.
Opportunity
Geospatial analytics can provide us with the tools and methods we need to make sense of all that data and put it to use in solving problems we face at all scales.
Challenges
Geospatial work requires atypical data types (e.g., points, shapefiles, map projections), potentially many layers of detail to process and visualize, and specialized algorithms, none of which fit typical ETL (extract, transform, load) or reporting workloads.
Apache Spark's Role in Geospatial Development
While Spark might seem to be influencing the evolution of
accessory tools, it’s also becoming a default in the geospatial analytics
industry. For example, consider the development of Azavea’s open source
geospatial library GeoTrellis. GeoTrellis was written in Scala and designed to
handle large-scale raster operations. GeoTrellis recently adopted Spark as its
distributed computation engine and, in combination with Amazon Web Services,
scaled the existing raster processing to support even larger datasets. Spark
brings amazing scope to the GeoTrellis project, and GeoTrellis supplies the
geospatial capabilities that Spark lacks. This reciprocal partnership is an
important contribution to the data engineering ecosystem, and particularly to
the frameworks in development for supporting Big Data.
About GeoTrellis
GeoTrellis is a Scala library and framework
that uses Spark to work with raster data. It is released under the Apache 2
License. GeoTrellis
reads, writes, and operates on raster data as fast as possible. It implements
many Map Algebra operations
as well as vector to raster or raster to vector operations. GeoTrellis
also provides tools to render rasters into PNGs or to store metadata about
raster files as JSON. It aims to provide raster processing at web speeds
(sub-second or less) with RESTful endpoints as well as provide fast batch
processing of large raster data sets.
Getting Started
GeoTrellis is currently available for Scala 2.11 and Spark 2.0+. To get started with SBT, simply add the following to your build.sbt file:
libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-raster" % "1.0.0"
geotrellis-raster is just one submodule that you can depend on. To grab the latest snapshot build, add our snapshot repository:
resolvers += "LocationTech GeoTrellis Snapshots" at "https://repo.locationtech.org/content/repositories/geotrellis-snapshots"
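For context, a minimal complete build.sbt sketch that pulls in both the raster and Spark modules might look like the following; the project name, Scala patch version, and Spark version are assumptions chosen to satisfy the Scala 2.11 / Spark 2.0+ requirement above, not values prescribed by GeoTrellis:
name := "geotrellis-hello-raster"  // hypothetical project name
scalaVersion := "2.11.8"           // GeoTrellis 1.0.0 targets Scala 2.11

libraryDependencies ++= Seq(
  "org.locationtech.geotrellis" %% "geotrellis-raster" % "1.0.0",
  "org.locationtech.geotrellis" %% "geotrellis-spark"  % "1.0.0",
  // Spark itself is assumed to be provided by the cluster at runtime
  "org.apache.spark"            %% "spark-core"        % "2.0.2" % "provided"
)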
GeoTrellis Modules
geotrellis-proj4 : Coordinate Reference Systems and reprojection (Scala wrapper around Proj4j)
geotrellis-vector : Vector data types and operations (Scala wrapper around JTS)
geotrellis-raster : Raster data types and operations
geotrellis-spark : Geospatially enables Spark; save to and from HDFS
geotrellis-s3 : S3 backend for geotrellis-spark
geotrellis-accumulo : Accumulo backend for geotrellis-spark
geotrellis-cassandra : Cassandra backend for geotrellis-spark
geotrellis-hbase : HBase backend for geotrellis-spark
geotrellis-spark-etl : Utilities for writing ETL (Extract-Transform-Load), or "ingest", applications for geotrellis-spark
geotrellis-geotools : Conversions to and from GeoTools Vector and Raster data
geotrellis-geomesa : Experimental GeoMesa integration
geotrellis-geowave : Experimental GeoWave integration
geotrellis-shapefile : Read shapefiles into GeoTrellis data types via GeoTools
geotrellis-slick : Read vector data out of PostGIS via Lightbend Slick
geotrellis-vectortile : Experimental vector tile support, including reading and writing
geotrellis-raster-testkit : Testkit for testing geotrellis-raster types
geotrellis-vector-testkit : Testkit for testing geotrellis-vector types
geotrellis-spark-testkit : Testkit for testing geotrellis-spark code
A more complete feature list can be found in the GeoTrellis Features section at https://github.com/locationtech/geotrellis.
Hello Raster with GeoTrellis
scala> import geotrellis.raster._
import geotrellis.raster._
scala> import geotrellis.raster.op.focal._
import geotrellis.raster.op.focal._
scala> val nd = NODATA
nd: Int = -2147483648
scala> val input = Array[Int](
     |   nd, 7, 1, 1, 3, 5, 9, 8, 2,
     |    9, 1, 1, 2, 2, 2, 4, 3, 5,
     |    3, 8, 1, 3, 3, 3, 1, 2, 2,
     |    2, 4, 7, 1, nd, 1, 8, 4, 3)
input: Array[Int] = Array(-2147483648, 7, 1, 1, 3, 5, 9, 8, 2, 9, 1, 1, 2, 2, 2, 4, 3, 5, 3, 8, 1, 3, 3, 3, 1, 2, 2, 2, 4, 7, 1, -2147483648, 1, 8, 4, 3)
scala> val iat = IntArrayTile(input, 9, 4) // 9 and 4 here specify columns and rows
iat: geotrellis.raster.IntArrayTile = IntArrayTile([I@278434d0,9,4)
// The asciiDraw method is mostly useful when you're working with small tiles
// which can be taken in at a glance
scala> iat.asciiDraw()
res0: String =
" ND 7 1 1 3 5 9 8 2
9 1 1 2 2 2 4 3 5
3 8 1 3 3 3 1 2 2
2 4 7 1 ND 1 8 4 3
"
scala> val focalNeighborhood = Square(1) // a 3x3 square neighborhood
focalNeighborhood: geotrellis.raster.op.focal.Square =
O O O
O O O
O O O
scala> val meanTile = iat.focalMean(focalNeighborhood)
meanTile: geotrellis.raster.Tile = DoubleArrayTile([D@7e31c125,9,4)
scala> meanTile.getDouble(0, 0) // Should equal (1 + 7 + 9) / 3
res1: Double = 5.666666666666667
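As a quick sanity check (this is not output from a live session, just the same arithmetic applied by hand), the focal mean at an interior cell excludes NODATA neighbors in the same way the (0, 0) result above does:
scala> meanTile.getDouble(1, 1) // 3x3 neighborhood of (1, 1): nd, 7, 1, 9, 1, 1, 3, 8, 1
// NODATA is dropped, so this should equal (7 + 1 + 9 + 1 + 1 + 3 + 8 + 1) / 8 = 3.875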
Documentation
Further examples and documentation of GeoTrellis use-cases can be found in the docs/ folder. Scaladocs for the latest version of the project can be found here: http://geotrellis.github.com/scaladocs/latest/#geotrellis.package
References
Geospatial Data and Analysis by Aurelia Moser, Bill Day, and Jon Bruner. Published by O'Reilly Media, Inc., 2017.
http://geotrellis.io/
02-21-2017
10:34 PM
1 Kudo
@James Dinkel hive.llap.io.memory.mode is in the Advanced hive-interactive-site configuration tab in the Ambari UI. Could you do me a favor and check whether it shows up in that tab?
02-21-2017
10:21 PM
1 Kudo
@Mothilal marimuthu You did not specify whether you are talking about RDDs, Datasets, or DataFrames, so let's assume RDDs. An RDD is not like a columnar database where you only account for the key-value pairs that are present; it is a row-based format, so there is a cost associated with empty values. I cannot tell you the exact cost because it depends on your data types, but there is a cost. Why don't you run a test yourself: persist a small test RDD with all values populated, then one with partial values, some of them null. Again, the data type matters, so you can experiment with null values on columns of one type, then another RDD for a different type, and so on, each persisted with rdd.persist(StorageLevel.MEMORY_AND_DISK). See the sketch below.
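A minimal Scala sketch of that experiment, to run in spark-shell (the case class, record count, and field values are made up for illustration; read the actual cached sizes from the Spark UI Storage tab or getRDDStorageInfo):
import org.apache.spark.storage.StorageLevel

// a simple three-column row type; nulls stand in for the "empty values"
case class Rec(a: String, b: String, c: String)

val full   = sc.parallelize(1 to 100000).map(i => Rec("a" + i, "b" + i, "c" + i))
val sparse = sc.parallelize(1 to 100000).map(i => Rec("a" + i, null, null))

// persist both and force materialization with an action
full.persist(StorageLevel.MEMORY_AND_DISK).count()
sparse.persist(StorageLevel.MEMORY_AND_DISK).count()

// compare the cached footprint of each RDD
sc.getRDDStorageInfo.foreach(info => println(s"RDD ${info.id}: ${info.memSize} bytes in memory, ${info.diskSize} bytes on disk"))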
02-21-2017
09:49 PM
1 Kudo
@James Dinkel Ok. That still does not explain why you saw a default value other than "cache" for that variable; "cache" is the default. What about hive.llap.io.enabled? By default it is null. Try setting it to true.
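For instance, a minimal check-and-override sketch from a beeline session connected to HiveServer2 Interactive (this only affects the current session; the durable setting still lives in Ambari):
SET hive.llap.io.enabled;
SET hive.llap.io.enabled=true;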
02-21-2017
09:42 PM
1 Kudo
@Fernando Lopez Bello The connection string you posted is a JDBC string, yet you then mention ODBC over that JDBC string. Please clarify which one you are using, and which BI tool.
02-21-2017
09:38 PM
2 Kudos
@James Dinkel Is there a typo in your question above? You mention hive.llap.iomemory.mode.cache, but the correct form is: set hive.llap.io.memory.mode=cache Just checking before moving forward. What makes me believe it is a typo is that you stated the value was null, which is not correct; the default is actually "cache". That makes me think you mistyped the variable name.
02-21-2017
09:24 PM
1 Kudo
To check HDFS, run something like this from the Hive shell (replace localhost with your host):
dfs -lsr hdfs://localhost:9000/user/hive/warehouse/events;
The file extension will tell you whether the data is compressed.
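Roughly the same check can be done from an ordinary shell with the HDFS CLI (the warehouse path below is the one from the example above; point it at your table's location):
hdfs dfs -ls -R hdfs://localhost:9000/user/hive/warehouse/events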