Member since: 03-16-2016
Posts: 707
Kudos Received: 1753
Solutions: 203
My Accepted Solutions
| Views | Posted |
|---|---|
| 6962 | 09-21-2018 09:54 PM |
| 8721 | 03-31-2018 03:59 AM |
| 2613 | 03-31-2018 03:55 AM |
| 2754 | 03-31-2018 03:31 AM |
| 6174 | 03-27-2018 03:46 PM |
02-22-2017
12:41 AM
6 Kudos
@Connor O'Neal When you delete a topic with the command line tools, the tool simply creates a ZooKeeper node that requests the deletion. Under normal circumstances the cluster acts on that request immediately. However, the command line tool has no way of knowing whether topic deletion is enabled in the cluster, so it files the request regardless, and if the property is not enabled the request just sits there pending. It is possible to remove these pending deletion requests. A topic is marked for deletion by creating a child ZooKeeper node named after the topic under /admin/delete_topics; deleting those child nodes (but not the parent /admin/delete_topics node itself) removes the pending requests.
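For example, a minimal sketch using the ZooKeeper CLI (assuming the zookeeper-client wrapper is on the path; zkCli.sh works the same way, and the /kafka-cluster chroot and the topic name mytopic are placeholders, not values from your cluster):
# zookeeper-client -server mykafka.abc.com:2181/kafka-cluster
ls /admin/delete_topics
rmr /admin/delete_topics/mytopic
The ls call lists the topics with pending delete requests, and rmr removes only the child node for the stuck topic, leaving the parent node in place.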
Once the pending requests are cleared, set delete.topic.enable=true and re-issue the topic delete requests. +++ Hopefully this response helped. Please vote/accept as best answer.
02-22-2017
12:39 AM
3 Kudos
@Connor O'Neal The import offsets tool is the opposite of the export tool. See my answer to your other question for the export step: https://community.hortonworks.com/questions/84827/is-there-a-tool-to-export-offsets.html#answer-84846 To import the offsets for the consumer group named “testconsumergroup” from a file named “offsets”:
kafka-run-class.sh kafka.tools.ImportZkOffsets --zkconnect mykafka.abc.com:2181/kafka-cluster --input-file offsets
A rough end-to-end sketch of the rewind workflow is below. +++ If this helped, please vote/accept as the best answer.
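As a sketch of the typical workflow (the partition lines and the target offset 0 below are placeholders based on the export example, not values from your cluster): export the current offsets, edit the file while the consumer group is stopped, then import it:
# cat offsets
/consumers/testconsumergroup/offsets/my-topic/0:0
/consumers/testconsumergroup/offsets/my-topic/1:0
# kafka-run-class.sh kafka.tools.ImportZkOffsets --zkconnect mykafka.abc.com:2181/kafka-cluster --input-file offsets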
02-22-2017
12:33 AM
3 Kudos
@Connor O'Neal There is no dedicated script to export offsets, but we are able to use the kafka-run-class.sh script to execute the underlying Java class for the tool in the proper environment. Exporting offsets produces a file that contains each topic partition for the group and its offset, in a defined format that the import tool can read. The file that is created has one topic partition per line, with the following format:
/consumers/GROUPNAME/offsets/TOPICNAME/PARTITIONID:OFFSET
For example, to export the offsets for the consumer group named “testconsumergroup” to a file named “offsets”:
# kafka-run-class.sh kafka.tools.ExportZkOffsets --zkconnect mykafka.abc.com:2181/kafka-cluster --group testconsumergroup --output-file offsets
# cat offsets
/consumers/testconsumergroup/offsets/my-topic/0:8905
/consumers/testconsumergroup/offsets/my-topic/1:8915
/consumers/testconsumergroup/offsets/my-topic/2:9845
/consumers/testconsumergroup/offsets/my-topic/3:8072
/consumers/testconsumergroup/offsets/my-topic/4:8008
/consumers/testconsumergroup/offsets/my-topic/5:8319
/consumers/testconsumergroup/offsets/my-topic/6:8102
/consumers/testconsumergroup/offsets/my-topic/7:12739
+++ If this helped, please vote/accept as the best answer.
02-22-2017
12:07 AM
9 Kudos
Introduction
Geospatial data is generated in huge volumes with the rise of the Internet of Things, and IoT sensor networks are pushing geospatial data rates even higher. There has been an explosion of sensor networks on the ground, mobile devices carried by people or mounted on vehicles, drones flying overhead, high-altitude balloons (such as Google's Project Loon), tethered aerostats, atmosats at high altitude, and microsats in orbit.
Opportunity
Geospatial analytics can provide us with the tools and methods we need to make sense of all that data and put it to use in solving problems we face at all scales.
Challenges
Geospatial work requires atypical data types (e.g., points, shapefiles, map projections), potentially many layers of detail to process and visualize, and specialized algorithms, none of which fit typical ETL (extract, transform, load) or reporting workloads.
Apache Spark's Role in Geospatial Development
While Spark might seem to be influencing the evolution of
accessory tools, it’s also becoming a default in the geospatial analytics
industry. For example, consider the development of Azavea’s open source
geospatial library GeoTrellis. GeoTrellis was written in Scala and designed to
handle large-scale raster operations. GeoTrellis recently adopted Spark as its
distributed computation engine and, in combination with Amazon Web Services,
scaled the existing raster processing to support even larger datasets. Spark
brings amazing scope to the GeoTrellis project, and GeoTrellis supplies the
geospatial capabilities that Spark lacks. This reciprocal partnership is an
important contribution to the data engineering ecosystem, and particularly to
the frameworks in development for supporting Big Data.
About GeoTrellis
GeoTrellis is a Scala library and framework
that uses Spark to work with raster data. It is released under the Apache 2
License. GeoTrellis
reads, writes, and operates on raster data as fast as possible. It implements
many Map Algebra operations
as well as vector to raster or raster to vector operations. GeoTrellis
also provides tools to render rasters into PNGs or to store metadata about
raster files as JSON. It aims to provide raster processing at web speeds
(sub-second or less) with RESTful endpoints as well as provide fast batch
processing of large raster data sets.
Getting Started
GeoTrellis is currently available for Scala 2.11 and Spark 2.0+. To get started with SBT, simply add the following to your build.sbt file:
libraryDependencies += "org.locationtech.geotrellis" %% "geotrellis-raster" % "1.0.0"
geotrellis-raster is just one submodule that you can depend on. To grab the latest snapshot build, add our snapshot repository:
resolvers += "LocationTech GeoTrellis Snapshots" at "https://repo.locationtech.org/content/repositories/geotrellis-snapshots"
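For context, a minimal complete build.sbt sketch that pulls in both the raster and Spark modules might look like the following; the project name, Scala patch version, and Spark version are assumptions chosen to satisfy the Scala 2.11 / Spark 2.0+ requirement above, not values prescribed by GeoTrellis:
name := "geotrellis-hello-raster"  // hypothetical project name
scalaVersion := "2.11.8"           // GeoTrellis 1.0.0 targets Scala 2.11

libraryDependencies ++= Seq(
  "org.locationtech.geotrellis" %% "geotrellis-raster" % "1.0.0",
  "org.locationtech.geotrellis" %% "geotrellis-spark"  % "1.0.0",
  // Spark itself is assumed to be provided by the cluster at runtime
  "org.apache.spark"            %% "spark-core"        % "2.0.2" % "provided"
)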
GeoTrellis Modules
geotrellis-proj4 : Coordinate Reference Systems and reprojection (Scala wrapper around Proj4j)
geotrellis-vector : Vector data types and operations (Scala wrapper around JTS)
geotrellis-raster : Raster data types and operations
geotrellis-spark : Geospatially enables Spark; save to and from HDFS
geotrellis-s3 : S3 backend for geotrellis-spark
geotrellis-accumulo : Accumulo backend for geotrellis-spark
geotrellis-cassandra : Cassandra backend for geotrellis-spark
geotrellis-hbase : HBase backend for geotrellis-spark
geotrellis-spark-etl : Utilities for writing ETL (Extract-Transform-Load), or "ingest", applications for geotrellis-spark
geotrellis-geotools : Conversions to and from GeoTools Vector and Raster data
geotrellis-geomesa : Experimental GeoMesa integration
geotrellis-geowave : Experimental GeoWave integration
geotrellis-shapefile : Read shapefiles into GeoTrellis data types via GeoTools
geotrellis-slick : Read vector data out of PostGIS via Lightbend Slick
geotrellis-vectortile : Experimental vector tile support, including reading and writing
geotrellis-raster-testkit : Testkit for testing geotrellis-raster types
geotrellis-vector-testkit : Testkit for testing geotrellis-vector types
geotrellis-spark-testkit : Testkit for testing geotrellis-spark code
A more complete feature list can be found in the GeoTrellis Features section at https://github.com/locationtech/geotrellis.
Hello Raster with GeoTrellis
scala> import geotrellis.raster._
import geotrellis.raster._
scala> import geotrellis.raster.op.focal._
import geotrellis.raster.op.focal._
scala> val nd = NODATA
nd: Int = -2147483648
scala> val input = Array[Int](
     |   nd, 7, 1, 1, 3, 5, 9, 8, 2,
     |    9, 1, 1, 2, 2, 2, 4, 3, 5,
     |    3, 8, 1, 3, 3, 3, 1, 2, 2,
     |    2, 4, 7, 1, nd, 1, 8, 4, 3)
input: Array[Int] = Array(-2147483648, 7, 1, 1, 3, 5, 9, 8, 2, 9, 1, 1, 2, 2, 2, 4, 3, 5, 3, 8, 1, 3, 3, 3, 1, 2, 2, 2, 4, 7, 1, -2147483648, 1, 8, 4, 3)
scala> val iat = IntArrayTile(input, 9, 4) // 9 and 4 here specify columns and rows
iat: geotrellis.raster.IntArrayTile = IntArrayTile([I@278434d0,9,4)
// The asciiDraw method is mostly useful when you're working with small tiles
// which can be taken in at a glance
scala> iat.asciiDraw()
res0: String =
" ND 7 1 1 3 5 9 8 2
9 1 1 2 2 2 4 3 5
3 8 1 3 3 3 1 2 2
2 4 7 1 ND 1 8 4 3
"
scala> val focalNeighborhood = Square(1) // a 3x3 square neighborhood
focalNeighborhood: geotrellis.raster.op.focal.Square =
O O O
O O O
O O O
scala> val meanTile = iat.focalMean(focalNeighborhood)
meanTile: geotrellis.raster.Tile = DoubleArrayTile([D@7e31c125,9,4)
scala> meanTile.getDouble(0, 0) // Should equal (1 + 7 + 9) / 3
res1: Double = 5.666666666666667
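As a quick sanity check (this is not output from a live session, just the same arithmetic applied by hand), the focal mean at an interior cell excludes NODATA neighbors in the same way the (0, 0) result above does:
scala> meanTile.getDouble(1, 1) // 3x3 neighborhood of (1, 1): nd, 7, 1, 9, 1, 1, 3, 8, 1
// NODATA is dropped, so this should equal (7 + 1 + 9 + 1 + 1 + 3 + 8 + 1) / 8 = 3.875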
Documentation
Further examples and documentation of GeoTrellis use-cases can be found in the docs/ folder. Scaladocs for the latest version of the project can be found here: http://geotrellis.github.com/scaladocs/latest/#geotrellis.package
References
Geospatial Data and Analysis by Aurelia Moser, Bill Day, and Jon Bruner. Published by O'Reilly Media, Inc., 2017.
http://geotrellis.io/
02-21-2017
10:34 PM
1 Kudo
@James Dinkel hive.llap.io.memory.mode is in the Advanced hive-interactive-site configuration tab in the Ambari UI. Could you do me a favor and check whether it shows up in that tab?
02-21-2017
10:21 PM
1 Kudo
@Mothilal marimuthu You did not specify whether you are talking about RDDs, Datasets, or DataFrames, so let's assume RDDs. An RDD is not like a columnar database where you only account for the key-value pairs that are present; it is a row-based format, so there is a cost associated with empty values. I cannot tell you the exact cost because it depends on your data types, but there is a cost. Why don't you run a test yourself: persist a small test RDD with all values populated, then one with partial values, some of them null. Again, the data type matters, so you can experiment with null values on columns of one type, then another RDD for a different type, and so on, each persisted with rdd.persist(StorageLevel.MEMORY_AND_DISK). See the sketch below.
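A minimal Scala sketch of that experiment, to run in spark-shell (the case class, record count, and field values are made up for illustration; read the actual cached sizes from the Spark UI Storage tab or getRDDStorageInfo):
import org.apache.spark.storage.StorageLevel

// a simple three-column row type; nulls stand in for the "empty values"
case class Rec(a: String, b: String, c: String)

val full   = sc.parallelize(1 to 100000).map(i => Rec("a" + i, "b" + i, "c" + i))
val sparse = sc.parallelize(1 to 100000).map(i => Rec("a" + i, null, null))

// persist both and force materialization with an action
full.persist(StorageLevel.MEMORY_AND_DISK).count()
sparse.persist(StorageLevel.MEMORY_AND_DISK).count()

// compare the cached footprint of each RDD
sc.getRDDStorageInfo.foreach(info => println(s"RDD ${info.id}: ${info.memSize} bytes in memory, ${info.diskSize} bytes on disk"))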
02-21-2017
09:49 PM
1 Kudo
@James Dinkel Ok. That still does not explain why you saw a default value other than "cache" for that variable; "cache" is the default. What about hive.llap.io.enabled? By default it is null. Try setting it to true.
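For instance, a minimal check-and-override sketch from a beeline session connected to HiveServer2 Interactive (this only affects the current session; the durable setting still lives in Ambari):
SET hive.llap.io.enabled;
SET hive.llap.io.enabled=true;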
02-21-2017
09:42 PM
1 Kudo
@Fernando Lopez Bello The connection string you posted is a JDBC string, yet you then mention ODBC over that JDBC string. Please clarify which one you are using, and which BI tool.
02-21-2017
09:38 PM
2 Kudos
@James Dinkel Is there a typo in your question above? You mention hive.llap.iomemory.mode.cache, but the correct form is: set hive.llap.io.memory.mode=cache Just checking before moving forward. What makes me believe it is a typo is that you stated the value was null, which is not correct; the default is actually "cache". That makes me think you mistyped the variable name.
02-21-2017
09:24 PM
1 Kudo
To check HDFS, run something like this from the Hive shell (replace localhost with your host):
dfs -lsr hdfs://localhost:9000/user/hive/warehouse/events;
The file extension will tell you whether the data is compressed.
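Roughly the same check can be done from an ordinary shell with the HDFS CLI (the warehouse path below is the one from the example above; point it at your table's location):
hdfs dfs -ls -R hdfs://localhost:9000/user/hive/warehouse/events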