Waggle dance: what is it?
You can obtain Waggle Dance from Maven Central:
Waggle Dance is a request-routing Hive metastore proxy that allows tables to be concurrently accessed across multiple Hive deployments. It was created to tackle the appearance of dataset silos that arose as our large organization gradually migrated from monolithic on-premises clusters to cloud-based platforms.
In short, Waggle Dance provides a unified endpoint with which you can describe, query, and join tables that may exist in multiple distinct Hive deployments. Such deployments may exist in disparate regions, accounts, or clouds (security and network permitting). Dataset access is not limited to the Hive query engine, and should work with any Hive metastore enabled platform. We’ve been successfully using it with Spark, for example.
We also use Waggle Dance to apply a simple security layer to cloud-based platforms such as Qubole, Databricks, and EMR. These currently provide no means to construct cross-platform authentication and authorization strategies. Therefore we use a combination of Waggle Dance and network configuration to restrict writes and destructive Hive operations to specific user groups and applications.
We maintain a mapping of virtual database names to federated metastore instances. These virtual names are resolved by Waggle Dance during execution and requests are forwarded to the mapped metastore instance.
Virtual database name | Mapped database name | Mapped metastore URIs |
---|---|---|
mydb | mydb | thrift://host:port/ |
So when we do the following in a Hive CLI client connected to a Waggle Dance instance:
We are actually performing the query against the thrift://host:port/ metastore. All metastore calls will be forwarded and data will be fetched and processed locally. This makes it possible to read and join data from different Hive clusters via a single Hive CLI.
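The routing step can be pictured as a lookup from a virtual database name to a metastore. The sketch below is a toy model, not Waggle Dance's implementation. `Route`, `ROUTES` and `resolve` are hypothetical names, and the single entry mirrors the example row in the table above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Route:
    """Where requests for one virtual database are forwarded (illustrative only)."""
    mapped_database: str
    metastore_uri: str


# Hypothetical in-memory copy of the mapping table above.
ROUTES: Dict[str, Route] = {
    "mydb": Route(mapped_database="mydb", metastore_uri="thrift://host:port/"),
}


def resolve(qualified_table: str) -> Tuple[str, str]:
    """Map 'virtual_db.table' to (metastore URI, 'mapped_db.table')."""
    database, sep, table = qualified_table.partition(".")
    if not sep or not table:
        raise ValueError(f"expected 'database.table', got {qualified_table!r}")
    # Hive database names are case-insensitive, so normalise before the lookup.
    route = ROUTES.get(database.lower())
    if route is None:
        raise LookupError(f"no metastore mapped for database {database!r}")
    return route.metastore_uri, f"{route.mapped_database}.{table}"


print(resolve("mydb.my_table"))  # ('thrift://host:port/', 'mydb.my_table')
```

Only the metadata lookup is redirected. As described above, the data itself is still fetched and processed by the client.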
Waggle Dance is intended to be installed and run as an always-on service. It should run on a machine that is reachable from wherever you want to query it and that also has access to the Hive metastore service(s) it federates. Waggle Dance is available as an RPM or TGZ package; installation steps for both are covered below.
The TGZ package provides a "vanilla" version of Waggle Dance that is easy to get started with but requires some additional scaffolding to turn it into a fully-fledged service.
Download the TGZ from Maven central and then uncompress the file by executing:
Although it’s not necessary, we recommend exporting the environment variable WAGGLE_DANCE_HOME, set to the directory you extracted the package to:
Refer to the configuration section below on what is needed to customise the configuration files before continuing.
Running on the command line
To run Waggle Dance execute:
Log messages will be output to the standard output by default.
The RPM package provides a fully-fledged service version of Waggle Dance.
Download the RPM from Maven Central and install it using your distribution’s packaging tool, e.g. yum :
Refer to the configuration section below on what is needed to customise the configuration files before continuing.
Running as a service
Once configured, the service needs to be started:
Currently, any change to the configuration files requires a service restart to take effect. The exception is the log4j2.xml logging configuration file, whose changes are picked up while the service is running:
In order to start using Waggle Dance it must first be configured for your environment. The simplest way to do this is to copy and then modify the template configuration files that are provided by the Waggle Dance package, i.e.:
The table below describes all the available configuration values for Waggle Dance server:
The table below describes all the available configuration values for Waggle Dance federations:
The table below describes the metastore tunnel configuration values:
The table below describes the mapped-tables configuration. Each entry in the list must specify a database name and the corresponding list of table names/patterns.
Property | Required | Description |
---|---|---|
*.mapped-tables[n].database | Yes | Name of the database which contains the tables to be mapped. |
*.mapped-tables[n].mapped-tables | Yes | List of tables allowed for the database specified in the field above. This property supports both full table names and Java RegEx patterns (both being case-insensitive). |
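The filtering rule can be sketched as follows. This is a hypothetical helper, not Waggle Dance code. It uses Python's `re` module, whose syntax overlaps with `java.util.regex` but is not identical. Treating databases that have no entry as unfiltered is an assumption of this sketch.

```python
import re
from typing import Dict, List

# Hypothetical mapped-tables configuration: database -> allowed names/patterns.
MAPPED_TABLES: Dict[str, List[str]] = {
    "etldata": ["bookings", "customer_.*"],
}


def is_table_visible(database: str, table: str) -> bool:
    """Return True if `table` in `database` passes the mapped-tables filter."""
    patterns = MAPPED_TABLES.get(database.lower())
    if patterns is None:
        # Assumption: databases without an entry are left unfiltered.
        return True
    # A plain table name is itself a regex that matches only that name, so one
    # case-insensitive full match covers both full names and patterns.
    return any(re.fullmatch(p, table, re.IGNORECASE) for p in patterns)


print(is_table_visible("etldata", "BOOKINGS"))     # True
print(is_table_visible("etldata", "customer_eu"))  # True
print(is_table_visible("etldata", "orders"))       # False
```

A full match is used so that a plain name such as `bookings` does not also expose `bookings_old`.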
A metastore’s access control configuration is controlled by the access-control-type property.
The available values of this property are described below.
Property | Description |
---|---|
READ_ONLY | Read-only access: creation of databases, updates/alters, and other data manipulation requests to the metastore are not allowed. |
READ_AND_WRITE_AND_CREATE | Reads are allowed, writes are allowed on all databases, creating new databases is allowed. |
READ_AND_WRITE_AND_CREATE_ON_DATABASE_WHITELIST | Reads are allowed, writes are allowed on database names listed in the primary-meta-store.writable-database-white-list property, creating new databases is allowed and they are added to the white-list automatically. |
READ_AND_WRITE_ON_DATABASE_WHITELIST | Reads are allowed, writes are allowed on database names listed in the primary-meta-store.writable-database-white-list and federated-meta-stores[n].writable-database-white-list properties, creating new databases is not allowed. |
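The table can be read as a small decision function. The sketch below is a hypothetical model, not Waggle Dance's implementation. Reducing requests to three operation kinds (`read`, `write`, `create_database`) is a simplification. The `whitelist` set stands in for the writable-database-white-list properties.

```python
from enum import Enum
from typing import Set


class AccessControlType(Enum):
    READ_ONLY = "READ_ONLY"
    READ_AND_WRITE_AND_CREATE = "READ_AND_WRITE_AND_CREATE"
    READ_AND_WRITE_AND_CREATE_ON_DATABASE_WHITELIST = (
        "READ_AND_WRITE_AND_CREATE_ON_DATABASE_WHITELIST"
    )
    READ_AND_WRITE_ON_DATABASE_WHITELIST = "READ_AND_WRITE_ON_DATABASE_WHITELIST"


def is_allowed(acl: AccessControlType, op: str, database: str, whitelist: Set[str]) -> bool:
    """Decide one request; `op` is 'read', 'write' or 'create_database'.

    `whitelist` holds lower-case database names and may be updated in place.
    """
    db = database.lower()
    if op == "read":
        return True  # every access control type allows reads
    if op not in ("write", "create_database"):
        raise ValueError(f"unknown operation {op!r}")
    if acl is AccessControlType.READ_ONLY:
        return False
    if acl is AccessControlType.READ_AND_WRITE_AND_CREATE:
        return True
    if op == "create_database":
        if acl is AccessControlType.READ_AND_WRITE_AND_CREATE_ON_DATABASE_WHITELIST:
            whitelist.add(db)  # new databases join the white-list automatically
            return True
        return False  # READ_AND_WRITE_ON_DATABASE_WHITELIST forbids creation
    return db in whitelist  # writes need a white-listed database


acl = AccessControlType.READ_AND_WRITE_AND_CREATE_ON_DATABASE_WHITELIST
writable = {"sales"}
print(is_allowed(acl, "write", "staging", writable))            # False
print(is_allowed(acl, "create_database", "staging", writable))  # True
print(is_allowed(acl, "write", "staging", writable))            # True
```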
There are a number of write operations in the metastore whose requests do not contain database/table name context, and so cannot be routed to federated metastore instances configured with a writable access control level.
This is not an issue for general operation, but may be a problem if you want to use certain specific Hive features. At this time these features cannot be supported in a writable federation model.
Federation configuration storage
The following properties are configured in the server configuration file (waggle-dance-server.yml) and control the behaviour of the YAML federation storage:
Configuring an SSH tunnel
As outlined above the metastore-tunnel property is used to configure Waggle Dance to use a tunnel. The tunnel route expression is described with the following EBNF:
If any machine in the tunnel expression is not included in the known_hosts file, then metastore-tunnel.strict-host-key-checking should be set to no.
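The EBNF for the tunnel route is not reproduced here. The sketch below therefore assumes a route of hops separated by `->`, where each hop is `host` or `user@host` and a missing user means the current user. `Hop`, `parse_route` and `unknown_hosts` are hypothetical helpers, not part of Waggle Dance. The known_hosts check also treats that file as a plain set of host names, ignoring hashed entries.

```python
from typing import List, NamedTuple, Optional, Set


class Hop(NamedTuple):
    user: Optional[str]  # None: fall back to the current user
    host: str


def parse_route(route: str) -> List[Hop]:
    """Parse an assumed 'user@host -> host -> ...' route into hops."""
    hops: List[Hop] = []
    for raw in route.split("->"):
        part = raw.strip()
        if not part:
            raise ValueError(f"empty hop in route {route!r}")
        user, sep, host = part.rpartition("@")
        if sep and (not user or not host):
            raise ValueError(f"malformed hop {part!r}")
        hops.append(Hop(user if sep else None, host))
    return hops


def unknown_hosts(hops: List[Hop], known_hosts: Set[str]) -> List[str]:
    """Hosts absent from known_hosts; strict host key checking would reject them."""
    return [hop.host for hop in hops if hop.host not in known_hosts]


hops = parse_route("ops@bastion-host -> hadoop@cluster-node")
print(hops)  # [Hop(user='ops', host='bastion-host'), Hop(user='hadoop', host='cluster-node')]
print(unknown_hosts(hops, {"bastion-host"}))  # ['cluster-node']
```

A non-empty result from `unknown_hosts` corresponds to the case above, where strict host key checking should be set to no.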
To add the fingerprint of remote-box into the known_hosts file, the following command can be used:
The following configuration snippets show a few examples of valid tunnel expressions.
Simple tunnel to metastore server
Simple tunnel to cluster node with current user
Bastion host to cluster node with different users and key-pairs
Bastion host to cluster node with same user
Bastion host to cluster node with current user
Bastion host to metastore via jump-box with different users and key-pairs
Waggle Dance exposes a set of metrics that can be accessed on the /metrics endpoint. These include a few standard JVM and Spring metrics, plus per-federation metrics such as the number of calls and the invocation duration per metastore. If a Graphite server is provided in the server configuration, all metrics are exposed both on the endpoint and in Graphite.
The following snippet shows a typical Graphite configuration:
Prometheus can also be used to gather metrics. This can be done by enabling the Prometheus endpoint in the configuration:
Database resolution: MANUAL
Using this example, Waggle Dance can be used to access all databases in the primary metastore and etldata / mydata from the federated metastore. The databases listed must not be present in the primary metastore, otherwise Waggle Dance will throw an error on startup. If you have multiple federated metastores listed, a database can be configured for only one of them. Following the example configuration, a query select * from etldata will be resolved to the federated metastore. Any database that is not mapped in the config is assumed to be in the primary metastore.
All non-mapped databases of a federated metastore are ignored and are not accessible.
Adding a mapped database in the configuration requires a restart of the Waggle Dance service in order to detect the new database name and to ensure that there are no clashes.
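The MANUAL rules above can be modelled as building an index once at startup and then looking names up against it. The sketch below is a hypothetical model, not Waggle Dance's code. The metastore name `federated` and the primary databases `default` and `sales` are illustrative assumptions. Clashes raise an error, mirroring the startup failure. Unmapped federated databases never enter the index, so they stay invisible.

```python
from typing import Dict, Iterable, List

PRIMARY = "primary"  # hypothetical label for the primary metastore


def build_manual_index(primary_dbs: Iterable[str],
                       federated: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each configured database to its federated metastore, rejecting clashes."""
    primary = {db.lower() for db in primary_dbs}
    index: Dict[str, str] = {}
    for metastore, databases in federated.items():
        for db in databases:
            key = db.lower()
            if key in primary:
                raise ValueError(f"{db!r} also exists in the primary metastore")
            if key in index:
                raise ValueError(f"{db!r} is configured for both {index[key]!r} and {metastore!r}")
            index[key] = metastore
    return index


def resolve_manual(index: Dict[str, str], database: str) -> str:
    """Unmapped databases are assumed to live in the primary metastore."""
    return index.get(database.lower(), PRIMARY)


index = build_manual_index(["default", "sales"], {"federated": ["etldata", "mydata"]})
print(resolve_manual(index, "etldata"))  # federated
print(resolve_manual(index, "sales"))    # primary
```

Because the index is built once, a newly mapped database is only picked up after a restart, which matches the behaviour described above.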
Database resolution: PREFIXED
In this scenario, like in the previous example, the query: select * from waggle_prod_etldata.my_table will effectively be this query: select * from etldata.my_table on the federated metastore. Any other databases which exist in the metastore named federated won’t be visible to clients.
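PREFIXED resolution amounts to stripping a known prefix and forwarding the remainder. The sketch below is a hypothetical model, not Waggle Dance's implementation. The prefix `waggle_prod_`, the metastore name `federated` and the visible database `etldata` come from the example above. Sending unprefixed names to the primary metastore is an assumption of this sketch.

```python
from typing import Dict, Set, Tuple

# Hypothetical prefix table: prefix -> (metastore name, visible remote databases).
PREFIXES: Dict[str, Tuple[str, Set[str]]] = {
    "waggle_prod_": ("federated", {"etldata"}),
}


def resolve_prefixed(database: str) -> Tuple[str, str]:
    """Return (metastore name, remote database) for a client-side database name."""
    db = database.lower()
    # Try longer prefixes first so overlapping prefixes resolve unambiguously.
    for prefix in sorted(PREFIXES, key=len, reverse=True):
        if db.startswith(prefix):
            metastore, visible = PREFIXES[prefix]
            remote = db[len(prefix):]
            if remote not in visible:
                raise LookupError(f"{remote!r} is not exposed by {metastore!r}")
            return metastore, remote
    # Assumption: unprefixed names go to the primary metastore.
    return "primary", db


print(resolve_prefixed("waggle_prod_etldata"))  # ('federated', 'etldata')
```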
Sample run through
This assumes database resolution is done by adding prefixes. If database resolution is done manually via a list of configured databases, the prefixes in this example can be omitted.
Connect to Waggle Dance:
Show databases in all your metastores:
Join two tables in different metastores:
Database Name Mapping
NOTE: mapping names adds an extra layer of abstraction, and we advise using it only as a temporary migration solution. It makes it harder to determine where a virtual (remapped) table actually comes from.
In this example, data lake X federates tables from another data lake Y. The desired end result is to show both a ‘datawarehouse’ database and a ‘booking’ database in X, each proxied to the same ‘datawarehouse’ database in Y.
To achieve a unified view of all the booking tables in the different databases (without actually renaming them in Hive) we can configure Waggle Dance in X to map from the old to the new names like so:
Note: both the ‘datawarehouse’ name and the mapped name ‘booking’ are shown in X, so the mapping adds an extra virtual database name for the same remote database. You can map only one extra name per database, and you cannot map different databases to the same name. The following is not allowed and will fail to load (invalid YAML):
This is also not allowed and will fail to load (invalid mapping):
If the optional mapped-databases filter is also used, it is applied first and the renaming is applied afterwards.
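The rules above (filter first, then rename, keep the original name, at most one alias, no shared aliases) can be sketched as a pure function. This is a hypothetical model; `virtual_databases` is not a Waggle Dance API. In YAML a duplicate mapping key is invalid, whereas a Python dict literal would silently keep the last one, so this sketch only checks for shared aliases.

```python
from typing import Dict, List, Optional


def virtual_databases(remote_dbs: List[str],
                      mapped_databases: Optional[List[str]],
                      name_mapping: Dict[str, str]) -> Dict[str, str]:
    """Return {name shown to clients: remote database name}."""
    # Different remote databases must not share one virtual name.
    targets = [alias.lower() for alias in name_mapping.values()]
    if len(targets) != len(set(targets)):
        raise ValueError("different databases cannot be mapped to the same name")
    # 1. The optional mapped-databases filter is applied first...
    if mapped_databases is not None:
        allowed = {db.lower() for db in mapped_databases}
        remote_dbs = [db for db in remote_dbs if db.lower() in allowed]
    # 2. ...then each surviving database keeps its own name and gains at most
    #    one extra alias (each remote name appears once as a mapping key).
    shown: Dict[str, str] = {}
    for db in remote_dbs:
        shown[db] = db
        alias = name_mapping.get(db)
        if alias is not None:
            shown[alias] = db
    return shown


print(virtual_databases(["datawarehouse", "other"], ["datawarehouse"],
                        {"datawarehouse": "booking"}))
# {'datawarehouse': 'datawarehouse', 'booking': 'datawarehouse'}
```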
As a Spring Boot application, Waggle Dance supports all standard actuator endpoints.
Waggle Dance uses Log4j 2 for logging. In order to use a custom Log4j 2 XML file, the path to the logging configuration file has to be added to the server configuration YAML file:
This only works when Waggle Dance is obtained from the compressed archive (.tar.gz) file. If the RPM version is being used, the default log file path is hardcoded. Refer to the RPM version section for more details.
Hive Views and prefixes
Hive UDFs and prefixes
Hive UDFs are registered with a database. There are currently two limitations in how Waggle Dance deals with them:
Due to the distributed nature of Waggle Dance, using UDFs is not straightforward. If you would like a UDF to be used from a federated metastore, we’d recommend registering the code implementing it in a distributed file or object store that is accessible from any client (for example, you could store the UDF’s jar file on S3). See creating permanent functions in the Hive documentation.
Hive metastore filter hook
Waggle Dance can be built from source using Maven:
If you would like to ask any questions about or discuss Waggle Dance please join our mailing list at
The Waggle Dance logo uses the Beetype Filled font by Adrian Candela under the Creative Commons Attribution License (CC BY).
This project is available under the Apache 2.0 License.
Copyright 2016-2019 Expedia, Inc.
About
Hive federation service. Enables disparate tables to be concurrently accessed across multiple Hive deployments.
Austrian ethologist and Nobel laureate Karl von Frisch was one of the first to decipher the meaning of the waggle dance.
Description
Waggle-dancing bees that have been in the nest for a long time adjust the angles of their dances to accommodate the changing direction of the sun. Therefore, bees that follow the waggle run are still correctly led to the food source, even though its angle relative to the sun has changed.
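The angle adjustment can be illustrated with a simplified geometric model, the usual textbook reading of the dance rather than anything stated in this text. On the vertical comb, straight up stands for the sun's azimuth, so the run's angle from vertical equals the food's bearing measured from the sun. The function names and the sample azimuths are illustrative assumptions.

```python
def dance_angle(food_bearing: float, sun_azimuth: float) -> float:
    """Waggle-run angle in degrees, clockwise from straight up on the comb."""
    return (food_bearing - sun_azimuth) % 360


def decoded_bearing(angle: float, sun_azimuth: float) -> float:
    """Compass bearing a follower recovers from the run and the current sun."""
    return (angle + sun_azimuth) % 360


food = 90.0  # food source due east
for sun in (120.0, 180.0, 240.0):  # sun moving across the sky
    angle = dance_angle(food, sun)
    print(sun, angle, decoded_bearing(angle, sun))
# The angle shifts 330.0 -> 270.0 -> 210.0 while the decoded bearing stays 90.0.
```

As the sun moves, the dancer must change the run's angle to keep pointing followers at the same place, which is the adjustment described above.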
Kevin Abbott and Reuven Dukas of McMaster University in Hamilton, Ontario, Canada, found that if a dead western honey bee is placed on a flower, bees perform far fewer waggle dances on returning to the hive. The researchers explain that the bees associate the dead bee with the presence of a predator at the food source. The reduced dance frequency thus indicates that dancing bees perform, and communicate, a form of risk/benefit analysis.
"On each expedition the bee does not fly from a flower of one kind to a flower of another, but flies from one violet, say, to another violet, and never meddles with another flower until it has got back to the hive; at the hive they throw off their load, and each returning bee is followed by three or four companions. What it is that they gather is hard to see, and how they do it has not been observed."
Jürgen Tautz also writes about this in his book The Buzz about Bees (2008):
Many of the communication elements used to recruit forager mini-swarms to foraging sites are also observed in "true" swarming behavior. Forager mini-swarms are not subject to the same selection pressure as true swarms, because the fate of the whole colony is not at stake. A true swarm must be led quickly to a new home, or it will die. The behavior used to recruit bees to food sources may have evolved from "true" swarming.
Mechanism
Controversy
Dance language versus waggle dance
Bees that follow a waggle dance can forage successfully without decoding the dance-language information, in several ways:
Dance language as a language
Efficiency and adaptation
The waggle dance may be less effective than previously thought. Some bees observe more than 50 waggle runs without foraging successfully, while others forage successfully after observing 5 runs. Similarly, studies have shown that bees rarely use the information conveyed by the waggle dance, apparently doing so in only ten percent of cases. There is evidently a conflict between private information, or individual experience, and the social information conveyed through dance communication. This suggests that following social information costs more energy than foraging independently and is not always advantageous. Using olfactory cues and memory of rich foraging sites, bees can forage successfully on their own, without spending the potentially considerable energy needed to process and follow the directions communicated by their fellow foragers.
The waggle dance is useful in some environments but not in others, which offers a plausible explanation for why the information it provides is used so rarely. Depending on the weather, other competitors, and the characteristics of the food source, the communicated information can degrade quickly and become outdated. As a result, foragers are reported to remain faithful to their foraging sites and to keep revisiting a patch after it has become unprofitable. The waggle dance plays a considerably larger role in foraging when food sources are scarce. For example, in temperate environments, honey bee colonies usually perform the waggle dance but were still able to forage successfully when the location information provided by the dance was experimentally concealed. In tropical habitats, however, honey bee foraging suffers when the dance's location information is disrupted. This difference is thought to stem from the patchiness of resources in tropical environments compared with their uniformity in temperate ones. In the tropics, food resources may take the form of flowering trees that are rich in nectar but scarce and bloom only briefly. Information about the location of forage may therefore be more valuable in tropical zones than in temperate ones.
Evolution
The ancestors of modern honey bees most likely performed excitatory movements to encourage their nestmates to forage. These excitatory movements include shaking, zigzagging, buzzing, and bumping into nestmates. Similar behavior is seen in other Hymenoptera, including stingless bees, wasps, bumblebees, and ants.
One promising theory of the evolution of the waggle dance, first proposed by Martin Lindauer, is that the dance originally served to convey information about a new nesting site rather than spatial information about foraging sites.
Observations have shown that different honey bee species have different "dialects" of the waggle dance, with each species' or subspecies' dance differing in curvature or duration. A 2008 study found that a mixed colony of Asian honey bees (Apis cerana cerana) and European honey bees (Apis mellifera ligustica) gradually came to understand each other's waggle-dance "dialects".
Operations research applications
In this paper we present BeeHive, a new routing algorithm inspired by the communicative and evaluative methods and procedures of honey bees. In this algorithm, bee agents travel through network regions called foraging zones. Along the way, the network-state information they carry is delivered to update the local routing tables. BeeHive is fault-tolerant, scalable, and relies completely on local or regional information, respectively. We demonstrate through extensive simulations that BeeHive achieves similar or better performance compared with state-of-the-art algorithms.
Another bee-inspired stigmergic computational technique, called bee colony optimization, is used in Internet server optimization.
The Zigbee RF protocol is named after the waggle dance.
See also in other dictionaries:
Waggle dance — is a term used in beekeeping and ethology for a particular figure eight dance of the honey bee. By performing this dance, successful foragers can share with their hive mates information about the direction and distance to patches of flowers… … Wikipedia
waggle dance — (ARTHROPODA: Insecta) A dance performed by honeybees indicating source and location of a good source … Dictionary of invertebrate zoology
waggle dance — noun A dance in the form of a figure eight performed by the honey bee in order to communicate the direction and distance of patches of flowers, water sources, etc … Wiktionary
waggle dance — a series of patterned movements performed by a scouting bee, communicating to other bees of the colony the direction and distance of a food source or hive site. * * * … Universalium
waggle dance — a series of patterned movements performed by a scouting bee, communicating to other bees of the colony the direction and distance of a food source or hive site … Useful english dictionary
Dance of the bee — For the honeybee dance, see Waggle dance. The dance of the bee or dance of the wasp was a provocative Egyptian dance, part of the repertoire of the dancing girls of the Ghawazee. It was perhaps not unlike the famous Dance of the seven veils. In… … Wikipedia
Tremble dance — A tremble dance is a dance performed by receiver honey bees of the species Apis mellifera to recruit more receiver honey bees to collect nectar from the workers. [Ratnieks, F. L. W. (2001) Are you being served? Supermarkets and bee hives. The… … Wikipedia
Round dance (honey bee) — Round dance is a term used in bee keeping, entomology and communication of social insects. It describes a specific communicative behavior of honey bees inside the bee hive. When a forager or scout bee returns to the hive she performs a round… … Wikipedia
Bee learning and communication — Honey bees learn and communicate in order to find food sources and for other means. Learning is essential for efficient foraging. Honey bees are unlikely to make many repeat visits if a plant provides little in the way of reward. A single … Wikipedia
Charles F. Hockett — Charles Francis Hockett, born January 17, 1916, Columbus, Ohio, United States; died November 3, 2000 … Wikipedia
Bees algorithm — The Bees Algorithm is a population based search algorithm first developed in 2005. [Pham DT, Ghanbarzadeh A, Koc E, Otri S, Rahim S and Zaidi M. The Bees Algorithm. Technical Note, Manufacturing Engineering Centre, Cardiff University, UK, 2005]… … Wikipedia
Federating Hive with Waggle Dance
Hotels.com has recently contributed the Waggle Dance project to the open-source community. The unusual name is taken from a dance that bees perform for the rest of their colony, helpfully directing them to sources of food. This is pertinent because the purpose of our project is to direct Apache Hive based data consumers to sources of data.
At its core Waggle Dance is a request-routing proxy that allows datasets to be concurrently accessed across multiple Hive deployments. It was created to tackle the appearance of dataset silos that arose as our large organization gradually migrated from monolithic on-premises big data clusters to cloud-based platforms. As our brands and teams raced to the cloud, they often set up their own Hive infrastructure to retain their agility and maintain their pace. However, the data that these groups then produced could not easily be discovered or accessed by others, potentially limiting its value.
A fundamental component of the Hive data warehousing system is the metastore: a service that is responsible for maintaining metadata for all the datasets in the warehouse. This ‘data-about-data’ typically describes dataset schemas, encodings, the file store locations of the raw data, and also optimization hints such as table and column statistics. Data consumers rely on this metadata to find, understand, and efficiently process their target dataset. The metastore has proved so convenient that its use is not limited to Hive, and there are mature integrations with many other popular data processing platforms (Spark, Flink, Cascading). In cloud environments the metastore provides another benefit: it delivers consistent data access patterns over eventually consistent file stores. However, in our cloud of multiple Hive deployments, the monolithic and isolated nature of the metastore is extremely limiting. Our consumers need to access and analyse data maintained in multiple metastores from a single Hive instance, and Hive’s architecture does not support this.
Waggle Dance solves this by providing a unified endpoint with which you can describe, query, and join tables that may exist in multiple distinct Hive deployments. Such deployments may exist in disparate regions, accounts, or even clouds (security, network, and the laws of physics permitting). Dataset access is not limited to the Hive query engine, and should work with any Hive metastore enabled platform. We’ve been successfully using it with Apache Spark, for example.
We also use Waggle Dance to apply a simple security layer to cloud based platforms such as Qubole, DataBricks, and EMR. These currently provide no means to construct cross platform authentication and authorization strategies. Therefore we use a combination of Waggle Dance and network configuration to restrict writes and destructive Hive operations to specific user groups and applications.