Total found: 5405. Displayed: 200.

Publication date: 08-06-2018

Self-diagnosing onboard computing system with standby (replacement) redundancy

Number: RU2657166C1

The invention relates to computing technology and can be used in systems of various purposes that require high reliability and radiation hardness. The technical result is a reduction in the time needed to bring into operation a standby system that is in the powered-off state, while at the same time ensuring high reliability, fault tolerance, and radiation hardness. A similar standby system is added to the self-diagnosing onboard computing system, which contains a main system; each of the systems has two identical channels, a main one and a standby one, and a secondary power connection circuit and a redundancy device are added to each channel. In each channel, the inputs of the redundancy device are connected to the processor output, to the system clock generator output, and to the output of the initial setup circuit. The processor output is connected to an input of the switch, the second input of which is connected to the output of the initial setup circuit. The output of the secondary power supply is connected to the input of the circuit for connecting the secondary ...

Publication date: 10-10-2012

Data replication for a backup entity

Number: GB0002485866B

Publication date: 05-07-2000

Fault detection in a redundant multi-processor system

Number: GB0002345364A

A Triple Modular Redundancy (TMR) unit 10 comprises a plurality of processors connected by a bus and simultaneously executing the same processing operation. One of the processors is a master processor and the remaining processors are slaves. Information formed only by the master processor is outputted to the bus. Each processor has a multiplex control circuit 48 which compares the output information formed by the respective processor with the information outputted to the bus, thereby detecting a failure and allowing an internal circuit 46 to execute necessary processes. Various aspects of the TMR unit are described, but the claims relate to using an existence processor display flag circuit (340, Fig. 27A) indicating which processor(s) is/are normally operating among the plurality of processors constructing the multiplex unit and which processor(s) is/are disconnected from the multiplex unit due to a failure or the like.

Publication date: 17-08-1983

COMMUNICATION SYSTEM

Number: GB0008319223D0

Publication date: 18-03-2020

Gaming table monitoring apparatus

Number: GB0202001498D0

Publication date: 16-10-1984

DISTRIBUTED SIGNAL PROCESSING SYSTEM

Number: CA0001176337A1
Inventor: WORKS GEORGE A

Publication date: 20-05-1986

METHOD AND APPARATUS FOR THE SELECTION OF REDUNDANT SYSTEM MODULES

Number: CA0001204875A1
Inventor: TOWNSEND GREG M

Publication date: 30-03-1993

FAULT-TOLERANT OUTPUT CIRCUITS

Number: CA0001315407C
Assignee: TRIPLEX

Circuit modules for providing digital or analog outputs from computational devices in such a manner that the components of the output circuit modules are tolerant of malfunctions in one or more of the components. In the digital output embodiment of the invention, output signals are independently derived using two voting circuits and are then applied to two switches connected in series to provide a fail-safe condition for most types of failure of the switches or the voting circuits. Two identical modules provide the ability to faithfully follow commanded on or off signals in all but a statistically small number of situations, and permit convenient replacement of a defective module without affecting output through the other module. In an analog output module, two independent voting circuits provide voted digital outputs to separate digital-to-analog converters, the outputs of which are compared to generate a validity signal that is used to control an output ...

Publication date: 06-08-1974

FAULT DETECTION AND HANDLING ARRANGEMENTS FOR USE IN DATA PROCESSING SYSTEMS

Number: CA952627A

Publication date: 25-10-1983

CLUSTER OF DATA-ENTRY TERMINALS

Number: CA0001155961A1

Publication date: 21-02-2008

MATCH SERVER FOR A FINANCIAL EXCHANGE HAVING FAULT TOLERANT OPERATION

Number: CA0002659395A1

Fault tolerant operation (104) is disclosed for a primary match server of a financial exchange using an active copy-cat instance, a.k.a. backup match server, that mirrors operations in the primary match server, but only after those operations have successfully completed in the primary match server. Fault tolerant logic monitors inputs and outputs of the primary match server and gates those inputs to the backup match server once a given input has been processed. The outputs of the backup match server are then compared with the outputs of the primary match server to ensure correct operation. The disclosed embodiments further relate to a fault tolerant failover mechanism allowing the backup match server to take over for the primary match server in a fault situation wherein the primary and backup match servers are loosely coupled, i.e., they need not be aware that they are operating in a fault tolerant environment. As such, the primary match server need not be specifically designed or ...

Publication date: 02-12-1991

METHOD FOR MODIFYING A FAULT-TOLERANT PROCESSING SYSTEM

Number: CA0002043555A1
Inventor: REYNDERS, PAUL T. M.

A method for modifying a fault-tolerant processing system (FTS) including a pair of partner sets of two processors (PA1/PA2; PB1/PB2) operating in microsynchronization at a first or low processing frequency (FL) and connected to a respective system bus (BA; BB) operating at a bus clock frequency (FB) lower than the first processing frequency (FL). The method consists in: - selecting the system bus (BA) associated to one of the sets of "slow" processors (PA1/PA2); - replacing the other set of "slow" processors (PB1/PB2) by a set of "fast" processors (PB1'/PB2'); - synchronizing the operation of the remaining "slow" set with that of the "fast" set by: - executing by each set (PA1/PA2; PB1'/PB2') one processor cycle during a first cycle (T1) of the bus clock frequency (FB) and generating a synchronization signal (SA1; SB1) at the end of this first bus clock cycle; - executing during each following bus clock cycle (T2-T6) an additional processor cycle and generating a synchronization signal ...

Publication date: 20-06-1995

METHOD FOR MODIFYING A FAULT-TOLERANT PROCESSING SYSTEM

Number: CA0002043555C
Assignee: ALCATEL NV, ALCATEL N.V.

A method for modifying a fault-tolerant processing system (FTS) including a pair of partner sets of two processors (PA1/PA2; PB1/PB2) operating in microsynchronization at a first or low processing frequency (FL) and connected to a respective system bus (BA; BB) operating at a bus clock frequency (FB) lower than the first processing frequency (FL). The method consists in: - selecting the system bus (BA) associated to one of the sets of "slow" processors (PA1/PA2); - replacing the other set of "slow" processors (PB1/PB2) by a set of "fast" processors (PB1'/PB2'); - synchronizing the operation of the remaining "slow" set with that of the "fast" set by: - executing by each set (PA1/PA2; PB1'/PB2') one processor cycle during a first cycle (T1) of the bus clock frequency (FB) and generating a synchronization signal (SA1; SB1) at the end of this first bus clock cycle; - executing during each following bus clock cycle (T2-T6) an additional processor cycle and generating a synchronization signal ...

Publication date: 08-12-1996

FAIL-FAST, FAIL-FUNCTIONAL, FAULT-TOLERANT MULTIPROCESSOR SYSTEM

Number: CA0002178392A1

A multiprocessor system includes a number of sub-processor systems, each substantially identically constructed, and each comprising a central processing unit (CPU), and at least one I/O device, interconnected by routing apparatus that also interconnects the sub-processor systems. A CPU of any one of the sub-processor systems may communicate, through the routing elements, with any I/O device of the system, or with any CPU of the system. Communication between I/O devices and CPUs is by packetized messages. Interrupts from I/O devices are communicated from the I/O devices to the CPUs (or from one CPU to another CPU) as message packets. CPUs and I/O devices may write to, or read from, memory of a CPU of the system. Memory protection is provided by an access validation method maintained by each CPU in which CPUs and/or I/O devices are provided with a validation to read/write memory of that CPU, without which memory access is denied.

Publication date: 23-03-2007

METHOD FOR MONITORING THE CORRECT OPERATION OF A COMPUTER

Number: FR0002891069A1
Inventor: ROUSSEL PIERRE

The present invention relates to computers that execute, in time-sharing mode and under the control of their operating systems, several distinct and independent application programs. It concerns in particular networks of onboard computers of the IMA type executing application programs that are written independently of the hardware characteristics of the computers and do not reside permanently in the computers. It consists in associating with the digital core of each computer in the network an independently operating monitoring automaton, and in having the monitoring automaton check that the associated computer correctly follows the time scheduling of tasks and the allocation of memory partitions. In addition, the monitoring automata can be configured to run monitoring service applications of the missed-rendezvous or watchdog type, to which the application programs executed by the computers of the network can subscribe ...

Publication date: 14-06-2013

METHOD AND COMPUTER PROGRAM FOR MANAGING MULTIPLE FAILURES IN A COMPUTER INFRASTRUCTURE COMPRISING HIGH-AVAILABILITY EQUIPMENT

Number: FR0002984053A1
Assignee: BULL SAS

The invention relates in particular to the management of multiple failures in a computer infrastructure comprising high-availability equipment, some of which form a first high-availability group with which a first level is associated. Subsets of equipment of this first group form second high-availability groups with which a second level is associated. After a failure of a piece of equipment of the first group has been detected (605), a high-availability solution is sought (610) in groups with which the first level is associated and which comprise the failed equipment. If no solution is identified (620), a solution is sought (630, 610) in groups with which the second level is associated and which comprise the failed equipment.

Publication date: 28-05-2020

APPARATUS AND METHOD FOR REDUCING REPEATED ACCESS TO THE SAME BLOCK OF THE MEMORY SYSTEM DURING RECOVERY PROCEDURE

Number: KR1020200058867A

Publication date: 09-07-2009

METHODS AND SYSTEMS FOR GENERATING AVAILABILITY MANAGEMENT FRAMEWORK (AMF) CONFIGURATIONS

Number: WO2009083827A1

Techniques for generating a system model for use by an availability management framework (AMF) are described. Inputs are received, processed and mapped into outputs which are further processed into a configuration file in an Information Model Management (IMM) Service eXtensible Markup Language (XML) format which can be used as a system model by an AMF.

Publication date: 01-04-1999

INTEGRATED INTERFACE FOR WEB BASED CUSTOMER CARE AND TROUBLE MANAGEMENT

Number: WO1999015975A1

A system and method for opening and tracking trouble tickets over the public Internet. A customer service management system (40) provides information included within a customer profile record to a Web enabled infrastructure (30) which is accessible by a remote customer workstation (20) having a web browser (14) and Internet access (15). The customer profile information is used to prepopulate data fields in dialogs used to open a trouble ticket. Once a trouble ticket is opened, the customer workstation (20) tracks the existing trouble tickets through a browser based graphical user interface (240). The graphical user interface (240) provides current and historical status reports of the actions taken to resolve a network event and the service organizations responsible for resolving the network event.

Publication date: 28-05-2019

Read descriptors at heterogeneous storage systems

Number: US0010303795B2
Assignee: Amazon Technologies, Inc., AMAZON TECH INC

In response to a read request directed to a first data store of a storage group, a state transition indicator is identified, corresponding to a modification that has been applied at the data store before a response to the read is prepared. A read descriptor that includes the state transition indicator and read repeatability verification metadata is prepared. The metadata can be used to check whether the read request is a repeatable read. The read descriptor is transmitted to a client-side component of the storage group.

Publication date: 11-04-2017

Biasing active-standby determination

Number: US0009619349B2

In computing systems that provide multiple computing domains configured to operate according to an active-standby model, techniques are provided for intentionally biasing the race to gain mastership between competing computing domains, which determines which computing domain operates in the active mode, in favor of a particular computer domain. The race to gain mastership may be biased in favor of a computing domain operating in a particular mode prior to the occurrence of the event that triggered the race to gain mastership. For example, in certain embodiments, the race to mastership may be biased in favor of the computing domain that was operating in the active mode prior to the occurrence of an event that triggered the race to gain mastership.

Publication date: 06-07-2017

CONTROL AND ADDRESS REDUNDANCY IN STORAGE BUFFER

Number: US20170192842A1
Assignee: Arteris, Inc.

A system and method for detecting writes of data to errant locations in storage arrays. Address information and information redundant with address information is encoded and stored in proximity with data. Upon reading the stored data, the corresponding address information is decoded and compared to the address of the intended read. A mismatch indicates a possible write to an errant location.
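
The check described in this abstract can be illustrated with a minimal Python sketch. It is not the patented circuit: the `ProtectedArray` class, the 8-bit XOR-fold address code in `address_check`, and the `AddressMismatch` error are hypothetical names and choices made only for illustration. Each stored word carries the address it was intended for plus a redundant code over that address, and a read compares the decoded address information with the address being read.

```python
from typing import List, Optional, Tuple


class AddressMismatch(Exception):
    """Raised when stored address information disagrees with the read address."""


def address_check(addr: int) -> int:
    # Hypothetical redundancy code: XOR-fold the address into 8 bits.
    code = 0
    while addr:
        code ^= addr & 0xFF
        addr >>= 8
    return code


class ProtectedArray:
    def __init__(self, size: int) -> None:
        # Each slot holds (data, intended_address, address_code) or None.
        self.slots: List[Optional[Tuple[int, int, int]]] = [None] * size

    def write(self, addr: int, data: int, physical_addr: Optional[int] = None) -> None:
        # physical_addr simulates a faulty address path that lands elsewhere.
        target = addr if physical_addr is None else physical_addr
        self.slots[target] = (data, addr, address_check(addr))

    def read(self, addr: int) -> int:
        entry = self.slots[addr]
        if entry is None:
            raise KeyError(addr)
        data, stored_addr, code = entry
        # A mismatch indicates a possible write to an errant location.
        if stored_addr != addr or code != address_check(stored_addr):
            raise AddressMismatch(f"slot {addr} holds data written for {stored_addr}")
        return data
```

Calling `write(5, value, physical_addr=7)` models an errant write; a later `read(7)` then raises `AddressMismatch` instead of silently returning the wrong data.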

Publication date: 04-01-1994

Multi processor sorting network for sorting while transmitting concurrently presented messages by message content to deliver a highest priority message

Number: US0005276899A

A multiprocessor system intercouples the processors with an active logic network having a plurality of priority determining nodes. Messages applied concurrently to the network in groups are sorted, using the data content of the messages, to a single or common priority message which is distributed to all the processors with a predetermined total network delay time. Losing messages are again retried concurrently in groups at a later time. Message routing is determined by local acceptance or rejection of messages at the processors, based upon destination data in the messages. All messages occupy places in a coherent priority scheme and are transferred in contending groups with prioritization on the network. Using data, status, control and response messages, and different multiprocessor modes, the system is particularly suited for configuration in a relational data base machine having capability for maintaining an extended data base and handling complex queries.

Publication date: 25-08-2016

APPLICATION CACHE REPLICATION TO SECONDARY APPLICATION(S)

Number: US20160246719A1

Replicating a primary application cache that serves a primary application on one network node into a secondary application cache that serves a secondary application on a second network node. Cache portions that are within the primary application cache are identified, and then identifiers (but not the cache portions) are transferred to the second network node. Once these identifiers are received, the cache portions that they identify may then be retrieved into the secondary application caches. This process may be repeatedly performed such that the secondary application cache moves towards the same state as the primary application cache though the state of the primary application cache also changes as the primary application operates by receiving read and write requests.

Publication date: 21-05-2009

FAULT-TOLERANT DISTRIBUTED SERVICES METHODS AND SYSTEMS

Number: US2009132716A1

Methods and apparatuses are provided for use in fault-tolerant distributed services. One method includes establishing a plurality of server processes each associated with different non-overlapping subspace range of a distributed data structure, associating a data object with a corresponding server process based, at least in part, on mapping the data object to the subspace range associated with the server process, and manipulating the data object using the server processes.

Publication date: 06-06-2019

MANAGING BIG DATA ON DOCUMENT BASED NoSQL DATABASES

Number: US20190171532A1

A document management system including a document manager connected to a document storage, and a backup coordinator connected to the document manager and adapted to: continuously receive a plurality of notifications from the document manager, each including information describing a change in a document stored in the document manager's document storage; and for each notification in the plurality of notifications: select a backup agent from a plurality of backup agents connected to the backup coordinator; send a backup request to the backup agent including the information, for the purpose of updating a copy of the document stored in a backup storage connected to the backup agent; wait for an acknowledgement message from the backup agent; and send another backup request to another backup agent selected from the plurality of backup agents upon a failure to receive the acknowledgement message within an identified amount of time after sending the backup request.
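
The acknowledge-or-retry behaviour of the backup coordinator can be sketched as follows. This is an illustrative model under stated assumptions, not the claimed implementation: a backup agent is represented by a hypothetical callable that returns `True` if it acknowledged within the timeout, and `dispatch_backup` simply tries agents in insertion order, whereas a real coordinator could use any selection policy.

```python
from typing import Callable, Dict, Optional

# Hypothetical agent interface: receives the change notification and the
# acknowledgement timeout in seconds, returns True if it acknowledged in time.
BackupAgent = Callable[[dict, float], bool]


def dispatch_backup(change: dict,
                    agents: Dict[str, BackupAgent],
                    ack_timeout: float = 1.0) -> Optional[str]:
    """Send the backup request to one agent at a time until one acknowledges.

    Returns the name of the acknowledging agent, or None if every agent
    failed to acknowledge within ack_timeout.
    """
    for name, agent in agents.items():
        if agent(change, ack_timeout):
            return name          # acknowledged: the copy is updated
        # No acknowledgement in time: fall through and try another agent.
    return None
```

A caller would invoke `dispatch_backup` once per change notification received from the document manager.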

Publication date: 01-05-2018

Node, arithmetic processing device, and arithmetic processing method

Number: US0009959173B2
Assignee: FUJITSU LIMITED, FUJITSU LTD

A node includes: an arithmetic processing device; and a first memory, wherein the arithmetic processing device includes: a processor core; a storing circuit to store a first failure node list in which first information indicating that a failure has occurred or second information indicating that no failure has occurred is set for each of nodes; a request issuing circuit to issue a first request to a second memory provided at a first node among the nodes; a setting circuit to set the first information for the first node in the first failure node list when the first request has timed out; and an issuance inhibition circuit to inhibit, based on a second request to the second memory from the processor core, the second request from being issued by the request issuing circuit when the first information is set for the first node in the first failure node list.
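
A software analogue of the failure node list may make the mechanism easier to follow. The sketch below is an assumption-laden illustration, not the hardware described in the patent: `RequestIssuer` and its `send` callback are hypothetical, and Python's built-in `TimeoutError` and `ConnectionError` stand in for the timeout and the inhibited request.

```python
class RequestIssuer:
    def __init__(self, send):
        # send(node, request) is a hypothetical transport that returns a
        # reply or raises TimeoutError when the remote memory does not answer.
        self.send = send
        self.failed = set()   # plays the role of the failure node list

    def issue(self, node, request):
        # Inhibit the request if the node is already marked as failed.
        if node in self.failed:
            raise ConnectionError(f"node {node} is marked failed; request inhibited")
        try:
            return self.send(node, request)
        except TimeoutError:
            # First timeout: record the failure so later requests are not issued.
            self.failed.add(node)
            raise
```

After the first timeout, later requests to the same node fail immediately without touching the transport, so a processor core does not stall repeatedly on a dead node.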

Publication date: 23-01-2007

Transaction processing apparatus and method

Number: US0007168001B2

A method of processing a transaction includes processing a transaction workload in a primary process pair on a first node in a cluster of nodes, the processing using at least one stable storage volume for storing a database and another stable storage volume for storing a log, the at least one stable storage volume and the log storage volume forming a log storage group. The method further includes performing checkpointing operations via the network from the primary process pair to a backup process pair while processing the transaction workload, the backup process pair operating on a second node in the cluster of nodes. The method further includes detecting a failure making the first node inoperable or inaccessible, and after detecting the failure, engaging the backup process pair to take over the transaction processing workload of the primary process pair, the backup process pair being configured to operate with the log storage group used by the primary process pair on the failed node.

Publication date: 10-06-2021

HOSTING VIRTUAL MACHINES ON A SECONDARY STORAGE SYSTEM

Number: US20210173698A1

At least a portion of a virtual machine is hosted on at least one node of a first subset of a plurality of nodes of a secondary storage system. The virtual machine comprises a plurality of portions that can be distributed between the plurality of nodes and is configured into a first state of a plurality of states, such that, in the first state, the plurality of portions is distributed between a first subset of the plurality of nodes and each of the first subset of nodes stores a portion of the virtual machine in its corresponding storage device. A node from the second subset of the plurality of nodes to host the virtual machine in a second state of the plurality of states is selected based on at least one of storage, memory or processing resources of one or more nodes of a second subset of the plurality of nodes.

Publication date: 20-07-2016

System redundancy verification method and computer system

Number: JP0005955977B2

Publication date: 27-08-2001

LOW-COST, HIGH-RELIABILITY ELECTRONIC EQUIPMENT COMPLEX WITH MODULAR ARCHITECTURE FOR PILOTING AIRCRAFT

Number: RU99120177A

... 1. An electronic equipment complex with a modular architecture, intended for controlling an industrial process, comprising, on the one hand, receiving units (1, 2) containing data acquisition modules (24, 25) and data processing modules (22-24) that are powered by power supply modules (21), and, on the other hand, displays (3-7) for presenting critical information, which are connected by transmitting devices to critical data sensors (8-11), non-critical data sensors (13), and actuators (12), characterized in that the critical data sensors (8-11) transmit the critical information, on the one hand, directly to the displays (3-7) for presenting critical information and, on the other hand, to the data acquisition and processing modules (24, 25) of the receiving units (1, 2), wherein the data acquisition modules (24, 25), on the basis of information from the critical (8-11) and non-critical (13) information sensors, transmit over a multichannel serial digital bus (18, 19) non-critical ...

Publication date: 30-05-2012

Replication of state data of processing tasks from an active entity to a backup entity

Number: GB0002485866A

An active entity executes a number of processing tasks. Each task persists for a finite time period. A backup entity is configured to take over processing from the active entity if the active entity fails. When a trigger indicates that the state of the tasks on the active entity should be replicated on the backup entity, any new processing tasks are replicated between the entities. Tasks that are already executing on the active entity are not replicated until after a delay period. Then any of those tasks that are still executing are replicated. The delay may be dependent on the average duration of the tasks or on the resource usage on the entities. The delay period may change and may be different for different tasks. The tasks may be handling telephone calls and may be SIP processes.
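
The delayed-replication rule in this abstract can be expressed as a small selection function. The sketch is a simplified model under assumptions: the `Task` record, the use of plain numeric timestamps, and a single fixed `delay` are hypothetical, whereas the abstract allows the delay to vary per task or with resource usage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Task:
    name: str
    start: float
    end: Optional[float] = None  # None while the task is still running

    def running_at(self, t: float) -> bool:
        return self.start <= t and (self.end is None or self.end > t)


def tasks_to_replicate(tasks: List[Task], trigger: float,
                       delay: float, now: float) -> List[str]:
    """Names of tasks whose state should be on the backup at time `now`."""
    selected = []
    for task in tasks:
        if not task.running_at(now):
            continue                      # finished tasks are never copied
        if task.start >= trigger:
            selected.append(task.name)    # new task: replicate immediately
        elif now >= trigger + delay:
            selected.append(task.name)    # older task that outlived the delay
    return selected
```

Short-lived tasks that were already running at the trigger usually finish before the delay expires, so their state is never copied, which is the saving the abstract describes.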

Publication date: 06-03-2013

Migrating virtual machines among networked servers upon detection of degrading network link operation

Number: GB0002494325A

Migrating virtual machines among networked servers, the servers coupled for data communications with a data communications network that includes a networking device, where migrating includes: establishing, by a virtual machine management module ('VMMM'), one or more virtual machines on a particular server; querying, by the VMMM, the networking device for link statistics of a link coupling the network device to the particular server for data communications; determining, by the VMMM in dependence upon the link statistics, whether the link coupling the network device to the particular server is degrading; and if the link coupling the network device to the particular server is degrading, migrating a virtual machine executing on the particular server to a destination server. In some embodiments, migration is carried out only if a non-degrading link is available. If no non-degrading links are available, the network device, rather than the link, may be failing.

Publication date: 05-03-1986

A MULTIPLE DATA PROCESSING SYSTEM

Number: GB0002122393B

Publication date: 09-02-2000

Information processing system

Number: GB0009929773D0

Publication date: 25-03-2004

Preserving consistency of passively-replicated non-deterministic objects

Number: AU0000771514B2

Publication date: 26-05-1992

OPERATIONS CONTROLLER FOR A FAULT TOLERANT MULTIPLE NODE PROCESSING SYSTEM

Number: CA0001301938C
Assignee: ALLIED CORP, ALLIED CORPORATION

An operations controller (12) for a multiple node fault tolerant processing system having a transmitter (30) for transmitting inter-node messages, a plurality of receivers (32a...32n), each receiving inter-node messages from only one of the nodes and a message checker (34) for checking each received message for physical and logical errors. A fault tolerator (36) assembles all of the errors detected and decides which nodes are faulty based on the number and severity of the detected errors. A voter (38) generates a voted value for each value which is received from the other nodes which is stored in a data memory (42) by a task communicator (44). A scheduler (40) selects the tasks to be executed by an applications processor (14) which is passed to the task communicator (44). The task communicator (44) passes the selected task and the data required for the execution of that task to the applications processor (14) and transmits the data resulting from that task to all of the nodes in the system ...

Publication date: 30-09-2020

COMPUTERIZED SYSTEM INTERLOCKING AND METHOD OF ITS RESERVE SWITCHING

Number: EA0202091031A1

Publication date: 15-01-1982

DISTRIBUTED SIGNAL PROCESSING SYSTEM

Number: FR0002486682A
Inventor: GEORGE ALLEN WORKS

A signal processing system, characterized in that it comprises several elements, including a signal processor, a mass memory, and an input/output controller; means for interconnecting said several elements so as to form a distributed signal processing system; and means for interconnecting several of the systems thus formed.

Publication date: 12-06-1990

Operations controller for a fault tolerant multiple node processing system

Number: US0004933940A1
Assignee: Allied-Signal Inc.

A fault tolerator for an operations controller of a multiple node fault tolerant processing system having a data memory for storing the content of all received error free messages, an error file for storing the content of all received inner node error reports, an error handler for generating a base penalty count for each node based on the content of the errors recorded in the error file and for excluding each node from the operation of the multiple node processing system whose base penalty count exceeds an exclusion threshold. The fault tolerator also includes a synchronizer interface for passing the selected fields of the received messages to a synchronizer, a scheduler interface for passing selected information to a scheduler, and a message interface which stores the error free messages in the data memory and passes the selected fields of the messages to the synchronizer.
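
The penalty-count exclusion rule can be illustrated with a short sketch. The error types, their weights in `PENALTY`, and the value of `EXCLUSION_THRESHOLD` are hypothetical; the abstract only states that a base penalty count is derived from the recorded errors and that a node is excluded when its count exceeds a threshold.

```python
from collections import defaultdict

# Hypothetical penalty weights per error type and exclusion threshold.
PENALTY = {"parity": 1, "timing": 2, "wrong_value": 4}
EXCLUSION_THRESHOLD = 6


def excluded_nodes(error_reports, penalty=PENALTY, threshold=EXCLUSION_THRESHOLD):
    """Return the set of nodes to exclude.

    error_reports is an iterable of (accused_node, error_type) pairs,
    standing in for the contents of the error file.
    """
    counts = defaultdict(int)
    for node, error_type in error_reports:
        counts[node] += penalty[error_type]
    # A node is excluded once its penalty count exceeds the threshold.
    return {node for node, count in counts.items() if count > threshold}
```

Weighting errors by severity means that one serious fault can exclude a node faster than many minor ones, while a count exactly at the threshold does not yet trigger exclusion.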

Publication date: 21-05-2015

DATA CONFIGURATION AND MIGRATION IN A CLUSTER SYSTEM

Number: US20150143066A1

A cluster system includes a plurality of computing nodes connected to a network. Each node is configured to access its own storage device, and to send and receive input/output (I/O) operations associated with its own storage device. Further, each node of the plurality of nodes may be configured to have a function of acting as a first node, which sends a first message to other nodes of the plurality of nodes. The first message may include configuration information indicative of a data placement of data on the plurality of nodes in the cluster system according to an event. Following receipt of the first message from the first node, each of the other nodes may be configured to determine, based at least in part on the configuration information, whether data stored on its own storage device is affected by the event.

Publication date: 12-10-2017

MANAGEMENT SYSTEM FOR VIRTUAL MACHINE FAILURE DETECTION AND RECOVERY

Number: US20170293537A1
Assignee: NEC Corporation

A management system 10 includes: resource pools 111-114 which act as the hardware components on which multiple virtual machines are running; an inter-connecting network 12 which connects various resource pools; and an HA manager 13 which snoops all traffic of the inter-connecting network 12 to detect failure of a target VM and triggers corresponding actions when failure is detected.

Publication date: 11-05-2006

Integrated customer web station for web based call management

Number: US20060098583A1
Assignee: WorldCom, INC.

A Web-based call routing management workstation application which allows authorized customers to control toll free routing and monitor call center status. An architecture including one or more web servers located in a firewalled demilitarized zone (DMZ) as communications medium between the customer workstations at the customer sites and the enterprise back-end applications providing the call routing management services, provides a secure infrastructure for accessing the enterprise applications via the otherwise insecure public Internet. The present invention enables creation and management of call-by-call routing rules by a customer with a workstation having Internet access and a supported Web browser. The customized rules may be tested and/or debugged via the Web-enabled workstation, using a debugger/tester which runs the routing rules under a simulated environment. In addition, customers may provision hierarchies for their business; create, modify or delete agent pools; manipulate ...

Publication date: 20-05-2004

System and method for reducing user-application interactions to archivable form

Number: US20040098729A1

Systems and methods are disclosed for a distributed computing infrastructure on a computer network comprising a plurality of computers. The distributed computing infrastructure (DCI) provides a software platform for creating, running, and managing distributed applications. DCI may include XML-capable software applications on a peer-to-peer network. DCI may include small, network-unaware applications called peerlets. DCI may include a system and method for creating complex distributed applications using pre-compiled binaries. DCI may include a capability for multiple, independent collaborative sessions for distributed collaborative applications (e.g., chat, instant messaging, shared whiteboard, etc.). DCI may include systems and methods for reducing interactions between users and applications to archivable form and then playing back the interactions. DCI may include a system and method for automatic software retrieval on a peer-to-peer network.

Publication date: 02-03-2017

Maintaining High Availability During Network Partitions for Virtual Machines Stored on Distributed Object-Based Storage

Number: US20170060620A1

Techniques are disclosed for maintaining high availability (HA) for virtual machines (VMs) running on host systems of a host cluster, where each host system executes a HA module in a plurality of HA modules and a storage module in a plurality of storage modules, where the host cluster aggregates, via the plurality of storage modules, locally-attached storage resources of the host systems to provide an object store, where persistent data for the VMs is stored as per-VM storage objects across the locally-attached storage resources comprising the object store, and where a failure causes the plurality of storage modules to observe a network partition in the host cluster that the plurality of HA modules do not. In one embodiment, a host system in the host cluster executing a first HA module invokes an API exposed by the plurality of storage modules for persisting metadata for a VM to the object store. If the API is not processed successfully, the host system: (1) identifies a subset of second ...

Publication date: 07-10-2003

Integrated proxy interface for web based report requester tool set

Number: US0006631402B1
Assignee: WorldCom, Inc.

A Web/Internet based reporting system provides a common GUI enabling the requesting, customizing, scheduling and viewing of various types of reports generated by different server applications and/or application platforms. The reporting system includes a report manager, report scheduler and report requestor applications capable of defining, creating, managing and tracking specific reports that are available to customers in accordance with customer entitlements. Metadata messaging is employed to enable specific report option presentation, report customization and report execution/scheduling options. A Web-based system infrastructure is provided that enables the acquisition and secure presentation of customer reports to customers from any client browser application.

Publication date: 21-10-2014

Automating diagnoses of computer-related incidents

Number: US0008868973B2

A technique includes using a computer agent to observe diagnoses of computer-related incidents. Based on the observation, patterns are identified in the diagnoses, and based at least in part on the patterns, the diagnoses are selectively automated.

Publication date: 02-07-2013

Secure customer interface for web based data management

Number: US0008479259B2

An integrated series of security protocols is disclosed that protect remote user communications with remote enterprise services, and simultaneously protect the enterprise services from third parties. In the first layer, an implementation of the Secure Sockets Layer (SSL) version of HTTPS provides communications security, including authentication of the enterprise web server and the security of the transmitted data. The protocols provide for an identification of the user, and an authentication of the user to ensure the user is who he/she claims to be and a determination of entitlements that the user may avail themselves of within the enterprise system. Session security is described, particularly as to the differences between a remote user's copper wire connection to a legacy system and a user's remote connection to the enterprise system over a "stateless" public Internet, where each session is a single transmission, rather than an interval of time between logon and logoff, as is customary ...

Publication date: 06-02-2020

FAST NON-VOLATILE STORAGE DEVICE RECOVERY TECHNIQUES

Number: US20200042452A1

Techniques are disclosed herein for accelerated recovery of a memory device. Such techniques can allow for recovery of the memory device, such as, but not limited to, a flash memory device, following an unexpected reset event.

Publication date: 06-08-2020

HOSTING VIRTUAL MACHINES ON A SECONDARY STORAGE SYSTEM

Number: US20200249988A1

At least a portion of a virtual machine is hosted on at least one node of a first subset of a plurality of nodes of a secondary storage system. The virtual machine comprises a plurality of portions that can be distributed between the plurality of nodes and is configured into a first state of a plurality of states, such that, in the first state, the plurality of portions is distributed between a first subset of the plurality of nodes and each of the first subset of nodes stores a portion of the virtual machine in its corresponding storage device. A node from the second subset of the plurality of nodes to host the virtual machine in a second state of the plurality of states is selected based on at least one of storage, memory or processing resources of one or more nodes of a second subset of the plurality of nodes.

Publication date: 28-08-1991

Message transmission network

Number: EP0000233993B1
Assignee: Teradata Corporation

Publication date: 27-09-2023

USING CLUSTERS TO CREATE TEST INSTANCES

Number: EP4250122A1

Described is a system and method that includes executing, by a processing device, a staging creation service (SCS) to monitor a development of an application in a development environment of the application. The method includes removing, by the SCS, a production cluster of a plurality of production clusters from a production cluster pool in response to one or more development tests of the application having passed in the development environment. Each production cluster of the plurality of production clusters comprises a production environment of the application. The method includes assigning, by the SCS, the production cluster to be a staging cluster comprising a staging environment of the application to perform one or more staging tests of the application.

Publication date: 29-01-2019

METHOD AND SYSTEM FOR ALTERNATING TWO CONTROL UNITS OF A TRAIN, AND TRAIN

Number: RU2678460C1

The invention relates to means for monitoring control units. The system comprises a recording module for recording numeric activation identifiers of a first control unit and a second control unit of a train, where the first control unit is the unit started when the identifier is odd and the second control unit is the unit started when the identifier is even; an identifier acquisition module configured to obtain a first identifier of the first control unit and a second identifier of the second control unit when the train starts; a determination module that determines whether the first identifier equals the second identifier; and an execution module configured to add an odd number to the first identifier and to the second identifier to obtain new identifiers if the first identifier equals the second identifier, and to start the corresponding control unit based on the parity of the new identifiers. The result is an increase in ...
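
The parity rule in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `select_control_unit`, the default increment of 1, and the choice to return no unit when the identifiers differ (a case the abstract does not describe) are all assumptions.

```python
from typing import Optional, Tuple


def select_control_unit(
    first_id: int, second_id: int, odd_step: int = 1
) -> Tuple[Optional[str], int, int]:
    """Pick the control unit to start from the parity of the new identifiers.

    Returns (unit, new_first_id, new_second_id). `unit` is "first" for an odd
    identifier and "second" for an even one, as in the abstract.
    """
    if odd_step % 2 == 0:
        raise ValueError("the increment must be odd so that the parity flips")

    if first_id != second_id:
        # Assumption: the abstract only covers equal identifiers, so the
        # mismatch case is reported to the caller instead of being resolved.
        return None, first_id, second_id

    # Adding an odd number flips the parity of both identifiers.
    new_first = first_id + odd_step
    new_second = second_id + odd_step

    unit = "first" if new_first % 2 == 1 else "second"
    return unit, new_first, new_second
```

Because an odd number is added on every start, the parity changes each time, so successive starts alternate between the two units.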

Publication date: 30-04-1986

METHOD OF APPOINTING AN EXECUTIVE IN A DISTRIBUTED PROCESSING SYSTEM

Number: GB2166271A

A distributed, fault-tolerant, self-repairable, reconfigurable signal processing system with redundant elements comprising signal processors, mass memories and input-output controllers interconnected by redundant busses forming a high reliability system. The input-output controller element has redundant busses for interconnecting multiple fault-tolerant distributed signal processing systems into a network configuration. One signal processor element in a system is initially designated as the executive and assigns processing tasks from a mass memory to the other elements or other systems. When a failure is detected, the executive verifies the failure, isolates the faulty element and reassigns the task to another spare element. If another element is not available, the executive reconfigures the system to permit degraded operation using the available elements. The executive element, itself, is fault monitored by one of the other elements which is capable of assuming the role of executive as ...

Publication date: 16-07-2014

Virtual infrastructure recovery configuration

Number: GB0201409910D0

Publication date: 12-10-2011

Data replication for a backup entity

Number: GB0201114826D0

Publication date: 15-06-2006

SYNCHRONISATION OF THE DATA PROCESSING IN REDUNDANT DATA PROCESSING UNITS OF A DATA PROCESSING SYSTEM

Number: AT0000329310T

Publication date: 15-04-1987

COMMUNICATION SYSTEM.

Number: AT0000026515T

Publication date: 22-07-2004

A METHOD OF STANDBY AND CONTROLLING LOAD IN DISTRIBUTED DATA PROCESSING SYSTEM

Number: AU2002357568A1

Publication date: 28-11-2019

SYSTEM AND METHOD FOR DEPLOYING A DISTRIBUTED COMPONENT-BASED APPLICATION

Number: AU2019203092A1
Assignee: Murray Trento & Associates Pty Ltd

A system and method for deploying a distributed component-based application is disclosed. The system may include a plurality of uniform base components. Each base component of the plurality of uniform base components may host a respective service component, and may include an input port, an output port, a service port, an error, log, and exception port, a monitoring port, and a control port. A first base component may process event messages asynchronously with a second base component and a third base component. The system and method may also support auto-scalability of each base component.

Publication date: 17-01-1985

COMMUNICATION SYSTEM

Number: AU0003050684A

Publication date: 07-02-1987

DUPLEX CENTRAL PROCESSING UNIT SYNCHRONIZATION CIRCUIT

Number: CA0001217871A1

Publication date: 10-11-1992

INTERPROCESSOR SWITCHING NETWORK

Number: CA0001310134C
Assignee: READ, EDGAR L.

A message transport network is provided for high speed switching between processing elements. Clusters of low speed processing elements may be connected to the message transport network through a transport node controller. The transport node controller and the high speed processors are connected to the gateways. A pair of gateways may be connected through a transport interchange node to allow communication between processors associated with the gateways. A transport interchange supervisor maintains a record of the status of each gateway and generates commands to form connections between gateways in the transport interchange node. A maintenance controller and system maintenance processor oversee the validity of the data being passed through the system on paths independent of the data transfer paths. The message transport network may be used in a variety of applications, such as a telephony switch, a signaling transfer point system or a fault tolerant minicomputer.

Publication date: 04-11-2003

FAULT-TOLERANT JAVA VIRTUAL MACHINE

Number: CA0002294654C

A method for providing a first JVM with support for fault tolerance by using information maintained by the first JVM to checkpoint objects that are created, modified, and/or deleted during the process of responding to an event of a transaction. The checkpointed objects are sent to and stored in a second JVM such that the second JVM is fully capable of continuing the processing of the transaction in the event of the failure of the first JVM.

Publication date: 22-04-1999

INTEGRATED CUSTOMER INTERFACE FOR WEB BASED COMMUNICATIONS NETWORK MANAGEMENT

Number: CA0002304543A1

A web-based, integrated customer interface system (30) for enabling customer management of their communication network assets. A web-based GUI (20) enables a customer to interact with one or more network management resources and telecommunication services. The integrated interface system (30) includes: 1) a customer's network report management; 2) a centralized in-box system for online notifications to client workstation; 3) a real-time network services monitoring system; 4) broadband system for presenting physical and logical views of data networks and performance information; 5) a toll-free network management system enabling customization of 800/8xx toll free number routing; 6) Outbound Network Management (ONM); 7) packet-switched events monitoring; 8) a trouble ticket tool; 9) web-based invoice reporting for access to billing information; 10) web-based call manager; 11) on-line order entry and administrative service; 12) system for handling security and authentication.

Publication date: 01-04-1999

GRAPHICAL USER INTERFACE FOR WEB ENABLED APPLICATIONS

Number: CA0002304619A1

An integrated system of user interfaces (20) is provided for communicating with remote services. A backplane architecture controls and manages the user interfaces by instantiating, launching, overseeing and closing the user interfaces associated with a plurality of applications residing in a plurality of remote servers (24, 26, 28, 31, 32, 34, 52). Each application communicates with one another and with the backplane via messaging interfaces.

Publication date: 01-02-2019

IIC memory chip common circuit, electronic device and standby circuit switch method thereof

Number: CN0109298980A

Publication date: 24-06-2011

MODULATE INPUTS/OUTPUTS FOR SENSORS AND/OR ACTUATORS EXCHANGING OF INFORMATION WITH TWO CENTRAL PROCESSING UNITS.

Number: FR0002945643B1
Inventor: PLECHE PHILIPPE
Assignee: ABB AG

Publication date: 20-08-2009

SECURE BUSINESS CONTINUITY AND DISASTER RECOVERY PLATFORM FOR MULTIPLE PROTECTED SYSTEMS

Number: WO000002009103080A3

A data processing system comprises multiple customer premises equipment (CPE) servers at different active sites, each CPE server comprising a local storage unit, each CPE server configured to collect copies of servers, applications or data of the active site at which that CPE server is located and to store the copies in the local storage; a data storage and compute unit that is coupled to the CPE servers and configured to receive the copies, verify the copies, and store the copies in online accessible secure storage that is segregated by business entity; logic operable to receive a request from a particular active site to restore one or more data elements contained in the secure storage of the data storage unit associated with the particular active site, to inflate the data elements, and to provide the particular active site with online access to the data elements that are inflated.

Publication date: 01-09-2005

NONSTOP SERVICE SYSTEM USING VOTING, AND INFORMATION UPDATING AND PROVIDING METHOD IN THE SAME

Number: WO2005081453A1

A nonstop service system using voting and a method for updating and providing information in the nonstop service system. The nonstop service system includes a plurality of groups of nodes for storing and managing information on the basis of identifiers for distinguishing clients, each group including a plurality of nodes each of which is capable of storing and managing information independently. The nonstop service system further includes a control dispatcher server, which is located between the group of nodes and the clients and manages state information and connection information of the nodes belonging to the plurality of groups of nodes. The control dispatcher server selects a group of nodes corresponding to a client according to an information update and provision request from the client, transmits the information update and provision request to the nodes belonging to the selected group of nodes, and, when information is provided from the nodes, provides information which is selected ...

Publication date: 22-08-2002

AUTOMATIC STARTUP OF A CLUSTER SYSTEM AFTER OCCURRENCE OF A RECOVERABLE ERROR

Number: WO0002065289A1

The invention relates to a method for the automatic startup of a cluster (10) after an error has occurred in a node (12, 14) of said cluster (10) that led to a reboot of the node (12, 14). The inventive method is characterized in that it automatically recognizes whether the error can be recovered and the cluster (10) can be automatically started up. The inventive method allows for the automatic return of the cluster (10) to its operational state after occurrence of an error, thereby reducing down-times of the system.

Publication date: 14-12-2006

Nonstop service system using voting, and information updating and providing method in the same

Number: US20060282435A1
Inventor: Jang Moon, Jung Moon, Chan Chun

A nonstop service system using voting and a method for updating and providing information in the nonstop service system. The nonstop service system includes a plurality of groups of nodes for storing and managing information on the basis of identifiers for distinguishing clients, each group including a plurality of nodes each of which is capable of storing and managing information independently. The nonstop service system further includes a control dispatcher server, which is located between the group of nodes and the clients and manages state information and connection information of the nodes belonging to the plurality of groups of nodes. The control dispatcher server selects a group of nodes corresponding to a client according to an information update and provision request from the client, transmits the information update and provision request to the nodes belonging to the selected group of nodes, and, when information is provided from the nodes, provides information which is selected ...

Publication date: 06-11-2018

Operation of I/O in a safe system

Number: US0010120772B2

A module health system includes a module health circuit comprising a hardware register that is set to a first value in response to the system starting, an application register that is set to the first value in response to the system starting and a watchdog timer register that is set to the first value in response to the system starting. The system further includes a power on self-test that determines whether the system has passed a plurality of tests and that selectively sets the hardware register to a second value based on the determination, an external software application that determines whether a safety critical system is healthy and selectively sets the application register based on the determination, a watchdog timer application that selectively sets the watchdog timer register, a central processing unit that determines whether to de-assert a module health signal.

Publication date: 13-11-2018

Resiliency to memory failures in computer systems

Number: US0010127109B2
Assignee: Cray Inc.

A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.

Publication date: 14-04-2009

Non-invasive latency monitoring in a store-and-forward replication system

Number: US0007519736B2

A method for monitoring replica servers in a networked computer system is provided, in which each server in the system has a replica partner vector table that includes state information about other servers in the system. The replica partner vector table includes data fields for storing an update sequence number (USN) and timestamp information that identifies the time of the last update and/or the time of the last successful replication attempt for each replica server in the system. After each successful replication, the server updates the entries in the replica partner vector to reflect the updated USN and timestamp information. The replica monitoring method evaluates the USN and timestamp entries in the replica partner vector table to determine if any servers in the system are latent. If the monitoring method detects that a server in the system is latent, an alert is generated whereby users and/or a network administrator are informed of the problem.
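
The table-based check described in this abstract might look roughly like the following Python sketch. It is illustrative only: the names `PartnerEntry`, `record_replication` and `latent_servers`, and the use of a fixed maximum age as the latency criterion, are assumptions that do not come from the patent text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class PartnerEntry:
    usn: int                 # update sequence number last seen from the partner
    last_success: datetime   # time of the last successful replication


def record_replication(
    table: Dict[str, PartnerEntry], server: str, usn: int, when: datetime
) -> None:
    """Update the partner's entry after a successful replication."""
    table[server] = PartnerEntry(usn=usn, last_success=when)


def latent_servers(
    table: Dict[str, PartnerEntry], now: datetime, max_age: timedelta
) -> List[str]:
    """Return partners whose last successful replication is older than max_age.

    Assumption: "latent" is decided by timestamp age alone; a real monitor
    could also compare USNs between partners.
    """
    return sorted(
        name for name, entry in table.items() if now - entry.last_success > max_age
    )
```

The check only reads entries that replication already maintains, which is what makes the monitoring non-invasive: no extra probe traffic is sent to the replicas.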

Publication date: 05-01-2021

Client session reclaim for a distributed storage system

Number: US0010884863B2
Assignee: Red Hat, Inc.

The technology disclosed herein may enable a client of a distributed storage system to recover a storage session after a failure occurs. An example method may include: identifying a storage session of a distributed storage service, the storage session comprising session data that corresponds to a storage object of the distributed storage service; providing, by a processing device of a client, an indication that the client is recovering the storage session; and obtaining, by the client, the session data of the storage session from one or more devices that accessed the storage object of the distributed storage service.

Publication date: 14-09-2010

System and method of recovering from failures in a virtual machine

Number: US0007797587B2

A method and systems for recovering from a failure in a virtual machine are provided. In accordance with one embodiment of the present disclosure, a method for recovering from failures in a virtual machine is provided. The method may include, in a first physical host having a host operating system and a virtual machine running on the host operating system, monitoring one or more parameters associated with a program running on the virtual machine, each parameter having a predetermined acceptable range. The method may further include determining if the one or more parameters are within their respective predetermined acceptable ranges. In response to determining that the one or more parameters associated with the program running on the virtual machine are not within their respective predetermined acceptable ranges, a management module may cause the application running on the virtual machine to be restarted.
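
As a rough illustration of the range check described above, the Python sketch below compares sampled parameters against acceptable ranges and calls a restart hook on any violation. The parameter names, the `restart` callback and the rule that a missing sample counts as a violation are assumptions made for the example, not details from the patent.

```python
from typing import Callable, Dict, List, Tuple

Range = Tuple[float, float]  # (lowest acceptable value, highest acceptable value)


def out_of_range(sample: Dict[str, float], limits: Dict[str, Range]) -> List[str]:
    """Return the names of monitored parameters outside their acceptable range."""
    violations = []
    for name, (low, high) in limits.items():
        value = sample.get(name)
        # Assumption: a parameter that was not sampled is treated as a violation.
        if value is None or not (low <= value <= high):
            violations.append(name)
    return sorted(violations)


def check_and_recover(
    sample: Dict[str, float],
    limits: Dict[str, Range],
    restart: Callable[[], None],
) -> List[str]:
    """Restart the monitored application if any parameter is out of range."""
    violations = out_of_range(sample, limits)
    if violations:
        restart()  # stands in for the management module's restart action
    return violations
```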

Publication date: 01-08-2019

SYSTEMS AND METHODS OF DYNAMIC PORT ASSIGNMENT

Number: US20190235882A1

A system provides a listener application which can be notified about specific ports used by specific instances of a Web Socket application. A Web Socket application opens multiple dynamic ports in certain scenarios with a dynamic context. When an application is executed, a listener application is made aware of the context and port information. A system rewrites a reverse proxy configuration on the fly so that any request coming into the reverse proxy will read the change and assign the correct port. A notification to the listener is received across multiple nodes, and the configuration can be updated on all nodes based on the data provided in the configuration.

Publication date: 07-12-2017

MULTI-CHANNEL CONTROL SWITCHOVER LOGIC

Number: US20170351233A1

A multi-channel control system includes at least a primary control microprocessor and a back-up control microprocessor operable to control a device. The primary control microprocessor and the back-up control microprocessor assert control over a controlled device according to a locally stored method of controlling a back-up microprocessor assumption of control of a device.

Publication date: 10-11-2015

Virtual infrastructure recovery configurator

Number: US0009183097B2

Systems, methods and procedures to capture, format and process configuration information needed for a Managed Recovery Program (MRP) solution that supports orderly handling of virtual machines in an information technology production environment. An MRP automation package is portable and contains all of the required configuration data to bring virtual infrastructure on line in a recovery environment as well as associated scripts that can be executed to automatically process the configuration data.

Publication date: 05-01-2012

Managing Shared Resources In A Multi-Computer System With Failover Support

Number: US20120005348A1
Assignee: International Business Machines Corp

Managing shared resources in a multi-computer system with failover support, including: reading priority detection signals from a computer inserted into the multiple-computer system, the priority detection signals representing a priority of the inserted computer; reading planar detection signals from the computer, the planar detection signals representing an insertion state of all computers currently inserted into the multiple-computer system; determining if the computer has the highest priority among all the computers inserted into the multiple-computer system in accordance with the priority detection signals and the planar detection signals; and, in response to determining that the computer has the highest priority, monitoring shared resources and outputting a specific output signal associated with the highest priority computer, the specific output signal providing an identification of the highest priority computer to other computers currently inserted into the multiple-computer system and representing control, by the highest priority computer, of the shared resources.
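
The priority decision described above can be sketched as a bitmask test. This is a simplified model under stated assumptions: the inserted computers are encoded as a bitmask (bit i set means the computer with priority i is present), a lower index means higher priority, and the function name `is_highest_priority` is hypothetical.

```python
def is_highest_priority(own_priority: int, inserted_mask: int) -> bool:
    """Decide whether this computer should take control of the shared resources.

    own_priority  -- value read from the priority detection signals
    inserted_mask -- value read from the planar detection signals; bit i is set
                     when the computer with priority i is inserted
    Assumption: priority 0 is the highest.
    """
    if not inserted_mask & (1 << own_priority):
        raise ValueError("this computer is not reported as inserted")

    # Keep only the bits of computers with a higher priority than ours.
    higher_priority_present = inserted_mask & ((1 << own_priority) - 1)
    return higher_priority_present == 0
```

Failover follows from re-evaluating the same test: when the computer with priority 1 is removed, the mask changes from `0b0110` to `0b0100`, and the computer with priority 2 finds that it is now the highest.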

Publication date: 05-01-2012

Simplifying automated software maintenance of data centers

Number: US20120005520A1
Assignee: Oracle International Corp

An aspect of the present invention simplifies software maintenance of nodes in a data center. In one embodiment, a management system receives data specifying a set of commands to be executed on a node in the data center, and then forms a maintenance script by programmatically incorporating instructions for executing the set of commands on the node and to perform a set of management actions. The management system then executes the maintenance script to cause execution of the set of commands on the nodes, thereby performing maintenance of the node. A user/administrator of the data center needs to specify only the commands, thereby simplifying the software maintenance of data centers. According to another aspect, the maintenance scripts (formed by incorporating the commands provided by a user) are executed as part of a disaster recovery process in the data center.

Publication date: 12-01-2012

Match server for a financial exchange having fault tolerant operation

Number: US20120011391A1
Assignee: Chicago Mercantile Exchange Inc

Fault tolerant operation is disclosed for a primary match server of a financial exchange using an active copy-cat instance that mirrors operations in the primary match server, but only after those operations have successfully completed in the primary match server. Fault tolerant logic monitors inputs and outputs of the primary match server and gates those inputs to the backup match server once a given input has been processed. The outputs of the backup match server are then compared with the outputs of the primary match server to ensure correct operation. The disclosed embodiments further relate to a fault tolerant failover mechanism allowing the backup match server to take over for the primary match server in a fault situation wherein the primary and backup match servers are loosely coupled. As such, the primary match server need not be specifically designed or programmed to interact with the fault tolerant mechanisms.
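
The gate-then-compare flow in this abstract can be modelled with two plain callables. The sketch below is an assumption-laden toy, not the exchange's implementation: `MirroredPair`, `submit` and the `mismatches` list are invented names, and both servers are modelled as deterministic functions of their input.

```python
from typing import Any, Callable, List, Tuple


class MirroredPair:
    """Forward each input to the backup only after the primary has processed it."""

    def __init__(self, primary: Callable[[Any], Any], backup: Callable[[Any], Any]):
        self.primary = primary
        self.backup = backup
        self.mismatches: List[Tuple[Any, Any, Any]] = []

    def submit(self, message: Any) -> Any:
        # The primary runs first; if it raises, the backup never sees the input.
        primary_out = self.primary(message)

        # Gate the same input to the backup and compare the two outputs.
        backup_out = self.backup(message)
        if backup_out != primary_out:
            self.mismatches.append((message, primary_out, backup_out))

        return primary_out
```

Neither callable needs to know about the wrapper, which mirrors the abstract's point that the primary need not be designed to interact with the fault tolerant logic.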

Publication date: 02-02-2012

Method and apparatus for managing data of operation system

Number: US20120030323A1
Inventor: Akinori Matsuno
Assignee: Fujitsu Ltd

A server for an operation system includes a monitor to monitor a status of another server, a first storage to retain a first network configuration information, a second storage to copy the first network configuration information when an abnormality is detected in the another server, a third storage to retain a first update history information including update information of a network configuration information obtained from a client in the operation system, and an operation configuration manager to update the first network configuration information and a second network configuration information retained in the another server when the another server recovers from the abnormality. The operation configuration manager is configured to update the first network configuration information and the second network configuration information based on the first update history information and a second update history information retained in the another server.

Publication date: 29-03-2012

Application migration and power consumption optimization in partitioned computer system

Number: US20120079227A1
Inventor: Tomohiko Suzuki
Assignee: Individual

A storage device including a migration source logical volume of an application copies data stored in the logical volume into a migration destination logical volume of the application. After the copy process is started, the storage device stores data written into the migration source logical volume as differential data without storing the data into the migration source logical volume. When the copy process is completed for the data stored in the migration source logical volume, a management computer starts copying of the differential data, and in a time interval after the copying of the data stored in the migration source logical volume is completed but before the copying of the differential data is completed, a computer being a migration destination of the application is turned ON, thereby reducing power consumption at the time of application migration.

Publication date: 07-06-2012

Validation of access to a shared data record subject to read and write access by multiple requesters

Number: US20120143836A1
Assignee: International Business Machines Corp

According to a method of access to a shared data record subject to contemporaneous read and write access by multiple requesters, a requester reads a shared data record including a payload and a first checksum. The requester calculates a second checksum of the payload of the data record. If the first and second checksums are not equal, the requester again reads the shared data record, including a third checksum, and calculates a fourth checksum of the payload of the shared data record. If the third and fourth checksums are equal, the requester processes the shared data record as valid, and if the second and fourth checksums are equal, the requester handles the shared data record as corrupt.
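
The double-read check described above can be sketched as follows. This is a minimal illustration with several assumptions: CRC-32 from Python's `zlib` stands in for the unspecified checksum, `read_record` is a hypothetical callable returning `(payload, stored_checksum)`, a first read whose checksums match is treated as valid, and the case the abstract leaves open is reported as "undetermined".

```python
import zlib
from typing import Callable, Tuple

Record = Tuple[bytes, int]  # (payload, checksum stored with the record)


def read_validated(read_record: Callable[[], Record]) -> Tuple[str, bytes]:
    """Read a shared record and classify it as valid, corrupt or undetermined."""
    payload, first = read_record()
    second = zlib.crc32(payload)
    if first == second:
        # Assumption: matching checksums on the first read mean the record is valid.
        return "valid", payload

    # Mismatch: the record may have been read in the middle of a write, so read again.
    payload, third = read_record()
    fourth = zlib.crc32(payload)

    if third == fourth:
        return "valid", payload
    if second == fourth:
        # The payload did not change between reads, yet its checksum still
        # disagrees with the stored one, so this is not a write in progress.
        return "corrupt", payload
    return "undetermined", payload  # not covered by the abstract
```

The second read is what separates a transient mismatch caused by a concurrent writer from a record that is stably inconsistent.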

Publication date: 07-06-2012

Obviation of Recovery of Data Store Consistency for Application I/O Errors

Number: US20120144233A1
Assignee: International Business Machines Corp

Embodiments comprise a plurality of computing devices that dynamically intercept process application I/O errors. Various embodiments comprise two or more computing devices, such as two or more servers, each having access to a shared data storage system. An application may be executing on the first computing device and performing an I/O operation when an I/O error occurs. The first computing device may intercept the I/O error, rather than passing it back to the application, and prevent the error from affecting the application. The first computing device may complete the I/O operation, and any other pending I/O operations not written to disk, via an alternate path, perform a checkpoint operation to capture the state of the set of processes associated with the application, and transfer the checkpoint image to the second computing device. The second computing device may resume operation of the application from the checkpoint image.

Publication date: 26-07-2012

Heuristic approach on checking service instance protection for availability management framework (AMF) configurations

Number: US20120192157A1
Assignee: Telefonaktiebolaget LM Ericsson AB

A configuration including Service Instances (SIs) and a list of Service Units (SUs) is to be validated. The SIs are to be allocated to the SUs for protection of the service represented by the SIs. A set of heuristics is applied to determine whether, for each of the SI assignments, the SI can be allocated to one of the SUs whose capacities support the required capacities of the SI. The heuristic then walks the list in order, to find a first SU that supports a current SI. If none of the SUs in the list can support the current SI, the heuristic indicates that the configuration is not validated. In response to a result that at least one of the heuristics in the set indicates the SUs can support all of the SIs, a final result is generated indicating that the configuration is valid.
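
A first-fit walk of the kind the abstract describes might look like the following Python sketch. It assumes, for illustration only, that each SI demand and SU capacity is a single integer and that a heuristic is just a function that orders the two lists; `first_fit`, `as_given`, `largest_first`, and `validate` are hypothetical names, not AMF terms.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A heuristic (assumed shape) returns an ordering of SI demands and SU capacities.
Heuristic = Callable[[List[int], List[int]], Tuple[List[int], List[int]]]


def first_fit(si_demands: List[int], su_capacities: List[int]) -> Optional[Dict[int, int]]:
    """Assign each SI, in order, to the first SU with enough remaining capacity.

    Returns a mapping of SI position to SU position, or None if some SI fits nowhere.
    """
    remaining = list(su_capacities)
    assignment: Dict[int, int] = {}
    for si, demand in enumerate(si_demands):
        for su, free in enumerate(remaining):
            if free >= demand:
                remaining[su] -= demand
                assignment[si] = su
                break
        else:
            return None  # no SU in the list can support the current SI
    return assignment


def as_given(sis: List[int], sus: List[int]) -> Tuple[List[int], List[int]]:
    return list(sis), list(sus)


def largest_first(sis: List[int], sus: List[int]) -> Tuple[List[int], List[int]]:
    return sorted(sis, reverse=True), sorted(sus, reverse=True)


def validate(sis: List[int], sus: List[int], heuristics: List[Heuristic]) -> bool:
    """The configuration is valid if at least one heuristic places every SI."""
    return any(first_fit(*h(sis, sus)) is not None for h in heuristics)
```

With demands `[1, 1, 2, 2]` and capacities `[3, 3]`, the original order fails while the largest-first order succeeds, which is why a set of heuristics is tried rather than a single one.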

Publication date: 04-10-2012

Fault detection and recovery as a service

Number: US20120254652A1
Assignee: Microsoft Corp

The monitoring by a monitoring node of a process performed by a monitored node is often devised as a tightly coupled interaction, but such coupling may reduce the re-use of monitoring resources and processes and increase the administrative complexity of the monitoring scenario. Instead, fault detection and recovery may be designed as a non-proprietary service, wherein a set of monitored nodes, together performing a set of processes, may register for monitoring by a set of monitoring nodes. In the event of a failure of a process, or of an entire monitored node, the monitoring nodes may collaborate to initiate a restart of the processes on the same or a substitute monitored node (possibly in the state last reported by the respective processes). Additionally, failure of a monitoring node may be detected, and all monitored nodes assigned to the failed monitoring node may be reassigned to a substitute monitoring node.

Publication date: 25-10-2012

Partial fault processing method in computer system

Number: US20120272091A1
Assignee: HITACHI LTD

For a hardware fault that has occurred in a computer, a hypervisor notifies each LPAR that can continue execution that the fault is a hardware fault for which execution can be continued. Upon receiving the notice, the LPAR notifies the hypervisor that it has executed processing to cope with the fault. The hypervisor provides an interface for acquiring the status of these notices. Through the interface, the status of coping with a hardware fault that allows continued execution can be registered and acquired, which makes it possible to judge the status of coping with the fault across the computer as a whole.

Publication date: 27-12-2012

Adding individual database failover/switchover to an existing storage component with limited impact

Number: US20120331336A1
Assignee: Microsoft Corp

High availability architecture that employs a mid-tier proxy server to route client communications to active data store instances in response to failover and switchover. The proxy server includes an active manager client that interfaces to an active manager in each of the backend servers. State information and configuration information are maintained separately and according to semantics consistent with needs of corresponding data, the configuration information changing less frequently and more available, the state information changing more frequently and less available. The active manager indicates to the proxy server which of the data storage instances is currently the active instance. In the event that the currently active instance is inactive, the proxy server selects a different backend server that currently hosts the active data store instance. Client communications are then routed to the different backend server with minimal or no interruption to the client.

Publication date: 28-02-2013

Routing traffic after power failure

Number: US20130054493A1
Assignee: VERIZON PATENT AND LICENSING INC

A device, of a first data center, detects a power failure associated with the first data center when the first data center stops receiving power. The device further identifies a second data center for traffic to be processed by the first data center. The device also receives the traffic and routes the traffic to the second data center.

Publication date: 28-03-2013

Distributed job scheduling in a multi-nodal environment

Number: US20130080824A1
Assignee: International Business Machines Corp

Techniques are described for decentralizing a job scheduler in a distributed system environment. Embodiments of the invention may generally include receiving a job to be performed by a multi-nodal system which includes a cluster of nodes. Instead of a centralized job scheduler assigning the job to a node or nodes, each node has a job scheduler which scans a shared-file system to determine what job to execute on the node. In a job requiring multiple nodes, one of the nodes that joined the multi-nodal job becomes the primary node which then assigns and monitors the job's execution on the multiple nodes.

Publication date: 25-04-2013

METHOD FOR SWITCHING A NODE CONTROLLER LINK, PROCESSOR SYSTEM, AND NODE

Number: US20130103975A1
Assignee: Huawei Technologies Co., Ltd.

Embodiments of the present invention disclose a method for switching an NC link, a processor system, and a node. The processor system includes more than two nodes capable of communicating with each other; each node includes a node controller (NC) chip, a host bus adapter (HBA) apparatus, and at least one CPU; the NC chip is connected to each CPU in the node where the NC chip is located, and the HBA apparatus is connected to each CPU in the node where the HBA apparatus is located; an NC link borne by the NC chip corresponds to an HBA link borne by the HBA apparatus. By using an HBA apparatus to deploy a redundant link, the cost of deploying the redundant link is reduced effectively while the reliability of the processor system is ensured.

1. A method for switching a node controller link in a processor system having more than two nodes, the method comprising: the more than two nodes in the processing system communicating with each other via a corresponding pair of node controller (NC) link and a host bus adapter (HBA) link in each node, wherein each node comprises a single node controller (NC) chip, a single host bus adapter (HBA) apparatus, and one or more CPU, wherein the single NC chip and the single HBA apparatus are both connected to each of the one or more CPU in a node where the single NC chip and the single HBA apparatus are both located, wherein the NC link is borne by the single NC chip and the corresponding HBA link is borne by the single HBA apparatus in the node; detecting for failure occurrence in the single NC chip of the node; and in case if a failure occurrence has been detected in the single NC chip, switching a service on the NC link to the corresponding HBA link.
2. The method according to claim 1, wherein the communicating by the corresponding pair of NC link and the HBA link in each node comprising: presetting a first routing table and a second routing table in each node, wherein the first routing table is a routing table of ...

Publication date: 23-05-2013

Mechanism to Provide Assured Recovery for Distributed Application

Number: US20130132765A1
Assignee: CA Inc

A system and method is provided for providing assured recovery for a distributed application. Replica servers associated with the distributed application may be coordinated to perform integrity testing together for the whole distributed application. The replica servers connect to each other in a manner similar to the connection between master servers associated with the distributed application, thereby preventing the replica servers from accessing and/or changing application data on the master servers during integrity testing.

Publication date: 30-05-2013

Multi-core decompression of block coded video data

Number: US20130136188A1
Assignee: FREESCALE SEMICONDUCTOR INC

Apparatus for and a method of decompression of block coded video data in a multi-core processor. The processor cores decode respective coded groups of blocks of video data independently, in parallel and deblock respective decoded groups of blocks of video data independently and in parallel with the decode operations and with other deblock operations.

Publication date: 30-05-2013

Method for switching application server, management computer, and storage medium storing program

Number: US20130138998A1
Assignee: HITACHI LTD

Provided is a management computer that: refers to switching level information including switching patterns to be used when switching the first task to the second application server; sets a safety level for each of the switching patterns; refers to the stop time allowed for each first task when it is switched to the second application server; selects, from the switching patterns in the switching level information, the one whose switching time is shorter than the stop time set for the first task in the task requirement information and whose safety level is the highest; stops the second task of the second application server using the selected switching pattern; and then controls the second application server to provide the first task.
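
The selection rule in the abstract above (fastest enough, then safest) can be illustrated with a short Python sketch. The `SwitchingPattern` fields, the pattern names, and the numeric scales are hypothetical; the abstract only says that each pattern has a switching time and a safety level.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SwitchingPattern:
    name: str                 # hypothetical label, e.g. "graceful-stop"
    switching_time_s: float   # time needed to switch using this pattern
    safety_level: int         # assumed scale: higher means safer


def select_pattern(
    patterns: Iterable[SwitchingPattern], allowed_stop_time_s: float
) -> Optional[SwitchingPattern]:
    """Pick the safest pattern whose switching time is shorter than the allowed stop time."""
    candidates = [p for p in patterns if p.switching_time_s < allowed_stop_time_s]
    if not candidates:
        return None  # no pattern meets the task's stop-time requirement
    return max(candidates, key=lambda p: p.safety_level)


PATTERNS = [
    SwitchingPattern("graceful-stop", 120.0, 3),
    SwitchingPattern("forced-stop", 30.0, 2),
    SwitchingPattern("power-off", 5.0, 1),
]
```

For a task that tolerates a 60-second stop, `select_pattern(PATTERNS, 60.0)` returns the `forced-stop` pattern: the graceful stop is safer but too slow, and the power-off is fast but less safe.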

Publication date: 06-06-2013

Law breaking/behavior sensor

Number: US20130144459A1
Author: Christopher P. Ricci
Assignee: FLEXTRONICS AP LLC

Methods and systems for a complete vehicle ecosystem are provided. Specifically, systems that when taken alone, or together, provide an individual or group of individuals with an intuitive and comfortable vehicular environment. The present disclosure builds on integrating existing technology with new devices, methods, and systems to provide a complete vehicle ecosystem.

Publication date: 06-06-2013

Configurable vehicle console

Number: US20130144463A1
Assignee: FLEXTRONICS AP LLC

Methods and systems for a configurable vehicle console are provided. Specifically, a configurable console may comprise one or more displays that are capable of receiving input from a user. At least one of these displays may be removed from the console of a vehicle and operated as a stand-alone computing platform. Moreover, it is anticipated that each one or more of the displays of the console may be configured to present a plurality of custom applications that, when manipulated by at least one user, are adapted to control functions associated with a vehicle and/or associated peripheral devices.

Publication date: 11-07-2013

DUAL-CHANNEL HOT STANDBY SYSTEM AND METHOD FOR CARRYING OUT DUAL-CHANNEL HOT STANDBY

Number: US20130179723A1
Assignee: Beijing Jiaotong University

A dual-channel hot standby system and a method for carrying out dual-channel hot standby. The system comprises a hot standby status management layer including two hot standby management units, an application processing layer including two application processors, and a data communication layer including two communicators; the hot standby status management layer is used for controlling the setting and switching between an active status and a standby status of the two application processors, monitoring the working status of the data communication layer, and carrying out synchronization of the control cycles for the two channels of the system; wherein one of the hot standby management units controls one of the application processors and together with it constitutes a channel of the system; the data communication layer is used for receiving data from outside, and forwarding the data to the application processing layer. The present invention avoids the occurrence of “dual-channel-active” or “dual-channel-standby” status; ensures synchronization of the control cycles of two channels; reduces the time of the system for responding to breakdowns; meets the real-time requirements; enhances the reliability and availability of the system; and ensures a seamless switching between active and standby statuses.

1. A dual-channel hot standby system, characterized in that it comprises a hot standby status management layer including two hot standby management units, an application processing layer including two application processors, and a data communication layer including two communicators; the hot standby status management layer is used for controlling the setting and switching between an active status and a standby status of the two application processors, monitoring the working status of the data communication layer, and carrying out synchronization of the control cycles for the two channels of the system; wherein one of the hot standby management units controls one of the ...

Publication date: 18-07-2013

QUERY EXECUTION AND OPTIMIZATION WITH AUTONOMIC ERROR RECOVERY FROM NETWORK FAILURES IN A PARALLEL COMPUTER SYSTEM WITH MULTIPLE NETWORKS

Number: US20130185588A1

A database query execution monitor determines if a network error or low performance condition exists and then where possible modifies the query. The query execution monitor then determines an alternate query execution plan to continue execution of the query. The query optimizer can re-optimize the query to use a different network or node. Thus, the query execution monitor allows autonomic error recovery for network failures using an alternate query execution. The alternate query execution could also be determined at the initial optimization time and then this alternate plan used to execute a query in the case of a particular network failure.

1. A computer apparatus comprising: a plurality of nodes each having a memory and at least one processor; a database residing in the memory; a plurality of networks connecting the plurality of nodes; a network monitor that periodically monitors the plurality of networks to determine network loading and maintains a network file that contains information about network utilization; a query optimizer and a query to the database residing in the memory; a query execution monitor residing in the memory and executed by the at least one processor, the query execution monitor detecting a network failure during execution of the query and invoking the query optimizer to re-optimize the query to use a different network to execute the query, the query execution monitor detecting poor performance of execution of the query and invoking the query optimizer to re-optimize the query to use a different network to execute the query.
2. The computer apparatus of wherein the query execution monitor determines part of the query executed prior to the network failure and then modifies the query to utilize data from the part of the query that executed prior to the network failure.
3. The computer apparatus of wherein the network file maintained by the network monitor is used by the query execution monitor and wherein the network file contains network file ...

Publication date: 25-07-2013

Transparent high availability for stateful services

Number: US20130191831A1
Assignee: Brocade Communications Systems LLC

One embodiment of the present invention provides a system. The system includes a high availability module and a data transformation module. During operation, the high availability module identifies a modified object belonging to an application in a second system. A modification to the modified object is associated with a transaction identifier. The high availability module also identifies a local object corresponding to the modified object associated with a standby application corresponding to the application in the second system. The data transformation module automatically transforms the value of the modified object to a value assignable to the local object, including pointer conversion to point to equivalent object of the second system. The high availability module updates the current value of the local object with the transformed value.

Publication date: 01-08-2013

Data transfer and recovery

Number: US20130198557A1
Author: Andrew Bensinger
Assignee: DSSDR LLC

A backup image generator can create a primary image and periodic delta images of all or part of a primary server. The images can be sent to a network attached storage device and one or more remote storage servers. In the event of a failure of the primary server, an updated primary image may be used to provide an up-to-date version of the primary system at a backup or other system. As a result, the primary data storage may be timely backed-up, recovered and restored with the possibility of providing server and business continuity in the event of a failure.

Publication date: 08-08-2013

Redundant computer control method and device

Number: US20130205162A1
Assignee: Fujitsu Ltd

Disclosed is a non-transitory computer-readable medium storing a program, which causes a computer to execute a sequence of processing. The sequence of processing includes receiving status information by a second server device from a client device, the status information being collected by the client device, and including a status of a first server device and statuses of one or more standby servers configured to operate when the first server device fails, and causing the second server device to operate, when the status information indicates a predetermined first status, as at least one of the first server device and the one or more standby servers in a failure status.

Publication date: 08-08-2013

COMMUNICATING IN A COMPUTER ENVIRONMENT

Number: US20130205163A1
Assignee: TANGOME, INC.

Communicating in a peer-to-peer computer environment. A portion of a communication is received from a first user device at a relay peer, wherein the relay peer is one of a list of potential peers and wherein the first user device and a second user device have disparate CPU power and bandwidth capabilities. The portion of the communication is transcoded to comprise a base layer and an enhanced layer. In one embodiment, transcoding encompasses changing the resolution of the communication. The base layer of the portion of the communication is sent to the second user device from the relay peer. The enhanced layer of the portion of the communication is selectively sent to the second user device depending upon a set of capabilities of the second user device.

1. A computer implemented method for communicating in a peer-to-peer computer environment, said method comprising: receiving a portion of a communication from a first user device at a relay peer, wherein said relay peer is one of a list of potential peers and wherein said first user device and a second user device have disparate CPU power and bandwidth capabilities; transcoding said portion of said communication to comprise a base layer and an enhanced layer; sending said base layer of said portion of said communication to said second user device from said relay peer; and selectively sending said enhanced layer of said portion of said communication to said second user device depending upon a set of capabilities of said second user device.
2. The computer implemented method as recited in claim 1, further comprising: terminating said receiving said portion of said communication from said first user device at said relay peer during said communication; receiving said portion of said communication from said first user device at a second relay peer; and sending said portion of said communication to said second user from said second relay peer.
3. The computer implemented method as recited in wherein said relay peer replicates said ...

Publication date: 15-08-2013

Match Server for a Financial Exchange Having Fault Tolerant Operation

Number: US20130212423A1
Assignee: CHICAGO MERCANTILE EXCHANGE INC.

Fault tolerant operation is disclosed for a primary match server of a financial exchange using an active copy-cat instance, a.k.a. backup match server, that mirrors operations in the primary match server, but only after those operations have successfully completed in the primary match server. Fault tolerant logic monitors inputs and outputs of the primary match server and gates those inputs to the backup match server once a given input has been processed. The outputs of the backup match server are then compared with the outputs of the primary match server to ensure correct operation. The disclosed embodiments further relate to a fault tolerant failover mechanism allowing the backup match server to take over for the primary match server in a fault situation wherein the primary and backup match servers are loosely coupled, i.e. they need not be aware that they are operating in a fault tolerant environment.

1. A method of providing fault tolerant operation for a primary match server, the method comprising: performing, by a duplicate of the primary match server, each operation of a sequence of operations to be performed by the primary match server, subsequent to the operation being successfully completed by the primary match server or after determination that the primary match server will not successfully do so.
2. The method of further comprising: comparing a result of the performance of one of the sequence of operations by the primary match server with a result of the performance of the same one of the sequence of operations by the backup match server and indicating a failure of the backup match server, the primary match server, or a combination thereof, when the results are at least partially different.
3. The method of further comprising: preventing the primary match server from completing the at least one of the sequence of operations when at least one of the sequence of operations is not likely to be completed.
4. The method of claim 3, wherein the preventing further ...
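
The gate-then-compare behaviour described in the abstract of this entry can be sketched in Python as follows. This is a toy model under explicit assumptions: both servers are represented as in-memory state dictionaries, an operation is a plain function over that state, and `run_mirrored` and `place_order` are hypothetical names. It does not model the failover path or the case where the primary cannot complete an operation.

```python
from typing import Callable, Dict, List, Tuple

# An operation mutates a server's state and returns its output (assumed shape).
Operation = Callable[[Dict[str, int]], int]


def run_mirrored(
    ops: List[Operation],
    primary_state: Dict[str, int],
    backup_state: Dict[str, int],
) -> List[Tuple[int, int, int]]:
    """Run each op on the primary, then gate it to the backup and compare outputs.

    Returns a list of (op index, primary output, backup output) for every mismatch.
    """
    mismatches: List[Tuple[int, int, int]] = []
    for i, op in enumerate(ops):
        primary_out = op(primary_state)   # primary completes the operation first
        backup_out = op(backup_state)     # only then is the same input released to the backup
        if backup_out != primary_out:
            mismatches.append((i, primary_out, backup_out))
    return mismatches


def place_order(qty: int) -> Operation:
    """Build a toy operation that adds qty to a running filled total."""
    def op(state: Dict[str, int]) -> int:
        state["filled"] = state.get("filled", 0) + qty
        return state["filled"]
    return op
```

Because the operation is passed to each server unchanged, neither side needs to know it is being mirrored, which matches the loose coupling the abstract mentions.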

Publication date: 15-08-2013

COMPUTER SYSTEM AND BOOT CONTROL METHOD

Number: US20130212424A1
Assignee: Hitachi, Ltd.

When a primary computer is taken over by a secondary computer in a redundancy configuration computer system where booting is performed via a storage area network (SAN), a management server delivers an information collecting/setting program to the secondary computer before the user's operating system of the secondary computer is started. This program assigns the unique ID (World Wide Name) that was assigned to the fibre channel port of the primary computer to the fibre channel port of the secondary computer, allowing a software image to be taken over from the primary computer to the secondary computer.

1. A boot control method for a computer system having a plurality of computers, a management server that controls said plurality of computers, and a storage device that is shared by said plurality of computers, each computer having a port, a program for each computer is stored in a logical unit of said storage device, the computer system is configured to boot the program for each computer by using a unique ID that is set on the port of each computer, the unique ID that is set on the port of each computer is associated with the program stored in the logical unit, said boot control method comprising the steps of: managing, by said management server, unique IDs assigned to ports of the computers, and delivering a first unique ID assigned to a port of a failed computer to a secondary computer among said plurality of computers; setting, by said secondary computer, the delivered first unique ID on the port of the secondary computer, and notifying the management server of the first unique ID set on the port of the secondary computer; managing, by the management server, the notified first unique ID as a unique ID assigned to the port of the secondary computer; accessing, by said secondary computer, a logical unit associated with the failed computer using a logical connection newly created between the secondary computer and the storage device based on the setting of the first unique ID, and booting the program for the failed ...

Publication date: 29-08-2013

Failover Processing

Number: US20130227339A1
Author: Lund Christian
Assignee: METASWITCH NETWORKS LTD.

A method of providing failover processing between a first element and a second element in a data communications network, the method comprising configuring a first channel and a second channel between the first and second elements, the first and second channels comprising different physical data paths, receiving at the first element, via the first channel, first data signals representative of functioning statuses of the second element, the first channel being configured to allow a non-optimal, partly functioning status of the second element to be communicated to the first element; and receiving at the first element, via the second channel, second data signals representative of functioning statuses of the second element, the second channel being configured to allow a failed functioning status of the second element to be communicated to the first element; and conducting failover processing based on both the first and second data signals.

1. A method of providing failover processing between a first element and a second element, the first element and the second element each being suitable for performing a data processing function in a data communications network, the method comprising: configuring a first channel and a second channel between the first and second elements, said first and second channels comprising different physical data paths; receiving at the first element, via the first channel, first data signals representative of functioning statuses of the second element, the first channel being configured to allow a non-optimal, partly functioning status of the second element to be communicated to the first element; receiving at the first element, via the second channel, second data signals representative of functioning statuses of the second element, the second channel being configured to allow a failed functioning status of the second element to be communicated to the first element; and conducting failover processing based on both the first and second data signals. ...
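
A decision function that combines the two status channels described in this entry could be sketched as below. The abstract does not state the decision policy, so the rules here are purely an assumption for illustration, as are the names `Status` and `decide`; `None` represents a channel on which no status signal has been received.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    OK = "ok"
    DEGRADED = "degraded"   # non-optimal, partly functioning (first channel)
    FAILED = "failed"       # failed functioning status (second channel)


def decide(first_channel: Optional[Status], second_channel: Optional[Status]) -> str:
    """Return 'failover' or 'stay' from the latest status seen on each channel.

    Assumed policy, not taken from the source:
      - a FAILED report on the second channel triggers failover;
      - a DEGRADED report on the first channel triggers failover;
      - silence on both channels is treated as a failure of the peer;
      - silence on only one channel is treated as a path problem, not a peer failure.
    """
    if second_channel is Status.FAILED:
        return "failover"
    if first_channel is Status.DEGRADED:
        return "failover"
    if first_channel is None and second_channel is None:
        return "failover"
    return "stay"
```

Using two different physical paths means the loss of one path alone does not have to be read as a failure of the peer element, which is what the single-silent-channel rule tries to capture.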

Publication date: 29-08-2013

Fault tolerant routing in a non-hot-standby configuration of a network routing system

Number: US20130227340A1
Assignee: Google LLC

Methods and systems for facilitating fault tolerance in a non-hot-standby configuration of a network routing system are provided. According to one embodiment, a failover method is provided. One or more processing engines of a network routing system are configured to function as active processing engines, each of which has one or more software contexts. A control blade is configured to monitor the active processing engines. One or more of the processing engines are identified to function as non-hot-standby processing engines, each of which has no pre-created software contexts corresponding to the software contexts of the active processing engines. The control blade monitors the active processing engines. Responsive to detecting a fault associated with an active processing engine, the active processing engine is dynamically replaced with a non-hot-standby processing engine by creating one or more replacement software contexts within the non-hot-standby processing engine corresponding to those of the active processing engine.

Publication date: 26-09-2013

Facility Control System and Facility Control Method

Number: US20130253665A1
Authors: Kazuto Mori, Kouichi Ikawa
Assignee: Daifuku Co Ltd

A facility control system comprises a selection processing portion that selects, based on a manual operation and when an abnormal condition occurs in a first-layer computer that executes a first-layer program which issues an apparatus operating command to an apparatus controller, whether to cause a second-layer computer to execute the first-layer program that had been executed by the first-layer computer, and a substitute command output processing portion which outputs a substitute command in accordance with selection information selected by the selection processing portion. The second-layer computer executes the first-layer program that had been executed by the first-layer computer in which the abnormal condition occurred based on a substitute command outputted by the substitute command output processing portion.

Publication date: 26-09-2013

STANDBY SYSTEM DEVICE, A CONTROL METHOD, AND A PROGRAM THEREOF

Number: US20130254588A1
Author: FUJIEDA Tsuyoshi
Assignee:

A standby system device which is connected to an active system device includes a process information sharing unit B and a standby process management unit C. The process information sharing unit B receives active side process information indicating usage of resources of an active system process A operating on the active system device from the active system device. The standby process management unit C terminates a standby process A before activating a takeover process D used for taking over processing of the active system process A when a takeover of the active system process is requested on the standby system device, the standby process A referring to the active side process information and acquiring resources in such a way that usage of resources of the standby process A is equal to or greater than the usage of resources of the active system process A.

1. A standby system device which is connected to an active system device comprising: a process information sharing unit which receives active side process information indicating usage of resources of an active system process operating on said active system device from said active system device; and a standby process management unit which terminates a standby process before activating a takeover process that is used for taking over processing of said active system process when a takeover of said active system process is requested on said standby system device, said standby process referring to said active side process information and acquiring resources in such a way that usage of resources of said standby process is equal to or greater than said usage of resources of said active system process.
2. The standby system device according to claim 1, wherein said standby process management unit activates said standby process before said takeover of said active system process is requested.
3. The standby system device according to claim 1, wherein said standby process management unit activates said standby process at a time at ...

Publication date: 03-10-2013

CLUSTER MONITOR, METHOD FOR MONITORING A CLUSTER, AND COMPUTER-READABLE RECORDING MEDIUM

Number: US20130262916A1
Author: SATO Yoichi
Assignee: NEC Corporation

A cluster monitor controls activation of a business application program and a monitoring agent in a cluster system that includes a plurality of servers. The cluster monitor includes a business server identifying unit that identifies a server on which the business application program is operating among the servers, and an agent server selecting unit that selects a server for activating the monitoring agent from among the servers based on the identified server.
1. A cluster monitor for controlling activation of a business application program and a monitoring agent in a cluster system including a plurality of servers, comprising: a business server identifying unit that identifies a server on which the business application program is operating from among the plurality of servers; and an agent server selecting unit that selects a server for activating the monitoring agent from among the plurality of servers, based on the identified server.
2. The cluster monitor according to claim 1, wherein in a case where the monitoring agent is activated on one of the plurality of servers, if a failure occurs in the server on which the monitoring agent is activated, the business server identifying unit identifies, in response to the occurrence of the failure, the server on which the business application program is operating, and the agent server selecting unit selects a server for activating the monitoring agent.
3. The cluster monitor according to claim 1, wherein in a case where the monitoring agent is activated on one of the plurality of servers, if a failure relating to the business application program occurs and fail-over of the business application program is executed, the business server identifying unit identifies, in response to the execution of the fail-over, a server to take over the business application program due to the fail-over, and the agent server selecting unit selects a server for activating the monitoring agent.
4. The cluster monitor according to claim ...

Publication date: 03-10-2013

REDUNDANT SYSTEM CONTROL METHOD

Number: US20130262917A1
Author: TAKEMORI Yasushi
Assignee: NEC Corporation

The redundant system includes a redundant server of a first system and a redundant server of a second system. The redundant servers of the first system and the second system operate in lockstep. When a failure occurs in the redundant server of the second system, the redundant server of the first system separates the redundant server of the second system in which the failure has occurred and continues the operation, and then prepares for restoration to a duplexed operation with a configuration in which the failed part is fallen back. When the preparation is completed, both redundant servers of the first system and the second system start a lockstep operation from initialization processing by synchronous reset, and resume the duplexed operation with the configuration in which the failed part is fallen back.
1. A control method for a redundant system including a redundant server of a first system and a redundant server of a second system, the redundant server of the first system and the redundant server of the second system operating in lockstep, the method comprising: when a failure occurs in the redundant server of the second system, by the redundant server of the first system, separating the redundant server of the second system in which the failure has occurred and continuing an operation; by the redundant server of the first system, preparing for restoration to a duplexed operation with a configuration in which a failed part is fallen back; and by both the redundant server of the first system and the redundant server of the second system, starting a lockstep operation from initialization processing by synchronous reset, and resuming the duplexed operation with the configuration in which the failed part is fallen back.
2. The control method for the redundant system according to claim 1, wherein the redundant server of the first system includes a first CPU, a plurality of first memories, and a first fault tolerant control section, and the redundant server of ...

Publication date: 10-10-2013

SERVER MANAGEMENT APPARATUS, SERVER MANAGEMENT METHOD, AND PROGRAM

Number: US20130268801A1
Author: Yamato Junichi
Assignee: NEC Corporation

A server management apparatus monitors activity state of an active server that provides a service to a client(s) via a plurality of switches, instructs a route control apparatus, managing routing for the plurality of switches, to change a packet forwarding route if there is no reply from the active server; and recognizes that the active server is stopped if there is no reply from the active server after a forwarding route is changed and instructs a standby server to provide the service instead of the active server.
1. A server management apparatus, comprising: a server monitoring unit that monitors activity state of an active server that provides a service to a client(s) via a plurality of switches; a route change instruction unit that instructs a route control apparatus, managing routing for the plurality of switches, to change a packet forwarding route if there is no reply from the active server; and a service provision instruction unit that recognizes that the active server is stopped if there is no reply from the active server after a forwarding route is changed and instructs a standby server to provide the service instead of the active server.
2. The server management apparatus according to claim 1, wherein: the route change instruction unit instructs the route control apparatus to change a packet forwarding route between the client(s) and the active server to a packet forwarding route between the client(s) and the standby server if the route change instruction unit recognizes that the active server is stopped.
3. The server management apparatus according to claim 1, wherein: the server monitoring unit monitors the activity state of the active server via a switch connected to the client(s) with a least hop number among the plurality of switches.
4. The server management apparatus according to claim 1, wherein: the service provision instruction unit instructs the standby server to activate an application program relating to provision of the service if the service ...

Publication date: 31-10-2013

Redundant Automation System and Method for Operating the Redundant Automation System

Number: US20130290776A1
Assignee: SIEMENS AG

A redundant automation system and a method for operating the redundant automation system which is provided with a first subsystem and a second subsystem that each process a control program while controlling a technical process, one of these subsystems operating as a master and the other subsystem operating as a slave, and the slave assuming the function of the master if the master fails such that it becomes possible to dispense with temporally synchronous communication between the participants with regard to the synchronization of the program processing in the two subsystems, thus reducing the communication load.

Publication date: 14-11-2013

SERVER CONTROL AUTOMATION

Number: US20130305084A1
Author: MALNATI JAMES
Assignee:

Control over servers and partitions within a computer network may be automated to improve response to disaster events within the computer network. For example, a monitoring server may be configured to automatically monitor servers through remote communications sessions. A disaster event may be detected based on information received from the partitions and servers within the network. After a disaster event occurs, the monitoring server may automatically execute a script or take other action to make a backup server or partition available. For example, the monitoring server may stop and deactivate a first partition that has failed, activate a second partition that is a mirror image of the first partition, and start the second partition.
1. A method, comprising: detecting, by a monitoring server, a disaster event affecting a first partition of a first server; deactivating, by the monitoring server, the first partition of the first server; activating, by the monitoring server, a second partition of a second server; and starting, by the monitoring server, the second partition of the second server.
2. The method of claim 1, further comprising applying, by the monitoring server, boot settings to the second partition before starting the second partition.
3. The method of claim 1, further comprising selecting, by the monitoring server, an application to recover before activating the second partition, in which the second partition of the second server corresponds to the selected application.
4. The method of claim 1, further comprising communicating, by the monitoring server, with the second server through a remote communications session.
5. The method of claim 1, further comprising stopping, by the monitoring server, the first partition of the first server.
6. The method of claim 1, further comprising: activating, by the monitoring server, a third partition of a third server; and starting, by the ...

Publication date: 14-11-2013

NETWORK TRAFFIC ROUTING

Number: US20130305085A1
Assignee:

A service appliance is installed between production servers running service applications and service users. The production servers and their service applications provide services to the service users. In the event that a production server is unable to provide its service to users, the service appliance can transparently intervene to maintain service availability. To maintain transparency to service users and service applications, service users are located on a first network and production servers are located on a second network. The service appliance assumes the addresses of the service users on the second network and the addresses of the production servers on the first network. Thus, the service appliance obtains all network traffic sent between the production server and service users. While the service application is operating correctly, the service appliance forwards network traffic between the two networks using various network layers.
1. Apparatus comprising a storage medium storing a program for maintaining availability of a first service on a first server to plural client systems via a network, the instructions of the program for: synchronizing a second service provided by a second server with the first service provided by the first server; monitoring availability of the first service; if the first service is unavailable, causing the second service to be substituted in place of the first service and monitoring a third service; if the third service is available and capable of handling access by client systems, causing the third service to synchronize with the second service; monitoring synchronization of the third service with the second service; if the third service is synchronized with the second service, causing the third service to be substituted in place of the second service, such that the third service is responsive to communications from the client systems directed to the first service.
2. The apparatus of claim 1, further comprising the second server, the ...

Publication date: 14-11-2013

Automated trouble ticket generation

Number: US20130305102A1
Author: James Malnati
Assignee: Unisys Corp

Control over servers and partitions within a computer network may be automated to improve response to disaster events within the computer network. For example, a monitoring server may be configured to automatically monitor servers through remote communications sessions. A disaster event may be detected based on information received from the partitions and servers within the network. When a disaster event or events leading to a disaster event are detected, a trouble ticket may be generated. The trouble ticket may also generate an alert displayed to an administrator through a customized hierarchical graphical display. When the administrator is not logged in, messages may be generated to alert the administrator to the problem. The administrator may then log in remotely and respond to the alert.

Publication date: 21-11-2013

Information processing system and method for controlling the same

Number: US20130311514A1
Assignee: HITACHI LTD

An information processing system includes a plurality of edge nodes to provide services relating to files, and a core node communicatively coupled to each of the edge nodes and configured to send or receive data of the files to or from the edge nodes and to manage the data of the files. Any one of the edge nodes is granted a first access right permitting update of the files, whereas any two or more of the edge nodes are granted a second access right to prohibit update of the files. The core node stores the access right granted to each of the edge nodes. When detecting that a failure has occurred in the edge node granted the first access right, the core node sends one of the edge nodes granted the second access right a first instruction to take over the first access right granted to the failed edge node.
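The takeover of the update right described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `CoreNode`, the right labels, and the rule for choosing the successor (alphabetical order) are all assumptions made for the example.

```python
UPDATE = "update"        # stands in for the "first access right"
READ_ONLY = "read-only"  # stands in for the "second access right"


class CoreNode:
    """Hypothetical core node that records the access right of each edge node."""

    def __init__(self, rights):
        # Mapping of edge-node name -> access right granted to it.
        self.rights = dict(rights)

    def on_edge_failure(self, failed):
        """Handle a detected edge-node failure.

        Returns the name of the node that takes over the update right,
        or None if no takeover is needed or possible.
        """
        had_update_right = self.rights.get(failed) == UPDATE
        # Assumption: the failed node is dropped from the table.
        self.rights.pop(failed, None)
        if not had_update_right:
            return None
        candidates = sorted(n for n, r in self.rights.items() if r == READ_ONLY)
        if not candidates:
            return None
        successor = candidates[0]  # assumption: pick the first name alphabetically
        self.rights[successor] = UPDATE
        return successor
```

With `{"edge-a": UPDATE, "edge-b": READ_ONLY, "edge-c": READ_ONLY}`, a failure of `edge-a` hands the update right to `edge-b`, while a later failure of `edge-c` changes nothing else.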

Publication date: 21-11-2013

Resiliency to memory failures in computer systems

Number: US20130311823A1
Assignee: Cray Inc

A resiliency system detects and corrects memory errors reported by a memory system of a computing system using previously stored error correction information. When a program stores data into a memory location, the resiliency system executing on the computing system generates and stores error correction information. When the program then executes a load instruction to retrieve the data from the memory location, the load instruction completes normally if there is no memory error. If, however, there is a memory error, the computing system passes control to the resiliency system (e.g., via a trap) to handle the memory error. The resiliency system retrieves the error correction information for the memory location and re-creates the data of the memory location. The resiliency system stores the data as if the load instruction had completed normally and passes control to the next instruction of the program.
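The store/load/trap flow in this abstract can be illustrated with a small simulation. It is only a sketch under stated assumptions: the "error correction information" is modelled as a full shadow copy of the value (a real system would more likely use parity or an error-correcting code), the hardware trap is modelled as a Python exception, and every name here (`ResilientMemory`, `MemoryFault`, `inject_fault`) is hypothetical.

```python
class MemoryFault(Exception):
    """Stands in for the trap raised when the memory system reports an error."""


class ResilientMemory:
    def __init__(self):
        self._cells = {}       # simulated main memory
        self._correction = {}  # assumption: correction info is a shadow copy
        self._faulty = set()   # addresses whose next read will report an error

    def store(self, addr, value):
        # On every store, also generate and keep the correction information.
        self._cells[addr] = value
        self._correction[addr] = value
        self._faulty.discard(addr)

    def inject_fault(self, addr):
        # Test hook: make the given address report a memory error.
        self._faulty.add(addr)

    def _raw_load(self, addr):
        if addr in self._faulty:
            raise MemoryFault(addr)
        return self._cells[addr]

    def load(self, addr):
        try:
            return self._raw_load(addr)  # normal path: no memory error
        except MemoryFault:
            # "Trap" path: re-create the data from the correction information,
            # repair the cell, and return as if the load had completed normally.
            value = self._correction[addr]
            self._cells[addr] = value
            self._faulty.discard(addr)
            return value
```

The calling program only ever sees `load` return a value, which mirrors the abstract's point that control passes to the next instruction as if nothing had failed.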

Publication date: 05-12-2013

Failover of interrelated services on multiple devices

Number: US20130326261A1
Author: Eric Milo Paulson
Assignee: VERIZON PATENT AND LICENSING INC

A device may include a network interface for communicating with a failover device, a memory for instructions, and a processor for executing the instructions. The processor may execute the instructions to communicate with the failover device, via the network interface, to fail over the device to the failover device in a cluster by pushing a process on the device to the failover device when a first failover event occurs. The failover device is configured to fail over the device to the failover device by pulling the process on the device on the second device when a second failover event occurs. The device is in the cluster.

Publication date: 19-12-2013

Processor management method

Number: US20130339632A1
Assignee: Fujitsu Ltd

A processor management method includes setting a master mechanism in a given processor among multiple processors, where the master mechanism manages the processors; setting a local master mechanism and a virtual master mechanism in each of processors other than the given processor among the processors, where the local master mechanism and the virtual master mechanism manage each of the processors; and notifying by the master mechanism, the processors of an offset value of an address to allow a shared memory managed by the master mechanism to be accessed as a continuous memory by the processors.

Publication date: 19-12-2013

Recovery of a System for Policy Control and Charging, Said System Having a Redundancy of Policy and Charging Rules Function

Number: US20130339783A1
Assignee: TELEFONAKTIEBOLAGET L M ERICSSON (PUBL)

A first Policy and Charging Rules Function “PCRF” server for recovery of a Policy and Charging Control “PCC” system. The PCC system also has a second PCRF server previously in charge of controlling an Internet Protocol Connectivity Access Network “IP-CAN” session previously established with a UE, and a PCRF-client. The first PCRF server includes a network interface unit of the first PCRF server arranged for receiving a modification request of the IP-CAN session from the PCRF-client after failure of the second PCRF server which was in active mode. The first PCRF server has a PCRF identifier which is shared with the second PCRF server that has failed. The first PCRF server is now in active mode. The modification request requests new rules for the IP-CAN session, including modification data and excluding access data and supported features for the IP-CAN session. The first PCRF server includes a processing unit of the first PCRF server arranged for determining that the IP-CAN session is unknown, and arranged for submitting a request from the network interface unit of the first PCRF server to the PCRF-client to provide all information that the PCRF-client has regarding the IP-CAN session. The information includes all data required to be sent for the IP-CAN session establishment and synchronization data. A Policy and Charging Rules Function “PCRF”-client for recovery of a Policy and Charging Control “PCC” system. Methods for recovery of a Policy and Charging Control “PCC” system with a first Policy and Charging Rules Function “PCRF” server in standby mode, a second PCRF server in active mode, and a PCRF-client, wherein an IP-CAN session is already established with a UE and controlled by the second PCRF server. A computer program embodied on a computer readable medium for recovery of a Policy and Charging Control “PCC” system.
1-21. (canceled)
22. A method for recovery of a Policy and Charging Control (PCC) system with a first Policy and Charging Rules Function (PCRF) ...

Publication date: 19-12-2013

Node

Number: US20130339981A1

The aim is to facilitate changing a system configuration and to allow high redundancy in a computer system connecting a plurality of nodes. A node includes a CPU and constitutes a computer system. The node executes one or more processes including predetermined functions. The node includes a shared memory that stores system information including process information related to each process executed by each node, in a state accessible from each process of its own node. In the node, the system information including the process information related to each process of its own node is multicast to the other nodes. A shared memory control process of the node receives the system information multicast from the other nodes and stores the system information in the shared memory.

Publication date: 16-01-2014

Job management server and job management method

Number: US20140019613A1
Author: Yohey Ishikawa
Assignee: HITACHI LTD

A job management server for managing a plurality of jobs to be executed by a virtual computer generated on a computer includes a job management part to manage information on a job net made up of a plurality of jobs and to allocate the plurality of jobs included in the job net to the virtual computer, and a recovery part to monitor an execution status of each of the plurality of jobs included in the job net and perform recovery processing. The job management server is configured to: specify a target job for changing allocation, in a case where a failure has occurred in a first virtual computer to execute a first job included in the first job net; and determine a performance of a virtual computer required to execute the target job.

Publication date: 23-01-2014

Systems, Methods and Media for Distributing Peer-to-Peer Communications

Number: US20140025987A1
Assignee: Don Hoffman, Mark Kern, Sean Culhane

Systems and methods for distributing peer-to-peer communications are provided herein. Exemplary methods may include masking identification of two or more client nodes on a communications channel of a peer-to-peer communications network by directing peer-to-peer communications of the two or more client nodes through a proxy node, the proxy node including a disinterested client node relative to the two or more client nodes, the disinterested client node providing network resources to the peer-to-peer communications network.

Publication date: 30-01-2014

Systems and methods for health based spillover

Number: US20140032750A1
Assignee: Citrix Systems Inc

The present solution provides a spillover management technique for virtual servers of an appliance based on health. Using a health based spillover technique, a network appliance may direct requests to a backup or second virtual server upon determining that a predetermined percentage of services being load balanced are down. In this manner, the spillover will occur based on a user controlled determination of a level of services being down to the number of services enabled. Instead of waiting for a last service of a virtual server to be marked down to spillover to another virtual server, the spillover may occur based on a user specified percentage. For example, the appliance may spillover from one virtual server to another virtual server when the number of services marked down relative to the number of enabled services falls below a specified percentage.
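A health-based spillover decision of this kind might look like the sketch below. It assumes that spillover is triggered once the share of down services among the enabled ones reaches a user-specified threshold; the abstract's own wording of the comparison is ambiguous, so that direction is an assumption, as are the names `VirtualServer`, `down_fraction`, and `pick_server`.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualServer:
    name: str
    # Enabled services mapped to their health: True = up, False = marked down.
    services_up: dict = field(default_factory=dict)


def down_fraction(vs):
    """Share of enabled services that are currently marked down."""
    enabled = len(vs.services_up)
    if enabled == 0:
        return 1.0  # assumption: a server with no enabled services counts as unhealthy
    down = sum(1 for up in vs.services_up.values() if not up)
    return down / enabled


def pick_server(primary, backup, threshold):
    """Spill over to the backup once the down share reaches the threshold."""
    return backup if down_fraction(primary) >= threshold else primary
```

For example, with four enabled services of which two are down, a threshold of 0.5 sends requests to the backup virtual server, while a threshold of 0.75 keeps them on the primary. The point of the design is that the operator does not have to wait for the last service to fail before traffic moves.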

Publication date: 06-02-2014

INFORMATION PROCESSING APPARATUS, COMPUTER READABLE STORAGE MEDIUM, AND COLLECTING METHOD

Number: US20140040663A1
Assignee: FUJITSU LIMITED

An information processing apparatus includes a processor, and a memory connected to the processor, that stores a piece of identification information allocated to a physical partition in the information processing apparatus. The processor executes a process including collecting pieces of the identification information that are stored by other information processing apparatuses included in an information processing system. The process includes notifying an operating system of the pieces of the identification information collected at the collecting.
1. An information processing apparatus comprising: a processor; and a memory, connected to the processor, that stores a piece of identification information allocated to a physical partition in the information processing apparatus, wherein the processor executes a process comprising: collecting pieces of the identification information that are stored by other information processing apparatuses included in an information processing system; and notifying an operating system of the pieces of the identification information collected at the collecting.
2. The information processing apparatus according to claim 1, wherein the notifying includes instructing, when a failure has occurred in one of the other information processing apparatuses, the operating system to exclude a piece of the identification information collected at the collecting from the failed information processing apparatus from being a target for allocation to the physical partition.
3. The information processing apparatus according to claim 1, wherein the process further comprises selecting, as an active information processing apparatus, in cooperation with the other information processing apparatuses, and in accordance with a predetermined rule, a single information processing apparatus from among the information processing apparatuses included in the information processing system.
4. The information ...

Publication date: 13-02-2014

SYNCHRONOUS LOCAL AND CROSS-SITE FAILOVER IN CLUSTERED STORAGE SYSTEMS

Number: US20140047263A1
Assignee:

Synchronous local and cross-site switchover and switchback operations of a node in a disaster recovery (DR) group are described. In one embodiment, during switchover, a takeover node receives a failover request and responsively identifies a first partner node in a first cluster and a second partner node in a second cluster. The first partner node and the takeover node form a first high-availability (HA) group and the second partner node and a third partner node in the second cluster form a second HA group. The first and second HA groups form the DR group and share a storage fabric. The takeover node synchronously restores client access requests associated with a failed partner node at the takeover node.
1. A method comprising: receiving, by a takeover node in a first cluster at a first site of a cross-site clustered storage system, a failover request; processing, by the takeover node, the failover request to identify a first partner node in the first cluster and a second partner node in a second cluster at a second site, the first partner node and the takeover node forming a first high-availability (HA) group, the second partner node and a third partner node in the second cluster forming a second HA group, the first HA group and the second HA group forming a disaster recovery (DR) group and sharing a storage fabric with each other; and resuming, by the takeover node, client access requests associated with a failed partner node synchronously at the takeover node.
2. The method of claim 1, further comprising: synchronously replicating, by the takeover node, cache data associated with the takeover node to the first partner node at the first site and the second partner node at the second site during non-failover conditions, wherein the first site and the second site are geographically remote with respect to each other.
3. The method of claim 2, wherein synchronously replicating the cache data comprises synchronously replicating, by the takeover node, the ...

Publication date: 20-02-2014

Techniques for performing processing for database

Number: US20140052826A1
Author: Masahiro Ohkawa
Assignee: International Business Machines Corp

Embodiments relate to a method, system and program product for performing data processing. The system includes a plurality of computer servers configured to perform data processing, a client in processing communication with the computer servers and enabled to request data processing from any of the servers, and a storing component included in the client for storing information relating to requested data to be processed. A processing component is included in each computer server for applying a control lock to data being processed. A reprocessing request component is included in the client for enabling a new server to take over processing of requested data upon failure of the previously processing computer server. The new computer server obtains information relating to requested data from the storing component and information relating to control lock information from the processing component such that the new computer server commences processing at a processing point exactly prior to the failure.

Publication date: 06-03-2014

Task execution & management in a clustered computing environment

Number: US20140068620A1
Assignee: International Business Machines Corp

Machines, systems and methods for task management in a computer implemented system. The method comprises registering a task with brokers residing on one or more nodes to manage the execution of a task to completion, wherein a first broker is accompanied by a first set of worker threads co-located on the node on which the first broker is executed, wherein the first broker assigns responsibility of execution for the task to the one or more worker threads in the first set of co-located worker threads, wherein in response to a failure associated with a first worker thread in the first set, the first broker reassigns the responsibility of execution for the task to a second worker thread in the first set, wherein in response to a failure associated with the first broker, a second broker assigns responsibility of execution for the task to one or more co-located worker threads.
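The two reassignment paths in this abstract (worker failure handled by the same broker, broker failure handled by a second broker) can be sketched as below. This is an illustrative model only: workers are plain names rather than real threads, the least-loaded selection rule is an assumption, and `Broker`, `worker_failed`, and `broker_failed` are hypothetical names.

```python
class Broker:
    """Hypothetical broker that tracks which co-located worker owns each task."""

    def __init__(self, name, workers):
        self.name = name
        self.workers = list(workers)  # co-located workers still alive
        self.assignments = {}         # task -> worker responsible for it

    def _load(self, worker):
        return sum(1 for w in self.assignments.values() if w == worker)

    def assign(self, task):
        if not self.workers:
            raise RuntimeError(f"broker {self.name} has no live workers")
        # Assumption: pick the least-loaded worker; ties go to the first listed.
        worker = min(self.workers, key=self._load)
        self.assignments[task] = worker
        return worker

    def worker_failed(self, worker):
        # Reassign the failed worker's tasks to other workers of this broker.
        self.workers.remove(worker)
        orphaned = [t for t, w in self.assignments.items() if w == worker]
        for task in orphaned:
            self.assign(task)


def broker_failed(failed, standby):
    """A second broker takes responsibility for the failed broker's tasks."""
    for task in list(failed.assignments):
        standby.assign(task)
    failed.assignments.clear()
```

A task first given to `w1` moves to `w2` when `w1` fails, and moves to a worker of the second broker when the first broker itself fails, so in both cases some live worker remains responsible for completing it.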

Publication date: 13-03-2014

Verifying processor-sparing functionality in a simulation environment

Number: US20140074451A1
Assignee: International Business Machines Corp

A simulation environment verifies processor-sparing functions in a simulated processor core. The simulation environment executes a first simulation for a simulated processor core. During the simulation, the simulation environment creates a simulation model dump file. At a later point in time, the simulation environment executes a second simulation for the simulated processor core. The simulation environment saves the state of the simulated processor core. The simulation environment then replaces the state of the simulated processor core by loading the previously created simulation model dump file. The simulation environment then sets the state of the simulated processor core to execute processor-sparing code and resumes the second simulation.

Publication date: 20-03-2014

SYSTEM AND METHOD FOR USING REDUNDANCY OF CONTROLLER OPERATION

Number: US20140082413A1
Author: Bilich Carlos
Assignee: ABB TECHNOLOGY AG

Exemplary embodiments are directed to a system and method for maintaining continuous operation of applications in spite of hardware faults, maintenance, or replacement. The system has at least two physically redundant controllers, each controller being configured to achieve at least one of high availability and functional safety and having at least one control unit which actively participates in a control loop, and n redundant units that are kept synchronized in a stand-by mode. The at least two controllers are configured such that software code recorded on a first of the at least two controllers is replicated among others of the at least two controllers. Moreover, each of the at least two controllers includes central processing units (CPUs) having a plurality of cores arranged within a single piece of silicon.
1. A system for maintaining continuous operation of applications during hardware faults, maintenance, or replacement, the system comprising: at least two physically redundant controllers, each controller being configured to achieve at least one of high availability and functional safety and having at least one control unit which actively participates in a control loop, and n redundant units that are kept synchronized in a stand-by mode, wherein the at least two controllers are configured such that software code recorded on a first of the at least two controllers is replicated among others of the at least two controllers; and each of the at least two controllers include central processing units (CPUs) having a plurality of cores arranged within a single piece of silicon.
2. The system according to claim 1, comprising: sensors configured to gather information from at least one of equipment and processes under control (EUC), and actuators configured to generate a response to the gathered information.
3. The system according to claim 1, comprising: I/O subsystems configured to collect signals output by the sensors, process the signals, and transmit the processed signals ...

Publication date: 27-03-2014

OPERATING METHOD OF SOFTWARE FAULT-TOLERANT HANDLING SYSTEM

Number: US20140089731A1
Author: LEE Kwang Yong

An exemplary embodiment provides an operating method of a software fault-tolerant handling system, and more particularly, an operating method of a fault-tolerant handling system in which fault recovery is easy, in a fault-tolerant technology for coping with various faults which can occur in a computing device.
1. An operating method of a software fault-tolerance handling system, comprising: setting a mode of a first processor to an active mode and a mode of a second processor to a standby mode between the first and second processors in accordance with a setting reference; performing a self healing function to recover a system error which occurs in the first processor or transmitting a heartbeat message to the second processor when the system error does not occur; generating a diff app_context between a basic app_context for a predetermined process which is being executed in the first processor and an app_context at a current time and transmitting the generated diff app_context to the second processor, after the transmission; and switching the mode of the first processor to the standby mode and switching the mode of the second processor to the active mode when a current state of the first processor meets a set reference to continuously execute the predetermined process, at the time of transmitting the diff app_context.
2. The operating method of a software fault-tolerance handling system of claim 1, wherein: the mode setting includes, calculating a first mode corresponding to an initial mode value calculated based on an init material stored in a memory of the first processor and calculating a second mode corresponding to a current mode value transmitted from the second processor; and setting the modes of the first and second processors based on the first and second modes and the first setting reference.
3. The operating method of a software fault-tolerance handling system of claim 2, wherein: in the setting reference, when the first mode is the active mode and the second ...
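The diff app_context step in claim 1 can be illustrated with a minimal sketch. It assumes an app_context can be modeled as a flat dictionary; the names `Processor`, `diff_context`, `replicate`, and `failover` are hypothetical and do not come from the patent.

```python
# Minimal sketch, assuming an app_context is a flat dict of state values.
# All names here are hypothetical illustrations, not the patent's terms.

def diff_context(base: dict, current: dict) -> dict:
    """Return only the entries of `current` that are new or changed relative to `base`.

    Keys deleted from `current` are not tracked in this simplified sketch.
    """
    return {k: v for k, v in current.items() if k not in base or base[k] != v}


class Processor:
    def __init__(self, name: str, mode: str):
        self.name = name
        self.mode = mode      # "active" or "standby"
        self.context = {}     # app_context of the process being executed


def replicate(active: Processor, standby: Processor, base: dict) -> None:
    """Send only the diff to the standby, which rebuilds the full context from the base."""
    delta = diff_context(base, active.context)
    standby.context = {**base, **delta}


def failover(active: Processor, standby: Processor) -> None:
    """Swap modes so the former standby continues the process."""
    active.mode, standby.mode = "standby", "active"
```

Sending only the difference keeps the replication message small when most of the context is unchanged between transmissions.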

Publication date: 27-03-2014

THREAD SPARING BETWEEN CORES IN A MULTI-THREADED PROCESSOR

Number: US20140089732A1

Embodiments relate to thread sparing between cores in a processor. An aspect includes determining that a number of recovery attempts made by a first thread on the first core has exceeded a recovery attempt threshold, and sending a request to transfer the first thread. Another aspect includes selecting a second core from a plurality of cores to receive the first thread from the first core, wherein the second core is selected based on the second core having an idle thread. Another aspect includes transferring a last good architected state of the first thread from the first core to the second core. Another aspect includes loading the last good architected state of the first thread by the idle thread on the second core. Yet another aspect includes resuming execution of the first thread on the second core from the last good architected state of the first thread by the idle thread.
1. A computer implemented method for thread sparing between cores in a processor, the method comprising: determining, by a first core of the processor, that a number of recovery attempts made by a first thread on the first core has exceeded a recovery attempt threshold; sending, by the first core to a processor controller in the processor, a request to transfer the first thread to another core of the processor; based on receiving the request, selecting, by the processor controller, a second core from a plurality of cores of the processor to receive the first thread from the first core, wherein the second core is selected based on the second core having an idle thread; transferring a last good architected state of the first thread from an error recovery logic of the first core to the second core; loading the last good architected state of the first thread by the idle thread on the second core; and resuming execution of the first thread on the second core from the last good architected state of the first thread by the idle thread.
2. The method of claim 1, wherein the recovery attempts made by the ...
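The selection and transfer steps of claim 1 can be sketched in software terms. This is a simplified model only: the patent describes hardware logic, and the names `Core`, `select_spare_core`, and `spare_thread`, as well as the dictionary used for the architected state, are assumptions made for illustration.

```python
# Simplified software model of thread sparing; all names are hypothetical.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Core:
    core_id: int
    idle_threads: int = 0
    loaded_state: Optional[dict] = None   # last good architected state, if any


def select_spare_core(cores: List[Core], failing_id: int) -> Optional[Core]:
    """Pick the first other core that has an idle thread."""
    for core in cores:
        if core.core_id != failing_id and core.idle_threads > 0:
            return core
    return None


def spare_thread(cores, failing_id, recovery_attempts, threshold, last_good_state):
    """Transfer the thread only once recovery attempts exceed the threshold."""
    if recovery_attempts <= threshold:
        return None                        # keep recovering on the original core
    target = select_spare_core(cores, failing_id)
    if target is None:
        return None                        # no core with an idle thread
    target.idle_threads -= 1               # the idle thread takes over
    target.loaded_state = dict(last_good_state)
    return target
```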

Publication date: 10-04-2014

RESOURCE RECOVERY FOR CHECKPOINT-BASED HIGH-AVAILABILITY IN A VIRTUALIZED ENVIRONMENT

Number: US20140101401A1

A computer-implemented method provides checkpoint-based high availability for an application in a virtualized environment with reduced network demands. An application executes on a primary host machine comprising a first virtual machine. A virtualization module receives a designation from the application of a portion of the memory of the first virtual machine as purgeable memory, wherein the purgeable memory can be reconstructed by the application when the purgeable memory is unavailable. Changes are tracked to a processor state and to a remaining portion that is not purgeable memory and the changes are periodically forwarded at checkpoints to a secondary host machine. In response to an occurrence of a failure condition on the first virtual machine, the secondary host machine is signaled to continue execution of the application by using the forwarded changes to the remaining portion of the memory and by reconstructing the purgeable memory.
1. A computer-implemented method for resource recovery, the method comprising: a processor executing an application on a primary host machine comprising a first virtual machine with the processor and a memory; receiving a designation from the application of a portion of the memory of the first virtual machine as purgeable memory, wherein the purgeable memory represents portions of memory that can be reconstructed by the application when the purgeable memory is unavailable; tracking changes to a processor state and to a remaining portion of the memory that is not designated by the application as purgeable memory; periodically stopping the first virtual machine; in response to stopping the first virtual machine, forwarding the changes to the remaining portion of the memory to a secondary host machine comprising a second virtual machine; in response to completing the forwarding of the changes, resuming execution of the first virtual machine; and in response to an occurrence of a failure condition on the first virtual machine, signaling the ...

Publication date: 07-01-2016

SEMICONDUCTOR DEVICE

Number: US20160003910A1
Author: ISHIMI Koichi

The disclosed invention provides a semiconductor device that enables early discovery of a sign of aged deterioration that occurs locally. An LSI has a plurality of modules and a delay monitor cluster including a plurality of delay monitors. Each delay monitor includes a ring oscillator having a plurality of gate elements. Each delay monitor measures a delay time of the gate elements. A CPU #0 determines if a module proximate to a delay monitor suffers from aged deterioration, based on the delay time measured by the delay monitor.
1. A semiconductor device comprising: a plurality of modules; a plurality of delay monitors, wherein each delay monitor includes a ring oscillator having a plurality of gate elements and measures a delay time of said gate elements; and a control unit that determines whether or not the delay time measured by said delay monitor exceeds a predetermined reference value, wherein, of said delay monitors, every two delay monitors disposed near to each other form one pair, wherein a ring oscillator in one delay monitor of said pair continues to oscillate except for a predetermined number of cycles before and after a delay time measurement period and a ring oscillator in the other delay monitor of said pair oscillates only during a delay time measurement period, and said control unit determines a difference between a delay time measured by said one delay monitor and a delay time measured by said other delay monitor.
2. The semiconductor device according to claim 1, wherein said control unit issues an alert, if having determined that said module suffers from aged deterioration.
3. The semiconductor device according to claim 1, wherein said control unit performs a built-in self test of said semiconductor device, if having determined that said module suffers from aged deterioration.
4. The semiconductor device according to claim 1, wherein said control unit decreases the power supply voltage to the module proximate to said delay monitor, if ...
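The paired-monitor comparison in claim 1 can be sketched as a simple check. The sketch assumes that each module has one continuously oscillating (stressed) monitor and one reference monitor that oscillates only during measurement, and that a module is flagged when the delay difference exceeds a limit. The function name `aged_modules`, the data layout, and the flagging rule are assumptions, not details from the patent.

```python
# Minimal sketch of the paired delay-monitor comparison; names and the
# flagging rule are hypothetical. Delays are in arbitrary time units.

def aged_modules(pairs: dict, limit: float) -> list:
    """Return the modules whose stressed-minus-reference delay exceeds `limit`.

    `pairs` maps a module name to (stressed_delay, reference_delay).
    """
    return sorted(
        name
        for name, (stressed, reference) in pairs.items()
        if stressed - reference > limit
    )
```

Comparing the two monitors of a pair, rather than one monitor against a fixed value, cancels out effects such as temperature or supply voltage that act on both monitors alike.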

Publication date: 07-01-2016

Method and device for synchronously running an application in a high availability environment

Number: US20160004608A1
Author: Georges Lecourtier
Assignee: Bull SA

A method for synchronously running an application in a high availability environment including a plurality of calculating modules interconnected by a very high-speed broadband network includes: configuring the modules into partitions including a primary partition, a secondary partition, and a monitoring partition; running the application on each running partition, with inputs-outputs processed by the primary partition being transmitted to the secondary running partition via the monitoring partition; synchronizing the runs by exploiting microprocessor context changes; transmitting a catastrophic error signal to the monitoring partition; and continuing the run by switching to a degraded mode in which the run continues on a single partition.

Publication date: 04-01-2018

DUAL-PORT NON-VOLATILE DUAL IN-LINE MEMORY MODULES

Number: US20180004422A1

According to an example, a dual-port non-volatile dual in-line memory module (NVDIMM) includes a first port to provide a central processing unit (CPU) with access to universal memory of the dual-port NVDIMM and a second port to provide an external NVDIMM manager circuit with access to the universal memory of the dual-port NVDIMM. Accordingly, a media controller of the dual-port NVDIMM may store data received from the CPU through the first port in the universal memory, control dual-port settings received from the CPU, and transmit the stored data to the NVDIMM manager circuit through the second port of the dual-port NVDIMM.
1. A dual-port non-volatile dual in-line memory module (NVDIMM), comprising: a first port to provide a central processing unit (CPU) with access to universal memory of the dual-port NVDIMM; a second port to provide an external NVDIMM manager circuit with access to the universal memory of the dual-port NVDIMM, wherein the NVDIMM manager circuit interfaces with remote storage; and a media controller to: store data received from the CPU through the first port of the dual-port NVDIMM in the universal memory, control dual-port settings for the dual-port NVDIMM received from the CPU through the first port of the dual-port NVDIMM, wherein the dual-port settings include at least one of an active-active redundancy flow and an active-passive redundancy flow, and transmit the stored data to the NVDIMM manager circuit through the second port of the dual-port NVDIMM.
2. The dual-port NVDIMM of claim 1, wherein responsive to controlling the dual-port settings to be the active-active redundancy flow, the media controller is to set both the first port and the second port of the dual-port NVDIMM to an active state so that the CPU and NVDIMM manager circuit can simultaneously access the dual-port NVDIMM.
3. The dual-port NVDIMM of claim 2, wherein the media controller comprises an integrated direct memory access (DMA) engine to migrate the stored ...

Publication date: 04-01-2018

Application aware input/output fencing

Number: US20180004612A1
Author: Abhijit Toley, Jai Gahlot
Assignee: Veritas Technologies LLC

Disclosed herein are methods, systems, and processes to perform application aware input/output (I/O) fencing operations. Performing such an application aware I/O fencing operation includes installing an identifier that identifies an instance of an application with a node on which the instance of the application is executing, on coordination points. A weight assigned to the instance of the application is determined, and the instance of the application is terminated based on the weight.

Publication date: 04-01-2018

APPLICATION AWARE INPUT/OUTPUT FENCING

Number: US20180004613A1
Author: Gahlot Jai, Toley Abhijit

Disclosed herein are methods, systems, and processes to perform application aware input/output (I/O) fencing operations. A determination is made that a cluster has been partitioned. The cluster includes multiple nodes. As a result of the partitioning, the nodes are split between a first network partition with a first set of nodes and a second network partition with a second set of nodes. Another determination is made that instances of an application are executing on the first set of nodes and the second set of nodes. An application aware I/O fencing operation is then performed that causes termination of instances of the application executing on the first set of nodes or on the second set of nodes.
1. A method comprising: determining that a cluster has been partitioned, wherein the cluster comprises a plurality of nodes, and as a result of the partitioning, the plurality of nodes are split between a first network partition comprising a first set of nodes of the plurality of nodes and a second network partition comprising a second set of nodes of the plurality of nodes; determining that a plurality of instances of an application are executing on the first set of nodes and the second set of nodes; and performing an application fencing operation, wherein the performing the application fencing operation causes termination of instances of the application executing on the first set of nodes or on the second set of nodes.
2. The method of claim 1, further comprising: performing a fencing race, wherein performing the fencing race comprises, at least in part, accessing an application weight matrix, wherein the application weight matrix comprises a weight assigned to the application, and comparing a first total application weight of the first set of nodes and a second total application weight of the second set of nodes.
3. The method of claim 2, further comprising: bypassing the fencing race, if all instances of the application are executing on the ...
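The weight comparison in claim 2 can be sketched as follows. The claim states only that the total application weights of the two partitions are compared; the rule that the lighter partition loses the race, the tie-break in favor of the first partition, and the names `total_weight` and `fencing_loser` are assumptions made for this illustration.

```python
# Minimal sketch of a weight-based fencing race; the decision rule and all
# names are hypothetical.

def total_weight(nodes, instances: dict, weights: dict) -> int:
    """Sum the weights of every application instance running on `nodes`."""
    return sum(weights[app] for node in nodes for app in instances.get(node, ()))


def fencing_loser(part_a, part_b, instances: dict, weights: dict):
    """Return the partition whose application instances would be terminated.

    Assumption: the partition with the lower total weight loses, and a tie
    is resolved in favor of `part_a`.
    """
    weight_a = total_weight(part_a, instances, weights)
    weight_b = total_weight(part_b, instances, weights)
    return part_b if weight_a >= weight_b else part_a
```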

Publication date: 02-01-2020

FAULT TOLERANCE METHOD AND SYSTEM FOR VIRTUAL MACHINE GROUP

Number: US20200004642A1

A fault tolerance method and system for a virtual machine group is proposed. The method includes: establishing fault tolerance backup connections of virtual machines between a virtual machine hypervisor of at least one primary host and a virtual machine hypervisor of at least one backup host to perform fault tolerance backups of the virtual machines, wherein the plurality of virtual machines are included in a fault tolerance group; and, when a synchronizer determines that a failover of at least one first virtual machine among the primary virtual machines in the fault tolerance group is being performed, informing, by the synchronizer, to perform a failover of other remaining primary virtual machines among the primary virtual machines in the fault tolerance group, or to return other remaining primary virtual machines among the primary virtual machines in the fault tolerance group back to a last fault tolerance backup state of each and continue performing fault tolerance backups of the other remaining primary virtual machines.
1. A fault tolerance method for a virtual machine group, applicable to a fault tolerance system, and comprising: establishing fault tolerance backup connections of a plurality of primary virtual machines between a virtual machine hypervisor of at least one primary host and a virtual machine hypervisor of at least one backup host to perform fault tolerance backups of the primary virtual machines, wherein the primary virtual machines are included in a fault tolerance group; and when a synchronizer determines that a failover of at least one first virtual machine among the primary virtual machines in the fault tolerance group is being performed, informing, by the synchronizer, to perform a failover of one or more other remaining primary virtual machines among the primary virtual machines in the fault tolerance group, or informing, by the synchronizer, to return the one or more other remaining primary virtual machines among the primary virtual machines in ...

Publication date: 02-01-2020

PROACTIVE CLUSTER COMPUTE NODE MIGRATION AT NEXT CHECKPOINT OF CLUSTER CLUSTER UPON PREDICTED NODE FAILURE

Number: US20200004648A1

While scheduled checkpoints are being taken of a cluster of active compute nodes distributively executing an application in parallel, a likelihood of failure of the active compute nodes is periodically and independently predicted. Responsive to the likelihood of failure of a given active compute node exceeding a threshold, the given active compute node is proactively migrated to a spare compute node of the cluster at a next scheduled checkpoint. Another spare compute node of the cluster can perform prediction and migration. Prediction can be based on both hardware events and software events regarding the active compute nodes.
1. A method comprising: while scheduled checkpoints are being taken of a cluster of active compute nodes distributively executing an application in parallel, periodically predicting independently of the scheduled checkpoints, by a processor, a likelihood of failure of each active computing node; and responsive to the likelihood of failure of a given active compute node exceeding a threshold, proactively migrating, by the processor, the given active compute node to a spare compute node of the cluster at a next scheduled checkpoint.
2. The method of claim 1, wherein predicting the likelihood of failure of each active computing node does not affect when the checkpoints are taken.
3. The method of claim 1, wherein the spare compute node is a first spare compute node of the cluster, and wherein the processor predicting the likelihood of failure of each active computing node and proactively migrating the given active compute node to the first spare compute node is a part of a second spare compute node of the cluster.
4. The method of claim 1, wherein predicting the likelihood of failure of each active computing node is based on both hardware events regarding the active compute nodes and software events regarding the active compute nodes.
5. The method of claim 4, wherein the software events comprise: a last number of ranks of the application ...
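The threshold test in claim 1 can be sketched as a planning step that runs between checkpoints and is applied at the next scheduled one. The function name `plan_migrations`, the use of a probability in the range 0 to 1, and the policy of serving the most at-risk nodes first are assumptions for illustration; the patent does not specify them here.

```python
# Minimal sketch: decide which active nodes to migrate at the next scheduled
# checkpoint. Names and the ordering policy are hypothetical.

def plan_migrations(failure_likelihood: dict, threshold: float, spares: list) -> dict:
    """Map each at-risk active node to a spare node.

    Nodes are considered from most to least likely to fail, and planning
    stops assigning once the spares run out.
    """
    plan = {}
    free = list(spares)
    ranked = sorted(failure_likelihood.items(), key=lambda kv: kv[1], reverse=True)
    for node, likelihood in ranked:
        if likelihood > threshold and free:
            plan[node] = free.pop(0)
    return plan
```

Because the plan is only applied at a checkpoint that was already scheduled, prediction does not change when checkpoints are taken, which matches claim 2.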

Publication date: 13-01-2022

INDEXING BACKUP DATA GENERATED IN BACKUP OPERATIONS

Number: US20220012135A1

In certain embodiments, a tiered storage system is disclosed that provides for failover protection during data backup operations. The system can provide for an index, or catalog, for identifying and enabling restoration of backup data located on a storage device. The system further maintains a set of transaction logs generated by media agent modules that identify metadata with respect to individual data chunks of a backup file on the storage device. A copy of the catalog and transaction logs can be stored at a location accessible by each of the media agent modules. In this manner, in case of a failure of one media agent module during backup, the transaction logs and existing catalog can be used by a second media agent module to resume the backup operation without requiring a restart of the backup process.
1. A non-transitory computer-readable medium that stores instructions which, when executed by a first computing device comprising one or more hardware processors, cause the first computing device to: receive from a second computing device one or more first transaction logs, wherein the one or more first transaction logs are based on one or more of (a) first backup data generated by the second computing device, wherein the first backup data comprises a plurality of data chunks, and (b) storage of the first backup data as the plurality of data chunks in one or more second storage devices that are communicatively coupled to the second computing device, wherein once a given data chunk from the plurality of data chunks is stored in the one or more second storage devices, the second computing device transmits a corresponding transaction log to the first computing device, and wherein the second computing device comprises one or more hardware processors; after the first backup data is generated, apply the one or more first transaction logs to an index; and wherein the index is configured to enable restoring backup data generated by at least the second ...

Publication date: 03-01-2019

METHOD AND SYSTEM FOR DYNAMIC CONFIGURATION OF MULTIPROCESSOR SYSTEM

Number: US20190004827A1

A multiprocessor system used in a car, home, or office environment includes multiple processors that run different real-time applications. A dynamic configuration system runs on the multiple processors and includes a device manager, configuration manager, and data manager. The device manager automatically detects and adds new devices to the multiprocessor system, and the configuration manager automatically reconfigures which processors run the real-time applications. The data manager identifies the type of data generated by the new devices and identifies which devices in the multiprocessor system are able to process the data.
1. A processing system, comprising: a network, wherein the network comprises one of wired and wireless, and wherein the network is connected to a server by at least one Transmission Control Protocol/Internet Protocol (TCP/IP) packet switched link, and wherein the server runs real-time applications or other server based applications; an application processor connected to the network and configured in a language to implement instructions for devices and applications used in the network, and configured to monitor data from different sensors; a display processor connected to the network and configured in the language to implement instructions for devices and applications used in the network, wherein the instructions control and monitor the application processor through the server using a graphical user interface.
2. The system of claim 1, wherein the application processor is an appliance processor, and wherein the appliance is at least one of a home appliance, an office appliance, and an industrial appliance.
3. The system of claim 1, wherein the wireless network comprises one of 802 IEEE wireless standards and Bluetooth.
4. The system of claim 1, wherein the network is configured to operate in a variety of different languages.
5. The system of claim 1, wherein the language is HTML.
6. The system of claim 1, wherein the ...

Publication date: 03-01-2019

AUTOMATED DISASTER RECOVERY SYSTEM AND METHOD

Number: US20190004905A1

Methods and systems for recovering a host image of a client machine to a recovery machine comprise comparing a profile of a client machine of a first type to be recovered to a profile of a recovery machine of a second type different from the first type, to which the client machine is to be recovered, by a first processing device. The first and second profiles each comprise at least one property of the first type of client machine and the second type of recovery machine, respectively. At least one property of a host image of the client machine is conformed to at least one corresponding property of the recovery machine. The conformed host image is provided to the recovery machine, via a network. The recovery machine is configured with at least one conformed property of the host image by a second processing device of the recovery machine.
1. A method of recovering a host image of a physical client machine of a first type of hardware and/or manufacturer to a physical recovery machine of a second type of hardware and/or manufacturer different from the first type, wherein the host image comprises an operating system, the method comprising: comparing, by at least one first processing device separate from the client machine and the recovery machine, a first profile of a client machine of the first type to be recovered to a second profile of a recovery machine of the second type to which the client machine is to be recovered, wherein the first and second profiles each comprise at least one property of the first type of client machine and the second type of recovery machine, respectively; conforming at least one property of the host image of the client machine to at least one corresponding property of the recovery machine based, at least in part, on the comparison; providing the conformed host image and the recovery machine, via a network; configuring the recovery machine with at least one conformed property of the host image by at least one second processing device of the ...

Publication date: 01-01-2015

QUICK FAILOVER OF BLADE SERVER

Number: US20150006950A1

Failover process for switching from a “failing” blade server to a “replacing” blade server. This process includes the following steps: (i) booting a replacing blade server to a set of operating system(s) including a first operating system; and (ii) subsequent to the booting of the replacing blade server, sending command data to the replacing blade server. The command data includes a command for the replacing blade server to configure itself to replace the failing blade server.
1-6. (canceled)
7. A computer program product for controlling, at least a portion of, a failover from a failing blade server to a replacing blade server, the computer program product comprising software stored on a software storage device, the software comprising: first program instructions programmed to boot a replacing blade server to a set of operating system(s) including a first operating system; second program instructions programmed to receive, by the replacing blade server, command data while the replacing blade server is in a booted state; and third program instructions programmed to, responsive to the command data, configure the replacing blade server to replace the failing blade server without any rebooting of the replacing blade server; wherein: the software is stored on a software storage device in a manner less transitory than a signal in transit.
8. The product of wherein the command data further includes failing server data to be used in configuration of the replacing blade server as a replacement for the failing blade server.
9. The product of wherein the software further comprises: fourth program instructions to run, on a first adaptation layer of the replacing server, runtime services to effect the configuration of the replacing blade server as a replacement for the failing blade server.
10. The product of wherein the runtime services are runtime abstract software services.
11. The product of wherein the replacing blade server and the failing blade servers are both proper blade ...

Publication date: 01-01-2015

METHOD AND SYSTEM FOR PROVIDING HIGH AVAILABILITY TO DISTRIBUTED COMPUTER APPLICATIONS

Number: US20150006958A1

Method, system, apparatus and/or computer program for achieving transparent integration of high-availability services for distributed application programs. Loss-less migration of sub-programs from their respective primary nodes to backup nodes is performed transparently to a client which is connected to the primary node. Migration is performed by high-availability services which are configured for injecting registration codes, registering distributed applications, detecting execution failures, executing from backup nodes in response to failure, and other services. High-availability application services can be utilized by distributed applications having any desired number of sub-programs without the need of modifying or recompiling the application program and without the need of a custom loader. In one example embodiment, a transport driver is responsible for receiving messages, halting and flushing of messages, and for issuing messages directing sub-programs to continue after checkpointing.
1. A method of achieving transparent integration of a distributed application program with a high availability protection program, comprising: injecting registration code, transparently and automatically, into a plurality of sub-programs during launch, without the need of modifying or recompiling the application program and without the need of a custom loader; registering the distributed application automatically with a high-availability protection program; detecting a failure in the execution of the distributed application program by said high-availability protection program; and executing the distributed application, subject to the detected failure, with one or more of the sub-programs being executed from their respective backup nodes automatically by said high-availability protection program in response to the failure.
2. A method as recited in : wherein said high-availability protection program is configured as an extension of an operating system; wherein recovery of application ...

Publication date: 04-01-2018

Application aware input/output fencing

Number: US20180007129A1
Author: Abhijit Toley, Jai Gahlot
Assignee: Veritas Technologies LLC

Disclosed herein are methods, systems, and processes to perform application aware input/output (I/O) fencing operations. A determination is made that an instance of an application is executing on a node that is part of a cluster. In response to the determination that the instance of the application is executing on the node, an identifier that associates the instance of the application and the node on which the instance of the application is executing is generated for the instance of the application. The identifier is installed on one or more coordination points.

Publication date: 02-01-2020

MICRO-LEVEL NETWORK NODE FAILOVER SYSTEM

Number: US20200007666A1

An improved core network that can monitor micro-level issues, identify specific services of specific nodes that may be causing an outage, and perform targeted node failovers in a manner that does not cause unnecessary disruptions in service is described herein. For example, the improved core network can include a failover and isolation server (FIS) system. The FIS system can obtain service-specific KPIs from the various nodes in the core network. The FIS can then compare the obtained KPI values of the respective service with corresponding threshold values. If any KPI value exceeds a corresponding threshold value, the FIS may preliminarily determine that the service of the node associated with the KPI value is responsible for a service outage. The FIS can initiate a failover operation, which causes the node to re-route any received requests corresponding to the service potentially responsible for the service outage to a redundant node.
1. A computer-implemented method comprising: obtaining one or more key performance indicator (KPI) values associated with one or more nodes in a core network that offer a first service, wherein the one or more KPI values are associated with the first service; comparing a first KPI value in the one or more KPI values with a first threshold value; determining that the first KPI exceeds the first threshold value; determining that the first KPI value corresponds with a first node in the one or more nodes; instructing the first node to re-route requests corresponding to the first service to a second node in the one or more nodes that is redundant to the first node; obtaining one or more second KPI values associated with the one or more nodes after the first node is instructed to re-route the requests; determining that a second KPI value in the one or more second KPI values exceeds a second threshold value; determining that the second KPI value corresponds with a third node in the one or more nodes; instructing the first node to no longer re-route ...
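The first half of claim 1 (compare service-specific KPI values with thresholds, then re-route to a redundant node) can be sketched as below. The data layout, the function name `failover_actions`, and the example KPI are assumptions for illustration; the later steps of the claim, which re-check KPIs after the failover, are not modeled.

```python
# Minimal sketch of threshold-based, per-service failover decisions.
# Names and data layout are hypothetical.

def failover_actions(kpis: dict, thresholds: dict, redundant: dict) -> list:
    """Return (node, service, redundant_node) tuples for KPIs above threshold.

    `kpis` maps (node, service) to a KPI value, `thresholds` maps a service
    to its limit, and `redundant` maps a node to its redundant peer.
    """
    actions = []
    for (node, service), value in sorted(kpis.items()):
        if value > thresholds[service] and node in redundant:
            actions.append((node, service, redundant[node]))
    return actions
```

Only the affected service on the affected node is re-routed, so other services on the same node keep running there.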

Publication date: 27-01-2022

RESERVING FAILOVER CAPACITY IN CLOUD COMPUTING

Number: US20220027244A1
Assignee: VMWARE, INC.

Methods and devices for providing reserved failover capacity across a plurality of data centers are described herein. An exemplary method includes determining whether a management process is executing at a first data center corresponding to a first physical location. In accordance with a determination that the management process is not executing at the first data center corresponding to the first physical location, a host is initiated at a second data center corresponding to a second physical location and the management process is executed on the initiated host at the second data center corresponding to the second physical location.
1. A method for providing reserved failover capacity across a plurality of data centers, the method comprising: determining whether a management process is executing at a first availability zone corresponding to a first physical location; reserving resources equivalent to a resource requirement of the management process at a second availability zone corresponding to a second physical location; in accordance with a determination that the management process is not executing at the first availability zone corresponding to the first physical location: initiating a host at the second availability zone corresponding to a second physical location; and executing the management process on the initiated host at the second availability zone corresponding to the second physical location.
2. The method of claim 1, wherein determining whether the management process is not executing at the first availability zone corresponding to the first physical location includes determining if the management process is not executing due to a first failure condition or a second failure condition.
3. The method of claim 2, further comprising: in accordance with a determination that the management process at the first availability zone corresponding to the first ... forgoing initiating the host at the second availability zone corresponding to the second location.

Publication date: 14-01-2016

Run-To-Completion Thread Model for Software Bypass Fail Open for an Inline Intrusion Protection System

Number: US20160011948A1
Author: Macdonald Stuart John
Assignee:

A processing-based bypass “fail open” mode is provided for an intrusion prevention system. A primary process running on a first logical core (lcore) is used as a control plane, which invokes bypass-open run-to-completion threads in other lcores comprising a bypass data plane, and which spawns a secondary process to fully configure intrusion prevention threads on other lcores to create an Intrusion Prevention System data plane. Upon a ready signal from the secondary process, the primary process quiesces such that the secondary process IPS data plane exclusively owns and executes on the other lcores.

1. A method of providing a bypass fail open mode for an inline intrusion prevention system comprising the steps of: executing by one or more processors a primary process utilizing a first single logical core as a control plane, the primary process: configuring one or more logical core pools, two or more network ports, and a bypass option; starting one or more bypass threads on at least one logical core separate from the first single logical core, wherein the bypass threads provide a fail-open communications processing between the network ports; and spawning a secondary process; while the primary process owns and performs frame processing using one or more bypass data plane threads: opening an interprocess communication channel by a secondary process control plane with the primary control plane; initializing and invoking by the secondary process one or more intrusion prevention threads comprising an intrusion prevention system data plane, wherein the one or more intrusion prevention threads are executed by one or more logical cores separate from the first logical core to process frames between the network ports; and notifying the primary process by the secondary process via the interprocess communication channel of readiness of the intrusion detection threads; and responsive to receiving the notification, the primary process relinquishing ownership of the ...

Publication date: 11-01-2018

Disaster recovery systems and methods

Number: US20180011766A1
Assignee: Unitrends Inc

An illustrative method for storing disaster recovery data includes receiving a plurality of copies of data stored by a first memory device. Each of the plurality of copies includes a plurality of blocks of data. The method also includes storing, in a second memory device, the plurality of copies in an object-oriented format, determining, using recovery time objectives, a number of the plurality of copies to be stored in a block-oriented format, and selecting a subset of the plurality of copies having the determined number of the plurality of copies. The method further includes assigning each of the other copies of the plurality of copies to one of a plurality of clusters. Each cluster of the plurality of clusters includes one of the subset of the plurality of copies. The method also includes determining, for each cluster, a copy having a highest number of blocks also present in the other copies of the cluster and storing, in the block-oriented format, the determined copy from each cluster in a third memory device.

Publication date: 10-01-2019

Technique For Higher Availability In A Multi-Node System

Number: US20190012244A1
Assignee:

Techniques are described herein for quick identification of a set of units of data for which recovery operations are to be performed to redo or undo changes made by the failed node. When a lock is requested by an instance, lock information for the lock request is replicated by another instance. If the instance fails, the other instance may use the replicated lock information to determine a set of data blocks for recovery operations. The set of data blocks is available in memory of a recovery instance when a given node fails, and does not have to be completely generated by scanning a redo log.

1. A method comprising: generating, at a first node of a multi-node database system, a plurality of lock requests; for each lock request of the plurality of lock requests: storing, in a redo log associated with the first node, changes to a target data block and a change number associated with the changes; receiving, at a second node of the multi-node database system, a request to replicate lock information for the lock request; and storing, in a memory of the second node, the change number and a location of the target data block.
2. The method of wherein only the second node is assigned to replicate lock information for the first node.
3. The method of claim 1, wherein a plurality of nodes are assigned to replicate lock information for the first node, and the plurality of nodes includes the second node.
4. The method of further comprising sending the request to replicate lock information asynchronously to the second node.
5. The method of further comprising: in response to a failure of the first node, sending a recovery request to the second node; determining at the second node, based on replicated lock information, a set of one or more data blocks to recover, wherein said replicated lock information includes replicated lock information for a plurality of lock requests.
6. The method of claim 5, wherein determining the set of one or more data blocks comprises: determining a ...

Publication date: 10-01-2019

Failover Method, Apparatus and System

Number: US20190012245A1
Assignee:

A failover method, apparatus and system to implement fast failover between a primary processor and a secondary processor, where the method includes receiving, by a first device, transaction content of a transaction and transaction status data of the transaction, the transaction status data being used to resume the transaction when the transaction is interrupted by a failure of a second device, and continuing to process, by the first device, the transaction according to the transaction content and the transaction status data when detecting that the second device fails.

1. A failover method, comprising: receiving, by a first device, transaction content of a transaction and transaction status data of the transaction, the transaction status data being used to resume the transaction when the transaction is interrupted by a failure of a second device; and continuing to process, by the first device, the transaction according to the transaction content and the transaction status data when detecting that the second device fails.
2. The failover method of claim 1, wherein the transaction status data comprises a transaction processing location identifier, and continuing to process the transaction comprises: determining, by the first device according to the transaction processing location identifier, a location at which the transaction is interrupted; and continuing to process, by the first device, the interrupted transaction from the location at which the transaction is interrupted.
3. The failover method of claim 1, wherein continuing to process the transaction comprises processing, by the first device, the interrupted transaction again from a start position of the transaction.
4. The failover method of claim 1, wherein the transaction status data comprises a transaction completion identifier, and the failover method further comprises deleting, by the first device, information corresponding to the transaction ...
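As a rough illustration of resuming an interrupted transaction from replicated status data, the sketch below models a transaction as an ordered list of steps. The `TransactionStatus` class, its `position` field (standing in for the transaction processing location identifier), and the step names are hypothetical choices made for this example, not details taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class TransactionStatus:
    """Status data replicated to the standby device (hypothetical shape)."""
    position: int = 0        # index of the next step that has not run yet
    completed: bool = False  # set once every step has run


def continue_transaction(steps, status, from_start=False):
    """Run the remaining steps on the surviving device.

    By default processing resumes at the recorded position; with
    from_start=True the whole transaction is processed again.
    """
    start = 0 if from_start else status.position
    results = []
    for index in range(start, len(steps)):
        results.append(steps[index]())
        status.position = index + 1
    status.completed = True
    return results


# Hypothetical three-step transaction; the failed device finished step 0 only.
steps = [lambda: "debit", lambda: "credit", lambda: "notify"]

resumed = continue_transaction(steps, TransactionStatus(position=1))
restarted = continue_transaction(steps, TransactionStatus(position=1), from_start=True)
```

Resuming from the recorded position avoids repeating finished work, while restarting from the beginning is simpler but only safe when the steps can be repeated without side effects.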

Publication date: 09-01-2020

ROLE MANAGEMENT OF COMPUTE NODES IN DISTRIBUTED CLUSTERS

Number: US20200012577A1
Assignee:

In one example, a distributed cluster may include compute nodes having a master node and a replica node, an in-memory data grid formed from memory associated with the compute nodes, a first high availability agent running on the replica node, and a second high availability agent running on the master node. The first high availability agent may determine a failure of the master node by accessing data in the in-memory data grid and designate a role of the replica node as a new master node to perform cluster management tasks of the master node. The second high availability agent may determine that the new master node is available in the distributed cluster by accessing the data in the in-memory data grid when the master node is restored after the failure and demote a role of the master node to a new replica node.

1. A distributed cluster comprising: a plurality of compute nodes comprising a master node and a replica node; an in-memory data grid formed from memory associated with the plurality of compute nodes; a first high availability agent running on the replica node to: determine a failure of the master node by accessing data in the in-memory data grid; and designate a role of the replica node as a new master node to perform cluster management tasks of the master node upon determining the failure; and a second high availability agent running on the master node to: determine that the new master node is available in the distributed cluster by accessing the data in the in-memory data grid when the master node is restored after the failure; and demote a role of the master node to a new replica node upon determining the new master node.
2. The distributed cluster of claim 1, wherein the first high availability agent running on the replica node is to: initiate a failover operation to failover the replica node to the new master node upon determining the failure of the master node; upon completion of the failover operation, determine whether the master node is restored ...
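The promote-then-demote role handling described above might be sketched as follows. A plain dictionary stands in for the in-memory data grid, and a stale heartbeat timestamp is used as the failure signal; both are assumptions made for illustration, since the text only says the agents access data in the grid. All function and key names are hypothetical.

```python
def replica_agent_check(grid, replica, now, timeout_s):
    """Replica-side agent: promote the replica if the master's heartbeat is stale."""
    master = grid["master"]
    if now - grid["heartbeat"][master] > timeout_s:
        grid["master"] = replica
        grid["roles"][replica] = "master"
        return True
    return False


def master_agent_on_restore(grid, node):
    """Agent on the restored node: demote itself if another master now exists."""
    if grid["master"] != node:
        grid["roles"][node] = "replica"
    return grid["roles"][node]


# Hypothetical grid contents: n1 is master, its last heartbeat is 10 s old.
grid = {
    "master": "n1",
    "roles": {"n1": "master", "n2": "replica"},
    "heartbeat": {"n1": 100.0, "n2": 109.0},
}

promoted = replica_agent_check(grid, "n2", now=110.0, timeout_s=5.0)
restored_role = master_agent_on_restore(grid, "n1")
```

Because both agents read and write the same shared state, the restored node learns about the new master without any direct message exchange, which avoids two nodes acting as master at once in this simplified model.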

Publication date: 19-01-2017

DETECTING HIGH AVAILABILITY READINESS OF A DISTRIBUTED COMPUTING SYSTEM

Number: US20170017535A1
Assignee:

Technology is disclosed for determining high availability readiness of a distributed computing system (“system”). A confidence measure (CM) can be computed for a particular controller in the system to determine whether a takeover by the particular controller from a first controller would be successful. The CM can be a percentage value. A CM of 0% indicates that a takeover would be a failure, which results in loss of access to data managed by the first controller. A CM of 100% indicates a successful takeover with no performance impact on the system. A CM between 0% and 100% indicates a successful takeover but with a performance impact. The CM can be computed based on events occurring in the system, e.g., veto and non-veto events. The CM is computed as a function of various weights and/or indices associated with the veto events and/or non-veto events.

1. A method, comprising: receiving, by a computing device, a list of historical events related to a high availability pair comprising a first node and a second node of a distributed computing system; determining, by the computing device, a set of non-veto events and a set of veto events related to at least one of the nodes from the list of historical events; obtaining, by the computing device, a severity index and a compliance factor for each event of the set of non-veto events; and generating and outputting, by the computing device, a confidence measure for the at least one of the nodes based on the set of veto events, the severity index, and the compliance factor.
2. The computer-implemented method of claim 1, wherein the confidence measure indicates a magnitude of an impact on a performance of the distributed computing system if the at least one of the nodes takes over from another one of the nodes.
3. The computer-implemented method of claim 1, wherein the severity index of an event of the set of non-veto events indicates a magnitude of performance impact on the distributed computing system due to the occurrence of the ...

Publication date: 16-01-2020

Independent safety monitoring of an automated driving system

Number: US20200017114A1
Assignee: Intel Corp

An automated driving system includes a security companion subsystem to access data generated at a compute subsystem of the automated driving system, which indicates a determination by the compute subsystem associated with an automated driving task. The security companion subsystem determines whether the determination is safe based on the data. The security companion subsystem is configured to realize a higher safety integrity level than the compute subsystem.

Publication date: 21-01-2016

FAILURE RECOVERY APPARATUS OF DIGITAL LOGIC CIRCUIT AND METHOD THEREOF

Number: US20160019126A1
Author: KWON Young-Su
Assignee:

Exemplary embodiments of the present invention relate to a failure recovery apparatus for digital logic circuits, and a method thereof, for use when a fault occurs in a digital logic circuit. A failure recovery apparatus according to an embodiment of the present invention comprises: a fault detection block configured to determine fault occurrence by comparing output results of a plurality of digital logic circuits which perform the same operation using a clock having a first cycle; and a failure recovery block configured to perform a failure recovery operation of the plurality of digital logic circuits by using a clock having a second cycle which is longer than the first cycle when it is determined that a fault has occurred. According to exemplary embodiments of the present invention, high reliability is provided in failure recovery of digital logic circuits when a fault occurs in them due to external factors.

Publication date: 15-01-2015

Tolerating failures using concurrency in a cluster

Number: US20150019900A1
Assignee: International Business Machines Corp

A system, and computer program product for tolerating failures using concurrency in a cluster are provided in the illustrative embodiments. A failure is detected in a first computing node serving an application in a cluster. A subset of actions is selected from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster. A waiting period is set for the first computing node. The first computing node is allowed to continue serving the application during the waiting period. During the waiting period, concurrently with the first computing node serving the application, the subset of actions is performed at the second computing node. Responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node is aborted.
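The waiting-period behaviour in this abstract can be sketched roughly as below. The standby performs a chosen subset of takeover actions while it watches for a sign of life from the first node, and aborts if one arrives. The function and action names are hypothetical, and the outcome when no signal arrives (proceeding with the takeover) is an assumption of this sketch rather than something the abstract states.

```python
import threading
import time


def prepare_during_waiting_period(prep_actions, activity_signal, waiting_period_s):
    """Run takeover preparation on the standby while the first node keeps serving.

    Returns ("aborted", done) if the first node signals activity before the
    waiting period ends, else ("proceed_with_takeover", done).
    """
    done = []
    deadline = time.monotonic() + waiting_period_s
    for action in prep_actions:
        if activity_signal.is_set():
            return "aborted", done
        if time.monotonic() >= deadline:
            break
        action()
        done.append(action.__name__)
    # Spend the rest of the waiting period listening for the first node.
    remaining = max(0.0, deadline - time.monotonic())
    if activity_signal.wait(timeout=remaining):
        return "aborted", done
    return "proceed_with_takeover", done


def mount_storage():
    """Placeholder for a preparatory action (hypothetical)."""


def warm_cache():
    """Placeholder for a preparatory action (hypothetical)."""


# Case 1: the first node stays silent for the whole waiting period.
silent = threading.Event()
outcome_silent = prepare_during_waiting_period([mount_storage, warm_cache], silent, 0.2)

# Case 2: the first node had already signalled activity.
alive = threading.Event()
alive.set()
outcome_alive = prepare_during_waiting_period([mount_storage, warm_cache], alive, 0.2)
```

Doing the preparation inside the waiting period means that time is not wasted if the failure turns out to be real, while a false alarm costs only the aborted preparation.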

Publication date: 15-01-2015

TOLERATING FAILURES USING CONCURRENCY IN A CLUSTER

Number: US20150019901A1

A method is provided in the illustrative embodiments. A failure is detected in a first computing node serving an application in a cluster. A subset of actions is selected from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster. A waiting period is set for the first computing node. The first computing node is allowed to continue serving the application during the waiting period. During the waiting period, concurrently with the first computing node serving the application, the subset of actions is performed at the second computing node. Responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node is aborted.

1. A method for tolerating failures using concurrency in a clustered data processing environment, the method comprising: detecting a failure in a first computing node, the first computing node serving an application in a cluster of computing nodes; selecting a subset of actions from a set of actions, the set of actions configured to transfer the serving of the application from the first computing node to a second computing node in the cluster; setting a waiting period for the first computing node; allowing the first computing node to continue serving the application in the cluster during the waiting period; performing during the waiting period, concurrently with the first computing node serving the application, the subset of actions at the second computing node; and aborting, responsive to receiving a signal of activity from the first computing node during the waiting period, the concurrent operation of the second computing node in the cluster.
2. The method of claim 1, wherein the performing the subset of actions comprises: completing a task in preparation for the takeover.
3. The method of claim 1, further comprising: reordering the set of actions such that the ...

Publication date: 18-01-2018

Virtual Machine Seed Image Replication through Parallel Deployment

Number: US20180018191A1
Assignee:

Generating secondary virtual machine seed image storage is provided. An input is received to deploy a primary virtual machine and a secondary virtual machine based on a golden virtual machine image. In response, the primary virtual machine from the golden virtual machine image on a primary data processing site and the secondary virtual machine from the golden virtual machine image on a secondary data processing site are deployed. Execution of the secondary virtual machine is suspended on the secondary data processing site. Using the golden virtual machine image, a seed image corresponding to the secondary virtual machine is generated that is up-to-date at that point in time in storage at the secondary data processing site to form the secondary virtual machine seed image storage. The secondary virtual machine seed image storage is enabled to receive state data updates from the primary virtual machine on the primary data processing site.

1. A computer-implemented method for generating secondary virtual machine seed image storage, the computer-implemented method comprising: receiving, by a computer, an input to deploy a primary virtual machine and a secondary virtual machine based on a golden virtual machine image; responsive to the computer receiving the input, deploying, by the computer, the primary virtual machine from the golden virtual machine image on a primary data processing site and the secondary virtual machine from the golden virtual machine image on a secondary data processing site; suspending, by the computer, execution of the secondary virtual machine on the secondary data processing site; generating, by the computer, using the golden virtual machine image, a seed image corresponding to the secondary virtual machine that is up-to-date at that point in time in storage at the secondary data processing site to form the secondary virtual machine seed image storage; and enabling, by the computer, the secondary virtual machine seed image storage to receive state ...

Publication date: 18-01-2018

NODE SYSTEM, SERVER APPARATUS, SCALING CONTROL METHOD, AND PROGRAM

Number: US20180018244A1
Assignee: NEC Corporation

A system includes an active system that executes processing, a standby system that is able to perform at least one of scale-up and scale-down, and a control apparatus that controls system switching to set the standby system that has undergone the scale-up or scale-down as a new active system.

1. A node system comprising: an active system that executes processing; a standby system that is able to perform at least one of scale-up and scale-down; and a control apparatus that controls system switching to switch the standby system undergoing the scale-up or scale-down to a new active system.
2. The node system according to claim 1, wherein the control apparatus is configured to instruct the standby system to perform scale-up or scale-down, when performing scale-up or scale-down of the active system, and upon reception of a completion notification from the standby system with the scale-up or scale-down completed, the control apparatus controls the system switching to switch the standby system undergoing the scale-up or scale-down to the new active system, and to switch the active system before the system switching to a new standby system.
3. The node system according to claim 2, wherein the control apparatus controls the new standby system to perform scale-up or scale-down in the same way as the standby system switched to the new active system.
4. The node system according to claim 2, wherein, when the active system needs to be scaled up, the control apparatus instructs the standby system that completes the scale-up in response to a scale-up instruction from the control apparatus, to switch to the new active system, after the system switching, the new standby system imposes processing restriction on the new active system, the control apparatus instructs the new standby system to perform scale-up in the same way as the standby system that transitions to the new active system, and the control apparatus, upon reception of a scale-up completion ...

Publication date: 17-01-2019

Method and Arrangement for Operating Two Redundant Systems

Number: US20190018401A1
Assignee:

A method and an arrangement having redundant systems operating in parallel in a cyclic mode and reciprocally checking a result of the task of the other system on a regular basis, wherein one system is selected or confirmed for the productive mode in the fault situation found. A characteristic variable concerning an operating parameter is picked up for each of the systems in multiple or all cycles and used for updating statistical parameters. At least when a disparity between results of the two systems is found, a current operating parameter is correlated with the statistical parameter for each system, and the system for which the current operating parameter differs less from the statistical parameter is detected as the correctly operating system and used for the productive mode, such that the degree of fault coverage can be increased and hence the availability of the overall system increased.

1. A method for operating an arrangement having two redundant systems each operating in parallel in a cyclic mode, one system of the two systems operating in a productive mode each time and another system of the two systems executing the same task for checking purposes, the method comprising: checking reciprocally by the two systems at least one result of a task of a respective other system on a regular basis, each system of the two systems comparing a result of the task of the other system of the two systems with their own result, a detected fault comprising a detected disparity among the results leads to a fault situation being found, with one of the systems being selected or confirmed for the productive mode in the fault situation found; picking up at least one respective characteristic variable concerning an operating parameter for each of the systems in multiple or all cycles and using the picked up at least one respective characteristic variable for updating at least one statistical parameter each time; correlating a current operating parameter with the ...
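The selection rule in this abstract, choosing the system whose current operating parameter deviates least from its own statistics, might look like the sketch below. The running mean is used here as the statistical parameter and cycle time as the operating parameter; both are assumptions made for illustration, as are all names.

```python
class RunningMean:
    """Incrementally updated mean of an operating parameter (e.g. cycle time)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count


def pick_correct_system(current, stats):
    """On a result disparity, return the system whose current value is closest to its mean."""
    return min(current, key=lambda name: abs(current[name] - stats[name].mean))


# Hypothetical history: both systems have been stable for three cycles.
stats = {"A": RunningMean(), "B": RunningMean()}
for value in (10.0, 10.0, 10.0):
    stats["A"].update(value)
for value in (12.0, 12.0, 12.0):
    stats["B"].update(value)

# A disparity is found in the current cycle; B's cycle time has jumped.
current = {"A": 10.5, "B": 19.0}
selected = pick_correct_system(current, stats)
```

With only two systems a plain result comparison cannot say which one is wrong, so comparing each system against its own history supplies the tie-breaking evidence.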

Publication date: 17-01-2019

SIMPLIFIED PROCESSOR SPARING

Number: US20190018744A1
Assignee:

In a multi-core computer system, a method for dealing with a fault with a core includes detecting a fault in one of the cores. Information is transferred from a recovery buffer to a mapper. The information includes logical register mapping information. A recovery is performed using the information in the mapper. If a recovery cannot proceed, a sparing can be initiated using the information in the mapper.

1. A method comprising: detecting a fault in a first core in a multi-core computer system; based at least in part on the detecting, transferring information from a recovery buffer to a mapper, wherein the information includes logical register mapping information; after transferring the information from the recovery buffer to the mapper, determining if a recovery reset will proceed; upon determining that the recovery reset will not proceed, selecting a spare core; and performing a sparing using the information in the mapper.
2. The method of wherein performing the sparing comprises: loading the information in the mapper to the spare core; and logically replacing the first core with the spare core.
3. The method of wherein determining that the recovery reset cannot proceed comprises detecting a previous recovery reset attempt failed.
4. The method of wherein determining that the recovery reset cannot proceed comprises detecting a failure to restore the recovery buffer.
5. The method of further comprising stopping execution of threads in the first core.
6. The method of further comprising stopping any updates to the mapper and the recovery buffer.
7. The method of further comprising: upon determining that recovery reset will proceed, performing a recovery of the first core using the information in the mapper.
8. A system comprising: a plurality of processing cores; wherein: detect a fault in a first core; after transferring information from a recovery buffer to a mapper, determining if a recovery reset will proceed, wherein the information includes logical register mapping ...

Publication date: 16-01-2020

STORAGE SYSTEM AND CONFIGURATION INFORMATION CONTROL METHOD

Number: US20200019478A1
Assignee: Hitachi, Ltd.

Proposed is a scale-out-type storage system which implements high-availability, high-speed failover. In a scale-out-type storage system, two or more nodes each comprise a cluster controller, a node controller, a plurality of subcluster processes (subclusters and the like) which are processes which execute I/O processing in their own node, which form a subcluster between processes in their own node, and which are synchronized with work-type (active)/standby-type (passive) corresponding processes in the other nodes, and a nonvolatile data store (SODB). The configuration information of the storage system is held partitioned into global configuration information of the SODB and local configuration information and the like of the subclusters and the like, and thereupon the working-type subcluster is capable of executing I/O processing without accessing the SODB.

1. A scale-out-type storage system in which a cluster is constructed by linking a plurality of nodes, at least two or more nodes among the plurality of nodes each comprising: a cluster controller which controls processing spanning the whole cluster; a node controller which performs closed processing control on its own node; a plurality of subcluster processes which are processes which execute I/O processing in their own node, which form a subcluster between processes in their own node, and which are synchronized with work-type/standby-type corresponding processes in the other nodes; and a nonvolatile data store which is shared by the whole cluster, wherein the data store holds, as global configuration information, configuration information which includes information that must be shared by the whole cluster among the configuration information of the storage system, wherein the subcluster processes hold, as local configuration information, configuration information which is required for their own subcluster process to operate among the configuration information of the storage system, and wherein the work-type subcluster ...

Publication date: 16-01-2020

DISASTER RECOVERY DEPLOYMENT METHOD, APPARATUS, AND SYSTEM

Number: US20200019479A1
Assignee:

This application discloses a disaster recovery deployment method, apparatus, and system, and relates to the field of network application technologies. The method includes: obtaining, by a master data center and a backup data center, disaster recovery control information; sending, by the master data center, the data corresponding to the service of the master data center to the at least one backup data center based on the disaster recovery control information; and deploying, by the backup data center, a disaster recovery resource for the master data center based on the disaster recovery control information, and backing up the received data. In other words, the master data center and the backup data center automatically back up resources and data based on the disaster recovery control information, and therefore, manual operation steps in a disaster recovery deployment process are simplified, and efficiency of disaster recovery deployment is improved.

1. A method for disaster recovery deployment, the method comprising: obtaining, by a master data center, disaster recovery control information, wherein the disaster recovery control information indicates a disaster recovery resource to be deployed by at least one backup data center for a service of the master data center and a backup relationship of data corresponding to the service of the master data center in the at least one backup data center, and wherein the disaster recovery resource is a resource used to perform disaster recovery and backup on the service of the master data center; sending, by the master data center, the data corresponding to the service of the master data center to the at least one backup data center based on the disaster recovery control information; obtaining, by the backup data center, the disaster recovery control information; deploying, by the backup data center, a first disaster recovery resource for the master data center based on the disaster recovery control information; and receiving, by the ...

Publication date: 17-01-2019

TELEPROTECTION REQUIREMENT COMPLIANCE

Number: US20190020372A1
Assignee:

A methodology includes determining a first delay between a first relay and a first label edge router, a second delay between a second relay and a second label edge router, and a third delay of a label-switched path between the first label edge router and the second label edge router. Based on the first, second, and third delays, it is determined whether an end-to-end latency between the first relay and the second relay exceeds an end-to-end latency threshold.

1. A method comprising: determining a first delay between a first relay and a first label edge router; determining a second delay between a second relay and a second label edge router; determining a third delay of a label-switched path between the first label edge router and the second label edge router; and based on the first, second, and third delays, determining whether an end-to-end latency between the first relay and the second relay exceeds an end-to-end latency threshold.
2. The method of claim 1, wherein: determining the first delay includes determining the first delay based on a first timestamp of a first Generic Object Oriented Substation Event (GOOSE) message; and determining the second delay includes determining the second delay based on a second timestamp of a second GOOSE message.
3. The method of claim 1, further comprising: if it is determined that the end-to-end latency exceeds the end-to-end latency threshold, initiating a failover to a backup label-switched path between the first label edge router and the second label edge router.
4. The method of claim 3, further comprising: determining a fourth delay of the backup label-switched path between the first label edge router and the second label edge router; and based on the first, second, and fourth delays, determining that another end-to-end latency between the first relay and the second relay does not exceed the end-to-end latency threshold.
5. The method of claim 3, further comprising: determining that the end-to-end latency no longer exceeds the end- ...
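A minimal sketch of the latency check and path failover described in claims 1, 3, and 4 follows. It assumes the end-to-end latency is the plain sum of the three segment delays; the text only says the determination is based on those delays, so the summation, the function names, and the millisecond figures are illustrative assumptions.

```python
def end_to_end_latency_ms(relay1_to_ler1_ms, relay2_to_ler2_ms, lsp_ms):
    """Assumed model: relay-to-relay latency is the sum of the three segment delays."""
    return relay1_to_ler1_ms + relay2_to_ler2_ms + lsp_ms


def select_path(first_ms, second_ms, primary_lsp_ms, backup_lsp_ms, threshold_ms):
    """Return which label-switched path keeps the latency within the threshold."""
    if end_to_end_latency_ms(first_ms, second_ms, primary_lsp_ms) <= threshold_ms:
        return "primary"
    # Primary path breaches the threshold: check the backup path before failing over.
    if end_to_end_latency_ms(first_ms, second_ms, backup_lsp_ms) <= threshold_ms:
        return "backup"
    return "none"


# Hypothetical figures: the primary path gives 11.5 ms against a 10 ms threshold.
choice = select_path(1.0, 1.5, primary_lsp_ms=9.0, backup_lsp_ms=4.0, threshold_ms=10.0)
```

Checking the backup path's latency before switching mirrors claim 4, where the fourth delay is measured to confirm the backup path actually satisfies the threshold.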

Publication date: 28-01-2016

STORAGE DEVICE AND STORAGE SYSTEM

Number: US20160026398A1
Assignee:

A virtualization controller identifies a node that manages a segment to be accessed, and instructs the node to access the segment. A mirror controller of the node instructed to access the segment writes data in the segment managed by the node and in a segment having a mirror relation with the segment managed by the node.

1. A storage apparatus that constitutes a storage system in cooperation with other storage apparatuses each including a controller that is capable of receiving an access request and performs certain processing on the access request, the storage apparatus comprising: a storage that stores therein data in a predefined unit of storage; and a controller that is capable of receiving an access request to the storage system, and performs predefined processing on the access request, wherein the controller performs, when receiving a write access request, processing for identifying a controller, out of a plurality of controllers of storage apparatuses that constitute the storage system, that performs mirror control for storing data, by mirroring, in an original storage and a mirror destination storage in a predefined unit of storage, and performs the mirror control corresponding to the writing access request when identified, by the processing for identifying, that the controller that has received the access request is a controller that performs the mirror control.
2. The storage apparatus according to claim 1, wherein when identified, by the processing for identifying, that another controller is a controller that performs the mirror control, the controller instructs the other controller to perform the mirror control corresponding to the writing access request.
3. The storage apparatus according to claim 1, wherein the controller instructs the other controller to write data in a virtual storage area in which a plurality of units of storage managed by the other controller are grouped, and causes the other controller to perform ...

Publication date: 17-04-2014

SYSTEMS AND METHODS FOR FAULT TOLERANT, ADAPTIVE EXECUTION OF ARBITRARY QUERIES AT LOW LATENCY

Number: US20140108861A1
Assignee:

A system and method for performing distributed execution of database queries includes a query server that receives a query to be executed on a database, forms a query plan based on the query, assigns tasks to task slots on a plurality of worker nodes in a cluster, and, upon receipt of a notification that a task has completed on a worker node, immediately assigns an unassigned task to a free task slot on that worker node, such that the task may begin executing on that worker node substantially immediately thereafter. The task slots on worker nodes include pools of resources that run tasks without start-up overhead.

1. A system for performing distributed execution of database queries, the system comprising: a query server comprising at least one memory storing computer executable instructions and at least one processing unit for executing the instructions, wherein execution of the instructions causes the at least one processing unit to: receive a query to be executed on a database; form a query plan based on the query, the query plan comprising a plurality of operators, each operator divided into one or more tasks; assign tasks to task slots on a plurality of worker nodes in a cluster, wherein the task slots comprise pools of resources that run tasks without start-up overhead; and upon receipt of a notification that a task has completed on a worker node, immediately assign an unassigned task to a free task slot on that worker node, such that the task may begin executing on that worker node substantially immediately thereafter.
2. The system of claim 1, wherein the database comprises at least one table partitioned over the plurality of worker nodes.
3. The system of claim 1, wherein execution of the instructions further causes the at least one processing unit to: receive a task complete status update from a worker node upon completion of an assigned task on that worker node; and in response to the task complete status update, notify a worker node having an assigned task ...
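The slot-refill behaviour in this entry (a freed task slot immediately receives the next unassigned task) can be sketched in Python as below. This is an illustrative model only: `QueryScheduler`, `on_task_complete` and the string task and worker names are hypothetical, and query planning and the resource pools behind each slot are left out.

```python
from collections import deque


class QueryScheduler:
    """Toy model: tasks are strings, each worker has a fixed number of slots."""

    def __init__(self, slots_per_worker, tasks):
        self.free = dict(slots_per_worker)  # worker -> number of free task slots
        self.pending = deque(tasks)         # unassigned tasks, in plan order
        self.running = {}                   # task -> worker executing it
        # Initial assignment: fill every free slot while tasks remain.
        for worker in self.free:
            while self.free[worker] > 0 and self.pending:
                self._start(worker)

    def _start(self, worker):
        task = self.pending.popleft()
        self.free[worker] -= 1
        self.running[task] = worker
        return task

    def on_task_complete(self, task):
        """Handle a completion notification; return the task started in the freed slot."""
        worker = self.running.pop(task)
        self.free[worker] += 1
        if self.pending:
            # Reuse the slot that just became free on the same worker.
            return self._start(worker)
        return None
```

With two single-slot workers and three tasks, the third task waits until either worker reports a completion and then starts on that same worker.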

Publication date: 26-01-2017

Fault tolerant systems and method of using the same

Number: US20170024294A1
Assignee: Siemens Energy Inc

Systems and methods for resolving fault detection in a control system are provided. The system includes an I/O module operably connected to a first, second, and third microcontroller for transmitting data. The first microcontroller is in an active state, i.e., in control, while the remaining controllers are in an idle state. The system further includes an event generator for generating an event indicative of a fault occurrence, and a means for detecting a fault event. The system also includes a means for reassigning a controller, wherein upon detection of a fault event in both the first and second controllers, the means for reassigning a controller changes the state of the third controller to active, leaving the remaining controllers idle or in a shutdown state, thereby effectively assigning control from the first controller to the third controller.
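A minimal Python sketch of the controller reassignment described above follows. The abstract only covers the case where both the first and second controllers fault; the sketch generalizes this, as an assumption, to "the first non-faulted controller in a fixed order becomes active". The function name `reassign` and the controller labels are hypothetical.

```python
def reassign(faulted, order=("c1", "c2", "c3")):
    """Return a state for each controller given the set of faulted ones.

    Assumption: faulted controllers are shut down, the first healthy
    controller in `order` becomes active, and the rest stay idle.
    """
    states = {}
    active_assigned = False
    for name in order:
        if name in faulted:
            states[name] = "shutdown"
        elif not active_assigned:
            states[name] = "active"
            active_assigned = True
        else:
            states[name] = "idle"
    return states
```

With no faults the first controller stays in control; with faults in both `c1` and `c2`, control passes to `c3`, which matches the case the abstract describes.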

Publication date: 28-01-2016

STORAGE CONTROL DEVICE AND STORAGE SYSTEM

Number: US20160026548A1
Assignee:

A control module that manages a segment to which data is written implements write processing and resynchronization processing using a bitmap managed for each LUN. In other words, the control module stores the bitmap for the managed LUN in a bitmap storage unit. A mirror LUN control unit sets a corresponding portion of the bitmap to 1, controls data write to a target segment and a mirror segment, and resets the bitmap to 0 when the data write to both of the segments is complete. A resynchronization control unit refers to the bitmap storage unit to perform the resynchronization processing.

1. A storage control device comprising: a map information memory that stores map information used for resynchronization processing of data to which mirror control is performed in unit of predetermined storage management of a storage device; and a mirror controller that performs the mirror control between a first housing that stores a first storage device to which the storage control device controls an access and a second housing that stores a second storage device to which other storage control device controls an access, and controls, at a time of data write, the data write to the first storage device and the second storage device using the map information memory.
2. The storage control device according to claim 1, further comprising a resynchronization controller that refers to the map information memory, when a failure occurs, to perform the resynchronization processing.
3. The storage control device according to claim 1, further comprising an internal initiator that makes a transparent access to an area as the unit of predetermined storage management in the second storage device, wherein the storage control device that controls an access to the second storage device includes an internal target controller that causes the corresponding storage control device to make a transparent access to the area as the unit of predetermined storage management in the ...
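The bitmap protocol in this entry (set the bit, write both copies, clear the bit, and later resynchronize whatever is still marked) can be sketched as follows. This is an in-memory illustration under stated assumptions: the class `MirroredLun`, the `mirror_fails` flag used to simulate an interrupted write, and the choice to copy from the target segment to the mirror during resynchronization are all hypothetical.

```python
class MirroredLun:
    def __init__(self, segments):
        self.bitmap = [0] * segments     # 1 = write in flight or mirror out of sync
        self.target = [None] * segments  # primary copy of each segment
        self.mirror = [None] * segments  # mirrored copy of each segment

    def write(self, seg, data, mirror_fails=False):
        self.bitmap[seg] = 1             # mark the segment before touching data
        self.target[seg] = data
        if mirror_fails:
            return False                 # simulated failure: the bit stays set
        self.mirror[seg] = data
        self.bitmap[seg] = 0             # both writes complete: clear the bit
        return True

    def resynchronize(self):
        """Re-copy every segment still marked in the bitmap; return their indices."""
        repaired = []
        for seg, dirty in enumerate(self.bitmap):
            if dirty:
                # Assumption: the target segment holds the authoritative data.
                self.mirror[seg] = self.target[seg]
                self.bitmap[seg] = 0
                repaired.append(seg)
        return repaired
```

Because only marked segments are visited, resynchronization after a failure touches the interrupted writes rather than the whole LUN.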

Publication date: 25-01-2018

FAULT MONITORING DEVICE, VIRTUAL NETWORK SYSTEM, AND FAULT MONITORING METHOD

Number: US20180024898A1
Inventor: YOSHIKAWA Naoya
Assignee: NEC Corporation

A fault monitoring device includes a notice reception part configured to receive a notice indicating occurrence of faults from a virtual network device, and a recovery process part configured to carry out a recovery process for one device having the highest priority of fault response among the virtual network device producing the notice, a physical device implementing the virtual network device, and another virtual network device involved in dependency with the virtual network device.

1. A fault monitoring device comprising: a notice reception part configured to receive a notice indicating occurrence of a fault from a virtual network device; and a recovery process part configured to carry out a recovery process for one device having a highest priority of fault response among the virtual network device producing the notice, a physical device implementing the virtual network device, and another virtual network device involved in dependency with the virtual network device.
2. The fault monitoring device according to claim 1, further comprising a configuration storage unit configured to store the virtual network device in correlation with at least one of the physical device implementing the virtual network device and another virtual network device involved in dependency with the virtual network device, wherein the recovery process part carries out the recovery process for one device having the highest priority of fault response among the virtual network device and each device stored on the configuration storage unit in correlation with the virtual network device.
3. The fault monitoring device according to claim 2, further comprising an instruction part configured to send an instruction to detect presence/absence of the fault to each device stored on the configuration storage unit in correlation with the virtual network device producing the notice, and a result retrieval part configured to retrieve a detection result concerning the presence/absence of ...
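The selection of a recovery target in this entry can be sketched in Python as below. The sketch assumes the configuration storage is a mapping from a virtual network device to its related devices and that a larger number means a higher fault-response priority; `pick_recovery_target` and the device names are hypothetical.

```python
def pick_recovery_target(notifier, topology, priority):
    """Pick the device to recover after `notifier` reports a fault.

    topology: virtual device -> related devices (its physical host and
              virtual devices it has a dependency with).
    priority: device -> fault-response priority (assumed: larger is higher).
    """
    candidates = [notifier] + list(topology.get(notifier, []))
    return max(candidates, key=lambda device: priority[device])
```

For instance, if a virtual router reports a fault and its physical host carries the highest priority, the host is recovered first rather than the router itself.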

Publication date: 24-01-2019

VIRTUALIZED FILE SERVER DATA SHARING

Number: US20190026101A1
Assignee: Nutanix, Inc.

In one embodiment, a system for managing a virtualization environment includes a set of host machines, each of which includes a hypervisor, virtual machines, and a virtual machine controller, and a first virtualized file server configured to receive a request to access a storage item located at a second virtualized file server, determine that the storage item is designated as being accessible by other virtualized file servers, identify an FSVM of the second virtualized file server at which the storage item is located, and forward the request to the FSVM of the second virtualized file server. The storage item may be designated as being accessible by other virtualized file servers when the storage item is associated with a predetermined tag value indicating that the storage item is shared among virtualized file servers. The predetermined tag value may be stored in a sharding map in association with the storage item.

1. A method comprising: receiving, at a first virtualized file server of a first computing device, a request to access a storage item located at a second virtualized file server of a second computing device; identifying a File Server Virtual Machine (FSVM) of the second virtualized file server configured to serve a storage resource storing the storage item; and sending the request to the identified FSVM of the second virtualized file server to access the storage item.
2. The method of claim 1, wherein sending the request to the identified FSVM of the second virtualized file server to access the storage item is in response to the storage item is being permitted to the first virtualized file server.
3. The method of claim 2, wherein determining whether the first virtualized file server is permitted to access the storage item.
4. The method of claim 3, wherein determining whether the first virtualized server is permitted to access the storage item includes determining that the storage item is associated with a tag value indicating that the storage item is shared ...

Publication date: 25-01-2018

MONITORING OF REPLICATED DATA INSTANCES

Number: US20180026867A1
Assignee:

Replicated instances in a distributed computing environment provide for automatic failover and recovery. A component monitors the status of event processors in a set or bucket and handles the failure of an event processor. For a large number of instances, the data environment can be partitioned such that each monitoring component is assigned a partition of the workload. At intervals, each event processor sends a “heartbeat” message to the event processors in the bucket covering the same workload partition, to inform the other event processors of the status of the event processor sending the heartbeat. If it is determined that a heartbeat is received from each event processor in the bucket, a current process can continue. In the event of monitoring component failure, the instances can be repartitioned, and the remaining monitoring components can be assigned to the new partitions to substantially evenly distribute the workload.

1. A computer-implemented method, comprising: assigning a first event processor to one or more workloads in a distributed computing environment; receiving one or more messages on behalf of the first event processor; determining whether at least one of the one or more messages was not received within a first time interval, and, if so, assigning at least a second event processor to the one or more workloads.
2. The computer-implemented method of claim 1, further comprising: determining to continue execution of a current process in the distributing computing environment if at least one of the one or more messages is received from the first event processor.
3. The computer-implemented method of claim 1, further comprising: maintaining, by the first event processor and the at least a second event processor, a list of all event processors assigned to the one or more workloads, a respective status for each event processor, and a respective last check-in time of each event processor.
4. The computer-implemented method of claim 1, further comprising: ordering ...
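The heartbeat timeout in this entry (reassign the workload when no message arrives within the interval) can be sketched in Python. Timestamps are passed in explicitly so the example is deterministic; the class `WorkloadMonitor`, its single-standby design, and the processor names are hypothetical simplifications.

```python
class WorkloadMonitor:
    """Tracks one workload with a current owner and a single standby processor."""

    def __init__(self, interval_s, first, second, now=0.0):
        self.interval_s = interval_s
        self.owner = first       # event processor currently assigned
        self.standby = second    # event processor to assign on timeout
        self.last_seen = now     # time of the owner's last heartbeat

    def heartbeat(self, processor, now):
        # Only heartbeats from the current owner refresh the timer.
        if processor == self.owner:
            self.last_seen = now

    def check(self, now):
        """Reassign the workload if the owner has been silent too long."""
        silent_for = now - self.last_seen
        if silent_for > self.interval_s and self.standby is not None:
            self.owner, self.standby = self.standby, None
            self.last_seen = now
        return self.owner
```

A monitoring loop would call `check` periodically with the current time; as long as heartbeats keep arriving inside the interval, the current process continues with the same owner.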

Publication date: 23-01-2020

SYSTEMS, METHODS, AND APPARATUSES FOR IMPLEMENTING A SCHEDULER AND WORKLOAD MANAGER WITH WORKLOAD RE-EXECUTION FUNCTIONALITY FOR BAD EXECUTION RUNS

Number: US20200026571A1
Assignee: SALESFORCE.COM, INC.

In accordance with disclosed embodiments, there are provided systems, methods, and apparatuses for implementing a stateless, deterministic scheduler and work discovery system with interruption recovery. For instance, according to one embodiment, there is disclosed a system to implement a stateless scheduler service, in which the system includes: a processor and a memory to execute instructions at the system; a compute resource discovery engine to identify one or more computing resources available to execute workload tasks; a workload discovery engine to identify a plurality of workload tasks to be scheduled for execution; a cache to store information on behalf of the compute resource discovery engine and the workload discovery engine; a scheduler to request information from the cache specifying the one or more computing resources available to execute workload tasks and the plurality of workload tasks to be scheduled for execution; and further in which the scheduler is to schedule at least a portion of the plurality of workload tasks for execution via the one or more computing resources based on the information requested. Other related embodiments are disclosed.

1. A method performed by a system having at least a processor and a memory therein, wherein the method comprises: allocating a cache within the memory of the system; identifying, via a workload discovery engine, pending workload tasks to be scheduled for execution from one or more workload queues and updating the cache; identifying, via a compute resource discovery engine, a plurality of computing resources available to execute the workload tasks and updating the cache; identifying, via an external services monitor, a plurality of external services accessible to the workload tasks and updating the cache; executing a scheduler via the processor of the system, wherein the scheduler performs at least the following operations: scheduling the workload tasks for execution on the plurality of computing resources; ...

Publication date: 23-01-2020

OPPORTUNISTIC OFFLINING FOR FAULTY DEVICES IN DATACENTERS

Number: US20200026591A1
Assignee:

Embodiments relate to determining whether to take a resource distribution unit (RDU) of a datacenter offline when the RDU becomes faulty. RDUs in a cloud or datacenter supply a resource such as power, network connectivity, and the like to respective sets of hosts that provide computing resources to tenant units such as virtual machines (VMs). When an RDU becomes faulty some of the hosts that it supplies may continue to function and others may become unavailable for various reasons. This can make a decision of whether to take the RDU offline for repair difficult, since in some situations countervailing requirements of the datacenter may be at odds. To decide whether to take an RDU offline, the potential impact on availability of tenant VMs, unused capacity of the datacenter, a number or ratio of unavailable hosts on the RDU, and other factors may be considered to make a balanced decision.

1. A method performed by one or more control server devices participating in a cloud fabric that controls a compute cloud, the compute cloud comprised of a plurality of cloud server devices, the cloud further comprising cloud hardware assets, each cloud hardware asset servicing a respective set of the cloud server devices, the cloud server devices hosting tenant components of tenants of the cloud, the method comprising: receiving a notification that a cloud hardware asset is in a failure state, the cloud hardware asset servicing a set of the cloud server devices; and determining how making the set of cloud server devices unavailable would affect: (i) a measure of availability of a population of the tenant components, wherein making the set of cloud server devices unavailable would render some of the tenant components unavailable, and (ii) a measure and/or prediction of cloud capacity for a resource of the cloud; based on the notification, determining whether to place the cloud hardware asset in a state of unavailability, wherein when the cloud hardware asset enters the ...

Подробнее
23-01-2020 дата публикации

CONTROLLING PROCESSING ELEMENTS IN A DISTRIBUTED COMPUTING ENVIRONMENT

Number: US20200026605A1
Assignee:

A computer system controls processing elements associated with a stream computing application. A stream computing application is monitored for the occurrence of one or more conditions. One or more processing element groups are determined to be restarted based on occurrence of the one or more conditions, wherein the processing element groups each include a plurality of processing elements associated with the stream computing application. Each processing element of the determined one or more processing element groups is concurrently restarted. Embodiments of the present invention further include a method and program product for controlling processing elements within a stream computing application in substantially the same manner described above.

1. A computer-implemented method of controlling processing elements associated with a stream computing application comprising: monitoring a stream computing application for occurrence of one or more conditions; determining one or more processing element groups to restart based on occurrence of the one or more conditions, wherein the processing element groups each include a plurality of processing elements associated with the stream computing application; and concurrently restarting each processing element of the determined one or more processing element groups.
2. The computer-implemented method of claim 1, wherein determining one or more processing element groups further comprises: establishing at least one processing element group based on a configuration attribute.
3. The computer-implemented method of claim 1, wherein the one or processing elements include a plurality of operators, and determining one or more processing element groups further comprises: establishing at least one processing element group based on locations of processing elements within an operator graph indicating a flow through the operators.
4. The computer-implemented method of claim 1, wherein the one or processing elements include a plurality of ...

Publication date: 23-01-2020

INTELLIGENT LOG GAP DETECTION TO ENSURE NECESSARY BACKUP PROMOTION

Number: US20200026622A1
Assignee:

An intelligent log gap detection to ensure necessary backup promotion. Specifically, a method and system are disclosed, which entail determining whether to pursue a differential database backup or promote the differential database backup to a full database backup, in order to preclude data loss across high availability databases. The deduction pivots on a matching or mismatching between log sequence numbers (LSNs).

1. A method for intelligent log gap detection, comprising: receiving a first database backup request for a first differential database backup on a database availability cluster (DAC); making a first determination that a first full database backup has already been performed; obtaining, based on the first determination, a checkpoint log sequence number (LSN) associated with the first full database backup; making a second determination that the checkpoint LSN mismatches a first differential base LSN (DBL); detecting, based on the second determination, a log gap across the DAC; and promoting, based on the detecting the log gap, the first differential database backup to a second full database backup.
2. The method of claim 1, wherein making the first determination comprises: performing a search of a cluster backup chain table (BCT) in reverse chronological order, wherein the cluster BCT comprises a plurality of cluster backup chain records (BCRs); and identifying a cluster BCR of the plurality of cluster BCRs based on a backup identifier (ID) specified therein, wherein the backup ID identifies the cluster BCR as being associated with a full database backup.
3. The method of claim 2, wherein the cluster BCR comprises an object ID, the backup ID, a first LSN, a last LSN, the checkpoint LSN, and a database backup LSN.
4. The method of claim 1, further comprising: issuing, based on the promoting, a full backup command (FBC).
5. The method of claim 1, further comprising: receiving a second database backup request for a ...
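The LSN comparison in this entry can be sketched in Python as follows. The record layout is reduced to the two fields the decision needs, the `"full"`/`"differential"` encoding of the backup ID is hypothetical, and promoting to a full backup when no earlier full backup exists is an assumption that the text does not state.

```python
from dataclasses import dataclass


@dataclass
class BackupChainRecord:
    backup_id: str       # hypothetical encoding: "full" or "differential"
    checkpoint_lsn: int  # checkpoint LSN recorded for this backup


def decide_backup(chain, differential_base_lsn):
    """Return 'differential', or 'full' if the request must be promoted.

    chain: backup chain records in chronological order (oldest first).
    """
    # Search in reverse chronological order for the latest full backup.
    for record in reversed(chain):
        if record.backup_id == "full":
            if record.checkpoint_lsn != differential_base_lsn:
                return "full"  # LSN mismatch: log gap detected, promote
            return "differential"
    # Assumption: without any full backup there is no base for a differential.
    return "full"
```

A mismatch between the last full backup's checkpoint LSN and the differential base LSN indicates that a full backup was taken elsewhere in the cluster, so a differential would miss changes.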

Publication date: 28-01-2021

METHODS, SYSTEMS, AND COMPUTER READABLE MEDIA FOR PROVIDING BYZANTINE FAULT TOLERANCE

Number: US20210026745A1
Inventor: Wang Yongge
Assignee:

Methods, systems, and computer readable media for providing Byzantine fault tolerance (BFT) are disclosed. According to one method, a method for providing BFT occurs at a computing platform executing a BFT protocol, wherein the computing platform is acting as a leader participant of a round of the BFT protocol. The method comprising: receiving signed round-change messages from multiple participants in the round; broadcasting a signed lock message indicating that signed round-change messages have been received from a predetermined number of the participants in the round voting for a same candidate block; receiving signed commit messages from multiple participants in the round; and broadcasting a signed decide message indicating the candidate block is a finalized block after the predetermined number of the participants in the round have sent signed commit messages indicating the candidate block.

1. A method for providing Byzantine fault tolerance (BFT), the method comprising: at a computing platform executing a BFT protocol, wherein the computing platform is acting as a leader participant of a round of the BFT protocol: receiving signed round-change messages from multiple participants in the round; broadcasting a signed lock message indicating that signed round-change messages have been received from a predetermined number of the participants in the round voting for a same candidate block; receiving signed commit messages from multiple participants in the round; and broadcasting a signed decide message indicating the candidate block is a finalized block after the predetermined number of the participants in the round have sent signed commit messages indicating the candidate block.
2. The method of claim 1, wherein the predetermined number of the participants includes at least 2t+1 participants, where t represents an amount of malicious participants in the round.
3. The method of wherein a participant in the round receives the decide message from the leader ...
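The 2t+1 threshold from claim 2 can be illustrated with a short Python sketch of the leader's vote counting. Signature checking and message broadcast are omitted, and the function name `quorum_block` and the vote representation (one entry per participant) are hypothetical.

```python
from collections import Counter


def quorum_block(votes, t):
    """Return the candidate block backed by at least 2t+1 participants, else None.

    votes: participant id -> candidate block it voted for (one vote each).
    t:     assumed upper bound on malicious participants in the round.
    """
    quorum = 2 * t + 1
    counts = Counter(votes.values())
    for block, count in counts.items():
        if count >= quorum:
            return block
    return None
```

In this sketch the same check would be applied twice by the leader: once over round-change messages before broadcasting the lock message, and once over commit messages before broadcasting the decide message.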
