Total found: 235. Displayed: 164.
Publication date: 30-12-2014

Phased bucket pre-fetch in a network processor

Number: US0008923306B2
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. Based on a prefetch status, a selection of the subset of rules is retrieved for rule matching. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.
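The lookup flow above can be sketched in software. This is a minimal illustration only; the bucketed rule table, predicate rules, and `prefetch_limit` cap are assumptions standing in for the hardware's key request, rule subset, and prefetch status:

```python
def lookup(key, rule_table, prefetch_limit):
    """Match `key` (a tuple of ints) against the rule subset its first field selects."""
    bucket = key[0] % len(rule_table)             # pick the rule subset for this key
    subset = rule_table[bucket][:prefetch_limit]  # prefetch only a limited selection
    for name, pred in subset:
        if pred(key):                             # each rule is a predicate
            return {"match": True, "rule": name}
    return {"match": False, "rule": None}
```

A response message is modeled as a small dict indicating whether a match was found and which rule matched.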

Publication date: 02-11-2017

SYSTEMS AND METHODS FOR TEXT ANALYTICS PROCESSOR

Number: US20170315984A1
Assignee:

A hardware-based programmable text analytics processor has a plurality of components including at least a tokenizer, a tagger, a parser, and a classifier. The tokenizer processes an input stream of unstructured text data and identifies a sequence of tokens along with their associated token ids. The tagger assigns a tag to each of the sequence of tokens from the tokenizer using a trained machine learning model. The parser parses the tagged tokens from the tagger and creates a parse tree for the tagged tokens via a plurality of shift, reduce and/or finalize transitions based on a trained machine learning model. The classifier performs classification for tagging and parsing by accepting features extracted by the tagger and the parser, classifying the features and returning classes of the features back to the tagger and the parser, respectively. The text analytics processor (TAP) then outputs structured data to be processed for various text analytics processing applications.
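The tokenizer-tagger-parser stages can be illustrated as a toy software pipeline. The whitespace tokenizer, the capitalization "model", and the one-level parse tree are deliberately simplistic assumptions; the patent's stages use trained machine learning models:

```python
def tokenize(text):
    # stage 1: split unstructured text into a sequence of tokens
    return text.split()

def tag(tokens):
    # stage 2 (toy "model"): capitalized tokens are names, the rest plain words
    return [(t, "NAME" if t[0].isupper() else "WORD") for t in tokens]

def parse(tagged):
    # stage 3 (toy tree): a single root node holding the tagged tokens in order
    return ("ROOT", tagged)

def analyze(text):
    """Run the three stages in sequence and return structured output."""
    return parse(tag(tokenize(text)))
```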

Publication date: 14-02-2013

Packet Classification

Number: US20130039366A1
Assignee: Cavium, Inc.

A packet classification system, methods, and corresponding apparatus are provided for enabling packet classification. A processor of a security appliance coupled to a network uses a classifier table having a plurality of rules, the plurality of rules having at least one field, to build a decision tree structure including a plurality of nodes, the plurality of nodes including a subset of the plurality of rules. The methods may produce wider, shallower trees that result in shorter search times and reduced memory requirements for storing the trees.

1. A method comprising:
using a classifier table having a plurality of rules, the plurality of rules having at least one field, building a decision tree structure including a plurality of nodes, each node representing a subset of the plurality of rules;
for each node of the decision tree, (a) determining a number of cuts that may be made on each at least one field, creating child nodes equal to the number of cuts; (b) selecting a field on which to cut the node based on a comparison of an average of a difference between an average number of rules per child node created and an actual number of rules per child node created per each at least one field; (c) cutting the node into a number of child nodes on the selected field; and
storing the decision tree structure.
2. The method of claim 1, wherein determining the number of cuts is based on a maximum number of cuts for a given storage capacity.
3. The method of claim 1, wherein selecting includes selecting the field on which to cut the node into a number of child nodes based on the field being a field of the at least one field with the smallest average of the difference between an average number of rules per child node and an actual number of rules per child node.
4. The method of claim 1, wherein cutting includes cutting the node only if the node has greater than a predetermined number of the subset of the plurality of rules.
5. The method of claim 4, wherein the predetermined number is an adjustable number, the ...
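The field-selection heuristic in claim 1(b) — cut on the field whose children hold rule counts closest to the per-child average — can be sketched directly. The dict-of-counts input shape is an assumption for illustration:

```python
def field_score(child_rule_counts):
    """Average |actual - average| of rules per child produced by cutting one field."""
    avg = sum(child_rule_counts) / len(child_rule_counts)
    return sum(abs(c - avg) for c in child_rule_counts) / len(child_rule_counts)

def select_cut_field(cuts_per_field):
    """Pick the field whose cuts spread the rules most evenly (smallest score)."""
    return min(cuts_per_field, key=lambda f: field_score(cuts_per_field[f]))
```

A field that splits rules evenly scores 0, so it wins over a skewed split, which is what drives the wider, shallower trees the abstract mentions.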

Publication date: 03-11-2011

METHOD AND APPARATUS FOR A VIRTUAL SYSTEM ON CHIP

Number: US20110271277A1
Assignee: Cavium Networks, Inc.

A virtual system on chip (VSoC) is an implementation of a machine that allows for sharing of underlying physical machine resources between different virtual systems. A method or corresponding apparatus of the present invention relates to a device that includes a plurality of virtual systems on chip and a configuring unit. The configuring unit is arranged to configure resources on the device for the plurality of virtual systems on chip as a function of an identification tag assigned to each virtual system on chip.

Publication date: 20-12-2016

Method and apparatus for assigning resources used to manage transport operations between clusters within a processor

Number: US0009525630B2
Assignee: Cavium, Inc.

A method, and corresponding apparatus, of assigning processing resources used to manage transport operations between a first memory cluster and one or more other memory clusters, include receiving information indicative of allocation of a subset of processing resources in each of the one or more other memory clusters to the first memory cluster, storing, in the first memory cluster, the information indicative of resources allocated to the first memory cluster, and facilitating management of transport operations between the first memory cluster and the one or more other memory clusters based at least in part on the information indicative of resources allocated to the first memory cluster.

Publication date: 30-08-2016

Method and apparatus for compiling search trees for processing request keys based on a key size supported by underlying processing elements

Number: US0009432284B2
Assignee: Cavium, Inc.

A packet classification system, methods, and apparatus are provided for packet classification. A processor of a router coupled to a network compiles at least one search tree based on a rules set. The processor determines an x number of search phases needed to process an incoming key corresponding to the rules set, wherein the rules set includes a plurality of rules, where each of the plurality of rules includes an n number of rule fields and where the incoming key includes an n number of processing fields. The processor generates an x set of search trees, where each of the x set of search trees corresponds to a respective one of the x number of search phases. Also, the processor provides the x set of search trees to a search processor, where each of the x set of search trees is configured to process respective portions of the incoming key.

Publication date: 11-11-2014

Deterministic finite automata graph traversal with nodal bit mapping

Number: US0008886680B2
Author: Rajan Goyal
Assignee: Cavium, Inc.

An apparatus, and corresponding method, for generating a graph used in performing a search for a match of at least one expression in an input stream is presented. The graph includes a number of interconnected nodes connected solely by valid arcs. A valid arc may also include a nodal bit map including structural information of a node to which the valid arc points to. A walker process may utilize the nodal bit map to determine if a memory access is necessary. The nodal bit map reduces the number of external memory access and therefore reduces system run time.
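The nodal-bit-map idea can be shown with a plain integer bitmap: each valid arc carries a bitmap of which characters have valid arcs out of the destination node, so the walker can reject a character without fetching that node from external memory. The encoding (one bit per character code) is an illustrative assumption:

```python
def has_arc(bitmap, char):
    """Test bit ord(char) of the destination node's arc bitmap."""
    return (bitmap >> ord(char)) & 1 == 1

def walk_needs_fetch(bitmap, char):
    # an external memory access is needed only when the bitmap says
    # the destination node may actually have an arc for this character
    return has_arc(bitmap, char)
```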

Publication date: 24-10-2013

Incremental Update Heuristics

Number: US20130282766A1
Assignee:

A system, apparatus, and method are provided for receiving one or more incremental updates including adding, deleting, or modifying rules of a Rule Compiled Data Structure (RCDS) used for packet classification. Embodiments disclosed herein may employ at least one heuristic for maintaining quality of the RCDS. At a given one of the one or more incremental updates received, a section of the RCDS may be identified and recompilation of the identified section may be triggered, altering the RCDS shape or depth in a manner detected by the at least one heuristic employed. The at least one heuristic employed enables performance and functionality of an active search process using the RCDS to be improved by advantageously determining when and where to recompile one or more sections of the RCDS being searched.

1. A method comprising:
receiving one or more incremental updates for a Rule Compiled Data Structure (RCDS) representing a plurality of rules, the plurality of rules having at least one field, the RCDS representing the plurality of rules as a decision tree for packet classification;
determining one or more updates for the RCDS based on the one or more incremental updates received;
employing at least one heuristic for maintaining quality of the RCDS; and
at a given one of the one or more incremental updates received, identifying a section of the RCDS and triggering recompilation of the identified section based on the one or more updates determined, altering the RCDS shape or depth in a manner detected by the at least one heuristic employed.
2. The method of claim 1, wherein maintaining quality includes enabling a balanced shape of the RCDS, increasing a number of decision tree paths in the RCDS, or controlling a number of decision tree levels in the RCDS by recompiling the identified section.
3. The method of claim 1, wherein the RCDS includes a plurality of nodes, each node representing a subset of the plurality of rules and the manner detected by the at least one ...
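One plausible quality heuristic of the kind described — recompile an updated subtree when its depth drifts past a bound — can be sketched as follows. The nested-list tree encoding and the depth threshold are assumptions, not the patent's RCDS layout:

```python
def depth(node):
    """Depth of a tree encoded as nested lists; any non-list is a leaf."""
    if not isinstance(node, list):
        return 1
    return 1 + max(depth(child) for child in node)

def should_recompile(subtree, max_depth):
    # trigger recompilation of this section when incremental updates
    # have made it deeper than the quality bound allows
    return depth(subtree) > max_depth
```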

Publication date: 15-10-2015

Compilation of Finite Automata Based on Memory Hierarchy

Number: US20150295889A1
Assignee: Cavium, Inc.

At least one per-pattern non-deterministic finite automaton (NFA) may be generated for a single regular expression pattern and may include a respective set of nodes. Nodes of the respective set of nodes of each per-pattern NFA generated may be distributed for storing in a plurality of memories based on hierarchical levels mapped to the plurality of memories and per-pattern NFA storage allocation settings configured for the hierarchical levels, optimizing run time performance for matching regular expression patterns in an input stream.

Publication date: 14-03-2013

METHOD AND APPARATUS FOR MULTIPLE ACCESS OF PLURAL MEMORY BANKS

Number: US20130067173A1
Assignee: Cavium, Inc.

A processor with on-chip memory including a plurality of physical memory banks is disclosed. The processor includes a method, and corresponding apparatus, of enabling multi-access to the plurality of physical memory banks. The method comprises selecting a subset of multiple access requests to be executed in at least one clock cycle over at least one of a number of access ports connected to the plurality of physical memory banks, the selected subset of access requests addressed to different physical memory banks among the plurality of memory banks, and scheduling the selected subset of access requests, each over a separate access port.
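The selection step — pick requests that fall in distinct banks so they can issue in the same cycle, one per port — can be sketched greedily. The address-to-bank mapping (`addr % num_banks`) and greedy order are illustrative assumptions:

```python
def select_conflict_free(requests, num_banks, num_ports):
    """From pending request addresses, choose a subset hitting distinct banks,
    at most one request per access port, for execution in one clock cycle."""
    chosen, used_banks = [], set()
    for addr in requests:
        bank = addr % num_banks        # which physical bank this address maps to
        if bank not in used_banks and len(chosen) < num_ports:
            used_banks.add(bank)       # no two selected requests share a bank
            chosen.append(addr)
    return chosen
```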

Publication date: 28-05-2020

STATIC DICTIONARY-BASED COMPRESSION HARDWARE PIPELINE FOR DATA COMPRESSION ACCELERATOR OF A DATA PROCESSING UNIT

Number: US20200169268A1
Assignee:

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data processing functions. This disclosure describes a programmable hardware-based data compression accelerator that includes a pipeline for performing static dictionary-based and dynamic history-based compression on streams of information, such as network packets. The search block may support single and multi-thread processing, and multiple levels of compression effort. To achieve high compression, the search block may operate at a high level of effort that supports a single thread and use of both a dynamic history of the input data stream and a static dictionary of common words. The static dictionary may be useful in achieving high compression where the input data stream is relatively small.

1. A method comprising:
receiving, by a search engine implemented as a pipeline of a processing device, an input data stream to be compressed;
identifying, by the search engine, one or more dictionary addresses of one or more words having different word lengths stored in a static dictionary that potentially match a current byte string beginning at a current byte position in the input data stream;
determining, by the search engine, whether at least one match occurs for the current byte string from among the one or more words at the dictionary addresses;
selecting, by the search engine, an output for the current byte position, wherein the output for the current byte position comprises one of a reference to a match for the current byte string or a literal of original data at the current byte position; and
transmitting, by the search engine, the selected output for the current byte position in an output data stream.
2. The method of claim 1, wherein ...
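The select-reference-or-literal step of claim 1 can be shown with a toy longest-match lookup. The linear scan over a word list stands in for the hardware's addressed dictionary banks, and the dictionary contents are assumptions:

```python
def match_at(data, pos, dictionary):
    """Return ('ref', word) for the longest dictionary word matching at `pos`,
    or ('lit', byte) to emit the original data as a literal."""
    best = None
    for word in dictionary:
        # words of different lengths may all match at the current position
        if data.startswith(word, pos) and (best is None or len(word) > len(best)):
            best = word
    return ("ref", best) if best else ("lit", data[pos])
```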

Publication date: 24-02-2009

Dual mode firewall

Number: US0007496955B2

Methods and devices are provided for implementing a dual mode firewall. Some implementations provide a firewall in a network device that acts as a bridge for layer 2 traffic and acts as a router for layer 3 traffic. In some implementations, a determination of whether to act as a bridge or a router for a packet is based on the configuration of the interface handling the packet. In some implementations, the network device inspects a destination of each packet to determine whether to act as a bridge or a router for that packet. The firewall screens both the layer 2 and the layer 3 traffic according to policies implemented in the firewall.
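The per-interface mode decision can be condensed into one function. The `'l2'`/`'l3'` mode labels and the predicate-style policy are illustrative assumptions about the configuration described:

```python
def handle(packet, iface_mode, allow):
    """Screen `packet` by firewall policy, then bridge or route it
    based on the configuration of the interface that received it.

    iface_mode: 'l2' (bridge) or 'l3' (route); allow: policy predicate."""
    if not allow(packet):
        return "drop"          # firewall screens both layer 2 and layer 3 traffic
    return "bridge" if iface_mode == "l2" else "route"
```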

Publication date: 30-09-2014

System and method to provide non-coherent access to a coherent memory system

Number: US0008850125B2

In one embodiment, a system comprises a memory and a memory controller that provides a cache access path to the memory and a bypass-cache access path to the memory, receives requests to read graph data from the memory on the bypass-cache access path and receives requests to read non-graph data from the memory on the cache access path. A method comprises receiving a request at a memory controller to read graph data from a memory on a bypass-cache access path, receiving a request at the memory controller to read non-graph data from the memory through a cache access path, and arbitrating, in the memory controller, among the requests using arbitration.

Publication date: 24-11-2020

Memory layout for JPEG accelerator

Number: US0010848775B2
Assignee: Fungible, Inc.

A device includes a memory configured to store image data and an image coding unit implemented in circuitry. The image coding unit is configured to store a first portion of a set of context information in memory of the image coding unit as an array representing a direct access table and store a second portion of the set of context information in a hash table. The image coding unit is further configured to determine whether a context value for context-based coding of a value of an instance of a syntax element for a block of image data is stored in the array or in the hash table, retrieve the context value from either the array or the hash table according to the determination, and context-based code the value of the instance of the syntax element using the context value.
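The split context store — a direct-access array for the first portion of context values and a hash table for the rest — maps naturally onto an array plus a dict. The sizes and the miss default are assumptions:

```python
def get_context(ctx_id, direct, overflow, default=0):
    """Retrieve a context value from either the direct-access array
    or the hash table, according to where the id falls."""
    if ctx_id < len(direct):
        return direct[ctx_id]             # O(1) direct-access-table hit
    return overflow.get(ctx_id, default)  # hash-table fallback for the long tail
```

Keeping the hot, densely numbered contexts in the array avoids hashing on the common path while the table bounds memory for sparse ids.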

Publication date: 26-08-2014

Intelligent graph walking

Number: US0008819217B2
Assignee: Cavium, Inc.

An apparatus, and corresponding method, for performing a search for a match of at least one expression in an input stream is presented. A graph including a number of interconnected nodes is generated. A compiler may assign at least one starting node and at least one ending node. The starting node includes a location table with node position information of an ending node and a sub-string value associated with the ending node. Using the node position information and a string comparison function, intermediate nodes located between the starting and ending nodes may be bypassed. The node bypassing may reduce the number of memory accesses required to read the graph.
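The bypass can be illustrated with a location table of (sub-string, ending node) pairs: a direct string comparison replaces walking the intermediate nodes character by character. The table shape is an assumption for illustration:

```python
def bypass_walk(input_str, start_pos, location_table):
    """location_table: list of (substring, end_node) pairs recorded at the
    starting node. Returns the ending node reached by a single string
    comparison, or None if no recorded sub-string matches."""
    for substring, end_node in location_table:
        if input_str.startswith(substring, start_pos):
            return end_node    # intermediate nodes bypassed: one compare, one hop
    return None
```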

Publication date: 15-10-2013

System and method to reduce memory access latencies using selective replication across multiple memory ports

Number: US0008560757B2

In one embodiment, a system includes memory ports distributed into subsets identified by a subset index, where each memory port has an individual wait time based on a respective workload. The system further comprises a first address hashing unit configured to receive a read request including a virtual memory address associated with a replication factor and referring to graph data. The first address hashing unit translates the replication factor into a corresponding subset index based on the virtual memory address, and converts the virtual memory address to a hardware based memory address referring to graph data in the memory ports within a subset indicated by the corresponding subset index. The system further comprises a memory replication controller configured to direct read requests to the hardware based address to the one of the memory ports within the subset indicated by the corresponding subset index with a lowest individual wait time.
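The routing step — translate a virtual address into a subset of replicated ports, then send the read to the copy with the lowest wait — can be sketched as below. The subset mapping (`virt_addr % replication`) and contiguous port numbering are assumptions, not the patent's hashing scheme:

```python
def pick_port(virt_addr, replication, wait_times, ports_per_subset):
    """Choose the memory port to serve a read of replicated graph data.

    wait_times: current individual wait time per port, indexed by port number."""
    subset_index = virt_addr % replication            # which replicated subset
    start = subset_index * ports_per_subset
    subset = range(start, start + ports_per_subset)   # ports holding this copy
    return min(subset, key=lambda p: wait_times[p])   # least-loaded copy wins
```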

Publication date: 15-11-2016

Lookup front end packet output processor

Number: US0009497117B2
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. The lookup front-end further processes the response message and provides a corresponding response to the host.

Publication date: 04-12-2014

Method and Apparatus for a Virtual System on Chip

Number: US20140359622A1
Assignee:

A virtual system on chip (VSoC) is an implementation of a machine that allows for sharing of underlying physical machine resources between different virtual systems. A method or corresponding apparatus of the present invention relates to a device that includes a plurality of virtual systems on chip and a configuring unit. The configuring unit is arranged to configure resources on the device for the plurality of virtual systems on chip as a function of an identification tag assigned to each virtual system on chip.

1. A device comprising:
a plurality of processing cores on a single physical chip;
a plurality of virtual systems on chip, each virtual system on chip (VSoC) relating to a subset of the plurality of processing cores on the single physical chip; and
an input packet processing unit configured to:
assign a given identification tag of a plurality of identification tags to a work request based on metadata extracted from the work request; and
assign the work request to a given VSoC of the plurality of virtual systems on chip based on the given identification tag assigned to the work request.
2. The device of claim 1, wherein to assign the work request the input packet processing unit is further configured to:
employ a lookup table, the lookup table configured to associate the metadata extracted with the given identification tag; and
the given identification tag is a unique identification tag of the plurality of identification tags assigned to the given VSoC.
3. The device of claim 1, wherein the metadata extracted includes at least one field of a plurality of fields of a packet.
4. The device of claim 1, wherein the metadata extracted represents a Media Access Control (MAC) address or an Internet Protocol address.
5. The device of claim 1, wherein the input packet processing unit is further configured to:
store data included in the work request by employing a first pool of pointers corresponding to first free resources of a plurality of resources configured to ...
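The tag-based assignment of claims 1-2 reduces to two table lookups: metadata to identification tag, tag to VSoC. The dict-based tables and MAC-address metadata key are illustrative assumptions:

```python
def assign_vsoc(metadata, tag_table, vsoc_by_tag):
    """Assign a work request to a VSoC via its identification tag.

    tag_table: lookup table associating extracted metadata (here a MAC
    address) with an identification tag; vsoc_by_tag: tag -> VSoC."""
    tag = tag_table[metadata["mac"]]   # claim 2: lookup table, metadata -> tag
    return vsoc_by_tag[tag]            # claim 1: tag picks the target VSoC
```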

Publication date: 30-08-2012

Regular Expression Processing Automaton

Number: US20120221497A1
Assignee:

A method and corresponding apparatus are provided implementing a stage one of run time processing using Deterministic Finite Automata (DFA) and implementing a stage two of run time processing using Non-Deterministic Finite Automata (NFA) to find the existence of a pattern in a payload, such as the payload portion of an Internet Protocol (IP) datagram, or an input stream.

1. A method comprising:
in a processor of a security appliance coupled to a network:
generating an initial NFA for a set of patterns;
generating an initial DFA for the set of patterns, the initial DFA generated having states representing a subset of the set of patterns, having a number of states less than a number of states of a complete DFA having states representing the set of patterns, completely, and having at least one end-state, where each end-state maps to one or more states of the initial NFA for the set of patterns and represents a transition from processing the set of patterns as DFA to processing the set of patterns as NFA;
adding states to the initial DFA, extending from the at least one end-state, to form an extended DFA for the set of patterns, the states added to satisfy one or more following conditions: i) reduce a number of instances the set of patterns transition from being processed as DFA to being processed as NFA, ii) reduce a number of states of the NFA to track when the set of patterns transition from being processed as DFA to being processed as NFA, and iii) a number of states of the extended DFA is less than the number of states of the complete DFA, the extended DFA represents a portion of the set of patterns to be processed as DFA, the portion of the set of patterns being of at least a predefined size;
mapping at least one of the added DFA states to one or more states of the NFA to form an end-state of the extended DFA, which when processed, transitions run time processing for finding the set of patterns in an input stream from DFA to NFA so that the portion of the set of ...

Publication date: 21-01-2016

Reverse NFA Generation And Processing

Number: US20160021060A1
Assignee:

In a processor of a security appliance, an input of a sequence of characters is walked through a finite automata graph generated for at least one given pattern. At a marked node of the finite automata graph, if a specific type of the at least one given pattern is matched at the marked node, the input sequence of characters is processed through a reverse non-deterministic finite automata (rNFA) graph generated for the specific type of the at least one given pattern by walking the input sequence of characters backwards through the rNFA beginning from an offset of the input sequence of characters associated with the marked node. Generating the rNFA for a given pattern includes inserting processing nodes for processing an input sequence of patterns to determine a match for the given pattern. In addition, the rNFA is generated from the given type of pattern.

Publication date: 21-11-2017

Memory management for finite automata processing

Number: US0009823895B2
Assignee: Cavium, Inc.

Matching at least one regular expression pattern in an input stream may be optimized by initializing a search context in a run stack based on (i) partial match results determined from walking segments of a payload of a flow through a first finite automaton and (ii) a historical search context associated with the flow. The search context may be modified via push or pop operations to direct at least one processor to walk segments of the payload through the at least one second finite automaton. The search context may be maintained in a manner that obviates overflow of the search context and obviates stalling of the push or pop operations to increase match performance.

Publication date: 29-12-2015

Lookup cluster complex

Number: US0009225643B2
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.

Publication date: 05-03-2015

METHOD AND APPARATUS FOR COMPILATION OF FINITE AUTOMATA

Number: US20150067776A1
Assignee: Cavium, Inc.

A method and corresponding apparatus are provided implementing run time processing using Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA) to find the existence of a pattern in a payload. A subpattern may be selected from each pattern in a set of one or more regular expression patterns based on at least one heuristic and a unified deterministic finite automata (DFA) may be generated using the subpatterns selected from all patterns in the set, and at least one non-deterministic finite automata (NFA) may be generated for at least one pattern in the set, optimizing run time performance of the run time processing.

1. A security appliance operatively coupled to a network, the security appliance comprising:
at least one memory;
at least one processor operatively coupled to the at least one memory, the at least one processor configured to:
select a subpattern from each pattern in a set of one or more regular expression patterns based on at least one heuristic;
generate a unified deterministic finite automata (DFA) using the subpatterns selected from all patterns in the set;
generate at least one non-deterministic finite automata (NFA) for at least one pattern in the set, a portion of the at least one pattern used for generating the at least one NFA, and at least one walk direction for run time processing of the at least one NFA, being determined based on whether a length of the subpattern selected is fixed or variable and a location of the subpattern selected within the at least one pattern; and
store the unified DFA and the at least one NFA generated in the at least one memory.
2. The security appliance of claim 1, wherein the at least one heuristic includes maximizing a number of unique subpatterns selected and length of each subpattern selected.
3. The security appliance of claim 1, wherein the processor is further configured to determine whether the length of the subpattern selected is fixed or variable.
4. The security appliance ...

Publication date: 12-05-2015

Lookup front end packet input processor

Number: US0009031075B2
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. The lookup front-end further processes the response message and provides a corresponding response to the host.

Publication date: 25-04-2023

Data flow graph-driven analytics platform using data processing units having hardware accelerators

Number: US0011636154B2
Assignee: Fungible, Inc.

A data flow graph-driven analytics platform is described in which highly-programmable data stream processing devices, referred to generally herein as data processing units (DPUs), operate to provide a scalable, fast and efficient analytics processing architecture. In general, the DPUs are specialized data-centric processors architected for efficiently applying data manipulation operations (e.g., regular expression operations to match patterns, filtering operations, data retrieval, compression/decompression and encryption/decryption) to streams of data units, such as packet flows having network packets, a set of storage packets being retrieved from or written to storage or other data units.

Publication date: 08-10-2015

PHASED BUCKET PRE-FETCH IN A NETWORK PROCESSOR

Number: US20150288700A1
Assignee:

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. Based on a prefetch status, a selection of the subset of rules is retrieved for rule matching. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.

1. An apparatus for processing a packet comprising:
a tree walk engine (TWE) configured to:
receive a key request including a key, the key including data extracted from a packet;
parse the key to extract at least one field;
select at least one entry in a tree access table indicated by the key request, the entry providing an address of a set of rules stored in a memory; and
process the entry, based on the at least one field, to determine at least one bucket having an ordered set of bucket entries, the bucket entries including pointers to respective subsets of rules, the subsets of rules each being a portion of the set of rules;
a bucket-walk engine (BWE) configured to retrieve a selection of the subsets of rules from the memory, the selection corresponding to a configuration of the bucket;
a rule-matching engine (RME) configured to apply the at least one field against the selection and output a response signal.
2. The apparatus of claim 1, wherein the RME is configured to apply the at least one field against each subset of rules independent of an order of the respective bucket entries.
3. The apparatus of claim 1, wherein the RME is configured to apply the at least one field against each subset of rules in parallel.
4. The apparatus of claim 1, wherein the BWE, in response to the response signal indicating a match, ...

Publication date: 14-03-2017

Compiler with mask nodes

Number: US0009595003B1
Assignee: Cavium, Inc.

A packet classification system, methods, and corresponding apparatus are provided for enabling packet classification. A processor of a security appliance coupled to a network uses a classifier table having a plurality of rules, the plurality of rules having at least one field, to build a decision tree structure including a plurality of nodes, the plurality of nodes including a subset of the plurality of rules. The plurality of nodes may be stride nodes, mask nodes, or a combination thereof. A mask node may remove restrictions of stride nodes, such as markers and consumption of contiguous bits. As long as a bit of a field is a non-consumed bit, the bit may be used for cutting a field in a mask node. An advantage of a mask node is that the mask node may consume fewer resources (e.g., memory) than a stride node.

25-06-2013 publication date

Deterministic finite automata graph traversal with nodal bit mapping

Number: US0008473523B2
Author: Rajan Goyal
Assignee: Cavium, Inc.

An apparatus, and corresponding method, for generating a graph used in performing a search for a match of at least one expression in an input stream is presented. The graph includes a number of interconnected nodes connected solely by valid arcs. A valid arc may also include a nodal bit map including structural information of a node to which the valid arc points to. A walker process may utilize the nodal bit map to determine if a memory access is necessary. The nodal bit map reduces the number of external memory access and therefore reduces system run time.

01-05-2014 publication date

LOOKUP FRONT END PACKET INPUT PROCESSOR

Number: US20140119378A1
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. The lookup front-end further processes the response message and provides a corresponding response to the host.

1. A method of processing a packet comprising: generating at least one key based on data of a lookup request; determining, based on an identifier of the lookup request, a subset of processing clusters that are capable of operating rule matching for the at least one key; selecting at least one of the processing clusters of the subset based on availability; and forwarding at least one key request to the at least one selected processing cluster, the key request including the at least one key to initiate rule matching using the key.
2. The method of claim 1, further comprising comparing the identifier against a table to determine a packet header index (PHIDX).
3. The method of claim 2, wherein the at least one key is generated according to the PHIDX.
4. The method of claim 3, wherein the PHIDX indexes an entry in a packet header table (PHT), the entry indicating rules for extracting data from the lookup request to generate the at least one key.
5. The method of claim 1, further comprising comparing the identifier against a table to determine a key format table index (KFTIDX).
6. The method of claim 5, wherein the KFTIDX indexes an entry in a key format table, the entry indicating instructions for extracting fields from the key at the processing cluster.
7. The method of claim 5, wherein the key request further ...

16-03-2021 publication date

Multimode cryptographic processor

Number: US0010951393B2
Assignee: Fungible, Inc.

This disclosure describes techniques that include performing cryptographic operations (encryption, decryption, generation of a message authentication code). Such techniques may involve the data processing unit performing any of multiple modes of encryption, decryption, and/or other cryptographic operation procedures or standards, including, Advanced Encryption Standard (AES) cryptographic operations. In some examples, the security block is implemented as a unified, multi-threaded, high-throughput encryption and decryption system for performing multiple modes of AES operations.

27-12-2012 publication date

Anchored Patterns

Number: US20120331007A1
Assignee:

A method and apparatus relate to recognizing anchored patterns from an input stream. Patterns from a plurality of given patterns are marked as anchored patterns. An anchored state tree for the anchored patterns of the plurality of given patterns is built, including nodes representing a state of the anchored state tree. For each node of the anchored state tree, a failure value equivalent to a node representing a state in an unanchored state tree representing unanchored patterns of the plurality of given patterns is determined.
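The anchored/unanchored distinction can be illustrated with the textbook Aho–Corasick construction (trie plus breadth-first failure links). Note this is a simplified stand-in: anchored semantics are emulated here by keeping only offset-0 matches, rather than by building the separate anchored state tree with cross-tree failure values that the abstract describes.

```python
# Textbook Aho-Corasick: trie + breadth-first failure links. Anchored
# patterns are emulated by filtering to offset-0 matches; the patent
# instead builds a dedicated anchored state tree, so this is a stand-in.
from collections import deque

def build_ac(patterns):
    goto, fail, out = [{}], [0], [set()]     # state 0 is the root
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s].add(p)
    q = deque(goto[0].values())              # root children fail to root
    while q:
        s = q.popleft()
        for c, t in goto[s].items():
            q.append(t)
            f = fail[s]
            while f and c not in goto[f]:    # follow the failure chain
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            out[t] |= out[fail[t]]           # inherit suffix matches
    return goto, fail, out

def match(patterns, text, anchored=frozenset()):
    goto, fail, out = build_ac(patterns)
    hits, s = [], 0
    for i, c in enumerate(text):
        while s and c not in goto[s]:
            s = fail[s]
        s = goto[s].get(c, 0)
        for p in sorted(out[s]):             # sorted for determinism
            start = i - len(p) + 1
            if p not in anchored or start == 0:   # anchored: offset 0 only
                hits.append((start, p))
    return hits
```

`match(["he", "she"], "ushers")` reports both patterns; marking `"he"` as anchored drops its interior match while an occurrence at offset 0 would still be reported.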

06-05-2014 publication date

Work migration in a processor

Number: US0008719331B2

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. A work product may be migrated between lookup engines to complete the rule matching process. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.

07-03-2013 publication date

LOOKUP FRONT END PACKET INPUT PROCESSOR

Number: US20130058332A1
Author: Rajan Goyal
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. The lookup front-end further processes the response message and provides a corresponding response to the host.

1. An apparatus comprising: a memory storing a Rule Compiled Data Structure (RCDS), the RCDS representing a set of rules for packet classification; a host command interface, the host command interface configured to receive one or more host commands for an incremental update for the RCDS; a processor coupled to the memory and the host command interface, the processor configured to perform an active search of the RCDS for classifying received packets, the RCDS being updated based on the one or more host commands received, the RCDS being atomically updated from the perspective of the active search being performed.
2. The apparatus of claim 1, wherein the one or more host commands is a host request packet including a request host transaction identifier and a host access command, the host access command specifying a host read or host write access to the RCDS, the host access command including a control header, the control header including one or more ordering flags, the processor further configured to: decode the one or more ordering flags; and determine an ordering of a host lookup request and the host access command specified based on the one or more ordering flags decoded.
3. The apparatus of claim 2, wherein the one or more ordering flags includes a host response flag, the host ...

07-06-2012 publication date

GRAPH CACHING

Number: US20120143854A1
Assignee: Cavium, Inc.

In a method and apparatus for analyzing nodes of a Deterministic Finite Automata (DFA), an accessibility ranking, based on a DFA graph geometrical configuration, may be determined in order to determine cacheable portions of the DFA graph in order to reduce the number of external memory accesses. A walker process may be configured to walk the graph in a graph cache as well as main memory. The graph may be generated in a manner allowing each arc to include information if the node it is pointing to is stored in the graph cache or in main memory. The walker may use this information to determine whether or not to access the next arc in the graph cache or in main memory.
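A minimal sketch of the in-cache arc flag, assuming a two-store layout and a per-store read counter (both illustrative, not the patented memory organization):

```python
# Sketch: arcs record whether their target node lives in the graph cache
# or in main memory; the walker consults that flag on every step. The
# two-dict stores and counters are illustrative.

CACHE, MAIN = {}, {}
stats = {"cache": 0, "main": 0}

def add_node(node_id, arcs, cached):
    """arcs: char -> (target_id, target_is_cached)."""
    (CACHE if cached else MAIN)[node_id] = arcs

def read_node(node_id, in_cache):
    stats["cache" if in_cache else "main"] += 1
    return (CACHE if in_cache else MAIN)[node_id]

add_node(0, {"a": (1, True)},  cached=True)    # hot region near the root
add_node(1, {"b": (2, False)}, cached=True)
add_node(2, {},                cached=False)   # cold node in main memory

def walk(start, text, start_cached=True):
    node = read_node(start, start_cached)
    for ch in text:
        if ch not in node:
            return False
        target, cached = node[ch]              # arc says where to read next
        node = read_node(target, cached)
    return True
```

Walking "ab" from node 0 performs two cache reads and only one main-memory read, because the arc into the cold node carries the flag.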

18-08-2015 publication date

Method and an accumulator scoreboard for out-of-order rule response handling

Number: US0009112767B2
Assignee: Cavium, Inc.

According to at least one example embodiment, a method and a corresponding accumulator scoreboard for managing bundles of rule matching threads processed by one or more rule matching engines comprise: recording, for each rule matching thread in a given bundle of rule matching threads, a rule matching result in association with a priority corresponding to the respective rule matching thread; determining a final rule matching result, for the given bundle of rule matching threads, based at least in part on the corresponding indications of priorities; and generating a response state indicative of the determined final rule matching result for reporting to a host processor or a requesting processing engine.
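The scoreboard logic — accumulate out-of-order per-thread results, then resolve the bundle by priority — can be sketched as below. The closure interface, the assumption of distinct per-thread priorities within a bundle, and the lowest-value-wins rule are all illustrative.

```python
# Sketch of an accumulator scoreboard: per-thread rule-matching results
# arrive in any order, each tagged with its thread's priority; once the
# bundle is complete, the lowest priority value among matches wins.

def scoreboard(bundle_size):
    slots = {}                               # priority -> (matched, rule_id)

    def record(priority, matched, rule_id=None):
        slots[priority] = (matched, rule_id)
        if len(slots) < bundle_size:
            return None                      # bundle not yet complete
        winners = [(p, r) for p, (m, r) in slots.items() if m]
        return ("MATCH", min(winners)[1]) if winners else ("NO_MATCH", None)

    return record

record = scoreboard(3)
r1 = record(2, True, rule_id=42)     # out-of-order arrival, priority 2
r2 = record(0, False)                # best priority, but no match
r3 = record(1, True, rule_id=7)      # completes the bundle; priority 1 wins
```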

27-04-2021 publication date

Flexible reliability coding for storage on a network

Number: US0010990478B2
Assignee: Fungible, Inc.

This disclosure describes a programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets. This disclosure also describes techniques that include enabling data durability coding on a network. In some examples, such techniques may involve storing data in fragments across multiple fault domains in a manner that enables efficient recovery of the data using only a subset of the data. Further, this disclosure describes techniques that include applying a unified approach to implementing a variety of durability coding schemes. In some examples, such techniques may involve implementing each of a plurality of durability coding and/or erasure coding schemes using a common matrix approach, and storing, for each durability and/or erasure coding scheme, an appropriate set of matrix coefficients.
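The common-matrix idea can be illustrated with the smallest useful case: a k=2 → n=3 single-parity scheme whose generator matrix is applied over GF(2), i.e. with XOR arithmetic. Reed–Solomon-style schemes keep the same matrix shape but use coefficients from a larger field; the matrix and recovery path below are purely illustrative.

```python
# Common-matrix durability coding, smallest case: k=2 data fragments are
# encoded into n=3 stored fragments as G*d over GF(2) (XOR arithmetic).

K = 2
G = [           # generator matrix: one row per stored fragment
    [1, 0],     # fragment 0 = d0
    [0, 1],     # fragment 1 = d1
    [1, 1],     # fragment 2 = d0 XOR d1 (parity)
]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data):
    """data: K equal-length byte strings -> len(G) stored fragments."""
    frags = []
    for row in G:
        acc = bytes(len(data[0]))            # zeroed accumulator
        for coef, d in zip(row, data):
            if coef:
                acc = xor_bytes(acc, d)
        frags.append(acc)
    return frags

def recover_d0(frags):
    """Rebuild d0 from fragments 1 and 2 when fragment 0 is lost."""
    return xor_bytes(frags[1], frags[2])

frags = encode([b"DATA", b"MORE"])
```

Swapping in a different scheme means swapping in a different coefficient matrix `G`, which is exactly the unification the abstract describes.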

09-08-2007 publication date

Supporting options in a communication session using a TCP cookie

Number: US20070185998A1
Assignee: Cisco Technology, Inc.

A defender operable to support options in a communication session intercepts a connection request packet sent from a client to a server. The defender identifies a client option combination associated with the client from the connection request packet. The defender establishes a client option index corresponding to the client option combination, and encodes the client option index into a cookie of an acknowledgment packet. The defender then sends the acknowledgment packet to the client.
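The encode/decode round trip can be sketched as below. The two-bit index field, the option-combination table, and the SHA-256 cookie base are illustrative assumptions, not the patented cookie format.

```python
# Sketch: the defender indexes the client's option combination in a small
# table and folds the index into the low bits of the SYN cookie; when the
# client's ACK echoes the cookie, the options are recovered from it.
import hashlib

OPTION_COMBOS = [                 # client option index -> option combination
    (),
    ("mss",),
    ("mss", "sack"),
    ("mss", "sack", "wscale"),
]

INDEX_BITS = 2

def make_cookie(secret, client, option_combo):
    idx = OPTION_COMBOS.index(option_combo)
    h = hashlib.sha256(f"{secret}:{client}".encode()).digest()
    base = int.from_bytes(h[:4], "big") & ~((1 << INDEX_BITS) - 1)
    return base | idx             # low bits carry the client option index

def decode_options(cookie):
    return OPTION_COMBOS[cookie & ((1 << INDEX_BITS) - 1)]

cookie = make_cookie("s3cret", "10.0.0.1:1234", ("mss", "sack"))
```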

16-01-2020 publication date

INCREMENTAL COMPILATION OF FINITE AUTOMATA FOR A REGULAR EXPRESSION ACCELERATOR

Number: US20200019391A1
Assignee:

A compiler/loader unit for a RegEx accelerator is described that receives a first set of regular expression rules for implementing the RegEx accelerator, generates, based on the first set of regular expression rules, an initial deterministic finite automata (DFA) graph, and generates an initial memory map for allocating the initial DFA graph to a memory of the RegEx accelerator. The compiler/loader unit receives a second set of one or more new or modified regular expression rules for implementing the RegEx accelerator and in response performs incremental compilation of the second set of regular expressions. The compiler/loader unit generates, based on the second set of one or more regular expression rules, a supplemental DFA graph and reconciles the initial DFA graph with the supplemental DFA graph to generate an updated memory map for allocating the initial DFA graph and the supplemental DFA graph to the memory of the RegEx accelerator.

1. A method comprising: receiving, at a compiler/loader unit executing at a computing device, a first set of one or more regular expression rules for implementing a RegEx accelerator; generating, by the compiler/loader unit, based on the first set of regular expression rules, an initial deterministic finite automata (DFA) graph; generating, by the compiler/loader unit, an initial memory map for allocating the initial DFA graph to a memory of the RegEx accelerator; receiving, at the compiler/loader unit, a second set of one or more regular expression rules for implementing the RegEx accelerator, the second set of one or more regular expression rules being different than the first set of one or more regular expression rules; generating, by the compiler/loader unit, based on the second set of one or more regular expression rules, a supplemental DFA graph; and reconciling, by the compiler/loader unit, the initial DFA graph with the supplemental DFA graph to generate an updated memory map for allocating the initial DFA graph and the
...

25-09-2018 publication date

Batch incremental update

Number: US0010083200B2
Assignee: Cavium, Inc.

A system, apparatus, and method are provided for adding, deleting, and modifying rules in one update from the perspective of an active search process for packet classification. While a search processor searches for one or more rules that match keys generated from received packets, there is a need to add, delete, or modify rules. By organizing a plurality incremental updates for adding, deleting, or modifying rules into a batch update, several operations for incorporating the incremental updates may be made more efficient by minimizing a number of updates required.

08-09-2015 publication date

Method and apparatus for scheduling rule matching in a processor

Number: US0009130819B2
Assignee: Cavium, Inc.

In a network search processor, configured to handle search requests in a router, a scheduler for scheduling rule matching threads initiated by a plurality of initiating engines is designed to make efficient use of the resources in the network search processor while providing high speed performance. According to at least one example embodiment, the scheduler and a corresponding scheduling method comprise: determining a set of bundles of rule matching threads, each bundle being initiated by a separate initiating engine; distributing rule matching threads in each bundle into a number of subgroups of rule matching threads; assigning the subgroups of rule matching threads associated with each bundle of the set of bundles to multiple scheduling queues; and sending rule matching threads, assigned to each scheduling queue, to rule matching engines according to an order based on priorities associated with the respective bundles of rule matching threads.
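The four scheduling steps in the abstract can be sketched directly; the subgroup size, queue count, and lowest-value-first priority rule are illustrative assumptions.

```python
# Sketch of the scheduling steps: take a set of bundles, split each into
# subgroups, spread the subgroups round-robin over the scheduling queues,
# then drain each queue in bundle-priority order (lower value = higher
# priority here).

def schedule(bundles, num_queues=2, subgroup_size=2):
    """bundles: list of (priority, [thread ids]) -> dispatch order."""
    queues = [[] for _ in range(num_queues)]
    for prio, threads in bundles:
        subgroups = [threads[i:i + subgroup_size]
                     for i in range(0, len(threads), subgroup_size)]
        for i, sg in enumerate(subgroups):          # round-robin spread
            queues[i % num_queues].append((prio, sg))
    order = []
    for q in queues:                                # drain by priority
        for _, sg in sorted(q, key=lambda entry: entry[0]):
            order.extend(sg)
    return order

dispatch = schedule([(1, ["a0", "a1", "a2"]), (0, ["b0", "b1"])])
```

In the example, bundle `b` (priority 0) overtakes bundle `a` inside the shared queue, while `a`'s second subgroup still proceeds in parallel on the other queue.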

03-01-2023 publication date

Access node for data centers

Number: US0011546189B2
Assignee: Fungible, Inc.

An access node that can be configured and optimized to perform input and output (I/O) tasks, such as storage and retrieval of data to and from network devices (such as solid state drives), networking, data processing, and the like. For example, the access node may be configured to receive data to be processed, wherein the access node includes a plurality of processing cores, a data network fabric, and a control network fabric; receive, over the control network fabric, a work unit message indicating a processing task to be performed a processing core; and process the work unit message, wherein processing the work unit message includes retrieving data associated with the work unit message over the data network fabric.

05-03-2015 publication date

Memory Management for Finite Automata Processing

Number: US20150067200A1
Assignee: Cavium, Inc.

Matching at least one regular expression pattern in an input stream may be optimized by initializing a search context in a run stack based on (i) partial match results determined from walking segments of a payload of a flow through a first finite automaton and (ii) a historical search context associated with the flow. The search context may be modified via push or pop operations to direct at least one processor to walk segments of the payload through the at least one second finite automaton. The search context may be maintained in a manner that obviates overflow of the search context and obviates stalling of the push or pop operations to increase match performance.

1. A security appliance operatively coupled to a network, the security appliance comprising: at least one memory configured to store a first finite automaton, at least one second finite automaton, and a run stack; and at least one processor operatively coupled to the at least one memory and configured to search for at least one regular expression pattern in a flow, the search including: initializing a search context in the run stack based on (i) partial match results determined from walking segments of a payload of the flow through the first finite automaton and (ii) a historical search context associated with the flow; modifying the search context via push or pop operations to direct the at least one processor to walk segments of the payload through the at least one second finite automaton to explore whether at least one partial match of at least one regular expression pattern advances along at least one path of the at least one second finite automaton; and maintaining the search context in a manner obviating overflow of the search context and obviating stalling of the push or pop operations.
2. The security appliance of claim 1, wherein the search context includes a plurality of search context entries and each search context entry is determined based on a given positive partial match result of ...
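The cross-segment use of the run stack can be sketched as follows, assuming a toy automaton and a (node, offset) thread representation (both illustrative, not the patented structures):

```python
# Sketch: a per-flow historical context re-seeds the run stack when the
# next payload segment of the flow arrives, so exploration resumes
# mid-pattern instead of restarting.

def seed_run_stack(historical_context):
    """Re-initialize the run stack from saved (node, offset) entries."""
    return list(historical_context)

def explore(run_stack, payload, transitions, accept):
    """Pop threads, walk them through this segment, save survivors."""
    matched, saved = False, []
    while run_stack:
        node, off = run_stack.pop()
        while off < len(payload):
            nxt = transitions.get((node, payload[off]))
            if nxt is None:
                break                      # thread dies inside this segment
            node, off = nxt, off + 1
            if node in accept:
                matched = True
        else:
            saved.append((node, 0))        # survives into the next segment
    return matched, saved

T = {(0, "a"): 1, (1, "b"): 2, (2, "c"): 3}    # toy automaton for "abc"
matched1, saved = explore([(0, 0)], "ab", T, accept={3})    # partial match
matched2, _ = explore(seed_run_stack(saved), "c", T, accept={3})
```

The first segment ends with the pattern half-consumed; the saved context lets the second segment complete the match across the packet boundary.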

10-03-2016 publication date

Scope In Decision Trees

Number: US20160071016A1
Assignee:

A root node of a decision tree data structure may cover all values of a search space used for packet classification. The search space may include a plurality of rules, the plurality of rules having at least one field. The decision tree data structure may include a plurality of nodes, the plurality of nodes including a subset of the plurality of rules. Scope in the decision tree data structure may be based on comparing a portion of the search space covered by a node to a portion of the search space covered by the node's rules. Scope in the decision tree data structure may be used to identify whether or not a compilation operation may be unproductive. By identifying an unproductive compilation operation it may be avoided, thereby improving compiler efficiency as the unproductive compilation operation may be time-consuming.

1. A method comprising: compiling a decision tree data structure including a plurality of nodes using a classifier table having a plurality of rules representing a search space for packet classification, the plurality of rules having at least one field, the plurality of nodes each covering a portion of the search space by representing subsets of the plurality of rules; for each node of the decision tree data structure, computing a scope factor for the node based on a rule portion of the search space covered by each rule intersecting the node; and using the scope factor computed for at least one node of the plurality of nodes as an input parameter to a decision for performing a compiler operation at the at least one node.
2. The method of claim 1, wherein computing the scope factor includes computing a node scope value indicating a node portion of the search space covered by the node, wherein computing the node scope value includes computing a node field scope value for each at least one field covered by the node portion of the search space and summing each node field scope value computed to compute a total node scope value for the node.
3. The method of ...
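One plausible reading of scope — per-field coverage measured in bits, summed across fields — can be sketched numerically. The bit-based metric and the comparison against the largest intersecting rule are assumptions for illustration, not the patent's exact formula.

```python
# Illustrative scope computation: per field, the number of bits of the
# value range covered; a node's total scope is compared against the
# scopes of the rules that intersect it.
from math import log2

def field_scope(lo, hi):
    """Bits of a field's value range covered by the interval [lo, hi]."""
    return log2(hi - lo + 1)

def scope(ranges):
    """Total scope = sum of per-field scope values."""
    return sum(field_scope(lo, hi) for lo, hi in ranges)

# Node covering src-port [0, 255] x dst-port [0, 65535]:
node_scope = scope([(0, 255), (0, 65535)])              # 8 + 16 = 24 bits

rule_scopes = [scope([(0, 255), (80, 80)]),             # 8 + 0  = 8
               scope([(0, 15), (0, 65535)])]            # 4 + 16 = 20

# A large gap suggests cutting the node further is productive; a small
# gap flags the cut as likely unproductive, so the compiler can skip it.
scope_factor = node_scope - max(rule_scopes)
```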

08-09-2020 publication date

Data processing unit having hardware-based range encoding and decoding

Number: US0010771090B2
Assignee: Fungible, Inc.

A highly programmable data processing unit includes multiple processing units for processing streams of information, such as network packets or storage packets. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. The data processing unit is configured to retrieve speculative probability values for range coding a plurality of bits with a single read instruction to an on-chip memory that stores a table of probability values. The data processing unit is configured to store state information used for context-coding packets of a data stream so that the state information is available after switching between data streams.

01-03-2016 publication date

Packet extraction optimization in a network processor

Number: US0009276846B2
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. Based on information in the packet, the lookup front-end can optimize start times for sending key requests as a continuous stream with minimal delay. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.

26-01-2017 publication date

METHOD AND APPARATUS FOR VIRTUALIZATION

Number: US20170024159A1
Assignee:

A virtual system on chip (VSoC) is an implementation of a machine that allows for sharing of underlying physical machine resources between different virtual systems. A method or corresponding apparatus of the present invention relates to a device that includes a plurality of virtual systems on chip and a configuring unit. The configuring unit is arranged to configure resources on the device for the plurality of virtual systems on chip as a function of an identification tag assigned to each virtual system on chip.

1. A device comprising: a plurality of virtual systems on chip, each virtual system on chip (VSoC) relating to a subset of a plurality of resources on a single physical chip; a plurality of access control elements on the single physical chip; and a configuring unit arranged to: assign memory subsets of a given memory to each of the plurality of virtual systems on chip; and set each access control element to control whether a given VSoC of the plurality of virtual systems on chip is enabled to access a given at least one location of the given memory.
2. The device of claim 1, wherein the configuring unit is further arranged to: assign a unique identification tag of a plurality of identification tags to each VSoC; assign each memory subset a given identification tag of the plurality of identification tags, the given identification tag assigned to a corresponding VSoC to which the memory subset is assigned; and provide a granularity of memory protection based on a number of the plurality of identification tags assigned.
3. The device of claim 2, wherein the granularity of memory protection provided is further based on a programmed memory size of the given memory.
4. The device of claim 3, wherein: a total element number of the plurality of access control elements is static; the programmed memory size of the given memory is variable; and the granularity of memory protection provided is further based on the programmed memory size of the given memory and the total ...

07-05-2009 publication date

Intelligent graph walking

Number: US20090119399A1
Assignee: Cavium Networks, Inc.

An apparatus, and corresponding method, for performing a search for a match of at least one expression in an input stream is presented. A graph including a number of interconnected nodes is generated. A compiler may assign at least one starting node and at least one ending node. The starting node includes a location table with node position information of an ending node and a sub-string value associated with the ending node. Using the node position information and a string comparison function, intermediate nodes located between the starting and ending nodes may be bypassed. The node bypassing may reduce the number of memory accesses required to read the graph.
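The node-bypass idea can be sketched with a graph whose starting node stores a compressed chain (the sub-string plus the node it ends at), so one string compare replaces the per-node reads across the intermediate nodes. The dictionary layout is an illustrative assumption.

```python
# Sketch: the start node stores the chain's concatenated characters and
# the node the chain ends at; the walker does a single string compare
# instead of reading every intermediate node.

graph = {
    0: {"chain": ("ello", 5)},    # from node 0, "ello" jumps straight to 5
    5: {"arcs": {"!": 6}},
    6: {"arcs": {}},
}

def walk(start, text):
    """Return the node reached after consuming text, or None on mismatch."""
    node, i = start, 0
    while i < len(text):
        entry = graph[node]
        if "chain" in entry:
            sub, end = entry["chain"]
            if text[i:i + len(sub)] != sub:    # one compare, no node reads
                return None
            node, i = end, i + len(sub)
        else:
            nxt = entry["arcs"].get(text[i])
            if nxt is None:
                return None
            node, i = nxt, i + 1
    return node
```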

23-08-2016 publication date

Method and apparatus for processing finite automata

Number: US0009426166B2
Assignee: Cavium, Inc.

A method and corresponding apparatus for run time processing use a Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA) to find the existence of a pattern in a payload. A subpattern may be selected from each pattern in a set of one or more regular expression patterns based on at least one heuristic. The DFA may be generated from selected subpatterns from all patterns in the set, and at least one NFA may be generated for at least one pattern in the set, optimizing run time performance of the run time processing.

05-11-2019 publication date

Engine architecture for processing finite automata

Number: US0010466964B2
Assignee: Cavium, LLC

An engine architecture for processing finite automata includes a hyper non-deterministic automata (HNA) processor specialized for non-deterministic finite automata (NFA) processing. The HNA processor includes a plurality of super-clusters and an HNA scheduler. Each super-cluster includes a plurality of clusters. Each cluster of the plurality of clusters includes a plurality of HNA processing units (HPUs). A corresponding plurality of HPUs of a corresponding plurality of clusters of at least one selected super-cluster is available as a resource pool of HPUs to the HNA scheduler for assignment of at least one HNA instruction to enable acceleration of a match of at least one regular expression pattern in an input stream received from a network.

28-05-2009 publication date

Deterministic finite automata (DFA) graph compression

Number: US20090138494A1
Author: Rajan Goyal
Assignee: Cavium Networks, Inc.

An apparatus, and corresponding method, for generating a graph used in performing a search for a match of at least one expression in an input stream is presented. The graph includes a number of interconnected nodes connected solely by valid arcs. A valid arc of a current node represents a character match in an expression of a character associated with the current node. Arcs which are not valid may be pruned. Non-valid arcs may include arcs which point back to a designated node(s), or arcs that point to the same next node as the designated node(s) for the same character. Typically, the majority of arcs associated with a node are non-valid. Therefore, pruning the non-valid arcs may greatly reduce graph storage requirements.
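Pruning can be sketched over a tiny two-character alphabet: arcs that agree with the designated (root) node's row are dropped, and the walker falls back to the root row whenever an arc is missing. The dense-row representation is an illustrative assumption.

```python
# Sketch: in a dense DFA row most arcs point where the designated (root)
# node would also send that character; those arcs are pruned. Alphabet
# is two characters (0='a', 1='b') to keep the example small.

def compress(dense_rows, designated=0):
    """dense_rows: node -> next-state list, one entry per character."""
    root_row = dense_rows[designated]
    sparse = {}
    for node, row in dense_rows.items():
        sparse[node] = {
            c: t for c, t in enumerate(row)
            # root keeps non-self arcs; others keep arcs differing from root
            if (t != designated if node == designated else t != root_row[c])
        }
    return sparse

def step(sparse, node, c, designated=0):
    if c in sparse[node]:
        return sparse[node][c]
    if node == designated:
        return designated                  # pruned root arcs self-loop
    return step(sparse, designated, c)     # fall back to the root row

dense = {0: [1, 0], 1: [1, 2], 2: [1, 0]}  # dense DFA recognizing "ab"
sparse = compress(dense)                   # 6 stored arcs shrink to 2
```

Here node 2's row duplicates the root's row exactly, so it compresses to nothing, mirroring the abstract's observation that most arcs of a node are non-valid.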

19-05-2020 publication date

Access node integrated circuit for data centers which includes a networking unit, a plurality of host units, processing clusters, a data network fabric, and a control network fabric

Number: US0010659254B2
Assignee: Fungible, Inc.

A highly-programmable access node is described that can be configured and optimized to perform input and output (I/O) tasks, such as storage and retrieval of data to and from storage devices (such as solid state drives), networking, data processing, and the like. For example, the access node may be configured to execute a large number of data I/O processing tasks relative to a number of instructions that are processed. The access node may be highly programmable such that the access node may expose hardware primitives for selecting and programmatically configuring data processing operations. As one example, the access node may be used to provide high-speed connectivity and I/O operations between and on behalf of computing devices and storage components of a network, such as for providing interconnectivity between those devices and a switch fabric of a data center.

26-09-2013 publication date

LOOKUP CLUSTER COMPLEX

Number: US20130250948A1
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.

1. A method of processing a packet comprising: receiving a key request including a key and an identifier (ID), the key including data extracted from a packet; selecting at least one entry in a table indicated by the ID, the entry providing a starting address of a path to a set of rules stored in a memory; processing the entry, based on at least one field of the key, to determine a subset of rules, the subset of rules being a portion of the set of rules; applying the at least one field against the subset of rules; and outputting a response signal indicating whether the at least one field matches at least one rule of the subset of rules.
2. The method of claim 1, wherein the key request includes a key format table index.
3. The method of claim 2, further comprising parsing the key based on the key format table index.
4. The method of claim 1, wherein the set of rules is a portion of rules stored in the memory.
5. The method of claim 1, wherein determining a subset of rules includes determining at least one bucket, the at least one bucket including pointers to the subset of rules.
6. The method of claim 5, wherein the at least one bucket includes a plurality of buckets, and wherein the entry includes a node associated with the plurality of buckets.
7. The method of claim 6, wherein processing the entry includes processing the node to determine the plurality of buckets.
8. The method of claim 6, wherein the node is associated ...

03-09-2020 publication date

ACCESS NODE FOR DATA CENTERS

Number: US20200280462A1
Assignee:

An access node that can be configured and optimized to perform input and output (I/O) tasks, such as storage and retrieval of data to and from network devices (such as solid state drives), networking, data processing, and the like. For example, the access node may be configured to receive data to be processed, wherein the access node includes a plurality of processing cores, a data network fabric, and a control network fabric; receive, over the control network fabric, a work unit message indicating a processing task to be performed a processing core; and process the work unit message, wherein processing the work unit message includes retrieving data associated with the work unit message over the data network fabric.

30-04-2019 publication date

System and method for storing lookup request rules in multiple memories

Number: US0010277510B2

In one embodiment, a system includes a data navigation unit configured to navigate through a data structure stored in a first memory to a first representation of at least one rule. The system further includes at least one rule processing unit configured to a) receive the at least one rule based on the first representation of the at least one rule from a second memory to one of the rule processing unit, and b) processing a key using the at least one rule.

16-08-2016 publication date

Method and apparatus for processing of finite automata

Number: US0009419943B2
Assignee: Cavium, Inc., CAVIUM INC

A method, and corresponding apparatus and system are provided for optimizing matching at least one regular expression pattern in an input stream by walking at least one finite automaton in a speculative manner. The speculative manner may include walking at least two nodes of a given finite automaton, of the at least one finite automaton, in parallel, with a segment, at a given offset within a payload of a packet in the input stream. The walking may include determining a match result for the segment, at the given offset within the payload, at each node of the at least two nodes. The walking may further include determining at least one subsequent action for walking the given finite automaton, based on an aggregation of each match result determined.
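The speculative walk described above — matching the same input segment against several automaton nodes at once and aggregating the per-node results to decide what to do next — can be illustrated with a toy NFA simulator (this is not the patented engine, just the classic multi-active-node idea it builds on).

```python
# Toy NFA: node -> list of (label, target) arcs. Nodes 1 and 2 are walked
# in parallel with the same symbol; accept state is node 3.
NFA = {0: [("a", 1), ("a", 2)], 1: [("b", 3)], 2: [("c", 3)]}  # "ab" or "ac"

def walk_parallel(nfa, active_nodes, symbol):
    next_active = set()
    for node in active_nodes:                 # every active node sees the symbol
        for label, target in nfa.get(node, []):
            if label == symbol:               # per-node match result
                next_active.add(target)
    return next_active                        # aggregated next action

def matches(nfa, text, start=0, accept=3):
    active = {start}
    for ch in text:
        active = walk_parallel(nfa, active, ch)
    return accept in active
```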

01-04-2021 publication date

DATA FLOW GRAPH-DRIVEN ANALYTICS PLATFORM USING DATA PROCESSING UNITS HAVING HARDWARE ACCELERATORS

Number: US20210097108A1
Assignee:

A data flow graph-driven analytics platform is described in which highly-programmable data stream processing devices, referred to generally herein as data processing units (DPUs), operate to provide a scalable, fast and efficient analytics processing architecture. In general, the DPUs are specialized data-centric processors architected for efficiently applying data manipulation operations (e.g., regular expression operations to match patterns, filtering operations, data retrieval, compression/decompression and encryption/decryption) to streams of data units, such as packet flows having network packets, a set of storage packets being retrieved from or written to storage or other data units.

1. A device comprising:
an analytics interface to receive a request specifying at least one analytical operation to be performed on the data;
a query compiler to generate, based on the analytical operation, a data flow graph for configuring at least one data processing unit (DPU) to execute the analytical operation, wherein each of the DPUs comprises an integrated circuit having hardware-based accelerators configured for processing streams of data units, and wherein the data flow graph comprises a data structure having one or more graph nodes connected by one or more directional arcs, each arc representing a stream of data units to be processed or produced by the DPU, and each of the graph nodes represents a set of data stream processing operations to be performed by the DPU to process or produce the data streams; and
a query execution controller configured to communicate the data flow graph to the DPUs to configure the DPUs to perform the analytical operation on the data.
2. The device of claim 1, wherein the query execution controller is configured to:
receive one or more data streams that comprise results from application of the analytical operation to the data by the DPUs,
construct a response that aggregates the results, and
output the response to an analytics tool that issued the ...
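The graph structure described here — nodes as stream operations, arcs as streams of data units — can be sketched with a minimal executor. The node names and operations below are invented for illustration; a real compiler would emit such a graph from an analytical query.

```python
# Minimal data-flow-graph model: arcs carry streams, nodes apply operations.
def run_graph(nodes, arcs, source_streams):
    # nodes: name -> operation over a stream; arcs: (src, dst) pairs in order.
    streams = dict(source_streams)
    for src, dst in arcs:                  # each arc is a stream of data units
        streams[dst] = nodes[dst](streams[src])
    return streams

nodes = {
    "filter": lambda s: [x for x in s if x % 2 == 0],   # filtering operation
    "scale":  lambda s: [x * 10 for x in s],            # transform operation
}
arcs = [("input", "filter"), ("filter", "scale")]
```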

07-02-2017 publication date

Generating a non-deterministic finite automata (NFA) graph for regular expression patterns with advanced features

Number: US0009563399B2
Assignee: Cavium, Inc., CAVIUM INC

In an embodiment, a method of compiling a pattern into a non-deterministic finite automata (NFA) graph includes examining the pattern for a plurality of elements and a plurality of node types. Each node type can correspond with an element. Each element of the pattern can be matched at least zero times. The method further includes generating a plurality of nodes of the NFA graph. Each of the plurality of nodes can be configured to match for one of the plurality of elements. The node can indicate the next node address in the NFA graph, a count value, and/or node type corresponding to the element. The node can also indicate the element representing a character, character class or string. The character can also be a value or a letter.

30-11-2021 publication date

Context value retrieval prior to or parallel with expansion of previous symbol for context-decoding in range decoder

Number: US0011188338B2
Assignee: Fungible, Inc., FUNGIBLE INC

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes examples of retrieving values represented by one or more previous symbols needed for decoding a current symbol before or in parallel with the insertion of the values represented by the one or more previous symbols in the data stream.

21-05-2020 publication date

DATA STRIPING FOR MATCHING TECHNIQUES IN DATA COMPRESSION ACCELERATOR OF A DATA PROCESSING UNIT

Number: US20200162100A1
Assignee:

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes a hardware-based programmable data compression accelerator for the data processing unit including a pipeline for performing string substitution. The disclosed string substitution pipeline, referred to herein as a “search block,” is configured to perform string search and replacement functions to compress an input data stream. In some examples, the search block is a part of a compression process performed by the data compression accelerator. The search block may support single and multi-thread processing, and multiple levels of compression effort. In order to achieve high-throughput, the search block processes multiple input bytes per clock cycle per thread.

1. A method comprising:
storing, by a match block of a search engine of a processing device, a history of an input data stream in a history buffer across two or more memory banks of the history buffer depending on an operational mode of the match block and a size of the history;
receiving, by the match block, one or more history addresses of potential previous occurrences of a current byte string beginning at a current byte position in the input data stream;
determining, by the match block, whether at least one match occurs for the current byte string from among one or more previous occurrences of byte strings stored at the one or more history addresses in the history buffer; and
sending, by the match block and to a subsequent block of the search engine, an indication of whether the at least one match occurs for the current byte string for use in compressing the input data stream based on the match.
2. The method of claim 1, wherein ...
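The match-block step above can be sketched as a plain comparison loop: given candidate history addresses for the current position, compare the byte string at each address with the current byte string and report the longest match found. A flat buffer stands in for the striped memory banks; all names are illustrative.

```python
# Toy match check: returns (match_length, history_address); (0, None) = no match.
def best_match(history, pos, candidates, max_len=32):
    best = (0, None)
    for addr in candidates:                  # history addresses (addr < pos)
        n = 0
        while (n < max_len and pos + n < len(history)
               and history[addr + n] == history[pos + n]):
            n += 1                           # extend the match byte by byte
        if n > best[0]:
            best = (n, addr)
    return best
```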

16-01-2020 publication date

ARC CACHING FOR DETERMINISTIC FINITE AUTOMATA OF REGULAR EXPRESSION ACCELERATOR

Number: US20200021664A1
Assignee:

A DFA engine is described that determines whether a current symbol of a payload matches a label of any effective arcs or negative arcs associated with a current node of a DFA graph that are stored in a cache. Responsive to determining that the current symbol does not match a label of any effective or negative arcs associated with the current node of the DFA graph, the DFA engine determines whether the current symbol matches a label of any arc associated with the current node of the DFA graph that is stored in a memory. Responsive to determining that the current symbol matches a label of a particular arc associated with the current node of the DFA graph that is stored in the memory, the DFA engine stores the particular arc in the cache as a new effective arc and uses the particular arc to evaluate the current symbol.

1. A processing device comprising:
a memory configured to store at least a portion of a deterministic finite automata (DFA) graph;
a cache configured to store at least one of: one or more effective arcs or one or more negative arcs; and
a DFA engine implemented ...
determining whether a current symbol of the payload matches a label of any of the one or more effective arcs or the one or more negative arcs associated with a current node of the DFA graph that are stored in the cache;
responsive to determining that the current symbol does not match a label of any one of the one or more effective arcs or any one of the one or more negative arcs associated with the current node of the DFA graph, determining whether the current symbol matches a label of any arc associated with the current node of the DFA graph that is stored in the memory;
responsive to determining that the current symbol matches a label of a particular arc associated with the current node of the DFA graph that is stored in the memory, storing the particular arc in the cache as a new effective arc of the one or more effective arcs; and
using the particular arc to evaluate the payload.
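The arc-caching policy described above reduces to a classic cache-then-fallback lookup: consult the per-node cache of effective arcs first, fall back to the full graph in memory on a miss, and promote the arc that was found into the cache so the next occurrence of the same symbol hits. A minimal sketch, with an invented two-node graph:

```python
# graph/cache: node -> {symbol label: target node}.
GRAPH = {0: {"a": 1}, 1: {"b": 2}}

def step(graph, cache, node, symbol):
    arcs = cache.get(node, {})
    if symbol in arcs:                              # hit on an effective arc
        return arcs[symbol]
    target = graph.get(node, {}).get(symbol)        # slow path: main memory
    if target is not None:
        cache.setdefault(node, {})[symbol] = target  # promote to the cache
    return target
```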

25-06-2013 publication date

Lookup cluster complex

Number: US0008472452B2

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.

21-05-2020 publication date

HASHING TECHNIQUES IN DATA COMPRESSION ACCELERATOR OF A DATA PROCESSING UNIT

Number: US20200162101A1
Assignee:

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes a hardware-based programmable data compression accelerator for the data processing unit including a pipeline for performing string substitution. The disclosed string substitution pipeline, referred to herein as a “search block,” is configured to perform string search and replacement functions to compress an input data stream. In some examples, the search block is a part of a compression process performed by the data compression accelerator. The search block may support single and multi-thread processing, and multiple levels of compression effort. In order to achieve high-throughput, the search block processes multiple input bytes per clock cycle per thread.

1. A method comprising:
generating, by a hash block of a search engine of a processing device, a hash key from a current byte string beginning at a current byte position in an input data stream to be compressed;
computing, by the hash block, a hash index from the hash key using a hash function;
accessing, by the hash block, a hash bucket of a hash table identified by the hash index;
reading, by the hash block and during the hash table access, one or more history addresses of potential previous occurrences of the current byte string in the input data stream from the hash bucket identified by the hash index, wherein the history addresses comprise byte positions of previous occurrences of byte strings; and
sending, by the hash block and to a subsequent block of the search engine, the one or more history addresses for use in compressing the input data stream based on matches to the current byte string from among the respective previous ...
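The hash-block steps above — hash the current byte string into an index, read candidate history addresses from the addressed bucket, then record the current position for future lookups — can be sketched directly. The hash function and sizes are toy choices, not the accelerator's.

```python
NUM_BUCKETS = 1 << 8

def hash_index(key: bytes) -> int:
    h = 0
    for b in key:
        h = (h * 31 + b) & 0xFFFF          # toy hash function
    return h % NUM_BUCKETS

def lookup_and_insert(table, data, pos, key_len=4):
    key = data[pos:pos + key_len]          # current byte string (the hash key)
    idx = hash_index(key)                  # hash index
    bucket = table.setdefault(idx, [])     # hash bucket
    candidates = list(bucket)              # history addresses read out
    bucket.append(pos)                     # remember this occurrence
    return candidates
```

Note the candidates are only *potential* previous occurrences (hash collisions are possible); a downstream match stage still compares the actual bytes.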

06-07-2021 publication date

Systems and methods for deep learning processor

Number: US0011055063B2
Assignee: Marvell Asia Pte, Ltd., MARVELL ASIA PTE LTD

A hardware-based programmable deep learning processor (DLP) is proposed, wherein the DLP comprises with a plurality of accelerators dedicated for deep learning processing. Specifically, the DLP includes a plurality of tensor engines configured to perform operations for pattern recognition and classification based on a neural network. Each tensor engine includes one or more matrix multiplier (MatrixMul) engines each configured to perform a plurality of dense and/or sparse vector-matrix and matrix-matrix multiplication operations, one or more convolutional network (ConvNet) engines each configured to perform a plurality of efficient convolution operations on sparse or dense matrices, one or more vector floating point units (VectorFPUs) each configured to perform floating point vector operations, and a data engine configured to retrieve and store multi-dimensional data to both on-chip and external memories.

08-05-2012 publication date

Method and apparatus for content based searching

Number: US0008176300B2

The scheduling of multiple requests to be processed by a number of deterministic finite automata-based graph thread engine (DTE) workstations is handled by a novel scheduler. The scheduler may select an entry from an instruction in a content search apparatus. Using attribute information from the selected entry, the scheduler may thereafter analyze a dynamic scheduling table to obtain placement information. The scheduler may determine an assignment of the entry, using the placement information, that may limit cache thrashing and head-of-line blocking occurrences. Each DTE workstation may include normalization capabilities. Additionally, the content searching apparatus may employ an address memory scheme that may prevent memory bottleneck issues.

07-02-2013 publication date

METHOD AND APPARATUS FOR MANAGING TRANSFER OF TRANSPORT OPERATIONS FROM A CLUSTER IN A PROCESSOR

Number: US20130036284A1
Assignee: Cavium, Inc.

A method and corresponding apparatus of managing transport operations between a first memory cluster and one or more other memory clusters, include selecting, at a clock cycle in the first memory cluster, at least one transport operation destined to at least one destination memory cluster, from one or more transport operations, based at least in part on priority information associated with the one or more transport operations or current states of available processing resources allocated to the first memory cluster in each of a subset of the one or more other memory clusters, and initiating the transport of the selected at least one transport operation.

1. A method of managing transport operations between a first memory cluster and one or more other memory clusters, the method comprising:
selecting, at a clock cycle in the first memory cluster, at least one transport operation destined to at least one destination memory cluster, from one or more transport operations, based at least in part on priority information associated with the one or more transport operations or current states of available processing resources allocated to the first cluster in each of a subset of the one or more other memory clusters; and
initiating the transport of the selected at least one transport operation.
2. A method according to claim 1, wherein selecting the at least one transport operation being based at least in part on the current states of the available processing resources allocated to the first memory cluster in each of a subset of the one or more other memory clusters, the method further comprising updating a current state of available processing resources allocated to the first memory cluster, in at least one other memory cluster, corresponding to the selected at least one transport operation.
3. A method according to claim 1, wherein the one or more transport operations include at least one processing thread migration operation.
4. A method according ...

22-08-2013 publication date

Rule Modification in Decision Trees

Number: US20130218853A1
Assignee: CAVIUM, INC.

A system, apparatus, and method are provided for modifying rules in-place atomically from the perspective of an active search process using the rules for packet classification. A rule may be modified in-place by updating a rule's definition to be an intersection of the original and new definitions. The rule's definition may then be further updated to the rule's new definition, and a decision tree may be updated based on the rule's new definition. While a search processor searches for one or more rules that match keys generated from received packets, the in-place rule modification prevents periods of incorrect rule matching of the keys, thereby preventing packet loss and preserving throughput.

1. A method comprising:
receiving an incremental update specifying a new rule definition including a modification of at least one field of a designated rule of a plurality of rules, the plurality of rules being represented by a Rule Compiled Data Structure (RCDS) as a decision tree for packet classification utilized by an active search process, the plurality of rules representing a search space for the packet classification, each rule of the plurality of rules having an original rule definition defining a subset of the search space;
determining an intersection of the new rule definition specified and the original rule definition of the designated rule;
setting the original rule definition of the designated rule to an intermediate rule definition defined by the intersection determined and incorporating the series of one or more updates determined in the RCDS, atomically from the perspective of the active search process; and
setting the intermediate rule definition of the designated rule to the new rule definition and incorporating the series of one or more updates determined in the RCDS, atomically from the perspective of the active search process.
2. The method of wherein setting the original rule definition of the designated rule to the intermediate rule definition defined by the ...
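The two-step update described above can be illustrated with rule fields modeled as integer ranges: first narrow the rule to the intersection of its old and new definitions (any packet matched under either view is still handled consistently), then widen it to the new definition. This is a toy model of the idea, not the RCDS implementation.

```python
# A rule field as an inclusive integer range (lo, hi); None = empty set.
def intersect(a, b):
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def modify_in_place(rule, new_def):
    # Step 1: old -> old ∩ new (safe with respect to searches in flight).
    rule["range"] = intersect(rule["range"], new_def)
    # Step 2: intersection -> new definition.
    rule["range"] = new_def
    return rule
```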

19-04-2022 publication date

Static dictionary-based compression hardware pipeline for data compression accelerator of a data processing unit

Number: US0011309908B2
Assignee: Fungible, Inc.

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data processing functions. This disclosure describes a programmable hardware-based data compression accelerator that includes a pipeline for performing static dictionary-based and dynamic history-based compression on streams of information, such as network packets. The search block may support single and multi-thread processing, and multiple levels of compression effort. To achieve high-compression, the search block may operate at a high level of effort that supports a single thread and use of both a dynamic history of the input data stream and a static dictionary of common words. The static dictionary may be useful in achieving high-compression where the input ...

01-03-2022 publication date

Data ingestion and storage by data processing unit having stream-processing hardware accelerators

Number: US0011263190B2
Assignee: Fungible, Inc.

A system comprises a data processing unit (DPU) integrated circuit having programmable processor cores and hardware-based accelerators configured for processing streams of data units; and software executing on one or more of the processing cores. In response to a request to perform an operation on a set of one or more data tables, each having one or more columns of data arranged in a plurality of rows, the software configures the DPU to: input at least a portion of the rows of each of the database tables as at least one or more streams of data units, process the one or more streams of data units with the hardware-based accelerators to apply one or more of compression, encoding or encryption to produce a resultant stream of data units; and write the resultant stream of data units to a storage in a tree data structure.

23-08-2016 publication date

Method and apparatus for compilation of finite automata

Number: US0009426165B2
Assignee: Cavium, Inc., CAVIUM INC

A method and corresponding apparatus are provided implementing run time processing using Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA) to find the existence of a pattern in a payload. A subpattern may be selected from each pattern in a set of one or more regular expression patterns based on at least one heuristic and a unified deterministic finite automata (DFA) may be generated using the subpatterns selected from all patterns in the set, and at least one non-deterministic finite automata (NFA) may be generated for at least one pattern in the set, optimizing run time performance of the run time processing.

21-05-2020 publication date

HISTORY-BASED COMPRESSION PIPELINE FOR DATA COMPRESSION ACCELERATOR OF A DATA PROCESSING UNIT

Number: US20200159859A1
Assignee:

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes a hardware-based programmable data compression accelerator for the data processing unit including a pipeline for performing string substitution. The disclosed string substitution pipeline, referred to herein as a “search block,” is configured to perform string search and replacement functions to compress an input data stream. In some examples, the search block is a part of a compression process performed by the data compression accelerator. The search block may support single and multi-thread processing, and multiple levels of compression effort. In order to achieve high-throughput, the search block processes multiple input bytes per clock cycle per thread.

1. A method comprising:
receiving, by a search engine implemented as a pipeline of a processing device, an input data stream to be compressed;
identifying, by the search engine, one or more history addresses of potential previous occurrences of a current byte string beginning at a current byte position in the input data stream;
determining, by the search engine, whether at least one match occurs for the current byte string from among one or more previous occurrences of byte strings at the history addresses;
selecting, by the search engine, an output for the current byte position, wherein the output for the current byte position comprises one of a reference to a match for the current byte string or a literal of original data at the current byte position; and
transmitting, by the search engine, the selected output for the current byte position in an output data stream.
2. The method of claim 1, wherein identifying the one or more history ...
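The literal-versus-reference selection described above is the core of LZ-style compression, and a simplified software emitter makes the control flow concrete: at each position, either output a (distance, length) reference to a previous occurrence or a literal byte, then advance past whatever was emitted. The brute-force history scan below stands in for the hash/match pipeline; parameters are illustrative.

```python
# Simplified LZ-style emitter: list of ("ref", distance, length) / ("lit", byte).
def compress(data, min_match=3):
    out, pos = [], 0
    while pos < len(data):
        best_len, best_dist = 0, 0
        for start in range(max(0, pos - 255), pos):   # scan recent history
            n = 0
            while pos + n < len(data) and data[start + n] == data[pos + n]:
                n += 1                                # overlapping copies allowed
            if n > best_len:
                best_len, best_dist = n, pos - start
        if best_len >= min_match:
            out.append(("ref", best_dist, best_len))  # reference to a match
            pos += best_len
        else:
            out.append(("lit", data[pos]))            # literal of original data
            pos += 1
    return out
```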

04-04-2017 publication date

Work migration in a processor

Number: US0009614762B2
Assignee: Cavium, Inc., CAVIUM INC

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. A work product may be migrated between lookup engines to complete the rule matching process. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.

30-04-2015 publication date

Method And Apparatus For Managing Processing Thread Migration Between Clusters Within A Processor

Number: US20150121395A1
Assignee:

A method, and corresponding apparatus, of managing processing thread migrations within a plurality of memory clusters, includes embedding, in memory components of the plurality of memory clusters, instructions indicative of processing thread migrations; storing, in one or more memory components of a particular memory cluster among the plurality of memory clusters, data configured to designate the particular memory cluster as a sink memory cluster, the sink memory cluster preventing an incoming migrated processing thread from migrating out of the sink memory cluster; and processing one or more processing threads, in one or more of the plurality of memory clusters, in accordance with at least one of the embedded migration instructions and the data stored in the one or more memory components of the sink memory cluster.

1. A method of managing processing thread migrations within a plurality of memory clusters, the method comprising:
embedding, in memory components of the plurality of memory clusters, instructions indicative of processing thread migrations, wherein the instructions indicative of processing thread migrations include instructions preventing migrating a processing thread to a memory cluster from which the processing thread migrated previously; and
processing one or more processing threads, in one or more of the plurality of memory clusters, in accordance with at least one of the embedded migration instructions.
2. A method according to claim 1, further comprising:
storing, in one or more memory components of a particular memory cluster among the plurality of memory clusters, data configured to designate the particular memory cluster as a sink memory cluster, the sink memory cluster preventing an incoming migrated processing thread from migrating out of the sink memory cluster.
3. A method according to claim 1, wherein in each of the plurality of memory clusters at least one processing engine is reserved to handle migrating processing threads.
4. A method ...

04-12-2014 publication date

Method and Apparatus for a Virtual System on Chip

Number: US20140359621A1
Assignee:

A virtual system on chip (VSoC) is an implementation of a machine that allows for sharing of underlying physical machine resources between different virtual systems. A method or corresponding apparatus of the present invention relates to a device that includes a plurality of virtual systems on chip and a configuring unit. The configuring unit is arranged to configure resources on the device for the plurality of virtual systems on chip as a function of an identification tag assigned to each virtual system on chip.

1. A device comprising:
a plurality of virtual systems on chip, each virtual system on chip (VSoC) relating to a subset of a plurality of processing cores on a single physical chip; and
a configuring unit arranged to:
assign a unique identification tag of a plurality of identification tags to each VSoC;
assign memory subsets of a given memory to each of the plurality of virtual systems on chip;
assign each memory subset a given identification tag of the plurality of identification tags, the given identification tag assigned to a corresponding VSoC to which the memory subset is assigned; and
provide a granularity of memory protection based on a number of the plurality of identification tags assigned.
2. The device of claim 1, wherein the granularity of memory protection provided is further based on a programmed memory size of the given memory.
3. The device of claim 1, further comprising a plurality of access elements on the single physical chip, wherein the configuring unit is further arranged to set each access element to control whether a given VSoC of the plurality of virtual systems on chip is enabled to access a given at least one location of the given memory.
4. The device of claim 3, wherein:
a total element number of the plurality of access elements is static;
the number of the plurality of identification tags assigned and a programmed memory size of the given memory are variable; and
the granularity of memory protection provided is ...
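The tag-based protection scheme described here can be modeled minimally: each memory region carries the identification tag of the VSoC it was assigned to, and an access is allowed only when the requesting VSoC's tag matches the region's tag. The class and method names are invented for illustration.

```python
# Toy model of identification-tag memory protection between virtual systems.
class MemoryProtection:
    def __init__(self):
        self.region_tag = {}                 # memory region -> VSoC tag

    def assign(self, region, tag):
        self.region_tag[region] = tag        # region belongs to this VSoC

    def check_access(self, region, tag):
        # Access allowed only when the requester's tag matches the region's.
        return self.region_tag.get(region) == tag
```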

03-07-2014 publication date

LOOKUP FRONT END PACKET OUTPUT PROCESSOR

Number: US20140188973A1
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. The lookup front-end further processes the response message and provides a corresponding response to the host.

1. A method of processing a packet comprising:
merging a plurality of sub-tree responses from a processing cluster, the processing cluster performing rule matching for a packet, the plurality of sub-tree responses being responsive to lookup requests associated with the packet; and
outputting a lookup result to a host processor, the lookup result including at least one of the plurality of sub-tree responses based on relative priority of the plurality of sub-tree responses.
2. The method of claim 1, wherein the merging includes selecting one of the sub-tree responses having a highest-priority rule match and eliminating sub-tree responses that are absent the highest-priority rule match, the lookup result including the one of the sub-tree responses.
3. The method of claim 1, wherein the sub-tree responses are a subset of a plurality of responses from the processing cluster, and further comprising:
determining whether the plurality of responses is to be coalesced based on an indicator associated with the packet, the lookup result including a selection of the plurality of responses based on whether the plurality of responses is to be coalesced.
4. The method of claim 3, further comprising updating a table based on the plurality of responses, the table indicating in-process lookup requests at a ...
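The merge step described above — keep the sub-tree response carrying the highest-priority rule match, drop the rest — is a simple reduction. In the sketch below (invented field names; lower number means higher priority), responses without a match never win:

```python
# Merge sub-tree responses into a single lookup result by rule priority.
def merge_responses(responses):
    matched = [r for r in responses if r["match"]]
    if not matched:
        return {"match": False, "rule": None}       # no sub-tree found a rule
    return min(matched, key=lambda r: r["priority"])  # highest-priority match
```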

27-05-2010 publication date

Multiple core Session Initiation Protocol (SIP)

Number: US20100131658A1
Assignee: Cavium Networks, Inc.

A Session Initiation Protocol (SIP) proxy server including a multi-core central processing unit (CPU) is presented. The multi-core CPU includes a receiving core dedicated to pre-SIP message processing. The pre-SIP message processing may include message retrieval, header and payload parsing, and Call-ID hashing. The Call-ID hashing is used to determine the post-SIP processing core designated to process messages between a particular user pair. The pre-SIP and post-SIP configuration allows multiple processing cores to utilize a single control plane, thereby providing an accurate topology of the network for each processing core.
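The Call-ID hashing step can be sketched as follows: the receiving core parses the Call-ID header out of the message and hashes it modulo the core count, so every message of a given dialog lands on the same post-SIP processing core. The header handling is simplified and the hash choice (CRC32) is an illustrative stand-in:

```python
import zlib

NUM_CORES = 4  # illustrative number of post-SIP processing cores

def pick_core(message: str) -> int:
    for line in message.splitlines():
        if line.lower().startswith("call-id:"):
            call_id = line.split(":", 1)[1].strip()
            # Same Call-ID -> same hash -> same post-SIP core.
            return zlib.crc32(call_id.encode()) % NUM_CORES
    raise ValueError("no Call-ID header")
```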

05-05-2020 publication date

ARC caching for deterministic finite automata of regular expression accelerator

Number: US0010645187B2
Assignee: Fungible, Inc., FUNGIBLE INC

A DFA engine is described that determines whether a current symbol of a payload matches a label of any effective arcs or negative arcs associated with a current node of a DFA graph that are stored in a cache. Responsive to determining that the current symbol does not match a label of any effective or negative arcs associated with the current node of the DFA graph, the DFA engine determines whether the current symbol matches a label of any arc associated with the current node of the DFA graph that is stored in a memory. Responsive to determining that the current symbol matches a label of a particular arc associated with the current node of the DFA graph that is stored in the memory, the DFA engine stores the particular arc in the cache as a new effective arc and uses the particular arc to evaluate the current symbol.

Подробнее
28-04-2020 дата публикации

Incremental compilation of finite automata for a regular expression accelerator

Номер: US0010635419B2
Принадлежит: Fungible, Inc., FUNGIBLE INC

A compiler/loader unit for a RegEx accelerator is described that receives a first set of regular expression rules for implementing the RegEx accelerator, generates, based on the first set of regular expression rules, an initial deterministic finite automata (DFA) graph, and generates, an initial memory map for allocating the initial DFA graph to a memory of the RegEx accelerator. The compiler/loader unit receives receive, a second set of one or more new or modified regular expression rules for implementing the RegEx accelerator and in response performs incremental compilation of the second set of regular expressions. The compiler/loader unit generates, based on the second set of one or more regular expression rules, a supplemental DFA graph and reconciles the initial DFA graph with the supplemental DFA graph to generate an updated memory map for allocating the initial DFA graph and the supplemental DFA graph to the memory of the RegEx accelerator.

Подробнее
28-06-2016 дата публикации

Method and apparatus for a virtual system on chip

Номер: US0009378033B2
Принадлежит: Cavium, Inc., CAVIUM INC, CAVIUM, INC.

A virtual system on chip (VSoC) is an implementation of a machine that allows for sharing of underlying physical machine resources between different virtual systems. A method or corresponding apparatus of the present invention relates to a device that includes a plurality of virtual systems on chip and a configuring unit. The configuring unit is arranged to configure resources on the device for the plurality of virtual systems on chip as a function of an identification tag assigned to each virtual system on chip.

Подробнее
28-05-2009 дата публикации

Method and apparatus for traversing a compressed deterministic finite automata (DFA) graph

Номер: US20090138440A1
Автор: Rajan Goyal
Принадлежит:

An apparatus, and corresponding method, for traversing a compressed graph used in performing a search for a match of at least one expression in an input stream is presented. The compressed graph includes a number of interconnected nodes connected solely by valid arcs. A valid arc of a current node represents a character match in an expression of a character associated with the current node. Arcs which are not valid may be pruned. Non-valid arcs may include arcs which point back to a designated node(s), or arcs that point to the same next node as the designated node(s) for the same character. Each valid arc may comprise a next node pointer, a hash function, and a copy of an associated character. The hash function may be used to manage a retrieval process used by a walker traversing the compressed node. The walker may also use a comparison function to verify the correct arc has been retrieved.
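A minimal sketch of the arc-retrieval idea above (the table layout and hash are illustrative, not the patent's encoding): each stored valid arc carries a copy of its character, so after the hash selects a slot, a comparison verifies the correct arc was retrieved.

```python
class CompressedNode:
    """Toy compressed DFA node holding only valid arcs.

    Arcs sit in a small table at index hash(char) % size; each slot
    stores a copy of its character so the walker can verify it
    retrieved the right arc.
    """

    def __init__(self, arcs, size=8):
        self.slots = [None] * size
        for ch, nxt in arcs.items():
            i = ord(ch) % size         # stand-in for the per-node hash function
            assert self.slots[i] is None, "toy hash must be collision-free"
            self.slots[i] = (ch, nxt)  # (copied character, next-node pointer)

    def lookup(self, ch):
        slot = self.slots[ord(ch) % len(self.slots)]
        if slot is not None and slot[0] == ch:   # comparison verifies the arc
            return slot[1]
        return None                              # no valid arc at this node


node = CompressedNode({"a": 1, "c": 2})
assert node.lookup("a") == 1
assert node.lookup("b") is None   # pruned arc: the walker takes its fallback path
```

A `None` result corresponds to a pruned (non-valid) arc; a real walker would then fall back to the designated node rather than fail outright.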

Подробнее
23-02-2016 дата публикации

Processing request keys based on a key size supported by underlying processing elements

Номер: US0009268855B2
Принадлежит: CAVIUM, INC.

A packet classification system, methods, and apparatus are provided for packet classification. A processor of a router coupled to a network processes data packets received from a network. The processor creates a request key using information extracted from a packet. The processor splits the request key into an n number of partial request keys if at least one predetermined criterion is met. The processor also sends a non-final request that includes an i-th partial request key to a corresponding search table of an n number of search tables, wherein i ...

Подробнее
27-02-2018 дата публикации

Finite automata processing based on a top of stack (TOS) memory

Номер: US0009904630B2
Принадлежит: Cavium, Inc., CAVIUM INC

A method, and corresponding apparatus and system are provided for optimizing matching of at least one regular expression pattern in an input stream by storing a context for walking a given node, of a plurality of nodes of a given finite automaton of at least one finite automaton, the store including a store determination, based on context state information associated with a first memory, for accessing the first memory and not a second memory or the first memory and the second memory. Further, to retrieve a pending context, the retrieval may include a retrieve determination, based on the context state information associated with the first memory, for accessing the first memory and not the second memory or the second memory and not the first memory. The first memory may have read and write access times that are faster relative to the second memory.
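The two-memory context store described here can be sketched as a top-of-stack (TOS) entry in fast memory with older entries spilled to a slower memory; the structure and names below are assumptions, not the patent's layout.

```python
class TwoTierStack:
    """Toy two-tier context store: fast TOS entry plus slow spill stack."""

    def __init__(self):
        self.tos = None          # fast first memory (holds one context)
        self.spill = []          # slower second memory
        self.slow_ops = 0        # count of second-memory accesses

    def push(self, ctx):
        if self.tos is None:     # fast path: first memory only
            self.tos = ctx
        else:                    # spill old TOS to second memory
            self.spill.append(self.tos)
            self.slow_ops += 1
            self.tos = ctx

    def pop(self):
        if self.tos is not None:  # fast path: first memory, not second
            ctx, self.tos = self.tos, None
            return ctx
        self.slow_ops += 1        # slow path: second memory, not first
        return self.spill.pop() if self.spill else None


s = TwoTierStack()
s.push(("node7", 12))             # context: (node, payload offset)
assert s.pop() == ("node7", 12) and s.slow_ops == 0
```

The point of the design is visible in `slow_ops`: as long as pushes and pops alternate, the walker touches only the fast memory.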

Подробнее
29-11-2016 дата публикации

System and method to traverse a non-deterministic finite automata (NFA) graph generated for regular expression patterns with advanced features

Номер: US0009507563B2
Принадлежит: Cavium, Inc., CAVIUM INC

In one embodiment, a method of walking a non-deterministic finite automata (NFA) graph representing a pattern includes extracting a node type and an element from a node of the NFA graph. The method further includes matching a segment of a payload for the element by matching the payload for the element at least zero times, the number of times based on the node type.
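The "match the element at least zero times, based on the node type" step can be illustrated with two toy node types (the types and return convention are assumptions, not the patent's node encoding):

```python
def match_node(node_type, element, payload, offset):
    """Return the list of offsets the walker may continue from.

    A "char" node consumes exactly one matching byte; a "star" node
    consumes the element zero or more times, yielding every possible
    continuation offset.
    """
    if node_type == "char":                       # match exactly once
        if offset < len(payload) and payload[offset] == element:
            return [offset + 1]
        return []
    if node_type == "star":                       # match zero or more times
        offsets = [offset]
        while offset < len(payload) and payload[offset] == element:
            offset += 1
            offsets.append(offset)
        return offsets
    raise ValueError(node_type)


assert match_node("char", "a", "abc", 0) == [1]
assert match_node("star", "a", "aab", 0) == [0, 1, 2]
```

Returning every continuation offset is what makes the walk non-deterministic: each offset is a thread the walker may pursue.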

Подробнее
02-07-2015 дата публикации

MULTI-RULE APPROACH TO ENCODING A GROUP OF RULES

Номер: US20150189046A1
Принадлежит:

A multi-rule approach for encoding rules grouped in a rule chunk is provided. The approach includes a multi-rule with a multi-rule header representing headers of the rules and, in some cases, dimensional data representing dimensional data of the rules. The approach further includes disabling dimension matching of always matching dimensions, responding to an always match rule with a match response without matching, interleaving minimum/maximum values in a range field, interleaving value/mask values in a mask field, and for a given rule of rule chunk, encoding a priority field at the end of dimension data stored for the rule in the multi-rule. Advantageously, this approach provides efficient storage of rules and enables the efficient comparison of rules to keys. 1. A method for encoding one or more key matching rules grouped in a chunk , the method comprising:in a rule encoding engine, communicatively coupled to memory and provided with a chunk of key matching rules, building a multi-rule corresponding to the chunk comprising:storing in the memory a multi-rule header of the multi-rule, the multi-rule header representing headers of the key matching rules.2. The method of wherein storing the multi-rule header of the multi-rule further includes storing claim 1 , consecutively claim 1 , a rule validity value for each of the key matching rules of the chunk in which storing a first value for a rule validity value corresponding to a subject key matching rule enables matching of the subject key matching rule and storing a second value different than the first value disables matching of the subject key matching rule.3. The method of wherein storing the rule validity values includes claim 2 , given a key matching rule that always matches claim 2 , storing a rule validity having a third value; and given a key matching rule that never matches claim 2 , storing a rule validity having a fourth value different than the third value.4. The method of wherein storing the multi-rule ...
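One way to pack a dimension's minimum and maximum bounds into a single range field so both are fetched together (the bit layout is a simplification; the patent's interleaving may differ):

```python
def pack_range(lo, hi):
    """Pack two 16-bit bounds into one 32-bit word: lo in the high half."""
    return lo << 16 | hi


def range_matches(word, value):
    """Unpack both bounds from one fetch and test the dimension value."""
    lo, hi = word >> 16, word & 0xFFFF
    return lo <= value <= hi


word = pack_range(1024, 2048)     # e.g. a destination-port range
assert range_matches(word, 1500)
assert not range_matches(word, 80)
```

Keeping both bounds in one aligned word means the matcher compares a key field against a rule dimension with a single memory read.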

Подробнее
03-10-2013 дата публикации

Deterministic Finite Automata Graph Traversal With Nodal Bit Mapping

Номер: US20130262518A1
Автор: Rajan Goyal
Принадлежит:

An apparatus, and corresponding method, for generating a graph used in performing a search for a match of at least one expression in an input stream is presented. The graph includes a number of interconnected nodes connected solely by valid arcs. A valid arc may also include a nodal bit map including structural information of a node to which the valid arc points to. A walker process may utilize the nodal bit map to determine if a memory access is necessary. The nodal bit map reduces the number of external memory access and therefore reduces system run time. 1. A computer implemented method comprising:given a current node and an arc pointing from the current node to a next node, analyzing arcs in the graph to determine which of the arcs are valid arcs pointing from the next node;constructing arc configuration information associated with the arc pointing from the current node to the next node, the arc configuration information representing the valid arcs pointing from the next node; andstoring the arc configuration information representing the valid arcs pointing from the next node in the arc pointing from the current node to the next node to enable the arc configuration information to be evaluated and the valid arcs pointing from the next node to be identified from the evaluation of the arc configuration information without the next node being read.2. The method of wherein the arc configuration information comprises a bit map.3. The method of wherein constructing the arc configuration information associated with the arc pointing from the current node to the next node includes:providing a listing of indicator values, each indicator value being associated with a respective character, each indicator value providing an indication of whether a valid arc associated with a respective character exists in the next node;assigning a negative value to an indicator value if the associated valid arc does not exist in the next node; andassigning a positive value to an indicator ...
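The nodal bit map check can be sketched directly (a 256-bit map per arc is an assumption consistent with one bit per byte value): the walker tests the bit for the next input character and skips the external memory access when the next node cannot consume it.

```python
def make_bitmap(valid_chars):
    """Build a bit map with one bit set per character that has a valid arc."""
    bm = 0
    for ch in valid_chars:
        bm |= 1 << ord(ch)
    return bm


def needs_node_read(bitmap, next_char):
    """True only if the next node can possibly consume next_char."""
    return bool(bitmap >> ord(next_char) & 1)


# Arc into a node whose only valid arcs are for "a" and "b".
bm = make_bitmap("ab")
assert needs_node_read(bm, "a")
assert not needs_node_read(bm, "x")   # walker skips the external memory access
```

Since most input characters miss at most nodes, the bit test replaces the common-case memory read with a register operation.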

Подробнее
09-07-2015 дата публикации

METHOD AND APPARATUS FOR COMPILING SEARCH TREES FOR PROCESSING REQUEST KEYS BASED ON A KEY SIZE SUPPORTED BY UNDERLYING PROCESSING ELEMENTS

Номер: US20150195194A1
Принадлежит: CAVIUM, INC.

A packet classification system, methods, and apparatus are provided for packet classification. A processor of a router coupled to a network compiles at least one search tree based on a rules set. The processor determines an x number of search phases needed to process an incoming key corresponding to the rules set, wherein the rules set includes a plurality of rules, where each of the plurality of rules includes an n number of rule fields and where the incoming key includes an n number of processing fields. The processor generates an x set of search trees, where each of the x set of search trees corresponds to a respective one of the x number of search phases. Also, the processor provides the x set of search trees to a search processor, where each of the x set of search trees is configured to process respective portions of the incoming key. 1. A method , executed by one or more processors , for compiling at least one search tree based on an original rules set , the method comprising:determining an x number of search phases needed to process an incoming key corresponding to the original rules set, wherein the original rules set includes a plurality of rules, where each of the plurality of rules includes an n number of rule fields and where the incoming key includes an n number of processing fields;generating x sets of search trees, where each of the x sets of search trees corresponds to a respective one of the x number of search phases; andproviding the x sets of search trees to a search processor, where each of the x sets of search trees is configured to process respective portions of the incoming key.2. The method of wherein determining the x number of search phases needed includes determining a processing capability of a processing system for processing the incoming key.3. The method of further comprising:partitioning the n rule fields into a plurality of rule field subsets; andassigning each of the plurality of rule field subsets to a respective one of the x ...

Подробнее
02-08-2022 дата публикации

Multimode cryptographic processor

Номер: US0011405179B2
Принадлежит: Fungible, Inc.

This disclosure describes techniques that include performing cryptographic operations (encryption, decryption, generation of a message authentication code). Such techniques may involve the data processing unit performing any of multiple modes of encryption, decryption, and/or other cryptographic operation procedures or standards, including, Advanced Encryption Standard (AES) cryptographic operations. In some examples, the security block is implemented as a unified, multi-threaded, high-throughput encryption and decryption system for performing multiple modes of AES operations.

Подробнее
10-01-2019 дата публикации

ACCESS NODE FOR DATA CENTERS

Номер: US20190013965A1
Принадлежит:

A highly-programmable access node is described that can be configured and optimized to perform input and output (I/O) tasks, such as storage and retrieval of data to and from storage devices (such as solid state drives), networking, data processing, and the like. For example, the access node may be configured to execute a large number of data I/O processing tasks relative to a number of instructions that are processed. The access node may be highly programmable such that the access node may expose hardware primitives for selecting and programmatically configuring data processing operations. As one example, the access node may be used to provide high-speed connectivity and I/O operations between and on behalf of computing devices and storage components of a network, such as for providing interconnectivity between those devices and a switch fabric of a data center. 1. An access node integrated circuit comprising:a networking unit configured to control input and output of data between a network and the access node integrated circuit;one or more host units configured to at least one of control input and output of the data between the access node integrated circuit and one or more application processors or control storage of the data with one or more storage devices;a plurality of processing clusters, each of the processing clusters including two or more programmable processing cores configured to perform processing tasks on the data;a data network fabric interconnecting the plurality of processing clusters, the one or more host units, and the networking unit, wherein the data network fabric is configured to carry the data between the networking unit, the one or more host units, and the plurality of processing clusters; andat least one control network fabric interconnecting the plurality of processing clusters, the one or more host units, and the networking unit, wherein the at least one control network fabric is configured to carry control messages identifying the ...

Подробнее
13-01-2015 дата публикации

Identifying duplication in decision trees

Номер: US0008934488B2
Принадлежит: Cavium, Inc., CAVIUM INC, CAVIUM, INC.

A packet classification system, methods, and corresponding apparatus are provided for enabling packet classification. A processor of a security appliance coupled to a network uses a classifier table having a plurality of rules, the plurality of rules having at least one field, to build a decision tree structure including a plurality of nodes, the plurality of nodes including a subset of the plurality of rules. By identifying duplication in decision trees, the methods may produce wider, shallower trees that result in shorter search times and reduced memory requirements for storing the trees.
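Duplicate detection during tree building can be sketched by keying each candidate child on the set of rules it would contain (the representation is illustrative, not the patent's): children with identical rule subsets share one subtree, giving the wider, shallower shape described above.

```python
def dedup_children(children):
    """Map each child cut to a shared subtree index.

    `children` is a list of frozensets of rule ids, one per child cut.
    Returns (slot index per child, list of unique rule subsets).
    """
    unique = {}
    slots = []
    for rules in children:
        # setdefault assigns the next subtree index on first sight only.
        slots.append(unique.setdefault(rules, len(unique)))
    return slots, list(unique)


slots, subtrees = dedup_children(
    [frozenset({1, 2}), frozenset({3}), frozenset({1, 2})])
assert slots == [0, 1, 0]          # first and third child share a subtree
assert len(subtrees) == 2
```

Hashing the rule subsets keeps the duplicate check near-constant time per child, which is the compiler-speed point the abstract makes.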

Подробнее
02-11-2017 дата публикации

SYSTEMS AND METHODS FOR DEEP LEARNING PROCESSOR

Номер: US20170316312A1
Принадлежит:

A hardware-based programmable deep learning processor (DLP) is proposed, wherein the DLP comprises a plurality of accelerators dedicated for deep learning processing. Specifically, the DLP includes a plurality of tensor engines configured to perform operations for pattern recognition and classification based on a neural network. Each tensor engine includes one or more matrix multiplier (MatrixMul) engines each configured to perform a plurality of dense and/or sparse vector-matrix and matrix-matrix multiplication operations, one or more convolutional network (ConvNet) engines each configured to perform a plurality of efficient convolution operations on sparse or dense matrices, one or more vector floating point units (VectorFPUs) each configured to perform floating point vector operations, and a data engine configured to retrieve and store multi-dimensional data to both on-chip and external memories.

Подробнее
24-11-2015 дата публикации

Scope in decision trees

Номер: US0009195939B1
Принадлежит: Cavium, Inc., CAVIUM INC, CAVIUM, INC.

A root node of a decision tree data structure may cover all values of a search space used for packet classification. The search space may include a plurality of rules, the plurality of rules having at least one field. The decision tree data structure may include a plurality of nodes, the plurality of nodes including a subset of the plurality of rules. Scope in the decision tree data structure may be based on comparing a portion of the search space covered by a node to a portion of the search space covered by the node's rules. Scope in the decision tree data structure may be used to identify whether or not a compilation operation may be unproductive. By identifying an unproductive compilation operation it may be avoided, thereby improving compiler efficiency as the unproductive compilation operation may be time-consuming.
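The scope comparison can be sketched over a single integer dimension (this toy ratio is illustrative, not the compiler's exact metric): compare how much of the search space a node covers against how much its rules actually occupy.

```python
def interval_scope(node_range, rule_ranges):
    """Ratio of the space a node covers to the space its rules occupy.

    A large ratio hints that cutting this node again is unproductive:
    the child nodes would just inherit the same few narrow rules.
    """
    node_size = node_range[1] - node_range[0] + 1
    # Sum of each rule's overlap with the node's interval (rules are
    # assumed non-overlapping for this sketch).
    covered = sum(min(hi, node_range[1]) - max(lo, node_range[0]) + 1
                  for lo, hi in rule_ranges
                  if hi >= node_range[0] and lo <= node_range[1])
    return node_size / covered if covered else float("inf")


# Node covers 0..255 but its rules only touch 0..15.
assert interval_scope((0, 255), [(0, 7), (8, 15)]) == 16.0
```

A compiler can compute this ratio before attempting a cut and skip the expensive cut evaluation when the ratio crosses a threshold.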

Подробнее
07-02-2013 дата публикации

METHOD AND APPARATUS FOR ASSIGNING RESOURCES USED TO MANAGE TRANSPORT OPERATIONS BETWEEN CLUSTERS WITHIN A PROCESSOR

Номер: US20130036288A1
Принадлежит: Cavium, Inc.

A method, and corresponding apparatus, of assigning processing resources used to manage transport operations between a first memory cluster and one or more other memory clusters, includes receiving information indicative of allocation of a subset of processing resources in each of the one or more other memory clusters to the first memory cluster, storing, in the first memory cluster, the information indicative of resources allocated to the first memory cluster, and facilitating management of transport operations between the first memory cluster and the one or more other memory clusters based at least in part on the information indicative of resources allocated to the first memory cluster. 1. A method of managing transport operations between a first memory cluster and one or more other memory clusters, the method comprising: receiving information indicative of allocation of a subset of processing resources in each of the one or more other memory clusters to the first memory cluster; storing, in the first cluster, the information indicative of resources allocated to the first memory cluster; and facilitating management of transport operations between the first memory cluster and the one or more other memory clusters based at least in part on the information indicative of resources allocated to the first memory cluster. 2. A method according to claim 1, wherein processing resources allocated to the first memory cluster are allocated per types of transport operations between the first cluster and the one or more other clusters. 3. A method according to further comprising modifying allocation of processing resources to the first memory cluster based on received information indicative of modified allocation of processing resources in each of the one or more other memory clusters to the first cluster. 4. A method according to claim 3, wherein modifying allocation of processing resources to the first cluster includes reducing or increasing processing resources, in a ...

Подробнее
28-07-2020 дата публикации

Data striping for matching techniques in data compression accelerator of a data processing unit

Номер: US0010727865B2
Принадлежит: Fungible, Inc., FUNGIBLE INC

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes a hardware-based programmable data compression accelerator for the data processing unit including a pipeline for performing string substitution. The disclosed string substitution pipeline, referred to herein as a “search block,” is configured to perform string search and replacement functions to compress an input data stream. In some examples, the search block is a part of a compression process performed by the data compression accelerator. The search block may support single and multi-thread processing, and multiple levels of compression effort. In order to achieve high-throughput, the search block processes ...

Подробнее
02-07-2015 дата публикации

METHOD AND APPARATUS FOR PROCESSING OF FINITE AUTOMATA

Номер: US20150186786A1
Принадлежит: Cavium, Inc.

A method, and corresponding apparatus and system are provided for optimizing matching at least one regular expression pattern in an input stream by walking at least one finite automaton in a speculative manner. The speculative manner may include walking at least two nodes of a given finite automaton, of the at least one finite automaton, in parallel, with a segment, at a given offset within a payload of a packet in the input stream. The walking may include determining a match result for the segment, at the given offset within the payload, at each node of the at least two nodes. The walking may further include determining at least one subsequent action for walking the given finite automaton, based on an aggregation of each match result determined. 1. A security appliance operatively coupled to a network , the security appliance comprising:at least one memory configured to store at least one finite automaton including a plurality of nodes generated from at least one regular expression pattern; walking at least two nodes of a given finite automaton, of the at least one finite automaton, in parallel, with a segment, at a given offset within a payload, of a packet in the input stream;', 'determining a match result for the segment, at the given offset within the payload, at each node of the at least two nodes; and', 'determining at least one subsequent action for walking the given finite automaton, based on an aggregation of each match result determined., 'at least one processor operatively coupled to the at least one memory and configured to walk the at least one finite automaton, with segments of an input stream received via the network, to match the at least one regular expression pattern in the input stream, the walk including2. 
The security appliance of claim 1, wherein the at least one finite automaton includes a deterministic finite automaton (DFA) and at least one non-deterministic finite automaton (NFA), the given finite automaton being a given NFA of ...

Подробнее
06-08-2015 дата публикации

Finite Automata Processing Based on a Top of Stack (TOS) Memory

Номер: US20150220454A1
Принадлежит: Cavium, Inc.

A method, and corresponding apparatus and system are provided for optimizing matching of at least one regular expression pattern in an input stream by storing a context for walking a given node, of a plurality of nodes of a given finite automaton of at least one finite automaton, the store including a store determination, based on context state information associated with a first memory, for accessing the first memory and not a second memory or the first memory and the second memory. Further, to retrieve a pending context, the retrieval may include a retrieve determination, based on the context state information associated with the first memory, for accessing the first memory and not the second memory or the second memory and not the first memory. The first memory may have read and write access times that are faster relative to the second memory. 1. A security appliance operatively coupled to a network, the security appliance comprising: a first memory; a second memory operatively coupled to the first memory; and at least one processor operatively coupled to the first memory and the second memory and configured to store a context for walking a given node, of a plurality of nodes of a given finite automaton of at least one finite automaton, the store including a determination, based on context state information associated with the first memory, for accessing (i) the first memory and not the second memory or (ii) the first memory and the second memory. 2. The security appliance of claim 1, wherein the context identifies the given node and an offset, of a segment in a payload of an input stream received from the network, to enable the at least one processor to walk the given node with the segment, based on retrieving the context stored. 3. The security appliance of claim 1, wherein the context state information associated with the first memory includes a validity state, the validity state indicating a valid or invalid state of the ...

Подробнее
21-01-2016 дата публикации

Reverse NFA Generation And Processing

Номер: US20160021123A1
Принадлежит: Cavium LLC

In a processor of a security appliance, an input of a sequence of characters is walked through a finite automata graph generated for at least one given pattern. At a marked node of the finite automata graph, if a specific type of the at least one given pattern is matched at the marked node, the input sequence of characters is processed through a reverse non-deterministic finite automata (rNFA) graph generated for the specific type of the at least one given pattern by walking the input sequence of characters backwards through the rNFA beginning from an offset of the input sequence of characters associated with the marked node. Generating the rNFA for a given pattern includes inserting processing nodes for processing an input sequence of characters to determine a match for the given pattern. In addition, the rNFA is generated from the given type of pattern.
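The backward walk from the marked offset can be sketched as follows (the rNFA encoding and accept convention are illustrative assumptions): after a forward-side match is reported at `offset`, the payload is replayed in reverse through a graph built for the reversed pattern.

```python
def walk_reverse(rnfa, start_state, payload, offset):
    """Walk payload[offset-1], payload[offset-2], ... through the rNFA.

    `rnfa` maps (state, char) -> next state; the sentinel state
    "accept" marks a complete reverse match. Returns the start offset
    of the match, or -1 if the backward walk dies.
    """
    state = start_state
    for i in range(offset - 1, -1, -1):
        state = rnfa.get((state, payload[i]))
        if state is None:
            return -1                # no transition: backward walk dies
        if state == "accept":
            return i                 # match begins at offset i
    return -1


# rNFA for the reversed pattern "cba" (forward pattern "abc").
rnfa = {(0, "c"): 1, (1, "b"): 2, (2, "a"): "accept"}
assert walk_reverse(rnfa, 0, "xxabc", 5) == 2
assert walk_reverse(rnfa, 0, "xxabz", 5) == -1
```

Running the cheap forward pass first and the reverse walk only at marked nodes keeps the expensive NFA-style processing confined to probable matches.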

Подробнее
06-08-2020 дата публикации

FLEXIBLE RELIABILITY CODING FOR STORAGE ON A NETWORK

Номер: US20200250032A1
Принадлежит:

This disclosure describes a programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets. This disclosure also describes techniques that include enabling data durability coding on a network. In some examples, such techniques may involve storing data in fragments across multiple fault domains in a manner that enables efficient recovery of the data using only a subset of the data. Further, this disclosure describes techniques that include applying a unified approach to implementing a variety of durability coding schemes. In some examples, such techniques may involve implementing each of a plurality of durability coding and/or erasure coding schemes using a common matrix approach, and storing, for each durability and/or erasure coding scheme, an appropriate set of matrix coefficients. 1. A method comprising:receiving, by a data processing unit, a set of data to be stored;identifying, by the data processing unit, a durability scheme from among a plurality of available durability schemes for data reliability;determining, by the data processing unit and based on the identified durability scheme, a coefficient matrix appropriate for the identified durability scheme; andgenerating, by data durability circuitry within the data processing unit and by applying the coefficient matrix to the set of data, parity data.2. The method of claim 1 , further comprising:storing, by the data processing unit, the set of data and the parity data.3. The method of claim 1 , further comprising:caching, by the data processing unit, the coefficient matrix appropriate for the identified durability scheme.4. 
The method of claim 1 , further comprising:receiving, by the data processing unit, a request to access at least a portion of the set of data;identifying, by the data processing unit, available data from the stored set of data and the stored parity data;generating, by the ...
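The unified matrix view can be shown in miniature over GF(2), where multiply-accumulate reduces to XOR (production schemes use larger Galois fields and richer coefficients): each parity fragment is the XOR of the data fragments selected by one row of the scheme's coefficient matrix.

```python
def make_parity(coeff_rows, data_fragments):
    """Generate parity fragments over GF(2).

    `coeff_rows` is the coefficient matrix: one list of 0/1 entries
    per parity fragment, one entry per data fragment.
    """
    parity = []
    for row in coeff_rows:
        acc = bytes(len(data_fragments[0]))
        for c, frag in zip(row, data_fragments):
            if c:                                      # coefficient 1: include fragment
                acc = bytes(a ^ b for a, b in zip(acc, frag))
        parity.append(acc)
    return parity


data = [b"\x0f\x0f", b"\xf0\xf0", b"\xff\x00"]
# One RAID-5-style parity row XORing all three fragments.
p = make_parity([[1, 1, 1]], data)
assert p == [b"\x00\xff"]
# Any single lost data fragment is recoverable: XOR parity with the survivors.
survivors = bytes(x ^ y for x, y in zip(data[0], data[1]))
assert bytes(a ^ b for a, b in zip(p[0], survivors)) == data[2]
```

Swapping in a different coefficient matrix, e.g. Reed-Solomon coefficients over GF(2^8), changes the scheme without changing this code shape, which is the "unified approach" the abstract describes.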

Подробнее
24-05-2011 дата публикации

Method and apparatus for traversing a compressed deterministic finite automata (DFA) graph

Номер: US0007949683B2
Автор: Rajan Goyal, GOYAL RAJAN

An apparatus, and corresponding method, for traversing a compressed graph used in performing a search for a match of at least one expression in an input stream is presented. The compressed graph includes a number of interconnected nodes connected solely by valid arcs. A valid arc of a current node represents a character match in an expression of a character associated with the current node. Arcs which are not valid may be pruned. Non-valid arcs may include arcs which point back to a designated node(s), or arcs that point to the same next node as the designated node(s) for the same character. Each valid arc may comprise a next node pointer, a hash function, and a copy of an associated character. The hash function may be used to manage a retrieval process used by a walker traversing the compressed node. The walker may also use a comparison function to verify the correct arc has been retrieved.

Подробнее
08-12-2015 дата публикации

Duplication in decision trees

Номер: US0009208438B2
Принадлежит: Cavium, Inc., CAVIUM INC, CAVIUM, INC.

A packet classification system, methods, and corresponding apparatus are provided for enabling packet classification. A processor of a security appliance coupled to a network uses a classifier table having a plurality of rules, the plurality of rules having at least one field, to build a decision tree structure for packet classification. Duplication in the decision tree may be identified, producing a wider, shallower decision tree that may result in shorter search times with reduced memory requirements for storing the decision tree. A number of operations needed to identify duplication in the decision tree may be reduced, thereby increasing speed and efficiency of a compiler building the decision tree.

Подробнее
23-09-2021 дата публикации

FINITE AUTOMATA GLOBAL COUNTER IN A DATA FLOW GRAPH-DRIVEN ANALYTICS PLATFORM HAVING ANALYTICS HARDWARE ACCELERATORS

Номер: US20210295181A1
Принадлежит:

System and methods for performing analytical operations are described. A hardware-based regular expression (RegEx) engine performs a regular expression operation on a stream of data units based on a finite automata (FA) graph. Performing includes configuring a regular expression engine of a hardware-based regular expression accelerator to, beginning at a root node in the plurality of nodes of the FA graph, step the regular expression engine through one or more nodes of the FA graph until the regular expression engine arrives at a skip node and to consume, at the skip node, two or more data units from the stream of data units before traversing one of the directional arcs to another node. 1. An integrated circuit, comprising: a memory including a finite automata (FA) graph, wherein the FA graph includes a plurality of nodes connected by directional arcs, each arc representing transitions between nodes of the FA graph based on criteria specified for the respective arc, the plurality of nodes including a skip node; and one or more hardware-based regular expression (RegEx) accelerators connected to the memory, wherein each RegEx accelerator includes a regular expression engine, the regular expression engine configured to receive the FA graph from the memory and to perform a regular expression operation on a stream of data units based on the received FA graph, wherein the regular expression engine is further configured to, on reaching the skip node, consume two or more data units in the stream of data units before traversing one of the directional arcs to another node. 2. The integrated circuit of claim 1, wherein the regular expression engine is a deterministic finite automata (DFA) engine and the received FA graph is a DFA graph, and wherein the regular expression engine is further configured to remain at the skip node consuming data units received from the stream of data units until a counter indicates an Nth data unit has been consumed. 3.
The integrated circuit ...
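The skip-node mechanism claimed above — consuming several data units at a single node rather than one unit per arc — can be sketched in software. This is a hypothetical Python illustration; the graph encoding, the `skip`/`arcs` field names, and the example pattern `ab.{3}c` are invented for the sketch, not taken from the patent.

```python
# Hypothetical software model of a skip node in a finite-automata walk:
# an ordinary node consumes one data unit per arc, while a skip node
# consumes N units at once (useful for ".{N}" runs inside a pattern).

def walk(graph, start, data):
    """Walk `data` through `graph`; return True once a match node is reached."""
    node, i = start, 0
    while i < len(data):
        spec = graph[node]
        if spec.get("skip"):                 # skip node: consume N units at once
            i += spec["skip"]
            node = spec["next"]
            continue
        node = spec["arcs"].get(data[i])     # ordinary node: one unit per arc
        if node is None:
            return False                     # no arc for this data unit
        i += 1
        if graph[node].get("match"):
            return True
    return bool(graph.get(node, {}).get("match"))

# Graph for the pattern "ab.{3}c": literal arcs, a 3-unit skip, a literal arc.
graph = {
    0: {"arcs": {"a": 1}},
    1: {"arcs": {"b": 2}},
    2: {"skip": 3, "next": 3},
    3: {"arcs": {"c": 4}},
    4: {"match": True, "arcs": {}},
}
```

Compressing the `.{3}` run into one skip node is what lets the engine avoid three intermediate states that each consume a single unit.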

Publication date: 14-03-2017

Method and apparatus encoding a rule for a lookup request in a processor

Number: US0009596222B2

In one embodiment, a method includes encoding a key matching rule having at least one dimension by storing in a memory (i) a header of the key matching rule that has at least one header field, and (ii) at least one rule value field of the key matching rule corresponding to one of the dimensions.

Publication date: 27-12-2012

Regex Compiler

Number: US20120331554A1
Assignee:

A method and corresponding apparatus relate to converting a nondeterministic finite automata (NFA) graph for a given set of patterns to a deterministic finite automata (DFA) graph having a number of states. Each of the DFA states is mapped to one or more states of the NFA graph. A hash value of the one or more states of the NFA graph mapped to each DFA state is computed. A DFA states table correlates each of the number of DFA states to the hash value of the one or more states of the NFA graph for the given pattern.

1. A method comprising: in a processor of a security appliance coupled to a network, converting a nondeterministic finite automata (NFA) graph for a given set of patterns to a deterministic finite automata (DFA) graph having a number of states, converting the NFA graph including: mapping each of the number of DFA states to one or more states of the NFA graph; computing a hash value of the one or more states of the NFA graph mapped to each DFA state; and storing a DFA states table correlating each of the number of DFA states to the hash value of the one or more states of the NFA graph for the given pattern.

2. The method of claim 1, wherein the computed hash value is a cryptographic/perfect hash value.

3. The method of claim 1, wherein mapping includes: determining whether there are unmarked DFA states within the number of DFA states; selecting one of the unmarked DFA states; marking the unmarked DFA state; and determining transitions of the one or more states of the NFA graph mapped to the marked DFA state to other NFA states for a character of an alphabet recognized by the NFA graph.

4. The method of claim 3, wherein determining transitions includes mapping the other NFA states and epsilon closures of the other NFA states to a possible new unmarked DFA state.

5. The method of claim 4, wherein mapping the possible new unmarked DFA state to epsilon closures of the other NFA states includes obtaining the epsilon closure of the other NFA states from an epsilon cache, and if the epsilon ...
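The subset construction keyed by hashed NFA state sets can be sketched as follows. This Python sketch is an assumed reading of the claims: epsilon transitions and the epsilon cache are omitted for brevity, and SHA-256 stands in as an example of the "cryptographic/perfect hash value".

```python
# Subset construction with a DFA states table keyed by a hash of each
# NFA state set, so previously seen sets are found in O(1) (sketch).
from hashlib import sha256

def nfa_to_dfa(nfa, start, alphabet):
    """nfa: {state: {symbol: iterable_of_next_states}} -> (dfa, states_table)."""
    def key(states):
        # hash value of the one or more NFA states mapped to a DFA state
        return sha256(",".join(map(str, sorted(states))).encode()).hexdigest()

    start_set = frozenset([start])
    states_table = {key(start_set): start_set}   # DFA states table: hash -> set
    dfa, unmarked = {}, [start_set]
    while unmarked:                              # process each unmarked DFA state
        cur = unmarked.pop()
        dfa[key(cur)] = {}
        for sym in alphabet:
            nxt = frozenset(n for s in cur for n in nfa.get(s, {}).get(sym, ()))
            if not nxt:
                continue
            k = key(nxt)
            if k not in states_table:            # a new, unmarked DFA state
                states_table[k] = nxt
                unmarked.append(nxt)
            dfa[key(cur)][sym] = k
    return dfa, states_table

# Example NFA: state 0 --a--> {0, 1}, state 1 --b--> {2}
dfa, table = nfa_to_dfa({0: {"a": {0, 1}}, 1: {"b": {2}}}, start=0, alphabet="ab")
```

The hash lookup replaces a set-by-set comparison against every existing DFA state, which is the expensive step in naive subset construction.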

Publication date: 07-02-2013

LOOKUP FRONT END PACKET INPUT PROCESSOR

Number: US20130034100A1
Assignee:

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. The lookup front-end further processes the response message and provides a corresponding response to the host. 1. A method of processing a packet comprising:receiving a lookup request including a packet header of a packet and an associated group identifier (GID);generating at least one key based on data of the packet header;comparing the GID against a global definition table to determine at least one table identifier (TID);determining, based on the TID, a subset of processing clusters that are capable of operating rule matching for the packet;selecting one of the processing clusters of the subset based on availability; andforwarding at least one key request to the selected processing cluster, the key request including the at least one key and the at least one TID to initiate rule matching using the key.2. The method of claim 1 , further comprising comparing the GID against a global definition table to determine a packet header index (PHIDX).3. The method of claim 2 , wherein the at least one key is generated according to the PHIDX.4. The method of claim 3 , wherein the PHIDX indexes an entry in a packet header table (PHT) claim 3 , the entry indicating rules for extracting data from the packet header to generate the at least one key.5. The method of claim 1 , further comprising comparing the GID against a global definition table to determine a key format table index (KFTIDX).6. 
The method of claim 5 , wherein the KFTIDX indexes ...
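The dispatch flow in claim 1 — a group identifier (GID) indexes a global definition table, the resulting table identifiers (TIDs) narrow the capable clusters, and one cluster is picked by availability — might look like this in software. All table contents and field names here are invented placeholders.

```python
# Front-end dispatch sketch: GID -> TIDs -> capable clusters -> least loaded.
GLOBAL_DEF_TABLE = {7: {"tids": [1, 3]}}        # GID -> table identifiers
CLUSTERS_BY_TID = {1: {0, 2}, 3: {2, 5}}        # TID -> clusters holding that table
CLUSTER_LOAD = {0: 4, 2: 1, 5: 9}               # outstanding requests per cluster

def dispatch(gid, key):
    tids = GLOBAL_DEF_TABLE[gid]["tids"]
    # clusters capable of rule matching for every required table
    capable = set.intersection(*(CLUSTERS_BY_TID[t] for t in tids))
    cluster = min(capable, key=CLUSTER_LOAD.get)  # select by availability
    return {"cluster": cluster, "key": key, "tids": tids}

request = dispatch(7, b"\x0a\x00\x00\x01")
```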

Publication date: 07-02-2013

LOOKUP CLUSTER COMPLEX

Number: US20130034106A1
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. 1. A method of processing a packet comprising:receiving a key request including a key and a table identifier (TID), the key including data extracted from a packet;parsing the key to extract at least one field;selecting at least one entry in a tree access table indicated by the TID, the entry providing a starting address of a path to a set of rules stored in a memory;processing the entry, based on the at least one field, to determine at least one bucket, the at least one bucket including pointers to a subset of rules, the subset of rules being a portion of the set of rules;retrieving the subset of rules from the memory;applying the at least one field against the subset of rules; andoutputting a response signal indicating whether the at least one field matches at least one rule of the subset of rules.2. The method of claim 1 , wherein the key request includes a key format table index.3. The method of claim 2 , wherein parsing the key is based on the key format table index.4. The method of claim 1 , wherein the set of rules is a portion of rules stored in the memory.5. The method of claim 1 , wherein the at least one bucket includes a plurality of buckets.6. The method of claim 5 , wherein the entry includes a node associated with the plurality of buckets.7. The method of claim 6 , wherein processing the entry includes processing the node to determine the plurality of buckets.8. 
The method of claim 6 , wherein the node is associated with the ...
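The per-key flow of claim 1 — walk a tree from a starting entry, land in a bucket of rule pointers, retrieve only that subset of rules, and match the key fields against them — can be sketched as below. The rule format (protocol/port pairs) and table layout are invented for the example.

```python
# Tree -> bucket -> rule-subset lookup (sketch).
RULES = {10: ("tcp", 80), 11: ("tcp", 443), 12: ("udp", 53)}   # rule memory
BUCKETS = {0: [10, 11], 1: [12]}                               # bucket -> rule pointers

def lookup(key, tree):
    node = tree
    while "bucket" not in node:            # walk the tree on extracted key fields
        node = node[key[node["field"]]]
    for ptr in BUCKETS[node["bucket"]]:    # retrieve only this subset of rules
        proto, port = RULES[ptr]
        if key["proto"] == proto and key["port"] == port:
            return ptr                     # response: match found
    return None                            # response: no match

tree = {"field": "proto", "tcp": {"bucket": 0}, "udp": {"bucket": 1}}
```

The point of the bucket indirection is that the matcher only fetches the handful of rules the tree walk selects, not the whole rule set.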

Publication date: 07-02-2013

System and Method for Storing Lookup Request Rules in Multiple Memories

Number: US20130036083A1
Assignee: Cavium, Inc.

In one embodiment, a system includes a data navigation unit configured to navigate through a data structure stored in a first memory to a first representation of at least one rule. The system further includes at least one rule processing unit configured to a) receive the at least one rule based on the first representation of the at least one rule from a second memory to one of the rule processing unit, and b) processing a key using the at least one rule. 1. A system comprising:a data navigation unit configured to navigate through a data structure stored in a first memory to a first representation of at least one rule; andat least one rule processing unit configured to a) receive the at least one rule based on the first representation of the at least one rule from a second memory to one of the rule processing unit, and b) processing a key using the at least one rule.2. The system of claim 1 , wherein the first memory is an on-chip memory and the second memory is an external memory.3. The system of claim 1 , wherein the representation of the at least one rule is a pointer to an address in the second memory.4. The system of claim 1 , wherein the rule processing units are further configured to load the at least one rule from the first memory or the second memory.5. The system of claim 1 , wherein a first of the rule processing units is configured to load the at least one rule from the first memory and a second of the rule processing units is configured to load the at least one rule from the external memory.6. The system of claim 1 , wherein the rule processing units are further configured to (i) load the at least one rule from the first memory if the representation is a pointer to a bucket in the first memory claim 1 , and (ii) load the at least one rule from the second memory if the representation is a pointer to the at least one rule in the second memory.7. 
The system of claim 6 , wherein the rule processing units are further configured to (iii) load the at least one ...

Publication date: 07-02-2013

INCREMENTAL UPDATE

Number: US20130036102A1
Assignee:

A system, apparatus, and method are provided for adding, deleting, and modifying rules in one update from the perspective of an active search process for packet classification. While a search processor searches for one or more rules that match keys generated from received packets, there is a need to add, delete, or modify rules. By adding, deleting, and modifying rules in one update from the perspective of an active search process for packet classification, performance and functionality of the active search process may be maintained, thereby preventing packet loss and preserving throughput. 1. A method comprising:receiving an incremental update for a Rule Compiled Data Structure (RCDS), the Rule Compiled Data Structure (RCDS) representing a set of rules for packet classification, the Rule Compiled Data Structure (RCDS) being utilized for packet classification by an active search process; andatomically updating the Rule Compiled Data Structure (RCDS) based on the incremental update received, the Rule Compiled Data Structure (RCDS) being atomically updated from the perspective of the active search process utilizing the RCDS.2. The method of claim 1 , wherein atomically updating the Rule Compiled Data Structure (RCDS) includes:restricting a state of the Rule Compiled Data Structure (RCDS) to a before state and an after state, the before state being a state of the Rule Compiled Data Structure (RCDS) before receiving the incremental update for the RCDS, the after state being a state of the Rule Compiled Data Structure (RCDS) after a series of one or more modifications to the Rule Compiled Data Structure (RCDS) has been completed, the series of one or more modifications having been completed based on the incremental update received, the series of one or more modifications being visible to the active search process based on performing one update to the Rule Compiled Data Structure (RCDS) being searched.3. The method of claim 1 , wherein updating the Rule Compiled Data ...
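The "before state / after state" restriction described here is essentially publish-by-swap: build the modified structure off to the side, then make every change visible in one update. The sketch below is an assumed software analogue (in CPython, rebinding a single attribute is atomic with respect to the GIL); it is not the patent's hardware mechanism.

```python
# Atomic incremental update from the perspective of an active search (sketch).
import copy

class RCDS:
    """Rule Compiled Data Structure with atomically published updates."""
    def __init__(self, rules):
        self._active = rules                  # the structure being searched

    def search(self, key):
        return key in self._active            # sees before OR after state only

    def incremental_update(self, add=(), delete=()):
        shadow = copy.deepcopy(self._active)  # copy of the before state
        for rule in delete:
            shadow.discard(rule)
        for rule in add:
            shadow.add(rule)
        self._active = shadow                 # one update publishes all changes

rcds = RCDS({"rule_a", "rule_b"})
rcds.incremental_update(add={"rule_c"}, delete={"rule_b"})
```

A concurrent `search` never observes the half-modified shadow copy, so adds, deletes, and modifies land as one update without stalling classification.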

Publication date: 07-02-2013

WORK MIGRATION IN A PROCESSOR

Number: US20130036151A1
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. A work product may be migrated between lookup engines to complete the rule matching process. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.

1. An apparatus for processing a packet comprising: a plurality of clusters, each cluster including a plurality of processors for processing lookup requests and a local memory storing a set of rules; and a front-end configured to forward the lookup requests to the plurality of clusters and receive response messages from the plurality of clusters; each of the plurality of clusters being configured to: receive a key request including a key and a table identifier (TID), the key including data extracted from a packet; generate a work product associated with the key request, the work product corresponding to a process of rule-matching at least one field of the key; determine whether to forward the work product to another of the plurality of clusters; and, based on the determination, forward the work product to another of the plurality of clusters.

2. The apparatus of claim 1, wherein the front-end includes a table storing information on the rules at each of the plurality of clusters.

3. The apparatus of claim 2, wherein the front-end is configured to forward the key request and a key identifier (KID) corresponding to the key to one of the plurality of clusters based on the table.

4. The apparatus of claim 2, wherein the front-end is configured to forward the key to a subset of the plurality of clusters based on the table.

5. The apparatus of ...

Publication date: 07-02-2013

LOOKUP FRONT END PACKET OUTPUT PROCESSOR

Number: US20130036152A1
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. The lookup front-end further processes the response message and provides a corresponding response to the host. 1. A method of processing a packet comprising:receiving a plurality of responses from a processing cluster, the processing cluster performing rule matching for a packet, the plurality of responses being responsive to lookup requests associated with the packet;determining whether the plurality of responses is to be coalesced based on a coalesce bit associated with the packet; andoutputting a lookup result to a host processor, the lookup result including a selection of the plurality of responses based on whether the plurality of responses is to be coalesced.2. The method of claim 1 , further comprising updating a lookup response table (LRT) based on the plurality of responses claim 1 , the LRT indicating in-process lookup requests at a plurality of processing clusters.3. The method of claim 2 , wherein the coalesce bit is stored at the LRT.4. The method of claim 1 , further comprising forwarding the plurality of responses to a transmit buffer.5. The method of claim 4 , further comprising configuring a slot of the transmit buffer for placement of the plurality of responses prior to receipt of the plurality of responses.6. The method of claim 5 , wherein the selection of the plurality of responses is placed into the slot.7. 
The method of claim 5 , wherein the slot is configured having a predetermined order relative to other ...

Publication date: 07-02-2013

METHOD AND APPARATUS FOR MANAGING TRANSPORT OPERATIONS TO A CLUSTER WITHIN A PROCESSOR

Number: US20130036185A1
Assignee: Cavium, Inc.

A method and corresponding apparatus of managing transport operations between a first memory cluster and one or more other memory clusters, include receiving, in the first cluster, information related to one or more transport operations with related data buffered in an interface device, the interface device coupling the first cluster to the one or more other clusters, selecting at least one transport operation, from the one or more transport operations, based at least in part on the received information, and executing the selected at least one transport operation. 1. A method of managing transport operations between a first memory cluster and one or more other memory clusters , the method comprising:receiving, in the first memory cluster, information related to one or more transport operations with related data buffered in an interface device, the interface device coupling the first memory cluster to the one or more other memory clusters;selecting at least one transport operation, from the one or more transport operations, based at least in part on the received information; andexecuting the selected at least one transport operation.2. A method according to claim 1 , wherein the received information includes:information indicative of one or more types of the one or more transport operations; orinformation indicative of one or more source memory clusters associated with the one or more transport operations.3. A method according to claim 1 , wherein the one or more transport operations include at least one processing thread migration operation.4. A method according to claim 3 , wherein a processing thread migration operation claim 3 , among the at least one processing thread migration operation claim 3 , is executable in two or more clock cycles.5. A method according to claim 3 , wherein a processing thread includes a tree search thread or bucket search thread.6. A method according to claim 3 , wherein a processing thread migration operation claim 3 , among the at ...

Publication date: 07-02-2013

On-Chip Memory (OCM) Physical Bank Parallelism

Number: US20130036274A1
Assignee: Cavium LLC

According to an example embodiment, a processor is provided including an integrated on-chip memory device component. The on-chip memory device component includes a plurality of memory banks, and multiple logical ports, each logical port coupled to one or more of the plurality of memory banks, enabling access to multiple memory banks, among the plurality of memory banks, per clock cycle, each memory bank accessible by a single logical port per clock cycle and each logical port accessing a single memory bank per clock cycle.

Publication date: 07-02-2013

METHOD AND APPARATUS FOR MANAGING PROCESSING THREAD MIGRATION BETWEEN CLUSTERS WITHIN A PROCESSOR

Number: US20130036285A1
Assignee: Cavium, Inc.

A method, and corresponding apparatus, of managing processing thread migrations within a plurality of memory clusters, includes embedding, in memory components of the plurality of memory clusters, instructions indicative of processing thread migrations; storing, in one or more memory components of a particular memory cluster among the plurality of memory clusters, data configured to designate the particular memory cluster as a sink memory cluster, the sink memory cluster preventing an incoming migrated processing thread from migrating out of the sink memory cluster; and processing one or more processing threads, in one or more of the plurality of memory clusters, in accordance with at least one of the embedded migration instructions and the data stored in the one or more memory components of the sink memory cluster. 1. A method of managing processing thread migrations within a plurality of memory clusters , the method comprising:embedding, in memory components of the plurality of memory clusters, instructions indicative of processing thread migrations;storing, in one or more memory components of a particular memory cluster among the plurality of memory clusters, data configured to designate the particular memory cluster as a sink memory cluster, the sink memory cluster preventing an incoming migrated processing thread from migrating out of the sink memory cluster; andprocessing one or more processing threads, in one or more of the plurality of memory clusters, in accordance with at least one of the embedded migration instructions and the data stored in the one or more memory components of the sink memory cluster.2. A method according to claim 1 , wherein the one or more processing threads include at least one a tree search thread.3. A method according to claim 1 , wherein the one or more processing threads include at least one bucket search thread.4. A method according to claim 1 , wherein the instructions indicative of processing thread migrations include ...

Publication date: 07-02-2013

System and Method for Rule Matching in a Processor

Number: US20130036471A1
Assignee: Cavium, Inc.

In one embodiment, a system includes a format block configured to receive a key, at least one rule, and rule formatting information. The rule can have one or more dimensions. The format block can be further configured to extract each of the dimensions from the at least one rule. The system can further include a plurality of dimension matching engines (DME). Each DME can be configured to receive the key and a corresponding formatted dimension, and process the key and the corresponding dimension for returning a match or nomatch. The system can further include a post processing block configured to analyze the matches or nomatches returned from the DMEs and return a response based on the returned matches or nomatches.

1. A system comprising: a format block configured to (a) receive a key, at least one rule, and rule formatting information, the rule having at least one dimension, and (b) extract each of the dimensions from the at least one rule; a plurality of dimension matching engines (DME), each DME configured to receive the key and a corresponding formatted dimension, and process the key and the corresponding dimension for returning a match or nomatch; and a post processing block configured to analyze the matches or nomatches returned from the DMEs and return a response based on the returned matches or nomatches.

2. The system of claim 1, wherein the format block includes: a start block configured to find rule starts, mark invalid or deactivated rules, and pre-calculate terms of the dimensions; a middle block configured to remove marked rules, extract rule format from headers, and extract priority from headers; a tween block configured to calculate rule header end position information and rule dimension end position information; and a finish block configured to calculate control for the DMEs.

3. The system of claim 1, wherein the DMEs are further configured to match the key to at least one of a range indicated by a minimum and maximum in the corresponding dimension, an ...
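The per-dimension matching described here can be sketched as follows. The range and mask encodings are assumptions (a prefix match can be modeled as a mask whose set bits cover the prefix), and, mirroring the post-processing block, a rule matches only when every dimension engine reports a match.

```python
# Dimension matching engine (DME) sketch: one key field vs one rule dimension.
def dme_match(dim, value):
    kind = dim["kind"]
    if kind == "exact":
        return value == dim["value"]
    if kind == "range":                      # minimum <= value <= maximum
        return dim["min"] <= value <= dim["max"]
    if kind == "mask":                       # bits selected by the mask must agree
        return (value & dim["mask"]) == (dim["value"] & dim["mask"])
    raise ValueError(f"unknown match kind: {kind}")

def rule_match(rule, key):
    # post-processing: a match only if all dimensions return a match
    return all(dme_match(dim, field) for dim, field in zip(rule, key))

rule = [
    {"kind": "exact", "value": 6},                               # protocol == 6
    {"kind": "range", "min": 1024, "max": 65535},                # ephemeral port
    {"kind": "mask", "value": 0x0A000000, "mask": 0xFF000000},   # 10.0.0.0/8
]
```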

Publication date: 07-02-2013

Method and Apparatus Encoding a Rule for a Lookup Request in a Processor

Number: US20130036477A1
Assignee: Cavium, Inc.

In one embodiment, a method includes encoding a key matching rule having at least one dimension by storing in a memory (i) a header of the key matching rule that has at least one header field, and (ii) at least one rule value field of the key matching rule corresponding to one of the dimensions. 1. A method comprising:encoding a key matching rule having at least one dimension by storing in a memory (i) a header of the key matching rule having at least one header field, and (ii) at least one rule value field of the key matching rule corresponding to one of the dimensions.2. The method of claim 1 , wherein storing the header of the key matching rule further includes at least one of:(a) storing a length of the key matching rule;(b) storing a match type of the key matching rule corresponding to one of the dimensions;(c) storing an enable value corresponding to the one of the dimensions, wherein the one of the dimensions is enabled if the enable value has a first value and the one of the dimensions is disabled if the enable value has a second value, wherein disabling matching of the one dimension masks the one dimension;(d) storing a rule validity value corresponding to the key matching rule, wherein the key matching rule is enabled if the rule validity value has a first value and the key matching rule is disabled if the rule validity value has a second value; and(e) storing a priority value corresponding to the key matching rule, wherein the priority value indicates a priority of the key matching rule compared to a plurality of key matching rules.3. The method of claim 2 , wherein disabling matching of the one of the dimensions further disabling storage of the at least one rule value field of the key corresponding to the one of the dimensions.4. The method of claim 2 , wherein the match type field includes an indication of at least one of a prefix match claim 2 , an exact match claim 2 , a mask match claim 2 , and a range match claim 2 , wherein the prefix match is ...

Publication date: 07-03-2013

Identifying Duplication in Decision Trees

Number: US20130060727A1
Assignee: Cavium, Inc.

A packet classification system, methods, and corresponding apparatus are provided for enabling packet classification. A processor of a security appliance coupled to a network uses a classifier table having a plurality of rules, the plurality of rules having at least one field, to build a decision tree structure including a plurality of nodes, the plurality of nodes including a subset of the plurality of rules. By identifying duplication in decision trees, the methods may produce wider, shallower trees that result in shorter search times and reduced memory requirements for storing the trees. 1. A method comprising:using a classifier table having a plurality of rules, the plurality of rules having at least one field, building a decision tree structure including a plurality of nodes, each node representing a subset of the plurality of rules;grouping rules based on whether or not rules overlap;assigning priority values to the plurality of rules including assigning unique priority values within each group of overlapping rules and enabling non-overlapping rules to have a same priority value; andstoring the decision tree structure including storing the plurality of rules and the priority value assigned.2. An apparatus comprising:a memory;a processor coupled to the memory, the processor configured to:use a classifier table having a plurality of rules, the plurality of rules having at least one field;build a decision tree structure including a plurality of nodes, each node representing a subset of the plurality of rules;group rules based on whether or not rules overlap;assign priority values to the plurality of rules including assigning unique priority values within each group of overlapping rules and enabling non-overlapping rules to have a same priority value; andstore the decision tree structure in the memory including storing the plurality of rules and the priority value assigned.3. A non-transient computer-readable medium having encoded thereon a sequence of ...
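The priority scheme above — unique priority values only within groups of overlapping rules, a shared value for rules that overlap nothing — can be sketched with rules reduced to 1-D integer ranges (a simplification for illustration; real classifier rules span several fields).

```python
# Assign priorities: unique among overlapping rules, shared otherwise (sketch).
def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]      # closed intervals intersect

def assign_priorities(rules):
    priorities = []
    for i, rule in enumerate(rules):
        taken = [priorities[j] for j in range(i) if overlaps(rules[j], rule)]
        # next unique value within this rule's overlap group; 0 if no overlap
        priorities.append(max(taken) + 1 if taken else 0)
    return priorities

# Three rules: the first two overlap, the third overlaps neither.
prios = assign_priorities([(0, 10), (5, 15), (20, 30)])
```

Letting non-overlapping rules share a priority value keeps the priority field narrow, which is one way the resulting trees stay compact.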

Publication date: 04-04-2013

Decision Tree Level Merging

Number: US20130085978A1
Assignee: Cavium, Inc.

A packet classification system, methods, and corresponding apparatus are provided for enabling packet classification. A processor of a security appliance coupled to a network uses a classifier table having a plurality of rules, the plurality of rules having at least one field, to build a decision tree structure including a plurality of nodes, the plurality of nodes including a subset of the plurality of rules. By merging levels of decision trees, the methods may produce wider, shallower trees that result in shorter search times and reduced memory requirements for storing the trees. 1. A method comprising:using a classifier table having a plurality of rules, the plurality of rules having at least one field, building a decision tree structure including a plurality of nodes, each node representing a subset of the plurality of rules;determining for each level of the decision tree whether to merge grandchildren of a parent node with child nodes of the parent node based on a merge resulting in a total number of child nodes of the parent node not being more than a given threshold;merging at each level the grandchildren of the parent node with child nodes of the parent node based on the determination; andstoring the decision tree structure.2. The method of wherein merging at each level includes merging cuts of each child node into cuts of the parent node resulting in new child nodes of the parent node.3. The method of further wherein each parent node has been cut on a first field set of one or more fields of the at least one field resulting in the child nodes of the parent node claim 2 , and each child node has been cut on a second field set of one or more fields of the at least one field resulting in the grandchildren of the parent node claim 2 , and further wherein merging cuts of each child node into cuts of the parent node includes: (i) a first set of one or more bits of the respective field, the first set of one or more bits being a first set of all bits of the ...
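The merge criterion in claim 1 — adopt a parent's grandchildren as its direct children only if the resulting child count stays within a given threshold — can be sketched like this (node layout invented; leaves without children are kept as-is).

```python
# Decision-tree level merge under a child-count threshold (sketch).
def merge_level(parent, threshold):
    grandchildren = [g for child in parent["children"]
                     for g in child.get("children", [child])]
    if len(grandchildren) <= threshold:       # merge only within the threshold
        parent["children"] = grandchildren    # tree gets wider and shallower
    return parent

tree = {"children": [{"children": [{"id": 1}, {"id": 2}]},
                     {"children": [{"id": 3}, {"id": 4}]}]}
merged = merge_level(tree, threshold=4)
```

Each merge removes one level from the search path at the cost of a wider fan-out, which is the depth-for-width trade the abstract describes.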

Publication date: 25-04-2013

SYSTEM AND METHOD TO REDUCE MEMORY ACCESS LATENCIES USING SELECTIVE REPLICATION ACROSS MULTIPLE MEMORY PORTS

Number: US20130103904A1
Assignee: Cavium, Inc.

In one embodiment, a system comprises multiple memory ports distributed into multiple subsets, each subset identified by a subset index and each memory port having an individual wait time. The system further comprises a first address hashing unit configured to receive a read request including a virtual memory address associated with a replication factor, and referring to graph data. The first address hashing unit translates the replication factor into a corresponding subset index based on the virtual memory address, and converts the virtual memory address to a hardware based memory address that refers to graph data in the memory ports within a subset indicated by the corresponding subset index. The system further comprises a memory replication controller configured to direct read requests to the hardware based address to the one of the memory ports within the subset indicated by the corresponding subset index with a lowest individual wait time. 1. A system comprising:a plurality of memory ports, the plurality of memory ports distributed into a plurality of subsets, each subset identified by a subset index, each of the plurality of memory ports having an individual wait time based on a respective workload;a first address hashing unit configured to receive a read request including a virtual memory address, the virtual memory address associated with a replication factor, the virtual memory address referring to graph data, translate the replication factor into a corresponding subset index based on the virtual memory address, and convert the virtual memory address to a hardware based memory address, the hardware based address referring to graph data in at least one of the memory ports within a subset indicated by the corresponding subset index;a memory replication controller configured to direct read requests to the hardware based address to the one of the plurality of memory ports within the subset indicated by the corresponding subset index with a lowest individual ...
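The controller behavior in the abstract — map a virtual address and replication factor to a subset of ports holding copies of the graph data, then read from the replica whose port currently has the lowest wait time — can be sketched as below; the subset-index derivation (a simple modulus) is invented for the example.

```python
# Memory replication controller sketch: pick the least-busy replica port.
def pick_port(vaddr, replication, ports, wait_times):
    n_subsets = len(ports) // replication
    subset_index = vaddr % n_subsets                  # vaddr -> subset index
    subset = ports[subset_index * replication:(subset_index + 1) * replication]
    return min(subset, key=wait_times.get)            # lowest individual wait time

ports = [0, 1, 2, 3]                  # replication factor 2 -> two subsets of 2
wait_times = {0: 5, 1: 2, 2: 7, 3: 1}
```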

Publication date: 25-04-2013

SYSTEM AND METHOD TO PROVIDE NON-COHERENT ACCESS TO A COHERENT MEMORY SYSTEM

Number: US20130103909A1
Assignee: Cavium, Inc.

In one embodiment, a system comprises a memory and a memory controller that provides a cache access path to the memory and a bypass-cache access path to the memory, receives requests to read graph data from the memory on the bypass-cache access path and receives requests to read non-graph data from the memory on the cache access path. A method comprises receiving a request at a memory controller to read graph data from a memory on a bypass-cache access path, receiving a request at the memory controller to read non-graph data from the memory through a cache access path, and arbitrating, in the memory controller, among the requests using arbitration. 1. A system comprising:a memory;a memory controller providing a cache access path to the memory and a bypass-cache access path to the memory, the memory controller receiving requests to read graph data from the memory on the bypass-cache access path and receiving requests to read non-graph data from the memory on the cache access path.2. The system of where the cache access path receives requests to read graph data and non-graph data from the memory.3. The system of where the non-graph data includes packet data.4. The system of where the memory stores graph data and non-graph data.5. The system of where the memory controller reads the requested graph data or non-graph data.6. The system of where the memory controller receives the requests to read graph data from a co-processor.7. The system of where the co-processor includes at least one of a deterministic automata processing unit claim 6 , a nondeterministic automata processing unit claim 6 , and a hyper-finite automata processing unit.8. The system of where the co-processor is configured to stop sending read requests to the memory controller to stop the reading of selected graph data from the memory on the bypass-cache access path when the selected graph data is being written to the memory on the cache access path.9. 
The system of where the memory controller receives ...
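The dual-path arrangement described above can be sketched in a few lines. The class name, the dict-backed store, and the fully-associative cache below are illustrative stand-ins assumed for the sketch, not the patented controller design:

```python
# Minimal sketch of a memory controller offering two read paths: a cache
# access path for non-graph (e.g. packet) data and a bypass-cache path for
# graph data. Names and structures are illustrative.

class MemoryController:
    def __init__(self, memory):
        self.memory = memory          # backing store: addr -> value
        self.cache = {}               # simple fully-associative cache
        self.cache_hits = 0

    def read_cached(self, addr):
        """Cache access path: used for non-graph data (may serve graph data too)."""
        if addr in self.cache:
            self.cache_hits += 1
            return self.cache[addr]
        value = self.memory[addr]
        self.cache[addr] = value      # allocate into the cache on miss
        return value

    def read_bypass(self, addr):
        """Bypass-cache access path: graph reads go straight to memory."""
        return self.memory[addr]

    def write_cached(self, addr, value):
        """Writes go through the cache path; the cache copy stays coherent."""
        self.memory[addr] = value
        self.cache[addr] = value


mc = MemoryController({0x10: "graph-node", 0x20: "packet-data"})
graph = mc.read_bypass(0x10)      # graph data: no cache allocation
packet = mc.read_cached(0x20)     # non-graph data: allocated into cache
```

The point of the split is visible in the cache contents afterwards: only the non-graph read leaves a cache footprint, so streaming graph walks cannot evict packet data.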

Подробнее
23-05-2013 дата публикации

REVERSE NFA GENERATION AND PROCESSING

Номер: US20130133064A1
Принадлежит: Cavium, Inc.

In a processor of a security appliance, an input of a sequence of characters is walked through a finite automata graph generated for at least one given pattern. At a marked node of the finite automata graph, if a specific type of the at least one given pattern is matched at the marked node, the input sequence of characters is processed through a reverse non-deterministic finite automata (rNFA) graph generated for the specific type of the at least one given pattern by walking the input sequence of characters backwards through the rNFA beginning from an offset of the input sequence of characters associated with the marked node. Generating the rNFA for a given pattern includes inserting processing nodes for processing an input sequence of patterns to determine a match for the given pattern. In addition, the rNFA is generated from the given type of pattern. 1. A method comprising:in a processor of a security appliance coupled to a network:walking an input of a sequence of characters through a finite automata graph generated for at least one given pattern; and 'if a specific type of the at least one given pattern is matched at the marked node, processing the input sequence of characters through a reverse non-deterministic finite automata (rNFA) graph generated for the specific type of the at least one given pattern by walking the input sequence of characters backwards through the rNFA beginning from an offset of the input sequence of characters associated with the marked node.', 'at a marked node of the finite automata graph2. The method of further comprising:at the marked node, reporting a match of the at least one given pattern if the at least one given pattern is not the specific type of the at least one given pattern.3. The method of wherein the specific type of the at least one given pattern includes at least one of the following: start offset claim 1 , back reference claim 1 , capture group claim 1 , and assertion.4. 
The method of further comprising determining the ...
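The backward walk from the marked-node offset can be illustrated with literal patterns, for which the "rNFA" degenerates to the reversed character sequence. Function names and the literal-only restriction are assumptions of this sketch:

```python
# Minimal sketch of the reverse walk: once a marked node of the forward
# automaton reports a candidate match ending at some offset, the input is
# walked *backwards* through an automaton built for the reversed pattern
# to recover the full match. Literal patterns only; illustrative.

def build_reverse_nfa(pattern):
    # For a literal pattern the reverse automaton is just the reversed chars.
    return list(reversed(pattern))

def reverse_walk(payload, end_offset, rnfa):
    """Walk payload backwards from end_offset; return match start or None."""
    pos = end_offset
    for ch in rnfa:
        pos -= 1
        if pos < 0 or payload[pos] != ch:
            return None               # backward walk failed: no match here
    return pos                        # start offset of the full match

payload = "xxabcdyy"
rnfa = build_reverse_nfa("abcd")
# Suppose the forward graph marked a match ending at offset 6 ('d' consumed).
start = reverse_walk(payload, 6, rnfa)
```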

Подробнее
05-09-2013 дата публикации

DUPLICATION IN DECISION TREES

Номер: US20130232104A1
Принадлежит: Cavium, Inc.

A packet classification system, apparatus, and corresponding apparatus are provided for enabling packet classification. A processor of a security appliance coupled to a network uses a classifier table having a plurality of rules, the plurality of rules having at least one field, to build a decision tree structure for packet classification. Duplication in the decision tree may be identified, producing a wider, shallower decision tree that may result in shorter search times with reduced memory requirements for storing the decision tree. A number of operations needed to identify duplication in the decision tree may be reduced, thereby increasing speed and efficiency of a compiler building the decision tree. 1. A method comprising:building a decision tree structure representing a plurality of rules using a classifier table having the plurality of rules, the plurality of rules having at least one field;including a plurality of nodes in the decision tree structure, each node representing a subset of the plurality of rules, each node having a leaf node type or a non-leaf node type;linking each node having the leaf node type to a bucket, each node having the leaf node type being a leaf node, the bucket representing the subset of the plurality of rules represented by the leaf node;cutting each node having the non-leaf node type on one or more selected bits of a selected one or more fields of the at least one field creating one or more child nodes having the non-leaf node type or the leaf node type, each node cut being a parent node of the one or more child nodes created, the one or more child nodes created representing one or more rules of the parent node;identifying duplication in the decision tree structure;modifying the decision tree structure based on the identified duplication; andstoring the modified decision tree structure.2. 
The method of wherein identifying duplication includes claim 1 , for each leaf node:computing a hash value based on each rule and a total number ...
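The hash-then-compare duplication check described in the claims can be sketched as follows; the tuple encoding of rules and the bucket numbering are illustrative assumptions:

```python
# Minimal sketch of duplicate detection among leaf nodes: each leaf's rule
# subset is hashed (rules plus rule count), leaves with equal hashes are
# compared exactly, and duplicate leaves are linked to a single shared
# bucket, reducing memory for the stored decision tree.

from collections import defaultdict

def dedupe_leaves(leaves):
    """leaves: list of tuples of rule ids. Returns (leaf -> bucket id, buckets)."""
    by_hash = defaultdict(list)
    for idx, rules in enumerate(leaves):
        key = hash((tuple(sorted(rules)), len(rules)))   # cheap filter first
        by_hash[key].append(idx)
    buckets, assignment = [], {}
    for group in by_hash.values():
        for idx in group:
            rules = tuple(sorted(leaves[idx]))
            for b_id, b_rules in enumerate(buckets):
                if b_rules == rules:          # exact compare after hash filter
                    assignment[idx] = b_id
                    break
            else:
                buckets.append(rules)
                assignment[idx] = len(buckets) - 1
    return assignment, buckets

leaves = [(1, 2, 3), (3, 2, 1), (4, 5)]       # first two are duplicates
assignment, buckets = dedupe_leaves(leaves)
```

Hashing first means the expensive exact comparison runs only within hash-equal groups, which is the compile-speed gain the abstract refers to.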

Подробнее
12-09-2013 дата публикации

PHASED BUCKET PRE-FETCH IN A NETWORK PROCESSOR

Номер: US20130239193A1
Принадлежит: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. Based on a prefetch status, a selection of the subset of rules are retrieved for rule matching. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. 1. An apparatus for processing a packet comprising: receive a key request including a key and a table identifier (TID), the key including data extracted from a packet;', 'parse the key to extract at least one field;', 'select at least one entry in a tree access table indicated by the TID, the entry providing a starting address of a set of rules stored in a memory; and', 'process the entry, based on the at least one field, to determine at least one bucket having a prefetch status and an ordered set of bucket entries, the bucket entries including pointers to respective subsets of rules, the subsets of rules each being a portion of the set of rules;, 'a tree walk engine (TWE) configured toa bucket-walk engine (BWE) configured to retrieve a selection of the subsets of rules from the memory, the selection corresponding to a configuration of the prefetch status;a rule-matching engine (RME) configured to apply the at least one field against each subset of rules of the selection and output a response signal indicating whether the at least one field matches at least one rule of the subsets of rules.2. The apparatus of claim 1 , wherein the RME is configured to apply the at least one field against each subset of rules independent of an order of the respective bucket entries.3. 
The apparatus of claim 1 , wherein the RME is configured to apply the at least ...
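The three-stage lookup (tree walk, bucket walk with prefetch, rule match) can be compressed into one function once the tree walk has already resolved the bucket. The dict layout, the interval rules, and the field names below are illustrative assumptions:

```python
# Minimal sketch of the lookup pipeline: a bucket holds an ordered set of
# entries (pointers to rule subsets) and a prefetch status; the bucket-walk
# stage retrieves only the prefetched selection, and the rule-match stage
# applies the key field against each fetched rule. Illustrative encoding.

def lookup(key_field, bucket, rules_memory):
    # Bucket-walk: retrieve the selection of subsets given by prefetch status.
    selection = bucket["entries"][: bucket["prefetch"]]
    fetched = [rules_memory[ptr] for ptr in selection]
    # Rule-match: apply the field against every rule of the fetched subsets.
    for subset in fetched:
        for lo, hi in subset:
            if lo <= key_field <= hi:
                return True          # response: match found
    return False                     # response: no match

rules_memory = {
    "sub0": [(0, 9)],                # rule: field in [0, 9]
    "sub1": [(100, 200)],            # rule: field in [100, 200]
}
bucket = {"prefetch": 2, "entries": ["sub0", "sub1"]}
```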

Подробнее
09-01-2014 дата публикации

System And Method To Reduce Memory Access Latencies Using Selective Replication Across Multiple Memory Ports

Номер: US20140013061A1
Принадлежит: Cavium, Inc.

In one embodiment, a system comprises a plurality of memory ports. The memory ports are distributed into a plurality of subsets, where each subset is identified by a subset index. The system further comprises a first address hashing unit configured to receive a request including at least one virtual memory address. Each virtual memory address is associated with a replication factor, and the virtual memory address refers to graph data. The first address hashing unit translates the replication factor into a corresponding subset index based on the virtual memory address, and converts the virtual memory address to a hardware based memory address. The hardware based address refers to data in the memory ports within a subset indicated by the corresponding subset index. 1. A system comprising:a plurality of memory ports, the plurality of memory ports distributed into a plurality of subsets, each subset identified by a subset index;a first address hashing unit configured to receive a request including at least one virtual memory address, each virtual memory address associated with a replication factor, the virtual memory address referring to data, translate the replication factor into a corresponding subset index based on the virtual memory address, and convert the virtual memory address to a hardware based memory address, the hardware based address referring to data in at least one of the memory ports within a subset indicated by the corresponding subset index;a memory replication controller configured to direct requests to the hardware based address to the plurality of memory ports within the subset indicated by the corresponding subset index.2. The system of claim 1 , further comprising a second address hashing unit configured to receive a request for data including at least one virtual memory address claim 1 , and to convert each virtual memory address to a hardware based memory address claim 1 , the hardware based memory address referring to a separate one of the ...
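The translation from (virtual address, replication factor) to (subset index, port) can be sketched with a power-of-two port count. The modulo-based hash and the 8-port layout are assumptions of this sketch, not the patented hashing scheme:

```python
# Minimal sketch of the address hashing: with N memory ports, a replication
# factor r means the referenced data is replicated across a subset of 2**r
# ports; a hash of the virtual address picks the subset and which replica
# port inside it serves the read, spreading load across replicas.

NUM_PORTS = 8

def hash_address(vaddr, replication_factor):
    subset_size = 2 ** replication_factor        # ports holding a replica
    num_subsets = NUM_PORTS // subset_size
    subset_index = vaddr % num_subsets           # which subset of ports
    replica = (vaddr // num_subsets) % subset_size
    port = subset_index * subset_size + replica  # hardware port for the read
    return subset_index, port

subset, port = hash_address(vaddr=21, replication_factor=1)
```

Raising the replication factor shrinks the number of subsets but widens each one, trading memory capacity for more ports that can serve the same hot graph data concurrently.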

Подробнее
27-02-2014 дата публикации

Multiple Core Session Initiation Protocol (SIP)

Номер: US20140059241A1
Принадлежит: Cavium, Inc.

A Session Initiation Protocol (SIP) proxy server including a multi-core central processing unit (CPU) is presented. The multi-core CPU includes a receiving core dedicated to pre-SIP message processing. The pre-SIP message processing may include message retrieval, header and payload parsing, and Call-ID hashing. The Call-ID hashing is used to determine a post-SIP processing core designated to process messages between particular user pair. The pre-SIP and post-SIP configuration allows for the use of multiple processing cores to utilize a single control plane, thereby providing an accurate topology of the network for each processing core. 1. A method for processing protocol messages , the method comprising:given a plurality of different media sessions between users, for each of the media sessions, analyzing protocol messages at a receiving processor unit in order to determine an associated call identifier, wherein the call identifier uniquely identifies protocol messages associated with one of the media sessions, wherein for each media session, corresponding protocol messages are identified by a single respective call identifier;directing the protocol messages, based on the call identifier, to respective processing units from a plurality of processing units, the processing units configured to perform processing of the protocol messages.2. The method of further comprising processing the protocol messages in respective processing units in a parallel manner claim 1 , wherein for each media session claim 1 , all the corresponding protocol messages are directed to a single respective processing unit.3. The method of wherein each processing unit operates within a single control plane.4. The method of wherein the single control plane is a private branch exchange (PBX) claim 3 , a SIP Media Gateway processor claim 3 , or a SIP Proxy server.5. 
The method of further comprising hashing the call identifier with a hash function at the receiving processor unit in order to determine ...
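The pre-SIP dispatch step amounts to extracting the Call-ID header and hashing it to a stable core index, so every message of a media session reaches the same post-SIP core. The MD5 choice and the 4-core count are illustrative assumptions:

```python
# Minimal sketch of Call-ID hashing at the receiving core: parse the
# Call-ID out of a raw SIP message and hash it to pick the processing
# core, so all messages of one session go to one core. Illustrative.

import hashlib

NUM_CORES = 4

def parse_call_id(raw_message):
    for line in raw_message.splitlines():
        if line.lower().startswith("call-id:"):
            return line.split(":", 1)[1].strip()
    return None

def dispatch_core(call_id):
    digest = hashlib.md5(call_id.encode()).digest()
    return digest[0] % NUM_CORES     # stable core index for this session

msg = "INVITE sip:bob@example.com SIP/2.0\r\nCall-ID: a84b4c76e66710\r\n"
core = dispatch_core(parse_call_id(msg))
```

Because the hash is deterministic, retransmissions and in-dialog requests with the same Call-ID always land on the same core, which is what lets each core keep its session state private.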

Подробнее
04-01-2018 дата публикации

ENGINE ARCHITECTURE FOR PROCESSING FINITE AUTOMATA

Номер: US20180004483A1
Принадлежит:

An engine architecture for processing finite automata includes a hyper non-deterministic automata (HNA) processor specialized for non-deterministic finite automata (NFA) processing. The HNA processor includes a plurality of super-clusters and an HNA scheduler. Each super-cluster includes a plurality of clusters. Each cluster of the plurality of clusters includes a plurality of HNA processing units (HPUs). A corresponding plurality of HPUs of a corresponding plurality of clusters of at least one selected super-cluster is available as a resource pool of HPUs to the HNA scheduler for assignment of at least one HNA instruction to enable acceleration of a match of at least one regular expression pattern in an input stream received from a network. 1. An apparatus comprising:at least one hyper non-deterministic automata (HNA) processor specialized for non-deterministic finite automata (NFA) processing, the at least one HNA processor including:a plurality of clusters, each cluster of the plurality of clusters including a plurality of HNA processing units (HPUs);an HNA on-chip instruction queue configured to store at least one HNA instruction; andan HNA scheduler, the HNA scheduler configured to select a given HPU of the plurality of HPUs of the plurality of clusters and assign the at least one HNA instruction to the given HPU selected in order to initiate matching at least one regular expression pattern in an input stream received from a network.2. The apparatus of claim 1 , wherein the apparatus further comprises a plurality of super-clusters claim 1 , each super-cluster including a corresponding plurality of clusters of the plurality of clusters and a super-cluster graph memory exclusive to a corresponding super-cluster claim 1 , the super-cluster graph memory accessible to a corresponding plurality of HPUs of the corresponding plurality of clusters of the corresponding super-cluster and configured to store a subset of nodes of at least one per-pattern NFA statically ...

Подробнее
10-01-2019 дата публикации

DATA PROCESSING UNIT FOR COMPUTE NODES AND STORAGE NODES

Номер: US20190012278A1
Принадлежит:

A new processing architecture is described in which a data processing unit (DPU) is utilized within a device. Unlike conventional compute models that are centered around a central processing unit (CPU), example implementations described herein leverage a DPU that is specially designed and optimized for a data-centric computing model in which the data processing tasks are centered around, and the primary responsibility of, the DPU. For example, various data processing tasks, such as networking, security, and storage, as well as related work acceleration, distribution and scheduling, and other such tasks are the domain of the DPU. The DPU may be viewed as a highly programmable, high-performance input/output (I/O) and data-processing hub designed to aggregate and process network and storage I/O to and from multiple other components and/or devices. This frees resources of the CPU, if present, for computing-intensive tasks. 1. A device comprising:one or more storage devices; and a networking unit configured to control input and output of data between the data processing unit and a network,', 'a plurality of programmable processing cores configured to perform processing tasks on the data, and', 'one or more host units configured to at least one of control input and output of the data between the data processing unit and one or more application processors or control storage of the data with the storage devices., 'a data processing unit communicatively coupled to the storage devices, the data processing unit comprising2. The device of claim 1 , wherein at least one of the application processors comprises a central processing unit (CPU) claim 1 , and wherein the data processing unit is positioned between and communicatively coupled to the CPU and the storage devices claim 1 , wherein the data processing unit is configured to retrieve the data from the storage devices on behalf of the CPU claim 1 , store the data to the storage devices on behalf of the CPU claim 1 , and ...

Подробнее
10-01-2019 дата публикации

Data processing unit for stream processing

Номер: US20190012350A1
Принадлежит: Fungible Inc

A new processing architecture is described that utilizes a data processing unit (DPU). Unlike conventional compute models that are centered around a central processing unit (CPU), the DPU that is designed for a data-centric computing model in which the data processing tasks are centered around the DPU. The DPU may be viewed as a highly programmable, high-performance I/O and data-processing hub designed to aggregate and process network and storage I/O to and from other devices. The DPU comprises a network interface to connect to a network, one or more host interfaces to connect to one or more application processors or storage devices, and a multi-core processor with two or more processing cores executing a run-to-completion data plane operating system and one or more processing cores executing a multi-tasking control plane operating system. The data plane operating system is configured to support software functions for performing the data processing tasks.

Подробнее
16-01-2020 дата публикации

DETERMINISTIC FINITE AUTOMATA NODE CONSTRUCTION AND MEMORY MAPPING FOR REGULAR EXPRESSION ACCELERATOR

Номер: US20200019339A1
Принадлежит:

An example processing device includes a memory including a deterministic finite automata (DFA) buffer configured to store at least a portion of a DFA graph, the DFA graph comprising a plurality of nodes, each of the nodes having zero or more arcs each including a respective label and pointing to a respective subsequent node of the plurality of nodes, at least one of the plurality of nodes comprising a match node, wherein the at least portion of the DFA graph comprises one or more slots of a memory slice, the one or more slots comprising data representing one or more of the arcs for at least one node of the plurality of nodes, and a DFA engine implemented in circuitry, the DFA engine comprising one or more DFA threads implemented in circuitry and configured to evaluate a payload relative to the DFA graph. 1. A processing device comprising:a memory including a deterministic finite automata (DFA) buffer configured to store at least a portion of a DFA graph, the DFA graph comprising a plurality of nodes, each of the nodes having zero or more arcs each including a respective label and pointing to a respective subsequent node of the plurality of nodes, at least one of the plurality of nodes comprising a match node, wherein the at least portion of the DFA graph comprises one or more slots of a memory slice, the one or more slots comprising data representing one or more of the arcs for at least one node of the plurality of nodes; and a current node memory storing a value representing a current node of the plurality of nodes in the DFA graph; and', 'a payload offset memory storing a value representing a position of a current symbol in a sequence of symbols of payload data,, 'a DFA engine implemented in circuitry, the DFA engine comprising one or more DFA threads implemented in circuitry, each of the DFA threads comprising determine a label of one of the arcs of the current node that matches the current symbol;', 'update the value of the current node memory to a value representative
of ...
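A DFA thread's inner loop (current-node memory, payload-offset memory, arc following, match reporting) can be sketched directly; the dict-of-dicts graph below stands in for the slotted memory-slice encoding and is an assumption of the sketch:

```python
# Minimal sketch of a DFA thread: keep a current node and a payload offset,
# follow the arc whose label matches the current symbol, and record every
# offset at which a match node is reached. Illustrative graph encoding.

def run_dfa(graph, match_nodes, start, payload):
    """Return list of payload offsets at which a match node was reached."""
    node, matches = start, []
    for offset, symbol in enumerate(payload):
        node = graph[node].get(symbol)   # follow the arc labelled `symbol`
        if node is None:
            break                        # no arc for symbol: thread stops
        if node in match_nodes:
            matches.append(offset)
    return matches

# DFA for the literal "ab": 0 -a-> 1 -b-> 2 (match node)
graph = {0: {"a": 1}, 1: {"b": 2}, 2: {}}
hits = run_dfa(graph, match_nodes={2}, start=0, payload="ab")
```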

Подробнее
16-01-2020 дата публикации

INSTRUCTION-BASED NON-DETERMINISTIC FINITE STATE AUTOMATA ACCELERATOR

Номер: US20200019404A1
Принадлежит:

An example processing device includes a memory including a non-deterministic finite automata (NFA) buffer configured to store a plurality of instructions defining an ordered sequence of instructions of at least a portion of an NFA graph, the portion of the NFA graph comprising a plurality of nodes arranged along a plurality of paths. The NFA engine determines a current symbol and one or more subsequent symbols of a payload segment that satisfy a match condition specified by a subset of instructions of the plurality of instructions for a path of the plurality of paths and in response to determining the current symbol and the one or more subsequent symbols of the payload segment that satisfy the match condition, outputs an indication that the payload data has resulted in a match. 1. A processing device comprising:a memory including a non-deterministic finite automata (NFA) buffer configured to store a plurality of instructions defining an ordered sequence of instructions of at least a portion of an NFA graph, the portion of the NFA graph comprising a plurality of nodes arranged along a plurality of paths; and a program counter storing a value defining a next instruction of the plurality of instructions; and', determine the current symbol and one or more subsequent symbols of the payload segment that satisfy a match condition specified by a subset of instructions of the plurality of instructions for a path of the plurality of paths, the subset of instructions comprising the next instruction and one or more subsequent instructions of the plurality of instructions; and', 'in response to determining the current symbol and the one or more subsequent symbols of the payload segment that satisfy the match condition, output an indication that the payload data has resulted in a match., 'a payload offset memory storing a value defining a position of a current symbol in an ordered sequence of symbols of a payload segment of payload data, the NFA engine further comprising a ...
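The instruction-driven execution model (program counter plus payload offset) can be sketched with a tiny illustrative instruction set; the CHAR/MATCH opcodes below are assumptions of the sketch, not the patented instruction encoding:

```python
# Minimal sketch of instruction-based matching: the NFA portion is an
# ordered sequence of instructions, a program counter selects the next
# instruction, and a payload offset tracks the current symbol; when the
# instruction subset along a path is satisfied, a match is reported.

def run_instructions(program, payload):
    pc, offset = 0, 0                 # program counter, payload offset
    while pc < len(program):
        op = program[pc]
        if op[0] == "CHAR":           # consume one symbol if it matches
            if offset >= len(payload) or payload[offset] != op[1]:
                return False          # match condition not satisfied
            offset += 1
            pc += 1
        elif op[0] == "MATCH":        # path satisfied: report the match
            return True
    return False

# Program for the path matching the literal "ab".
program = [("CHAR", "a"), ("CHAR", "b"), ("MATCH",)]
matched = run_instructions(program, "abc")
```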

Подробнее
05-03-2015 дата публикации

Generating a Non-Deterministic Finite Automata (NFA) Graph for Regular Expression Patterns with Advanced Features

Номер: US20150066927A1
Принадлежит: Cavium LLC

In an embodiment, a method of compiling a pattern into a non-deterministic finite automata (NFA) graph includes examining the pattern for a plurality of elements and a plurality of node types. Each node type can correspond with an element. Each element of the pattern can be matched at least zero times. The method further includes generating a plurality of nodes of the NFA graph. Each of the plurality of nodes can be configured to match for one of the plurality of elements. The node can indicate the next node address in the NFA graph, a count value, and/or node type corresponding to the element. The node can also indicate the element representing a character, character class or string. The character can also be a value or a letter.

Подробнее
05-03-2015 дата публикации

Traversal With Arc Configuration Information

Номер: US20150066991A1
Автор: Goyal Rajan
Принадлежит:

An apparatus, and corresponding method, for generating a graph used in performing a search for a match of at least one expression in an input stream is presented. The graph includes a number of interconnected nodes connected solely by valid arcs. A valid arc may also include a nodal bit map including structural information of a node to which the valid arc points to. A walker process may utilize the nodal bit map to determine if a memory access is necessary. The nodal bit map reduces the number of external memory access and therefore reduces system run time. 1. A computer implemented method comprising:given a current node and an arc pointing from the current node to a next node, analyzing arcs in a data structure to determine which of the arcs are valid arcs pointing from the next node;constructing arc configuration information associated with the next node, the arc configuration information representing the valid arcs pointing from the next node; andstoring the arc configuration information associated with the next node, enabling the arc configuration information to be evaluated and the valid arcs pointing from the next node to be identified from the evaluation of the arc configuration information without the next node being read.2. The method of claim 1 , wherein storing the arc configuration information includes storing the arc configuration information in the arc pointing from the current node to the next node.3. The method of claim 1 , wherein storing the arc configuration information includes storing the arc configuration information in a table accessed via an identifier of the next node.4. The method of claim 1 , wherein the data structure is an automata graph.5. The method of claim 1 , wherein the arc configuration information comprises a bit map.6. The method of claim 1 , wherein constructing the arc configuration information associated with the next node includes:providing a listing of indicator values, each indicator value being associated with a ...
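The nodal bit map's payoff, testing the next character against per-arc configuration information before touching node memory, can be sketched as follows; the 256-bit integer encoding and the read counter are illustrative assumptions:

```python
# Minimal sketch of the nodal bit map: each arc carries a bit map of which
# characters have valid arcs out of the node it points to, so the walker
# tests the next input character against the map and skips the (external)
# memory read entirely when the bit is clear. Illustrative encoding.

def make_bitmap(valid_chars):
    bm = 0
    for c in valid_chars:
        bm |= 1 << ord(c)             # one bit per possible character
    return bm

def arc_is_valid(bitmap, ch):
    """True if the next node has a valid arc for ch (node read needed)."""
    return bool(bitmap >> ord(ch) & 1)

memory_reads = 0

def walk_step(node_store, node_id, bitmap, ch):
    global memory_reads
    if not arc_is_valid(bitmap, ch):
        return None                   # fail without touching node memory
    memory_reads += 1
    return node_store[node_id][ch]    # only now read the node

node_store = {7: {"a": 8, "b": 9}}
bm = make_bitmap("ab")
hit = walk_step(node_store, 7, bm, "a")
miss = walk_step(node_store, 7, bm, "z")
```

The `memory_reads` counter makes the claimed saving concrete: the failing character never causes a node fetch.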

Подробнее
05-03-2015 дата публикации

Engine Architecture for Processing Finite Automata

Номер: US20150067123A1
Принадлежит:

An engine architecture for processing finite automata includes a hyper non-deterministic automata (HNA) processor specialized for non-deterministic finite automata (NFA) processing. The HNA processor includes a plurality of super-clusters and an HNA scheduler. Each super-cluster includes a plurality of clusters. Each cluster of the plurality of clusters includes a plurality of HNA processing units (HPUs). A corresponding plurality of HPUs of a corresponding plurality of clusters of at least one selected super-cluster is available as a resource pool of HPUs to the HNA scheduler for assignment of at least one HNA instruction to enable acceleration of a match of at least one regular expression pattern in an input stream received from a network. 1. A security appliance operatively coupled to a network , the security appliance comprising:at least one Central Processing Unit (CPU) core; and a plurality of super-clusters, each super-cluster including a plurality of clusters, each cluster of the plurality of clusters including a plurality of HNA processing units (HPUs), the at least one CPU core configured to select at least one super-cluster of the plurality of super-clusters;', 'an HNA on-chip instruction queue configured to store at least one HNA instruction; and', 'an HNA scheduler configured to select a given HPU of the plurality of HPUs of the plurality of clusters of the at least one super-cluster selected and assign the at least one HNA instruction to the given HPU selected in order to initiate matching at least one regular expression pattern in an input stream received from the network., 'at least one hyper non-deterministic automata (HNA) processor operatively coupled to the at least one CPU core and specialized for non-deterministic finite automata (NFA) processing, the at least one HNA processor including2. 
The security appliance of claim 1 , wherein each super-cluster further includes a super-cluster graph memory exclusive to a corresponding super-cluster claim ...

Подробнее
05-03-2015 дата публикации

System and Method to Traverse a Non-Deterministic Finite Automata (NFA) Graph Generated for Regular Expression Patterns with Advanced Features

Номер: US20150067836A1
Принадлежит: Cavium, Inc.

In one embodiment, a method of walking an non-deterministic finite automata (NFA) graph representing a pattern includes extracting a node type and an element from a node of the NFA graph. The method further includes matching a segment of a payload for the element by matching the payload for the element at least zero times, the number of times based on the node type. 1. A method of walking a non-deterministic finite automata (NFA) graph representing a pattern , the method comprising:extracting a node type, a next node address, and an element from a node of the NFA graph; andmatching a segment of a payload with the element by matching the payload with the element at least zero times, the number of times based on the node type.2. The method of claim 1 , wherein the node type is at least one of a variable count claim 1 , fixed count claim 1 , fixed-variable count claim 1 , character claim 1 , case insensitive character claim 1 , character class claim 1 , case sensitive string claim 1 , case insensitive string claim 1 , marked claim 1 , split.3. The method of claim 1 , further comprising pushing an entry to a run stack claim 1 , the entry indicating at least one of the node type claim 1 , an address of a next node in the graph claim 1 , a payload offset claim 1 , a count value claim 1 , a duplicate bit claim 1 , a reverse bit.4. The method of claim 3 , wherein pushing the entry to the run stack is based on the node type.5. The method of claim 4 , wherein pushing the entry to the run stack is performed if the node type is variable count claim 4 , fixed-variable count claim 4 , or split.6. The method of claim 1 , further comprising:popping a top entry of a run stack;loading a graph node indicated by the popped top entry; andmatching the segment of the payload with the element, the element indicated in the loaded graph node, at a payload offset indicated in the popped top entry.7. 
The method of claim 6 , wherein popping the top entry of the run stack is performed after a ...
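The push/pop discipline of the run stack can be sketched with a split node for "zero or more": the untaken branch and payload offset are pushed, and a dead end pops the top entry and resumes there. The tuple node encoding below is an illustrative assumption:

```python
# Minimal sketch of the run-stack walk: the walker extracts a node type and
# element from each node; nodes with alternatives (here SPLIT) push the
# untaken branch plus the current payload offset onto a run stack, and a
# failed match pops the top entry and resumes from it. Illustrative.

def walk(nodes, payload):
    stack = [(0, 0)]                       # entries: (node address, offset)
    while stack:
        addr, off = stack.pop()
        while True:
            ntype = nodes[addr][0]
            if ntype == "SPLIT":           # try branch A, save branch B
                _, a, b = nodes[addr]
                stack.append((b, off))     # run-stack entry for later
                addr = a
            elif ntype == "CHAR":
                _, ch, nxt = nodes[addr]
                if off < len(payload) and payload[off] == ch:
                    off += 1
                    addr = nxt
                else:
                    break                  # dead end: pop next stack entry
            elif ntype == "MARKED":
                return True                # pattern matched
    return False

# NFA for "a*b":  0:SPLIT(1,2)  1:CHAR 'a' ->0  2:CHAR 'b' ->3  3:MARKED
nodes = {0: ("SPLIT", 1, 2), 1: ("CHAR", "a", 0), 2: ("CHAR", "b", 3),
         3: ("MARKED",)}
ok = walk(nodes, "aaab")
```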

Подробнее
05-03-2015 дата публикации

METHOD AND APPARATUS FOR PROCESSING FINITE AUTOMATA

Номер: US20150067863A1
Принадлежит: Cavium, Inc.

A method and corresponding apparatus for run time processing use a Deterministic Finite Automata (DFA) and Non-Deterministic Finite Automata (NFA) to find the existence of a pattern in a payload. A subpattern may be selected from each pattern in a set of one or more regular expression patterns based on at least one heuristic. The DFA may be generated from selected subpatterns from all patterns in the set, and at least one NFA may be generated for at least one pattern in the set, optimizing run time performance of the run time processing. 1. A security appliance operatively coupled to a network , the security appliance comprising:at least one memory; walk characters of a payload through a unified deterministic finite automata (DFA) stored in the at least one memory, by traversing nodes of the unified DFA with characters from the payload, the unified DFA generated from subpatterns selected from each pattern in a set of one or more regular expression patterns based on at least one heuristic; and', 'walk characters of the payload through at least one non-deterministic finite automata (NFA) stored in the at least one memory, by traversing nodes of the at least one NFA with characters from the payload, the at least one NFA generated for at least one pattern in the set, a portion of the at least one pattern used for generating the at least one NFA, and at least one walk direction for walking characters through the at least one NFA, being based on whether a length of a subpattern selected from the at least one pattern is fixed or variable and a location of the subpattern selected within the at least one pattern., 'at least one processor operatively coupled to the at least one memory, the at least one processor configured to2. The security appliance of claim 1 , wherein the at least one processor is further configured to report a match of the at least one pattern in the payload based on traversing an NFA node claim 1 , of the at least one NFA claim 1 , associated with ...

Подробнее
10-03-2016 дата публикации

ANCHORED PATTERNS

Номер: US20160070818A1
Принадлежит:

A method and apparatus relate to recognizing anchored patterns from an input stream. Patterns from a plurality of given patterns are marked as anchored patterns. An anchored state tree for the anchored patterns of the plurality of given patterns is built, including nodes representing a state of the anchored state tree. For each node of the anchored state tree, a failure value equivalent to a node representing a state in an unanchored state tree representing unanchored patterns of the plurality of given patterns is determined. 1. A method comprising:in a processor:building an unanchored state graph for unanchored patterns of a plurality of given patterns, the unanchored state graph including nodes representing a state of the unanchored state graph;building a separate anchored state graph for given patterns, of the plurality of given patterns, marked as anchored patterns, the anchored state graph including nodes representing a state of the anchored state graph;for each node of the anchored state graph, determining a failure value equivalent to a node representing a state in an unanchored state graph representing unanchored patterns of the plurality of given patterns; andincluding a failure value of a root node of the anchored state graph, the failure value being equivalent to a root node of the unanchored state graph.2. The method of claim 1 , wherein the anchored state graph includes a root node that is set as a start node for processing anchored and unanchored patterns in an input payload.3. The method of claim 1 , further comprising marking the given patterns claim 1 , of the plurality of given patterns claim 1 , as the anchored patterns.4. The method of claim 3 , wherein marking includes adding a reference indicating a location within an input of a string of text to begin searching for the respective anchored pattern.5. A method comprising:in a processor:building an unanchored state graph for unanchored patterns of a plurality of given patterns, the unanchored ...
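The relationship between the two trees can be illustrated with an Aho-Corasick-style unanchored automaton: each anchored-tree node takes as its failure value the equivalent state of the unanchored tree, with the anchored root failing to the unanchored root. The literal-pattern restriction and per-depth failure list below are assumptions of this sketch:

```python
# Minimal sketch: build an unanchored Aho-Corasick state tree (goto arcs
# plus BFS-computed failure links), then compute, for each prefix of an
# anchored pattern, the equivalent unanchored state to use as its failure
# value, so a failed anchored walk continues as an unanchored search.

from collections import deque

def build_unanchored(patterns):
    goto, fail = [{}], [0]
    for p in patterns:                       # build the trie
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({}); fail.append(0)
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
    q = deque(goto[0].values())              # BFS for failure links
    while q:
        s = q.popleft()
        for c, t in goto[s].items():
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f][c] if c in goto[f] and goto[f][c] != t else 0
            q.append(t)
    return goto, fail

def step(goto, fail, s, c):
    while s and c not in goto[s]:
        s = fail[s]                          # fall back along failure links
    return goto[s].get(c, 0)

def anchored_failures(anchored_pattern, goto, fail):
    """Failure value per anchored-tree depth: the equivalent unanchored state."""
    fails, s = [0], 0                        # anchored root fails to root 0
    for c in anchored_pattern:
        s = step(goto, fail, s, c)
        fails.append(s)
    return fails

goto, fail = build_unanchored(["he", "she"])
af = anchored_failures("shell", goto, fail)
```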

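The anchored/unanchored split described above can be sketched in Python. This is an illustrative Aho-Corasick-style construction, not the patented implementation: the unanchored graph gets ordinary BFS failure links, while the separate anchored graph's nodes take their failure values from the equivalent states of the unanchored graph (with the anchored root failing to the unanchored root).

```python
from collections import deque

class Node:
    def __init__(self):
        self.next = {}      # char -> Node (goto transitions)
        self.fail = None    # failure transition
        self.out = []       # patterns ending at this state

def build_unanchored(patterns):
    """Unanchored state graph with BFS failure links (Aho-Corasick)."""
    root = Node()
    for p in patterns:
        n = root
        for c in p:
            n = n.next.setdefault(c, Node())
        n.out.append(p)
    root.fail = root
    q = deque()
    for n in root.next.values():
        n.fail = root
        q.append(n)
    while q:
        cur = q.popleft()
        for c, child in cur.next.items():
            f = cur.fail
            while f is not root and c not in f.next:
                f = f.fail
            child.fail = f.next[c] if c in f.next and f.next[c] is not child else root
            child.out += child.fail.out   # inherit matches from the failure state
            q.append(child)
    return root

def build_anchored(anchored_patterns, unanchored_root):
    """Separate anchored state graph; every node's failure value is the
    equivalent state in the unanchored graph."""
    aroot = Node()
    aroot.fail = unanchored_root        # failure value of the anchored root
    for p in anchored_patterns:
        n, u = aroot, unanchored_root
        for c in p:
            n = n.next.setdefault(c, Node())
            while u is not unanchored_root and c not in u.next:
                u = u.fail
            u = u.next.get(c, unanchored_root)
            n.fail = u                  # equivalent unanchored state
        n.out.append(p)
    return aroot

def search(root, text):
    """Scan text with the unanchored graph, reporting (start, pattern) hits."""
    hits, n = [], root
    for i, c in enumerate(text):
        while n is not root and c not in n.next:
            n = n.fail
        n = n.next.get(c, root)
        for p in n.out:
            hits.append((i - len(p) + 1, p))
    return hits
```

Because anchored-graph failures land in the unanchored graph, a mismatch while matching an anchored pattern seamlessly continues the unanchored search instead of restarting.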
More
01-04-2021 publication date

DATA INGESTION AND STORAGE BY DATA PROCESSING UNIT HAVING STREAM-PROCESSING HARDWARE ACCELERATORS

Number: US20210097047A1
Assignee:

A system comprises a data processing unit (DPU) integrated circuit having programmable processor cores and hardware-based accelerators configured for processing streams of data units; and software executing on one or more of the processing cores. In response to a request to perform an operation on a set of one or more data tables, each having one or more columns of data arranged in a plurality of rows, the software configures the DPU to: input at least a portion of the rows of each of the database tables as at least one or more streams of data units, process the one or more streams of data units with the hardware-based accelerators to apply one or more of compression, encoding or encryption to produce a resultant stream of data units; and write the resultant stream of data units to a storage in a tree data structure.
1. A system comprising: a data processing unit (DPU) integrated circuit having programmable processor cores and hardware-based accelerators configured for processing streams of data units; and software executing on one or more of the processing cores, wherein, in response to a request to perform an operation on a set of one or more data tables, each having one or more columns of data arranged in a plurality of rows, the software configures the DPU to: input at least a portion of the rows of each of the database tables as at least one or more streams of data units, process the one or more streams of data units with the hardware-based accelerators to apply one or more of compression, encoding or encryption to produce a resultant stream of data units; and write the resultant stream of data units to a storage in a tree data structure having a root node pointing to a set of one or more table nodes that each correspond to a respective one of the data tables.
2. The system of claim 1, wherein the DPU is configured to write the resultant stream of data units in the tree data structure so that each of the table nodes points to one or more column nodes that ...

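The root → table-node → column-node layout can be illustrated with a small Python sketch. The dict-based tree and the use of zlib as a stand-in for the DPU's accelerator pipeline are assumptions for illustration only.

```python
import zlib

def build_table_tree(tables):
    """Toy layout: root -> table nodes -> column nodes; each column node
    stores that column's rows serialized and compressed as one stream
    (zlib standing in for the hardware compression/encoding pipeline)."""
    root = {"type": "root", "tables": []}
    for name, rows in tables.items():
        cols = list(zip(*rows)) if rows else []
        tnode = {"type": "table", "name": name, "columns": []}
        for i, col in enumerate(cols):
            stream = "\n".join(map(str, col)).encode()
            tnode["columns"].append({
                "type": "column", "index": i,
                "data": zlib.compress(stream),   # resultant stream of data units
            })
        root["tables"].append(tnode)
    return root

def read_column(root, table, index):
    """Walk root -> table node -> column node and decode the stream."""
    tnode = next(t for t in root["tables"] if t["name"] == table)
    raw = zlib.decompress(tnode["columns"][index]["data"]).decode()
    return raw.split("\n")
```

A column-at-a-time tree like this lets a reader decompress only the columns a query touches rather than whole rows.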
More
30-04-2015 publication date

Packet Classification

Number: US20150117461A1
Assignee:

A packet classification system, methods, and corresponding apparatus are provided for enabling packet classification. A processor of a security appliance coupled to a network uses a classifier table having a plurality of rules, the plurality of rules having at least one field, to build a decision tree structure including a plurality of nodes, the plurality of nodes including a subset of the plurality of rules. The methods may produce wider, shallower trees that result in shorter search times and reduced memory requirements for storing the trees.
1. A method comprising: in a processor, building a decision tree structure including a plurality of nodes, each node representing a subset of a plurality of rules having at least one field; and for at least one node of the decision tree, determining a number of cuts that may be made on each at least one field creating child nodes equal to the number of cuts and selecting a field on which to cut the at least one node based on a comparison of an average of a difference between an average number of rules per child node created and an actual number of rules per child node created per each at least one field.
2. The method of wherein the plurality of rules are stored in a classifier table.
3. The method of wherein determining the number of cuts is based on a maximum number of cuts for a given storage capacity.
4. The method of wherein selecting includes selecting the field on which to cut the at least one node into a number of child nodes based on the field being a field of the at least one field with the smallest average of the difference between an average number of rules per child node and an actual number of rules per child node.
5. The method of further comprising: cutting the at least one node into a number of child nodes on the selected field; and storing the decision tree structure in a memory.
6. The method of wherein cutting includes cutting the at least one node in an event the at least one node has greater than a predetermined ...

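The field-selection heuristic above (pick the field with the smallest mean difference between the average and actual number of rules per child) can be sketched as follows. Rules as dicts of half-open ranges and the 8-bit field domain are illustrative assumptions, not the patent's data layout.

```python
def cut_counts(rules, field, ncuts, lo=0, hi=256):
    """Rules landing in each child when `field`'s [lo, hi) domain is cut into
    `ncuts` equal intervals; a rule replicates into every child it overlaps."""
    width = (hi - lo) / ncuts
    counts = [0] * ncuts
    for r in rules:
        rlo, rhi = r[field]
        for i in range(ncuts):
            clo, chi = lo + i * width, lo + (i + 1) * width
            if rlo < chi and rhi > clo:
                counts[i] += 1
    return counts

def pick_cut_field(rules, fields, ncuts):
    """Select the field with the smallest average |avg - actual| rules per
    child: evenly spread rules mean less replication and a shallower tree."""
    best, best_score = None, None
    for f in fields:
        counts = cut_counts(rules, f, ncuts)
        avg = sum(counts) / len(counts)
        score = sum(abs(avg - c) for c in counts) / len(counts)
        if best_score is None or score < best_score:
            best, best_score = f, score
    return best
```

For example, three rules whose source ranges tile the domain but whose destination ranges all pile into one interval score 0 on the source field, so the node is cut on source.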
More
16-04-2020 publication date

MULTIMODE CRYPTOGRAPHIC PROCESSOR

Number: US20200119903A1
Assignee:

This disclosure describes techniques that include performing cryptographic operations (encryption, decryption, generation of a message authentication code). Such techniques may involve the data processing unit performing any of multiple modes of encryption, decryption, and/or other cryptographic operation procedures or standards, including, Advanced Encryption Standard (AES) cryptographic operations. In some examples, the security block is implemented as a unified, multi-threaded, high-throughput encryption and decryption system for performing multiple modes of AES operations. 1. A method comprising:accessing, by a device that includes a multistage Advanced Encryption Standard (AES) pipeline configured to perform AES cryptographic operations, mode selection data;identifying, by the device and based on the mode selection data, a selected AES mode from among a plurality of AES cryptographic operation modes capable of being performed by the device;receiving, by the device, a plurality of sets of input data to be processed by a cryptographic operation associated with the selected AES mode;generating, by the device and from the plurality of sets of input data based on the selected AES mode, a plurality of sets of pipeline input data;processing, by the multistage AES pipeline using one or more cryptographic keys, each of the plurality of sets of pipeline input data concurrently using a plurality of threads to generate a plurality of sets of pipeline output data, wherein each of the plurality of sets of pipeline output data is generated by the multistage AES pipeline based on a respective one of the plurality of sets of pipeline input data; andgenerating, by the device and based on each of the plurality of sets of pipeline output data and the selected AES mode, a plurality of sets of mode output data, wherein each of the plurality of sets of mode output data corresponds to a respective one of the plurality of sets of input data after performing the cryptographic operation 
...

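The multimode idea (one shared cipher core, with the selected mode deciding how inputs and outputs are arranged around it) can be sketched with a toy stand-in. `toy_block` is NOT AES; it is a deliberately trivial keyed permutation so the mode plumbing, not the cipher, is what the example shows.

```python
def toy_block(key: int, block: int) -> int:
    """Stand-in for the AES core: a trivial keyed involution on one byte.
    (NOT real AES -- illustration of mode dispatch only.)"""
    return (block ^ key) & 0xFF

def run_mode(mode, key, blocks, iv=0):
    """Route data through the shared core according to the selected mode,
    mirroring how mode selection data configures a single pipeline."""
    out = []
    if mode == "ecb":                        # each block through the core directly
        out = [toy_block(key, b) for b in blocks]
    elif mode == "ctr":                      # core encrypts a counter; XOR with data
        out = [b ^ toy_block(key, iv + i) for i, b in enumerate(blocks)]
    elif mode == "cbc":                      # chain each ciphertext into the next input
        prev = iv
        for b in blocks:
            prev = toy_block(key, b ^ prev)
            out.append(prev)
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return out
```

Note that CTR decryption reuses the exact encryption path, which is one reason a unified pipeline can serve many modes with the same hardware.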
More
31-07-2014 publication date

WORK MIGRATION IN A PROCESSOR

Number: US20140215478A1
Assignee: Cavium, Inc.

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. A work product may be migrated between lookup engines to complete the rule matching process. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.
1. An apparatus for processing a packet comprising: a plurality of clusters, each cluster including a plurality of processors for processing lookup requests and a local memory storing a set of rules; at least one of the plurality of clusters being configured to: generate a work product associated with one of the lookup requests, the work product corresponding to a processing of the lookup request; determine whether to forward the work product to another of the plurality of clusters; and based on the determination, forward the work product to another of the plurality of clusters.
2. The apparatus of claim 1, further comprising a front-end configured to forward the lookup request to the plurality of clusters and receive response messages from the plurality of clusters, the front-end including a table storing information on the set of rules at the at least one of the plurality of clusters.
3. The apparatus of claim 2, wherein the front-end is configured to forward a key request and a key identifier (KID) corresponding to the key to one of the plurality of clusters based on the table, the key including data extracted from a packet associated with the lookup request.
4. The apparatus of claim 2, wherein the front-end is configured to forward a key to a subset of the plurality of clusters based on the table, the key including data ...

More
21-05-2015 publication date

On-Chip Memory (OCM) Physical Bank Parallelism

Number: US20150143060A1
Assignee:

According to an example embodiment, a processor is provided including an integrated on-chip memory device component. The on-chip memory device component includes a plurality of memory banks, and multiple logical ports, each logical port coupled to one or more of the plurality of memory banks, enabling access to multiple memory banks, among the plurality of memory banks, per clock cycle, each memory bank accessible by a single logical port per clock cycle and each logical port accessing a single memory bank per clock cycle.
1. An integrated on-chip memory device component, comprising: a plurality of memory banks; and multiple logical ports, each logical port coupled to one or more of the plurality of memory banks, enabling access to multiple memory banks for multiple independent data access requests, each data access request including a memory address with a first set of bits indicative of a memory bank, among the plurality of memory banks, each memory bank accessible by a single logical port per clock cycle and each logical port accessing a single memory bank per clock cycle.
2. An integrated on-chip memory device component as in claim 1, wherein the memory address further includes a second set of bits indicative of a memory area within the memory bank indicated by the first set of bits.
3. An integrated on-chip memory device component as in claim 1, wherein each memory bank, among the plurality of memory banks, is accessible by one or more of the multiple logical ports.
4. An integrated on-chip memory device component as in claim 1, wherein each memory bank, among the plurality of memory banks, is accessible by each of the multiple logical ports.
5. An integrated on-chip memory device component as in claim 1, wherein data of different types are assigned storage in different memory banks among the plurality of memory banks.
6. An integrated on-chip memory device component as in claim 5, wherein assigning storage to data of ...

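The address decode and the one-port-per-bank-per-cycle rule can be sketched in a few lines. The specific bit positions (`BANK_SHIFT`, `BANK_BITS`) are assumed for illustration; the patent only says a first set of address bits selects the bank and a second set selects the area within it.

```python
BANK_BITS = 3    # 8 banks (assumed)
BANK_SHIFT = 4   # bank field starts at bit 4 of the address (assumed)

def bank_of(addr: int) -> int:
    """First set of bits: which memory bank the request targets."""
    return (addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)

def route(addresses):
    """One clock cycle of arbitration: each logical port may access one bank
    and each bank may serve one port; a second request to a busy bank is
    deferred to a later cycle."""
    granted, deferred, busy = [], [], set()
    for port, addr in enumerate(addresses):
        bank = bank_of(addr)
        if bank in busy:
            deferred.append(port)
        else:
            busy.add(bank)
            granted.append((port, bank))
    return granted, deferred
```

Requests hitting distinct banks all proceed in parallel; only same-bank collisions serialize.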
More
30-04-2020 publication date

INLINE RELIABILITY CODING FOR STORAGE ON A NETWORK

Number: US20200133771A1
Assignee:

This disclosure describes a programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets. This disclosure also describes techniques that include enabling data durability coding on a network. In some examples, such techniques may involve storing data in fragments across multiple fault domains in a manner that enables efficient recovery of the data using only a subset of the data. Further, this disclosure describes techniques that include applying a unified approach to implementing a variety of durability coding schemes. In some examples, such techniques may involve implementing each of a plurality of durability coding and/or erasure coding schemes using a common matrix approach, and storing, for each durability and/or erasure coding scheme, an appropriate set of matrix coefficients. 1. A method comprising:generating from a set of data, by data durability circuitry within a data processing unit, a plurality of data fragments;storing, by the data processing unit, the plurality of data fragments, wherein at least some of the plurality of data fragments is stored within a different fault domain, wherein each fault domain includes one or more hardware subsystems not included within any other of the fault domains;receiving, by the data processing unit and from a requesting device, a request to access at least a portion of the set of data;determining, by the data processing unit, that one or more of the plurality of data fragments is not available;identifying, by the data processing unit, a plurality of available data fragments, wherein the plurality of available data fragments is a subset of the plurality of data fragments;retrieving, by the data processing unit, at least a subset of the plurality of available data fragments over a network;generating, by the data durability circuitry within the data processing unit, a reconstructed set of data from 
...

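A minimal concrete instance of the durability-coding idea (fragments spread across fault domains, recovery from a subset) is single-parity XOR coding. This sketch is an assumption-laden toy: a real DPU would select among Reed-Solomon-style schemes by swapping the coefficient matrix, while XOR parity is just the simplest member of that family.

```python
def encode(data: bytes, k: int):
    """Split data into k equal fragments plus one XOR parity fragment; each of
    the k+1 fragments would be stored in a different fault domain."""
    if len(data) % k:
        data += b"\0" * (k - len(data) % k)   # pad to a multiple of k
    size = len(data) // k
    frags = [bytearray(data[i * size:(i + 1) * size]) for i in range(k)]
    parity = bytearray(size)
    for f in frags:
        for i, b in enumerate(f):
            parity[i] ^= b
    return [bytes(f) for f in frags] + [bytes(parity)]

def reconstruct(frags, missing: int):
    """Rebuild one unavailable fragment (data or parity) by XOR-ing the
    surviving subset -- the 'retrieve available fragments' path above."""
    size = len(next(f for f in frags if f is not None))
    out = bytearray(size)
    for idx, f in enumerate(frags):
        if idx == missing:
            continue
        for i, b in enumerate(f):
            out[i] ^= b
    return bytes(out)
```

With k data fragments and one parity, any single fault domain can fail and the lost fragment is recovered from the other k.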
More
07-05-2020 publication date

DATA PROCESSING UNIT HAVING HARDWARE-BASED RANGE ENCODING AND DECODING

Number: US20200142642A1
Assignee:

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes examples of retrieving speculative probability values for range coding a plurality of bits with a single read instruction to an on-chip memory that stores a table of probability values. This disclosure also describes examples of storing state information used for context-coding packets of a data stream so that the state information is available after switching between data streams.
1. A method of context-based coding, the method comprising: generating state information for context-based coding a first set of one or more packets of an application; compressing the state information to generate compressed state information after coding the first set of one or more packets; decompressing the compressed state information to reconstruct the state information; and context-based coding a second set of one or more packets of the application based on the reconstructed state information.
2. The method of claim 1, wherein the compressed state information comprises a first compressed state information, the method further comprising: during context-based coding of the second set of one or more packets, updating the reconstructed state information to generate state information for context-based coding the second set of one or more packets; compressing the state information for context-based coding the second set of one or more packets to generate a second compressed state information; storing the second compressed state information in the memory; retrieving the second compressed state information from the memory; decompressing the second compressed state information to reconstruct the second state ...

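The snapshot/restore cycle for per-stream coder state can be sketched as below. The counts-based state model and the use of zlib + JSON as the snapshot format are assumptions; they stand in for whatever compact hardware representation the DPU actually uses.

```python
import json
import zlib

class ContextState:
    """Per-stream context state for a coder: adaptive (zeros, ones) counts per
    context. When the engine switches streams, the state is compressed out to
    memory and later restored, so coding can resume where it left off."""

    def __init__(self, ncontexts=8):
        self.counts = [[1, 1] for _ in range(ncontexts)]

    def update(self, ctx: int, bit: int):
        """Adapt the context model after coding one bit."""
        self.counts[ctx][bit] += 1

    def snapshot(self) -> bytes:
        """Compress the state for storage while another stream is coded."""
        return zlib.compress(json.dumps(self.counts).encode())

    @classmethod
    def restore(cls, blob: bytes) -> "ContextState":
        """Decompress a stored snapshot back into usable coder state."""
        s = cls(0)
        s.counts = json.loads(zlib.decompress(blob).decode())
        return s
```

Interleaving many streams then costs one snapshot/restore pair per switch instead of dedicating on-chip state to every stream at once.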
More
07-05-2020 publication date

DATA PROCESSING UNIT HAVING HARDWARE-BASED RANGE ENCODING AND DECODING

Number: US20200145020A1
Assignee:

A highly programmable data processing unit includes multiple processing units for processing streams of information, such as network packets or storage packets. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. The data processing unit is configured to retrieve speculative probability values for range coding a plurality of bits with a single read instruction to an on-chip memory that stores a table of probability values. The data processing unit is configured to store state information used for context-coding packets of a data stream so that the state information is available after switching between data streams. 1. A method of context-based coding , the method comprising:determining, by a range coder implemented in circuitry of a device, a first context value for a first context for a plurality of bits of a symbol to be coded, wherein the first context value for the first context is same for the plurality of bits;retrieving, by the range coder, speculative probability values associated with the first context value for the first context from a table of probability values;for one or more bits of the plurality of bits of the symbol, determining, by the range coder, respective second context values for a second context;for the one or more bits of the plurality of bits of the symbol, determining, by the range coder, respective probability values from the retrieved speculative probability values based on at least the respective second context values for the second context; andrange coding, by the range coder, the one or more bits of the plurality of bits of the symbol based on the respective determined probability values.2. The method of claim 1 , wherein a second context value for a second context for a first bit of the one or more bits is different than a second context value for a second context for a second bit of the one or more bits.3. The method of claim 1 , ...

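The speculative-fetch idea in claim 1 can be shown with a small sketch: because the first context value is the same for all bits of the symbol, one read pulls the entire row of probabilities for that first context, and each bit then selects within the row by its second context value with no further memory reads. The table contents and context functions here are made up for illustration.

```python
N_SECOND = 4   # possible second-context values (assumed)

# One row per first-context value; each row holds the probability for every
# possible second-context value, so a single read covers several bits.
prob_table = [[((c1 * 13 + c2 * 7) % 100) / 100 for c2 in range(N_SECOND)]
              for c1 in range(16)]

def code_bits(bits, first_ctx, second_ctx_of):
    """Fetch the whole row for first_ctx once (the speculative values), then
    resolve each bit's probability by its second context without re-reading."""
    row = prob_table[first_ctx]          # the single read instruction
    reads = 1
    probs = [row[second_ctx_of(i, bits)] for i in range(len(bits))]
    return probs, reads
```

Without the speculative row fetch, each bit would need its own table read once its second context became known.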
More
07-05-2020 publication date

PARALLEL CODING OF SYNTAX ELEMENTS FOR JPEG ACCELERATOR

Number: US20200145680A1
Assignee:

A device includes a memory configured to store image data and an image coding unit implemented in circuitry. The image coding unit is configured to code a first value of a first instance of a first syntax element of a first block of image data and determine a first context for coding a second value of a second instance of the first syntax element of a second block of the image data. The image coding unit is configured to context-based code the second value of the second instance of the first syntax element of the second block of the image data after coding the first value of the first instance of the first syntax element using the first context and code a third value of a first instance of a second syntax element of the first block in parallel with coding the second value or after coding the second value.
1. A method comprising: coding, by an image coding unit implemented in circuitry of a device, a first value of a first instance of a first syntax element of a first block of image data; determining, by the image coding unit, a first context for coding a second value of a second instance of the first syntax element of a second block of the image data; context-based coding, by the image coding unit, the second value of the second instance of the first syntax element of the second block of the image data after coding the first value of the first instance of the first syntax element using the first context; and coding, by the image coding unit, a third value of a first instance of a second syntax element of the first block in parallel with coding the second value or after coding the second value.
2. The method of claim 1, wherein the first syntax element comprises a last-non-zero (LNZ) high syntax element, and wherein the second syntax element comprises one or more of a non-zero alternating current (AC) high values syntax element, an LNZ low syntax element, an AC low coefficient map syntax element, a non-zero AC low values syntax element ...

More
07-05-2020 publication date

MEMORY LAYOUT FOR JPEG ACCELERATOR

Number: US20200145681A1
Assignee:

A device includes a memory configured to store image data and an image coding unit implemented in circuitry. The image coding unit is configured to store a first portion of a set of context information in memory of the image coding unit as an array representing a direct access table and store a second portion of the set of context information in a hash table. The image coding unit is further configured to determine whether a context value for context-based coding of a value of an instance of a syntax element for a block of image data is stored in the array or in the hash table, retrieve the context value from either the array or the hash table according to the determination, and context-based code the value of the instance of the syntax element using the context value.
1. A method comprising: storing, by an image coding unit implemented in circuitry of a device, a first portion of a set of context information in memory of the image coding unit as an array representing a direct access table; storing, by the image coding unit, a second portion of the set of context information in a hash table; determining, by the image coding unit, whether a context value for context-based coding of a value of an instance of a syntax element for a block of image data is stored in the array or in the hash table; retrieving, by the image coding unit, the context value from either the array or the hash table according to the determination; and context-based coding the value of the instance of the syntax element using the context value.
2. The method of claim 1, wherein the block comprises a first block, wherein the value comprises a first value, wherein the instance comprises a first instance of the syntax element, and wherein the context value comprises a second value of a second instance of the syntax element for a second block.
3. The method of claim 1, wherein storing the second portion comprises: executing a hash function that maps a first identifier for a first ...

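The split storage scheme is easy to illustrate: densely numbered, frequently used contexts live in a flat array indexed directly by context id (the direct access table), while the sparse remainder falls back to a hash table. The partition rule (`ctx_id < DIRECT_SIZE`) is an assumption; the patent leaves the partition criterion open.

```python
DIRECT_SIZE = 256   # assumed size of the direct access table

class ContextStore:
    """Context values split between a direct access array and a hash table."""

    def __init__(self):
        self.direct = [None] * DIRECT_SIZE   # first portion: O(1) indexed array
        self.hashed = {}                     # second portion: hash table

    def put(self, ctx_id: int, value):
        if ctx_id < DIRECT_SIZE:
            self.direct[ctx_id] = value
        else:
            self.hashed[ctx_id] = value

    def get(self, ctx_id: int):
        """Determine which structure holds the context value, then retrieve it."""
        if ctx_id < DIRECT_SIZE:
            return self.direct[ctx_id]
        return self.hashed.get(ctx_id)
```

The array gives single-cycle access for the common case; the hash table keeps memory bounded when the context id space is large but sparsely populated.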
More
07-05-2020 publication date

WORK ALLOCATION FOR JPEG ACCELERATOR

Number: US20200145682A1
Assignee:

A device includes a memory configured to store image data and an image coding unit. The image coding unit is configured to decode a first set of one or more bits of a first value of a first instance of a first syntax element of a block of image data, determine that the first set of one or more bits have values indicating that one or more values of respective instances of one or more other syntax elements of the block of image data are to be decoded. In response to the determination, the image coding unit is configured to decode one or more bits of the one or more values of the respective instances of the one or more other syntax elements of the block prior to decoding a second set of one or more bits of the first value of the first instance of the first syntax element. 1. A method comprising:decoding, by an image coding unit implemented in circuitry of a device, a first set of one or more bits of a first value of a first instance of a first syntax element of a block of image data;determining, by the image coding unit, that the first set of one or more bits have values indicating that one or more values of respective instances of one or more other syntax elements of the block of image data are to be decoded; andin response to the determination, decoding, by the image coding unit, one or more bits of the one or more values of the respective instances of the one or more other syntax elements of the block prior to decoding a second set of one or more bits of the first value of the first instance of the first syntax element.2. 
The method of claim 1 ,wherein decoding the first set of one or more bits comprises decoding, by a first decoding engine of a plurality of decoding engines of the image coding unit, the first set of one or more bits, andwherein decoding the one or more bits of the one or more values of the respective instances of the one or more other syntax elements comprises decoding, by a second decoding engine of the plurality of decoding engines, the one or ...

More
21-05-2020 publication date

SERVICE CHAINING HARDWARE ACCELERATORS WITHIN A DATA STREAM PROCESSING INTEGRATED CIRCUIT

Number: US20200159568A1
Assignee:

This disclosure describes techniques that include establishing a service chain of operations that are performed on a network packet as a sequence of operations. In one example, this disclosure describes a method that includes storing, by a data processing unit integrated circuit, a plurality of work unit frames in a work unit stack representing a plurality of service chain operations, including a first service chain operation, a second service chain operation, and a third service chain operation; executing, by the data processing unit integrated circuit, the first service chain operation, wherein executing the first service chain operation generates operation data; determining, by the data processing unit integrated circuit and based on the operation data, whether to perform the second service chain operation; and executing, by the data processing unit integrated circuit, the third service chain operation after skipping the second service chain operation.
1. A data processing unit integrated circuit comprising: a plurality of processing cores, each of the cores configured to execute one or more of a plurality of software work unit handlers; an accelerator unit, implemented in circuitry, configured to execute one or more data processing operations; and a memory configured to store a plurality of work units arranged as a work unit stack, each of the work units associated with a network packet, each work unit specifying one of the plurality of software work unit handlers for processing the network packet and specifying one of the cores for executing the specified software work unit handler, and at least one of the plurality of work units specifying one of the data processing operations to be performed by the accelerator unit, wherein the work unit stack specifies a set of service chain operations to be performed on the network packet, and wherein the set of operations include: processing the network packet by the plurality of software work unit handlers, and performing the
...

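The execute-then-conditionally-skip flow can be sketched with plain Python callables standing in for work unit handlers. The `skip_next` flag and the `optional` marker are illustrative conventions, not the patent's work unit format.

```python
def run_chain(units, packet):
    """Execute service chain operations in order. Each handler writes into the
    shared operation data; a handler may request that the next optional unit
    be skipped (e.g., skip decryption for an already-plaintext packet)."""
    data = {"packet": packet}
    log = []
    while units:
        unit = units.pop(0)
        if data.pop("skip_next", False) and unit.get("optional"):
            continue                      # second operation skipped
        log.append(unit["name"])
        unit["handler"](data)             # may set data["skip_next"]
    return log

# Toy handlers (assumed names): parse decides whether decryption is needed.
def parse(d):
    d["skip_next"] = d["packet"].get("plaintext", False)

def decrypt(d):
    pass   # would hand off to the accelerator unit

def deliver(d):
    pass
```

Usage: a chain of parse → decrypt (optional) → deliver runs all three stages for an encrypted packet but jumps straight from parse to deliver for a plaintext one.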
More
21-05-2020 publication date

MATCHING TECHNIQUES IN DATA COMPRESSION ACCELERATOR OF A DATA PROCESSING UNIT

Number: US20200159840A1
Assignee:

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes a hardware-based programmable data compression accelerator for the data processing unit including a pipeline for performing string substitution. The disclosed string substitution pipeline, referred to herein as a “search block,” is configured to perform string search and replacement functions to compress an input data stream. In some examples, the search block is a part of a compression process performed by the data compression accelerator. The search block may support single and multi-thread processing, and multiple levels of compression effort. In order to achieve high-throughput, the search block processes multiple input bytes per clock cycle per thread.
1. A method comprising: receiving, by a match block of a search engine of a processing device, one or more history addresses of potential previous occurrences of a current byte string beginning at a current byte position in an input data stream; determining, by the match block, whether at least one forward match occurs between the current byte position of the current byte string and the history addresses of one or more previous occurrences of byte strings, the forward match including subsequent byte positions in a forward direction of the input data stream, wherein the history addresses comprise byte positions of the previous occurrences of byte strings stored in a history buffer; determining, by the match block, whether at least one backward match occurs between the current byte position of the current byte string and the history addresses of the one or more previous occurrences of byte strings, the backward match including preceding ...

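The forward/backward match extension described in claim 1 can be sketched as follows: given a candidate history address for the current position, the match is grown forward over subsequent bytes and backward over preceding bytes. This is an LZ77-style illustration, not the accelerator's actual match block.

```python
def extend_match(data: bytes, cur: int, hist: int, min_len: int = 2):
    """Extend a candidate match between the current position `cur` and the
    history address `hist` (hist < cur) in both directions.
    Returns (match_start, history_start, length), or None if too short."""
    # Forward match: subsequent byte positions in the forward direction.
    fwd = 0
    while cur + fwd < len(data) and data[hist + fwd] == data[cur + fwd]:
        fwd += 1
    # Backward match: preceding byte positions, stopping before the two
    # regions would coincide or the history buffer runs out.
    bwd = 0
    while (hist - 1 - bwd >= 0
           and cur - 1 - bwd > hist
           and data[hist - 1 - bwd] == data[cur - 1 - bwd]):
        bwd += 1
    if bwd + fwd < min_len:
        return None
    return (cur - bwd, hist - bwd, bwd + fwd)
```

Backward extension matters because the hash lookup that proposed `hist` may have fired a byte or two late; growing the match backward recovers those bytes and lengthens the copy.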
More
21-05-2020 publication date

Merging techniques in data compression accelerator of a data processing unit

Number: US20200162584A1
Assignee: Fungible Inc

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes a hardware-based programmable data compression accelerator for the data processing unit including a pipeline for performing string substitution. The disclosed string substitution pipeline, referred to herein as a “search block,” is configured to perform string search and replacement functions to compress an input data stream. In some examples, the search block is a part of a compression process performed by the data compression accelerator. The search block may support single and multi-thread processing, and multiple levels of compression effort. In order to achieve high-throughput, the search block processes multiple input bytes per clock cycle per thread.

More
02-07-2015 publication date

METHOD AND SYSTEM FOR SKIPPING OVER GROUP(S) OF RULES BASED ON SKIP GROUP RULE

Number: US20150186781A1
Assignee:

A method and corresponding system for providing a skip group rule feature is disclosed. When a search for a key matches a skip group rule in a group of prioritized rules, the search skips over rules having priorities lower than the skip group rule and the search continues to a next group. A convenient example of a compiler rewrites the lower priority rules by subtracting the skip group rule from them. The subtraction includes subtracting range, exact-match, mask, and prefix fields. The rewritten rules appear to a search processor as typical rules. Beneficially, the search processor requires no additional logic to process a skip group rule, skip over lower priority rules, and go on to search a next group of rules. Advantageously, this approach enables any number of skip group rules to be defined allowing for better classification of network data.
1. A method for forcing a search processor to skip over rules within a group of rules, the method comprising: in a compiler provided with a set of rules for matching a key, the set of rules divided into groups, each group being prioritized with respect to each other and each rule within each group being prioritized with respect to each other, and the set of rules including at least one skip group rule: rewriting rules belonging to a same group as the skip group rule and having priorities lower than the skip group rule, the lower priority rules being rewritten based on the skip group rule such that in response to matching a key to the skip group rule, a search processor skips over the skip group rule and the lower priority rules; and providing the rewritten rules to the search processor.
2. The method of wherein rewriting the lower priority rules includes subtracting the skip group rule from each of the lower priority rules, each lower priority rule being rewritten with a respective subtracted rule as one or more rewritten rules.
3. The method of wherein each rule includes at least one field and wherein subtracting ...

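The subtraction step for a range field can be sketched concretely: removing the skip rule's interval from a lower-priority rule's interval leaves zero, one, or two residual intervals, each of which becomes a rewritten rule. The single-range-field rule shape is a simplification; the patent also subtracts exact-match, mask, and prefix fields.

```python
def subtract_range(lo_hi, skip_lo_hi):
    """Subtract the skip rule's inclusive range from a rule's inclusive range.
    Yields 0, 1 or 2 leftover ranges."""
    lo, hi = lo_hi
    slo, shi = skip_lo_hi
    out = []
    if lo < slo:
        out.append((lo, min(hi, slo - 1)))   # piece below the skip range
    if hi > shi:
        out.append((max(lo, shi + 1), hi))   # piece above the skip range
    return [r for r in out if r[0] <= r[1]]

def rewrite_group(group, skip_index):
    """Rewrite rules with priority below the skip group rule so they can no
    longer match anything the skip rule matches; the search processor then
    needs no special skip logic."""
    skip = group[skip_index]
    out = group[:skip_index + 1]
    for rule in group[skip_index + 1:]:
        for leftover in subtract_range(rule["range"], skip["range"]):
            out.append({**rule, "range": leftover})
    return out
```

A key that lands in (40, 60) now matches only the skip rule, so the search naturally falls through to the next group, which is exactly the skip semantics with ordinary rules.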
More
18-09-2014 publication date

PACKET EXTRACTION OPTIMIZATION IN A NETWORK PROCESSOR

Number: US20140269718A1
Assignee:

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. A lookup front-end receives lookup requests from a host, and processes these lookup requests to generate key requests for forwarding to the lookup engines. Based on information in the packet, the lookup front-end can optimize start times for sending key requests as a continuous stream with minimal delay. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found.
1. A method of processing a packet comprising: receiving a first segment of a packet; determining, based on the first segment, beats of a second segment of the packet containing portions of a key; determining a start time at which the key may begin to be forwarded as a continuous stream, the start time being based on a prediction of when the portions of the key in the second segment will be received; and initiating forwarding the key at the start time to a processing cluster configured to operate rule matching for the packet.
2. The method of claim 1, further comprising completing forwarding the key at an end time occurring after receipt of all of the portions of the key in the second segment.
3. The method of claim 1, wherein the key is a first key, and further comprising: determining, based on the first segment, beats of a second segment of the packet containing portions of a second key; determining a start time at which the second key may begin to be forwarded as a continuous stream, the start time being based on a prediction of when the portions of the second key in the second segment will be received; and initiating forwarding the second key at the start time to a processing cluster configured to operate rule matching for the packet.
4. The method of claim 3, wherein the ...
Publication date: 09-07-2015

LOOKUP CLUSTER COMPLEX

Number: US20150195200A1
Assignee:

A packet processor provides for rule matching of packets in a network architecture. The packet processor includes a lookup cluster complex having a number of lookup engines and respective on-chip memory units. The on-chip memory stores rules for matching against packet data. Each of the lookup engines receives a key request associated with a packet and determines a subset of the rules to match against the packet data. As a result of the rule matching, the lookup engine returns a response message indicating whether a match is found. 1. A method comprising:receiving a request associated with a packet, the request including an identifier (ID);selecting at least one entry in a table indicated by the ID, the entry providing a starting address associated with a set of rules;determining a subset of rules based on at least one field of the request, the subset of rules being a portion of the set of rules;applying the at least one field against the subset of rules; andoutputting a response signal indicating whether the at least one field matches at least one rule of the subset of rules.2. The method of claim 1 , wherein the request includes a key format table index.3. The method of claim 2 , further comprising parsing the request based on the key format table index.4. The method of claim 1 , wherein the set of rules is a portion of rules stored in a memory.5. The method of claim 1 , wherein determining a subset of rules includes determining at least one bucket claim 1 , the at least one bucket including pointers to the subset of rules.6. The method of claim 5 , wherein the at least one bucket includes a plurality of buckets claim 5 , and wherein the entry includes a node associated with the plurality of buckets.7. The method of claim 6 , wherein processing the entry includes processing the node to determine the plurality of buckets.8. The method of claim 6 , wherein the node is associated with the plurality of buckets via at least one intermediary node.9. 
The method of claim ...
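
The ID-to-entry-to-bucket-to-rules path in claim 1 can be sketched as a toy software model (the table contents and all names are hypothetical, not the patent's):

```python
# The request ID selects a table entry, the entry's starting address selects
# a bucket, and the bucket's rule pointers give the subset of rules whose
# predicates are applied against a field of the request.
TABLE = {7: "addr0"}                                  # ID -> starting address
BUCKETS = {"addr0": ["r1", "r2"]}                     # address -> rule pointers
RULES = {"r1": lambda f: f < 100, "r2": lambda f: f == 443}

def lookup(request_id, field):
    addr = TABLE[request_id]                          # entry indicated by the ID
    subset = BUCKETS[addr]                            # subset of the rule set
    matched = [r for r in subset if RULES[r](field)]  # apply the field
    return ("match", matched) if matched else ("nomatch", [])
```

The bucket indirection is what lets the engine fetch only a small subset of the stored rules per request.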

Publication date: 09-07-2015

PROCESSING REQUEST KEYS BASED ON A KEY SIZE SUPPORTED BY UNDERLYING PROCESSING ELEMENTS

Number: US20150195262A1
Assignee: Cavium, Inc.

A packet classification system, methods, and apparatus are provided for packet classification. A processor of a router coupled to a network processes data packets received from the network. The processor creates a request key using information extracted from a packet. The processor splits the request key into an n number of partial request keys if at least one predetermined criterion is met. The processor also sends a non-final request that includes an i-th partial request key to a corresponding search table of an n number of search tables, wherein i1 and each of the n number of partial request keys is associated with a distinct set of information extracted from the packet; sending a non-final request that includes an i-th partial request key to a corresponding search table of an n number of search tables, wherein i ...
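
The key-splitting step can be sketched as follows (a toy model: the length-based criterion and the ceiling-division layout are my assumptions):

```python
def split_key(key, n, max_key_bytes=4):
    """Split a request key into n partial request keys when it exceeds the
    key size supported by the underlying search tables; otherwise send the
    whole key as-is."""
    if len(key) <= max_key_bytes:
        return [key]                        # criterion not met: no split
    step = -(-len(key) // n)                # ceiling division
    return [key[i * step:(i + 1) * step] for i in range(n)]
```

Each partial key would then be sent in a non-final request to its own search table, with the final result assembled from the partial responses.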

Publication date: 18-09-2014

Scheduling Method and Apparatus for Scheduling Rule Matching in a Processor

Number: US20140279805A1
Assignee: Cavium, Inc.

In a network search processor, configured to handle search requests in a router, a scheduler for scheduling rule matching threads initiated by a plurality of initiating engines is designed to make efficient use of the resources in the network search processor while providing high speed performance. According to at least one example embodiment, the scheduler and a corresponding scheduling method comprise: determining a set of bundles of rule matching threads, each bundle being initiated by a separate initiating engine; distributing rule matching threads in each bundle into a number of subgroups of rule matching threads; assigning the subgroups of rule matching threads associated with each bundle of the set of bundles to multiple scheduling queues; and sending rule matching threads, assigned to each scheduling queue, to rule matching engines according to an order based on priorities associated with the respective bundles of rule matching threads. 1. A method of scheduling rule matching threads initiated by a plurality of initiating engines in a network search processor , for processing by multiple matching engines of the network search processor , the method comprising:determining, by a scheduler, a set of bundles of rule matching threads, each bundle being initiated by a separate initiating engine;distributing rule matching threads in each bundle of the set of bundles into a number of subgroups of rule matching threads;assigning the subgroups of rule matching threads associated with each bundle of the set of bundles to multiple scheduling queues; andsending rule matching threads, assigned to each scheduling queue, toward rule matching engines according to an order based on priorities associated with the respective bundles of rule matching threads.2. 
A method according to further comprising receiving claim 1 , at the scheduler claim 1 , data corresponding to one or more bundles of rule matching threads from one or more initiating engines of the plurality of initiating ...
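
The distribute/assign/send steps can be sketched in Python (the round-robin subgrouping and the lower-number-is-higher-priority convention are my assumptions; the patent does not fix a distribution policy):

```python
def schedule(bundles, num_queues):
    """bundles: list of (priority, threads). Each bundle's threads are split
    round-robin into num_queues subgroups, the subgroups are assigned across
    the scheduling queues, and each queue is drained in bundle-priority
    order (lower number = higher priority)."""
    queues = [[] for _ in range(num_queues)]
    for priority, threads in bundles:
        for q in range(num_queues):
            queues[q].append((priority, threads[q::num_queues]))
    sent = []
    for queue in queues:
        for _priority, subgroup in sorted(queue, key=lambda e: e[0]):
            sent.extend(subgroup)
    return sent
```

Spreading each bundle across all queues keeps every rule matching engine busy while still honoring bundle priority within a queue.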

Publication date: 18-09-2014

METHOD AND AN ACCUMULATOR SCOREBOARD FOR OUT-OF-ORDER RULE RESPONSE HANDLING

Number: US20140279806A1
Assignee: Cavium, Inc.

According to at least one example embodiment, a method and a corresponding accumulator scoreboard for managing bundles of rule matching threads processed by one or more rule matching engines comprise: recording, for each rule matching thread in a given bundle of rule matching threads, a rule matching result in association with a priority corresponding to the respective rule matching thread; determining a final rule matching result, for the given bundle of rule matching threads, based at least in part on the corresponding indications of priorities; and generating a response state indicative of the determined final rule matching result for reporting to a host processor or a requesting processing engine. 1. A method of managing bundles of rule matching threads processed by one or more rule matching engines in a search processor , the method comprising:recording, for each rule matching thread in a given bundle of rule matching threads, a rule matching result in association with a priority corresponding to the respective rule matching thread;determining a final rule matching result, for the given bundle of rule matching threads, based at least in part on the priorities corresponding to the rule matching threads in the given bundle; andgenerating a response state indicative of the determined final rule matching result for reporting to a host processor.2. A method according to further comprising recording claim 1 , for each rule matching thread in the given bundle of rule matching threads claim 1 , an identification of a rule walking engine initiating the rule matching thread.3. A method according to further comprising recording a count of the number of rule matching threads in the given bundle of rule matching threads.4. 
A method according to claim 1 , wherein determining a final rule matching result includes:maintaining an indication of expected priorities associated with rule matching threads in the given bundle; anddetermining whether rule matching results are recorded ...
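
The scoreboard's record-then-reduce behavior can be sketched as a toy class (the lower-number-is-higher-priority convention and the names are mine):

```python
class Scoreboard:
    """Records a per-thread rule matching result with its priority and
    reduces a completed bundle to one final result: the match with the
    best (lowest-numbered) priority wins."""
    def __init__(self, expected):
        self.expected = expected          # number of threads in the bundle
        self.results = []                 # (priority, matched_rule) records
    def record(self, priority, matched_rule):
        self.results.append((priority, matched_rule))
    def final(self):
        if len(self.results) < self.expected:
            return None                   # bundle not complete yet
        matches = [(p, r) for p, r in self.results if r is not None]
        return min(matches)[1] if matches else "no-match"
```

Because results arrive out of order, the reduction can only run once all expected threads in the bundle have reported.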

Publication date: 18-09-2014

BATCH INCREMENTAL UPDATE

Number: US20140279850A1
Assignee: Cavium, Inc.

A system, apparatus, and method are provided for adding, deleting, and modifying rules in one update from the perspective of an active search process for packet classification. While a search processor searches for one or more rules that match keys generated from received packets, there is a need to add, delete, or modify rules. By organizing a plurality incremental updates for adding, deleting, or modifying rules into a batch update, several operations for incorporating the incremental updates may be made more efficient by minimizing a number of updates required. 1. A method comprising:receiving a batch update including a plurality of incremental updates for a Rule Compiled Data Structure (RCDS) representing a decision tree for a set of rules used for packet classification, the RCDS being utilized for packet classification by an active search process;updating the set of rules and one or more rule lists based on the batch update received, each of the one or more rule lists being a subset of the set of rules associated with a category subtree of a housekeeping tree, the housekeeping tree being an augmented representation of the RCDS;updating the housekeeping tree associated with each category subtree associated with the one or more rule lists updated and building a changeset specifying one or more modifications to the RCDS based on the housekeeping tree updated; andapplying the changeset built to the RCDS in a manner enabling the RCDS to atomically incorporate the plurality of incremental updates from the perspective of the active search process utilizing the RCDS.2. The method of wherein a number of operations triggered by the plurality of incremental updates included in the batch update received is less than another number of operations triggered by the plurality of incremental updates received on an incremental basis.3. The method of wherein applying the changeset built to the RCDS includes issuing a number of instructions to a device and the number of ...
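
The claimed saving, fewer operations for a batch than for the same updates applied one at a time, can be illustrated with a toy cost model (coalescing by touched node is my assumption about where the saving comes from):

```python
def batch_ops(updates):
    """Toy cost model: coalesce incremental updates that touch the same
    tree node, so the batch issues one operation per touched node instead
    of one operation per update."""
    touched = {}
    for node, op in updates:
        touched[node] = op          # a later update to a node supersedes earlier ones
    return len(touched)

# Three updates to n1 collapse into one operation when batched.
updates = [("n1", "add"), ("n2", "add"), ("n1", "modify"), ("n1", "delete")]
```

Applied incrementally these four updates cost four operations; batched, only two nodes are actually written.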

Publication date: 18-09-2014

NSP Manager

Number: US20140280357A1
Assignee: Cavium, Inc.

In an embodiment, a method of updating a memory with a plurality of memory lines, the memory storing a tree, a plurality of buckets, and a plurality of rules, can include maintaining a copy of the memory with a plurality of memory lines. The method can further include writing a plurality of changes to at least one of the tree, the plurality of buckets, and the plurality of rules to the copy. The method can additionally include determining whether each of the plurality of changes is an independent write or a dependent write. The method can further include merging independent writes to the same line of the copy. The method further includes transferring updates from the plurality of lines of the copy to the plurality of lines of the memory. 1. A method of managing a database including a tree , a plurality of buckets , and a plurality of rules , the method comprising:providing a memory with a plurality of cluster memories, each cluster memory having a plurality of banks and a plurality of access ports, the memory storing the database across the plurality of cluster memories; andpacking nodes of the tree in each of the plurality of cluster memories such that walking the tree accesses a minimal amount of cluster memories in the memory and walking the tree accesses each particular cluster memory no more than once.2. The method of claim 1 , further comprising packing a first particular number of bucket chunks per bucket and a second particular number of rule pointers per bucket chunk based on addresses of the rules.3. The method of claim 1 , further comprising allocating the rules in the memory in a same order as an order of the rules in bucket chunks of the buckets.4. The method of claim 1 , further comprising:replicating a rule or a chunk of rules across a first and second bank in a particular cluster memory such that the rule or chunk of rules can be accessed on the second bank when the first bank has a memory access conflict during a particular clock cycle.5. 
The method ...
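
Claim 4's replication of a rule across two banks can be sketched as a toy single-cycle model (one access per bank per cycle is the usual assumption; the data layout is mine):

```python
def read_rules(requests, banks):
    """requests: rules to read in one clock cycle; banks: list of sets of
    rules stored in each bank. A bank serves at most one access per cycle,
    so a rule replicated on a second bank can still be read there when the
    first bank already has an access conflict."""
    served, busy = [], set()
    for rule in requests:
        for bank_id, bank in enumerate(banks):
            if rule in bank and bank_id not in busy:
                busy.add(bank_id)
                served.append((rule, bank_id))
                break
    return served
```

Without the replica on the second bank, the second request in a conflicting pair would have to wait a cycle.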

Publication date: 18-09-2014

Merging Independent Writes, Separating Dependent And Independent Writes, And Error Roll Back

Number: US20140281809A1
Assignee: Cavium, Inc.

In an embodiment, a method of updating a memory with a plurality of memory lines, the memory storing a tree, a plurality of buckets, and a plurality of rules, can include maintaining a copy of the memory with a plurality of memory lines. The method can further include writing a plurality of changes to at least one of the tree, the plurality of buckets, and the plurality of rules to the copy. The method can additionally include determining whether each of the plurality of changes is an independent write or a dependent write. The method can further include merging independent writes to the same line of the copy. The method further includes transferring updates from the plurality of lines of the copy to the plurality of lines of the memory. 1. A method of updating a memory with a plurality of memory lines , the memory storing a tree , a plurality of buckets , and a plurality of rules , the method comprising:maintaining a copy of the memory with a plurality of memory lines;writing a plurality of changes to at least one of the tree, the plurality of buckets, and the plurality of rules to the copy;determining whether each of the plurality of changes is an independent write or a dependent write;merging independent writes to the same line of the copy in a single line write; andtransferring updates from the plurality of lines of the copy to the plurality of lines of the memory.2. The method of claim 1 , further comprising:transferring the merged independent writes to lines of the memory; andupon successful transfer of the merged independent writes to the lines of the memory, transferring the dependent writes to the lines of the memory.3. The method of claim 1 , further comprising:determining whether writing changes caused an error; andreversing changes to the copy if the changes did cause an error;wherein transferring updates includes transferring updates from the copy to the memory if the changes did not cause an error.4. 
The method of claim 3 , wherein the error indicates ...
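
The shadow-copy, merge, and roll-back flow can be sketched in Python (the four-word line layout and the negative-value stand-in for a write error are my assumptions):

```python
def apply_updates(memory, changes):
    """Stage changes against a copy: merge writes that hit the same memory
    line into a single line write, then transfer one write per line. If any
    change is erroneous, discard the staged state so the original memory is
    left untouched."""
    merged = {}                                   # line -> {offset: value}
    for line, offset, value in changes:
        if value < 0:                             # stand-in for a write error
            return memory, 0                      # roll back: nothing transferred
        merged.setdefault(line, {})[offset] = value
    shadow = dict(memory)
    for line, words in merged.items():            # one transfer per line
        row = list(shadow.get(line, [0, 0, 0, 0]))
        for offset, value in words.items():
            row[offset] = value
        shadow[line] = row
    return shadow, len(merged)
```

Merging the two independent writes to line 0 below halves the number of line transfers.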

Publication date: 16-07-2015

BLOCK MASK REGISTER

Number: US20150201047A1
Assignee: Cavium, Inc.

A packet classification system, methods, and corresponding apparatus are provided for enabling packet classification. A processor of a routing appliance coupled to a network compiles data structures to process keys associated with a particular block mask register (BMR) of a plurality of BMRs. For each BMR of the plurality of BMRs, the processor identifies at least one of or a combination of: i) at least a portion of a field of a plurality of rules and ii) a subset of fields of the plurality of fields to be masked. The processor also builds at least one data structure used to traverse a plurality of rules based on the identified at least one of or a combination of: i) at least a portion of a field of a plurality of rules and ii) a subset of fields of the plurality of fields to be masked. 1. A method, executed by one or more processors, for compiling data structures to process keys associated with a particular block mask register (BMR) of a plurality of BMRs, the method comprising: for each BMR of the plurality of BMRs: identifying at least one of or a combination of: i) at least a portion of a field of a plurality of rules and ii) a subset of fields of the plurality of fields to be masked; and building at least one data structure used to traverse a plurality of rules based on the identified at least one of or a combination of: i) at least a portion of a field of a plurality of rules and ii) a subset of fields of the plurality of fields to be masked. 2. The method of further comprising: for each BMR of the plurality of BMRs, creating a new rule set from the plurality of rules, wherein the identified at least one of or a combination of: i) the at least a portion of a field of the plurality of rules and ii) the subset of fields of the plurality of fields to be masked is masked from the new rule set. 3.
The method of further comprising:mapping the new rule set to a corresponding data structure traversable on a plurality of rules only on the identified at least ...
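
The per-BMR masking step (creating a new rule set with the identified fields masked out) can be sketched as follows (representing rules as field dicts is a toy choice of mine):

```python
def build_bmr_rule_set(rules, masked_fields):
    """For one BMR, drop the masked fields from every rule to form the new
    rule set over which the traversal data structure is built."""
    return [{f: v for f, v in rule.items() if f not in masked_fields}
            for rule in rules]
```

Each BMR would get its own new rule set and its own data structure, built only over the fields that survive the mask.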

Publication date: 06-08-2015

Method And Apparatus For Optimizing Finite Automata Processing

Number: US20150220845A1
Assignee: Cavium LLC

A method, and corresponding apparatus and system are provided for optimizing matching at least one regular expression pattern in an input stream by walking at least one finite automaton in a speculative manner. The speculative manner may include iteratively walking at least two nodes of a given finite automaton, of the at least one finite automaton, in parallel, with a segment, at a current offset within a payload, of a packet in the input stream, based on positively matching the segment at a given node of the at least two nodes walked in parallel, the current offset being updated to a next offset per iteration.
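
The speculative walk, advancing several NFA nodes in parallel with the segment at the current offset, can be sketched as a toy character-level model (real payloads are byte streams; the transition table and names are mine):

```python
def walk_nfa(transitions, active, payload, start_offset=0):
    """Walk all currently active NFA nodes in parallel with the segment at
    the current offset; the offset advances to the next segment once per
    iteration, and a match is reported when the accepting node is reached."""
    offset = start_offset
    while active and offset < len(payload):
        seg = payload[offset]
        active = {nxt for node in active
                  for nxt in transitions.get((node, seg), set())}
        if "accept" in active:
            return offset           # offset of the final matching segment
        offset += 1                 # current offset updated per iteration
    return None                     # active set emptied: no match
```

A transition table for the pattern "ab|ac" would send "s" to "n1" on "a", and "n1" to "accept" on either "b" or "c".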

Publication date: 10-08-2017

METHOD AND APPARATUS FOR VIRTUALIZATION

Number: US20170228183A1
Assignee:

A virtual system on chip (VSoC) is an implementation of a machine that allows for sharing of underlying physical machine resources between different virtual systems. A method or corresponding apparatus of the present invention relates to a device that includes a plurality of virtual systems on chip and a configuring unit. The configuring unit is arranged to configure resources on the device for the plurality of virtual systems on chip as a function of an identification tag assigned to each virtual system on chip. 1. A device comprising: a plurality of virtual systems on chip, each virtual system on chip (VSoC) relating to a subset of a plurality of resources on a single physical chip, the plurality of resources including a memory; and a configuring unit on the single physical chip, the configuring unit arranged to set an access control element of a plurality of access control elements to control whether a given VSoC of the plurality of virtual systems on chip is enabled to access the given memory, wherein a granularity of memory protection provided is based on a memory size of the memory and a total element number of the plurality of access control elements. 2. The device of claim 1, wherein the configuring unit is further arranged to: assign a unique identification tag of a plurality of identification tags to each VSoC; assign each memory subset a given identification tag of the plurality of identification tags, the given identification tag assigned to a corresponding VSoC to which the memory subset is assigned; and provide the granularity of memory protection further based on a number of the plurality of identification tags assigned. 3. The device of claim 1, wherein: the total element number of the plurality of access control elements is static; the programmed memory size of the given memory is variable. 4. The device of claim 2, wherein the number of the plurality of identification tags assigned is variable. 5.
The device of claim 1 , wherein the given memory is a ...
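
The granularity relationship in claim 1 (memory size divided by the number of access control elements) can be sketched as a toy model (names are mine):

```python
class MemoryProtection:
    """The memory is divided into as many regions as there are access
    control elements; each element records which VSoC IDs may access its
    region. Granularity = memory_size / number of elements."""
    def __init__(self, memory_size, num_elements):
        self.granularity = memory_size // num_elements
        self.elements = [set() for _ in range(num_elements)]
    def allow(self, vsoc_id, address):
        self.elements[address // self.granularity].add(vsoc_id)
    def can_access(self, vsoc_id, address):
        return vsoc_id in self.elements[address // self.granularity]
```

With a fixed element count, a larger memory coarsens the protection granularity, which is the trade-off the claim describes.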

Publication date: 30-10-2014

Intelligent Graph Walking

Number: US20140324900A1
Assignee:

An apparatus, and corresponding method, for performing a search for a match of at least one expression in an input stream is presented. A graph including a number of interconnected nodes is generated. A compiler may assign at least one starting node and at least one ending node. The starting node includes a location table with node position information of an ending node and a sub-string value associated with the ending node. Using the node position information and a string comparison function, intermediate nodes located between the starting and ending nodes may be bypassed. The node bypassing may reduce the number of memory accesses required to read the graph. 1. A method comprising:generating a graph including a plurality of interconnected nodes in a device operatively coupled to a network, the plurality of interconnected nodes including at least one starting node and a plurality of ending nodes, the at least one starting node associated with a comparison command and a location table including multiple entries, each entry of the multiple entries including node position information of a respective ending node of the plurality of ending nodes and a location table string value of a sub-string between the at least one starting node and the respective ending node; andemploying the comparison command to compare at least one location table string value of the multiple entries with an input sub-string value from an input stream to detect a common sub-string of at least one expression matching in the input stream, the input stream received from the network via a hardware interface of the device.2. The method of claim 1 , wherein employing is based on positively matching a given segment from the input stream at the at least one starting node and identifying the at least one starting node as a given node of the plurality of interconnected nodes associated with a corresponding location table.3. 
The method of claim 1 , further comprising determining a first length of the input ...
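
The bypass via a location table and a plain string comparison can be sketched as follows (a toy model; the node-by-node fallback walk is elided and all names are mine):

```python
def bypass_walk(location_table, payload, offset):
    """A starting node's location table holds, per ending node, the
    sub-string lying between the two nodes. A single string comparison
    against the payload replaces walking the intermediate nodes, and the
    per-node memory reads that walk would have cost."""
    for end_node, substring in location_table:
        if payload[offset:offset + len(substring)] == substring:
            return end_node, offset + len(substring)   # jump to the ending node
    return None, offset                                # no entry matched: fall back
```

One table lookup plus one comparison stands in for as many memory accesses as there are bypassed intermediate nodes.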

Publication date: 16-07-2020

DATA PROCESSING UNIT HAVING HARDWARE-BASED PARALLEL VARIABLE-LENGTH CODEWORD DECODING

Number: US20200228148A1
Assignee:

A highly programmable device, referred to generally as a data processing unit, having multiple processing units for processing streams of information, such as network packets or storage packets, is described. The data processing unit includes one or more specialized hardware accelerators configured to perform acceleration for various data-processing functions. This disclosure describes a parallel decoding of codewords within input data stream based on a codeword type and position. 1. A method of codeword processing , the method comprising:identifying, with a plurality of detector circuits operating in parallel starting from respective bits of sequential bits of an input data stream, one or more codewords within the input data stream;determining, with the plurality of detector circuits operating in parallel, type values for the one or more identified codewords;determining, with a plurality of fuser circuits, whether codewords identified from one or more of the plurality of detector circuits are combinable with codewords identified by other ones of the plurality of detector circuits to generate a plurality of super codewords based on respective type values of the codewords;generating, with one or more of the fuser circuits, the plurality of super codewords based on the determination of whether the codewords are combinable; andprefix-free, variable length decoding the input data stream based on one or more of the generated super codewords.2. 
The method of claim 1 , further comprising:determining, with a selector circuit, a current decoding position in the input data stream, wherein a start of a first super codeword is at the current decoding position, and wherein a start of a second super codeword is within the first super codeword;determining, with the selector circuit, at least one of a size of the first super codeword or a size of the first super codeword and one or more super codewords combined with the first super codeword;outputting, with the selector circuit, at ...
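
The parallel detect-then-select flow can be sketched with a toy prefix-free code (the fuser/super-codeword stage is omitted, the stream is assumed well formed, and the codebook is mine):

```python
CODEBOOK = {"0": "a", "10": "b", "110": "c"}   # toy prefix-free, variable-length code

def detect(bits, pos):
    """Detector for one bit position: identify the codeword starting there."""
    for cw, sym in CODEBOOK.items():
        if bits.startswith(cw, pos):
            return cw, sym
    return None

def parallel_decode(bits):
    """Run a detector at every bit position (the parallel stage), then a
    selector chains detections starting from the current decoding position."""
    detections = [detect(bits, p) for p in range(len(bits))]
    out, pos = [], 0
    while pos < len(bits):
        cw, sym = detections[pos]   # pick the detection at the current position
        out.append(sym)
        pos += len(cw)
    return "".join(out)
```

Detection at every position is wasteful in software but is exactly what lets hardware detectors run concurrently before the serial selection step.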

Publication date: 25-08-2016

SYSTEM AND METHOD FOR RULE MATCHING IN A PROCESSOR

Number: US20160248739A1
Assignee:

In one embodiment, a system includes a format block configured to receive a key, at least one rule, and rule formatting information. The rule can have one or more dimensions. The format block can be further configured to extract each of the dimensions from the at least one rule. The system can further include a plurality of dimension matching engines (DME). Each DME can be configured to receive the key and a corresponding formatted dimension, and process the key and the corresponding dimension for returning a match or nomatch. The system can further include a post processing block configured to analyze the matches or no matches returned from the DMEs and return a response based on the returned matches or nomatches. 1. A system comprising:a format block configured to (a) receive a key including one or more bits from a packet, at least one rule for matching the key, and rule formatting information, the at least one rule having at least one rule dimension, the at least one rule dimension including a set of one or more bits from a corresponding rule of the at least one rule, and (b) extract each at least one rule dimension from the at least one rule; anda plurality of dimension matching engines (DMEs), each DME, of the plurality of DMEs, coupled to the format block and configured to receive the key and a corresponding formatted dimension, and process the key and the corresponding formatted dimension for returning a match or nomatch.2. The system of claim 1 , wherein the matches or no matches returned from the plurality of DMEs enables an analysis of the matches or no matches and return of a response based on the returned matches or nomatches.3. 
The system of claim 1 , wherein the format block includes:a start block configured to find rule starts, mark invalid or deactivated rules, and pre-calculate terms of the at least one rule dimension;a middle block configured to remove marked rules, extract rule format from headers, and extract priority from headers;a tween block ...
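
Per-dimension matching by the DMEs can be sketched as follows (the exact/range/prefix dimension kinds are common classifier dimension types assumed here, and all names are mine):

```python
def dme(key_field, dim):
    """One dimension matching engine: match one key field against one
    formatted rule dimension, returning match (True) or nomatch (False)."""
    kind, a, b = dim
    if kind == "exact":
        return key_field == a
    if kind == "range":
        return a <= key_field <= b
    if kind == "prefix":                  # compare only the top bits
        return (key_field >> a) == b
    return False

def match_rule(key, rule_dims):
    """The key matches the rule only if every DME reports a match."""
    return all(dme(k, d) for k, d in zip(key, rule_dims))
```

In the hardware, the per-dimension results feed a post-processing block that reduces them to a single response.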

Publication date: 09-09-2021

Multimode cryptographic processor

Number: US20210281394A1
Assignee: Fungible Inc

This disclosure describes techniques that include performing cryptographic operations (encryption, decryption, generation of a message authentication code). Such techniques may involve the data processing unit performing any of multiple modes of encryption, decryption, and/or other cryptographic operation procedures or standards, including, Advanced Encryption Standard (AES) cryptographic operations. In some examples, the security block is implemented as a unified, multi-threaded, high-throughput encryption and decryption system for performing multiple modes of AES operations.

Publication date: 18-12-2014

System and Method to Provide Non-Coherent Access to a Coherent Memory System

Number: US20140372709A1
Assignee:

In one embodiment, a system comprises a memory and a memory controller that provides a cache access path to the memory and a bypass-cache access path to the memory, receives requests to read graph data from the memory on the bypass-cache access path and receives requests to read non-graph data from the memory on the cache access path. A method comprises receiving a request at a memory controller to read graph data from a memory on a bypass-cache access path, receiving a request at the memory controller to read non-graph data from the memory through a cache access path, and arbitrating, in the memory controller, among the requests using arbitration. 1. A system comprising:a memory;a memory controller providing a cache access path to the memory and a bypass-cache access path to the memory, the memory controller receiving requests to access finite automata (FA) data at the memory on the bypass-cache access path and receiving requests to access non-FA data at the memory on the cache access path.2. The system of wherein the memory controller receives requests to access FA data and non-FA data at the memory on the cache access path.3. The system of wherein the non-FA data comprises packet data.4. The system of wherein the memory stores FA data and non-FA data.5. The system of wherein the memory controller reads the requested FA data or non-FA data.6. The system of wherein the memory controller receives the requests to access FA data from a co-processor.7. The system of wherein the co-processor includes at least one of a deterministic automata processing unit and a nondeterministic automata processing unit.8. The system of wherein the co-processor is configured to stop sending access requests to the memory controller to stop the access of selected FA data from the memory when the selected FA data is being written to the memory on the cache access path.9. 
The system of wherein the memory controller receives requests to access FA data and non-FA data from a cache controller. ...
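
Arbitration between the two access paths can be sketched as a toy round-robin arbiter (the round-robin policy is my assumption; the claims only require that the controller arbitrates among the paths):

```python
from collections import deque

def arbitrate(cache_path, bypass_path):
    """The memory controller alternates between the cache access path
    (non-FA data) and the bypass-cache path (FA data) while both have
    pending requests, so neither path starves the other."""
    order, turn = [], 0
    queues = [deque(cache_path), deque(bypass_path)]
    while queues[0] or queues[1]:
        if queues[turn]:
            order.append(queues[turn].popleft())
        turn ^= 1                 # alternate paths each grant slot
    return order
```

Keeping FA fetches off the cache path prevents graph traffic from evicting packet data, which is the point of the bypass path.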

Publication date: 15-10-2015

Processing Of Finite Automata Based On Memory Hierarchy

Number: US20150293846A1
Assignee: Cavium, Inc.

At least one processor may be operatively coupled to a plurality of memories and a node cache and configured to walk nodes of a per-pattern non-deterministic finite automaton (NFA). Nodes of the per-pattern NFA may be stored amongst one or more of the plurality of memories based on a node distribution determined as a function of hierarchical levels mapped to the plurality of memories and per-pattern NFA storage allocation settings configured for the hierarchical levels, optimizing run time performance of the walk. 1. A security appliance operatively coupled to a network , the security appliance comprising:a plurality of memories configured to store nodes of at least one finite automaton, the at least one finite automaton including a given per-pattern non-deterministic finite automaton (NFA) of at least one per-pattern NFA, the given per-pattern NFA generated for a respective regular expression pattern and including a respective set of nodes; andat least one processor operatively coupled to the plurality of memories and configured to walk nodes of the respective set of nodes with segments of a payload of an input stream to match the respective regular expression pattern in the input stream, the respective set of nodes stored amongst one or more memories of the plurality of memories based on a node distribution determined as a function of hierarchical levels mapped to the plurality of memories and per-pattern NFA storage allocation settings configured for the hierarchical levels.2. The security appliance of claim 1 , wherein the respective set of nodes of the given per-pattern NFA are statically stored amongst the one or more memories of the plurality of memories based on the node distribution.3. 
The security appliance of claim 1 , wherein to walk nodes of the respective set of nodes with segments of the payload of the input stream claim 1 , the at least one processor is further configured to walk from a given node to a next node of the respective set of nodes based ...
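
The hierarchy-driven node distribution can be sketched as a toy spill model (reading the "storage allocation settings" as a per-NFA node budget per level is my assumption):

```python
def distribute_nodes(nodes, levels):
    """levels: ordered (memory_name, max_nodes_per_nfa) settings, fastest
    first. Nodes of one per-pattern NFA fill the fastest level until its
    allocation is used up, then spill to the next level; anything beyond
    the last level lands in external memory."""
    placement, i, used = {}, 0, 0
    for node in nodes:
        while i < len(levels) and used >= levels[i][1]:
            i, used = i + 1, 0          # this level's allocation is exhausted
        placement[node] = levels[i][0] if i < len(levels) else "external"
        used += 1
    return placement
```

Placing the earliest (most frequently walked) nodes in the fastest memory is what optimizes run-time performance of the walk.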
