Total found: 58. Displayed: 58.
Publication date: 12-04-2012

Efficient implementation of arrays of structures on SIMT and SIMD architectures

Number: US20120089792A1
Assignee: Linares Medical Devices LLC, Nvidia Corp

One embodiment of the present invention sets forth a technique providing an optimized way to allocate and access memory across a plurality of thread/data lanes. Specifically, the device driver receives an instruction targeted to a memory set up as an array of structures of arrays. The device driver computes an address within the memory using information about the number of thread/data lanes and parameters from the instruction itself. The result is a memory allocation and access approach where the device driver properly computes the target address in the memory. Advantageously, processing efficiency is improved where memory in a parallel processing subsystem is internally stored and accessed as an array of structures of arrays, proportional to the SIMT/SIMD group width (the number of threads or lanes per execution group).
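
The address arithmetic behind such a layout can be sketched in a few lines. Below is a minimal, hypothetical CUDA illustration of an "array of structures of arrays" index computation, assuming a group width of 32 lanes and a structure of two 4-byte fields; the names and the exact layout are illustrative, not taken from the patent.

```cuda
#include <cstddef>

constexpr int W = 32;       // SIMT/SIMD group width (assumed)
constexpr int FIELDS = 2;   // 4-byte fields per structure (assumed)

// Flat word index of field f of element i under the ASoA layout:
// groups of W structures are stored as a structure of W-wide arrays,
// so the W lanes of a group read consecutive words in one transaction.
__host__ __device__ inline size_t asoa_index(size_t i, int f) {
    size_t group = i / W;              // which block of W structures
    size_t lane  = i % W;              // position within the block
    return group * (size_t)FIELDS * W  // skip whole preceding blocks
         + (size_t)f * W               // select the field's W-wide sub-array
         + lane;                       // lane offset: contiguous for a warp
}

__global__ void read_field0(const float* buf, float* out, size_t n) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = buf[asoa_index(i, 0)];  // adjacent lanes, adjacent words
}
```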

Publication date: 30-08-2012

Programmable graphics processor for multithreaded execution of programs

Number: US20120218267A1
Assignee: Individual

A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

Publication date: 30-05-2013

INDIRECT FUNCTION CALL INSTRUCTIONS IN A SYNCHRONOUS PARALLEL THREAD PROCESSOR

Number: US20130138926A1
Assignee: NVIDIA CORPORATION

An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.

1. A system for executing indirect function calls for synchronous parallel processing threads, the system comprising: an execution stack configured to store thread state information for a number of threads that are concurrently executed by the system; a controller that is coupled to the execution stack and configured to: receive program instructions including control instructions; execute the control instructions by pushing and popping the thread state information; maintain an active mask that indicates active threads in a thread group that should be processed in parallel; and serialize execution of a plurality of indirect function calls for each unique pointer within a set of pointers that corresponds to any of the active threads; and multiple processing engines that are configured to receive the program instructions and execute each program instruction in parallel for the threads in the thread group that should be processed in parallel according to the active mask.

2. The system of claim 1, wherein the controller is further configured to: receive a first control instruction that references the set of pointers to one or more functions in a program, each pointer in the set of pointers specifying an address of a corresponding function in the one or more functions; determine if two pointers in the set of pointers corresponding to active threads in the thread group are different, indicating that the active threads diverge during execution of the indirect function calls; push a first token onto the execution stack when the active threads diverge, the token ...
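
A software analogue of this serialization can be written with warp intrinsics. The sketch below, assuming compute capability 7.0 or later for the *_sync intrinsics and 64-bit function pointers, picks a leader lane, broadcasts its function pointer, lets every lane holding the same pointer call it together, and repeats until all unique pointers have been served. The UnaryFn type is illustrative; this is a model of the idea, not the patented hardware mechanism.

```cuda
typedef float (*UnaryFn)(float);

__device__ float call_serialized(UnaryFn fp, float x) {
    float result = 0.0f;
    unsigned mask = __activemask();        // lanes participating in the call
    unsigned pending = mask;               // lanes whose call is still owed
    while (pending) {
        int leader = __ffs(pending) - 1;   // lowest pending lane number
        // Broadcast the leader's pointer so the call target is warp-uniform.
        UnaryFn target =
            (UnaryFn)__shfl_sync(mask, (unsigned long long)fp, leader);
        unsigned same = __ballot_sync(mask, target == fp);
        if (target == fp)
            result = target(x);            // one call per unique pointer
        pending &= ~same;                  // retire the lanes just served
    }
    return result;
}
```

On compute capability 7.0 and later, __match_any_sync(mask, value) returns the set of lanes holding the same value and can replace the leader-and-ballot grouping in a single step.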

Publication date: 16-01-2014

COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS

Number: US20140019724A1
Assignee: NVIDIA CORPORATION

One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

1. A method for performing a scan operation across multiple threads, the method comprising: receiving a barrier instruction that specifies the scan operation for execution by a first thread of the multiple threads; combining a value associated with the first thread with a scan result for the multiple threads; communicating the scan result to the first thread; and causing another instruction to be executed without waiting until the barrier instruction is received by a second thread of the multiple threads.

2. The method of claim 1, further comprising the steps of: determining that the second thread is the last thread of the multiple threads to receive the barrier instruction; and initializing the scan result.

3. The method of claim 1, wherein the communication of the scan result to the first thread occurs before the value associated with the first thread is combined with the scan result.

4. The method of claim 1, wherein the communication of the scan result to the first thread occurs after the value associated with the first thread is combined with the scan result.

5. The method of claim 1, ...
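
CUDA's public intrinsics expose a narrow form of this barrier-plus-aggregation idea: __syncthreads_count(), __syncthreads_and(), and __syncthreads_or() (compute capability 2.0 and later) combine the block-wide barrier with a reduction over a per-thread predicate. A minimal sketch:

```cuda
__global__ void count_positive(const float* x, int n, int* blockCounts) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Every thread must reach the barrier, so out-of-range threads
    // contribute a zero predicate instead of skipping it.
    int pred = (i < n) && (x[i] > 0.0f);
    // Barrier and reduction fused: the return value is the number of
    // threads in the block whose predicate was non-zero, delivered to
    // every thread once all of them have arrived.
    int positives = __syncthreads_count(pred);
    if (threadIdx.x == 0)
        blockCounts[blockIdx.x] = positives;
}
```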

Publication date: 25-09-2014

PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS

Number: US20140285500A1
Assignee: NVIDIA CORPORATION

A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

1. A system, comprising: a processor unit; and a graphics processing unit that includes at least one execution pipeline that is configured to execute a plurality of threads to process vertex data and to execute a plurality of threads to process fragment data.

2. The system of claim 1, further comprising a memory that stores vertex program instructions that are executed by two or more threads in the at least one execution pipeline.

3. The system of claim 2, wherein the memory further stores fragment program instructions that are executed by two or more threads in the at least one execution pipeline.

4. The system of claim 2, wherein the graphics processing unit further includes a texture unit configured to read texture maps from the memory.

5. The system of claim 1, wherein, during a first pass through the at least one execution pipeline, a first plurality of threads is configured to perform vertex processing operations on vertex data.

6. The system of claim 1, wherein, during a second pass through the at least one execution pipeline, a second plurality of threads is configured to perform fragment processing operations on fragment data.

7. The system of claim 1, wherein the graphics processing unit further includes a raster unit configured to output fragment data and to perform scan conversion operations.

8. The system of claim 1, wherein the graphics processing unit further includes a raster ...

Publication date: 17-08-2017

INSTRUCTIONS FOR MANAGING A PARALLEL CACHE HIERARCHY

Number: US20170235581A1
Assignee:

A technique for managing a parallel cache hierarchy that includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

1. A method, comprising: receiving an instruction from a scheduler unit, wherein the instruction comprises a load instruction or a store instruction and is associated with an address that identifies a memory region; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of a parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction in accordance with the policy identified by the cache operations modifier.

2. The method of claim 1, wherein the parallel cache hierarchy includes an L1 cache level and an L2 cache level.

3. The method of claim 2, wherein each L1 cache included in the L1 cache level is associated with a different processing core included in a processor, and the L2 cache level includes at least one L2 cache that each processing core is configured to access.

4. A system, comprising: a memory; and a processing unit that: receives an instruction from a scheduler unit, wherein the instruction comprises a load instruction or a store instruction and is associated with an address that identifies a memory region; determines that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of a parallel cache hierarchy; and executes the instruction and caches the data associated with the instruction in accordance with the ...
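
The PTX cache operators are the instruction-set face of such per-instruction cache policies: .ca (cache at all levels, the default), .cg (cache globally, skipping L1), .cs (streaming, evict-first), among others. A hedged inline-assembly sketch; newer CUDA toolkits also expose load/store intrinsics with cache hints for the same purpose:

```cuda
__device__ float load_cg(const float* p) {
    float v;
    // ld.global.cg: cache in L2 only, useful when the data will not be
    // reused by this SM and should not pollute its L1.
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

__device__ void store_cs(float* p, float v) {
    // st.global.cs: streaming store, marks the line evict-first.
    asm volatile("st.global.cs.f32 [%0], %1;" :: "l"(p), "f"(v) : "memory");
}
```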

Publication date: 07-09-2017

PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS

Number: US20170256022A1
Assignee:

A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

1. A system, comprising: a first processor; and a graphics processing unit that includes at least one execution pipeline, wherein the at least one execution pipeline executes a plurality of threads to process vertex data and executes a plurality of threads to process fragment data, and wherein the at least one execution pipeline is programmable to process vertex data as well as fragment data.

2. The system of claim 1, further comprising a memory that stores vertex program instructions that are executed by two or more threads in the at least one execution pipeline.

3. The system of claim 2, wherein the memory further stores fragment program instructions that are executed by two or more threads in the at least one execution pipeline.

4. The system of claim 2, wherein the graphics processing unit further includes a texture unit that reads texture maps from the memory.

5. The system of claim 1, wherein, during a first pass through the at least one execution pipeline, a first plurality of threads performs vertex processing operations on vertex data.

6. The system of claim 5, wherein, during a subsequent pass through the at least one execution pipeline, a second plurality of threads performs fragment processing operations on fragment data.

7. The system of claim 1, wherein the graphics processing unit further includes a rasterizer that outputs fragment data and performs scan conversion operations.

8. The system ...

Publication date: 13-10-2016

PROGRAMMABLE GRAPHICS PROCESSOR FOR MULTITHREADED EXECUTION OF PROGRAMS

Number: US20160300319A9
Assignee: NVIDIA CORPORATION

A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

1. A system, comprising: a processor unit; and a graphics processing unit that includes at least one execution pipeline that is configured to execute a plurality of threads to process vertex data and to execute a plurality of threads to process fragment data.

2. The system of claim 1, further comprising a memory that stores vertex program instructions that are executed by two or more threads in the at least one execution pipeline.

3. The system of claim 2, wherein the memory further stores fragment program instructions that are executed by two or more threads in the at least one execution pipeline.

4. The system of claim 2, wherein the graphics processing unit further includes a texture unit configured to read texture maps from the memory.

5. The system of claim 1, wherein, during a first pass through the at least one execution pipeline, a first plurality of threads is configured to perform vertex processing operations on vertex data.

6. The system of claim 1, wherein, during a second pass through the at least one execution pipeline, a second plurality of threads is configured to perform fragment processing operations on fragment data.

7. The system of claim 1, wherein the graphics processing unit further includes a raster unit configured to output fragment data and to perform scan conversion operations.

8. The system of claim 1, wherein the graphics processing unit further includes a raster ...

Publication date: 08-12-2016

COOPERATIVE THREAD ARRAY REDUCTION AND SCAN OPERATIONS

Number: US20160357560A1
Assignee:

One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

1. A subsystem for performing an aggregation operation across multiple threads, the subsystem comprising: a barrier instruction execution unit that is configured to: receive a barrier aggregation instruction that specifies the aggregation operation for execution by a first thread of the multiple threads; generate an aggregation result based on the aggregation operation; communicate the aggregation result to each thread of the multiple threads; and wait until each of the multiple threads has received the barrier aggregation instruction before any additional instructions are executed.

2. The subsystem of claim 1, wherein the barrier aggregation instruction specifies a number of the multiple threads that are synchronized by the barrier aggregation instruction.

3. The subsystem of claim 2, wherein each of the threads has received the barrier aggregation instruction when a count equals the number of the multiple threads specified by the barrier aggregation instruction.

4. The subsystem of claim 1, wherein the first thread is included in a first thread group, and the ...

Publication date: 25-01-2011

Structured programming control flow in a SIMD architecture

Number: US7877585B1
Assignee: Nvidia Corp

One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. A disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture. Additional control instructions are used to set up thread processing target addresses for synchronization, breaks, and returns.
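
The stack-and-mask mechanics can be modeled on the host in a few lines. The sketch below is a toy model under simplifying assumptions (one warp, masks as 32-bit words, taken path executed first); it illustrates the general reconvergence-stack idea, not the patented hardware:

```cuda
#include <cstdint>
#include <stack>

struct Token {
    uint32_t mask;   // active mask to restore when this token is popped
    int      pc;     // where execution resumes (else target or join point)
};

struct SimdCore {
    uint32_t          active;   // one bit per thread/lane
    std::stack<Token> stk;

    // Execute a divergent conditional branch: lanes in takenMask run first.
    void branch(uint32_t takenMask, int elsePc, int joinPc) {
        stk.push({active, joinPc});          // sync token for reconvergence
        uint32_t notTaken = active & ~takenMask;
        if (notTaken)
            stk.push({notTaken, elsePc});    // else side runs later
        active &= takenMask;                 // narrow mask to the taken side
    }

    // Current path finished: restore the saved (possibly wider) mask and
    // return the program counter to jump to.
    int pop() {
        Token t = stk.top();
        stk.pop();
        active = t.mask;
        return t.pc;
    }
};
```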

Publication date: 08-11-2011

Lock mechanism to enable atomic updates to shared memory

Number: US8055856B2
Assignee: Nvidia Corp

A system and method for locking and unlocking access to a shared memory for atomic operations provides immediate feedback indicating whether or not the lock was successful. Read data is returned to the requestor with the lock status. The lock status may be changed concurrently when locking during a read or unlocking during a write. Therefore, it is not necessary to check the lock status as a separate transaction prior to or during a read-modify-write operation. Additionally, a lock or unlock may be explicitly specified for each atomic memory operation. Therefore, lock operations are not performed for operations that do not modify the contents of a memory location.
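
For contrast, here is the software pattern without such hardware support: a spin lock built from atomicCAS, where checking and changing the lock status are separate transactions around the read-modify-write. That separate check is exactly the overhead that returning the lock status with the read data removes. Lock words live in global memory here for simplicity; all names are illustrative:

```cuda
// locks[k] == 0 means unlocked, 1 means locked; the done flag keeps the
// winning lane from being starved by its own warp on pre-Volta scheduling.
__global__ void locked_increment(int* counters, int* locks,
                                 const int* idx, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    int k = idx[i];
    for (bool done = false; !done; ) {
        if (atomicCAS(&locks[k], 0, 1) == 0) {   // try to acquire
            counters[k] += 1;                    // guarded read-modify-write
            __threadfence();                     // publish before release
            atomicExch(&locks[k], 0);            // release
            done = true;
        }
    }
}
```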

Publication date: 09-05-2000

Mousepad telephone

Number: US6061446A

An improved mousepad (100) includes a telephone integrated therewith. A keypad (108) for the telephone is disposed beneath a mylar layer (130). The keypad (108) includes a plurality of capacitive switches (108a, 108b) disposed just beneath the mylar layer. In a mousepad mode, a mouse may be moved around the surface of the mylar in the standard fashion. In a telephone mode, the user may activate the keypad (108) by pressing the buttons which are viewable beneath the mylar.

Publication date: 12-04-2011

Maximized memory throughput using cooperative thread arrays

Number: US7925860B1
Assignee: Nvidia Corp

In parallel processing devices, for streaming computations, processing of each data element of the stream may not be computationally intensive and thus processing may take relatively small amounts of time to compute as compared to memory accesses times required to read the stream and write the results. Therefore, memory throughput often limits the performance of the streaming computation. Generally stated, provided are methods for achieving improved, optimized, or ultimately, maximized memory throughput in such memory-throughput-limited streaming computations. Streaming computation performance is maximized by improving the aggregate memory throughput across the plurality of processing elements and threads. High aggregate memory throughput is achieved by balancing processing loads between threads and groups of threads and a hardware memory interface coupled to the parallel processing devices.
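
A familiar software counterpart to this load-balancing idea is the grid-stride streaming kernel: each thread walks the stream at a whole-grid stride, so every warp issues coalesced memory transactions and performance is limited only by aggregate bandwidth. A generic sketch, not the patent's method:

```cuda
__global__ void saxpy_stream(float a, const float* x, float* y, size_t n) {
    size_t stride = (size_t)gridDim.x * blockDim.x;
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
         i < n; i += stride) {
        // The per-element compute is trivial; throughput is bounded by how
        // well the loads and stores keep the memory interface saturated.
        y[i] = a * x[i] + y[i];
    }
}
```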

Publication date: 05-10-2010

Generating event signals for performance register control using non-operative instructions

Number: US7809928B1
Assignee: Nvidia Corp

One embodiment of an instruction decoder includes an instruction parser configured to process a first non-operative instruction and to generate a first event signal corresponding to the first non-operative instruction, and a first event multiplexer configured to receive the first event signal from the instruction parser, to select the first event signal from one or more event signals and to transmit the first event signal to an event logic block. The instruction decoder may be implemented in a multithreaded processing unit, such as a shader unit, and the occurrences of the first event signal may be tracked when one or more threads are executed within the processing unit. The resulting event signal count may provide a designer with a better understanding of the behavior of a program, such as a shader program, executed within the processing unit, thereby facilitating overall processing unit and program design.
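
PTX carries a non-operative instruction of exactly this flavor: pmevent, which pulses a performance-monitor event line without changing any architectural state. A minimal, hedged sketch; which counters observe the event is tool- and chip-specific:

```cuda
__device__ void mark_event() {
    // Trigger performance-monitor event line 0; a no-op for program state.
    asm volatile("pmevent 0;");
}
```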

Publication date: 18-09-2012

Unified addressing and instructions for accessing parallel memory spaces

Number: US8271763B2
Assignee: Nvidia Corp

One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.
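
CUDA's generic addressing is the programming-model face of this technique: one pointer type and one load instruction work whether the address falls in the global, shared, or local window, and the hardware resolves the space. A small sketch; __isShared() is a real CUDA space-query intrinsic, while the function names are otherwise illustrative:

```cuda
#include <cstdio>

// One generic load path regardless of which memory space p names.
__device__ float sum2(const float* p) {
    return p[0] + p[1];
}

__global__ void demo(const float* g) {
    __shared__ float s[2];
    s[0] = 1.0f; s[1] = 2.0f;
    __syncthreads();
    float a = sum2(g);   // generic pointer into global memory
    float b = sum2(s);   // the same code path for a shared-memory address
    if (threadIdx.x == 0 && __isShared(s))   // query the resolved space
        printf("%f %f\n", a, b);
}
```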

Publication date: 10-11-2009

Structured programming control flow using a disable mask in a SIMD architecture

Number: US7617384B1
Assignee: Nvidia Corp

One embodiment of a computing system configured to manage divergent threads in a SIMD thread group includes a stack configured to store state information for processing control instructions. A parallel processing unit is configured to perform the steps of determining if one or more threads diverge during execution of a conditional control instruction. Threads that exit a program are identified as idle by a disable mask. Other threads that are disabled may be enabled once the divergent threads reach an instruction that enables the disabled threads. Use of the disable mask allows for the use of conditional return and break instructions in a multithreaded SIMD architecture.

Publication date: 04-03-1971

DESULFURATION OF HEAVY HYDROCARBON OILS

Number: BE755778A
Author: G C Wray, G V Nelson, W R Coons
Assignee: Texaco Development Corp

Publication date: 25-11-2008

Register based queuing for texture requests

Number: US7456835B2
Assignee: Nvidia Corp

A graphics processing unit can queue a large number of texture requests to balance out the variability of texture requests without the need for a large texture request buffer. A dedicated texture request buffer queues the relatively small texture commands and parameters. Additionally, for each queued texture command, an associated set of texture arguments, which are typically much larger than the texture command, are stored in a general purpose register. The texture unit retrieves texture commands from the texture request buffer and then fetches the associated texture arguments from the appropriate general purpose register. The texture arguments may be stored in the general purpose register designated as the destination of the final texture value computed by the texture unit. Because the destination register must be allocated for the final texture value as texture commands are queued, storing the texture arguments in this register does not consume any additional registers.

Publication date: 31-03-2011

Configurable cache for multiple clients

Number: US20110078367A1
Assignee: Nvidia Corp

One embodiment of the present invention sets forth a technique for providing an L1 cache that is a central storage resource. The L1 cache services multiple clients with diverse latency and bandwidth requirements. The L1 cache may be reconfigured to create multiple storage spaces, enabling the L1 cache to replace dedicated buffers, caches, and FIFOs in previous architectures. A "direct mapped" storage region that is configured within the L1 cache may replace dedicated buffers, FIFOs, and interface paths, allowing clients of the L1 cache to exchange attribute and primitive data. The direct mapped storage region may be used as a global register file. A "local and global cache" storage region configured within the L1 cache may be used to support load/store memory requests to multiple spaces. These spaces include global, local, and call-return stack (CRS) memory.
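
The public CUDA API exposes a coarse version of this reconfigurability: a program can trade L1 capacity against shared memory per kernel. cudaFuncSetCacheConfig() is a real runtime call; the kernel here is a placeholder:

```cuda
#include <cuda_runtime.h>

__global__ void myKernel(float* out) {   // hypothetical kernel
    out[threadIdx.x] = 0.0f;
}

void configure() {
    // Ask the driver to favor a larger L1 over shared memory when this
    // kernel runs (a hint; the hardware picks a legal split).
    cudaFuncSetCacheConfig(myKernel, cudaFuncCachePreferL1);
}
```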

Publication date: 31-03-2011

Support for Non-Local Returns in Parallel Thread SIMD Engine

Number: US20110078418A1
Assignee: Nvidia Corp

One embodiment of the present invention sets forth a method for executing a non-local return instruction in a parallel thread processor. The method comprises the steps of receiving, within the thread group, a first long jump instruction and, in response, popping a first token from the execution stack. The method also comprises determining whether the first token is a first long jump token that was pushed onto the execution stack when a first push instruction associated with the first long jump instruction was executed, and when the first token is the first long jump token, jumping to the second instruction based on the address specified by the first long jump token, or, when the first token is not the first long jump token, disabling the active thread until the first long jump token is popped from the execution stack.

Publication date: 24-03-2011

Instructions for managing a parallel cache hierarchy

Number: US20110072213A1
Assignee: Nvidia Corp

A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

Publication date: 01-04-2008

System and method for managing divergent threads in a SIMD architecture

Number: US7353369B1
Assignee: Nvidia Corp

One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is not a branch instruction, determining whether the program instruction includes a pop-synchronization bit, and updating an active program counter, where the fashion in which the active program counter is updated relates to whether the program instruction includes a pop-synchronization bit.

Publication date: 24-03-2011

Credit-Based Streaming Multiprocessor Warp Scheduling

Number: US20110072244A1
Assignee: Nvidia Corp

One embodiment of the present invention sets forth a technique for ensuring cache access instructions are scheduled for execution in a multi-threaded system to improve cache locality and system performance. A credit-based technique may be used to control instruction-by-instruction scheduling for each warp in a group so that the group of warps is processed uniformly. A credit is computed for each warp and the credit contributes to a weight for each warp. The weight is used to select instructions for the warps that are issued for execution.

Publication date: 07-08-2012

Hierarchical processor array

Number: US8237705B2
Assignee: Nvidia Corp

Apparatuses and methods are presented for a hierarchical processor. The processor comprises, at a first level of hierarchy, a plurality of similarly structured first level components, wherein each of the plurality of similarly structured first level components includes at least one combined function module capable of performing multiple classes of graphics operations, each of the multiple classes of graphics operations being associated with a different stage of graphics processing. The processor comprises, at a second level of hierarchy, a plurality of similarly structured second level components positioned within each one of the plurality of similarly structured first level components, wherein each of the plurality of similarly structured second level components is capable of carrying out different operations from the multiple classes of graphics operations, wherein each first level component is adapted to distribute work to the plurality of similarly structured second level components positioned within the first level component.

Publication date: 06-06-2012

Instructions for managing a parallel cache hierarchy

Number: GB2486132A
Assignee: Nvidia Corp

A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

Publication date: 23-07-1974

Hydrocracking effluent cooling prior to hydrodesulfurization

Number: US3825485A
Author: G Nelson, J Colvert, W Coons
Assignee: Texaco Inc

A residue-containing petroleum fraction, 50 percent boiling above 1,000° F., is hydrocracked and then hydrodesulfurized. During the onstream period, the conversion level of material boiling above 1,000° F. is maintained substantially constant while the sulfur content of the hydrocracked effluent increases. The sulfur content of a product fraction is maintained substantially constant by gradually increasing the hydrodesulfurization temperature. The effluent from the hydrocracking reaction is cooled by adding an aromatic-rich fraction, and the cooled mixture is passed to the desulfurization reaction.

Publication date: 01-08-2012

Unified addressing and instructions for accessing parallel memory spaces

Number: EP2480985A1
Assignee: Nvidia Corp

One embodiment of the present invention sets forth a technique for unifying the addressing of multiple distinct parallel memory spaces into a single address space for a thread. A unified memory space address is converted into an address that accesses one of the parallel memory spaces for that thread. A single type of load or store instruction may be used that specifies the unified memory space address for a thread instead of using a different type of load or store instruction to access each of the distinct parallel memory spaces.

Publication date: 19-01-1976

PROCESS FOR HYDRODESULFURIZATION OF HEAVY HYDROCARBON OILS

Number: DK131685C
Author: G C Wray, G V Nelson, W R Coons
Assignee: Texaco Development Corp

Publication date: 04-03-2014

Systems and method for managing divergent threads in a SIMD architecture

Number: US8667256B1
Assignee: Nvidia Corp

One embodiment of a computing system configured to manage divergent threads in a thread group includes a stack configured to store at least one token and a multithreaded processing unit. The multithreaded processing unit is configured to perform the steps of fetching a program instruction, determining that the program instruction is a branch instruction, determining that the program instruction is not a return or break instruction, determining whether the program instruction includes a set-synchronization bit, and updating an active program counter, where the manner in which the active program counter is updated depends on a branch instruction type.

Publication date: 16-12-2007

System and method for grouping execution threads

Number: TW200745953A
Assignee: Nvidia Corp

Publication date: 02-05-1893

Design for a top

Number: USD22378S
Author: George W. Coon
Assignee:

Publication date: 10-10-1905

Garden tool

Number: CA95477A
Assignee: Individual

Publication date: 04-03-1974

[UNK]

Number: NO129150B
Author: G Nelson, G Wray, W Coons
Assignee: Texaco Development Corp

Publication date: 09-10-1973

Magnetic shielding and x-ray image intensifier tube using same

Number: US3764841A
Author: W Coon
Assignee: Individual

An x-ray image intensifier tube is evacuated by means of an appendage magnetically confined glow discharge getter-ion vacuum pump. A shell of soft iron is disposed surrounding the magnet of the appendage pump for shielding the image intensifier tube from the stray magnetic field of the appendage pump. A coating of magnetic material on the envelope of the intensifier tube, in the region facing the pump, provides additional magnetic shielding for shielding the interior of the tube from the stray magnetic field of the pump to further improve the resolution of the image intensifier tube.

Publication date: 30-08-2012

Shared single-access memory with management of multiple parallel requests

Number: US20120221808A1
Assignee: Nvidia Corp

A memory is used by concurrent threads in a multithreaded processor. Any addressable storage location is accessible by any of the concurrent threads, but only one location at a time is accessible. The memory is coupled to parallel processing engines that generate a group of parallel memory access requests, each specifying a target address that might be the same or different for different requests. Serialization logic selects one of the target addresses and determines which of the requests specify the selected target address. All such requests are allowed to proceed in parallel, while other requests are deferred. Deferred requests may be regenerated and processed through the serialization logic so that a group of requests can be satisfied by accessing each different target address in the group exactly once.
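
The serialization logic is easy to model on the host: each pass selects one pending target address, services every request to that address at once (a broadcast), and defers the rest, so the group touches each distinct address exactly once. A toy model under assumed group size 32, not the hardware:

```cuda
#include <cstdint>

// addr[0..n) are the target addresses of one group of parallel requests
// (n <= 32 assumed for this sketch). Returns the number of serialized
// passes, i.e. the count of distinct addresses touched.
int serialize_passes(const uint32_t* addr, int n) {
    bool pending[32];
    for (int i = 0; i < n; ++i) pending[i] = true;
    int passes = 0;
    while (true) {
        uint32_t chosen = 0;
        bool found = false;
        for (int i = 0; i < n && !found; ++i)   // pick one pending address
            if (pending[i]) { chosen = addr[i]; found = true; }
        if (!found) break;                      // every request satisfied
        for (int i = 0; i < n; ++i)             // broadcast: serve matches
            if (pending[i] && addr[i] == chosen) pending[i] = false;
        ++passes;                               // the rest retry next pass
    }
    return passes;
}
```

With all 32 addresses equal the model completes in one pass; with 32 distinct addresses it takes 32, matching the one-access-per-distinct-address behavior described in the abstract.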

Publication date: 06-11-1900

Fire-extinguisher.

Number: US661442A
Author: Gustavus W Coon
Assignee: JESSE D KINGSBERY

Publication date: 13-03-1906

Screen for potato diggers

Number: CA98019A
Author: Horace W. Coon
Assignee: Individual

Publication date: 26-11-1946

Electric iron

Number: CA438149A
Author: W. Coons William
Assignee: Individual

Publication date: 18-12-1972

[UNK]

Number: SE352100B
Author: G Nelson, G Wray, W Coons
Assignee: Texaco Development Corp

Publication date: 12-04-1976

Process for regenerating desulfurization catalysts

Number: DK132288C
Author: G C Wray, G V Nelson, W R Coons
Assignee: Texaco Development Corp

Publication date: 30-11-1926

Safety razor drier

Number: CA266209A
Author: W. Coons James
Assignee: Individual

Publication date: 16-02-2008

System and method for processing thread groups in a SIMD architecture

Number: TW200809615A
Assignee: Nvidia Corp

Publication date: 15-09-1942

Dealkylation

Number: US2295672A
Assignee: Bakelite Corp

Publication date: 21-06-2007

System and method for processing thread groups in a SIMD architecture

Number: JP2007157154A
Assignee: Nvidia Corp

[Problem] To provide a SIMD processor that achieves higher data processing throughput by using its hardware resources efficiently. [Solution] The effective width of a SIMD processor is extended by clocking the instruction processing side of the SIMD processing unit at a fraction of the speed of the data processing side and by providing multiple execution pipelines, each having multiple data paths. Higher data processing throughput is thereby achieved while instructions are fetched and issued once per clock. This configuration also allows one large thread group to be clustered and executed together through the SIMD processor, so that greater memory efficiency can be achieved for certain types of operations, such as the texture memory accesses performed in graphics processing. [Selected figure] FIG. 6

Publication date: 01-10-2010

System and method for processing thread groups in a SIMD architecture

Number: TWI331300B
Assignee: Nvidia Corp

Publication date: 28-12-1915

Automobile-radiator.

Number: US1166327A
Author: Stanley W Coon
Assignee: Individual

Publication date: 25-11-1974

[UNK]

Number: SE371649B
Author: G Nelson, G Wray, W Coons
Assignee: Texaco Development Corp

Publication date: 04-01-1973

Process for hydrodesulfurization of petroleum fractions

Number: BR6804766D0
Author: G Nelson, G Wray, W Coons
Assignee: Texaco Development Corp

Publication date: 18-08-1975

Process for hydrodesulfurization of heavy hydrocarbon oils.

Number: DK131685B
Assignee: Texaco Development Corp

Publication date: 15-09-1942

Dealkylation using nickel sulphide catalyst

Number: US2295673A
Assignee: Bakelite Corp

Publication date: 20-04-1926

Device for drying safety razors

Number: US1581426A
Author: James W Coons
Assignee: Individual

Publication date: 24-04-2018

Cache operations and policies for a multi-threaded client

Number: US09952977B2
Assignee: Nvidia Corp

A method for managing a parallel cache hierarchy in a processing unit. The method including receiving an instruction that includes a cache operations modifier that identifies a level of the parallel cache hierarchy in which to cache data associated with the instruction; and implementing a cache replacement policy based on the cache operations modifier.

Publication date: 28-11-2017

Cooperative thread array reduction and scan operations

Number: US09830197B2
Assignee: Nvidia Corp

One embodiment of the present invention sets forth a technique for performing aggregation operations across multiple threads that execute independently. Aggregation is specified as part of a barrier synchronization or barrier arrival instruction, where in addition to performing the barrier synchronization or arrival, the instruction aggregates (using reduction or scan operations) values supplied by each thread. When a thread executes the barrier aggregation instruction the thread contributes to a scan or reduction result, and waits to execute any more instructions until after all of the threads have executed the barrier aggregation instruction. A reduction result is communicated to each thread after all of the threads have executed the barrier aggregation instruction and a scan result is communicated to each thread as the barrier aggregation instruction is executed by the thread.

Publication date: 23-05-2017

Programmable graphics processor for multithreaded execution of programs

Number: US09659339B2
Assignee: Nvidia Corp

A processing unit includes multiple execution pipelines, each of which is coupled to a first input section for receiving input data for pixel processing and a second input section for receiving input data for vertex processing and to a first output section for storing processed pixel data and a second output section for storing processed vertex data. The processed vertex data is rasterized and scan converted into pixel data that is used as the input data for pixel processing. The processed pixel data is output to a raster analyzer.

Publication date: 02-05-2017

Instructions for managing a parallel cache hierarchy

Number: US09639479B2
Assignee: Nvidia Corp

A method for managing a parallel cache hierarchy in a processing unit. The method includes receiving an instruction from a scheduler unit, where the instruction comprises a load instruction or a store instruction; determining that the instruction includes a cache operations modifier that identifies a policy for caching data associated with the instruction at one or more levels of the parallel cache hierarchy; and executing the instruction and caching the data associated with the instruction based on the cache operations modifier.

Publication date: 02-05-2017

Indirect function call instructions in a synchronous parallel thread processor

Number: US09639365B2
Assignee: Nvidia Corp

An indirect branch instruction takes an address register as an argument in order to provide indirect function call capability for single-instruction multiple-thread (SIMT) processor architectures. The indirect branch instruction is used to implement indirect function calls, virtual function calls, and switch statements to improve processing performance compared with using sequential chains of tests and branches.
