Method and Apparatus for Allocating a Name Node in a Virtualized Cluster Environment

Publication date: 06-09-2016
Number: KR0101654969B1
Application number: 00-14-102014684
Application date: 10-02-2014

[1]

The present invention relates to a technique for distributed processing of large-scale data, and more particularly to a method and apparatus for selecting an optimal name node in a virtualized cluster environment in which one or more name nodes exist.

[2]

Recently, virtualization technology has been introduced and is increasingly used to make full use of computer hardware resources. Virtualization abstracts physical resources into several virtual resources and lets each virtual resource perform different processing under a different operating system, which makes it possible to multiplex a single machine efficiently. In addition, a fault in a particular virtual server can be isolated so that the other virtual servers continue to provide services, which can improve system availability and reliability.

[3]

Meanwhile, with the recent emergence of large-scale data, parallel and distributed databases for processing such data are being actively studied, particularly in the database field. That is, software techniques that maximize the parallelism offered by multi-core hardware, together with virtualization-based parallel techniques, are a recent trend in databases.

[4]

Parallel DB techniques organize data as pairs of a key and a value (key-value pairs) and give priority to data-processing performance. Typical examples are Google's GFS (Google File System) and Bigtable, and the open-source Hadoop, HBase, and Hive. Programming models used with them include MapReduce, DryadLINQ, and MPI.

[5]

Hadoop is a Java software framework that supports distributed applications running on a cluster of computers capable of processing large amounts of data. It consists largely of the Hadoop Distributed File System (HDFS) and MapReduce.

[6]

In the existing Hadoop Distributed File System (HDFS) method, only one name node exists when a Hadoop cluster is built, which causes various problems. For example, the number of files and directories is limited by the memory of the single name node; throughput is limited because one name node handles every file request; and all users and applications must use that single name node.

[7]

In addition, NameNode Federation, a set of multiple independent name nodes, has been applied to Hadoop. However, when NameNode Federation is applied in a heterogeneous virtual machine environment, the name nodes differ from one another in performance, so assigning jobs to the name nodes by the existing method causes a performance imbalance problem.

[8]

An object of the present invention for solving the above problems is to provide a method for assigning a job to a name node in a virtualized cluster environment.

[9]

Another object of the present invention for solving the above problems is to provide a large-scale parallel distributed data processing system.

[10]

To achieve the above object, a method for allocating a name node in a virtualized cluster environment according to an embodiment of the present invention comprises receiving a job, and determining, among at least two name nodes included in a virtualized cluster, the name node to which the job is to be assigned, in consideration of the performance of the at least two name nodes or their processing speed for the job.

[11]

Here, the virtualized cluster may be configured as a set of a plurality of virtual nodes present on each of a plurality of physical machines.

[12]

Here, the at least two name nodes may be located one on each of the plurality of physical machines.

[13]

Here, the name node allocation method may be applied based on the Hadoop Distributed File System (HDFS).

[14]

Here, determining the name node to which the job is to be assigned may comprise randomly selecting two name nodes from among the at least two name nodes and determining, from the two selected name nodes, the name node to which the job is to be assigned in consideration of the performance and workload of the two selected name nodes.

[15]

To achieve the other object, a system for large-scale parallel distributed data processing according to an embodiment of the present invention comprises a name node federation configured as a set of at least two name nodes included in a virtualized cluster, and a name node determination unit that determines, among the at least two name nodes included in the name node federation, the name node to which a job is to be assigned, in consideration of the performance of the at least two name nodes or their processing speed for the job.

[16]

With the method for allocating a name node in a virtualized cluster environment and the system for large-scale parallel distributed data processing according to the present invention as described above, an optimal name node can be selected effectively in a virtualized cluster environment that uses a plurality of name nodes.

[17]

In addition, according to the present invention, consistent performance can be maintained while the cost of checking the performance and workload of the name nodes is minimized.

[18]

Figure 1 is a conceptual diagram for explaining a distributed file system.
Figure 2 is a conceptual diagram of a cluster to which multiple name nodes are applied according to an embodiment of the present invention.
Figure 3 is a conceptual diagram for explaining a system for large-scale parallel distributed data processing according to an embodiment of the present invention.
Figure 4 is a flowchart for explaining a method for allocating a name node in a virtualized cluster environment according to an embodiment of the present invention.
Figure 5 shows graphs of the name nodes to which jobs are assigned on a virtualized cluster to which multiple name nodes are applied according to an embodiment of the present invention.
Figure 6 is a graph comparing job processing times when the method for allocating a name node in a virtualized cluster environment according to an embodiment of the present invention is applied.

[19]

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the present invention to a particular embodiment, and the invention should be understood to include all modifications, equivalents, and substitutes that fall within its concept and technical scope. In describing each drawing, like reference numerals are used for like components.

[20]

Terms such as first, second, A, and B may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be named a second component, and similarly a second component may be named a first component. The term "and/or" includes a combination of a plurality of related listed items or any one of the plurality of related listed items.

[21]

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other components are present in between.

[22]

The terms used in the present application are used only to describe particular embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprising" or "having" are intended to indicate that a feature, number, step, operation, component, part, or combination thereof described in the specification is present, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

[23]

Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless expressly so defined in the present application.

[24]

First, the terms used in the present invention are briefly described.

[25]

The Hadoop Distributed File System (HDFS) is built on large quantities of inexpensive hardware, and because it assumes that failures will occur, it always makes several copies of a file and stores them in a distributed manner. Information about the contents of a file and the locations of its copies is likewise stored in a distributed manner. Because files and their information are spread across many computers, a search is not concentrated in one place but runs in several places at once, which can shorten search time.

[26]

MapReduce is a distributed data processing technique that uses multiple computers to process data efficiently. First, in the Map step, large-scale data is distributed across multiple computers and processed in parallel to produce new data (intermediate results). Then, in the Reduce step, the intermediate results are combined to produce the desired final result.
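The two steps described above can be illustrated with a minimal single-process sketch. It is not Hadoop code: the function names `map_step`, `reduce_step`, and `word_count` are hypothetical, and the word-count task is chosen only as a familiar example.

```python
from collections import defaultdict
from itertools import chain


def map_step(chunk):
    """Map: emit (word, 1) pairs for one chunk of input text."""
    return [(word.lower(), 1) for word in chunk.split()]


def reduce_step(pairs):
    """Reduce: combine the intermediate pairs by summing the counts per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a different machine;
    # here the map calls simply run one after another.
    intermediate = chain.from_iterable(map_step(c) for c in chunks)
    return reduce_step(intermediate)


counts = word_count(["name node data node", "data node"])
```

Each input chunk stands in for the portion of data handled by one computer, and the list of pairs returned by `map_step` plays the role of the intermediate result.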

[27]

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

[28]

Figure 1 is a conceptual diagram for explaining a distributed file system.

[29]

First, the existing distributed file system is described to aid understanding of the present invention.

[30]

Referring to Figure 1, the Hadoop Distributed File System (HDFS) comprises a name node (110), which is the master node, and a plurality of data nodes (120-1 to 120-n). HDFS stores multiple copies of each file in order to manage large files reliably. The data nodes (120-1 to 120-n) store the file copies, and the name node (110) stores the file metadata.

[31]

Specifically, the name node (110) manages the file system namespace and handles file access requests from the client (130). In HDFS, data is divided into blocks, which are stored in a distributed manner across the data nodes (120-1 to 120-n). Here, the client (130) may refer to various types of user terminal.

[32]

The data nodes (120-1 to 120-n) handle data input/output requests from the client (130), and the name node (110) can check the state of the data nodes (120-1 to 120-n) by periodically receiving a heartbeat from them.

[33]

To ensure data availability, HDFS divides data into blocks and then replicates the blocks across the data nodes (120-1 to 120-n) in the distributed environment. Because complete copies of the data are stored across different data nodes, the original data can be restored even if only one copy survives intact. However, the more replicas are kept to achieve high availability, the more storage is consumed.
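As a rough illustration of block division and replication, the sketch below splits a byte string into blocks, places each block on several distinct data nodes, and checks whether the data survives a set of node failures. The round-robin placement, the function names, and the tiny block size are all assumptions made for the example; they do not describe the placement policy HDFS actually uses.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Divide data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, num_nodes: int, replication: int):
    """Map each block index to `replication` distinct data node indices.

    Hypothetical round-robin placement, for illustration only.
    """
    if replication > num_nodes:
        raise ValueError("replication factor cannot exceed the number of data nodes")
    return {
        b: [(b + r) % num_nodes for r in range(replication)]
        for b in range(num_blocks)
    }


def recoverable(placement, failed_nodes):
    """The data can be restored if every block keeps at least one live replica."""
    return all(
        any(node not in failed_nodes for node in nodes)
        for nodes in placement.values()
    )


blocks = split_into_blocks(b"0123456789", block_size=4)
placement = place_replicas(len(blocks), num_nodes=4, replication=3)
stored_bytes = sum(len(blocks[b]) * len(nodes) for b, nodes in placement.items())
```

With three replicas the 10 bytes of input occupy 30 bytes of storage, which is the trade-off the paragraph describes: each extra replica tolerates one more failure at the cost of another full copy.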

[34]

However, the HDFS of Figure 1 has only a single name node, which can cause various problems in parallel distributed data processing.

[35]

Figure 2 is a conceptual diagram of a cluster to which multiple name nodes are applied according to an embodiment of the present invention.

[36]

Referring to Figure 2, several independent name nodes (210-1, ..., 210-k, ..., 210-n) exist in one cluster (20), and each of these name nodes (210-1, ..., 210-k, ..., 210-n) may have its own namespace (NS: NameSpace).

[37]

In addition, the block pools (230) hold the block information for the entire cluster (20), and each name node (210-1, ..., 210-k, ..., 210-n) may have the portion of the whole block pools (230) that corresponds to it. For example, NameNode 1 (210-1) has Pool 1 (230-1), NameNode k (210-k) has Pool k (230-k), and NameNode n (210-n) has Pool n (230-n).

[38]

For example, the cluster (20) may have N name nodes (210-1, ..., 210-k, ..., 210-n), and each name node (210-1, ..., 210-k, ..., 210-n) may have a namespace. The cluster (20) may also have M data nodes (220-1, ..., 220-k, ..., 220-n), and the data nodes (220-1, ..., 220-k, ..., 220-n) may cooperate with the name nodes.

[39]

Here, the namespace (NS: NameSpace) may refer to a conceptual area that supports operations such as create, delete, modify, and list on files and directories.

[40]

In addition, block storage (Block Storage) manages data node membership and may refer to a conceptual area that supports operations such as create, delete, and modify on blocks. Here, a block may be the unit into which data is divided for distributed storage across a number of data nodes.

[41]

Meanwhile, a set of multiple independent name nodes may be referred to as a "Name Node Federation".

[42]

Figure 3 is a conceptual diagram for explaining a system for large-scale parallel distributed data processing according to an embodiment of the present invention.

[43]

Name Node Federation (310) was designed for Hadoop clusters built only from physical nodes. When such a cluster is converted into a virtualized Hadoop cluster environment made up of virtual nodes, performance problems can arise.

[44]

That is, because the performance of a virtual node is largely determined by the performance of its physical node, a virtualized Hadoop cluster is an environment in which the virtual nodes differ in performance depending on the capability of each physical node.

[45]

For example, when the name node federation (310) places a job, it may do so on the assumption that all name nodes (311) have equal performance.

[46]

However, when the name node federation (310) is applied in a heterogeneous virtual machine environment, the name nodes differ in performance, so assigning jobs to the name nodes by the existing method causes a performance imbalance problem.

[47]

The system for large-scale parallel distributed data processing according to an embodiment of the present invention provides a technique for effectively allocating jobs to a plurality of name nodes in a virtualized cluster environment.

[48]

The system for large-scale parallel distributed data processing may comprise the name node federation (310) and the name node determination unit (330).

[49]

The name node federation (310) may be configured as a set of at least two name nodes included in the virtualized cluster.

[50]

Here, the virtualized cluster may be configured as a set of a plurality of virtual nodes present on each of a plurality of physical machines, and the name nodes may be located one on each of the physical machines.

[51]

The name node determination unit (330) may determine, among the at least two name nodes included in the name node federation (310), the name node to which a job is to be assigned, in consideration of the performance of the at least two name nodes or their processing speed for the job.

[52]

Specifically, the name node determination unit may randomly select two name nodes from among the at least two name nodes and determine, from the two selected name nodes, the name node to which the job is to be assigned in consideration of the performance and workload of the two selected name nodes.

[53]

Referring to Figure 3, a system for large-scale parallel processing based on the Hadoop Distributed File System (HDFS) is described.

[54]

In Figure 3, each physical machine (30) may include one [...] name node (311) and a plurality of [...] data nodes (321). In addition, the [...] name nodes (311) included in the physical machines (30) may together form the name node federation (310).

[55]

The name node determination unit (330) may control and manage the [...] name nodes (311) included in the name node federation (310).

[56]

The name node determination unit (330), which may be referred to as a High Ability NameNode, holds and manages information about the overall namespace and the whole block pools, and may be designed to control the overall [...] cluster.

[57]

For example, when a number of jobs are to be executed, the name node determination unit (330) may select a [...] name node (311) in consideration of the current workload and performance of each physical node, and the selected [...] name node (311) may receive and process the corresponding job.

[58]

One [...] name node (311) is present on each physical node, and it may have an individual namespace holding block information about the virtual [...] data nodes (321) on that physical node.

[59]

The ways in which the name node determination unit (330) may assign a job to a [...] name node are compared below.

[60]

1) Sequential selection among all name nodes (first technique)

[61]

According to the first technique, jobs may be allocated sequentially to all name nodes. The first technique is the way the existing name node federation places jobs in a physical environment. When it is applied to a virtualized Hadoop cluster, the differing performance of the Hadoop NameNodes causes an imbalance that can degrade overall cluster performance.

[62]

2) Selection considering the performance and workload of all name nodes (second technique)

[63]

According to the second technique, a job may be assigned after checking the performance and workload of all name nodes. The second technique keeps Hadoop NameNode performance consistent, but the cost of checking performance and workload can be high.

[64]

3) Random selection among all name nodes (third technique)

[65]

According to the third technique, a job may be assigned to one name node selected at random from all name nodes. Unlike the second technique, the third technique incurs no cost for checking performance and workload, but, as with the first technique, the imbalance among the Hadoop NameNodes can degrade overall cluster performance.

[66]

4) Random selection of two name nodes, then final selection by performance and workload (fourth technique)

[67]

According to the fourth technique, two name nodes are selected at random from all name nodes, and the job may be assigned to the final name node determined by checking the performance and workload of the two selected name nodes. The fourth technique is a compromise that addresses the problems of the first to third techniques regarding performance consistency and checking cost: it keeps performance consistent to some extent while reducing the checking cost.
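The four techniques compared above can be sketched as four small selection functions. The `NameNode` class, its fields, and in particular the `estimated_finish` score are assumptions introduced for illustration; the text says only that "performance and workload" are considered and does not define how they are combined.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class NameNode:
    name: str
    seconds_per_job: float  # assumed performance measure: lower is faster
    queued_jobs: int = 0    # assumed workload measure

    def estimated_finish(self) -> float:
        """Assumed score: time until this node would finish one more job."""
        return (self.queued_jobs + 1) * self.seconds_per_job


def make_sequential_selector(nodes):
    """First technique: hand out name nodes in a fixed rotation."""
    rotation = itertools.cycle(nodes)
    return lambda: next(rotation)


def select_best_of_all(nodes):
    """Second technique: check every name node and take the best score."""
    return min(nodes, key=NameNode.estimated_finish)


def select_random(nodes, rng=random):
    """Third technique: pick one name node at random, with no checks."""
    return rng.choice(nodes)


def select_best_of_two(nodes, rng=random):
    """Fourth technique: sample two distinct name nodes and check only those."""
    first, second = rng.sample(nodes, 2)
    return min(first, second, key=NameNode.estimated_finish)


def assign(node):
    """Record that one more job has been placed on the chosen name node."""
    node.queued_jobs += 1
    return node.name
```

The fourth function performs exactly two checks per job regardless of cluster size, whereas the second performs one check per name node, which is where the difference in checking cost comes from.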

[68]

Figure 4 is a flowchart for explaining a method for allocating a name node in a virtualized cluster environment according to an embodiment of the present invention.

[69]

Referring to Figure 4, the method for allocating a name node in a virtualized cluster environment according to an embodiment of the present invention may comprise receiving a job and determining the name node to which the job is to be assigned.

[70]

First, a job may be received from a client (S410).

[71]

The name node to which the job is to be assigned may then be determined among the at least two name nodes included in the virtualized cluster, in consideration of the performance of the at least two name nodes or their processing speed for the job.

[72]

Here, the virtualized cluster may be configured as a set of a plurality of virtual nodes present on each of a plurality of physical machines, and the at least two name nodes may be located one on each of the physical machines.

[73]

Two name nodes may be randomly selected from among the at least two name nodes (S420).

[74]

The name node to which the job is to be assigned may be determined from the two selected name nodes (S430). That is, the name node to which the job is to be assigned may be finally determined from the two selected name nodes in consideration of their performance and workload.

[75]

Finally, the job may be assigned to the finally determined name node (S440).

[76]

In addition, the name node allocation method may be applied based on the Hadoop Distributed File System (HDFS), but is not limited thereto.

[77]

Figure 5 shows graphs of the name nodes to which jobs are assigned on a virtualized cluster to which multiple name nodes are applied according to an embodiment of the present invention.

[78]

That is, Figure 5 shows graphs of simulation results for comparing the performance of the four techniques described above.

[79]

The assumptions for the simulation are as follows.

[80]

A) The virtualized Hadoop cluster has 10 Hadoop NameNodes, divided into group A, the five lower-performance name nodes NameNode 1 to 5, and group B, the five higher-performance name nodes NameNode 6 to 10.

[81]

B) The name node determination unit selects Hadoop NameNodes according to each of the four techniques and distributes a total of 100 jobs. The time taken for the entire cluster to finish processing all of the jobs is then calculated.

[82]

C) A Hadoop NameNode in group A is assumed to take 2 seconds to process one job, and a Hadoop NameNode in group B is assumed to take 1 second to process one job.

[83]

D) Checking the performance and workload of one Hadoop NameNode is assumed to cost 0.01 seconds per check.
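Assumptions A) to D) are enough to set up a small simulation along these lines. The sketch below is one possible reading, not the simulation behind Figures 5 and 6: it assumes that each name node processes its own jobs one after another, that all name nodes run in parallel, that checks are performed serially and their cost is simply added to the total, and that "performance and workload" are combined into an estimated finish time. The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(technique, num_jobs=100, check_cost=0.01, seed=0):
    """Return (jobs per name node, total seconds) for one run.

    Assumed model: total = busy time of the slowest-finishing name node
    plus the cost of every performance/workload check.
    """
    rng = random.Random(seed)
    # Assumptions A and C: NameNode 1-5 (group A) need 2 s per job,
    # NameNode 6-10 (group B) need 1 s per job.
    seconds_per_job = [2.0] * 5 + [1.0] * 5
    jobs = [0] * len(seconds_per_job)
    checks = 0

    def finish_time(i):
        # Assumed score: when node i would finish if given one more job.
        return (jobs[i] + 1) * seconds_per_job[i]

    for job in range(num_jobs):
        if technique == 1:    # sequential, no checks
            chosen = job % len(jobs)
        elif technique == 2:  # check every name node
            chosen = min(range(len(jobs)), key=finish_time)
            checks += len(jobs)
        elif technique == 3:  # one random name node, no checks
            chosen = rng.randrange(len(jobs))
        elif technique == 4:  # two random name nodes, check both
            chosen = min(rng.sample(range(len(jobs)), 2), key=finish_time)
            checks += 2
        else:
            raise ValueError("technique must be 1, 2, 3 or 4")
        jobs[chosen] += 1

    busy = max(n * s for n, s in zip(jobs, seconds_per_job))
    return jobs, busy + checks * check_cost


results = {t: simulate(t) for t in (1, 2, 3, 4)}
```

Seen this way, the second, third, and fourth techniques differ only in how many name nodes are examined per job (all ten, one, or two). Because the cost model is an assumption, the totals produced here should not be expected to match the published figures.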

[84]

Figure 5a shows how often jobs were assigned to each name node under the first technique, Figure 5b under the second technique, Figure 5c under the third technique, and Figure 5d under the fourth technique. Here, the third and fourth techniques, which involve random selection, were each run three times.

[85]

Figure 5a shows that jobs are allocated evenly across the name nodes regardless of their performance, so overall cluster performance may be degraded.

[86]

Figure 5b shows that fewer jobs are allocated to group A, which has the lower performance. However, the cost of checking the performance and workload of the name nodes may be high.

[87]

Figure 5c shows that jobs are assigned to the name nodes without any consistency. That is, jobs are allocated at random regardless of the performance of the name nodes, so cluster performance may be degraded.

[88]

Figure 5d shows that fewer jobs are assigned to group A, which has the lower performance. In addition, because the performance and workload of only two randomly chosen name nodes are compared, the added checking cost makes little difference.

[89]

Figure 6 is a graph comparing job processing times when the method for allocating a name node in a virtualized cluster environment according to an embodiment of the present invention is applied.

[90]

Figure 6 shows, under the assumptions described above for Figure 5, the time taken to process 100 jobs according to each of the four techniques.

[91]

Referring to Figure 6, the processing time is shortest with the fourth technique and longest with the third technique.

[92]

That is, the fourth technique is a compromise between performance consistency and checking cost: it keeps performance consistent to some extent while reducing the checking cost.

[93]

The method for allocating a name node in a virtualized cluster environment and the system for large-scale parallel distributed data processing according to the embodiments of the present invention described above can efficiently select an optimal name node in a virtualized cluster environment that uses a plurality of name nodes. That is, consistent performance can be maintained while the cost of checking the performance and workload of the name nodes is minimized.

[94]

Thus, the present invention may contribute to improving the overall performance of a virtualized cluster.

[95]

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that various modifications and changes can be made to it without departing from the concept and scope of the invention set forth in the claims below.



[96]

Disclosed are a method and an apparatus for selecting an optimum name node in a virtual cluster environment, wherein a plurality of name nodes exist. The method for allocating a name node in a virtual cluster environment includes a step of receiving a job; and a step of determining a name node, wherein the job is to be allocated, among at least two name nodes by considering a function of at least two name nodes or a processing speed as to the job, which are included in a virtual cluster. Accordingly, the optimum name node in the virtual cluster environment utilizing a plurality of name nodes may be effectively chosen.



A method for allocating a name node, performed in a distributed data processing system in a virtualized cluster environment, the method comprising: receiving a job; and determining, among at least two name nodes included in a virtualized cluster, the name node to which the job is to be assigned, in consideration of the performance of the at least two name nodes or their processing speed for the job, wherein determining the name node to which the job is to be assigned comprises randomly selecting two name nodes from among the at least two name nodes and determining, from the two selected name nodes, the name node to which the job is to be assigned in consideration of the performance and workload of the two selected name nodes.

The method of Claim 1, wherein the virtualized cluster is configured as a set of a plurality of virtual nodes present on each of a plurality of physical machines.

The method of Claim 1, wherein the at least two name nodes are located one on each of a plurality of physical machines.

The method of Claim 1, wherein the name node allocation method is applied based on the Hadoop Distributed File System (HDFS).

(Deleted)

A system for large-scale parallel distributed data processing, the system comprising: a name node federation configured as a set of at least two name nodes included in a virtualized cluster; and a name node determination unit that determines, among the at least two name nodes included in the name node federation, the name node to which a job is to be assigned, in consideration of the performance of the at least two name nodes or their processing speed for the job, wherein the name node determination unit randomly selects two name nodes from among the at least two name nodes and determines, from the two selected name nodes, the name node to which the job is to be assigned in consideration of the performance and workload of the two selected name nodes.

The system of Claim 6, wherein the virtualized cluster is configured as a set of a plurality of virtual nodes present on each of a plurality of physical machines.

The system of Claim 6, wherein the at least two name nodes are located one on each of a plurality of physical machines.

The system of Claim 6, wherein the system for large-scale parallel distributed data processing is based on the Hadoop Distributed File System (HDFS).

(Deleted)