
| Summary / Description | Conceptually, the idea presented in the prior art is very similar to the one presented in the application. Both aim to reduce storage space and bandwidth use by clustering data files with similar content, identified through a similar hash-based approach, and to avoid storing redundant data by means of a differential encoding scheme. |
| Type of Prior Art | Online Publication |
| URL | http://www.usenix.org/event/use... |
| Author/Creator | Fred Douglis, Jason LaVoie, John M. Tracey |
| Title | Redundancy Elimination Within Large Collections of Files |
| Publication Date | May 12, 2004 |
| Publisher | USENIX 2004 Conference |
| Directions to Document Location | |
| Additional Information | |
| Notes | |
| Excerpt | Ongoing advancements in technology lead to ever-increasing storage capacities. In spite of this, optimizing storage usage can still provide rich dividends. Several techniques based on delta-encoding and duplicate block suppression have been shown to reduce storage overheads, with varying requirements for resources such as computation and memory. We propose a new scheme for storage reduction that reduces data sizes with an effectiveness comparable to the more expensive techniques, but at a cost comparable to the faster but less effective ones. The scheme, called Redundancy Elimination at the Block Level (REBL), leverages the benefits of compression, duplicate block suppression, and delta-encoding to eliminate a broad spectrum of redundant data in a scalable and efficient manner. REBL generally encodes more compactly than compression (up to a factor of 14) and a combination of compression and duplicate suppression (up to a factor of 6.7). REBL also encodes similarly to a technique based on delta-encoding, reducing overall space significantly in one case. Furthermore, REBL uses super-fingerprints, a technique that reduces the data needed to identify similar blocks while dramatically reducing the computational requirements of matching the blocks: it turns O(n²) comparisons into hash table lookups. As a result, using super-fingerprints to avoid enumerating matching data objects decreases computation in the resemblance detection phase of REBL by up to a couple orders of magnitude. |
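As a rough illustration of the super-fingerprint idea mentioned in the excerpt, the Python sketch below indexes blocks by hashes of grouped features, so that finding candidate similar blocks becomes a dictionary lookup rather than a pairwise comparison. The excerpt does not say how features or super-fingerprints are computed; the min-hash features, the window and group sizes, and all function names here are assumptions made for illustration, not the paper's implementation.

```python
import hashlib
from collections import defaultdict


def _hash(salt: int, data: bytes) -> int:
    """64-bit salted hash (an assumed stand-in for a fingerprint function)."""
    digest = hashlib.sha1(salt.to_bytes(4, "big") + data).digest()
    return int.from_bytes(digest[:8], "big")


def features(block: bytes, num_features: int = 8, window: int = 8) -> list[int]:
    """Assumed feature scheme: for each salt, keep the minimum window hash."""
    windows = {block[i:i + window] for i in range(max(1, len(block) - window + 1))}
    return [min(_hash(salt, w) for w in windows) for salt in range(num_features)]


def super_fingerprints(block: bytes, group_size: int = 4) -> list[str]:
    """Hash each group of features into a single super-fingerprint."""
    feats = features(block)
    groups = [feats[i:i + group_size] for i in range(0, len(feats), group_size)]
    return [
        hashlib.sha1(b"".join(f.to_bytes(8, "big") for f in group)).hexdigest()
        for group in groups
    ]


def add_block(index: dict, block_id: str, block: bytes) -> None:
    """Register a block under each of its super-fingerprints."""
    for sfp in super_fingerprints(block):
        index[sfp].add(block_id)


def candidates(index: dict, block: bytes) -> set[str]:
    """Look up blocks sharing at least one super-fingerprint with `block`."""
    found = set()
    for sfp in super_fingerprints(block):
        found |= index.get(sfp, set())
    return found


index = defaultdict(set)
add_block(index, "a", bytes(range(0, 100)))
add_block(index, "b", bytes(range(100, 200)))
print(candidates(index, bytes(range(0, 100))))
```

Each query costs a fixed number of dictionary lookups regardless of how many blocks are indexed, which is the property the excerpt attributes to super-fingerprints.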
A distributed, differential electronic-data storage system comprising: client computers that direct data objects to data storage within the distributed, differential electronic-data storage system; component data-storage systems that together provide data storage for the distributed, differential electronic-data storage system; and a routing component that directs data objects, received from the clients computers, through logical bins to component data-storage systems by a data-compression-enhancing method.
| Relevance | The prior art does discuss the routing component, but the distributed differential data storage discussed is very similar. |
The distributed, differential electronic-data storage system of claim 3 wherein the query-based compression-enhancing routing method is a chunk-based query-based compression-enhancing routing method in which a currently considered component data-storage system responds to a query for a data object by: determining a first number of matches between a second number of hash values received from the routing component with hash values stored within the currently considered component data-storage system; and returning to the routing component a value based on the determined first number of matches, the value one of the determined first number of matches, a ratio of the determined first number of matches divided by the second number of hash values, and one minus the ratio of the determined first number of matches divided by the second number of hash values.
| Relevance | The hash-based scheme used to determine similarity is very similar conceptually to the one discussed in this prior art. |
The method of claim 8 wherein a currently considered component data-storage system responds to a query for a data object by: determining a first number of matches between a second number of received hash values with hash values stored within the currently considered component data storage system; and returning a value based on the determined first number of matches, the value one of the determined first number of matches, a ratio of the determined first number of matches divided by the second number of hash values, and one minus the ratio of the determined first number of matches divided by the second number of hash values.
| Relevance | The hash-based scheme used to determine similarity is very similar conceptually to the one discussed in this prior art. |
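The query response described in the two claims above can be sketched in Python as follows. This is a minimal illustration only: the fixed-size chunking, the use of SHA-1, and the names `chunk_hashes` and `respond_to_query` are assumptions, not details taken from the application or the prior art.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int = 4) -> list[str]:
    """Split data into fixed-size chunks (an assumed scheme) and hash each one."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def respond_to_query(received: list[str], stored: set[str]) -> dict:
    """Count how many received hashes are already stored and derive the
    three candidate return values named in the claims."""
    total = len(received)                               # the "second number"
    matches = sum(1 for h in received if h in stored)   # the "first number"
    ratio = matches / total if total else 0.0
    return {
        "matches": matches,
        "ratio": ratio,
        "one_minus_ratio": 1.0 - ratio,
    }


# Hypothetical component store already holding three chunks.
stored = set(chunk_hashes(b"AAAABBBBCCCC"))
# Incoming object: four chunks, three of which the store already has.
print(respond_to_query(chunk_hashes(b"AAAABBBBCCCCDDDD"), stored))
```

A routing component could then compare the returned values across component data-storage systems and send the object to the one reporting the most overlap, though how the values are used is outside what the claims quoted here specify.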




