I’m currently working on an application that persists Java serialized data (using ObjectOutputStream) in a database. Java’s serialization format compresses very well – so why not compress the data when storing it then decompress it while reading for a quick win? The problem is that there will still be legacy, uncompressed data, which the application will not be able to access if it assumes all data is now gzipped.
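The solution is to use MaybeGZIPInputStream instead of GZIPInputStream. MaybeGZIPInputStream is a small InputStream wrapper that peeks at the first two bytes of the stream, checks them against the GZIP magic number, and only decompresses if they match. For example, when reading, instead of: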
```java
ObjectInputStream ois = new ObjectInputStream(new GZIPInputStream(databaseInputStream));
```
use MaybeGZIPInputStream instead:
```java
ObjectInputStream ois = new ObjectInputStream(new MaybeGZIPInputStream(databaseInputStream));
```
And always write data using GZIPOutputStream. Now all of the existing data can still be read, and newly written data takes up much less storage (and much less bandwidth and time when transferred between the application servers and the database).
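On the write side, the one thing to get right is that the GZIP trailer (CRC-32 and uncompressed length) is only written when the GZIPOutputStream is finished or closed; data written without it cannot be read back by GZIPInputStream. A minimal sketch, assuming each object is serialized into an in-memory byte array before being stored in the database (the class CompressedSerialization and its serialize method are hypothetical names, not part of the original code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of the original post's code.
final class CompressedSerialization {
    private CompressedSerialization() {}

    /** Serializes obj with ObjectOutputStream and gzips the result. */
    static byte[] serialize(final Serializable obj) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream (via try-with-resources) flushes it and
        // closes the GZIPOutputStream, which finishes the deflate stream and
        // writes the GZIP trailer. Only then is the byte array complete.
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }
}
```

Calling `toByteArray()` inside the try block, before the streams are closed, would return truncated data without the trailer, so the return statement deliberately sits after it.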
Here’s the source code of MaybeGZIPInputStream:
```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;

/**
 * Detect if the given {@link InputStream} contains compressed data. If it does, wrap it in a
 * {@link GZIPInputStream}. If it doesn't, don't.
 *
 * @author Craig Andrews
 */
public class MaybeGZIPInputStream extends InputStream {
    private final InputStream in;

    public MaybeGZIPInputStream(final InputStream in) throws IOException {
        // The pushback buffer only needs to hold the two bytes of the GZIP magic number.
        final PushbackInputStream pushbackInputStream = new PushbackInputStream(in, 2);
        if (isGZIP(pushbackInputStream)) {
            this.in = new GZIPInputStream(pushbackInputStream);
        } else {
            this.in = pushbackInputStream;
        }
    }

    private static boolean isGZIP(final PushbackInputStream pushbackInputStream) throws IOException {
        final byte[] bytes = new byte[2];
        // A single read() may return fewer bytes than requested, so keep reading
        // until both header bytes have arrived or the stream ends.
        int bytesRead = 0;
        while (bytesRead < bytes.length) {
            final int n = pushbackInputStream.read(bytes, bytesRead, bytes.length - bytesRead);
            if (n < 0) {
                break;
            }
            bytesRead += n;
        }
        // Push the bytes back so the consumer sees the stream from its first byte.
        if (bytesRead > 0) {
            pushbackInputStream.unread(bytes, 0, bytesRead);
        }
        // GZIP_MAGIC is 0x8b1f; on the wire the first byte is 0x1f and the second is 0x8b.
        return bytesRead == 2
                && bytes[0] == (byte) GZIPInputStream.GZIP_MAGIC
                && bytes[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8);
    }

    @Override
    public int read() throws IOException {
        return in.read();
    }

    @Override
    public int read(byte[] b) throws IOException {
        return in.read(b);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
    }

    @Override
    public long skip(long n) throws IOException {
        return in.skip(n);
    }

    @Override
    public int available() throws IOException {
        return in.available();
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    @Override
    public void mark(int readlimit) {
        in.mark(readlimit);
    }

    @Override
    public void reset() throws IOException {
        in.reset();
    }

    @Override
    public boolean markSupported() {
        return in.markSupported();
    }

    @Override
    public String toString() {
        return in.toString();
    }
}
```
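The two-byte check is safe for this use case because a Java serialization stream always begins with the magic bytes 0xAC 0xED (`ObjectStreamConstants.STREAM_MAGIC`), while a GZIP stream always begins with 0x1F 0x8B, so legacy rows can never be mistaken for compressed ones. The same detection can also be written as a static factory that returns either the GZIPInputStream or the PushbackInputStream directly, which avoids hand-writing every delegating method. Below is a sketch of that alternative, assuming Java 9+ for `InputStream.readNBytes`; the names MaybeGzip, open, write, read and isSerializationStream are hypothetical and only serve to show both legacy and gzipped data being read through the same path:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamConstants;
import java.io.OutputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical factory-based alternative to MaybeGZIPInputStream (Java 9+).
final class MaybeGzip {
    private MaybeGzip() {}

    /** Returns a stream of decompressed bytes if `in` is gzipped, else the raw bytes. */
    static InputStream open(final InputStream in) throws IOException {
        final PushbackInputStream pb = new PushbackInputStream(in, 2);
        final byte[] header = new byte[2];
        // readNBytes keeps reading until 2 bytes arrive or the stream ends (returns 0 at EOF).
        final int n = pb.readNBytes(header, 0, 2);
        pb.unread(header, 0, n);
        final boolean gzip = n == 2
                && header[0] == (byte) GZIPInputStream.GZIP_MAGIC
                && header[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8);
        return gzip ? new GZIPInputStream(pb) : pb;
    }

    /** Serializes obj, optionally gzipped, to simulate new rows versus legacy rows. */
    static byte[] write(final Object obj, final boolean compress) throws IOException {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        final OutputStream sink = compress ? new GZIPOutputStream(bytes) : bytes;
        try (ObjectOutputStream oos = new ObjectOutputStream(sink)) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Reads a row regardless of whether it was written compressed. */
    static Object read(final byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(open(new ByteArrayInputStream(data)))) {
            return ois.readObject();
        }
    }

    /** True if data starts with the Java serialization magic 0xACED (written big-endian). */
    static boolean isSerializationStream(final byte[] data) {
        return data.length >= 2
                && data[0] == (byte) (ObjectStreamConstants.STREAM_MAGIC >> 8)
                && data[1] == (byte) ObjectStreamConstants.STREAM_MAGIC;
    }
}
```

For example, `MaybeGzip.read(MaybeGzip.write("legacy row", false))` and `MaybeGzip.read(MaybeGzip.write("new row", true))` both return the original strings. The trade-off is that a factory cannot be used where a concrete class is expected (for instance, with `new MaybeGZIPInputStream(...)` in existing call sites), which is one reason to prefer the wrapper class.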