A recurring problem in several of the projects I have in the pipeline is the matter of handling ZLIB. Java, through java.util.zip, offers ZLIB compression with a 32k byte window (but no means of tuning the window) with the DeflaterOutputStream. The .Net framework doesn't offer direct ZLIB at all, but provides naked Deflate via System.IO.Compression.DeflateStream.
That gives us enough to be able to reflate the output of a ZLIB deflation, since a ZLIB stream is a 2 byte header, a deflate section and finally a 4 byte Adler32 checksum:
import clr
clr.AddReference( 'vjslib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' )
from java.io import *
from java.util.zip import *
from System.IO import *
from System.IO.Compression import *
from System import Array, Byte, SByte, Int64
from System.Text import Encoding

def readFileToSByteArray(name):
    # open a file via Java I/O
    raw = RandomAccessFile(name, 'r')
    # print "File size in bytes = %d " % (raw.length())
    # read into signed byte array
    dim = Array[Int64]([raw.length()])
    helper = Array[SByte]([0])
    buffer = Array.CreateInstance(helper[0].GetType(), dim)
    raw.readFully(buffer)
    raw.close()
    return buffer

def sbyteArrayToUbyteArray(buffer, offset=0, length=-1):
    if length < 0:
        length = buffer.Length
    dim = Array[Int64]([length])
    helper = Array[Byte]([0])
    ubuffer = Array.CreateInstance(helper[0].GetType(), dim)
    # copy into unsigned byte array
    for i in range(0,length):
        ubuffer[i] = buffer[i+offset] & 0xff
    return ubuffer

def jdeflate(buffer):
    sink0 = ByteArrayOutputStream()
    sink = DeflaterOutputStream(sink0)
    sink.write(buffer, 0, buffer.Length)
    sink.close()
    sink0.close()
    return sink0.toByteArray()

def update_adler(adler, buffer):
    s1 = adler & 0xffff
    s2 = (adler >> 16) & 0xffff
    BASE = 65521
    for n in range(0,buffer.Length):
        s1 = (s1 + buffer[n]) % BASE
        s2 = (s2 + s1) % BASE
    return (s2 << 16) + s1

def ninflate(buffer):
    mem = MemoryStream(buffer)
    inflate = DeflateStream(mem, CompressionMode.Decompress)
    mem = MemoryStream()
    while True:
        x = inflate.ReadByte()
        if x < 0:
            break
        mem.WriteByte(x)
    inflate.Close()
    mem.Close()
    return mem.ToArray()

##==========================================
# select a file
name = ...
# make signed, unsigned buffers and string
buffer = readFileToSByteArray(name)
print "Constructed buffer size %d" % (buffer.Length)
ubuffer = sbyteArrayToUbyteArray(buffer)
instring = Encoding.Default.GetString(ubuffer)
# compute the Adler32 checksum
adler = update_adler(1, ubuffer)
print "Adler32 = %d" % (adler)
# deflate in Java
deflated = jdeflate(buffer)
# check the ZLIB header -- expect 120, 156 actually (120,-100)
first = (deflated[0] & 0xff)
second = (deflated[1] & 0xff)
print "header value EXPECTED 120:156 ACTUAL %d:%d" % (first, second)
i = deflated.Length-4
x = ((deflated[i]&0xff) << 24) | ((deflated[i+1]&0xff) << 16) | ((deflated[i+2]&0xff) << 8) | (deflated[i+3]&0xff)
print "Java Adler32 = %d" % (x)
# discard header and Adler32 tail
newbuffer = sbyteArrayToUbyteArray(deflated, 2, deflated.Length-6)
print "ZLIB length %d" % (deflated.Length)
print "deflate section length %d" % (newbuffer.Length)
# reflate
out = ninflate(newbuffer)
# compare with input
print "reflated buffer length : %d" % (out.Length)
outstring = Encoding.Default.GetString(out)
same = outstring.Equals(instring)
print "Output == Input ?? %s" %(str(same))
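For anyone without a J#/FePy setup to hand, the same strip-and-reflate idea can be sketched in plain CPython 3 using only the standard zlib module. This is a stand-in under that assumption, not the code used in this post: the names strip_zlib_wrapper, raw_inflate and adler32_py are made up for illustration, and zlib.compress plays the part of Java's DeflaterOutputStream.

```python
import struct
import zlib

def strip_zlib_wrapper(stream):
    """Drop the 2-byte ZLIB header and the 4-byte Adler-32 trailer."""
    return stream[2:-4]

def raw_inflate(deflated):
    # A negative wbits value asks zlib for a bare deflate stream,
    # which is what .Net's DeflateStream reads and writes.
    return zlib.decompress(deflated, -15)

def adler32_py(data, adler=1):
    """Same arithmetic as update_adler above (hypothetical helper)."""
    s1 = adler & 0xffff
    s2 = (adler >> 16) & 0xffff
    for b in data:
        s1 = (s1 + b) % 65521
        s2 = (s2 + s1) % 65521
    return (s2 << 16) | s1

data = b"hello, hello, hello, zlib " * 20
stream = zlib.compress(data)                    # full ZLIB stream

header = stream[:2]                             # CMF and FLG bytes
(trailer,) = struct.unpack(">I", stream[-4:])   # big-endian Adler-32

restored = raw_inflate(strip_zlib_wrapper(stream))
print("header %d:%d" % (header[0], header[1]))
print("round trip ok: %s" % (restored == data))
```

The trailer is stored most significant byte first, which is why the fragment above shifts the last four bytes by 24, 16, 8 and 0 when rebuilding the Java checksum.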
That works fine; but the converse, taking a .Net deflate and adding the appropriate top and tail, took a bit of effort to get working.
First pitfall -- the InflaterInputStream has to be read in chunks as large as feasible, rather than a byte at a time, so as not to throw a premature EOFException. That overcome, I got a result of the right length, but differing in the final few characters for the file under test, which I resolved by doing a belt-and-braces closing of the deflate operation. The un-refactored code I currently have continues from the above as:
# now the other way around
sink0 = MemoryStream()
sink = DeflateStream(sink0, CompressionMode.Compress)
sink.Write(ubuffer, 0, ubuffer.Length)
## overkill the flush and close here
sink.Flush()
sink.Close()
sink0.Flush()
sink0.Close()
deflated = sink0.ToArray()
print "deflate section length %d" % (deflated.Length)

#now inflate
dim = Array[Int64]([deflated.Length+6])
jbuffer = Array.CreateInstance(buffer[0].GetType(), dim)
jbuffer[0] = 120   # 0111 1000 = 120 : 32k byte window, Deflate
jbuffer[1] = -100  # 10 0 11100 = 128 + 28 = 156 : default compress, no dict, checksum

def sbyte(x):
    y = x & 0xff
    if y > 127:
        return y - 256
    return y

for i in range(0,deflated.Length):
    jbuffer[i+2] = sbyte(deflated[i])
i = deflated.Length+2
jbuffer[i] = sbyte(adler>>24)
jbuffer[i+1] = sbyte(adler>>16)
jbuffer[i+2] = sbyte(adler>>8)
jbuffer[i+3] = sbyte( adler )

source0 = ByteArrayInputStream(jbuffer)
source = InflaterInputStream(source0)
dim = Array[Int64]([ubuffer.Length])
helper = Array[SByte]([0])
xbuffer = Array.CreateInstance(helper[0].GetType(), dim)
## take big bites
offset = 0
try:
    while True:
        x = source.read(xbuffer, offset, xbuffer.Length-offset)
        if x < 0:
            break
        offset += x
except EOFException:
    pass
reflated = sbyteArrayToUbyteArray(xbuffer)
print "reflated length is %d" % (reflated.Length)
outstring = Encoding.Default.GetString(reflated)
same = outstring.Equals(instring)
print same
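The top-and-tail step on its own can be sketched in plain CPython 3, again as a stand-in rather than the code from this post. Here zlib.compressobj with a negative wbits value plays the part of .Net's DeflateStream, and raw_deflate and wrap_as_zlib are hypothetical names. The header bytes are the same 120 and 156 as above: the first byte packs the window size (7, meaning 32k bytes) and the method (8, Deflate), and the low five bits of the second byte are chosen so that the two bytes, read as one 16-bit number, divide exactly by 31.

```python
import struct
import zlib

def raw_deflate(data):
    # Negative wbits: emit a bare deflate stream, no header or trailer.
    comp = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()

def wrap_as_zlib(deflated, original):
    """Add the 2-byte header and the Adler-32 of the uncompressed data."""
    header = bytes([0x78, 0x9C])                 # 120, 156
    # The check bits in the second byte make the pair a multiple of 31.
    assert (header[0] * 256 + header[1]) % 31 == 0
    trailer = struct.pack(">I", zlib.adler32(original))  # big-endian
    return header + deflated + trailer

payload = b"The quick brown fox jumps over the lazy dog. " * 50
wrapped = wrap_as_zlib(raw_deflate(payload), payload)

# A standard ZLIB reader should now accept the wrapped stream.
roundtrip = zlib.decompress(wrapped)
print("round trip ok: %s" % (roundtrip == payload))
```

Note that the checksum is taken over the original, uncompressed bytes, not over the deflate section; getting that wrong gives a stream that inflates to the right data and then fails the final check.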
This should allow me to simplify the baggage accumulated for the C#/Erlang bridge, which currently uses a second-generation port of the original 'C' ZLIB for this sort of interoperation.
Much Later — I have ported this to F# on .net 4.5.
There's a lot of buffer to stream to buffer conversion involved -- the {n,j}{in,de}flate functions are just that. It's ndeflateZLIB, ninflateZLIBFull that do the complete transformation from a byte array to a compressed byte array and back again. The sbyte bit is purely a J# compatibility thing. Full solution here.
3 comments:
Hi Steve - Thanks for the very interesting code. Under .NET 4.5 I encounter the 'Block length does not match its complement' error when using DeflateStream. Will your approach work under .NET 4.5? Best, Brian
Making a quick search for that error message gave me this link for a case where that error was observed in the inflate operation, and the remedy.
The key bit of my FePy code above is line 97 of the first fragment (the call to sbyteArrayToUbyteArray(deflated, 2, deflated.Length-6)), where I drop the first 2 and last 4 bytes of the ZLIB compressed material -- the ZLIB header and footer data -- to get at the pure deflated stream in-between.
I ported the FePy code to F# in .net 4.5, and yes, it still works there. The important bit is
// remove the ZLIB wrapper to reveal the inner deflate stream
// Use this before ninflate
let stripZLIBwrapper (buffer:array<byte>) =
    let size = buffer.Length - 6
    buffer |> Seq.skip 2 |> Seq.take size |> Seq.toArray

let ninflateZLIB = stripZLIBwrapper >> ninflate
where ninflate just does the job of pushing an input byte[] buffer through a DeflateStream(..., CompressionMode.Decompress) into another buffer as above.