A recurring problem in several of the projects I have in the pipeline is the matter of handling ZLIB. Java, through java.util.zip, offers ZLIB compression with a 32K-byte window (but no means of tuning the window) via DeflaterOutputStream. The .NET Framework doesn't offer ZLIB directly at all, but provides naked Deflate via System.IO.Compression.DeflateStream.
That gives us enough to be able to reflate the output of a ZLIB deflation, since (absent a preset dictionary) a ZLIB stream is just a 2-byte header, a deflate section and finally a 4-byte Adler-32 checksum, stored big-endian:
import clr
clr.AddReference( 'vjslib, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' )
from java.io import *
from java.util.zip import *
from System.IO import *
from System.IO.Compression import *
from System import Array, Byte, SByte, Int64
from System.Text import Encoding
def readFileToSByteArray(name):
    # open a file via Java I/O
    raw = RandomAccessFile(name, 'r')
    # print "File size in bytes = %d " % (raw.length())
    # read into signed byte array
    dim = Array[Int64]([raw.length()])
    helper = Array[SByte]([0])
    buffer = Array.CreateInstance(helper[0].GetType(), dim)
    raw.readFully(buffer)
    raw.close()
    return buffer

def sbyteArrayToUbyteArray(buffer, offset=0, length=-1):
    # default length: everything from offset to the end of the buffer
    if length < 0:
        length = buffer.Length - offset
    dim = Array[Int64]([length])
    helper = Array[Byte]([0])
    ubuffer = Array.CreateInstance(helper[0].GetType(), dim)
    # copy into unsigned byte array
    for i in range(0,length):
        ubuffer[i] = buffer[i+offset] & 0xff
    return ubuffer

def jdeflate(buffer):
    # full ZLIB output (header + deflate + Adler-32) via Java's DeflaterOutputStream
    sink0 = ByteArrayOutputStream()
    sink = DeflaterOutputStream(sink0)
    sink.write(buffer, 0, buffer.Length)
    sink.close()
    sink0.close()
    return sink0.toByteArray()

def update_adler(adler, buffer):
    # Adler-32 over unsigned bytes, as in RFC 1950; start from adler = 1
    s1 = adler & 0xffff
    s2 = (adler >> 16) & 0xffff
    BASE = 65521
    for n in range(0,buffer.Length):
        s1 = (s1 + buffer[n]) % BASE
        s2 = (s2 + s1) % BASE
    return (s2 << 16) + s1

def ninflate(buffer):
    # push a naked deflate stream through .NET's DeflateStream, a byte at a time
    mem = MemoryStream(buffer)
    inflate = DeflateStream(mem, CompressionMode.Decompress)
    mem = MemoryStream()
    while True:
        x = inflate.ReadByte()
        if x < 0:
            break
        mem.WriteByte(x)
    inflate.Close()
    mem.Close()
    return mem.ToArray()
##==========================================
# select a file
name = ...
# make signed, unsigned buffers and string
buffer = readFileToSByteArray(name)
print "Constructed buffer size %d" % (buffer.Length)
ubuffer = sbyteArrayToUbyteArray(buffer)
instring = Encoding.Default.GetString(ubuffer)
# compute the Adler32 checksum
adler = update_adler(1, ubuffer)
print "Adler32 = %d" % (adler)
# deflate in Java
deflated = jdeflate(buffer)
# check the ZLIB header: expect 120:156, i.e. (120, -100) as signed Java bytes
first = (deflated[0] & 0xff)
second = (deflated[1] & 0xff)
print "header value EXPECTED 120:156 ACTUAL %d:%d" % (first, second)
i = deflated.Length-4
x = ((deflated[i]&0xff) << 24) | ((deflated[i+1]&0xff) << 16) | ((deflated[i+2]&0xff) << 8) | (deflated[i+3]&0xff)
print "Java Adler32 = %d" % (x)
# discard header and Adler32 tail
newbuffer = sbyteArrayToUbyteArray(deflated, 2, deflated.Length-6)
print "ZLIB length %d" % (deflated.Length)
print "deflate section length %d" % (newbuffer.Length)
# reflate
out = ninflate(newbuffer)
# compare with input
print "reflated buffer length : %d" % (out.Length)
outstring = Encoding.Default.GetString(out)
same = outstring.Equals(instring)
print "Output == Input ?? %s" % (str(same))

That works fine, but the converse, taking a .NET deflate and adding the appropriate top and tail, took a bit of getting to work.
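For anyone without J# to hand, the framing the first fragment relies on can be cross-checked in plain CPython 3. This is a sketch under stated assumptions, not part of the original solution. It uses the standard zlib module, whose compress() emits the full ZLIB format and whose decompress() accepts a naked deflate stream when given a negative wbits. The helper split_zlib and the sample payload are hypothetical, and the code assumes no preset dictionary, so the header is exactly 2 bytes.

```python
import struct
import zlib


def split_zlib(wrapped: bytes):
    """Split a ZLIB stream (RFC 1950) into (CMF, FLG), deflate section, Adler-32.

    Hypothetical helper; assumes no preset dictionary (2-byte header).
    """
    cmf, flg = wrapped[0], wrapped[1]
    if cmf & 0x0F != 8:
        raise ValueError("CM is not 8 (deflate)")
    if (cmf * 256 + flg) % 31 != 0:
        raise ValueError("FCHECK bits do not make CMF:FLG a multiple of 31")
    if flg & 0x20:
        raise ValueError("FDICT set: a 4-byte dictionary id follows the header")
    (adler,) = struct.unpack(">I", wrapped[-4:])  # trailer is big-endian
    return (cmf, flg), wrapped[2:-4], adler


# Illustrative payload; any bytes will do.
data = b"A recurring problem is the matter of handling ZLIB. " * 20
(cmf, flg), raw, adler = split_zlib(zlib.compress(data))

# wbits=-15: raw deflate with a 32K window and no header or trailer,
# the role DeflateStream plays in ninflate above.
reflated = zlib.decompress(raw, wbits=-15)

print("header value %d:%d" % (cmf, flg))
print("deflate section length %d" % len(raw))
print("Adler32 matches:", adler == zlib.adler32(reflated))
print("Output == Input:", reflated == data)
```

At the default compression level the header comes out as 120:156, the same pair the Java side is expected to produce.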
In the converse direction, the first pitfall was that the InflaterInputStream has to be read in chunks as large as feasible, rather than a byte at a time, so as not to throw a premature EOFException. That overcome, I got a result of the right length, but it differed in the final few characters for the file under test. I resolved this with a belt-and-braces flushing and closing of both the DeflateStream and its underlying MemoryStream before taking the output. The un-refactored code I currently have continues from the above as:
# now the other way around
sink0 = MemoryStream()
sink = DeflateStream(sink0, CompressionMode.Compress)
sink.Write(ubuffer, 0, ubuffer.Length)
## overkill the flush and close here
sink.Flush()
sink.Close()
sink0.Flush()
sink0.Close()
deflated = sink0.ToArray()
print "deflate section length %d" % (deflated.Length)
#now inflate
dim = Array[Int64]([deflated.Length+6])
jbuffer = Array.CreateInstance(buffer[0].GetType(), dim)
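jbuffer[0] = 120   # CMF 0111 1000 = 120: CINFO 7 (32K-byte window), CM 8 (Deflate)
jbuffer[1] = -100  # FLG 10 0 11100 = 156 (-100 signed): FLEVEL 2 (default compression),
                   # FDICT 0 (no dictionary), FCHECK 28 so that (120*256 + 156) % 31 == 0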
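def sbyte(x):
    # reinterpret a byte value (0..255) as a Java signed byte (-128..127)
    y = x & 0xff
    if y > 127:
        return y - 256
    return y
# copy the .NET deflate section in after the 2-byte header
for i in range(0,deflated.Length):
    jbuffer[i+2] = sbyte(deflated[i])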
i = deflated.Length+2
jbuffer[i] = sbyte(adler>>24)
jbuffer[i+1] = sbyte(adler>>16)
jbuffer[i+2] = sbyte(adler>>8)
jbuffer[i+3] = sbyte( adler )
source0 = ByteArrayInputStream(jbuffer)
source = InflaterInputStream(source0)
dim = Array[Int64]([ubuffer.Length])
helper = Array[SByte]([0])
xbuffer = Array.CreateInstance(helper[0].GetType(), dim)
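## take big bites: each read asks for everything still outstanding
offset = 0
try:
    # stop once the buffer is full: in Java, a zero-length read() returns 0 rather than -1
    while offset < xbuffer.Length:
        x = source.read(xbuffer, offset, xbuffer.Length-offset)
        if x < 0:
            break
        offset += x
except EOFException:
    pass
reflated = sbyteArrayToUbyteArray(xbuffer, 0, offset)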
print "reflated length is %d" % (reflated.Length)
outstring = Encoding.Default.GetString(reflated)
same = outstring.Equals(instring)
print same

This should allow me to simplify the baggage accumulated for the C#/Erlang bridge, which currently uses a second-generation port of the original 'C' ZLIB for this sort of interoperation.
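The converse direction, adding the top and tail to a naked deflate stream, can be sketched the same way in CPython 3. Again, this illustrates the technique rather than reproducing the code above. compressobj with a negative wbits stands in for the .NET DeflateStream. zlib_header computes the two header bytes that the second fragment hard-codes as 120 and -100. inflate_in_chunks feeds the result to a streaming decompressor a chunk at a time. All four function names are hypothetical.

```python
import struct
import zlib


def zlib_header(flevel: int = 2, window_bits: int = 15) -> bytes:
    """Build a 2-byte ZLIB header for deflate with no preset dictionary."""
    cmf = ((window_bits - 8) << 4) | 8      # CINFO in the high nibble, CM = 8
    flg = flevel << 6                       # FLEVEL in bits 6-7, FDICT = 0
    flg |= 31 - (cmf * 256 + flg) % 31      # FCHECK: make CMF*256+FLG divisible by 31
    return bytes([cmf, flg])


def raw_deflate(data: bytes) -> bytes:
    """Naked deflate, standing in for DeflateStream(..., CompressionMode.Compress)."""
    comp = zlib.compressobj(wbits=-15)
    # flush() emits the final block; without it that data stays buffered in comp
    return comp.compress(data) + comp.flush()


def wrap_raw_deflate(raw: bytes, original: bytes) -> bytes:
    """Add the ZLIB top (header) and tail (big-endian Adler-32 of the input)."""
    return zlib_header() + raw + struct.pack(">I", zlib.adler32(original))


def inflate_in_chunks(wrapped: bytes, chunk: int = 4096) -> bytes:
    """Stream a ZLIB buffer through a decompressor, one chunk at a time."""
    d = zlib.decompressobj()                # default wbits expects the ZLIB wrapper
    out = bytearray()
    for i in range(0, len(wrapped), chunk):
        out += d.decompress(wrapped[i:i + chunk])
    out += d.flush()
    if not d.eof:
        raise ValueError("ZLIB stream ended before its final block")
    return bytes(out)


data = b"now the other way around " * 50
wrapped = wrap_raw_deflate(raw_deflate(data), data)
print("header value %d:%d" % (wrapped[0], wrapped[1]))
print("Output == Input:", inflate_in_chunks(wrapped) == data)
```

Because the decompressor checks both the header arithmetic and the trailing Adler-32, a wrong FCHECK or a little-endian trailer shows up as a zlib.error rather than as silently corrupt output.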
Much later: I have ported this to F# on .NET 4.5.
There's a lot of buffer-to-stream-to-buffer conversion involved; the {n,j}{in,de}flate functions do just that. It's ndeflateZLIB and ninflateZLIBFull that do the complete transformation from a byte array to a compressed byte array and back again. The sbyte handling is purely a J# compatibility thing. Full solution here.


3 comments:
Hi Steve - Thanks for the very interesting code. Under .NET 4.5 I encounter the 'Block length does not match its complement' error when using DeflateStream. Will your approach work under .NET 4.5? Best, Brian
A quick search for that error message gave me this link, covering a case where the error was observed in the inflate operation, along with the remedy.
The key bit of my FePy code above is the newbuffer = sbyteArrayToUbyteArray(deflated, 2, deflated.Length-6) line of the first fragment. That is where I drop the first 2 and last 4 bytes of the ZLIB-compressed material (the ZLIB header and trailer) to get at the pure deflated stream in between.
I ported the FePy code to F# on .NET 4.5, and yes, it still works there. The important bit is:
// remove the ZLIB wrapper to reveal the inner deflate stream
// Use this before ninflate
let stripZLIBwrapper (buffer:array<byte>) =
    let size = buffer.Length - 6
    buffer |> Seq.skip 2 |> Seq.take size |> Seq.toArray
let ninflateZLIB = stripZLIBwrapper >> ninflate
where ninflate just does the job of pushing an input byte[] buffer through a DeflateStream(..., CompressionMode.Decompress) into another buffer as above.
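Brian's error message fits what happens when the ZLIB header is not stripped. Here is a plausible explanation from RFC 1951, sketched in CPython 3 rather than .NET; it assumes the .NET inflater performs the same stored-block check. A raw-deflate reader takes the first byte 0x78 (0111 1000) least-significant bit first. It sees BFINAL = 0 and BTYPE = 00, so it treats the byte as the start of a stored block. It then reads the next four bytes as LEN and NLEN, which must be one's complements of each other. Since those bytes are really the 0x9C header byte and compressed data, the check fails. CPython's zlib reports this as "invalid stored block lengths", presumably the same condition as .NET's "Block length does not match its complement". The try_raw_inflate helper is hypothetical.

```python
import zlib


def try_raw_inflate(buf: bytes):
    """Inflate buf as naked deflate, as DeflateStream does; return bytes or the error text."""
    try:
        return zlib.decompress(buf, wbits=-15)
    except zlib.error as exc:
        return str(exc)


wrapped = zlib.compress(b"hello")  # full ZLIB stream, starting 0x78 0x9C

# Without stripping: 0x78 is parsed as a stored-block header and LEN/NLEN disagree.
unstripped = try_raw_inflate(wrapped)
# Stripped as stripZLIBwrapper does: skip 2 header bytes, drop 4 trailer bytes.
stripped = try_raw_inflate(wrapped[2:-4])

print("unstripped:", unstripped)
print("stripped:  ", stripped)
```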