API
avroc
This module holds the main public API.
- avroc.compile_encoder(schema)
Construct a callable which encodes Python objects to bytes according to an Avro schema.
- Parameters
schema (dict (see Schema Types)) – The schema to use when encoding data. Usually, this is a dict.
- Return type
function encoder(msg) -> bytes
- avroc.compile_decoder(writer_schema, reader_schema=None)
Construct a callable which decodes Python objects from a bytes reader.
- Parameters
writer_schema (dict (see Schema Types)) – The schema used by the writer when encoding data. Usually, this is a dict.
reader_schema (Optional[dict] (see Schema Types)) – An optional schema to transform messages into when decoding data. The schema must be compatible with the writer’s schema, in the sense described by the Avro spec; see Schema Resolution for details.
- Return type
function decoder(fp) -> msg
- avroc.read_file(fo, schema)
Read a file containing Avro messages. The file should already be open, and it should have been opened in binary mode (for example, with open(path, "rb")).
The messages are provided as an iterator. To get all the messages in a list, you can use list(read_file(fo)), for example.
The optional schema parameter can be used to read data into a different shape than the writer used; see Schema Resolution for more.
Note that the writer’s schema is always included in an Avro data file, so the schema parameter is purely optional: you only need to pass it if you want to use a different schema than the writer used during encoding.
- Parameters
fo (IO[bytes]) – A handle to a file-like bytes source to read.
schema (Optional[dict] (see Schema Types)) – An optional schema to transform messages into when decoding data. The schema must be compatible with the writer’s schema, in the sense described by the Avro spec; see Schema Resolution for details. If no schema is provided, then the writer’s schema is used.
- Returns
An iterator of the messages in the file. The type of each message depends on the schema used when decoding, as laid out in Message Types.
- Return type
Iterable[msg]
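Because the writer's schema travels with the data, a reader can recover it from the file header alone. Per the Avro specification, an object container file starts with the magic bytes Obj\x01, then a metadata map whose "avro.schema" entry holds the schema as JSON (and whose optional "avro.codec" entry defaults to "null" when absent), and finally a 16-byte sync marker. The sketch below parses just that header. The name read_header_sketch is hypothetical, and the code is illustrative only; it is not read_file's implementation.

```python
import io
import json
from typing import Any, BinaryIO, Dict, Tuple

MAGIC = b"Obj\x01"  # "Obj" followed by format version 1


def _read_long(fp: BinaryIO) -> int:
    # Zigzag varint, as in the Avro binary encoding.
    z, shift = 0, 0
    while True:
        b = fp.read(1)
        if not b:
            raise EOFError("stream ended inside a varint")
        z |= (b[0] & 0x7F) << shift
        if not b[0] & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def _read_bytes(fp: BinaryIO) -> bytes:
    n = _read_long(fp)
    data = fp.read(n)
    if len(data) != n:
        raise EOFError("stream ended inside a bytes value")
    return data


def read_header_sketch(fp: BinaryIO) -> Tuple[Any, str, bytes]:
    """Hypothetical: return (writer schema, codec name, sync marker)."""
    if fp.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta: Dict[str, bytes] = {}
    # The metadata is an Avro map<bytes>: blocks of entries ending with a 0 count.
    while True:
        count = _read_long(fp)
        if count == 0:
            break
        if count < 0:  # negative count: the block's byte size follows
            count = -count
            _read_long(fp)
        for _ in range(count):
            key = _read_bytes(fp).decode("utf-8")
            meta[key] = _read_bytes(fp)
    sync = fp.read(16)
    schema = json.loads(meta["avro.schema"])
    codec = meta.get("avro.codec", b"null").decode("utf-8")  # absent means "null"
    return schema, codec, sync
```

Everything after the header is a sequence of data blocks, each terminated by the same sync marker. This lets a reader detect corruption and resynchronize.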
- avroc.write_file(fo, schema, messages)
Write messages to an open file according to a given Avro schema.
All messages in the iterable will be consumed and written.
- Parameters
fo (IO[bytes]) – A handle to a file-like bytes destination to write to.
schema (dict (see Schema Types)) – The schema to use when encoding data.
messages (Iterable[msg]) – An iterable of the messages to write into the file. The messages must be encodable under the given schema; see Message Types for details.
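write_file has to produce the container layout that a reader expects. That layout is a header followed by data blocks, and each block holds an object count, the block's byte size, the concatenated encoded messages, and the sync marker again. This sketch writes a header plus a single block with the "null" codec. The name write_container_sketch is hypothetical, and the sketch assumes the messages have already been encoded (for instance, by an encoder from compile_encoder); it is not avroc's write_file.

```python
import io
import json
import os
from typing import Any, BinaryIO, Iterable, Optional

MAGIC = b"Obj\x01"


def _long(n: int) -> bytes:
    # Zigzag + base-128 varint, as in the Avro binary encoding.
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)
    return bytes(out)


def _bytes(b: bytes) -> bytes:
    return _long(len(b)) + b  # length-prefixed


def write_container_sketch(
    fo: BinaryIO, schema: Any, encoded: Iterable[bytes], sync: Optional[bytes] = None
) -> None:
    """Hypothetical: write a header and one "null"-codec block of pre-encoded messages."""
    sync = os.urandom(16) if sync is None else sync  # the spec calls for a random marker
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
    # Header: magic, metadata map (one block of entries, then a 0 count), sync marker.
    fo.write(MAGIC)
    fo.write(_long(len(meta)))
    for key, value in meta.items():
        fo.write(_bytes(key.encode("utf-8")))
        fo.write(_bytes(value))
    fo.write(_long(0))
    fo.write(sync)
    # Data block: object count, byte size, concatenated messages, sync marker.
    messages = list(encoded)  # every message in the iterable is consumed
    if messages:
        data = b"".join(messages)
        fo.write(_long(len(messages)))
        fo.write(_long(len(data)))
        fo.write(data)
        fo.write(sync)
```

A real writer would split large inputs into several blocks and compress each block's bytes with the chosen codec. That is the extra control AvroFileWriter exposes.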
- class avroc.AvroFileWriter(fo, schema, codec=NullCodec, block_size=1000)
A low-level class for writing Avro data to a file, complete with all the persnickety details. Most users should use write_file.
AvroFileWriter provides these additional capabilities on top of write_file:
- You can write messages one by one, rather than passing an entire iterator of messages.
- You can choose a compression codec to apply to all data bytes written to the file; the codec is stored in the Avro header, so other readers will know how to read the data automatically.
- You can pick a block size and choose exactly when flushes occur.
Writes are buffered and written out in blocks of the given block_size. As a result, it is important to call flush() to ensure that all writes are actually persisted to the underlying file. This can be done by using the AvroFileWriter as a context manager, for example:

    from avroc import AvroFileWriter

    with open("data.avro", "wb") as f:
        with AvroFileWriter(f, schema) as w:
            w.write(msg1)
            w.write(msg2)
            w.write(msg3)
        # When the 'with' block is exited, all writes will be
        # flushed, so this is safe.
- Parameters
fo (File-like in bytes mode) – A file-like object that can be written to in binary mode.
schema (dict (see Schema Types)) – The schema to use when encoding data.
codec (avroc.codec.Codec) – A compression codec to use when encoding data. The valid options are all the classes in avroc.codec. Make sure to pass an instance, not a class.
- write(msg)
Write a single message to the Avro file. Writes are batched into large blocks; call flush() to flush the current block.
- Parameters
msg – A message conforming to the writer’s schema.
- flush()
Flush any outstanding writes to the underlying file.
- __enter__()
Returns self, allowing the writer to be used as a context manager.
- __exit__(exc_type, exc_value, exc_traceback)
Flushes any buffered writes and exits the context-managed block.
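The buffering contract of AvroFileWriter (writes accumulate, flush() emits them, and leaving a with block flushes) is the standard buffered-writer pattern. The class below is a hypothetical, codec-free sketch of that pattern. It treats block_size as a count of messages, which is an assumption rather than a documented detail of avroc, and it hands each finished block to a caller-supplied write_block callable.

```python
from types import TracebackType
from typing import Callable, List, Optional, Type


class BlockBufferedWriter:
    """Hypothetical sketch of buffer-then-flush; not avroc's AvroFileWriter."""

    def __init__(self, write_block: Callable[[List[bytes]], None], block_size: int = 1000) -> None:
        self._write_block = write_block  # called with each full (or final) block
        self._block_size = block_size    # assumption: counted in messages
        self._buffer: List[bytes] = []

    def write(self, encoded_msg: bytes) -> None:
        self._buffer.append(encoded_msg)
        if len(self._buffer) >= self._block_size:
            self.flush()

    def flush(self) -> None:
        # Emit whatever is buffered; a no-op when the buffer is empty.
        if self._buffer:
            self._write_block(self._buffer)
            self._buffer = []

    def __enter__(self) -> "BlockBufferedWriter":
        return self

    def __exit__(
        self,
        exc_type: Optional[Type[BaseException]],
        exc_value: Optional[BaseException],
        exc_traceback: Optional[TracebackType],
    ) -> None:
        # Flush on the way out; returning None lets any exception propagate.
        self.flush()
```

Without the final flush, the tail of the data would stay in the in-memory buffer and never reach the file. The context-manager form guards against exactly that failure.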
avroc.codec
Avro has some officially endorsed codecs which can be used when writing files (and are automatically selected when reading encoded files). Using these can help you save some space, at the cost of a bit of CPU time for compression and decompression.
Avroc implements all the codecs from the Avro specification.
- class avroc.codec.Codec
Abstract base class, implemented by the other classes in this module. Those classes are:

Class            Description
NullCodec        No compression
DeflateCodec     Compress with DEFLATE, similar to gzip
SnappyCodec      Compress with snappy
                 Compress with bzip2
                 Compress with xz, from the lzma family
ZstandardCodec   Compress with zstandard
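Three of the codecs in the table map directly onto Python standard-library modules, which makes it easy to measure the space-for-CPU trade-off on your own data. In this sketch, deflate is raw DEFLATE (RFC 1951, with no zlib header, hence wbits=-15), bzip2 uses bz2, and xz uses lzma's default .xz format. Snappy and zstandard are left out because they need third-party packages. The CODECS table and the compare function are hypothetical helpers, not avroc's codec classes.

```python
import bz2
import lzma
import zlib
from typing import Callable, Dict, Tuple


def raw_deflate(data: bytes) -> bytes:
    # wbits=-15 selects raw DEFLATE (RFC 1951): no zlib header or trailer.
    c = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -15)
    return c.compress(data) + c.flush()


def raw_inflate(data: bytes) -> bytes:
    return zlib.decompress(data, -15)


# Codec name (as used in Avro's "avro.codec" metadata) -> (compress, decompress).
CODECS: Dict[str, Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]] = {
    "null": (lambda b: b, lambda b: b),
    "deflate": (raw_deflate, raw_inflate),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),  # lzma defaults to the .xz container
}


def compare(block: bytes) -> Dict[str, int]:
    """Hypothetical helper: compressed size of one data block under each codec."""
    sizes: Dict[str, int] = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(block)
        if decompress(packed) != block:
            raise ValueError(f"{name} did not round-trip")
        sizes[name] = len(packed)
    return sizes
```

Codecs apply to whole data blocks rather than individual messages. Larger blocks therefore generally give the compressor more redundancy to exploit.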
- class avroc.codec.NullCodec
A NullCodec does no compression. It just passes data through.
- class avroc.codec.DeflateCodec(compression_level=None)
A DeflateCodec uses the deflate algorithm from RFC 1951.
- class avroc.codec.SnappyCodec
A SnappyCodec compresses with Google’s snappy algorithm and follows each compressed block with a 4-byte, big-endian CRC32 checksum of the uncompressed data, as the Avro specification requires.
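The snappy framing can be shown without the snappy library itself. The sketch below appends and verifies the trailing checksum; frame_block and unframe_block are hypothetical names. The compress and decompress arguments are placeholders, which with the third-party python-snappy package could be snappy.compress and snappy.decompress.

```python
import struct
import zlib
from typing import Callable


def frame_block(raw: bytes, compress: Callable[[bytes], bytes]) -> bytes:
    # Compressed payload, then CRC32 of the *uncompressed* bytes as a
    # 4-byte big-endian unsigned integer.
    return compress(raw) + struct.pack(">I", zlib.crc32(raw) & 0xFFFFFFFF)


def unframe_block(framed: bytes, decompress: Callable[[bytes], bytes]) -> bytes:
    payload, trailer = framed[:-4], framed[-4:]
    raw = decompress(payload)
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(raw) & 0xFFFFFFFF != expected:
        raise ValueError("CRC32 mismatch: snappy block is corrupt")
    return raw
```

Checksumming the uncompressed bytes means the check also catches a decompressor that silently produces wrong output, not just damage to the stored payload.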
- class avroc.codec.ZstandardCodec(compressor=None)
A ZstandardCodec uses the zstandard compression algorithm.
- Parameters
compressor (zstandard.ZstdCompressor) – A compressor to use when compressing data, possibly one that has already been trained on other data. If unset, a compressor with all the default values is used.