=== API === .. py:module:: avroc :synopsis: The main entrypoint ``avroc`` --------- This module holds the main public API. .. py:function:: compile_encoder(schema) Construct a callable which encodes Python objects to bytes according to an Avro schema. :param schema: The schema to use when encoding data. Usually, this is a ``dict``. :type schema: dict (see :ref:`schema-types`) :rtype: function encoder(msg) -> bytes .. py:function:: compile_decoder(writer_schema, reader_schema=None) Construct a callable which decodes Python objects from a bytes reader. :param writer_schema: The schema used by the writer when encoding data. Usually, this is a ``dict``. :type writer_schema: dict (see :ref:`schema-types`) :param reader_schema: An optional schema to transform messages into when decoding data. The schema must be compatible with the writer's schema, in the sense described by the Avro spec; see :ref:`schema-resolution` for details. :type reader_schema: Optional[dict] (see :ref:`schema-types`) :rtype: function decoder(fp) -> msg .. py:function:: read_file(fo, schema) Read a file containing Avro messages. The file should already be opened, and should be opened in binary mode (like ``open(path, "rb")``, for example). The messages are provided as an iterator. To get all the messages in a list, you can use ``list(read_file(fp))``, for example. The optional ``schema`` parameter can be used to read data into a different shape than the writer used; see :ref:`schema-resolution` for more. Note that the writer's schema is always included in an Avro data file, so the schema is purely optional - you only need to pass it if you want to use a *different* schema than the writer used during encoding. :param fo: A handle to a file-like bytes source to read. :type fo: IO[bytes] :param schema: An optional schema to transform messages into when decoding data. The schema must be compatible with the writer's schema, in the sense described by the Avro spec; see :ref:`schema-resolution` for details. If no schema is provided, then the writer's schema is used. :type schema: Optional[dict] (see :ref:`schema-types`) :returns: An iterator of the messages in the file. The messages' type depend on the schema used when decoding, as laid out in :ref:`message-types`. :rtype: Iterable[msg] .. py:function:: write_file(fo, schema, messages) Write messages to an open file according under a given Avro schema. All messages in the iterable will be consumed and written. :param fo: A handle to a file-like bytes destination to write to. :type fo: IO[bytes] :param schema: The schema to use when encoding data. :type schema: dict (see :ref:`schema-types`) :param messages: An iterable of the messages to write into the file. The messages must be encodable under the given schema; see :ref:`message-types` for details. :type messages: Iterable[msg] .. py:class:: AvroFileWriter(fo, schema, codec=NullCodec, block_size=1000) A low-level class for writing Avro data to a file, complete with all persnickety details. Most users should use :py:obj:`write_file`. AvroFileWriter provides these additional capabilities on top of :py:obj:`write_file`: - You can write messages one-by-one, rather than passing an entire iterator of messages. - You can choose a compression codec to apply to all data bytes written to the file; the codec is stored in the Avro header so other readers will know how to read the data automatically. - You can pick a block size and choose exactly when flushes occur. Writes are buffered, and written in `blocks` of the given ``block-size``. As a result, it is important to call :py:meth:`flush` to be ensure that all writes are actually persisted to the underlying file. This can be done by using the AvroFileWriter as a context manager. For example, like this: .. code-block:: py with open("data.avro", "wb") as f: with AvroFileWriter(f, schema) as w: w.write(msg1) w.write(msg2) w.write(msg3) # When the 'with' block is exited, all writes will be # flushed, so this is safe. :param fo: A file-like object that can be written to in binary mode. :type fo: File-like in bytes mode :param schema: The schema to use when encoding data. :type schema: dict (see :ref:`schema-types`) :param codec: A compression codec to use when encoding data. The valid options are all the classes in :py:mod:`avroc.codec`. Make sure to pass an instantiated instance, not a class. :type codec: avroc.codec.Codec .. py:method:: write(msg) Write a single message to the Avro file. Writes are batched into large blocks; call :py:meth:`flush` to flush the current block. :param msg: A message conforming to the writer's schema. .. py:method:: flush() Flush any outstanding writes to the underlying file. .. py:method:: __enter__() Returns self, allowing the writer to be used as a context manager. .. py:method:: __exit__(exc_type, exc_value, exc_traceback) Flushes any buffered writes and exits the context-managed block. .. py:module:: avroc.codec :synopsis: Compression codecs which can be used when writing Avro files ``avroc.codec`` --------------- Avro has some officially-endorsed codecs which can be used when writing files (and are automatically selected when reading encoded files). Using these can help you save some space, at the cost of a bit of CPU time for compression and decompression. Avroc implements all the codecs `from the Avro specification `_. .. py:class:: Codec() Abstract base class, implemented by the other classes in this module. Those classes are: +---------------------------+------------------------------+ | Class | Description | +===========================+==============================+ | :py:obj:`NullCodec` |No compression | +---------------------------+------------------------------+ | :py:obj:`DeflateCodec` |Compress with DEFLATE, similar| | |to gzip | +---------------------------+------------------------------+ | :py:obj:`SnappyCodec` |Compress with snappy | +---------------------------+------------------------------+ | :py:obj:`Bzip2Codec` |Compress with bzip2 | +---------------------------+------------------------------+ | :py:obj:`XZCodec` |Compress with xz, from the | | |lzma family | +---------------------------+------------------------------+ | :py:obj:`ZstandardCodec` |Compress with zstandard | +---------------------------+------------------------------+ .. py:class:: NullCodec() A NullCodec does no compression. It just passes data through. .. py:class:: DeflateCodec(compression_level=None) A DeflateCodec uses the deflate algorithm from `RFC 1951 `_. :param compression_level: The Deflate compression level to use. Higher is more compressed. 0 is no compression, and 9 is max compression. Defaults to the default compression level of Python's :py:obj:`zlib` (currently 6). :type compression_level: int .. py:class:: SnappyCodec() A SnappyCodec uses Google's `snappy `_ compression algorithm, followed by a 4-byte CRC32 checksum. .. py:class:: Bzip2Codec() A Bzip2Codec uses the `bzip2 `_ compression algorithm. .. py:class:: XZCodec() A XZCodec uses the `lzma `_ compression algorithm. .. py:class:: ZstandardCodec(compressor=None) A ZstandardCodec uses the `zstandard `_ compression algorithm. :param compressor: A compressor, possibly which has already been trained on other data, which should be used when compressing data. If unset, then a compressor with all the default values is used. :type compressor: :py:obj:`zstandard.ZstdCompressor`