dulwich.pack module
Classes for dealing with packed git objects.
A pack is a compact representation of a bunch of objects, stored using deltas where possible.
A pack has two parts: the pack file, which stores the data, and an index that tells you where the data is.
To find an object, you look in each of the index files until you find a match for the object name. You then use the offset found there as a pointer into the corresponding packfile.
- class dulwich.pack.DeltaChainIterator(file_obj, resolve_ext_ref=None)
Bases: object
Abstract iterator over pack data based on delta chains.
Each object in the pack is guaranteed to be inflated exactly once, regardless of how many objects reference it as a delta base. As a result, memory usage is proportional to the length of the longest delta chain.
Subclasses can override _result to define the result type of the iterator. By default, results are UnpackedObjects with the following members set:
offset
obj_type_num
obj_chunks
pack_type_num
delta_base (for delta types)
comp_chunks (if _include_comp is True)
decomp_chunks
decomp_len
crc32 (if _compute_crc32 is True)
- ext_refs()
- classmethod for_pack_data(pack_data, resolve_ext_ref=None)
- record(unpacked)
- set_pack_data(pack_data)
- class dulwich.pack.FilePackIndex(filename, file=None, contents=None, size=None)
Bases: PackIndex
Pack index that is based on a file.
To look up an object, it opens the file and uses the first byte of the SHA id to index into the fan-out table of 256 four-byte groups at the start of the index (in version 2 files, just after the 8-byte header). The value in the indexed group is the end of the group of entries that share that first byte. Subtract one from the first byte and index again to find the start of the group (for a first byte of 0x00 the start is 0). Entries are sorted by SHA id within the group, so compute the start and end offsets and then bisect between them to find out whether the value is present.
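The fan-out lookup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not dulwich's implementation: it assumes the object names have already been read into a sorted list of 20-byte binary SHAs, and `build_fanout` and `fanout_lookup` are hypothetical helper names.

```python
import bisect
from typing import List, Optional


def build_fanout(shas: List[bytes]) -> List[int]:
    """Hypothetical helper: fanout[b] = number of SHAs whose first byte is <= b."""
    fanout = [0] * 256
    for sha in shas:
        fanout[sha[0]] += 1
    running = 0
    for i in range(256):
        running += fanout[i]
        fanout[i] = running
    return fanout


def fanout_lookup(shas: List[bytes], fanout: List[int], sha: bytes) -> Optional[int]:
    """Hypothetical helper: position of `sha` in the sorted list, or None."""
    first = sha[0]
    # The entry for the first byte is the (exclusive) end of its group...
    end = fanout[first]
    # ...and the entry for the previous byte is the start (0 for byte 0x00).
    start = fanout[first - 1] if first > 0 else 0
    # Names are sorted within the group, so bisect between start and end.
    i = bisect.bisect_left(shas, sha, start, end)
    if i < end and shas[i] == sha:
        return i
    return None
```

Because the fan-out table narrows the search to one group before bisecting, only a handful of names ever need to be compared.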
Create a pack index object.
Provide it with the name of the index file to consider, and it will map it whenever required.
- calculate_checksum()
Calculate the SHA1 checksum over this pack index.
Returns: 20-byte binary digest
- check()
Check that the stored checksum matches the actual checksum.
- close()
- get_pack_checksum()
Return the SHA1 checksum stored for the corresponding packfile.
Returns: 20-byte binary digest
- get_stored_checksum()
Return the SHA1 checksum stored for this index.
Returns: 20-byte binary digest
- iterentries()
Iterate over the entries in this pack index.
Returns: iterator over tuples with object name, offset in packfile and CRC32 checksum.
- object_index(sha)
Return the offset of the object in the corresponding packfile.
Given the name of an object, return the offset at which that object lives within the corresponding pack file. If the pack file doesn’t have the object, None is returned.
- property path
- class dulwich.pack.MemoryPackIndex(entries, pack_checksum=None)
Bases: PackIndex
Pack index that is stored entirely in memory.
Create a new MemoryPackIndex.
- Parameters
entries – Sequence of name, idx, crc32 (sorted)
pack_checksum – Optional pack checksum
- get_pack_checksum()
Return the SHA1 checksum stored for the corresponding packfile.
Returns: 20-byte binary digest
- iterentries()
Iterate over the entries in this pack index.
Returns: iterator over tuples with object name, offset in packfile and CRC32 checksum.
- object_index(sha)
Return the offset of the object in the corresponding packfile.
Given the name of an object, return the offset at which that object lives within the corresponding pack file. If the pack file doesn’t have the object, None is returned.
- object_sha1(index)
Return the SHA1 corresponding to the index in the pack file.
- class dulwich.pack.Pack(basename, resolve_ext_ref: Optional[Callable[[bytes], Tuple[int, UnpackedObject]]] = None)
Bases: object
A Git pack object.
- check()
Check the integrity of this pack.
- Raises
ChecksumMismatch – if a checksum for the index or data is wrong
- check_length_and_checksum()
Sanity check the length and checksum of the pack index and data.
- close()
- property data
The pack data object being used.
- entries(progress=None)
Yield entries summarizing the contents of this pack.
- Parameters
progress – Progress function, called with current and total object count.
Returns: iterator of tuples with (sha, offset, crc32)
- classmethod from_lazy_objects(data_fn, idx_fn)
Create a new pack object from callables to load pack data and index objects.
- classmethod from_objects(data, idx)
Create a new pack object from pack data and index objects.
- get_raw(sha1)
- get_raw_unresolved(sha1)
Get raw unresolved data for a SHA.
- Parameters
sha1 – SHA to return data for
Returns: Tuple with pack object type, delta base (if applicable), list of data chunks
- get_ref(sha) -> Tuple[int, int, UnpackedObject]
Get the object for a ref SHA, only looking in this pack.
- get_stored_checksum()
- property index
The index being used.
Note: This may be an in-memory index
- iterobjects()
Iterate over the objects in this pack.
- keep(msg=None)
Add a .keep file for the pack, preventing git from garbage collecting it.
- Parameters
msg – A message written inside the .keep file; can be used later to determine whether or not a .keep file is obsolete.
Returns: The path of the .keep file, as a string.
- name()
The SHA over the SHAs of the objects in this pack.
- pack_tuples()
Provide an iterable for use with write_pack_objects.
Returns: Object that can iterate over (object, path) tuples and provides __len__
- resolve_object(offset, type, obj, get_ref=None)
Resolve an object, possibly resolving deltas when necessary.
Returns: Tuple with object type and contents.
- sorted_entries(progress=None)
Return entries in this pack, sorted by SHA.
- Parameters
progress – Progress function, called with current and total object count
Returns: Iterator of tuples with (sha, offset, crc32)
- class dulwich.pack.PackChunkGenerator(num_records=None, records=None, progress=None, compression_level=-1)
Bases: object
- sha1digest()
- class dulwich.pack.PackData(filename, file=None, size=None)
Bases: object
The data contained in a packfile.
Pack files can be accessed both sequentially for exploding a pack, and directly with the help of an index to retrieve a specific object.
The objects within are either complete or a delta against another.
The header is variable length. If the MSB of a byte is set, the following byte is still part of the header. In the first byte, the next three most significant bits are the type, which tells you the kind of object and whether it is a delta, and the low four bits are the lowest bits of the size. In each subsequent byte, the low seven bits are the next more significant bits of the size, so the last byte of the header holds the most significant bits of the size.
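As an illustration of that layout, here is a minimal decoding sketch. It assumes the header bytes are already in memory, ignores the delta base that follows the header for delta types, and uses the hypothetical name `decode_object_header`; it is not how dulwich structures this code.

```python
from typing import Tuple


def decode_object_header(data: bytes) -> Tuple[int, int, int]:
    """Hypothetical helper: return (type_num, size, header_length).

    `data` must start at the first byte of a pack object header.
    """
    byte = data[0]
    # Bits 4-6 of the first byte hold the object type.
    type_num = (byte >> 4) & 0x07
    # The low four bits hold the least significant bits of the size.
    size = byte & 0x0F
    shift = 4
    pos = 1
    # While the MSB is set, the next byte is still part of the header.
    while byte & 0x80:
        byte = data[pos]
        pos += 1
        # Each continuation byte supplies the next seven, more significant, bits.
        size |= (byte & 0x7F) << shift
        shift += 7
    return type_num, size, pos
```

For example, the single byte 0x3a decodes to type 3 (blob) with size 10.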
For complete objects, the data is stored as zlib-deflated data. The size in the header is the uncompressed object size, so to decompress you just keep feeding data to zlib until you get an object back, or it errors on bad data. This is done here by passing the complete buffer from the start of the deflated object onwards. This is bad, but until I get mmap sorted out it will have to do.
Currently no integrity checks are done. No attempt is made to detect the delta case, or a request for an object at the wrong position; either will simply raise a zlib error or KeyError.
Create a PackData object representing the pack in the given filename.
The file must exist and stay readable until the object is disposed of. It must also stay the same size. It will be mapped whenever needed.
Currently there is a restriction on the size of the pack, as the Python mmap implementation is flawed.
- calculate_checksum()
Calculate the checksum for this pack.
Returns: 20-byte binary SHA1 digest
- check()
Check the consistency of this pack.
- close()
- create_index(filename, progress=None, version=2, resolve_ext_ref=None)
Create an index file for this data file.
- Parameters
filename – Index filename.
progress – Progress report function
Returns: Checksum of index file
- create_index_v1(filename, progress=None, resolve_ext_ref=None)
Create a version 1 index file for this data file.
- Parameters
filename – Index filename.
progress – Progress report function
Returns: Checksum of index file
- create_index_v2(filename, progress=None, resolve_ext_ref=None)
Create a version 2 index file for this data file.
- Parameters
filename – Index filename.
progress – Progress report function
Returns: Checksum of index file
- property filename
- classmethod from_file(file, size=None)
- classmethod from_path(path)
- get_compressed_data_at(offset)
Given an offset in the packfile, return the compressed data that is there.
Using the associated index, the location of an object can be looked up, and then the packfile can be asked directly for that object using this function.
- get_object_at(offset)
Given an offset into the packfile, return the object that is there.
Using the associated index, the location of an object can be looked up, and then the packfile can be asked directly for that object using this function.
- get_stored_checksum()
Return the expected checksum stored in this pack.
- iterentries(progress=None, resolve_ext_ref=None)
Yield entries summarizing the contents of this pack.
- Parameters
progress – Progress function, called with current and total object count.
Returns: iterator of tuples with (sha, offset, crc32)
- iterobjects(progress=None, compute_crc32=True)
- property path
- sorted_entries(progress=None, resolve_ext_ref=None)
Return entries in this pack, sorted by SHA.
- Parameters
progress – Progress function, called with current and total object count
Returns: Iterator of tuples with (sha, offset, crc32)
- class dulwich.pack.PackIndex
Bases: object
An index into a packfile.
Given the SHA id of an object, a pack index can tell you the location of that object in the packfile, if it has it.
- get_pack_checksum()
Return the SHA1 checksum stored for the corresponding packfile.
Returns: 20-byte binary digest
- iterentries()
Iterate over the entries in this pack index.
Returns: iterator over tuples with object name, offset in packfile and CRC32 checksum.
- object_index(sha)
Return the offset of the object in the corresponding packfile.
Given the name of an object, return the offset at which that object lives within the corresponding pack file. If the pack file doesn’t have the object, None is returned.
- object_sha1(index)
Return the SHA1 corresponding to the index in the pack file.
- objects_sha1()
Return the hex SHA1 over all the SHAs of all objects in this pack.
Note: This is used for the filename of the pack.
- class dulwich.pack.PackIndex1(filename, file=None, contents=None, size=None)
Bases: FilePackIndex
Version 1 Pack Index file.
Create a pack index object.
Provide it with the name of the index file to consider, and it will map it whenever required.
- class dulwich.pack.PackIndex2(filename, file=None, contents=None, size=None)
Bases: FilePackIndex
Version 2 Pack Index file.
Create a pack index object.
Provide it with the name of the index file to consider, and it will map it whenever required.
- class dulwich.pack.PackIndexer(file_obj, resolve_ext_ref=None)
Bases: DeltaChainIterator
Delta chain iterator that yields index entries.
- class dulwich.pack.PackInflater(file_obj, resolve_ext_ref=None)
Bases: DeltaChainIterator
Delta chain iterator that yields ShaFile objects.
- class dulwich.pack.PackStreamCopier(read_all, read_some, outfile, delta_iter=None)
Bases: PackStreamReader
Class to verify a pack stream as it is being read.
The pack is read from a ReceivableProtocol using read() or recv() as appropriate and written out to the given file-like object.
Initialize the copier.
- Parameters
read_all – Read function that blocks until the number of requested bytes are read.
read_some – Read function that returns at least one byte, but may not return the number of bytes requested.
outfile – File-like object to write output through.
delta_iter – Optional DeltaChainIterator to record deltas as we read them.
- verify()
Verify a pack stream and write it to the output file.
See PackStreamReader.read_objects for a list of exceptions this may raise.
- class dulwich.pack.PackStreamReader(read_all, read_some=None, zlib_bufsize=4096)
Bases: object
Class to read a pack stream.
The pack is read from a ReceivableProtocol using read() or recv() as appropriate.
- property offset
- read(size)
Read, blocking until size bytes are read.
- read_objects(compute_crc32=False)
Read the objects in this pack file.
- Parameters
compute_crc32 – If True, compute the CRC32 of the compressed data. If False, the returned CRC32 will be None.
Returns: Iterator over UnpackedObjects with the following members set:
offset
obj_type_num
obj_chunks (for non-delta types)
delta_base (for delta types)
decomp_chunks
decomp_len
crc32 (if compute_crc32 is True)
- Raises
ChecksumMismatch – if the checksum of the pack contents does not match the checksum in the pack trailer.
zlib.error – if an error occurred during zlib decompression.
IOError – if an error occurred writing to the output file.
- recv(size)
Read up to size bytes, blocking until one byte is read.
- class dulwich.pack.SHA1Reader(f)
Bases: object
Wrapper for a file-like object that remembers the SHA1 of its data.
- check_sha()
- close()
- read(num=None)
- tell()
- class dulwich.pack.SHA1Writer(f)
Bases: object
Wrapper for a file-like object that remembers the SHA1 of its data.
- close()
- offset()
- tell()
- write(data)
- write_sha()
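The idea behind SHA1Writer can be sketched as a small wrapper that hashes every byte it passes through and can append the digest as a trailer, which is how pack and index files end. `HashingWriter` is a hypothetical name, and its methods only loosely mirror the members listed above; it is not the library class.

```python
import hashlib
import io
from typing import BinaryIO


class HashingWriter:
    """Hypothetical wrapper that keeps a running SHA1 of everything written."""

    def __init__(self, f: BinaryIO) -> None:
        self.f = f
        self.length = 0
        self.sha1 = hashlib.sha1()

    def write(self, data: bytes) -> None:
        # Hash the bytes, then pass them through to the wrapped file.
        self.sha1.update(data)
        self.f.write(data)
        self.length += len(data)

    def write_sha(self) -> bytes:
        # Append the 20-byte binary digest of everything written so far;
        # the digest itself is not included in the hash.
        digest = self.sha1.digest()
        self.f.write(digest)
        self.length += len(digest)
        return digest

    def tell(self) -> int:
        # Number of bytes written through this wrapper, trailer included.
        return self.length


# Usage: write some data, then seal it with a SHA1 trailer.
buf = io.BytesIO()
writer = HashingWriter(buf)
writer.write(b"PACK")
writer.write(b"data")
trailer = writer.write_sha()
```

Hashing on the way through avoids a second pass over the file just to compute its checksum.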
- class dulwich.pack.UnpackedObject(pack_type_num, delta_base, decomp_len, crc32)
Bases: object
Class encapsulating an object unpacked from a pack file.
These objects should only be created from within unpack_object. Most members start out as empty and are filled in at various points by read_zlib_chunks, unpack_object, DeltaChainIterator, etc.
End users of this object should take care that the function they’re getting this object from is guaranteed to set the members they need.
- comp_chunks
- crc32
- decomp_chunks
- decomp_len
- delta_base
- obj_chunks
- obj_type_num
- offset
- pack_type_num
- sha()
Return the binary SHA of this object.
- sha_file()
Return a ShaFile from this object.
- dulwich.pack.apply_delta(src_buf, delta)
Apply a delta to a source buffer. Based on the similar function in git’s patch-delta.c.
- Parameters
src_buf – Source buffer
delta – Delta instructions
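For reference, here is a minimal pure-Python sketch of the delta instruction format that patch-delta.c understands: two base-128 size headers (source, then target), followed by copy instructions (high bit set) and insert instructions (1 to 127 literal bytes). It assumes the delta has already been inflated from the pack, returns a single bytes object, and uses hypothetical helper names; it is not dulwich's implementation.

```python
from typing import Tuple


def _read_size(delta: bytes, pos: int) -> Tuple[int, int]:
    """Read a little-endian base-128 size from the delta header."""
    value = 0
    shift = 0
    while True:
        byte = delta[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_git_delta(src_buf: bytes, delta: bytes) -> bytes:
    """Hypothetical helper: rebuild the target from a source and a delta."""
    src_size, pos = _read_size(delta, 0)
    dest_size, pos = _read_size(delta, pos)
    if src_size != len(src_buf):
        raise ValueError("source size mismatch")
    out = bytearray()
    while pos < len(delta):
        cmd = delta[pos]
        pos += 1
        if cmd & 0x80:
            # Copy: bits 0-3 flag which offset bytes follow,
            # bits 4-6 flag which size bytes follow (little-endian).
            offset = 0
            for i in range(4):
                if cmd & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            size = 0
            for i in range(3):
                if cmd & (1 << (4 + i)):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a zero size means 64 KiB
            out += src_buf[offset:offset + size]
        elif cmd:
            # Insert: the next `cmd` bytes are literal data.
            out += delta[pos:pos + cmd]
            pos += cmd
        else:
            raise ValueError("invalid delta opcode 0")
    if len(out) != dest_size:
        raise ValueError("target size mismatch")
    return bytes(out)
```

For example, turning b"hello world" into b"hello, world!" takes two copies from the source and two one-byte inserts.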
- dulwich.pack.bisect_find_sha(start, end, sha, unpack_name)
Find a SHA in a data blob with sorted SHAs.
- Parameters
start – Start index of range to search
end – End index of range to search
sha – SHA to find
unpack_name – Callback to retrieve SHA by index
Returns: Index of the SHA, or None if it wasn’t found
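A minimal sketch of the same kind of search, with a callback that decodes the name at a given index on demand. It treats `end` as exclusive, which is an assumption of this sketch, so check the library source before relying on the same convention. `bisect_sha` is a hypothetical name.

```python
from typing import Callable, Optional


def bisect_sha(start: int, end: int, sha: bytes,
               unpack_name: Callable[[int], bytes]) -> Optional[int]:
    """Hypothetical helper: binary search for `sha` at indices [start, end)."""
    lo, hi = start, end
    while lo < hi:
        mid = (lo + hi) // 2
        # Only the probed names are decoded, so the table never has to be
        # materialised as a list.
        candidate = unpack_name(mid)
        if candidate < sha:
            lo = mid + 1
        elif candidate > sha:
            hi = mid
        else:
            return mid
    return None


# Usage: names stored back to back as 20-byte records in one blob.
names = [bytes([i]) * 20 for i in range(0, 200, 7)]
blob = b"".join(names)


def unpack(i: int) -> bytes:
    return blob[i * 20:(i + 1) * 20]
```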
- dulwich.pack.chunks_length(chunks)
- dulwich.pack.compute_file_sha(f, start_ofs=0, end_ofs=0, buffer_size=65536)
Hash a portion of a file into a new SHA.
- Parameters
f – A file-like object to read from that supports seek().
start_ofs – The offset in the file to start reading at.
end_ofs – The offset in the file to end reading at, relative to the end of the file.
buffer_size – A buffer size for reading.
Returns: A new SHA object updated with data read from the file.
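A minimal sketch of hashing part of a file, assuming (as the parameter description suggests) that end_ofs is zero or negative and counts back from the end of the file. Passing end_ofs=-20 skips a trailing 20-byte SHA1, which is how a pack file's own trailer checksum can be recomputed. `sha1_of_file_range` is a hypothetical name.

```python
import hashlib
import io


def sha1_of_file_range(f, start_ofs=0, end_ofs=0, buffer_size=65536):
    """Hypothetical helper: SHA1 object over f[start_ofs : file_length + end_ofs]."""
    if end_ofs > 0:
        raise ValueError("end_ofs must be zero or negative")
    f.seek(0, io.SEEK_END)
    length = f.tell()
    todo = length + end_ofs - start_ofs
    if todo < 0:
        raise ValueError("range ends before it starts")
    sha = hashlib.sha1()
    f.seek(start_ofs)
    # Read in bounded chunks so large files are never loaded whole.
    while todo:
        data = f.read(min(todo, buffer_size))
        if not data:
            raise ValueError("unexpected end of file")
        sha.update(data)
        todo -= len(data)
    return sha
```

Usage: for a pack file opened in binary mode, comparing `sha1_of_file_range(f, end_ofs=-20).digest()` with the last 20 bytes of the file checks the trailer.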
- dulwich.pack.create_delta(base_buf, target_buf)
Use Python's difflib to work out how to transform base_buf into target_buf.
- Parameters
base_buf – Base buffer
target_buf – Target buffer
- dulwich.pack.deltify_pack_objects(objects, window_size=None)
Generate deltas for pack objects.
- Parameters
objects – An iterable of (object, path) tuples to deltify.
window_size – Window size; None for default
Returns: Iterator over type_num, object id, delta_base, content; delta_base is None for full-text entries
- dulwich.pack.iter_sha1(iter)
Return the hexdigest of the SHA1 over a set of names.
- Parameters
iter – Iterator over string objects
Returns: 40-byte hex sha1 digest
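A minimal sketch of that computation, assuming the names are bytes (for example binary object SHAs in index order) and that the result is returned as 40 ASCII bytes. `hex_sha1_over_names` is a hypothetical name.

```python
import hashlib
from typing import Iterable


def hex_sha1_over_names(names: Iterable[bytes]) -> bytes:
    """Hypothetical helper: hex SHA1 over a sequence of names, as ASCII bytes."""
    sha = hashlib.sha1()
    for name in names:
        # Names are fed in order with no separator, so order matters.
        sha.update(name)
    return sha.hexdigest().encode("ascii")
```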
- dulwich.pack.load_pack_index(path)
Load an index file by path.
- Parameters
path – Path to the index file
Returns: A PackIndex loaded from the given path
- dulwich.pack.load_pack_index_file(path, f)
Load an index file from a file-like object.
- Parameters
path – Path for the index file
f – File-like object
Returns: A PackIndex loaded from the given file
- dulwich.pack.obj_sha(type, chunks)
Compute the SHA for a numeric type and object chunks.
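Git names an object by hashing a short header (the type name, a space, the decimal length and a NUL byte) followed by the raw content. A minimal sketch for the four non-delta pack types (1 commit, 2 tree, 3 blob, 4 tag), returning the binary digest; `object_sha` and `TYPE_NAMES` are hypothetical names, and the real function's return format may differ.

```python
import hashlib
from typing import Iterable

# Numeric pack type -> Git object type name (non-delta types only).
TYPE_NAMES = {1: b"commit", 2: b"tree", 3: b"blob", 4: b"tag"}


def object_sha(type_num: int, chunks: Iterable[bytes]) -> bytes:
    """Hypothetical helper: binary SHA1 object id for a type and data chunks."""
    chunks = list(chunks)
    length = sum(len(chunk) for chunk in chunks)
    # Git hashes b"<type> <decimal length>\0" followed by the content.
    sha = hashlib.sha1(TYPE_NAMES[type_num] + b" " + str(length).encode("ascii") + b"\0")
    for chunk in chunks:
        sha.update(chunk)
    return sha.digest()
```

Because the content is hashed chunk by chunk, the object never needs to be joined into one buffer.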
- dulwich.pack.pack_header_chunks(num_objects)
Yield chunks for a pack header.
- dulwich.pack.pack_object_chunks(type, object, compression_level=-1)
Generate chunks for a pack object.
- Parameters
type – Numeric type of the object
object – Object to write
compression_level – the zlib compression level
Returns: Chunks
- dulwich.pack.pack_object_header(type_num, delta_base, size)
Create a pack object header for the given object info.
- Parameters
type_num – Numeric type of the object.
delta_base – Delta base offset or ref, or None for whole objects.
size – Uncompressed object size.
Returns: A header for a packed object.
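A minimal encoding sketch of the header layout for whole objects and for OFS_DELTA, where the delta base is given as the distance back to the base object. REF_DELTA, whose header is followed by the base's 20-byte SHA, is deliberately not handled. `encode_object_header` is a hypothetical name, not dulwich's function.

```python
from typing import Optional

OFS_DELTA = 6  # numeric pack type for offset deltas
REF_DELTA = 7  # numeric pack type for ref deltas (not handled here)


def encode_object_header(type_num: int, size: int,
                         delta_offset: Optional[int] = None) -> bytes:
    """Hypothetical helper: encode a pack object header."""
    if type_num == REF_DELTA:
        raise NotImplementedError("REF_DELTA is followed by a base SHA; not shown")
    out = bytearray()
    # First byte: type in bits 4-6, low four bits of the size in bits 0-3.
    c = (type_num << 4) | (size & 0x0F)
    size >>= 4
    while size:
        # Set the MSB to say another size byte follows.
        out.append(c | 0x80)
        c = size & 0x7F
        size >>= 7
    out.append(c)
    if type_num == OFS_DELTA:
        if delta_offset is None:
            raise ValueError("OFS_DELTA needs the distance back to its base")
        # Big-endian base-128 with an implicit +1 per continuation byte,
        # as in Git's pack format.
        ofs = delta_offset
        enc = [ofs & 0x7F]
        ofs >>= 7
        while ofs:
            ofs -= 1
            enc.insert(0, 0x80 | (ofs & 0x7F))
            ofs >>= 7
        out += bytes(enc)
    return bytes(out)
```

For example, a 100-byte blob gets the two-byte header b"\xb4\x06".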
- dulwich.pack.pack_objects_to_data(objects)
Create pack data from objects.
- Parameters
objects – Pack objects
Returns: Tuples with (type_num, hexdigest, delta base, object chunks)
- dulwich.pack.read_pack_header(read)
Read the header of a pack file.
- Parameters
read – Read function
Returns: Tuple of (pack version, number of objects). If no data is available to read, returns (None, None).
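A minimal sketch of parsing the 12-byte pack header: the magic bytes PACK, then the version and the object count as big-endian 32-bit integers. `parse_pack_header` is a hypothetical name; raising ValueError on bad input is an assumption of this sketch.

```python
import struct
from typing import Callable, Optional, Tuple


def parse_pack_header(read: Callable[[int], bytes]) -> Tuple[Optional[int], Optional[int]]:
    """Hypothetical helper: return (version, num_objects) from a pack header."""
    header = read(12)
    if not header:
        # Nothing to read: follow the (None, None) convention described above.
        return None, None
    if len(header) != 12 or header[:4] != b"PACK":
        raise ValueError("not a pack header: %r" % header)
    # Version and object count are big-endian unsigned 32-bit integers.
    version, num_objects = struct.unpack(">LL", header[4:])
    if version not in (2, 3):
        raise ValueError("unsupported pack version %d" % version)
    return version, num_objects
```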
- dulwich.pack.read_zlib_chunks(read_some, unpacked, include_comp=False, buffer_size=4096)
Read zlib data from a buffer.
This function requires that the buffer have additional data following the compressed data, which is guaranteed to be the case for git pack files.
- Parameters
read_some – Read function that returns at least one byte, but may return less than the requested size.
unpacked – An UnpackedObject to write result data to. If its crc32 attr is not None, the CRC32 of the compressed bytes will be computed using this starting CRC32. After this function returns, it will have the following attrs set: comp_chunks (if include_comp is True), decomp_chunks, decomp_len and crc32.
include_comp – If True, include compressed data in the result.
buffer_size – Size of the read buffer.
Returns: Leftover unused data from the decompression.
- Raises
zlib.error – if a decompression error occurred.
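A minimal sketch of inflating one zlib stream that is followed by other data and handing back the bytes read past its end. It swaps in zlib.decompressobj with its eof and unused_data attributes to detect the end of the stream, so it is not necessarily how dulwich implements read_zlib_chunks, and it does not compute CRC32s. `inflate_with_leftover` is a hypothetical name.

```python
import zlib
from typing import Callable, List, Tuple


def inflate_with_leftover(read_some: Callable[[int], bytes],
                          buffer_size: int = 4096) -> Tuple[List[bytes], bytes]:
    """Hypothetical helper: inflate one zlib stream, return (chunks, leftover).

    Stops as soon as the stream ends; bytes read past its end are returned
    as leftover so the caller can parse whatever follows.
    """
    decomp = zlib.decompressobj()
    chunks: List[bytes] = []
    while not decomp.eof:
        data = read_some(buffer_size)
        if not data:
            raise zlib.error("truncated zlib stream")
        chunk = decomp.decompress(data)
        if chunk:
            chunks.append(chunk)
    # unused_data holds the bytes that followed the end of the stream.
    return chunks, decomp.unused_data
```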
- dulwich.pack.take_msb_bytes(read, crc32=None)
Read bytes marked with the most significant bit.
- Parameters
read – Read function
- dulwich.pack.unpack_object(read_all, read_some=None, compute_crc32=False, include_comp=False, zlib_bufsize=4096)
Unpack a Git object.
- Parameters
read_all – Read function that blocks until the number of requested bytes are read.
read_some – Read function that returns at least one byte, but may not return the number of bytes requested.
compute_crc32 – If True, compute the CRC32 of the compressed data. If False, the returned CRC32 will be None.
include_comp – If True, include compressed data in the result.
zlib_bufsize – An optional buffer size for zlib operations.
Returns: A tuple of (unpacked, unused), where unused is the unused data left over from decompression, and unpacked is an UnpackedObject with the following attrs set:
obj_chunks (for non-delta types)
pack_type_num
delta_base (for delta types)
comp_chunks (if include_comp is True)
decomp_chunks
decomp_len
crc32 (if compute_crc32 is True)
- dulwich.pack.write_pack(filename, objects, deltify=None, delta_window_size=None, compression_level=-1)
Write a new pack data file.
- Parameters
filename – Path to the new pack file (without .pack extension)
objects – (object, path) tuple iterable to write. Should provide __len__
delta_window_size – Delta window size
deltify – Whether to deltify pack objects
compression_level – the zlib compression level
Returns: Tuple with checksum of pack file and index file
- dulwich.pack.write_pack_data(write, num_records=None, records=None, progress=None, compression_level=-1)
Write a new pack data file.
- Parameters
write – Write function to use
num_records – Number of records (defaults to len(records) if None)
records – Iterator over type_num, object_id, delta_base, raw
progress – Function to report progress to
compression_level – the zlib compression level
Returns: Dict mapping id -> (offset, crc32 checksum), pack checksum
- dulwich.pack.write_pack_header(write, num_objects)
Write a pack header for the given number of objects.
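A minimal sketch of the overall pack framing: a 12-byte version 2 header, the encoded object entries, and a SHA1 trailer over everything before it. It assumes the entries are already encoded (object header plus zlib data) and that num_objects matches them; `write_minimal_pack` is a hypothetical name.

```python
import hashlib
import struct
from typing import Callable, Iterable


def write_minimal_pack(write: Callable[[bytes], object],
                       encoded_objects: Iterable[bytes],
                       num_objects: int) -> bytes:
    """Hypothetical helper: frame pre-encoded entries as a version 2 pack.

    Returns the 20-byte SHA1 trailer that was written.
    """
    sha = hashlib.sha1()

    def emit(data: bytes) -> None:
        # Everything before the trailer is covered by the checksum.
        sha.update(data)
        write(data)

    # Header: magic, then version and object count as big-endian uint32.
    emit(b"PACK" + struct.pack(">LL", 2, num_objects))
    for entry in encoded_objects:
        emit(entry)
    digest = sha.digest()
    write(digest)
    return digest
```

An empty pack is therefore 32 bytes long: the header plus its own SHA1.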
- dulwich.pack.write_pack_index(f, entries, pack_checksum)
Write a new pack index file.
- Parameters
f – File-like object to write to
entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
pack_checksum – Checksum of the pack file.
Returns: The SHA of the index file written
- dulwich.pack.write_pack_index_v1(f, entries, pack_checksum)
Write a new version 1 pack index file.
- Parameters
f – A file-like object to write to
entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
pack_checksum – Checksum of the pack file.
Returns: The SHA of the written index file
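A minimal sketch of the version 1 index layout: a 256-entry fan-out table of big-endian 32-bit counts, one entry per object (a 4-byte offset followed by the 20-byte name), the pack checksum, and finally a SHA1 over everything before it. Version 1 has no room for CRC32s and only stores 32-bit offsets. `write_index_v1` is a hypothetical name, not dulwich's function.

```python
import hashlib
import io
import struct
from typing import BinaryIO, Iterable, Tuple


def write_index_v1(f: BinaryIO, entries: Iterable[Tuple[bytes, int, int]],
                   pack_checksum: bytes) -> bytes:
    """Hypothetical helper: write a version 1 pack index, return its SHA1."""
    # Entries must be sorted by object name; the CRC32s are ignored in v1.
    entries = sorted(entries)
    fanout = [0] * 256
    for name, _offset, _crc32 in entries:
        fanout[name[0]] += 1
    sha = hashlib.sha1()

    def emit(data: bytes) -> None:
        sha.update(data)
        f.write(data)

    # Fan-out table: cumulative counts of names by first byte.
    running = 0
    for count in fanout:
        running += count
        emit(struct.pack(">L", running))
    for name, offset, _crc32 in entries:
        # Each v1 entry is a 4-byte big-endian offset followed by the name.
        emit(struct.pack(">L", offset) + name)
    emit(pack_checksum)
    digest = sha.digest()
    f.write(digest)
    return digest


# Usage: one object at offset 12 in a pack with an all-zero checksum.
buf = io.BytesIO()
digest = write_index_v1(buf, [(b"\x01" * 20, 12, 0)], b"\x00" * 20)
```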
- dulwich.pack.write_pack_index_v2(f, entries, pack_checksum)
Write a new version 2 pack index file.
- Parameters
f – File-like object to write to
entries – List of tuples with object name (sha), offset_in_pack, and crc32_checksum.
pack_checksum – Checksum of the pack file.
Returns: The SHA of the index file written
- dulwich.pack.write_pack_object(write, type, object, sha=None, compression_level=-1)
Write a pack object to a file.
- Parameters
write – Write function to use
type – Numeric type of the object
object – Object to write
compression_level – the zlib compression level
Returns: Tuple with offset at which the object was written, and crc32
- dulwich.pack.write_pack_objects(write, objects, delta_window_size=None, deltify=None, compression_level=-1)
Write a new pack data file.
- Parameters
write – Write function to use
objects – Iterable of (object, path) tuples to write. Should provide __len__
delta_window_size – Sliding window size for searching for deltas; set to None for default window size.
deltify – Whether to deltify objects
compression_level – the zlib compression level to use
Returns: Dict mapping id -> (offset, crc32 checksum), pack checksum