Commit 03eff2b0 authored by Дмитрий Никулин, committed by Никита Ефремов

Add file_read_backwards library

parent 16fbbe53
===============================
file_read_backwards
===============================
.. image:: https://img.shields.io/pypi/v/file_read_backwards.svg
        :target: https://pypi.python.org/pypi/file_read_backwards

.. image:: https://img.shields.io/travis/RobinNil/file_read_backwards.svg?branch=master
        :target: https://travis-ci.org/RobinNil/file_read_backwards.svg?branch=master

.. image:: https://readthedocs.org/projects/file-read-backwards/badge/?version=latest
        :target: https://file-read-backwards.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

.. image:: https://pyup.io/repos/github/RobinNil/file_read_backwards/shield.svg
        :target: https://pyup.io/repos/github/RobinNil/file_read_backwards/
        :alt: Updates

Memory efficient way of reading files line-by-line from the end of file
* Free software: MIT license
* Documentation: https://file-read-backwards.readthedocs.io.
Features
--------
This package reads a file backwards, line by line, as unicode, in a memory-efficient manner on both Python 2.7 and Python 3.
It currently supports ascii, latin-1, and utf-8 encodings.
It supports "\\r", "\\r\\n", and "\\n" as new lines.
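
To make the memory-efficiency claim concrete, here is a minimal, simplified sketch of the general technique: read fixed-size chunks from the end of the file and only decode complete lines. It is not the package's implementation. The function name ``iter_lines_backwards`` is hypothetical, and the sketch assumes Python 3 and handles only "\\n" line endings, whereas the package also handles "\\r" and "\\r\\n".

```python
import io
import os


def iter_lines_backwards(path, encoding="utf-8", chunk_size=io.DEFAULT_BUFFER_SIZE):
    """Yield decoded lines from last to first, holding about one chunk plus one line in memory.

    Hypothetical helper for illustration; only "\n" is treated as a line separator.
    """
    with open(path, "rb") as fp:
        position = fp.seek(0, os.SEEK_END)  # start just past the end of the file
        tail = b""  # bytes of a line whose beginning has not been read yet
        first_chunk = True
        while position > 0:
            read_size = min(chunk_size, position)
            position -= read_size
            fp.seek(position)
            chunk = fp.read(read_size) + tail
            if first_chunk:
                first_chunk = False
                # Drop a single trailing newline so it does not produce a phantom empty line.
                if chunk.endswith(b"\n"):
                    chunk = chunk[:-1]
            pieces = chunk.split(b"\n")
            tail = pieces[0]  # may continue into the previous (earlier) chunk
            # Everything after the first separator is a complete line, so it is safe to decode.
            for piece in reversed(pieces[1:]):
                yield piece.decode(encoding)
        if not first_chunk:
            yield tail.decode(encoding)  # the first line of the file
```

Splitting on the raw byte ``b"\n"`` before decoding is what keeps this safe for utf-8: that byte never occurs inside a multi-byte character, so a chunk boundary that lands in the middle of a character cannot corrupt a line.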
Usage Examples
--------------
An example of using `file_read_backwards` for `python2.7`::

    #!/usr/bin/env python2.7

    from file_read_backwards import FileReadBackwards

    with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:

        # getting lines by lines starting from the last line up
        for l in frb:
            print l

Another example using `python3.3`::

    from file_read_backwards import FileReadBackwards

    with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:

        # getting lines by lines starting from the last line up
        for l in frb:
            print(l)

Another way to consume the file is via `readline()`, in `python3.3`::

    from file_read_backwards import FileReadBackwards

    with FileReadBackwards("/tmp/file", encoding="utf-8") as frb:

        while True:
            l = frb.readline()
            if not l:
                break
            print(l, end="")

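
A common reason to read backwards is to fetch only the last few lines of a large file. The sketch below shows one way to do that on top of any iterable that yields lines from last to first, such as a ``FileReadBackwards`` object. The helper name ``last_lines`` is hypothetical and not part of the package.

```python
from itertools import islice


def last_lines(lines_backwards, count):
    """Return the final `count` lines in their original (top-to-bottom) order.

    `lines_backwards` is any iterable yielding lines from last to first.
    Only `count` lines are consumed, so the rest of the file is never read.
    """
    newest_first = list(islice(lines_backwards, count))
    newest_first.reverse()
    return newest_first


# Stand-in data so the example runs without the package installed.
print(last_lines(iter(["line 3", "line 2", "line 1"]), 2))
```

With the package installed, the same helper would presumably be used as ``with FileReadBackwards("/tmp/file", encoding="utf-8") as frb: last_lines(frb, 10)``.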
Credits
---------
This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.
.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage
=======
History
=======
1.0.0 (2016-12-18)
------------------
* First release on PyPI.
1.1.0 (2016-12-31)
------------------
* Added support for "latin-1".
* Marked the package "Production/Stable".
1.1.1 (2017-01-09)
------------------
* Updated README.rst for more clarity around encoding support and Python 2.7 and 3 support.
1.1.2 (2017-01-11)
------------------
* Documentation re-arrangement. Usage examples are now in README.rst
* Minor refactoring
1.2.0 (2017-09-01)
------------------
* Include context manager style as it provides cleaner/automatic close functionality
1.2.1 (2017-09-02)
------------------
* Made docstrings consistent with Google style and did some code linting
1.2.2 (2017-11-19)
------------------
* Re-release of 1.2.1 for ease of updating pypi page for updated travis & pyup.
2.0.0 (2018-03-23)
------------------
Mimicking Python file object behavior.
* FileReadBackwards no longer creates multiple iterators (a change of behavior from 1.x.y version)
* Added a readline() function that returns one line at a time with a trailing new line, and an empty string when it reaches the end of the file.
The fine print: the trailing new line will be `os.linesep` (rather than whichever new line type in the file).
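
The readline() contract described above can be sketched without the package. The function below mirrors the documented behavior (next line plus ``os.linesep``, or an empty string when exhausted); the name ``readline_backwards`` and the stand-in iterator are assumptions made for illustration.

```python
import os


def readline_backwards(line_iterator):
    """Return the next line plus os.linesep, or "" once the iterator is exhausted."""
    try:
        return next(line_iterator) + os.linesep
    except StopIteration:
        return ""


# Stand-in for a backwards line iterator: lines arrive last to first, without newlines.
lines = iter(["third", "", "first"])
collected = []
while True:
    line = readline_backwards(lines)
    if not line:  # only a truly exhausted iterator gives "", a blank line gives os.linesep
        break
    collected.append(line)
```

Because the separator is always ``os.linesep``, a file written with "\\n" and read on Windows would come back with "\\r\\n" endings, so callers that need the exact original bytes should not rely on readline() for that.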
Metadata-Version: 2.0
Name: file-read-backwards
Version: 2.0.0
Summary: Memory efficient way of reading files line-by-line from the end of file
Home-page: https://github.com/RobinNil/file_read_backwards
Author: Robin Robin
Author-email: robinsquare42@gmail.com
License: MIT license
Keywords: file_read_backwards
Platform: UNKNOWN
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
file_read_backwards-2.0.0.dist-info/DESCRIPTION.rst,sha256=UXNL9zcu_H5XjeCfnxqhADk3kQg5WS8qx8_GkyKDnv0,3647
file_read_backwards-2.0.0.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
file_read_backwards-2.0.0.dist-info/METADATA,sha256=SNqkrzocPrhWxro5NSVc6IVjEuy_ivz9UkP5XoTAQ_w,4417
file_read_backwards-2.0.0.dist-info/RECORD,,
file_read_backwards-2.0.0.dist-info/WHEEL,sha256=kdsN-5OJAZIiHN-iO4Rhl82KyS0bDWf4uBwMbkNafr8,110
file_read_backwards-2.0.0.dist-info/metadata.json,sha256=J2rLVwakld4LYHi1CVSoZMAhxTAK-T5i0gbCDienb38,942
file_read_backwards-2.0.0.dist-info/top_level.txt,sha256=J0c-zzN9i4B3noENqqGllyULovoXYowT-_VsvP5obD8,20
file_read_backwards/__init__.py,sha256=EgTdw29vRAhhLjqLt6AIH-trsQOcv9w843hhm43x1tA,182
file_read_backwards/__pycache__/__init__.cpython-36.pyc,,
file_read_backwards/__pycache__/buffer_work_space.cpython-36.pyc,,
file_read_backwards/__pycache__/file_read_backwards.cpython-36.pyc,,
file_read_backwards/buffer_work_space.py,sha256=7OW2fFMeEB_HRamzOQigEabkFiCmLO50_byO9D1E6oM,6446
file_read_backwards/file_read_backwards.py,sha256=Gi-P6vNTWtlR9J_2o0OnWEsDkZdjaJcQGkxstvIICvA,4069
Wheel-Version: 1.0
Generator: bdist_wheel (0.30.0)
Root-Is-Purelib: true
Tag: py2-none-any
Tag: py3-none-any
{"classifiers": ["Development Status :: 5 - Production/Stable", "Intended Audience :: Developers", "License :: OSI Approved :: MIT License", "Natural Language :: English", "Programming Language :: Python :: 2.7", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.3", "Programming Language :: Python :: 3.4", "Programming Language :: Python :: 3.5"], "extensions": {"python.details": {"contacts": [{"email": "robinsquare42@gmail.com", "name": "Robin Robin", "role": "author"}], "document_names": {"description": "DESCRIPTION.rst"}, "project_urls": {"Home": "https://github.com/RobinNil/file_read_backwards"}}}, "generator": "bdist_wheel (0.30.0)", "keywords": ["file_read_backwards"], "license": "MIT license", "metadata_version": "2.0", "name": "file-read-backwards", "summary": "Memory efficient way of reading files line-by-line from the end of file", "test_requires": [{"requires": ["mock"]}], "version": "2.0.0"}
\ No newline at end of file
# -*- coding: utf-8 -*-
from .file_read_backwards import FileReadBackwards # noqa: F401
__author__ = """Robin Robin"""
__email__ = 'robinsquare42@gmail.com'
__version__ = '2.0.0'
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""BufferWorkSpace module."""

import os

new_lines = ["\r\n", "\n", "\r"]
new_lines_bytes = [n.encode("ascii") for n in new_lines]  # we only support encodings that are backward compatible with ascii


class BufferWorkSpace:

    """It is a helper module for FileReadBackwards."""

    def __init__(self, fp, chunk_size):
        """Convention for the data.

        When read_buffer is not None, it represents contents of the file from `read_position` onwards
            that has not been processed/returned.
        read_position represents the file pointer position that has been read into read_buffer
            initialized to be just past the end of file.
        """
        self.fp = fp
        self.read_position = _get_file_size(self.fp)  # set the previously read position to the
        self.read_buffer = None
        self.chunk_size = chunk_size

    def add_to_buffer(self, content, read_position):
        """Add additional bytes content as read from the read_position.

        Args:
            content (bytes): data to be added to the BufferWorkSpace's buffer.
            read_position (int): where in the file pointer the data was read from.
        """
        self.read_position = read_position
        if self.read_buffer is None:
            self.read_buffer = content
        else:
            # chunks arrive from the end of the file, so newer content goes in front
            self.read_buffer = content + self.read_buffer

    def yieldable(self):
        """Return True if there is a line that the buffer can return, False otherwise."""
        if self.read_buffer is None:
            return False

        t = _remove_trailing_new_line(self.read_buffer)
        n = _find_furthest_new_line(t)
        if n >= 0:
            return True

        # we have read in entire file and have some unprocessed lines
        if self.read_position == 0 and self.read_buffer is not None:
            return True
        return False

    def return_line(self):
        """Return a new line if it is available.

        Precondition: self.yieldable() must be True
        """
        assert self.yieldable()

        t = _remove_trailing_new_line(self.read_buffer)
        i = _find_furthest_new_line(t)

        if i >= 0:
            l = i + 1
            after_new_line = slice(l, None)
            up_to_include_new_line = slice(0, l)
            r = t[after_new_line]
            self.read_buffer = t[up_to_include_new_line]
        else:  # the case where we have read in entire file and at the "last" line
            r = t
            self.read_buffer = None
        return r

    def read_until_yieldable(self):
        """Read in additional chunks until it is yieldable."""
        while not self.yieldable():
            read_content, read_position = _get_next_chunk(self.fp, self.read_position, self.chunk_size)
            self.add_to_buffer(read_content, read_position)

    def has_returned_every_line(self):
        """Return True if every single line in the file has been returned, False otherwise."""
        if self.read_position == 0 and self.read_buffer is None:
            return True
        return False


def _get_file_size(fp):
    return os.fstat(fp.fileno()).st_size


def _get_next_chunk(fp, previously_read_position, chunk_size):
    """Return next chunk of data that we would read from the file pointer.

    Args:
        fp: file-like object
        previously_read_position: file pointer position that we have read from
        chunk_size: desired read chunk_size

    Returns:
        (bytestring, int): data that has been read in, the file pointer position where the data has been read from
    """
    seek_position, read_size = _get_what_to_read_next(fp, previously_read_position, chunk_size)
    fp.seek(seek_position)
    read_content = fp.read(read_size)
    read_position = seek_position
    return read_content, read_position


def _get_what_to_read_next(fp, previously_read_position, chunk_size):
    """Return information on which file pointer position to read from and how many bytes.

    Args:
        fp
        previously_read_position (int): The file pointer position that has been read previously
        chunk_size(int): ideal io chunk_size

    Returns:
        (int, int): The next seek position, how many bytes to read next
    """
    seek_position = max(previously_read_position - chunk_size, 0)
    read_size = chunk_size

    # examples: say, our new_lines are potentially "\r\n", "\n", "\r"
    # find a reading point where it is not "\n", rewind further if necessary
    # if we have "\r\n" and we read in "\n",
    # the next iteration would treat "\r" as a different new line.
    # Q: why don't I just check if it is b"\n", but use a function ?
    # A: so that we can potentially expand this into generic sets of separators, later on.
    while seek_position > 0:
        fp.seek(seek_position)
        if _is_partially_read_new_line(fp.read(1)):
            seek_position -= 1
            read_size += 1  # as we rewind further, let's make sure we read more to compensate
        else:
            break

    # take care of special case when we are back to the beginning of the file
    read_size = min(previously_read_position - seek_position, read_size)
    return seek_position, read_size


def _remove_trailing_new_line(l):
    """Remove a single instance of new line at the end of l if it exists.

    Returns:
        bytestring
    """
    # replace only 1 instance of newline
    # match longest line first (hence the reverse=True), we want to match "\r\n" rather than "\n" if we can
    for n in sorted(new_lines_bytes, key=lambda x: len(x), reverse=True):
        if l.endswith(n):
            remove_new_line = slice(None, -len(n))
            return l[remove_new_line]
    return l


def _find_furthest_new_line(read_buffer):
    """Return -1 if read_buffer does not contain new line otherwise the position of the rightmost newline.

    Args:
        read_buffer (bytestring)

    Returns:
        int: The right most position of new line character in read_buffer if found, else -1
    """
    new_line_positions = [read_buffer.rfind(n) for n in new_lines_bytes]
    return max(new_line_positions)


def _is_partially_read_new_line(b):
    """Return True when b is part of a new line separator found at index >= 1, False otherwise.

    Args:
        b (bytestring)

    Returns:
        bool
    """
    for n in new_lines_bytes:
        if n.find(b) >= 1:
            return True
    return False
#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""FileReadBackwards module."""

import io
import os

from .buffer_work_space import BufferWorkSpace

supported_encodings = ["utf-8", "ascii", "latin-1"]  # any encodings that are backward compatible with ascii should work


class FileReadBackwards:

    """Class definition for `FileReadBackwards`.

    A `FileReadBackwards` will spawn a `FileReadBackwardsIterator` and keep an opened file handler.

    It can be used as a Context Manager. If done so, when exited, it will close its file handler.

    In any mode, `close()` can be called to close the file handler.
    """

    def __init__(self, path, encoding="utf-8", chunk_size=io.DEFAULT_BUFFER_SIZE):
        """Constructor for FileReadBackwards.

        Args:
            path: Path to the file to be read
            encoding (str): Encoding
            chunk_size (int): How many bytes to read at a time
        """
        if encoding.lower() not in supported_encodings:
            error_message = "{0} encoding was not supported/tested.".format(encoding)
            error_message += "Supported encodings are '{0}'".format(",".join(supported_encodings))
            raise NotImplementedError(error_message)

        self.path = path
        self.encoding = encoding.lower()
        self.chunk_size = chunk_size
        self.iterator = FileReadBackwardsIterator(io.open(self.path, mode="rb"), self.encoding, self.chunk_size)

    def __iter__(self):
        """Return its iterator."""
        return self.iterator

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        """Closes its opened file handler and propagates all exceptions on exit."""
        self.close()
        return False

    def close(self):
        """Closes its opened file handler."""
        self.iterator.close()

    def readline(self):
        """Return a line content (with a trailing newline) if there is content. Return '' otherwise."""
        try:
            r = next(self.iterator) + os.linesep
            return r
        except StopIteration:
            return ""


class FileReadBackwardsIterator:

    """Iterator for `FileReadBackwards`.

    This will read backwards line by line a file. It holds an opened file handler.
    """

    def __init__(self, fp, encoding, chunk_size):
        """Constructor for FileReadBackwardsIterator

        Args:
            fp (File): A file that we wish to start reading backwards from
            encoding (str): Encoding of the file
            chunk_size (int): How many bytes to read at a time
        """
        self.path = fp.name
        self.encoding = encoding
        self.chunk_size = chunk_size
        self.__fp = fp
        self.__buf = BufferWorkSpace(self.__fp, self.chunk_size)

    def __iter__(self):
        return self

    def next(self):
        """Returns unicode string from the last line until the beginning of file.

        Gets exhausted if::

            * already reached the beginning of the file on previous iteration
            * the file got closed

        When it gets exhausted, it closes the file handler.
        """
        # Using binary mode, because some encodings such as "utf-8" use variable number of
        # bytes to encode different Unicode points.
        # Without using binary mode, we would probably need to understand each encoding more
        # and do the seek operations to find the proper boundary before issuing read
        if self.closed:
            raise StopIteration
        if self.__buf.has_returned_every_line():
            self.close()
            raise StopIteration
        self.__buf.read_until_yieldable()
        r = self.__buf.return_line()
        return r.decode(self.encoding)

    __next__ = next

    @property
    def closed(self):
        """The status of the file handler.

        :return: True if the file handler is closed. False otherwise.
        """
        return self.__fp.closed

    def close(self):
        """Closes the file handler."""
        self.__fp.close()