Package logilab :: Package common :: Module textutils

Module textutils

Some text manipulation utility functions.
Functions
 
unormalize(ustring, ignorenonascii=None, substitute=None)
replace diacritical characters with their corresponding ascii characters
str or unicode
normalize_rest_paragraph(text, line_len=80, indent='')
normalize a ReST text to display it with a maximum line size and optionally arbitrary indentation. Line jumps are normalized. The indentation string may be used to insert a comment mark for instance.
 
splittext(text, line_len)
split the given text on space according to the given max line size
 
split_url_or_path(url_or_path)
return the last component of a string containing either a URL of the form <scheme>://<path> or a local file system path
 
text_to_dict(text)
parse multi-line text containing simple 'key=value' lines and return a dict of {'key': 'value'}. When the same key is encountered multiple times, its value is turned into a list containing all values.
 
apply_units(string, units, inter=None, final=float, blank_reg=_BLANK_RE, value_reg=_VALUE_RE)
Parse the string applying the units defined in units (e.g.: "1.5m", {'m': 60} -> 90).
 
diff_colorize_ansi(lines, out=sys.stdout, style=DIFF_STYLE)
    text formatting
str or unicode
unquote(string)
remove optional quotes (single or double) from the string
str or unicode
normalize_text(text, line_len=80, indent='', rest=False)
normalize a text to display it with a maximum line size and optionally arbitrary indentation. Line jumps are normalized but blank lines are kept. The indentation string may be used to insert a comment (#) or a quoting (>) mark for instance.
str or unicode
normalize_paragraph(text, line_len=80, indent='')
normalize a text to display it with a maximum line size and optionally arbitrary indentation. Line jumps are normalized. The indentation string may be used to insert a comment mark for instance.
str or unicode
pretty_match(match, string, underline_char='^')
return a string with the match location underlined:
str or unicode
colorize_ansi(msg, color=None, style=None)
colorize message by wrapping it with ansi escape codes
    text manipulation
list of str or unicode
splitstrip(string, sep=',')
return a list of stripped strings by splitting the string given as argument on sep (',' by default). Empty strings are discarded.
Variables
  linesep = '\n'
  MANUAL_UNICODE_MAP = {u'\xa1': u'!', u'\u0142': u'l', u'\u2044...
  get_csv = deprecated('get_csv is deprecated, use splitstrip')(...
  BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024** 2, "gb": 1024**...
  TIME_UNITS = {"ms": 0.0001, "s": 1, "min": 60, "h": 60* 60, "d...
str ANSI_PREFIX = '\033['
ANSI terminal code notifying the start of an ANSI escape sequence
str ANSI_END = 'm'
ANSI terminal code notifying the end of an ANSI escape sequence
str ANSI_RESET = '\033[0m'
ANSI terminal code resetting format defined by a previous ANSI escape sequence
dict(str) ANSI_STYLES = {'reset': "0", 'bold': "1", 'italic': "3", 'unde...
dictionary mapping style identifier to ANSI terminal code
dict(str) ANSI_COLORS = {'reset': "0", 'black': "30", 'red': "31", 'gree...
dictionary mapping color identifier to ANSI terminal code
  DIFF_STYLE = {'separator': 'cyan', 'remove': 'red', 'add': 'gr...
Function Details

unormalize(ustring, ignorenonascii=None, substitute=None)

replace diacritical characters with their corresponding ascii characters

Convert the unicode string to its long normalized form (a unicode character may be transformed into several characters) and keep only the first one. The normal form KD (NFKD) applies the compatibility decomposition, i.e. replaces all compatibility characters with their equivalents.

Parameters:
  • substitute (str) - replacement character to use if decomposition fails

See Also: Another project about ASCII transliterations of Unicode text http://pypi.python.org/pypi/Unidecode
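
The decomposition step described above can be illustrated with the standard unicodedata module. This is a minimal sketch and not the module's own source: the name unormalize_sketch is hypothetical, MANUAL_MAP is only a three-entry subset of MANUAL_UNICODE_MAP, the ignorenonascii parameter is left out, and raising ValueError when no substitute is given is an assumption.

```python
import unicodedata

# Hypothetical subset of the MANUAL_UNICODE_MAP listed under Variables.
MANUAL_MAP = {u'\xe6': u'ae', u'\xdf': u'ss', u'\u0142': u'l'}


def unormalize_sketch(ustring, substitute=None):
    """Replace diacritical characters with an ASCII equivalent (sketch)."""
    result = []
    for char in ustring:
        replacement = MANUAL_MAP.get(char)
        if replacement is None:
            # NFKD splits e.g. u'\xe9' into u'e' plus a combining accent;
            # only the first character of the decomposition is kept.
            replacement = unicodedata.normalize('NFKD', char)[0]
            if ord(replacement) >= 128:
                # Decomposition did not yield an ASCII character.
                if substitute is None:
                    raise ValueError("no ASCII equivalent for %r" % char)
                replacement = substitute
        result.append(replacement)
    return u''.join(result)
```

With this sketch, u'caf\xe9' becomes u'cafe', and a character without any ASCII decomposition is replaced by substitute when one is given.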

unquote(string)

remove optional quotes (single or double) from the string
Parameters:
  • string (str or unicode) - an optionally quoted string
Returns: str or unicode
the unquoted string (or the input string if it wasn't quoted)
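
A minimal sketch of the behaviour described above, under the assumption that a string counts as quoted only when it starts and ends with the same quote character. The name unquote_sketch is hypothetical and the real function may treat the two ends independently.

```python
def unquote_sketch(string):
    """Strip one pair of matching single or double quotes, if present."""
    if len(string) >= 2 and string[0] == string[-1] and string[0] in '"\'':
        return string[1:-1]
    # Not quoted (or quoted on one side only): return the input unchanged.
    return string
```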

normalize_text(text, line_len=80, indent='', rest=False)

normalize a text to display it with a maximum line size and optionally arbitrary indentation. Line jumps are normalized but blank lines are kept. The indentation string may be used to insert a comment (#) or a quoting (>) mark for instance.
Parameters:
  • text (str or unicode) - the input text to normalize
  • line_len (int) - expected maximum line length, defaults to 80
  • indent (str or unicode) - optional string to use as indentation
Returns: str or unicode
the input text normalized to fit on lines no longer than line_len, optionally prefixed by an indentation string
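
The following sketch shows one way to get the behaviour described above with the standard textwrap module. It is an illustration under assumptions, not the module's implementation: the name normalize_text_sketch is hypothetical, the rest flag is ignored, and paragraphs are joined by an empty line that carries no indentation.

```python
import re
import textwrap


def normalize_text_sketch(text, line_len=80, indent=''):
    """Wrap each paragraph to line_len, keeping blank lines between them."""
    # A paragraph boundary is a line that is empty or holds only blanks.
    paragraphs = re.split(r'\n[ \t]*\n', text.strip())
    wrapped = []
    for paragraph in paragraphs:
        # Collapse line jumps and repeated spaces inside the paragraph.
        collapsed = ' '.join(paragraph.split())
        wrapped.append(textwrap.fill(collapsed, width=line_len,
                                     initial_indent=indent,
                                     subsequent_indent=indent))
    return '\n\n'.join(wrapped)
```

Note that textwrap counts the indentation as part of the line width, so every output line, indent included, stays within line_len as long as no single word is longer than the limit.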

normalize_paragraph(text, line_len=80, indent='')

normalize a text to display it with a maximum line size and optionally arbitrary indentation. Line jumps are normalized. The indentation string may be used to insert a comment mark for instance.
Parameters:
  • text (str or unicode) - the input text to normalize
  • line_len (int) - expected maximum line length, defaults to 80
  • indent (str or unicode) - optional string to use as indentation
Returns: str or unicode
the input text normalized to fit on lines no longer than line_len, optionally prefixed by an indentation string

normalize_rest_paragraph(text, line_len=80, indent='')

normalize a ReST text to display it with a maximum line size and optionally arbitrary indentation. Line jumps are normalized. The indentation string may be used to insert a comment mark for instance.
Parameters:
  • text (str or unicode) - the input text to normalize
  • line_len (int) - expected maximum line length, defaults to 80
  • indent (str or unicode) - optional string to use as indentation
Returns: str or unicode
the input text normalized to fit on lines no longer than line_len, optionally prefixed by an indentation string

splittext(text, line_len)


split the given text on space according to the given max line size

return a 2-tuple:
  • a line <= line_len if possible
  • the rest of the text, which has to be carried over to another line
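
A minimal sketch of the splitting rule described above. The name splittext_sketch is hypothetical, and the fallback used when no space occurs within the limit (split at the first space after it) is an assumption about what "if possible" means.

```python
def splittext_sketch(text, line_len):
    """Return (head, rest), with head at most line_len chars when possible."""
    if len(text) <= line_len:
        return text, ''
    # Last space at or before index line_len, so head fits in line_len chars.
    pos = text.rfind(' ', 0, line_len + 1)
    if pos <= 0:
        # No usable space within the limit: take the first space after it.
        pos = text.find(' ', line_len)
        if pos == -1:
            # A single unbreakable word: nothing is left for the next line.
            return text, ''
    return text[:pos], text[pos + 1:].strip()
```

Calling the sketch repeatedly on the second element of the result yields the successive lines of a wrapped paragraph.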

splitstrip(string, sep=',')


return a list of stripped strings by splitting the string given as argument on sep (',' by default). Empty strings are discarded.

>>> splitstrip('a, b, c   ,  4,,')
['a', 'b', 'c', '4']
>>> splitstrip('a')
['a']
Parameters:
  • string (str or unicode) - a csv line
  • sep (str or unicode) - field separator, defaults to the comma (',')
Returns: list of str or unicode
the stripped, non-empty fields of the input string

apply_units(string, units, inter=None, final=float, blank_reg=_BLANK_RE, value_reg=_VALUE_RE)

Parse the string applying the units defined in units (e.g.: "1.5m", {'m': 60} -> 90).
Parameters:
  • string (str or unicode) - the string to parse
  • units (dict (or any object with __getitem__ using basestring key)) - a dict mapping a unit string repr to its value
  • inter (type) - used to parse every intermediate value (the parsed values must support sum())
  • blank_reg (regexp) - should match every blank char to ignore.
  • value_reg (regexp with "value" and optional "unit" group) - match a value and its unit into the
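
The parsing scheme described by these parameters can be sketched as follows. This is an illustration under assumptions, not the module's source: the name apply_units_sketch and both regular expressions are hypothetical stand-ins for _BLANK_RE and _VALUE_RE, and the sketch does no validation of the input string.

```python
import re

# Assumed patterns: a number followed by an optional alphabetic unit.
VALUE_RE = re.compile(r'(?P<value>-?\d+(?:\.\d+)?)(?P<unit>[a-zA-Z]+)?')
BLANK_RE = re.compile(r'\s+')


def apply_units_sketch(string, units, inter=None, final=float):
    """Sum every "value unit" pair found in string, scaled by its unit."""
    if inter is None:
        inter = final
    compact = BLANK_RE.sub('', string)  # drop the blanks to ignore
    values = []
    for match in VALUE_RE.finditer(compact):
        value = inter(match.group('value'))
        unit = match.group('unit')
        if unit is not None:
            # An unknown unit raises KeyError, as a plain dict lookup would.
            value *= units[unit.lower()]
        values.append(value)
    return final(sum(values))
```

For instance, "1.5m" with {'m': 60} gives 1.5 * 60 = 90.0, and "1h 30min" with {'h': 3600, 'min': 60} gives 3600 + 1800 = 5400.0.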

pretty_match(match, string, underline_char='^')


return a string with the match location underlined:

>>> import re
>>> print(pretty_match(re.search('mange', 'il mange du bacon'), 'il mange du bacon'))
il mange du bacon
   ^^^^^
Parameters:
  • match (_sre.SRE_match) - object returned by re.match, re.search or re.finditer
  • string (str or unicode) - the string on which the regular expression has been applied to obtain the match object
  • underline_char (str or unicode) - character to use to underline the matched section, defaults to the caret '^'
Returns: str or unicode
the original string with an inserted line to underline the match location

colorize_ansi(msg, color=None, style=None)

colorize message by wrapping it with ansi escape codes
Parameters:
  • msg (str or unicode) - the message string to colorize
  • color (str or None) - the color identifier (see ANSI_COLORS for available values)
  • style (str or None) - style string (see ANSI_STYLES for available values). To get several style effects at the same time, use a comma as separator.
Returns: str or unicode
the ansi escaped string
Raises:
  • KeyError - if a nonexistent color or style identifier is given
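
The wrapping described above can be sketched with the ANSI constants listed under Variables. The name colorize_ansi_sketch is hypothetical, and placing the style codes before the color code is an assumption rather than documented behaviour.

```python
ANSI_PREFIX = '\033['
ANSI_END = 'm'
ANSI_RESET = '\033[0m'
ANSI_STYLES = {'reset': '0', 'bold': '1', 'italic': '3', 'underline': '4',
               'blink': '5', 'inverse': '7', 'strike': '9'}
ANSI_COLORS = {'reset': '0', 'black': '30', 'red': '31', 'green': '32',
               'yellow': '33', 'blue': '34', 'magenta': '35', 'cyan': '36',
               'white': '37'}


def colorize_ansi_sketch(msg, color=None, style=None):
    """Wrap msg in an ANSI escape sequence built from color and style."""
    codes = []
    if style:
        # Several style effects may be given, separated by commas.
        for name in style.split(','):
            if name.strip():
                codes.append(ANSI_STYLES[name.strip()])  # KeyError if unknown
    if color:
        codes.append(ANSI_COLORS[color])  # KeyError if unknown
    if not codes:
        return msg
    return ANSI_PREFIX + ';'.join(codes) + ANSI_END + msg + ANSI_RESET
```

Several codes are joined with ';' inside a single escape sequence, and ANSI_RESET is appended so that text printed afterwards is not affected.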

Variables Details

MANUAL_UNICODE_MAP

Value:
{u'\xa1': u'!', u'\u0142': u'l', u'\u2044': u'/', u'\xc6': u'AE',
 u'\xa9': u'(c)', u'\xab': u'"', u'\xe6': u'ae', u'\xae': u'(r)',
 u'\u0153': u'oe', u'\u0152': u'OE', u'\xd8': u'O', u'\xf8': u'o',
 u'\xbb': u'"', u'\xdf': u'ss',}

get_csv

Value:
deprecated('get_csv is deprecated, use splitstrip')(splitstrip)

BYTE_UNITS

Value:
{"b": 1, "kb": 1024, "mb": 1024** 2, "gb": 1024** 3, "tb": 1024** 4,}

TIME_UNITS

Value:
{"ms": 0.0001, "s": 1, "min": 60, "h": 60* 60, "d": 60* 60* 24,}

ANSI_STYLES

dictionary mapping style identifier to ANSI terminal code
Type:
dict(str)
Value:
{'reset': "0", 'bold': "1", 'italic': "3", 'underline': "4",
 'blink': "5", 'inverse': "7", 'strike': "9",}

ANSI_COLORS

dictionary mapping color identifier to ANSI terminal code
Type:
dict(str)
Value:
{'reset': "0", 'black': "30", 'red': "31", 'green': "32", 'yellow': "33",
 'blue': "34", 'magenta': "35", 'cyan': "36", 'white': "37",}

DIFF_STYLE

Value:
{'separator': 'cyan', 'remove': 'red', 'add': 'green'}