htmlcutstring Python package

20 07 2009

I previously released PHP and JavaScript implementations for cutting HTML strings; these programs cut an HTML string while keeping its tags as they are. I have now released the same thing in Python under the name htmlcutstring. You can find it at http://pypi.python.org/pypi/htmlcutstring/1.0 .

It is easy to extract an excerpt of a text string with a given length limit. But if you want to extract an excerpt from HTML, the tags that may exist in the text string make it more complicated.

This module provides a solution to extract excerpts from HTML documents with a given text length limit without counting the length of any HTML tags.

In other words, the module cuts a string that contains HTML tags. It does not count the tags themselves; it counts only the text inside them and keeps the tags as they are.

Example: if the string is “welcome to <b>Python World</b> <br> Python is bla” and we want to cut it to 16 characters, the output will be “welcome to <b>Python</b>”.

While cutting, it keeps the tags that belong to the retained text, skips the rest, and does not disturb the div structure.

USAGE1:
obj = HtmlCutString("welcome to <b>Python World</b> <br> Python is",16)
newCutString = obj.cut()

USAGE2:
newCutString = cutHtmlString("welcome to <b>Python World</b> <br> Python is",16)
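
The idea described above (count only the text, keep the tags, close whatever is still open) can be sketched with the standard library alone. This is not the htmlcutstring source: the names `cut_html`, `_Cutter` and `VOID_TAGS` are made up for illustration, and it uses `html.parser` from Python 3 rather than whatever the package uses internally. It also cuts strictly at the character limit, so it does not try to reproduce the package's exact output at the cut point.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical helper set: HTML elements that never take a closing tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class _Cutter(HTMLParser):
    """Collects markup until `limit` text characters have been emitted."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = limit   # text characters still allowed
        self.out = []            # pieces of the result
        self.open_tags = []      # tags opened but not yet closed

    def handle_starttag(self, tag, attrs):
        if self.remaining <= 0:
            return               # limit reached: skip later tags
        self.out.append(self.get_starttag_text())
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>; nothing to track.
        if self.remaining > 0:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Only close a tag that matches the innermost open one.
        if self.remaining > 0 and self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.remaining <= 0:
            return
        piece = data[:self.remaining]        # tags are never counted
        self.remaining -= len(piece)
        self.out.append(escape(piece, quote=False))


def cut_html(markup, limit):
    cutter = _Cutter(limit)
    cutter.feed(markup)
    cutter.close()
    # Close whatever is still open, innermost first.
    closing = ["</%s>" % tag for tag in reversed(cutter.open_tags)]
    return "".join(cutter.out + closing)


print(cut_html("welcome to <b>Python World</b> <br> Python is", 16))
# prints: welcome to <b>Pytho</b>
```

Because `html.parser` is lenient, this sketch accepts a bare <br> as well as <br/>.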

5 responses

13 08 2009
che

Doesn’t work 😦

obj = h.HtmlCutString('welcome to <b>Python World</b> <br> Python is',16)

C:\Python25\lib\xml\dom\expatbuilder.py in parseString(self, string)
221 parser = self.getParser()
222 try:
--> 223 parser.Parse(string, True)
224 self._setup_subset(string)
225 except ParseEscape:

ExpatError: mismatched tag: line 1, column 52

15 08 2009
Prajwala

This error occurs when the string is not a proper HTML string. The example given at http://pypi.python.org/pypi/htmlcutstring/1.0 has an error: the “<br>” tag is not a valid HTML tag. If I put “<br/>” instead, this error does not appear.
I have updated it. Thanks, che, for pointing this out.

17 08 2009
che

Not quite. The error is caused by the <br> tag. xml.dom.minidom treats it as an opening tag that is never closed, which is not correct from the XML point of view, although it is permissible in HTML. It is strange that you do not get this error… Or am I misunderstanding something?

18 08 2009
Prajwala

I was also talking about the <br> tag in my previous comment, but the <br> did not show up; that is my mistake. The correct way to write the br tag is <br/>. If you write it like this, you do not get the error.
Actually, I should raise an exception with a proper message when this type of error occurs. I will do that.
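
One way the exception mentioned above might look is sketched here. It is only an illustration, not the package's code: the function name `parse_fragment`, the wrapping `<div>` root element and the wording of the message are all assumptions. The only library calls used are `xml.dom.minidom.parseString` and `xml.parsers.expat.ExpatError`.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_fragment(fragment):
    """Parse an XHTML fragment, raising ValueError with a readable message."""
    # Assumption of this sketch: wrap the fragment in one root element,
    # because an XML document must have exactly one root.
    wrapped = "<div>%s</div>" % fragment
    try:
        return minidom.parseString(wrapped)
    except ExpatError as exc:
        # Turn the low-level parser error into a message about the input.
        raise ValueError(
            "input is not well-formed XHTML (%s); "
            "empty tags must be written like <br/>" % exc
        ) from exc


# Parses: the empty tag is self-closed.
parse_fragment("welcome to <b>Python World</b> <br/> Python is")

# Raises ValueError: <br> is opened but never closed.
try:
    parse_fragment("welcome to <b>Python World</b> <br> Python is")
except ValueError as err:
    print(err)
```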

18 08 2009
che

Then htmlcutstring should be renamed to xhtmlcutstring, since this is already XHTML 😉
