htmlcutstring Python package

20 07 2009

I previously released PHP and JavaScript implementations for cutting HTML strings; these programs cut an HTML string while keeping its tags intact. I have now released the same in Python under the name htmlcutstring. Check this at .

It is easy to extract an excerpt of a text string with a given length limit. But if you want to extract an excerpt from HTML, the tags that may exist in the text string make it more complicated.

This module provides a solution to extract excerpts from HTML documents with a given text length limit without counting the length of any HTML tags.

This module cuts a string that contains HTML tags. It does not count the tags themselves; it counts only the text inside them and keeps the tags as they are.

Example: if the string is “welcome to <b>Python World</b> <br/> Python is bla” and we want to cut it to 16 characters, the output will be “welcome to <b>Python</b>”.

While cutting, it keeps the tags that belong to the retained text, skips the rest, and does not disturb the div structure.

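# Class-based usage. The input must be well-formed, so <br> is written as <br/>.
obj = HtmlCutString("welcome to <b>Python World</b> <br/> Python is", 16)
newCutString = obj.cut()

# The same cut through the helper function
newCutString = cutHtmlString("welcome to <b>Python World</b> <br/> Python is", 16)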
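
To make the idea concrete, here is a minimal sketch of tag-aware cutting. It is not the htmlcutstring source: it is written for Python 3 with the standard library's html.parser, and the names `cut_html`, `_Cutter` and `VOID_TAGS` are made up for this illustration. It counts every text character, spaces included, so the package's own counting rule may differ.

```python
from html import escape
from html.parser import HTMLParser

# Tags that never take a closing tag (a partial, illustrative list).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class _Cutter(HTMLParser):
    """Copies markup to the output until `limit` text characters are used."""

    def __init__(self, limit):
        super().__init__(convert_charrefs=True)
        self.remaining = max(limit, 0)
        self.done = self.remaining == 0
        self.out = []        # pieces of the truncated markup
        self.open_tags = []  # tags opened but not yet closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        self.out.append(self.get_starttag_text())  # keep the tag verbatim
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closed tags such as <br/> are copied and never tracked.
        if not self.done:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.done:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.done:
            return
        piece = data[:self.remaining]  # only text counts toward the limit
        self.out.append(escape(piece, quote=False))
        self.remaining -= len(piece)
        if self.remaining == 0:
            self.done = True


def cut_html(html_text, limit):
    """Return html_text cut to `limit` text characters, with tags balanced."""
    parser = _Cutter(limit)
    parser.feed(html_text)
    parser.close()
    # Close whatever was still open at the cut point, innermost first.
    closing = "".join("</%s>" % tag for tag in reversed(parser.open_tags))
    return "".join(parser.out) + closing


print(cut_html("welcome to <b>Python World</b> <br> Python is", 17))
# prints: welcome to <b>Python</b>
```

Because this sketch counts the space after "to", a limit of 17 is what gives “welcome to <b>Python</b>”; with 16 it stops one character earlier, at “welcome to <b>Pytho</b>”. Since html.parser is lenient, a bare <br> is accepted here.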



5 responses

13 08 2009

Doesn’t work 😦

obj = h.HtmlCutString('welcome to <b>Python World</b> <br> Python is', 16)

C:\Python25\lib\xml\dom\ in parseString(self, string)
221 parser = self.getParser()
222 try:
--> 223 parser.Parse(string, True)
224 self._setup_subset(string)
225 except ParseEscape:

ExpatError: mismatched tag: line 1, column 52

15 08 2009

This error occurs when the string is not well-formed HTML. The example given at that location has an error: the “<br>” tag is not a valid HTML tag. If I use “<br/>” instead, the error does not appear.
I have updated this. Thanks, che, for pointing it out.

17 08 2009

Not quite. The error is caused by the <br> tag. xml.dom.minidom treats it as an opening tag that is never closed, which is incorrect from an XML point of view but permissible in HTML. It is strange that you do not get this error… Or am I misunderstanding something?
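
The difference described here is easy to reproduce with the standard library alone. The sketch below assumes the fragment is wrapped in a single root element before parsing (XML requires one root); the wrapper `<div>` and the helper name `parses_as_xml` are illustrative and not taken from htmlcutstring.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parses_as_xml(fragment):
    """Return True if the fragment is well-formed once wrapped in one root."""
    try:
        # The <div> wrapper is an assumption made for this sketch.
        minidom.parseString("<div>%s</div>" % fragment)
        return True
    except ExpatError:
        # Raised for problems such as "mismatched tag".
        return False


print(parses_as_xml("welcome to <b>Python World</b> <br> Python is"))   # False
print(parses_as_xml("welcome to <b>Python World</b> <br/> Python is"))  # True
```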

18 08 2009

I was also talking about the <br> tag in my previous comment, but the <br> did not show up there; that was my mistake. The correct way to write the br tag is <br/>. If you write it like this, you do not get the error.
Actually, I should raise an exception with a proper message when this type of error occurs. I will do that.
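
One way such an exception could look is sketched below. This is only an assumption about a possible design, not what the package does: `InvalidMarkupError`, `parse_fragment` and the `<root>` wrapper are hypothetical names chosen for the example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


class InvalidMarkupError(ValueError):
    """Hypothetical error for input that is not well-formed XHTML."""


def parse_fragment(fragment):
    """Parse a markup fragment, turning parser errors into a clear message."""
    try:
        # A single wrapping root element is assumed for this sketch.
        return minidom.parseString("<root>%s</root>" % fragment)
    except ExpatError as exc:
        raise InvalidMarkupError(
            "input is not well-formed XHTML (%s); "
            "close empty tags, for example <br/>" % exc
        ) from exc
```

Callers can then catch `InvalidMarkupError` and see immediately that the input, not the cutting logic, is the problem.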

18 08 2009

Then htmlcutstring should be renamed to xhtmlcutstring, since what it accepts is already XHTML 😉
