TextPrettifier

TextPrettifier is a Python library for cleaning text data by removing HTML tags, URLs, numbers, special characters, contractions, and stopwords.

TextPrettifier Key Features

1. Removing Emojis

The remove_emojis method removes emojis from the text.

2. Removing Internet Words

The remove_internet_words method removes internet-specific words from the text.

3. Removing HTML Tags

The remove_html_tags method removes HTML tags from the text.

4. Removing URLs

The remove_urls method removes URLs from the text.

5. Removing Numbers

The remove_numbers method removes numbers from the text.

6. Removing Special Characters

The remove_special_chars method removes special characters from the text.

7. Expanding Contractions

The remove_contractions method expands contractions in the text.

8. Removing Stopwords

The remove_stopwords method removes stopwords from the text.

Additional Functionality

If is_lower and is_token are both True, the text is returned in lowercase and as a list of tokens.
If only is_lower is True, the text is returned in lowercase.
If only is_token is True, the text is returned as a list of tokens.
If neither is_lower nor is_token is True, the text is returned as is.

Installation

You can install TextPrettifier using pip:

pip install text-prettifier

from text_prettifier import TextPrettifier

Initialize TextPrettifier

text_prettifier = TextPrettifier()

Example: Remove Emojis

html_text = "Hi,Pythonogist! I ❤️ Python."
cleaned_html = text_prettifier.remove_emojis(html_text)
print(cleaned_html)

Output Hi,Pythonogist! I Python.

Example: Remove HTML tags

html_text = "<p>Hello, <b>world</b>!</p>"
cleaned_html = text_prettifier.remove_html_tags(html_text)
print(cleaned_html)

Output Hello,world!

Example: Remove URLs

url_text = "Visit our website at https://example.com"
cleaned_urls = text_prettifier.remove_urls(url_text)
print(cleaned_urls)

Output Visit our webiste at

Example: Remove numbers

number_text = "There are 123 apples"
cleaned_numbers = text_prettifier.remove_numbers(number_text)
print(cleaned_numbers)

Output There are apples

Example: Remove special characters

special_text = "Hello, @world!"
cleaned_special = text_prettifier.remove_special_chars(special_text)
print(cleaned_special)

Output Hello world

Example: Remove contractions

contraction_text = "I can't do it"
cleaned_contractions = text_prettifier.remove_contractions(contraction_text)
print(cleaned_contractions)

Output I cannot do it

Example: Remove stopwords

stopwords_text = "This is a test"
cleaned_stopwords = text_prettifier.remove_stopwords(stopwords_text)
print(cleaned_stopwords)

Output This test

Example: Apply all cleaning methods

all_text = "<p>Hello, @world!</p> There are 123 apples. I can't do it. This is a test."
all_cleaned = text_prettifier.sigma_cleaner(all_text)
print(all_cleaned)

Output Hello world 123 apples cannot test

If you are interested to tokenized and lower the cleaned text write the code

all_text = "<p>Hello, @world!</p> There are 123 apples. I can't do it. This is a test."
all_cleaned = text_prettifier.sigma_cleaner(all_text,is_token=True,is_lower=True)
print(all_cleaned)

Output ['Hello','world', '123','apples', 'cannot','test']

Note: I didn't include remove_numbers in sigma_cleaner because sometimes numbers carry useful information in term of NLP. If you want to remove number you can apply this method seperately on output of sigma_cleaner.

Contact Information

Feel free to reach out to me on social media:

License

This project is licensed under the MIT License - see the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

TextPrettifier

TextPrettifier Key Features

1. Removing Emojis

2. Removing Internet Words

3. Removing HTML Tags

4. Removing URLs

5. Removing Numbers

6. Removing Special Characters

7. Expanding Contractions

8. Removing Stopwords

Additional Functionality

Installation

Initialize TextPrettifier

Example: Remove Emojis

Example: Remove HTML tags

Example: Remove URLs

Example: Remove numbers

Example: Remove special characters

Example: Remove contractions

Example: Remove stopwords

Example: Apply all cleaning methods

Contact Information

License

Files

README.md

Latest commit

History

README.md

File metadata and controls

TextPrettifier

TextPrettifier Key Features

1. Removing Emojis

2. Removing Internet Words

3. Removing HTML Tags

4. Removing URLs

5. Removing Numbers

6. Removing Special Characters

7. Expanding Contractions

8. Removing Stopwords

Additional Functionality

Installation

Initialize TextPrettifier

Example: Remove Emojis

Example: Remove HTML tags

Example: Remove URLs

Example: Remove numbers

Example: Remove special characters

Example: Remove contractions

Example: Remove stopwords

Example: Apply all cleaning methods

Contact Information

License