How I Stop Users from Cursing

How did I end up here?

The decision to do first-pass text filtering on the client came about suddenly: a backend developer tossed out the idea, I liked it, and the business responded positively.

Tomatoes may start flying my way now: some will say this is resource-hungry and, on the whole, pointless, since most of our messages arrive over a socket and the backend already filters their text. Still, I decided to add a first-pass profanity check on the client to avoid unpleasant surprises.

Regular expression

After spending some time on research, I found an open-source regular expression written for Java. It covers the popular and frequently used variants of obscenities (a single word can be spelled in several ways) and forms the basis of the filter, which I will show a little later. I ported it to Swift, taking the language's specifics into account, and corrected it a little.
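The original pattern is not reproduced here, so the following is only a sketch of how such a pattern can be laid out, assuming the same options that filterSwearWords(in:) uses. The words are harmless placeholders (darn, heck) standing in for real obscenities, each character class lists common substitutions (a/@/4, e/3), and containsSwearWord is a hypothetical helper. With .allowCommentsAndWhitespace, spaces and line breaks in the pattern are ignored and `#` starts a comment, which keeps a long list of variants readable.

```swift
import Foundation

// Hypothetical pattern: placeholder words stand in for real obscenities.
// Whitespace is ignored and '#' starts a comment because of .allowCommentsAndWhitespace.
let swearWordsPattern = """
\\b(?:
    d [a@4]+ r+ n+        # darn, d@rn, d4rrn ...
  | h [e3]+ c+ k+         # heck, h3ck, heckk ...
)\\b
"""

/// Hypothetical helper: true if the text contains at least one match.
func containsSwearWord(_ text: String) -> Bool {
    guard let regex = try? NSRegularExpression(
        pattern: swearWordsPattern,
        options: [.caseInsensitive, .allowCommentsAndWhitespace]
    ) else { return false }
    let range = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: [], range: range) != nil
}

print(containsSwearWord("Oh, D@RRN it"))   // true
print(containsSwearWord("darning socks"))  // false: \b stops the match inside a longer word
```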

Note that this is a first-pass check for the most popular words and their variants. No regular expression will cover 100% of cases, because the human factor comes into play: believe me, if users want to swear, they will, no matter what protection you have. In our case, first-pass filtering when sending messages to the chat will cover about 80-90% of obscene expressions.

Checking Messages

As I said above, we run only user messages through the regular expression and look for matches:

func filterSwearWords(in message: String) -> String {
    // swearWordsPattern is the ported regular expression, defined elsewhere in the project.
    // Compiling it on every call is costly; in production, cache the NSRegularExpression.
    do {
        let regex = try NSRegularExpression(pattern: swearWordsPattern, options: [.caseInsensitive, .allowCommentsAndWhitespace])
        // NSRange counts UTF-16 code units, so utf16.count (not count) keeps ranges correct with emoji.
        let range = NSRange(location: 0, length: message.utf16.count)
        let matches = regex.matches(in: message, options: [], range: range)

        var filteredMessage = message as NSString

        // reversed() ensures that earlier replacements do not shift the ranges of later matches.
        for match in matches.reversed() {
            let matchedWord = filteredMessage.substring(with: match.range)
            // Compare positions, not text: the same word can occur several times in one message.
            let isFirstWord = match.range.location == 0

            // Skip matches that are too short to mask instead of abandoning the whole message.
            guard matchedWord.count > 2,
                  let firstCharacter = isFirstWord ? matchedWord.first?.uppercased() : matchedWord.first?.lowercased(),
                  let lastCharacter = matchedWord.last else { continue }

            let replacement = "\(firstCharacter)***\(lastCharacter)"
            filteredMessage = filteredMessage.replacingCharacters(in: match.range, with: replacement) as NSString
        }

        return filteredMessage as String
    } catch {
        print("Creating regex error: \(error.localizedDescription)")
        return message
    }
}

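Pay attention to the line where the loop begins. I put reversed() there because I caught an interesting bug: every replacement changes the length of the string, and since all match ranges are computed on the original message, replacing front to back makes later ranges point at the wrong characters. Going from the end leaves the text before each remaining match untouched.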
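The bug is easy to reproduce without a regular expression at all. In this sketch the two ranges are hard-coded, as if a regex had found "darn" and "heck" in the original string, and maskAll is a hypothetical helper that applies the same first letter + *** + last letter replacement in whatever order it is given.

```swift
import Foundation

/// Masks each range as first letter + "***" + last letter, in the order given.
func maskAll(in text: NSString, ranges: [NSRange]) -> String {
    var result = text
    for range in ranges {
        let word = result.substring(with: range)
        guard let first = word.first, let last = word.last else { continue }
        result = result.replacingCharacters(in: range, with: "\(first)***\(last)") as NSString
    }
    return result as String
}

// Ranges of "darn" and "heck", computed once on the original string (as regex matches are).
let text = "darn it, heck no" as NSString
let ranges = [NSRange(location: 0, length: 4), NSRange(location: 9, length: 4)]

print(maskAll(in: text, ranges: ranges))                   // d***n it, ***ck no  (second range drifted)
print(maskAll(in: text, ranges: Array(ranges.reversed()))) // d***n it, h***k no
```

Replacing "darn" (4 characters) with "d***n" (5 characters) pushes "heck" one position to the right, so the forward pass masks " hec" instead. The reversed pass never touches text before a range it has yet to process.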

Next, we extract the matched substring and check where the word sits in the message: at the very beginning or not. A match at the start of the message keeps an uppercase first letter; anywhere else, the first letter is lowercased.
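One subtlety in that check: comparing text, for example with message.starts(with: matchedWord), is true for every occurrence of a word that also opens the message, while comparing the match's location is true only for the first one. A small sketch with a hypothetical one-word pattern shows the difference.

```swift
import Foundation

// Hypothetical one-word pattern, used only to illustrate the position check.
let regex = try! NSRegularExpression(pattern: "\\bdarn\\b", options: [.caseInsensitive])
let message = "darn, I said darn"
let matches = regex.matches(in: message, options: [], range: NSRange(message.startIndex..., in: message))

var byText: [Bool] = []      // text-based: does the message start with the same word?
var byPosition: [Bool] = []  // position-based: does the match begin at offset 0?
for match in matches {
    let word = (message as NSString).substring(with: match.range)
    byText.append(message.starts(with: word))
    byPosition.append(match.range.location == 0)
}

print(byText)      // [true, true]   both occurrences look like "the first word"
print(byPosition)  // [true, false]
```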

If a word matches the regular expression and is longer than two characters, I keep its first and last letters and put *** between them. I decided not to match the number of asterisks to the number of hidden characters. It's a matter of taste, but a fixed mask seemed the most suitable.
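For comparison, here is a hypothetical mask(_:preserveLength:) helper (not from the article) that supports both options: the fixed *** mask and one asterisk per hidden character. It leaves out the capitalization step and returns words of two characters or fewer unchanged.

```swift
/// Masks a word, keeping its first and last characters.
/// preserveLength == false gives a fixed "***"; true uses one '*' per hidden character.
func mask(_ word: String, preserveLength: Bool = false) -> String {
    guard word.count > 2, let first = word.first, let last = word.last else { return word }
    let stars = preserveLength ? String(repeating: "*", count: word.count - 2) : "***"
    return "\(first)\(stars)\(last)"
}

print(mask("darn"))                             // d***n
print(mask("darn", preserveLength: true))       // d**n
print(mask("dagnabbit", preserveLength: true))  // d*******t
```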

Well, that's it! Now we just call this method when a message is sent, passing in the text from our text field, and watch the magic.
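A minimal sketch of the sending side, assuming the filter is injected as a closure so that filterSwearWords(in:) can be plugged in. ChatComposer, Outbox, and transport are hypothetical names, the trimming step is an extra choice not mentioned in the article, and the toy filter in the usage only stands in for the real one.

```swift
import Foundation

// Hypothetical send flow: `filter` is where filterSwearWords(in:) would go,
// `transport` is where the socket send would go.
struct ChatComposer {
    let filter: (String) -> String
    let transport: (String) -> Void

    /// Trims the text field contents, ignores empty input, filters, then hands the text off.
    func send(_ rawText: String) {
        let text = rawText.trimmingCharacters(in: .whitespacesAndNewlines)
        guard !text.isEmpty else { return }
        transport(filter(text))
    }
}

// Collects "sent" messages instead of writing to a real socket.
final class Outbox {
    var messages: [String] = []
}

let outbox = Outbox()
let composer = ChatComposer(
    filter: { $0.replacingOccurrences(of: "darn", with: "d***n") }, // toy stand-in for the real filter
    transport: { outbox.messages.append($0) }
)

composer.send("  darn it  ")
composer.send("   ")          // whitespace only: nothing is sent
print(outbox.messages)        // ["d***n it"]
```

Injecting the filter keeps the send path testable without a socket and lets the same composer use a pattern that is later swapped for one delivered by the backend.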

Conclusions

Ideally, the regular expression can and should come from the backend rather than being stored locally. Either way, it does its job and makes chat communication “cleaner”. That's all from me; I hope this article was f***ing useful to you!
