Hey, r/LocalLLaMA!
I am finally back with a new model: 🛡️ Shield 82M
It's a fine-tuned version of distilroberta-base that can filter all types of PII (personally identifiable information) out of text in any language.
Here are some examples:
1) Test with name, email, and phone:
Original: My name is John Doe. Email: john@example.com. Phone: +49 123 45678.
Protected: My name is [PERSON]. Email: [EMAIL]. Phone: [PHONE].
2) Basic test:
Original: I live in Cambridge
Protected: I live in [ADDRESS]
3) French test (multilingual):
Original: Mon e-mail est jean.dupont@example.fr et mon téléphone est +33 6 12 34 56 78.
Protected: Mon e-mail est [EMAIL] et mon téléphone est [PHONE].
As the examples show, the model performs really well. Its total accuracy is ~96%.
And it's completely open source, like all my models. :D
If you want to try it out: https://huggingface.co/LH-Tech-AI/Shield-82M
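For anyone who wants to script the "Original" to "Protected" step, here is a minimal sketch. It assumes the checkpoint loads with the standard `transformers` token-classification pipeline and that its labels match the placeholders shown above (`PERSON`, `EMAIL`, `PHONE`, `ADDRESS`). Neither is confirmed here, so check the model card for the actual usage. The `redact` and `protect` helpers are hypothetical names made up for this example.

```python
from typing import Iterable, Mapping


def redact(text: str, entities: Iterable[Mapping]) -> str:
    """Replace each detected span with a [LABEL] placeholder.

    `entities` uses the shape returned by an aggregated Hugging Face
    token-classification pipeline: dicts with `entity_group`, `start`, `end`.
    """
    # Work from the last span to the first so earlier offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        text = text[: ent["start"]] + f"[{ent['entity_group']}]" + text[ent["end"] :]
    return text


def protect(text: str, model_id: str = "LH-Tech-AI/Shield-82M") -> str:
    """Run the model and redact what it finds.

    Assumption: the checkpoint is a regular token-classification model.
    The import is local so `redact` can be used without transformers installed.
    """
    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",  # merge sub-word tokens into whole spans
    )
    return redact(text, ner(text))


# Offline demo with a hand-written span (no model download needed).
sample = "I live in Cambridge"
spans = [{"entity_group": "ADDRESS", "start": 10, "end": 19}]
print(redact(sample, spans))  # I live in [ADDRESS]
```

Replacing spans from right to left keeps the character offsets reported by the pipeline valid while the string changes length. Calling `protect("My name is John Doe.")` downloads the model on first use.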
Have fun with it. :-)
See you in the comments. I'd really like to get some feedback from you.