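I'd like to efficiently replace a certain set of characters in a string with their corresponding replacement characters.
For example: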

    String sourceCharacters = "šđćčŠĐĆČžŽ";
    String targetCharacters = "sdccSDCCzZ";
    String result = replaceChars("Gračišće", sourceCharacters, targetCharacters);
    assert result.equals("Gracisce");

Is there a more efficient way than using the replaceAll method of the String class?
My first idea was:
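For comparison, a replaceAll-based version presumably needs one call per character pair, as in this hypothetical sketch (the class and method names are made up). Each call compiles a regex and rescans the whole string, so the work grows with the number of pairs times the string length:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceAllBaseline {

    // Hypothetical baseline: one replaceAll call per source/target pair.
    // Every iteration compiles a new regex and scans the entire string again.
    static String replaceCharsWithReplaceAll(String input, String source, String target) {
        String result = input;
        for (int i = 0; i < source.length(); i++) {
            // Quote both sides so any character is treated literally,
            // not as a regex metacharacter or a replacement group reference.
            String from = Pattern.quote(String.valueOf(source.charAt(i)));
            String to = Matcher.quoteReplacement(String.valueOf(target.charAt(i)));
            result = result.replaceAll(from, to);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(replaceCharsWithReplaceAll("Gračišće", "šđćčŠĐĆČžŽ", "sdccSDCCzZ"));
    }
}
```

For single characters, String.replace(char, char) avoids the regex machinery entirely, but it still rescans the string once per pair.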

    final String s = "Gračišće";
    String sourceCharacters = "šđćčŠĐĆČžŽ";
    String targetCharacters = "sdccSDCCzZ";

    // preparation
    final char[] sourceString = s.toCharArray();
    final char[] result = new char[sourceString.length];
    final char[] targetCharactersArray = targetCharacters.toCharArray();

    // main work: look up each character in the source set and map it
    for (int i = 0, l = sourceString.length; i < l; ++i) {
        final int pos = sourceCharacters.indexOf(sourceString[i]);
        result[i] = pos != -1 ? targetCharactersArray[pos] : sourceString[i];
    }

    // result
    String resultString = new String(result);

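The loop above can be wrapped into the replaceChars(...) method used in the question's example. The method name comes from the question; this body and the length check are an illustrative sketch, not a library API. It makes a single pass over the input instead of one pass per character pair:

```java
public class CharTableReplace {

    // Sketch of the question's replaceChars(...), built from the loop above.
    static String replaceChars(String input, String sourceCharacters, String targetCharacters) {
        if (sourceCharacters.length() != targetCharacters.length()) {
            throw new IllegalArgumentException("source and target must have the same length");
        }
        // toCharArray returns a fresh copy, so it is safe to modify in place.
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            // indexOf is a linear scan of the short source set for each character.
            int pos = sourceCharacters.indexOf(chars[i]);
            if (pos != -1) {
                chars[i] = targetCharacters.charAt(pos);
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(replaceChars("Gračišće", "šđćčŠĐĆČžŽ", "sdccSDCCzZ"));
    }
}
```

With only ten source characters the linear indexOf lookup is cheap; for a much larger table a map or array lookup would scale better.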
Any ideas?
By the way, UTF-8 characters cause trouble, while US_ASCII works fine.
Solution
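The question doesn't say where the UTF-8 trouble appears. One common cause (an assumption here) is a .java file saved as UTF-8 but compiled with a different default encoding, which garbles non-ASCII string literals. Passing javac -encoding UTF-8 addresses that; alternatively, writing the literals as Unicode escapes keeps the source file pure ASCII. The class and method names below are hypothetical:

```java
public class EscapedLiterals {

    // Same sets as in the question, spelled with Unicode escapes so the
    // compiled strings do not depend on the source-file encoding:
    // s-caron, d-stroke, c-acute, c-caron, their uppercase forms, z-caron, Z-caron.
    static final String SOURCE_CHARACTERS =
            "\u0161\u0111\u0107\u010D\u0160\u0110\u0106\u010C\u017E\u017D";
    static final String TARGET_CHARACTERS = "sdccSDCCzZ";

    // Single-pass lookup, as in the loop from the question.
    static String replaceChars(String input) {
        char[] chars = input.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            int pos = SOURCE_CHARACTERS.indexOf(chars[i]);
            if (pos != -1) {
                chars[i] = TARGET_CHARACTERS.charAt(pos);
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // "Gracisce" spelled with c-caron, s-caron and c-acute
        String input = "Gra\u010Di\u0161\u0107e";
        System.out.println(replaceChars(input));
    }
}
```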
You can use java.text.Normalizer and a regular expression to get rid of diacritics, of which there are far more than the ones you have collected.
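Why this works: NFD (canonical decomposition) splits a precomposed letter such as č into its base letter followed by a combining mark, and those marks sit in the Unicode block that \p{InCombiningDiacriticalMarks} matches. A small sketch showing the decomposition (the class and method names are illustrative):

```java
import java.text.Normalizer;

public class NfdDemo {

    // Canonical decomposition of the input.
    static String decompose(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        String composed = "\u010D"; // c-caron as a single code point
        String decomposed = decompose(composed);

        // Prints 2: the base letter plus one combining mark.
        System.out.println(decomposed.length());
        // Prints U+0063 (c) followed by U+030C (combining caron).
        for (int i = 0; i < decomposed.length(); i++) {
            System.out.printf("U+%04X%n", (int) decomposed.charAt(i));
        }
    }
}
```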
Here's an SSCCE; copy'n'paste'n'run it on Java 6:

    package com.stackoverflow.q2653739;

    import java.text.Normalizer;
    import java.text.Normalizer.Form;

    public class Test {

        public static void main(String... args) {
            System.out.println(removeDiacriticalMarks("Gračišće"));
        }

        public static String removeDiacriticalMarks(String string) {
            // NFD splits letters like č into a base letter plus a combining mark;
            // the regex then removes all combining marks.
            return Normalizer.normalize(string, Form.NFD)
                .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        }
    }

This should print:

    Gracisce

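At least, it does here in Eclipse with the console character encoding set to UTF-8 (Window > Preferences > General > Workspace > Text File Encoding). Make sure the same is set in your environment as well.
As an alternative, maintain a Map<Character, Character>: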
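Two caveats, offered as a sketch rather than a definitive fix. First, String.replaceAll compiles its regex on every call, so if the method runs often the Pattern can be compiled once and reused. Second, letters such as đ and Đ (d with stroke) have no canonical decomposition, so NFD leaves them untouched and they still need an explicit mapping. The class name below is hypothetical:

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

public class DiacriticsStripper {

    // Compiled once and reused, instead of being recompiled by each replaceAll call.
    private static final Pattern COMBINING_MARKS =
            Pattern.compile("\\p{InCombiningDiacriticalMarks}+");

    static String removeDiacriticalMarks(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String stripped = COMBINING_MARKS.matcher(decomposed).replaceAll("");
        // d-stroke and D-stroke do not decompose under NFD, so map them explicitly.
        return stripped.replace('\u0111', 'd').replace('\u0110', 'D');
    }

    public static void main(String[] args) {
        System.out.println(removeDiacriticalMarks("Gračišće"));
        System.out.println(removeDiacriticalMarks("Đakovo"));
    }
}
```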

    import java.util.HashMap;
    import java.util.Map;

    Map<Character, Character> charReplacementMap = new HashMap<Character, Character>();
    charReplacementMap.put('š', 's');
    charReplacementMap.put('đ', 'd');
    // Put more here.

    String originalString = "Gračišće";
    StringBuilder builder = new StringBuilder();
    for (char currentChar : originalString.toCharArray()) {
        // Use the mapped character if there is one, otherwise keep the original.
        Character replacementChar = charReplacementMap.get(currentChar);
        builder.append(replacementChar != null ? replacementChar : currentChar);
    }
    String newString = builder.toString();
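Rather than one put(...) call per pair, the map can be filled from the two parallel strings in the question, built once, and reused across calls. A sketch with hypothetical helper names:

```java
import java.util.HashMap;
import java.util.Map;

public class MapReplace {

    // Builds the lookup table from two parallel strings of equal length.
    static Map<Character, Character> buildMap(String source, String target) {
        if (source.length() != target.length()) {
            throw new IllegalArgumentException("source and target must have the same length");
        }
        Map<Character, Character> map = new HashMap<Character, Character>();
        for (int i = 0; i < source.length(); i++) {
            map.put(source.charAt(i), target.charAt(i));
        }
        return map;
    }

    // Single pass: mapped characters are replaced, all others are copied as-is.
    static String replaceChars(String input, Map<Character, Character> map) {
        StringBuilder builder = new StringBuilder(input.length());
        for (char c : input.toCharArray()) {
            Character replacement = map.get(c);
            builder.append(replacement != null ? replacement.charValue() : c);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        Map<Character, Character> map = buildMap("šđćčŠĐĆČžŽ", "sdccSDCCzZ");
        System.out.println(replaceChars("Gračišće", map));
    }
}
```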