How to check if the first character is a letter

Updated: 2023-09-26

How can I determine whether the first character of a string is a letter, regardless of the alphabet?

regExpIsLetter = /^\w/;
regExpIsLetter.test('Å'); // false
regExpIsLetter.test('ç'); // false
regExpIsLetter.test('A'); // true
regExpIsLetter.test('š'); // false !! (Czech)
regExpIsLetter.test('ф'); // false !! (Cyrillic)
regExpIsLetter.test('ڂ'); // false !! (Arabic)

=====================

Based on the answers, I found a "solution". Here is the "simple" regular expression:

regExpIsLetter = /^[\u0041-\u005a\u0061-\u007a\u00aa-\u00aa\u00b5-\u00b5\u00ba-\u00ba\u00c0-\u00d6\u00d8-\u00f6\u00f8-\u0236\u0250-\u02c1\u02c6-\u02d1\u02e0-\u02e4\u02ee-\u02ee\u037a-\u037a\u0386-\u0386\u0388-\u038a\u038c-\u038c\u038e-\u03a1\u03a3-\u03ce\u03d0-\u03f5\u03f7-\u03fb\u0400-\u0481\u048a-\u04ce\u04d0-\u04f5\u04f8-\u04f9\u0500-\u050f\u0531-\u0556\u0559-\u0559\u0561-\u0587\u05d0-\u05ea\u05f0-\u05f2\u0621-\u063a\u0640-\u064a\u066e-\u066f\u0671-\u06d3\u06d5-\u06d5\u06e5-\u06e6\u06ee-\u06ef\u06fa-\u06fc\u06ff-\u06ff\u0710-\u0710\u0712-\u072f\u074d-\u074f\u0780-\u07a5\u07b1-\u07b1\u0904-\u0939\u093d-\u093d\u0950-\u0950\u0958-\u0961\u0985-\u098c\u098f-\u0990\u0993-\u09a8\u09aa-\u09b0\u09b2-\u09b2\u09b6-\u09b9\u09bd-\u09bd\u09dc-\u09dd\u09df-\u09e1\u09f0-\u09f1\u0a05-\u0a0a\u0a0f-\u0a10\u0a13-\u0a28\u0a2a-\u0a30\u0a32-\u0a33\u0a35-\u0a36\u0a38-\u0a39\u0a59-\u0a5c\u0a5e-\u0a5e\u0a72-\u0a74\u0a85-\u0a8d\u0a8f-\u0a91\u0a93-\u0aa8\u0aaa-\u0ab0\u0ab2-\u0ab3\u0ab5-\u0ab9\u0abd-\u0abd\u0ad0-\u0ad0\u0ae0-\u0ae1\u0b05-\u0b0c\u0b0f-\u0b10\u0b13-\u0b28\u0b2a-\u0b30\u0b32-\u0b33\u0b35-\u0b39\u0b3d-\u0b3d\u0b5c-\u0b5d\u0b5f-\u0b61\u0b71-\u0b71\u0b83-\u0b83\u0b85-\u0b8a\u0b8e-\u0b90\u0b92-\u0b95\u0b99-\u0b9a\u0b9c-\u0b9c\u0b9e-\u0b9f\u0ba3-\u0ba4\u0ba8-\u0baa\u0bae-\u0bb5\u0bb7-\u0bb9\u0c05-\u0c0c\u0c0e-\u0c10\u0c12-\u0c28\u0c2a-\u0c33\u0c35-\u0c39\u0c60-\u0c61\u0c85-\u0c8c\u0c8e-\u0c90\u0c92-\u0ca8\u0caa-\u0cb3\u0cb5-\u0cb9\u0cbd-\u0cbd\u0cde-\u0cde\u0ce0-\u0ce1\u0d05-\u0d0c\u0d0e-\u0d10\u0d12-\u0d28\u0d2a-\u0d39\u0d60-\u0d61\u0d85-\u0d96\u0d9a-\u0db1\u0db3-\u0dbb\u0dbd-\u0dbd\u0dc0-\u0dc6\u0e01-\u0e30\u0e32-\u0e33\u0e40-\u0e46\u0e81-\u0e82\u0e84-\u0e84\u0e87-\u0e88\u0e8a-\u0e8a\u0e8d-\u0e8d\u0e94-\u0e97\u0e99-\u0e9f\u0ea1-\u0ea3\u0ea5-\u0ea5\u0ea7-\u0ea7\u0eaa-\u0eab\u0ead-\u0eb0\u0eb2-\u0eb3\u0ebd-\u0ebd\u0ec0-\u0ec4\u0ec6-\u0ec6\u0edc-\u0edd\u0f00-\u0f00\u0f40-\u0f47\u0f49-\u0f6a\u0f88-\u0f8b\u1000-\u1021\u1023-\u1027\u1029-\u102a\u1050-\u1055\u10a0-\u10c5\u10d0-\u10f8\u1100-\u1159\u115f-\u11a2\u11a8-\u11f9\u1200-\u1206\u1208-\u1246\u1248-\u1248\u124a-\u124d\u1250-\u1256\u1258-\u1258\u125a-\u125d\u1260-\u1286\u1288-\u1288\u128a-\u128d\u1290-\u12ae\u12b0-\u12b0\u12b2-\u12b5\u12b8-\u12be\u12c0-\u12c0\u12c2-\u12c5\u12c8-\u12ce\u12d0-\u12d6\u12d8-\u12ee\u12f0-\u130e\u1310-\u1310\u1312-\u1315\u1318-\u131e\u1320-\u1346\u1348-\u135a\u13a0-\u13f4\u1401-\u166c\u166f-\u1676\u1681-\u169a\u16a0-\u16ea\u1700-\u170c\u170e-\u1711\u1720-\u1731\u1740-\u1751\u1760-\u176c\u176e-\u1770\u1780-\u17b3\u17d7-\u17d7\u17dc-\u17dc\u1820-\u1877\u1880-\u18a8\u1900-\u191c\u1950-\u196d\u1970-\u1974\u1d00-\u1d6b\u1e00-\u1e9b\u1ea0-\u1ef9\u1f00-\u1f15\u1f18-\u1f1d\u1f20-\u1f45\u1f48-\u1f4d\u1f50-\u1f57\u1f59-\u1f59\u1f5b-\u1f5b\u1f5d-\u1f5d\u1f5f-\u1f7d\u1f80-\u1fb4\u1fb6-\u1fbc\u1fbe-\u1fbe\u1fc2-\u1fc4\u1fc6-\u1fcc\u1fd0-\u1fd3\u1fd6-\u1fdb\u1fe0-\u1fec\u1ff2-\u1ff4\u1ff6-\u1ffc\u2071-\u2071\u207f-\u207f\u2102-\u2102\u2107-\u2107\u210a-\u2113\u2115-\u2115\u2119-\u211d\u2124-\u2124\u2126-\u2126\u2128-\u2128\u212a-\u212d\u212f-\u2131\u2133-\u2139\u213d-\u213f\u2145-\u2149\u3005-\u3006\u3031-\u3035\u303b-\u303c\u3041-\u3096\u309d-\u309f\u30a1-\u30fa\u30fc-\u30ff\u3105-\u312c\u3131-\u318e\u31a0-\u31b7\u31f0-\u31ff\u3400-\u4db5\u4e00-\u9fa5\ua000-\ua48c\uac00-\ud7a3\uf900-\ufa2d\ufa30-\ufa6a\ufb00-\ufb06\ufb13-\ufb17\ufb1d-\ufb1d\ufb1f-\ufb28\ufb2a-\ufb36\ufb38-\ufb3c\ufb3e-\ufb3e\ufb40-\ufb41\ufb43-\ufb44\ufb46-\ufbb1\ufbd3-\ufd3d\ufd50-\ufd8f\ufd92-\ufdc7\ufdf0-\ufdfb\ufe70-\ufe74\ufe76-\ufefc\uff21-\uff3a\uff41-\uff5a\uff66-\uffbe\uffc2-\uffc7\uffca-\uffcf\uffd2-\uffd7\uffda-\uffdc]/;
regExpIsLetter.test('Å'); // true
regExpIsLetter.test('ç'); // true
regExpIsLetter.test('A'); // true
regExpIsLetter.test('š'); // true
regExpIsLetter.test('ф'); // true
regExpIsLetter.test('ڂ'); // true
regExpIsLetter.test('4'); // false

Groovy code that generates the regular expression:

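// Prints a character class covering every letter in the Basic Multilingual Plane.
def range = false;      // true while inside a run of consecutive letters
def rangeStart = 0;     // first code point of the current run
System.err.printf('[');
for(int i = 0; i < 65536; ++i) {
    def isLetter = Character.isLetter(i);
    if (!range && isLetter) {
        // A new run of letters starts here.
        rangeStart = i;
        range = true;
    }
    else if (range && !isLetter) {
        // The run ended at i - 1: emit it as a range or as a single escape.
        range = false;
        if( rangeStart != i - 1 ) {
            System.err.printf('\\u%04x-\\u%04x', rangeStart, i - 1);
        }
        else {
            System.err.printf('\\u%04x', rangeStart);
        }
    }
}
System.err.printf(']');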

Thanks.

According to ECMA-262, many of JavaScript's character classes are not Unicode-aware. See http://blog.stevenlevithan.com/archives/javascript-regex-and-unicode for more information.
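
To make the limitation concrete: without any Unicode-aware flag, ECMAScript's `\w` behaves as the ASCII set `[A-Za-z0-9_]`. The sketch below compares the two patterns on a handful of sample characters (the samples are illustrative only). It also shows that `\w` accepts digits and the underscore, so it is not a letter test even for plain ASCII input.

```javascript
// \w without Unicode-aware flags versus its ASCII equivalent.
const wordChar = /^\w/;
const asciiWordSet = /^[A-Za-z0-9_]/;

// '\u00c5' is Å written as an escape so that the precomposed form is used.
const samples = ['A', 'z', '4', '_', '\u00c5', 'ф', 'ڂ'];

const results = samples.map((ch) => ({
  ch,
  word: wordChar.test(ch),
  ascii: asciiWordSet.test(ch),
}));

for (const r of results) {
  console.log(r.ch, 'matches \\w:', r.word, 'matches [A-Za-z0-9_]:', r.ascii);
}
```

Both columns agree on every sample: the ASCII letters, the digit, and the underscore match, while Å, ф, and ڂ do not.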

For good JavaScript/Unicode resources, see my answer to a slightly different question.

EDIT: I checked with PHP, which has the /u modifier for treating strings as UTF-8, and all of the examples above pass:

var_dump(preg_match("/^\w/u", "ڂ")); // etc

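JavaScript's classic regular expressions (those without the `u` flag) have no built-in Unicode support. They do, however, support character ranges of the form \unnnn-\ummmm. For your code, you could do this: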

regExpIsLetter = /^[\u0000-\u007F\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0590-\u05FF\u0600-\u06FF\u0750-\u077F]/;

This will also match ASCII punctuation characters such as $, #, @, and so on.
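
A quick check of that caveat, using the same block-range pattern: the ranges cover whole Unicode blocks rather than letter categories, so digits, punctuation, and whitespace pass, while letters from blocks that are not listed fail. The sample inputs and the `blockRanges` and `checks` names are illustrative only.

```javascript
// The block-range pattern from this answer, unchanged.
const blockRanges = /^[\u0000-\u007F\u0080-\u00FF\u0100-\u017F\u0180-\u024F\u0370-\u03FF\u0400-\u04FF\u0500-\u052F\u0590-\u05FF\u0600-\u06FF\u0750-\u077F]/;

const checks = {
  latinLetter: blockRanges.test('A'),     // letter, inside a listed block
  cyrillicLetter: blockRanges.test('ф'),  // letter, inside a listed block
  arabicLetter: blockRanges.test('ڂ'),    // letter, inside a listed block
  digit: blockRanges.test('4'),           // not a letter, but in the ASCII block
  dollar: blockRanges.test('$'),          // not a letter, but in the ASCII block
  space: blockRanges.test(' '),           // not a letter, but in the ASCII block
  cjkLetter: blockRanges.test('中'),      // a letter, but its block is not listed
};

console.log(checks);
```

If false positives matter, the ASCII range can be narrowed to `A-Za-z`, at the cost of a longer, hand-maintained list for the other blocks.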

To build your own custom ranges, you can visit this page.
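
In engines that implement ES2018 Unicode property escapes, hand-built ranges are not needed: `\p{L}` combined with the `u` flag matches any code point in the Unicode Letter category. Whether your target environment supports this is an assumption to verify, and older engines still need a range-based pattern. A minimal sketch, where `startsWithLetter` is a hypothetical helper name:

```javascript
// Requires an engine with ES2018 Unicode property escapes.
const startsWithLetterPattern = /^\p{L}/u;

// Hypothetical helper: true when the first code point of str is a letter.
function startsWithLetter(str) {
  return startsWithLetterPattern.test(str);
}

console.log(startsWithLetter('Alpha'));      // Latin
console.log(startsWithLetter('фыва'));       // Cyrillic
console.log(startsWithLetter('ڂ'));          // Arabic
console.log(startsWithLetter('4abc'));       // digit first
console.log(startsWithLetter('\u{10400}'));  // a letter outside the BMP
```

Because the `u` flag makes the pattern operate on code points rather than UTF-16 code units, letters outside the Basic Multilingual Plane are recognized as well, which a table limited to `\u0000`-`\uffff` cannot do.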