How secure is a password that uses Chinese characters?

A password that uses Chinese characters can be very secure if the characters are chosen randomly, the password is long enough, and the website or application handles Unicode correctly. Chinese characters do not automatically make a password strong. Predictable phrases, dates, names, and reused passwords remain vulnerable regardless of the character set they draw from.

The question matters because the answer is genuinely split. The math favors Chinese characters – a larger character pool raises theoretical entropy per character. The real-world data tells a more complicated story. A 2019 USENIX Security study analyzed 73.1 million Chinese web passwords and found that many were weaker against online guessing attacks than their English counterparts. This article works through both sides: the entropy math, the behavioral evidence, the Unicode implementation risks, and what IT teams should actually do with this information.

Key takeaways

A larger character set raises theoretical entropy but only when characters are chosen randomly. CJK characters cover tens of thousands of Unicode code points compared to 95 for printable ASCII. That gap is real on paper. It disappears the moment a human picks a recognizable phrase instead of a random string.
Human-chosen Chinese passwords are often weaker than they appear. A 2019 USENIX Security study analyzed 73.1 million real-world Chinese web passwords and found they were more vulnerable to online guessing attacks than English passwords. Pinyin sequences, culturally common digit strings, and familiar phrases are well-represented in language-specific attack dictionaries.
Unicode compatibility is an unsolved problem on many systems. Authentication systems built around ASCII assumptions can reject non-ASCII input, apply inconsistent normalization, count bytes instead of characters, or silently truncate passwords. A password that works at account creation may fail at login, recovery, or on a mobile device.
Length and randomness matter more than which characters you use. NIST, OWASP, and CISA all point to the same foundation: long, unique, randomly generated passwords stored in a password manager, paired with MFA. Character category is a secondary consideration.
Password complexity does not address phishing, credential stuffing, or session theft. Four of the five largest U.S. mega-breaches in 2024 involved stolen or compromised passwords. The character set used was irrelevant. MFA, unique passwords per account, and breached-password blocklists are the controls that reduce real-world risk.

Are Chinese-character passwords actually more secure?

They can be, but not automatically. The security of any password depends on how unpredictable it is to an attacker. A larger character set raises the theoretical number of possible passwords. CJK characters in Unicode cover tens of thousands of code points – compared to 95 for printable ASCII. On paper, that gap is significant.

The problem is that theoretical strength assumes random selection. Human-chosen passwords don't work that way. A password built from a recognizable Chinese phrase, a character sequence tied to a name or date, or a culturally common pattern gives an attacker a much smaller target than the full character set suggests. A language-aware dictionary built from real Chinese passwords can crack 我的密码 (Chinese for "my password") in seconds – regardless of how large the CJK pool technically is.

So the character set matters, but only when the password is generated randomly. A meaningful Chinese phrase and a random CJK string are not the same security proposition.

CJK (Chinese, Japanese, Korean) characters in Unicode cover tens of thousands of code points. That is a meaningful advantage in theory. In practice, the advantage only materializes when the password is generated randomly and the system handles Unicode correctly.

What is a character set?

Character set — The collection of distinct characters a password can be drawn from. Standard printable ASCII has 95 characters; a common CJK subset has around 20,000. A larger character set increases the theoretical number of possible passwords for a given length, which raises the cost of a brute-force attack — but only when characters are chosen randomly.

What is a dictionary attack?

Dictionary attack — A method of cracking passwords by systematically testing a pre-built list of likely candidates: common words, names, phrases, keyboard patterns, and known leaked passwords. Unlike brute-force attacks that try every possible combination, dictionary attacks exploit predictable human choices. Language-specific dictionaries — including pinyin sequences and culturally common Chinese phrases — make this attack effective against non-ASCII passwords too.

The entropy math: why CJK characters can add strength

Password entropy expresses the size of the search space an attacker must exhaust, measured in bits: each additional bit doubles the number of possible passwords. For randomly generated passwords, the standard model is: entropy (in bits) = log₂(character set size) × password length. A higher number means a harder brute-force problem.

The table below shows how different character pools compare under this model. Every figure assumes the password is generated randomly – a condition that human-chosen passwords rarely meet.

Password model	Assumed character pool	Bits per character	Notes
Printable ASCII	95 characters	6.57	Broadly compatible; easy for password managers to generate and autofill.
20,000-character CJK subset	20,000 characters	14.29	Higher theoretical entropy per character; input and system support are harder.
90,000-character CJK/Han-like set	90,000 characters	16.46	Illustrative upper bound; not a practical daily input pool.
Common Chinese phrase	Human-chosen words	Not safely calculable	Vulnerable to language-specific dictionaries regardless of character count.

The numbers look compelling for CJK characters. A randomly chosen character from a 20,000-character pool carries more than twice the entropy of a randomly chosen printable ASCII character. A five-character random CJK password (about 71 bits) would theoretically edge out a ten-character random ASCII password (about 66 bits).
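The figures in the table can be reproduced with a few lines of Python. This is a minimal sketch of the entropy model described above; the function name `entropy_bits` is illustrative, and the result only holds when every character is drawn uniformly at random.

```python
import math

def entropy_bits(pool_size: int, length: int) -> float:
    """Entropy in bits of `length` characters drawn uniformly from `pool_size` options."""
    return math.log2(pool_size) * length

# Bits per character for the three pools in the table.
for name, pool in [("Printable ASCII", 95), ("CJK subset", 20_000), ("Large Han-like set", 90_000)]:
    print(f"{name}: {entropy_bits(pool, 1):.2f} bits per character")

# Five random CJK characters versus ten random ASCII characters.
cjk_5 = entropy_bits(20_000, 5)
ascii_10 = entropy_bits(95, 10)
print(f"5 random CJK characters:    {cjk_5:.2f} bits")    # 71.44
print(f"10 random ASCII characters: {ascii_10:.2f} bits")  # 65.70
```

The model says nothing about a human-chosen phrase, which is why the last row of the table has no number.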

Two caveats apply:

Random selection. The formula assumes every character is chosen with equal probability. A human picking Chinese characters does not behave like a random number generator.
System support. Higher entropy per character does not help if the system rejects, truncates, or mishandles the input. Theoretical strength and practical security are not the same thing.
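To make the random-selection caveat concrete, here is a sketch of what uniform selection from a CJK pool could look like. The pool is an assumption made for illustration: the CJK Unified Ideographs block (U+4E00 through U+9FFF), which is close in size to the 20,000-character subset used above. The function name is hypothetical.

```python
import secrets

# Assumption: use the CJK Unified Ideographs block as the character pool.
CJK_START, CJK_END = 0x4E00, 0x9FFF
POOL_SIZE = CJK_END - CJK_START + 1  # 20,992 code points

def random_cjk_password(length: int = 12) -> str:
    """Draw each character uniformly at random using a cryptographic RNG."""
    return "".join(
        chr(CJK_START + secrets.randbelow(POOL_SIZE)) for _ in range(length)
    )

print(random_cjk_password())
```

Many characters in this block are rare and difficult to type with a standard input method, so a password generated this way is realistic only when a password manager stores and fills it.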

Unicode 17.0, released in 2025, defines a total of 159,801 characters across all scripts (Unicode Consortium, 2025). That figure is often cited to suggest an enormous password space, but it is the size of the entire Unicode repertoire – not a realistic pool of characters a user would draw from when creating a password. The practical CJK character pool for most users is the roughly 20,000 characters in common use, not the full Unicode inventory.

The real-world caveat: Chinese users often choose predictable passwords

The most important empirical evidence on this topic comes from a 2019 USENIX Security study by Ding Wang and colleagues at Peking University, Wuhan University, and the University of Virginia. The researchers analyzed 73.1 million real-world Chinese web passwords and 33.2 million English web passwords from nine services, covering social forums, gaming platforms, e-commerce sites, and programmer communities.

Their key finding was what they called bifacial security: Chinese passwords were weaker against online guessing attacks (up to 10,000 guesses) than English passwords, but the passwords that survived those initial guesses were stronger against high-volume offline attacks. At 10 million guesses, their improved cracking algorithm succeeded against 33.2% to 49.8% of the Chinese datasets – cracking between 92% and 188% more passwords than the prior state of the art. As the IEEE Spectrum summary of the research notes, a password that looks strong by English-language assumptions can be immediately obvious to a Mandarin speaker.

The patterns attackers exploit include:

Pinyin sequences – romanized Chinese, such as "woaini" ("I love you"), which password strength meters at major services rated as "strong" despite being trivially guessable by Mandarin speakers.
Culturally common digit strings – "5201314" sounds like "I love you forever" in Chinese; "520" alone is a common shorthand.
Phone-number fragments – Chinese users include mobile numbers in passwords more often than English-speaking users.
Birthday and date formats – embedded in passwords at higher rates than in English-language datasets.
Digit-only strings – "123456," "111111," "123321," and similar sequences appear at high frequency.
Interleaved patterns – alternating letters and digits in formats like "a12345" or "12345a".

None of this means Chinese-speaking users are less security-conscious. It means that any language community develops predictable patterns, and attackers build dictionaries to match. The practical lesson: using Chinese characters does not bypass dictionary attacks. It shifts which dictionary the attacker reaches for.
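The patterns listed above are simple enough to detect mechanically, which is part of why attackers exploit them. The sketch below is illustrative only: the word lists are tiny samples taken from the examples in this article, the regular expressions (including the 11-digit mobile-number shape) are assumptions, and the function name is hypothetical. A real cracking dictionary or strength checker would be far larger.

```python
import re

# Tiny sample lists for illustration; real dictionaries hold millions of entries.
COMMON_PINYIN = {"woaini"}
COMMON_DIGIT_STRINGS = {"5201314", "520", "123456", "111111", "123321"}

DIGITS_ONLY = re.compile(r"\d+")
LETTERS_AND_DIGITS = re.compile(r"[a-z]+\d+|\d+[a-z]+")
# Assumed date shape: YYYYMMDD with a plausible year, month, and day.
DATE = re.compile(r"(19|20)\d{2}(0[1-9]|1[0-2])(0[1-9]|[12]\d|3[01])")
# Assumed mobile-number shape: 11 digits starting with 1.
PHONE = re.compile(r"1[3-9]\d{9}")

def predictable_patterns(password: str) -> list[str]:
    """Return the names of every predictable pattern found in the password."""
    p = password.lower()
    found = []
    if any(word in p for word in COMMON_PINYIN):
        found.append("common pinyin phrase")
    if any(digits in p for digits in COMMON_DIGIT_STRINGS):
        found.append("common digit string")
    if DIGITS_ONLY.fullmatch(p):
        found.append("digits only")
    if LETTERS_AND_DIGITS.fullmatch(p):
        found.append("letters joined to digits")
    if DATE.search(p):
        found.append("date")
    if PHONE.search(p):
        found.append("phone-number fragment")
    return found

for candidate in ["woaini", "5201314", "a12345", "19900101"]:
    print(candidate, predictable_patterns(candidate))
```

An empty result does not mean a password is strong. It only means none of these few patterns matched.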

Passwork's password generator creates long, random credentials that avoid all of these patterns, regardless of which character set you're working with.

Unicode compatibility risks: why some sites reject or break these passwords

Many authentication systems were built around ASCII assumptions and have never been fully updated. The result is a set of failure modes that can lock users out, silently weaken their passwords, or make recovery impossible.

A few definitions help here. UTF-8 is the most common encoding for Unicode text on the web – it represents each Unicode code point as one to four bytes. A Unicode code point is the unique number assigned to each character. Unicode normalization is the process of converting visually equivalent character sequences into a canonical form; NFC (Normalization Form Composed) is the most common standard for text storage. Visually similar characters are different code points that look identical on screen, which can cause login failures if the stored and entered forms differ.
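These definitions are easier to see in a short Python session. The sketch below uses only the standard library; SHA-256 stands in for "the bytes differ" and is not a recommendation for password storage, which calls for a slow, salted password hash.

```python
import hashlib
import unicodedata

def sha256_hex(text: str) -> str:
    # Used only to show that the underlying bytes differ.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)

# UTF-8 length: two CJK characters occupy six bytes.
password = "密码"
print(len(password), len(password.encode("utf-8")))  # 2 6

# The same visible Hangul syllable, encoded two ways.
composed = "\uD55C"                # one precomposed code point
decomposed = "\u1112\u1161\u11AB"  # three conjoining jamo
print(composed == decomposed)                          # False
print(sha256_hex(composed) == sha256_hex(decomposed))  # False
print(nfc(composed) == nfc(decomposed))                # True

# A CJK compatibility ideograph is replaced by its unified form under NFC.
compat = "\uF900"
print(nfc(compat) == "\u8C48")  # True
```

If one code path normalizes before hashing and another does not, the same visible password produces two different hashes, and the login fails.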

Risk	Why it matters	Advice for users	Advice for IT teams
Rejection of non-ASCII input	The password may not be accepted at all.	Test account creation, login, recovery, and mobile access before committing to it.	Remove character bans that have no specific technical justification.
Inconsistent normalization	The same visible password may hash differently depending on how the system normalizes input.	Avoid combining character sequences for important accounts.	Define and document normalization behavior; apply it consistently at every input point.
Silent truncation	Characters beyond a byte or character limit may be silently dropped.	Avoid systems that truncate without warning; test with a long password.	Never truncate silently; enforce a clear maximum and return an explicit error.
Input-method dependency	Users may not be able to type the password on every device or keyboard layout.	Confirm access from mobile devices, emergency recovery flows, and any device you might use in a crisis.	Test Unicode input across web, mobile, SSO, API, and helpdesk recovery paths.

The input-method problem deserves emphasis. A password typed with an IME (input method editor) on a desktop may be impossible to reproduce on a locked-down corporate device, a hotel computer, or a phone with a different keyboard app. For a master password or a recovery credential, that is a serious usability risk.
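The silent-truncation and byte-counting risks in the table above can also be demonstrated directly. The sketch below imitates a hypothetical legacy field that keeps only the first 16 bytes of a password; the function name and the limit are assumptions for illustration.

```python
def truncate_to_bytes(password: str, max_bytes: int) -> str:
    """Mimic a legacy field that keeps only the first max_bytes UTF-8 bytes.

    errors="ignore" drops a character that the limit cut in half.
    """
    return password.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")

# Two different eight-character passwords that share their first five characters.
first = "春夏秋冬东西南北"
second = "春夏秋冬东甲乙丙"

print(len(first), len(first.encode("utf-8")))  # 8 characters, 24 bytes
print(len(truncate_to_bytes(first, 16)))       # 5 characters survive
print(truncate_to_bytes(first, 16) == truncate_to_bytes(second, 16))  # True
```

After truncation the two passwords are indistinguishable, so either one would unlock the account. A 16-byte limit that holds 16 ASCII characters holds only five CJK characters.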

What modern password guidance says about Unicode characters

OWASP's Authentication Cheat Sheet is direct: allow all characters, including Unicode and whitespace. It recommends against composition rules that restrict character types, sets a minimum password length tied to whether MFA is enabled (8 characters with MFA, 15 without, per NIST SP 800-63B), and requires a maximum of at least 64 characters with no silent truncation. It also recommends blocking passwords that appear in breached-password datasets.
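A policy along these lines could be sketched as follows. The length thresholds come from the guidance described above; the function name, the exact ceiling of 64, and the in-memory breached-password sample are assumptions for illustration. A production system would query a real breached-password dataset.

```python
MIN_LENGTH_WITH_MFA = 8
MIN_LENGTH_WITHOUT_MFA = 15
MAX_LENGTH = 64  # assumption: the lowest ceiling the guidance allows; higher is fine

# Hypothetical stand-in for a breached-password dataset.
BREACHED_SAMPLE = {"123456", "111111", "woaini", "5201314"}

def password_problems(password: str, mfa_enabled: bool) -> list[str]:
    """Return every policy violation; an empty list means the password is accepted."""
    length = len(password)  # counts code points, not bytes
    minimum = MIN_LENGTH_WITH_MFA if mfa_enabled else MIN_LENGTH_WITHOUT_MFA
    problems = []
    if length < minimum:
        problems.append(f"too short: {length} characters, minimum is {minimum}")
    if length > MAX_LENGTH:
        # Reject with an explicit error instead of truncating silently.
        problems.append(f"too long: {length} characters, maximum is {MAX_LENGTH}")
    if password in BREACHED_SAMPLE:
        problems.append("appears in a breached-password list")
    return problems

print(password_problems("123456", mfa_enabled=True))
print(password_problems("春夏秋冬东西南北甲乙丙丁戊己庚辛", mfa_enabled=False))
```

There is deliberately no rule about character types: any Unicode character is accepted, and only length and breach status are checked.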

CISA's strong-password guidance recommends passwords that are at least 16 characters long, random, and unique per account – stored in a password manager and paired with phishing-resistant MFA. The guidance does not restrict character sets.

NIST's user-facing guidance frames passwords as inherently insecure and recommends moving toward MFA and passkeys wherever possible. It notes that offline attacks can attempt an enormous number of guesses – making password length and randomness the primary defenses against cracking, not character category.

The consistent thread across all three sources: length and randomness matter more than which characters you use. Unicode characters are permitted and can help, but they are not a substitute for length, uniqueness, and a password manager.

Should you use Chinese characters in your own password?

For most accounts, the answer is: let your password manager decide. A randomly generated 20-character ASCII password from a password manager has high entropy, works on every system, and requires no manual typing. That is the baseline.
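For reference, this is roughly what a password manager does when it generates such a credential. The sketch assumes an alphabet of ASCII letters, digits, and punctuation (94 characters, since the space is left out); the function name is hypothetical.

```python
import math
import secrets
import string

# Assumed alphabet: letters, digits, and punctuation, without the space character.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def random_ascii_password(length: int = 20) -> str:
    """Draw each character uniformly at random using a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

print(random_ascii_password())
print(f"{math.log2(len(ALPHABET)) * 20:.0f} bits")  # 131 bits
```

At about 131 bits, such a password is already beyond any realistic guessing attack, so a larger character pool adds little.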

Chinese characters make sense in a narrower set of circumstances: the user can type them reliably on every device they use, the service demonstrably supports Unicode at every touchpoint (login, recovery, mobile, API), and the resulting password is long, unique, and not a recognizable phrase.

Scenario	Recommended approach	Reason
Password manager can generate and autofill	Long random password, usually ASCII-compatible	High entropy and broad compatibility with no manual typing required.
Password must be memorized	Long passphrase of unrelated words	Easier to type across devices; less dependent on Unicode support.
User wants to use Chinese characters	Use them as part of a longer unique password only after testing Unicode support end-to-end	Adds possible entropy but introduces compatibility risks.
Enterprise account	Follow policy: minimum 16 characters, unique, MFA required, breached-password blocklist active	Reduces real-world account compromise risk across the organization.
High-risk account	Strong unique password plus MFA or passkeys	Complexity alone does not protect against phishing or stolen credentials.

The one scenario where Chinese characters clearly add value: a password that an attacker could not realistically include in any dictionary, generated randomly, used on a system with verified Unicode support. Outside that scenario, the compatibility costs often outweigh the entropy gains.

What Chinese characters cannot protect against

Entropy is a defense against guessing and cracking. It does not address the other ways credentials get compromised.

The ITRC's 2024 Annual Data Breach Report recorded 3,158 U.S. data compromises and 1,350,835,988 breach notices in 2024 – a 211% increase in notices from 2023. Four of the five largest mega-breaches involved stolen or compromised passwords. Attacks against Ticketmaster, AT&T, and Change Healthcare, among others, could have been blocked with MFA or passkeys. The character complexity of those passwords was irrelevant.

The threats that password complexity cannot address:

Phishing – a convincing fake login page captures the password regardless of how it was constructed
Keylogging and malware – credentials are captured at input before encryption applies
Session theft – an attacker who steals an authenticated session token bypasses the password entirely
Credential stuffing – reused passwords from one breach are tested against other services; uniqueness is the only defense
Password reuse – a strong Chinese-character password used across five accounts is five times as exposed
Social engineering – an attacker who convinces a help desk to reset an account never touches the password
Compromised password manager vault – if the vault is breached and the master password is weak, all stored credentials are at risk

The controls that address these threats are MFA, passkeys, unique passwords per account, breached-password blocklists, phishing-resistant authentication, and regular security audits. A more complex password is one layer. It is not a substitute for the others.

Conclusion

Chinese characters can improve a password's theoretical strength – but only under the same conditions that make any password strong: sufficient length, genuine randomness, uniqueness across accounts, and a system that handles Unicode correctly. A meaningful Chinese phrase, a pinyin sequence, or a culturally familiar number string does not meet those conditions. The USENIX research on 73.1 million Chinese web passwords makes that clear.

For most users, the practical answer is a password manager generating long, random credentials, paired with MFA or passkeys on any account that supports them. For IT teams, the priority is building authentication systems that allow Unicode without breaking it – and enforcing length, uniqueness, and breached-password checks as the foundation of any password policy.

For organizations managing credentials across teams and systems, a corporate password manager such as Passwork helps generate, store, share, and audit unique credentials while giving administrators the controls they need to enforce consistent password practices.

Strong credentials are one layer of a working security posture. Passwork gives IT teams the infrastructure to manage that layer at scale – self-hosted or cloud, auditable, and built for enterprise environments.

FAQ

Are Chinese characters better than special characters in passwords?

Chinese characters can offer a larger theoretical character pool than the standard set of special characters, which gives higher entropy per randomly chosen character. In practice, randomness and length matter more than which category of character you use. A long random password using printable ASCII is stronger than a short meaningful Chinese phrase.

Is a short Chinese password secure?

Not reliably. A short password from a large character set can have reasonable theoretical entropy if chosen randomly, but short passwords remain vulnerable to offline cracking as hardware improves. A five-character random CJK password is not a substitute for a 16-character or longer password. Length and randomness together determine real-world strength.

Can I use pinyin as a password?

Pinyin alone is a poor choice. Romanized Chinese is a well-known pattern, and attackers build language-specific dictionaries that include common pinyin sequences, names, and phrases. The USENIX research found that pinyin-based passwords were among the most successfully cracked in the Chinese dataset. Pinyin combined with other random elements in a longer password is less predictable, but a password manager-generated credential is safer.

Do all websites allow Chinese characters in passwords?

No. Many systems reject non-ASCII input, apply inconsistent Unicode normalization, count bytes instead of characters, or silently truncate long strings. Before relying on Chinese characters for any important account, test the full authentication flow: account creation, login, password change, recovery, and mobile access. If any step fails, use a compatible password instead.

Are emojis safer than Chinese characters?

Emojis carry the same Unicode compatibility risks as CJK characters and introduce additional problems: many emojis are sequences of several code points (skin-tone modifiers, zero-width joiners, variation selectors), so the same visible emoji can be stored differently depending on the keyboard that produced it; rendering varies across platforms; and input on many devices is slow and unreliable. They are not automatically more secure. The same conditions apply – randomness, length, and verified system support.

Should a password manager generate Chinese characters?

Most password managers default to ASCII-compatible character sets for good reason: broad compatibility, reliable autofill, and no input-method dependency. If you want to include CJK characters, verify that the target service handles Unicode correctly end-to-end before enabling it. For most accounts, a long random ASCII password is the safer and more practical choice.

Do Chinese characters stop credential stuffing?

No. Credential stuffing attacks replay passwords stolen from one breach against other services. The defense is uniqueness – one password per account – not complexity. A unique 16-character ASCII password stops credential stuffing just as effectively as a unique Chinese-character password. Breached-password blocklists and MFA add further protection.

What is the best practical recommendation?

Use a password manager to generate long, unique, random passwords for every account. Enable MFA or passkeys wherever the service supports it. If you want to use Chinese characters, verify Unicode support on every authentication path first. The combination of unique passwords, a password manager, and MFA addresses the full range of real-world credential threats.