
Cryptography

Jul 14, 2025
Private password breach checking: A new algorithm for secure password validation


Introduction

Data breaches have become routine, and millions of users worldwide face the consequences of compromised passwords. The scale is staggering: billions of credentials have been exposed, fueling automated attacks and credential stuffing. Services like "Have I Been Pwned" now track over 12 billion breached accounts, and that number keeps growing.

Security professionals and users face a direct challenge: how can we check if a password has been compromised in a data breach without revealing the password itself to the checking service? The task sounds simple, but in reality, it requires a delicate balance between privacy, security, and performance.

Traditional approaches force a trade-off. Direct hash lookups are fast but unsafe: they expose the full hash, risking password leaks. More sophisticated cryptographic protocols offer strong privacy guarantees but come with significant computational overhead and implementation complexity that makes them impractical for many real-world applications.

We’re introducing a solution that bridges this gap: Private password breach checking using obfuscated deterministic bloom filter indices. This innovative approach provides strong privacy guarantees while maintaining the efficiency needed for practical deployment in password managers, authentication systems, and enterprise security infrastructure.

Existing solutions and their tradeoffs

To understand the significance of our new approach, it's important to examine the current methods for password breach checking and their inherent limitations.

Direct hash lookup: Simple but insecure

The earliest password breach checking services, such as LeakedSource, employed a straightforward approach: users would submit the SHA-1 hash of their password, and the service would check whether that exact hash appeared in its breach database. Although simple to deploy and fast to query, this method is fundamentally insecure.
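To make the exposure concrete, here is a minimal sketch of that pattern in Python (the endpoint named in the comment is hypothetical):

import hashlib

password = "hunter2"
full_hash = hashlib.sha1(password.encode()).hexdigest()

# Sent verbatim to a hypothetical service, e.g.
#   POST https://breach-checker.example/lookup  {"sha1": full_hash}
# The service, and anyone watching the wire, now holds an exact
# fingerprint of the password.
print(full_hash)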

When a user submits their password hash directly, they're essentially handing over a cryptographic fingerprint of their password to the service. This creates several attack vectors: malicious actors could perform rainbow table attacks against the submitted hash, launch focused dictionary attacks targeting that specific hash, or correlate the same password across multiple services. The fundamental problem is that the hash itself becomes a valuable piece of information that can be exploited.

K-anonymity: A step forward with remaining vulnerabilities

Recognizing the security issues with direct hash submission, Troy Hunt introduced the k-anonymity approach for the "Have I Been Pwned" service, which has since been adopted by major companies including Cloudflare and Microsoft. This method represents a significant improvement in privacy protection while maintaining reasonable performance characteristics.

In the k-anonymity approach, instead of sending the full password hash, the client computes the SHA-1 hash of their password and sends only the first 5 hexadecimal characters (representing 20 bits) to the server. The server then returns all hashes in its database that begin with that prefix, typically between 400 and 800 hashes. The client then checks locally whether their full hash appears in the returned list.
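As a concrete illustration, here is a minimal sketch of such a range query against the public "Have I Been Pwned" endpoint (it assumes the third-party requests package and omits production error handling):

import hashlib
import requests

def pwned_count(password: str) -> int:
    """Return how many times a password appears in the HIBP corpus."""
    digest = hashlib.sha1(password.encode()).hexdigest().upper()
    prefix, suffix = digest[:5], digest[5:]
    # Only the 5-character prefix (20 bits) ever leaves this machine.
    resp = requests.get(f"https://api.pwnedpasswords.com/range/{prefix}")
    resp.raise_for_status()
    for line in resp.text.splitlines():
        candidate, _, count = line.partition(":")
        if candidate == suffix:
            return int(count)
    return 0

print(pwned_count("password123"))  # non-zero for any breached password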

This approach offers several advantages: it's simple to implement, provides reasonable privacy protection, and uses bandwidth efficiently. However, recent security analysis has revealed significant vulnerabilities. The method still leaks 20 bits of entropy about the password, and research has demonstrated that this partial information can increase password cracking success rates by an order of magnitude when attackers have access to the leaked prefixes. The approach is particularly vulnerable to targeted attacks against high-value accounts, where even partial information can be valuable to sophisticated adversaries.

Cryptographic protocols: Strong privacy at a high cost

At the other end of the spectrum, advanced cryptographic protocols offer robust privacy guarantees but come with substantial implementation and performance costs. Two primary approaches have emerged in this category: Oblivious Pseudorandom Functions (OPRF) and Private Set Intersection (PSI).

The OPRF approach, used in Google's Password Checkup service and Apple's iCloud Keychain, employs a sophisticated cryptographic dance. The client first "blinds" its password hash using a random value, creating a masked version that reveals nothing about the original password. The server then applies a pseudorandom function to this blinded value without learning anything about the underlying password. Finally, the client "unblinds" the result and checks if the final value exists in a pre-downloaded set of breached identifiers.
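For intuition only, here is a toy blinded-exponentiation sketch in a multiplicative group modulo a prime. Production systems such as Password Checkup use elliptic-curve groups and standardized OPRF constructions; the parameters below are illustrative and not secure:

import hashlib
import math
import secrets

P = (1 << 127) - 1  # a Mersenne prime modulus (toy choice)

def hash_to_group(password: bytes) -> int:
    return int.from_bytes(hashlib.sha256(password).digest(), "big") % P

# Client: blind the password hash with a random r.
h = hash_to_group(b"correct horse battery staple")
while True:
    r = secrets.randbelow(P - 2) + 2
    if math.gcd(r, P - 1) == 1:  # r must be invertible mod the group order
        break
blinded = pow(h, r, P)           # reveals nothing about h to the server

# Server: apply its secret PRF key k to the blinded value.
k = secrets.randbelow(P - 2) + 2  # stands in for the server's long-term key
evaluated = pow(blinded, k, P)    # equals h^(r*k) mod P

# Client: unblind to recover the PRF output h^k.
r_inv = pow(r, -1, P - 1)
oprf_output = pow(evaluated, r_inv, P)
assert oprf_output == pow(h, k, P)  # client gets h^k; server never saw h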

Private Set Intersection protocols take a different approach, using advanced cryptographic techniques like homomorphic encryption or garbled circuits. These protocols allow a client to learn the intersection of its password set and the server's breach database without either party revealing their complete set to the other.

While these cryptographic approaches provide excellent privacy guarantees with no information leakage, they come with significant drawbacks. They require complex implementations involving elliptic curve cryptography, impose high computational costs that can be 100 to 1000 times slower than simple hash operations, and in some PSI protocols, require substantial bandwidth for large breach sets. These factors make them impractical for many real-world applications, particularly those requiring real-time password validation or deployment on resource-constrained devices.

Local and offline approaches: Perfect privacy with practical limitations

Some organizations have opted for local or offline approaches to achieve perfect privacy. Services like "Have I Been Pwned" publish downloadable password lists, allowing organizations to fetch the entire breach database (approximately 25GB uncompressed, 11GB compressed) and perform searches locally. Organizations can also build local Bloom filters from these datasets, reducing storage requirements to around 860MB for 500 million passwords at a 0.1% false positive rate.
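Those storage figures follow from the standard Bloom filter sizing formula, m = −n·ln(p)/(ln 2)² bits for n items at false-positive rate p; a quick sanity check:

import math

n = 500_000_000   # passwords in the filter
p = 0.001         # 0.1% false-positive rate
m_bits = -n * math.log(p) / (math.log(2) ** 2)
print(f"{m_bits / 8 / 2**20:.0f} MB")  # ≈ 857 MB, matching the figure above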

While local approaches provide perfect privacy since no network communication is required, they present their own challenges. Storage requirements can be prohibitive, especially for mobile applications. Keeping the local database synchronized with new breaches requires regular updates, and the approach is generally impractical for most end-user applications, particularly on mobile devices with limited storage capacity.

Our innovation: Obfuscated deterministic bloom filter indices

Our new algorithm represents a fundamental shift in password breach checking, combining the efficiency of Bloom filters with sophisticated obfuscation techniques. The result is a system that provides strong privacy guarantees while maintaining the performance characteristics needed for real-world deployment.

Understanding bloom filters: The foundation

To understand our approach, it's helpful to first grasp the concept of a Bloom filter. A Bloom filter is a space-efficient probabilistic data structure designed to test whether an element is a member of a set. Think of it as a highly compressed representation of a large dataset that can quickly answer the question "Is this item definitely not in the set?" or "This item might be in the set."

The beauty of Bloom filters lies in their efficiency. Instead of storing the actual password hashes, a Bloom filter represents the breach database as a large array of bits. When a password hash is added to the filter, multiple hash functions are applied to generate several index positions in the bit array, and those positions are set to 1. To check if a password might be compromised, the same hash functions are applied to generate the same index positions, and if all those positions contain 1, the password might be in the breach database.

The probabilistic nature of Bloom filters means they can produce false positives (indicating a password might be breached when it actually isn't) but never false negatives (they will never miss a password that is actually breached). This characteristic makes them perfect for security applications where it's better to err on the side of caution.
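A minimal sketch of these mechanics in Python (the size, hash count, and index derivation are illustrative choices, not prescribed parameters):

import hashlib

class BloomFilter:
    def __init__(self, size_bits: int, num_hashes: int):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _indices(self, item: bytes):
        # Derive k positions by hashing a counter together with the item.
        for i in range(self.k):
            digest = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item: bytes):
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: bytes) -> bool:
        return all(self.bits[idx // 8] & (1 << (idx % 8))
                   for idx in self._indices(item))

In Phase 1 of our algorithm, described below, the server populates exactly this kind of structure with every known breached password hash.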

The core innovation: Deterministic obfuscation

The key insight behind our algorithm is that while Bloom filters are efficient, directly querying specific bit positions would still reveal information about the password being checked. Our solution introduces a sophisticated obfuscation mechanism that hides the real query among carefully crafted noise.

The algorithm operates on a simple but powerful principle: when checking a password, instead of requesting only the bit positions that correspond to that password, the client also requests additional "noise" positions that are generated deterministically but appear random to the server. This creates a situation where the server cannot distinguish between the real query positions and the fake ones, effectively hiding the password being checked.

What makes this approach particularly elegant is the use of deterministic noise generation. Unlike random noise, which would create different query patterns each time the same password is checked, our deterministic approach ensures that checking the same password always generates the same set of noise positions. This consistency is crucial for both security and efficiency reasons.

How the algorithm works: A three-phase process

Our algorithm operates through three distinct phases, each designed to maintain privacy while ensuring efficient operation.

Phase 1: Server setup
The server begins by taking a comprehensive set of compromised password hashes from known data breaches. These hashes are then used to populate a large Bloom filter bit array. For each compromised password hash, multiple hash functions are applied to generate several index positions in the bit array, and those positions are marked as 1. The result is a compact representation of millions or billions of compromised passwords that can be queried efficiently.

Phase 2: Client query generation
When a client wants to check a password, the process begins by computing a cryptographic hash of the password. The client then generates two sets of indices: the "true indices" that correspond to the password being checked, and "noise indices" that serve as decoys.

The true indices are generated by applying the same hash functions used by the server to the password hash. These are the positions in the Bloom filter that would need to be checked to determine if the password is compromised.

The noise indices are generated using a pseudorandom function keyed with a secret that only the client knows. This secret ensures that the noise appears random to the server but is deterministic for the client. The number of noise indices is carefully chosen to provide strong privacy guarantees while maintaining efficiency.

Once both sets of indices are generated, they are combined and shuffled in a deterministic but unpredictable manner. This shuffling ensures that the server cannot distinguish between real and fake indices based on their position in the query.
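A sketch of what this client-side query generation might look like; the filter size, hash count, noise count, and derivation details are illustrative assumptions rather than the exact parameters of our implementation:

import hashlib
import hmac

M = 2 ** 30      # Bloom filter size in bits (illustrative)
K = 7            # hash functions per password (illustrative)
NOISE = 63       # decoy positions per query (illustrative)

CLIENT_SECRET = b"per-device secret key"  # never leaves the client

def _pos(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big") % M

def true_indices(pw_hash: bytes) -> list[int]:
    # The same derivation the server used when populating the filter.
    return [_pos(i.to_bytes(4, "big") + pw_hash) for i in range(K)]

def noise_indices(pw_hash: bytes) -> list[int]:
    # Deterministic decoys: an HMAC keyed with the client's secret,
    # so the same password always yields the same noise positions.
    return [int.from_bytes(hmac.new(CLIENT_SECRET,
                                    b"noise" + i.to_bytes(4, "big") + pw_hash,
                                    hashlib.sha256).digest()[:8], "big") % M
            for i in range(NOISE)]

def build_query(password: str) -> list[int]:
    pw_hash = hashlib.sha256(password.encode()).digest()
    indices = true_indices(pw_hash) + noise_indices(pw_hash)
    # Deterministic shuffle: order positions by a keyed PRF value the
    # server cannot predict or invert without CLIENT_SECRET.
    return sorted(indices, key=lambda ix: hmac.new(
        CLIENT_SECRET, ix.to_bytes(8, "big"), hashlib.sha256).digest())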

Phase 3: Query processing and response
The client sends the shuffled set of indices to the server, which responds with the bit values at each requested position. The server has no way to determine which indices correspond to the actual password being checked and which are noise.

Upon receiving the response, the client examines only the bit values corresponding to the true indices. If any of these positions contains a 0, the password is definitively not compromised. If all true index positions contain 1, the password may be compromised, though there's a small possibility of a false positive due to the probabilistic nature of Bloom filters.
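Continuing the sketch above (and reusing its true_indices function), the client-side verdict takes only a few lines; the server is assumed to return a mapping from each queried position to its bit value:

def check_response(password: str, bits: dict[int, int]) -> bool:
    # True means "possibly breached", subject to the false-positive rate.
    pw_hash = hashlib.sha256(password.encode()).digest()
    # Only the true positions matter; the noise positions are ignored.
    return all(bits[ix] == 1 for ix in true_indices(pw_hash))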

The power of deterministic noise

The deterministic nature of our noise generation provides several crucial advantages over alternative approaches. When the same password is checked multiple times, the exact same query is sent to the server each time. This consistency prevents correlation attacks where an adversary might try to identify patterns across multiple queries for the same password.

In contrast, if random noise were used, repeated queries for the same password would generate different noise patterns each time. A sophisticated adversary could potentially analyze multiple queries and identify the common elements, gradually narrowing down the true indices. Our deterministic approach eliminates this vulnerability entirely.

The deterministic noise also provides computational efficiency benefits. Since the same password always generates the same query, clients can cache results, and the system can optimize for repeated queries without compromising security.

Key benefits: Bridging the privacy-performance gap

Our algorithm delivers a unique combination of benefits that address the fundamental challenges in password breach checking, offering a practical solution that doesn't force users to choose between privacy and performance.

Strong privacy guarantees

The algorithm provides robust privacy protection through several mechanisms. The deterministic obfuscation ensures that queries for different passwords are computationally indistinguishable to the server. Even with access to vast computational resources and knowledge of common passwords, an adversarial server cannot determine which password is being checked based solely on the query pattern.

The system is specifically designed to resist correlation attacks, where an adversary attempts to learn information by analyzing multiple queries over time. Because the same password always generates the same query pattern, repeated checks don't provide additional information that could compromise privacy. This stands in stark contrast to systems using random noise, where multiple queries for the same password would eventually reveal the true query pattern.

Operating under an honest-but-curious threat model, the algorithm assumes the server will follow the protocol yet may attempt to extract information from observed queries. Our approach ensures that even a sophisticated adversary with access to public breach databases and the ability to store and analyze all queries over time cannot extract meaningful information about the passwords being checked.

Exceptional performance characteristics

One of the most compelling aspects of our algorithm is its performance profile. Experimental evaluation demonstrates that the system achieves sub-millisecond query times, making it suitable for real-time password validation scenarios. This performance is achieved through the efficient nature of Bloom filter operations and the streamlined query process.

The bandwidth overhead is minimal, typically requiring less than 1KB per query. This efficiency makes the algorithm practical for mobile applications and environments with limited network connectivity. The low bandwidth requirements also reduce server costs and improve scalability for service providers.

The computational overhead on both client and server sides is minimal. Clients need only perform basic cryptographic hash operations and simple bit manipulations. Servers can respond to queries with straightforward bit array lookups. This simplicity stands in stark contrast to cryptographic protocols that require complex elliptic curve operations or homomorphic encryption computations.

Scalability and practical deployment

Built for real-world deployment, the algorithm ensures that server-side infrastructure can efficiently process millions of concurrent queries while keeping response times consistent. The Bloom filter representation allows for compact storage of massive breach databases, making it economically feasible to maintain comprehensive breach checking services.

The system supports easy updates as new breaches are discovered. New compromised passwords can be added to the Bloom filter without requiring changes to the client-side implementation or forcing users to update their software. This flexibility is crucial for maintaining up-to-date protection against emerging threats.

Robust resistance to denial-of-service attacks is another advantage. The lightweight nature of query processing means that servers can handle high query volumes without significant resource consumption. Because queries are deterministic, effective caching can further boost performance and reduce server load.

Compatibility and integration

Our approach is designed to integrate seamlessly with existing security infrastructure. The algorithm can be implemented as a drop-in replacement for existing password breach checking mechanisms without requiring significant changes to client applications. Password managers, authentication systems, and enterprise security tools can adopt the algorithm with minimal modification to their existing codebases.

The system is compatible with various deployment models, from cloud-based services to on-premises installations. Organizations can choose to operate their own breach checking infrastructure using our algorithm while maintaining the same privacy and performance benefits.

The algorithm also supports various customization options to meet specific security requirements. Organizations can adjust the noise levels, Bloom filter parameters, and other configuration options to balance privacy, performance, and storage requirements according to their specific needs.

Real-world applications: Transforming password security

The practical benefits of our algorithm translate into significant improvements across a wide range of security applications and use cases. The combination of strong privacy guarantees and high performance opens up new possibilities for password security that were previously impractical or impossible.

Password managers: Enhanced security without compromise

Password managers represent one of the most compelling applications for our algorithm. These tools are responsible for generating, storing, and managing passwords for millions of users, making them a critical component of modern digital security. However, traditional password managers have faced challenges in implementing comprehensive breach checking due to privacy and performance constraints.

With our algorithm, password managers can now offer real-time breach checking for all stored passwords without compromising user privacy. When users save a new password or during periodic security audits, the password manager can instantly verify whether the password has appeared in known data breaches. This capability enables password managers to provide immediate feedback to users, encouraging them to change compromised passwords before they can be exploited.

The low latency and minimal bandwidth requirements make it practical to check passwords in real-time as users type them during password creation. This immediate feedback can guide users toward stronger, uncompromised passwords without creating friction in the user experience. The privacy guarantees ensure that even the password manager service provider cannot learn about the specific passwords being checked, maintaining the trust that is essential for these security tools.

Authentication systems: Proactive security measures

Modern authentication systems can leverage our algorithm to implement proactive security measures that protect users from credential-based attacks. During login attempts, authentication systems can check submitted passwords against breach databases in real-time, identifying potentially compromised credentials before they can be used maliciously.

This capability enables authentication systems to implement adaptive security policies. For example, if a user attempts to log in with a password that has been found in a data breach, the system can require additional authentication factors, prompt for a password change, or temporarily restrict account access until the user updates their credentials. These measures can significantly reduce the success rate of credential stuffing attacks and other password-based threats.

The algorithm's performance characteristics make it suitable for high-volume authentication scenarios, such as enterprise login systems or consumer web services with millions of users. The sub-millisecond query times ensure that breach checking doesn't introduce noticeable delays in the authentication process, maintaining a smooth user experience while enhancing security.

Enterprise security infrastructure: Comprehensive protection

Large organizations face unique challenges in password security due to the scale and complexity of their IT environments. Our algorithm provides enterprise security teams with powerful tools for implementing comprehensive password security policies across their organizations.

Enterprise security systems can use the algorithm to continuously monitor employee passwords against breach databases, identifying compromised credentials before they can be exploited by attackers. This monitoring can be integrated with existing identity and access management systems, automatically triggering password reset requirements when compromised credentials are detected.

The algorithm also supports compliance requirements by providing organizations with the ability to demonstrate that they are actively monitoring for compromised credentials. Many regulatory frameworks and security standards require organizations to implement measures for detecting and responding to credential compromise, and our algorithm provides a practical, privacy-preserving solution for meeting these requirements.

For organizations with strict data privacy requirements, the algorithm's privacy guarantees ensure that sensitive password information never leaves the organization's control. This capability is particularly important for organizations in regulated industries or those handling sensitive personal information.

Consumer applications: Democratizing security

The efficiency and simplicity of our algorithm make it practical to implement in consumer applications that previously couldn't afford the overhead of comprehensive breach checking. Mobile applications, web browsers, and other consumer software can now offer enterprise-grade password security features without requiring significant computational resources or complex cryptographic implementations.

Web browsers can integrate the algorithm to provide real-time feedback when users create or update passwords on websites. This integration can help users avoid reusing compromised passwords across multiple sites, reducing their exposure to credential stuffing attacks. The low bandwidth requirements make this practical even on mobile networks with limited connectivity.

Consumer applications can also use the algorithm to implement security dashboards that help users understand and improve their overall password security posture. By checking all of a user's passwords against breach databases, these applications can provide personalized recommendations for improving security without compromising the privacy of individual passwords.

Service providers: Enabling privacy-preserving security services

Our algorithm creates new opportunities for service providers to offer privacy-preserving security services. Companies can build breach checking services that provide strong privacy guarantees to their customers, enabling new business models and service offerings that were previously impractical due to privacy concerns.

The algorithm's efficiency makes it economically viable to operate large-scale breach checking services. The low computational and bandwidth requirements reduce operational costs, making it possible to offer these services at scale while maintaining reasonable pricing. The ability to handle high query volumes also enables service providers to serve large customer bases without significant infrastructure investments.

Service providers can also offer the algorithm as a component of broader security platforms, integrating breach checking with other security services such as threat intelligence, vulnerability management, and security monitoring. This integration can provide customers with comprehensive security solutions that address multiple aspects of cybersecurity while maintaining strong privacy protections.

Conclusion: A new era in password security

The introduction of our Private password breach checking algorithm using obfuscated deterministic bloom filter indices represents a significant advancement in the field of password security. By successfully bridging the gap between privacy and performance, we have created a solution that makes comprehensive password breach checking practical for a wide range of applications and use cases.

The algorithm's key innovations — deterministic noise generation, efficient Bloom filter operations, and sophisticated obfuscation techniques — combine to deliver a system that provides strong privacy guarantees while maintaining the performance characteristics needed for real-world deployment. With sub-millisecond query times and minimal bandwidth overhead, the algorithm makes it possible to implement real-time password breach checking in applications ranging from consumer password managers to enterprise authentication systems.

The privacy guarantees provided by our algorithm are particularly significant in today's regulatory environment, where data protection and user privacy are increasingly important considerations. By ensuring that password information never needs to be revealed to checking services, our algorithm enables organizations to implement comprehensive security measures while maintaining compliance with privacy regulations and user expectations.

The practical impact of this technology extends far beyond technical improvements. By making privacy-preserving password breach checking accessible and efficient, we are enabling a new generation of security tools and services that can better protect users from the growing threat of credential-based attacks. The algorithm's compatibility with existing infrastructure and ease of implementation mean that these benefits can be realized quickly and broadly across the security ecosystem.

As cyber threats continue to evolve and data breaches become increasingly common, the need for effective password security measures will only grow. Our algorithm provides a foundation for building more secure, privacy-preserving systems that can adapt to meet these challenges while maintaining the usability and performance that users expect.

The development of this algorithm represents just the beginning of our work in privacy-preserving security technologies. We are committed to continuing research and development in this area, exploring new applications and improvements that can further enhance the security and privacy of digital systems.

We believe that the future of cybersecurity lies in solutions that don't force users to choose between security and privacy. Our Private password breach checking algorithm demonstrates that it is possible to achieve both goals simultaneously, providing a model for future innovations in security technology.

For organizations and developers interested in implementing this technology, we encourage you to explore the detailed technical specifications and implementation guidance provided in our comprehensive research paper. The paper includes formal security analysis, detailed implementation recommendations, and comprehensive performance evaluations that provide the foundation for successful deployment of this algorithm in production environments.

For complete technical details, implementation guidance, and formal security analysis, please refer to our full research paper: Private password breach-checking using obfuscated deterministic bloom filter indices.
* The research paper includes detailed mathematical proofs, comprehensive performance benchmarks, and complete implementation examples for developers interested in integrating this technology into their applications.

Pros and cons of symmetric algorithms: Ensuring security and efficiency

Jul 19, 2023 — 3 min read

Symmetric algorithms, forming the backbone of modern cryptography, offer a secure method of encrypting and decrypting data utilizing a single shared key. They have been widely adopted for their unmatched speed and efficiency. Like any other technology, symmetric algorithms come with their own set of benefits and drawbacks. This article seeks to offer a comprehensive review of the pros and cons of symmetric algorithms, providing a deeper understanding of their integral role in data security and the potential challenges they entail.

Pros of symmetric algorithms

Unrivaled efficiency

Symmetric algorithms are best known for their superior efficiency in handling large volumes of data for encryption and decryption. The use of a single key significantly reduces the demand for computational resources, setting symmetric algorithms apart from their asymmetric counterparts. This makes them an excellent fit for applications that demand high-speed data processing, including secure communication channels and real-time data transfers.

Impressive speed

Symmetric algorithms, by virtue of their simplicity, can process data at a much faster rate than asymmetric algorithms. Without the need for complex mathematical operations, such as prime factorization or modular arithmetic, symmetric algorithms can encrypt and decrypt data rapidly, reducing latency. This speed advantage is particularly beneficial for applications requiring swift data encryption, including secure cloud storage and virtual private networks (VPNs).

Key distribution

Symmetric algorithms simplify the key distribution process. Given that both the sender and receiver utilize the same key, they only need to execute a secure key exchange once. This offers increased convenience in scenarios where multiple parties need to communicate securely, such as within large organizations, military operations, or corporate communications.

Computational simplicity

Symmetric algorithms are relatively straightforward to implement due to their computational simplicity. This allows for efficient coding, making them ideally suited for resource-constrained devices that possess limited computational capabilities, such as embedded systems or Internet of Things (IoT) devices. This simplicity also contributes to easier maintenance and debugging, reducing the potential for implementation errors that could compromise security.
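As a brief illustration of that simplicity, here is a complete encrypt/decrypt round trip using AES-256-GCM from the third-party cryptography package (one reasonable choice among several):

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # the single shared secret
aesgcm = AESGCM(key)

nonce = os.urandom(12)  # must be unique for every message under this key
ciphertext = aesgcm.encrypt(nonce, b"wire transfer: $1,000,000", None)

# Anyone holding the same key (and nonce) can decrypt.
assert aesgcm.decrypt(nonce, ciphertext, None) == b"wire transfer: $1,000,000"

Note that an authenticated mode like GCM also protects message integrity, which mitigates the authentication gap discussed below.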

Cons of symmetric algorithms

Complex key management

The management and distribution of shared keys are significant challenges inherent to symmetric algorithms. The security of these algorithms is closely tied to the confidentiality of the key. Any unauthorized access or compromise of the key can lead to a total breach of data security. Consequently, robust key management protocols are essential, including secure storage, key rotation, and secure key exchange mechanisms, to mitigate this risk.

Lack of authentication

Symmetric algorithms do not inherently provide authentication mechanisms. The absence of additional measures, such as digital signatures or message authentication codes, can make it challenging to verify the integrity and authenticity of the encrypted data. This opens the door for potential data tampering or unauthorized modifications, posing a considerable security risk.

Scalability

Symmetric algorithms face challenges when it comes to scalability. Since each pair of communicating entities requires a unique shared key, the number of required keys grows quadratically with the number of participants: n participants need n(n−1)/2 keys, so 100 users already require 4,950 of them. This can be impractical for large-scale networks or systems that involve numerous users, as managing a vast number of keys becomes complex and resource-intensive.

Lack of perfect forward secrecy

Symmetric algorithms lack perfect forward secrecy, meaning that if the shared key is compromised, all previous and future communications encrypted with that key become vulnerable. This limitation makes symmetric algorithms less suitable for scenarios where long-term confidentiality of data is crucial, such as secure messaging applications.

An in-depth analysis of symmetric algorithms

Symmetric algorithms, including the widely adopted AES, DES, and Blowfish, are favored for their speed and efficiency. However, their robustness is largely dependent on the size of the key and the security of the key during transmission and storage. While larger keys can enhance security, they also increase the computational load. Thus, selecting the appropriate key size is a critical decision that requires a careful balance between security and performance requirements.

One of the standout strengths of symmetric encryption is its application in bulk data encryption. Because of their speed, symmetric algorithms are ideally suited for scenarios where large amounts of data need to be encrypted quickly. However, they may not always be the best solution. In many cases, asymmetric encryption algorithms, despite their higher computational demands, are preferred because of their additional security benefits.

It's also crucial to note that cryptographic needs often go beyond just encryption and decryption. Other security aspects, such as data integrity, authentication, and non-repudiation, are not inherently provided by symmetric algorithms. Therefore, a comprehensive security scheme often uses symmetric algorithms in conjunction with other cryptographic mechanisms, such as hash functions and digital signatures, to provide a full suite of security services.

Final thoughts

Symmetric algorithms occupy a pivotal place in the realm of cryptography. Their efficiency and speed make them an invaluable asset for many applications, especially those involving large-scale data encryption. However, the limitations inherent in symmetric algorithms, including key management complexities, lack of authentication, and absence of perfect forward secrecy, necessitate meticulous implementation and the incorporation of additional security measures. Therefore, the decision to utilize symmetric algorithms should be made based on a thorough understanding of these pros and cons, as well as the specific requirements of the system in question.

How SHA-256 works

Mar 25, 2022 — 5 min read

If you've heard of ‘SHA’ in various forms but aren't sure what it stands for or why it's essential — you’re in luck! We'll attempt to shed some light on the family of cryptographic hash algorithms today.

But, before we get into SHA, let's go over what a hash function is and how it works. Before you can comprehend what SHA-1 and SHA-2 are, you must first grasp these principles.

Let's get started.

What Is a Hash Function?

A hash function maps a string of characters (known as a key) to a fixed-length value. The hash value is a compact representation of the original string of characters, and it is usually much smaller.

Because the shorter hash value is simpler to search for than the lengthier text, hashing is used for indexing and finding things in databases. Cryptography employs hashing as well.

SHA-1, SHA-2, SHA-256… What’s this all about?

SHA stands for Secure Hash Algorithm, and you'll commonly see three names: SHA-1, SHA-2, and SHA-256. SHA-1 was the initial iteration of the algorithm, and it was followed by SHA-2, an updated and improved family of functions. The SHA-2 family comes in several output lengths, and SHA-256 is its 256-bit member. Simply put, when you see "SHA-2," "SHA-256" or "SHA-256 bit," those names usually refer to the same thing.

The NIST's Formal Acceptance

FIPS 180-4, published by the National Institute of Standards and Technology, officially defines the SHA-256 standard. The standard is also accompanied by a set of test vectors that let developers confirm they have implemented the method correctly.

Let’s break down the algorithm and how it works:

1. Append padding bits

The first step in our hashing process is to pad the original message so its length fits the block structure the hash function requires. To accomplish this, we append a single '1' bit followed by '0' bits. The number of bits we add is determined so that the message's length is precisely 64 bits less than a multiple of 512 after these bits are added. This can be expressed mathematically in the following way:

n x 512 = M + P + 64

M is the original message's length.
P stands for padded bits.

2. Append length bits

Now that we've added our padding bits to the original message, we can go ahead and add our length bits, which are equal to 64 bits, to make the whole message an exact multiple of 512.

We know we need to add 64 extra bits, so we compute them as the length of the original message (the one without the padding) modulo 2⁶⁴, encoded as a 64-bit value. We append this length field to the padded message and get the complete message block, which must be a multiple of 512 bits.
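Steps 1 and 2 together can be sketched in a few lines of Python (a simplified illustration of the padding rule, not a full implementation):

def sha256_pad(message: bytes) -> bytes:
    # Append a '1' bit (0x80), then '0' bits, then the 64-bit length.
    bit_len = (len(message) * 8) % (2 ** 64)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += bit_len.to_bytes(8, "big")
    return padded  # length is now an exact multiple of 64 bytes (512 bits)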

3. Initialize the buffers

We now have our message block, on which we will begin our calculations in order to determine the final hash. Before we get started, I want to point out that we'll need certain initial values for the steps we'll be taking. These eight 32-bit words are the standard initial buffer values:

a = 0x6a09e667
b = 0xbb67ae85
c = 0x3c6ef372
d = 0xa54ff53a
e = 0x510e527f
f = 0x9b05688c
g = 0x1f83d9ab
h = 0x5be0cd19

Keep these values in the back of your mind for now; all will fit together in the following phase. There are a further 64 constants to remember, which operate as round keys and are symbolized by the letter 'k.'

Let's go on to the portion where we calculate the hash using these data.

4. Compression Function

This is where the majority of the hashing algorithm takes place. The whole message block, which is 'n x 512' bits long, is broken into 'n' chunks of 512 bits, each of which is then put through 64 rounds of operations, with the result being provided as input for the next round of operations.

Each round takes two inputs: W(i) and K(i). For the first 16 rounds, we further break the 512-bit chunk into 16 words of 32 bits each, which serve directly as W(0) through W(15). For the remaining rounds, we must compute the value of W(i) at each step:

W(i) = W(i−16) + σ₀(W(i−15)) + W(i−7) + σ₁(W(i−2)), with addition modulo 2³²
where,
σ₀(x) = ROTR⁷(x) XOR ROTR¹⁸(x) XOR SHR³(x)
σ₁(x) = ROTR¹⁷(x) XOR ROTR¹⁹(x) XOR SHR¹⁰(x)
ROTRⁿ(x) = circular right rotation of 'x' by 'n' bits
SHRⁿ(x)  = logical right shift of 'x' by 'n' bits (zeros shifted in)
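For the curious, the message-schedule expansion above can be sketched in Python (illustrative, not optimized):

def rotr(x, n):
    # Circular right rotation of a 32-bit word.
    return ((x >> n) | (x << (32 - n))) & 0xFFFFFFFF

def sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3)

def sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10)

def message_schedule(block: bytes) -> list[int]:
    # Expand one 512-bit chunk into the 64 words W(0)..W(63).
    assert len(block) == 64
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for i in range(16, 64):
        w.append((w[i - 16] + sigma0(w[i - 15]) +
                  w[i - 7] + sigma1(w[i - 2])) & 0xFFFFFFFF)
    return w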

5. Output

Every round's output is used as an input for the next round, and so on, until the last chunk has been processed; the result of the final round for the nth chunk of the message block gives us the hash of the whole message. The output has a length of 256 bits.

Conclusion

In a nutshell, the whole principle behind SHA would sound something like this:

We determine the length of the message to be hashed, then pad it, appending a single '1' bit followed by '0' bits until the message length is precisely 64 bits less than a multiple of 512. The remaining 64 bits are filled with the length of the original message modulo 2⁶⁴. The complete message block may then be represented as 'n x 512' bits. Now, we split each of these 512-bit chunks into 16 words of 32 bits each, which feed the compression function's 64 rounds of operations. For the first 16 rounds, these 16 words serve directly as input, and for the next 48 rounds, we have a formula to compute W(i). We also include the preset buffer values and the 'k' constants for each of the 64 rounds. With all of the necessary numbers and formulae in place, we can begin computing hashes. The compression procedure is repeated 64 times per chunk, with the result of round i serving as the input for round i+1. As a result, the output of the 64th round of the nth chunk will be the hash of the whole message.

The SHA-256 hashing algorithm is now one of the most extensively used hashing algorithms, since it has yet to be cracked and its hashes are computed quickly compared to other safe hashes such as SHA-512. It is well-established, but the industry is working to gradually transition to SHA-512, which is more secure, since experts believe SHA-256 may become susceptible to attack in the near future.

What is quantum cryptography?

Feb 10, 2022 — 4 min read

If the concept of ‘quantum cryptography' sounds complicated to you, you're right. That’s why this ‘encryption tutorial for dummies’ shall demystify the concept and provide an explanation in layman’s terms.

Quantum cryptography, which has been around for a few decades, is becoming more and more important to our daily lives because of its ability to protect essential data in a manner that conventional encryption techniques cannot.

What is it?

Cryptography, as we all know, is a technique that aims to encrypt data by scrambling plain text so that only those with the appropriate ‘key’ can read it. By extension, quantum cryptography encrypts data and transmits it in an unhackable manner using the principles of quantum mechanics.

While such a concept seems straightforward, the intricacy resides in the quantum mechanics that underpin quantum cryptography. For example:

  • The particles that make up the cosmos are fundamentally unpredictable, and they may exist in several places or states of existence at the same time;
  • A quantum attribute cannot be measured without causing it to change or be disturbed;
  • Some quantum attributes of a particle can be cloned, but not the whole particle.

How does it work?

Theoretically, quantum cryptography operates by following a model that was first published in 1984.

Assume there are two people called Alice and Bob who want to communicate a message in a safe manner, according to the model of quantum cryptography. Alice begins by sending Bob a key, and this key is transmitted as a stream of photons traveling in one direction. Each photon corresponds to a single bit of data — either a 0 or a 1. However, in addition to traveling in a straight path, these photons oscillate, or vibrate, in a certain fashion as they move.

Before transmission begins, the photons pass through a polarizer on Alice's side. A polarizer is a filter that forces each photon into a particular orientation, letting some photons through with their vibration unchanged and rotating others. There are several polarization states to choose from, including vertical (1 bit), horizontal (0 bit), 45 degrees right (1 bit) and 45 degrees left (0 bit). Whichever scheme she employs, each transmitted photon carries one of two polarizations, encoding a single bit, either 0 or 1.

From the polarizer, the photons now travel via optical fiber to Bob. Each photon is analyzed using a beam splitter, which reads its polarization. Since Bob does not know the correct polarization of the incoming photons, he chooses a polarizer at random for each one. Alice then tells Bob which polarizer she used to send each photon, and Bob checks whether he read each photon with the right one. The photons that were read with the incorrect splitter are discarded, and the sequence that remains is deemed the key sequence.
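The "compare polarizers and discard mismatches" step, known as sifting, is easy to simulate. The Python sketch below models only the bookkeeping, not the physics:

import secrets

N = 16
alice_bits  = [secrets.randbelow(2) for _ in range(N)]
alice_bases = [secrets.choice("+x") for _ in range(N)]  # rectilinear or diagonal
bob_bases   = [secrets.choice("+x") for _ in range(N)]

# Bob reads the correct bit when his basis matches Alice's; otherwise
# his measurement result is effectively random.
bob_bits = [a if ab == bb else secrets.randbelow(2)
            for a, ab, bb in zip(alice_bits, alice_bases, bob_bases)]

# Alice and Bob publicly compare bases and keep only matching positions.
sifted_key = [b for b, ab, bb in zip(bob_bits, alice_bases, bob_bases)
              if ab == bb]
print(sifted_key)  # on average, about half the positions survive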

Let's pretend there is an eavesdropper present, who goes by the name of Eve. Eve seeks to listen in and has the same tools as Bob in order to do so successfully. However, Bob has the benefit of being able to converse with Alice in order to check which polarizer type was used for each photon, and Eve does not. As a result, Eve cannot reconstruct the final key.

Alice and Bob would also be aware if Eve was listening in on their conversation. Observing the flow of photons alters their states, so the polarizations that Alice and Bob expect to see would change as a result of her observations, revealing her presence.

Well, that’s all pretty mind-blowing, but for us, the general public, the biggest question is…

Is it really used?

Although the model described above has not yet been fully developed, there have been successful implementations of it, including the following:

  • The University of Cambridge and the Toshiba Corporation collaborated to develop a high-bit-rate quantum key distribution system based on the BB84 quantum cryptography protocol;
  • DARPA's Quantum Network, which operated from 2002 to 2007, was a 10-node QKD (Quantum Key Distribution) network constructed by Boston University, Harvard University, and IBM Research. It was operated by the Defense Advanced Research Projects Agency;
  • Quantum Xchange created the first quantum network in the United States, which is comprised of over 1,000 kilometers of optical fiber;
  • The development of commercial QKD systems was also carried out by commercial businesses such as ID Quantique, Toshiba, Quintessence Labs, and MagiQ Technologies Inc.

As you can see, these rare implementations are pretty far from what you’d expect to use every day. But hopefully, that will change in the near future.

The pros and cons of quantum cryptography

As with any developing technology, its state now (2022) may be very different from its state in the future, so this picture may change dramatically. We do believe, however, that the list of limitations will shrink as the years go on.

The need for unbreakable encryption is right there staring us down. The development of quantum computers is on the horizon, and the security of encrypted data is now in jeopardy due to the threat of quantum computing. We are fortunate in that quantum cryptography, in the form of QKD, provides us with the answer we need to protect our information long into the future — all while adhering to the difficult laws of quantum physics.

What is End-to-end encryption?

Jan 12, 2022 — 4 min read

End-to-end encryption has been introduced by many communication providers in recent years, notably WhatsApp and Zoom. Although those companies have tried to explain the concept to their user base several times, we believe they failed. While it's clear that these platforms have increased security, most users don't know how or why. Well, encryption is a rather simple concept to understand: it converts data into an unreadable format. But what exactly does "end-to-end" imply? What are the advantages and disadvantages of this added layer of security? We'll explain this as simply as possible without diving too much into the underlying math and technical terminology.

What is end-to-end encryption?

End-to-end encryption (E2EE) is a state-of-the-art protocol for communication security. Only the sender and the intended recipient(s) have access to the data in an end-to-end encrypted system. The encrypted data on the server is inaccessible to both hackers and undesirable third parties.

End-to-end encryption is best understood when compared to the encryption-in-transit approach, so let’s perform a quick recap. If a service employs encryption-in-transit, it is usually encrypted on your device before being delivered to the server. It’s then decrypted for processing on the server before it’s re-encrypted and routed to its final destination. When the data is in transit, it’s encrypted, but when it’s ‘at rest’, it’s decrypted. This safeguards the data during the most dangerous stage of the journey, transit — when it’s most exposed to hackers, interception, and theft.

End-to-end encryption, on the other hand, is the process of encrypting data on your device and not decrypting it until it reaches its destination. When your message travels through the server, not even the service that is delivering the data can view the content of your message.

In practice, this means that messengers using 'real' end-to-end encryption, like Signal, know only your phone number and the date of your last login – nothing more.

This is important for users that want to be sure their communication is kept secure from prying eyes. There are also some real-life examples that utilize end-to-end encryption for financial transactions and commercial communication.

How does it work?

The generation of a public-private key pair ensures the security of end-to-end encryption. This method, also known as asymmetric cryptography, encrypts and decrypts the message using distinct cryptographic keys. Public keys are widely distributed and are used to encrypt or ‘lock’ messages. Only the owner has access to the private keys, which are needed to unlock or decrypt the communication.

Whenever the user takes part in any end-to-end encrypted communication, the system automatically generates dedicated public and private keys.
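Here is a minimal sketch of that lock/unlock principle using RSA-OAEP from the third-party cryptography package. Real E2EE messengers layer far more on top (hybrid encryption, key ratcheting, and so on); this only illustrates the key-pair idea:

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# The recipient generates a key pair; the public half is shared openly.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

# Anyone can 'lock' a message with the public key...
ciphertext = public_key.encrypt(b"meet at noon", oaep)

# ...but only the private-key holder can unlock it.
assert private_key.decrypt(ciphertext, oaep) == b"meet at noon"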

If this sounds too complicated, here is a very simple metaphor:

You just bought a new Rolex for your buddy, who lives in Australia. Now, it’s already in a fancy green leather box, so you decide to put the stamp directly on it and send it. There is nothing wrong with that approach as long as you trust that the postal workers won’t steal it.

However, if you decide to put the Rolex box inside another box, hiding the nature of the gift from all interacting parties along the way, then you’ve effectively ensured (for all intents and purposes) that the Rolex is only visible to the intended recipient; when your mate from down under gets a hold of the box, he takes his pair of scissors and ‘decrypts’ the present. Indeed, you’ve ensured ‘end-to-end’ encryption.

You’re already using end-to-end encryption, daily

As we mentioned before, during an E2EE interaction, the server that delivers encrypted data between one "end" and the other "end" is unable to decode and read the data it transports. Even the servers' owners are unable to access the information, since it is not stored on the servers in readable form; only the "endpoints" (the devices) of the conversation can decode the data.

If you use messengers like WhatsApp, iMessage, or Signal (where E2EE is enabled by default), or Telegram, Allo, or Facebook's 'Secret Conversation' function (where E2EE can be manually activated), then you're already using end-to-end encryption daily.

What's more fascinating is that E2EE communication providers don't require you to trust them. And that’s great!

Even if their systems were hacked, it would make little difference to you: the transported data is encrypted and can only be read by the sender and the receiver. This property has enraged several government organizations, and there are known cases where such agencies asked for special 'backdoors' that would allow them to decrypt messages.

Why isn’t everything end-to-end encrypted?

End-to-end encryption is theoretically sound, but it lacks flexibility, thus it can't be utilized when the "two ends" that communicate data don't exist, such as with cloud storage.

This is why Zero-Knowledge Encryption was created, a solution that overcomes the problem by hiding the encryption key, even from the storage provider, resulting in an authentication request without the requirement for password exchange.

Moreover, end-to-end encryption does not hide information about the message, such as the date and time it was sent or the people who participated in the conversation. This metadata might provide indications on where the 'end-point' might be – not great if you are the target of a hacker.

The biggest problem, however, is that in reality, we never know whether the communication is end-to-end encrypted. Providers may claim to provide end-to-end encryption when what they truly deliver is encryption-in-transit. The information might be kept on a third-party server that can be accessed by anybody who has access to the server.

Conclusion

While it’s obvious that you shouldn’t be shipping Dave’s Rolex in its fancy green box, the reality is, if you’ve nothing to hide and you’re not transporting something incredibly valuable, encryption-in-transit is up to the job.

End-to-end encryption is a wonderful technology that enables a high level of security when properly implemented. But it doesn't really tackle the main issue – the end-user, still, to this day, needs to trust the system that they’re using to communicate. We hope that the next generation of encryption technologies such as ZKP will be able to change that.

Why Zero-Knowledge Encryption is the best

Jan 10, 2022 — 4 min read

In this year of our lord, 2022, the term 'Zero-Knowledge Encryption' equates to best-in-class data protection. We've already written an article named "What is Zero-Knowledge Proof?", so we're not going to look at definitions here; rather, we're going to explore the pros and cons of Zero-Knowledge Encryption when compared to other technologies.

But for those who don’t want to dive deep into technical details, here’s an explanation of what Zero-Knowledge Encryption means:

It simply implies that no one else (not even the service provider) has access to your password-protected data.

This is important because even if your files are completely encrypted, if the server has access to the keys, a centralized hacker attack can result in a data breach.

In order to gain a better understanding of the factors that led to the development of Zero-Knowledge Encryption, we've decided to present a succinct, yet comprehensive, assessment of the advantages and disadvantages of three existing options:

Encryption-in-transit

Data in-transit, also known as data in motion, is data that is actively flowing from one point to another, such as that over the internet or over a private network. Data protection in transit refers to the security of data while it is being transferred from one network to another or from a local storage device to a cloud storage device. Effective data protection measures for in-transit data are critical because data is often considered less secure while in transit. Think of it like hiring security guards to accompany your cash-in-transit vehicle’s trip to the bank.

This means that, under this approach, stored documents are fully decryptable at rest, and therefore vulnerable.

As for our everyday life, much of the traffic we generate, such as ordinary HTTPS connections, relies on the 'encryption-in-transit' approach.

Encryption-at-rest

Data encryption is the process of converting data into a form that cannot be read by unauthorized users. For example, you may have saved a copy of your passport. You obviously don't want this data to be easily accessed. If you store encrypted data on your server, it's effectively "resting" there (which is why it's called encryption-at-rest). This is usually accomplished by the use of an algorithm that produces output incomprehensible to anyone who does not have access to the encryption key needed to decode it. Only an authorized person will be able to access the file, ensuring that your data is kept safe.

The Advanced Encryption Standard (AES) is often used to encrypt data at rest.

But, in order to access the data, you need a key — and that’s where the potential vulnerability lies.


Encryption-at-rest is like storing your data in a secret vault, encryption-in-transit is like putting it in an armored vehicle with security guards for transport.

End-to-end Encryption

End-to-end encryption is the act of applying encryption to messages on one device so that only the device to which it is sent can decrypt it. The message travels all the way from the sender to the recipient in encrypted form.

In practice, it means that only the communicating users (who have the key) can read the messages.

End-to-end encryption has created an impregnable fortress for communication services (for example, messengers), going beyond the security "façade" of encryption-in-transit and encryption-at-rest solutions.

This is the most common approach for protecting oneself against data breaches nowadays, but it only works from "one end to the other," as the term implies. Even though this all sounds great, end-to-end encryption can only be used for a "communication system" like WhatsApp or Telegram.

While theoretically sound, end-to-end encryption lacks flexibility, so it can’t be used when the "two ends" that share data don't exist, such as for cloud storage.

This is the motivation behind the development of Zero-Knowledge Encryption, a method that solves the problem by keeping the encryption key hidden even from the storage provider, so that authentication can happen without the password ever being exchanged.

Zero-Knowledge Encryption

To log in to an account, you usually have to type in the exact password. In today's hyperconnected world, the normal practice is to hand the server your secret ahead of time so that it can later test whether what you type matches.

There is, however, a more secure way to manage this delicate process, and it’s called Zero-Knowledge Encryption.

Without diving too deep, Zero-Knowledge relies on three main requirements:

  1. Completeness — an honest prover can always convince the verifier that they know the password by completing the required process;
  2. Soundness — a lying prover will almost certainly be caught by the verifier;
  3. Zero-knowledge — the verifier learns nothing beyond the fact that the statement is true.

Essentially, the system checks whether you can demonstrate your knowledge several times over by responding to varying challenges. It’s like a brute-force attack in reverse: the same test is repeated many times to make sure the prover isn’t lying.
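A quick back-of-the-envelope sketch shows why the repetition works: if a dishonest prover can pass any single challenge with probability 1/2 by guessing, then n independent challenges cut their odds to (1/2)^n.

```python
# Chance that a guessing prover survives n consecutive 50/50 challenges
for n in (1, 5, 10, 20, 30):
    print(f"{n:>2} rounds: {0.5 ** n:.9f}")
```

After 20 rounds a cheater slips through less than once in a million attempts, so the verifier simply repeats the test until the doubt is negligible.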

Instead of a conclusion, let’s weigh Zero-Knowledge Encryption against the alternatives. Its one notable con, that not even system administrators can recover your password if you lose it, is really just further proof of the exceptional security this approach provides. This is why we, at Passwork, rely on this technology in our products. Ultimately, that’s why you can rely on us too.

Dec 20, 2021 — 4 min read

What is Zero-Knowledge Proof?

It is rare for technologies to be born from ambitious philosophical concepts or mind games. But when it comes to security and cryptography, everything is a riddle.

One such riddle is: ‘How can you prove that you know a secret without giving it away?’ Or, in other words, ‘how can you tell someone you love them without saying that you love them?’

The Zero-Knowledge Proof technique, as suggested by the name, uses cryptographic algorithms to allow several parties to verify the authenticity of a piece of information without having to share the material that makes it up. But how is it possible to prove something without supporting evidence? In this article, we’ll try our best to break it down for you as easily as possible.

Why?

We ask ourselves day after day: why on Earth would people decide to use such a complicated concept? Well, millions of people use the internet every day, accepting cookies and sharing personal information in exchange for access to services and digital products. Users are becoming ever more vulnerable to security breaches and unauthorized access to their data. Furthermore, individuals frequently have to give up their privacy in return for digital platform services such as suggestions, consultations, and tailored support, none of which would be available when browsing privately. All of this creates an asymmetry in access to information: you hand over your data in exchange for a service.

In 1985, three great minds (Shafi Goldwasser, Silvio Micali, and Charles Rackoff) noticed ‘a great disturbance in the Force’ ahead of their time and released a paper called "The Knowledge Complexity of Interactive Proof-Systems", which introduced the concept of Zero-Knowledge Proof (ZKP) for the first time.

So what is it?

ZKP is a set of tools that allows a piece of data to be verified without revealing the data that supports it. This is made feasible by a set of cryptographic methods that allow a "prover" to mathematically demonstrate to a "verifier" that a computational statement is valid without disclosing any underlying data.

In this way, it is possible to establish that particular facts are correct without having to share them with a third party. For example, a user could demonstrate that they are of legal age to access a product or service without having to reveal their exact age. Or, it’s a bit like showing your friend your driving license instead of proving to him that you can drive by road-tripping to Mexico.

This technique is often used in the digital world to authenticate systems without the risk of information being stolen. Indeed, it’s no longer necessary to provide any personal data in order to establish a person's identity.

Sounds great, but how does it work?

The prover and the verifier are the two key roles in zero-knowledge proofs. The prover must demonstrate that they know the secret, while the verifier must be able to determine whether or not the prover is lying.

It works because the verifier asks the prover to perform actions that can only be done reliably if the prover really knows the secret. If the prover is guessing, the verifier's tests will eventually catch them out. If the secret is known, the prover will pass the verifier's exam with flying colours every time. It's similar to when a bank or other institution requests particular letters from a known secret word in order to authenticate your identity: you never recite the whole word; you simply demonstrate that you know it.

Wonderful, but how does it REALLY work?

To answer this, let’s take a look at a piece of research by Kamil Kulesza.

Assume that two characters, Alice and Bob, find themselves at the mouth of a cave with two separate entrances leading to two different paths (A and B). A door inside the cave connects the two paths, but it can only be unlocked with a secret code. The code belongs to Bob (the 'prover'), and Alice (the 'verifier') wants to buy it, but first she wants to make sure Bob isn't lying.

How can Bob demonstrate to Alice that he has the code without divulging it? They do the following: while Alice waits outside, Bob enters the cave via one of the two entrances (A or B) at random. Once Bob is inside, Alice walks up to the cave mouth, calls out to him, and instructs him to come out through one of the two exits. Because he knows the secret code, Bob can always return by whichever path Alice names.

Even if the exit Alice names is not the path he entered by, Bob can unlock the door with the secret code and leave through the other side.

But wait a minute, there is still a 50% chance that Alice names the very path Bob entered by, right? Indeed there is; however, if the exercise is repeated, the likelihood that Bob keeps exiting along the path Alice names without possessing the code halves with every round, quickly becoming negligible. Conclusion? If Bob passes the test enough times, he has unmistakably shown Alice that his claim of holding the secret code is true. Moreover, at no point did he need to reveal the actual code.
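Here is a toy simulation of the cave story (a sketch under the assumption that a code-less Bob is caught whenever the exit Alice names differs from the entrance he picked):

```python
import random

def run_protocol(bob_knows_code: bool, rounds: int) -> bool:
    """Return True if Bob convinces Alice in every round."""
    for _ in range(rounds):
        bob_path = random.choice("AB")    # Bob enters by a random path
        alice_exit = random.choice("AB")  # Alice names the exit she wants
        # With the code Bob can open the door and exit anywhere;
        # without it, he is caught whenever the two paths differ.
        if not bob_knows_code and bob_path != alice_exit:
            return False
    return True

random.seed(42)
trials = 100_000
cheats = sum(run_protocol(False, 20) for _ in range(trials))
print(f"Guessing Bob passed 20 rounds in {cheats} of {trials} trials")
print("Honest Bob passes 20 rounds:", run_protocol(True, 20))
```

With 20 rounds, a guessing Bob survives roughly once in a million runs, which matches the (1/2)^n arithmetic above.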

You can find out more about the Bob and Alice metaphor here.

Got it, so how is it used?

Right now, ZKP is developing hand in hand with blockchain technology.

Zcash is a cryptocurrency that uses a particular flavour of zero-knowledge proofs called zk-SNARKs. It allows native transactions to remain entirely encrypted while still being verified under the network's consensus rules, making it a great example of this technology being used in practice.

Even though zero-knowledge proofs have a lot of potential to change the way today's data systems verify information, the technology is still considered to be in its infancy — primarily because researchers are still figuring out how to best use this concept while identifying any potential flaws. This, however, doesn’t stop us from using this protocol in our products! ;)

For a deeper understanding of the technical aspects and history behind this protocol, we recommend watching this video on YouTube.
