
Data-driven hypothesis generation in clinical research: what we learned from a human subject study



Hypothesis generation is an early and critical step in any hypothesis-driven clinical research project. Because it is not yet a well-understood cognitive process, the need to improve it often goes unrecognized. Without an impactful hypothesis, the significance of any research project can be questionable, regardless of the rigor or diligence applied in other steps of the study, e.g., study design, data collection, and result analysis. In this perspective article, the authors first review the literature on scientific thinking, reasoning, medical reasoning, literature-based discovery, and field studies of scientific thinking and discovery. Over the years, the study of scientific thinking has made considerable progress in cognitive science and its applied areas: education, medicine, and biomedical research. However, a review of the literature reveals a lack of original studies on hypothesis generation in clinical research. The authors then summarize their first human participant study exploring data-driven hypothesis generation by clinical researchers in a simulated setting. The results indicate that a secondary data analytical tool, VIADS (a visual interactive analytic tool for filtering, summarizing, and visualizing large health data sets coded with hierarchical terminologies), can shorten the time participants need, on average, to generate a hypothesis, and that fewer cognitive events are needed to generate each hypothesis. As a counterpoint, the exploration also indicates that the hypotheses generated with VIADS received significantly lower ratings for feasibility. Despite its small scale, the study confirmed the feasibility of conducting a human participant study to directly explore the hypothesis generation process in clinical research. It provides supporting evidence for a larger-scale study with a specifically designed tool to facilitate hypothesis generation among inexperienced clinical researchers. A larger study could provide generalizable evidence, which in turn could improve clinical research productivity and the overall clinical research enterprise.


J Korean Med Sci. v.36(50); 2021 Dec 27

Formulating Hypotheses for Different Study Designs

Durga Prasanna Misra

1 Department of Clinical Immunology and Rheumatology, Sanjay Gandhi Postgraduate Institute of Medical Sciences, Lucknow, India.

Armen Yuri Gasparyan

2 Departments of Rheumatology and Research and Development, Dudley Group NHS Foundation Trust (Teaching Trust of the University of Birmingham, UK), Russells Hall Hospital, Dudley, UK.

Olena Zimba

3 Department of Internal Medicine #2, Danylo Halytsky Lviv National Medical University, Lviv, Ukraine.

Marlen Yessirkepov

4 Department of Biology and Biochemistry, South Kazakhstan Medical Academy, Shymkent, Kazakhstan.

Vikas Agarwal

George D. Kitas

5 Centre for Epidemiology versus Arthritis, University of Manchester, Manchester, UK.

Generating a testable working hypothesis is the first step towards conducting original research. Such research may prove or disprove the proposed hypothesis. Case reports, case series, online surveys and other observational studies, clinical trials, and narrative reviews help to generate hypotheses. Observational and interventional studies help to test hypotheses. A good hypothesis is usually based on previous evidence-based reports. Hypotheses without evidence-based justification and a priori ideas are not received favourably by the scientific community. Original research to test a hypothesis should be carefully planned to ensure appropriate methodology and adequate statistical power. While hypotheses can challenge conventional thinking and may be controversial, they should not be destructive. A hypothesis should be tested by ethically sound experiments with meaningful ethical and clinical implications. The coronavirus disease 2019 pandemic has brought into sharp focus numerous hypotheses, some of which were proven (e.g. the effectiveness of corticosteroids in those with hypoxia) while others were disproven (e.g. the purported effectiveness of hydroxychloroquine and ivermectin).

Graphical Abstract

[Graphical abstract: jkms-36-e338-abf001.jpg]

DEFINING WORKING AND STANDALONE SCIENTIFIC HYPOTHESES

Science is the systematized description of natural truths and facts. Routine observations of existing life phenomena lead to creative thinking and the generation of ideas about the mechanisms of such phenomena and related human interventions. Such ideas, presented in a structured format, can be viewed as hypotheses. After generating a hypothesis, it is necessary to test it to prove its validity. Thus, a hypothesis can be defined as a proposed mechanism of a naturally occurring event or a proposed outcome of an intervention. 1, 2

Hypothesis testing requires choosing the most appropriate methodology and ensuring the study is adequately powered statistically to “prove” or “disprove” the hypothesis within predetermined and widely accepted levels of certainty. This entails a sample size calculation that often takes into account previously published observations and pilot studies. 2, 3 In the era of digitization, hypothesis generation and testing may benefit from the availability of numerous platforms for data dissemination, social networking, and expert validation. Related expert evaluations may reveal the strengths and limitations of proposed ideas at early stages of post-publication promotion, preventing the implementation of unsupported controversial points. 4
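The sample size calculation mentioned here can be sketched in a few lines. The following Python snippet is an illustrative normal-approximation formula, not taken from the article; the effect size, alpha, and power values are hypothetical placeholders.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate per-group sample size for comparing two group means.

    Uses the normal approximation:
        n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_beta = z.inv_cdf(power)            # quantile for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 / effect_size ** 2
    return math.ceil(n)                  # round up: you cannot recruit a fraction of a person

# A medium standardized effect (d = 0.5) at alpha = 0.05 and 80% power.
print(sample_size_per_group(0.5))
```

For d = 0.5 this approximation gives about 63 participants per group; exact t-distribution-based calculations give a slightly larger figure, which is why published studies usually rely on dedicated power-analysis software.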

Thus, hypothesis generation is an important initial step in the research workflow, reflecting accumulating evidence and experts' stance. In this article, we overview the genesis and importance of scientific hypotheses and their relevance in the era of the coronavirus disease 2019 (COVID-19) pandemic.

DO WE NEED HYPOTHESES FOR ALL STUDY DESIGNS?

Broadly, research can be categorized as primary or secondary. In the context of medicine, primary research may include real-life observations of disease presentations and outcomes. Single case descriptions, which often lead to new ideas and hypotheses, serve as important starting points or justifications for case series and cohort studies. The importance of case descriptions is particularly evident in the context of the COVID-19 pandemic when unique, educational case reports have heralded a new era in clinical medicine. 5

Case series serve a similar purpose to single case reports but are based on a slightly larger quantum of information. Observational studies, including online surveys, describe existing phenomena at a larger scale, often involving various control groups. Observational studies include variable-scale epidemiological investigations at different time points. Interventional studies detail the results of therapeutic interventions.

Secondary research is based on already published literature and does not directly involve human or animal subjects. Review articles are generated by secondary research. These may be systematic reviews, which follow methods akin to primary research but with the unit of study being published papers rather than humans or animals. Systematic reviews have a rigid structure, with a mandatory search strategy encompassing multiple databases, systematic screening of search results against pre-defined inclusion and exclusion criteria, critical appraisal of study quality, and an optional component of collating results across studies quantitatively to derive summary estimates (meta-analysis). 6 Narrative reviews, on the other hand, have a more flexible structure. Systematic literature searches to minimise bias in the selection of articles are highly recommended but not mandatory. 7 Narrative reviews are influenced by the authors' viewpoint; authors may preferentially analyse selected sets of articles. 8

In relation to primary research, case studies and case series are generally not driven by a working hypothesis. Rather, they serve as a basis to generate a hypothesis. Observational or interventional studies should have a hypothesis when choosing the research design and sample size. The results of observational and interventional studies further lead to the generation of new hypotheses, the testing of which forms the basis of future studies. Review articles, on the other hand, may not be hypothesis-driven, but they form fertile ground for generating future hypotheses for evaluation. Fig. 1 summarizes which types of studies are hypothesis-driven and which lead to hypothesis generation.

[Fig. 1: jkms-36-e338-g001.jpg]

STANDARDS OF WORKING AND SCIENTIFIC HYPOTHESES

A review of the published literature did not enable the identification of clearly defined standards for working and scientific hypotheses. It is essential to distinguish influential from non-influential hypotheses, evidence-based hypotheses from a priori statements and ideas, and ethical from unethical or potentially harmful ideas. The following points are proposed for consideration while generating working and scientific hypotheses. 1, 2 Table 1 summarizes these points.

Table 1. Points to be considered while evaluating the validity of hypotheses

  • Backed by evidence-based data
  • Testable by relevant study designs
  • Supported by preliminary (pilot) studies
  • Testable by ethical studies
  • Maintaining a balance between scientific temper and controversy

Evidence-based data

A scientific hypothesis should have a sound basis in previously published literature as well as the scientist's own observations. Randomly generated (a priori) hypotheses are unlikely to be proven. A thorough literature search should form the basis of a hypothesis grounded in published evidence. 7

Testable by relevant study designs

Unless a scientific hypothesis can be tested, it can neither be proven nor disproven. Therefore, a scientific hypothesis should be amenable to testing with the available technologies and the present understanding of science.

Supported by pilot studies

If a hypothesis is based purely on a novel observation by the scientist in question, it should be grounded on some preliminary studies to support it. For example, if a drug that targets a specific cell population is hypothesized to be useful in a particular disease setting, then there must be some preliminary evidence that the specific cell population plays a role in driving that disease process.

Testable by ethical studies

The hypothesis should be testable by experiments that are ethically acceptable. 9 For example, a hypothesis that parachutes reduce mortality from falls from an airplane cannot be tested using a randomized controlled trial. 10 This is because it is obvious that all those jumping from a flying plane without a parachute would likely die. Similarly, the hypothesis that smoking tobacco causes lung cancer cannot be tested by a clinical trial that makes people take up smoking (since there is considerable evidence for the health hazards associated with smoking). Instead, long-term observational studies comparing outcomes in those who smoke and those who do not, as was performed in the landmark epidemiological case control study by Doll and Hill, 11 are more ethical and practical.

Balance between scientific temper and controversy

Novel findings, including novel hypotheses, particularly those that challenge established norms, are bound to face resistance to wider acceptance. Such resistance is inevitable until such findings are proven with appropriate scientific rigor. However, hypotheses that generate controversy without a scientific basis are generally unwelcome. For example, at the time the pandemic of human immunodeficiency virus (HIV) and AIDS was taking hold, there were numerous deniers who refused to believe that HIV caused AIDS. 12, 13 Similarly, at a time when climate change is causing catastrophic changes to weather patterns worldwide, denial that climate change is occurring, and consequent attempts to block action against it, are certainly unwelcome. 14 The denialism and misinformation during the COVID-19 pandemic, including unfortunate examples of vaccine hesitancy, are more recent examples of controversial hypotheses not backed by science. 15, 16 An example of a controversial hypothesis that proved to be a revolutionary scientific breakthrough was the proposal by Warren and Marshall that Helicobacter pylori causes peptic ulcers. Initially, the idea that a microorganism could cause gastritis and gastric ulcers faced immense resistance. Only when the scientists who proposed the hypothesis ingested H. pylori themselves to induce gastritis could they convince the wider world. Such was the impact of the hypothesis that Barry Marshall and Robin Warren were awarded the Nobel Prize in Physiology or Medicine in 2005 for this discovery. 17, 18

DISTINGUISHING THE MOST INFLUENTIAL HYPOTHESES

Influential hypotheses are those that have stood the test of time. An archetype of an influential hypothesis is that proposed by Edward Jenner in the eighteenth century that cowpox infection protects against smallpox. While this observation had been reported for nearly a century before this time, it had not been suitably tested and publicised until Jenner conducted his experiments on a young boy by demonstrating protection against smallpox after inoculation with cowpox. 19 These experiments were the basis for widespread smallpox immunization strategies worldwide in the 20th century which resulted in the elimination of smallpox as a human disease today. 20

Other influential hypotheses are those which have been read and cited widely. An example of this is the hygiene hypothesis proposing an inverse relationship between infections in early life and allergies or autoimmunity in adulthood. An analysis reported that this hypothesis had been cited more than 3,000 times on Scopus. 1

LESSONS LEARNED FROM HYPOTHESES AMIDST THE COVID-19 PANDEMIC

The COVID-19 pandemic devastated the world like no other in recent memory. During this period, various hypotheses emerged, understandably so considering the public health emergency, with innumerable deaths and suffering for humanity. Within weeks of the first reports of COVID-19, aberrant immune system activation was identified as a key driver of organ dysfunction and mortality in this disease. 21 Consequently, numerous drugs that suppress the immune system or abrogate its activation were hypothesized to have a role in COVID-19. 22 One of the earliest drugs hypothesized to have a benefit was hydroxychloroquine. Hydroxychloroquine was proposed to interfere with Toll-like receptor activation and consequently ameliorate the aberrant immune system activation leading to pathology in COVID-19. 22 The drug was also hypothesized to have a prophylactic role in preventing infection or reducing disease severity, and was touted as a wonder drug by many prominent international figures. However, later well-designed randomized controlled trials failed to demonstrate any benefit of hydroxychloroquine in COVID-19. 23, 24, 25, 26 Subsequently, azithromycin 27, 28 and ivermectin 29 were hypothesized as potential therapies for COVID-19, but were not supported by evidence from randomized controlled trials. The role of vitamin D in preventing disease severity was also proposed but has not yet been definitively proven. 30, 31 On the other hand, randomized controlled trials provided evidence supporting dexamethasone 32 and interleukin-6 pathway blockade with tocilizumab as effective therapies for COVID-19 in specific situations, such as at the onset of hypoxia. 33, 34 Clues to the apparent effectiveness of various drugs against severe acute respiratory syndrome coronavirus 2 in vitro, despite their ineffectiveness in vivo, have recently been identified. Many of these drugs are weak, lipophilic bases, and some induce phospholipidosis, which results in apparent in vitro effectiveness due to non-specific off-target effects that are not replicated in living systems. 35, 36

Another hypothesis proposed was an association between routine Bacillus Calmette-Guerin (BCG) vaccination policies and lower deaths due to COVID-19. This hypothesis emerged in mid-2020, when COVID-19 was still taking hold in many parts of the world. 37, 38 Subsequently, many countries that had lower deaths at that time point went on to record mortality comparable to other areas of the world. Furthermore, the hypothesis that BCG vaccination reduced COVID-19 mortality was a classic example of the ecological fallacy: associations between population-level events (ecological studies; in this case, BCG vaccination and COVID-19 mortality) cannot be directly extrapolated to the individual level. Such associations cannot per se be attributed as causal, and can only serve to generate hypotheses that need to be tested at the individual level. 39

IS TRADITIONAL PEER REVIEW EFFICIENT FOR EVALUATION OF WORKING AND SCIENTIFIC HYPOTHESES?

Traditionally, publication after peer review has been considered the gold standard before any new idea finds acceptability amongst the scientific community. Getting a work (including a working or scientific hypothesis) reviewed by experts in the field before experiments are conducted to prove or disprove it helps to refine the idea further as well as improve the experiments planned to test the hypothesis. 40 A route towards this has been the emergence of journals dedicated to publishing hypotheses such as the Central Asian Journal of Medical Hypotheses and Ethics. 41 Another means of publishing hypotheses is through registered research protocols detailing the background, hypothesis, and methodology of a particular study. If such protocols are published after peer review, then the journal commits to publishing the completed study irrespective of whether the study hypothesis is proven or disproven. 42 In the post-pandemic world, online research methods such as online surveys powered via social media channels such as Twitter and Instagram might serve as critical tools to generate as well as to preliminarily test the appropriateness of hypotheses for further evaluation. 43 , 44

Some radical hypotheses might be difficult to publish after traditional peer review. These hypotheses might only be acceptable to the scientific community after they are tested in research studies. Preprints might be a way to disseminate such controversial and ground-breaking hypotheses. 45 However, scientists might prefer to keep their hypotheses confidential for fear of plagiarism, avoiding online posting and publication until they have tested them.

SUGGESTIONS ON GENERATING AND PUBLISHING HYPOTHESES

Publication of hypotheses is important; however, a balance is required between scientific temper and controversy. Journal editors and reviewers might keep in mind the specific points summarized in Table 2, and detailed hereafter, while judging the merit of hypotheses for publication. Keeping in mind the ethical principle of primum non nocere, a hypothesis should be published only if it is testable in a manner that is ethically appropriate. 46 Such hypotheses should be grounded in reality and lend themselves to further testing to either prove or disprove them. It must be considered that subsequent experiments to prove or disprove a hypothesis have an equal chance of failing or succeeding, akin to tossing a coin. A pre-conceived belief that a hypothesis is unlikely to be proven correct should not form the basis for rejecting it for publication. In this context, hypotheses generated after a thorough literature search to identify knowledge gaps, or based on concrete clinical observations on a considerable number of patients (as opposed to random observations on a few patients), are more likely to be acceptable for publication by peer-reviewed journals. Also, hypotheses should be accepted or rejected based on their implications for science at large, rather than on whether the subsequent experiments to test them end up with results in favour of or against the original hypothesis.

Table 2. Points to be considered before a hypothesis is acceptable for publication

  • Experiments required to test hypotheses should be ethically acceptable as per the World Medical Association declaration on ethics and related statements
  • Pilot studies support hypotheses
  • Single clinical observations and expert opinion surveys may support hypotheses
  • Testing hypotheses requires robust methodology and statistical power
  • Hypotheses that challenge established views and concepts require proper evidence-based justification

Hypotheses form an important part of the scientific literature. The COVID-19 pandemic has reiterated the importance and relevance of hypotheses for dealing with public health emergencies and highlighted the need for evidence-based and ethical hypotheses. A good hypothesis is testable in a relevant study design, backed by preliminary evidence, and has positive ethical and clinical implications. General medical journals might consider publishing hypotheses as a specific article type to enable more rapid advancement of science.

Disclosure: The authors have no potential conflicts of interest to disclose.

Author Contributions:

  • Data curation: Gasparyan AY, Misra DP, Zimba O, Yessirkepov M, Agarwal V, Kitas GD.

January 13, 2024


Demystifying Hypothesis Generation: A Guide to AI-Driven Insights

Hypothesis generation involves making informed guesses about various aspects of a business, market, or problem that need further exploration and testing. This article discusses the process to follow while generating hypotheses and how an AI tool like Akaike's BYOB can help you do it faster and better.


What is Hypothesis Generation?

Hypothesis generation involves making informed guesses about various aspects of a business, market, or problem that need further exploration and testing. It's a crucial step while applying the scientific method to business analysis and decision-making. 

Here is an example from a popular B-school marketing case study: 

A bicycle manufacturer noticed that their sales had dropped significantly in 2002 compared to the previous year. The team investigating the reasons for this had many hypotheses. One of them was: “many cycling enthusiasts have switched to walking with their iPods plugged in.” The Apple iPod was launched in late 2001 and was an immediate hit among young consumers. Data collected manually by the team seemed to show that the geographies around Apple stores had indeed shown a sales decline.

Traditionally, hypothesis generation is time-consuming and labour-intensive. However, the advent of Large Language Models (LLMs) and Generative AI (GenAI) tools has transformed the practice altogether. These AI tools can rapidly process extensive datasets, quickly identifying patterns, correlations, and insights that might have escaped human notice, thus streamlining the stages of hypothesis generation.

These tools have also revolutionised experimentation by optimising test designs, reducing resource-intensive processes, and delivering faster results. LLMs' role in hypothesis generation goes beyond mere assistance, bringing innovation and easy, data-driven decision-making to businesses.

Hypotheses come in various types, such as simple, complex, null, alternative, logical, statistical, or empirical. These categories are defined based on the relationships between the variables involved and the type of evidence required for testing them. In this article, we aim to demystify hypothesis generation. We will explore the role of LLMs in this process and outline the general steps involved, highlighting why it is a valuable tool in your arsenal.

Understanding Hypothesis Generation

A hypothesis is born from a set of underlying assumptions and a prediction of how those assumptions are anticipated to unfold in a given context. Essentially, it's an educated, articulated guess that forms the basis for action and outcome assessment.

A hypothesis is a declarative statement that has not yet been proven true. Based on past scholarship, we can sum it up as follows:

  • A definite statement, not a question
  • Based on observations and knowledge
  • Testable and can be proven wrong
  • Predicts the anticipated results clearly
  • Contains a dependent and an independent variable where the dependent variable is the phenomenon being explained and the independent variable does the explaining

In a business setting, hypothesis generation becomes essential when people are made to explain their assumptions. This clarity from hypothesis to expected outcome is crucial, as it allows people to acknowledge a failed hypothesis if it does not provide the intended result. Promoting such a culture of effective hypothesising can lead to more thoughtful actions and a deeper understanding of outcomes. Failures become just another step on the way to success, and success brings more success.

Hypothesis generation is a continuous process where you start with an educated guess and refine it as you gather more information. You form a hypothesis based on what you know or observe.

Say you're a pen maker whose sales are down. You look at what you know:

  • I can see that pen sales for my brand are down in May and June.
  • I also know that schools are closed in May and June and that schoolchildren use a lot of pens.
  • I hypothesise that my sales are down because school children are not using pens in May and June, and thus not buying newer ones.

The next step is to collect and analyse data to test this hypothesis, like tracking sales before and after school vacations. As you gather more data and insights, your hypothesis may evolve. You might discover that your hypothesis only holds in certain markets but not others, leading to a more refined hypothesis.
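The pen maker's reasoning can be sketched in a few lines of Python. The monthly sales figures below are invented purely for illustration; the point is the comparison of vacation months against school months.

```python
# Hypothetical monthly unit sales for the pen brand (Jan..Dec).
sales = {
    "Jan": 980, "Feb": 1010, "Mar": 995, "Apr": 1020,
    "May": 610, "Jun": 590, "Jul": 990, "Aug": 1030,
    "Sep": 1050, "Oct": 1000, "Nov": 985, "Dec": 1015,
}

vacation = ["May", "Jun"]                      # schools closed
school_months = [m for m in sales if m not in vacation]

avg_school = sum(sales[m] for m in school_months) / len(school_months)
avg_vacation = sum(sales[m] for m in vacation) / len(vacation)

# How far do vacation-month sales fall below the school-month average?
dip_pct = 100 * (avg_school - avg_vacation) / avg_school
print(f"Average dip during vacation months: {dip_pct:.1f}%")
```

A large, consistent dip is evidence in favour of the hypothesis; the natural next step, as the paragraph above suggests, is to check whether the same pattern repeats across years and across markets before acting on it.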

Once your hypothesis is proven correct, there are many actions you may take - (a) reduce supply in these months (b) reduce the price so that sales pick up (c) release a limited supply of novelty pens, and so on.

Once you decide on your action, you will further monitor the data to see if your actions are working. This iterative cycle of formulating, testing, and refining hypotheses - and using insights in decision-making - is vital in making impactful decisions and solving complex problems in various fields, from business to scientific research.

How do Analysts generate Hypotheses? Why is it iterative?

A typical human working towards a hypothesis would start with:

    1. Picking the Default Action

    2. Determining the Alternative Action

    3. Figuring out the Null Hypothesis (H0)

    4. Inverting the Null Hypothesis to get the Alternate Hypothesis (H1)

    5. Hypothesis Testing

The default action is what you would naturally do, regardless of any hypothesis or in a case where you get no further information. The alternative action is the opposite of your default action.

The null hypothesis, or H0, is what brings about your default action. The alternative hypothesis (H1) is essentially the negation of H0.

For example, suppose you are tasked with analysing highway tollgate data (timestamp, vehicle number, toll amount) to see if a raise in tollgate rates will increase revenue or cause a volume drop. Following the above steps, we can determine:

Default Action: “I want to increase toll rates by 10%.”
Alternative Action: “I will keep my rates constant.”
H0: “A 10% increase in the toll rate will not cause a significant dip in traffic (say 3%).”
H1: “A 10% increase in the toll rate will cause a dip in traffic of greater than 3%.”

Now, we can start looking at past data of tollgate traffic in and around rate increases for different tollgates. Some data might be irrelevant. For example, some tollgates might be much cheaper so customers might not have cared about an increase. Or, some tollgates are next to a large city, and customers have no choice but to pay. 

Ultimately, you are looking for the level of significance between traffic and rates for comparable tollgates. Significance is often expressed as a P-value, or probability value. The P-value measures how surprising your test results are, assuming that your H0 holds true.

The lower the p-value, the more convincing your data is to change your default action.

Usually, a p-value of less than 0.05 is considered statistically significant, meaning you should reject your null hypothesis and change your default action. In our example, a low p-value would suggest that a 10% increase in the toll rate causes a significant dip in traffic (>3%). Thus, it is better to keep our rates as they are if we want to maintain revenue.
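To make the p-value concrete, here is a minimal sketch of a permutation test on invented daily traffic counts (both the numbers and the single-tollgate framing are hypothetical): shuffle the before/after labels many times and count how often a dip at least as large as the observed one arises by chance.

```python
import random

# Hypothetical daily vehicle counts at one tollgate.
before = [1040, 1010, 998, 1025, 1032, 1015, 1008, 1030, 1021, 1012]
after = [985, 962, 990, 955, 978, 948, 970, 960, 982, 951]  # after the 10% raise

observed_dip = sum(before) / len(before) - sum(after) / len(after)

random.seed(42)                      # reproducible shuffles
pooled = before + after
n_before = len(before)
extreme = 0
trials = 10_000
for _ in range(trials):
    random.shuffle(pooled)
    perm_dip = (sum(pooled[:n_before]) / n_before
                - sum(pooled[n_before:]) / (len(pooled) - n_before))
    if perm_dip >= observed_dip:     # a dip at least as large, purely by chance
        extreme += 1

p_value = extreme / trials
print(f"observed dip: {observed_dip:.1f} vehicles/day, p = {p_value:.4f}")
```

A small p-value here would lead the analyst to reject H0, i.e., to accept that the rate increase is associated with a meaningful drop in traffic, and to reconsider the default action accordingly.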

In other examples, where one has to explore the significance of different variables, we might find that some variables are not correlated at all. In general, hypothesis generation is an iterative process - you keep looking for data and keep considering whether that data convinces you to change your default action.

Internal and External Data 

Hypothesis generation feeds on data. Data can be internal or external. In businesses, internal data is produced by company-owned systems (areas such as operations, maintenance, personnel, finance, etc.). External data comes from outside the company (customer data, competitor data, and so on).

Let’s consider a real-life hypothesis generated from internal data: 

Multinational company Johnson & Johnson was looking to enhance employee performance and retention.  Initially, they favoured experienced industry candidates for recruitment, assuming they'd stay longer and contribute faster. However, HR and the people analytics team at J&J hypothesised that recent college graduates outlast experienced hires and perform equally well.  They compiled data on 47,000 employees to test the hypothesis and, based on it, Johnson & Johnson increased hires of new graduates by 20% , leading to reduced turnover with consistent performance. 

For an analyst (or an AI assistant), external data is often hard to source - it may not be available as organised datasets (or reports), or it may be expensive to acquire. Teams might have to collect new data from surveys, questionnaires, customer feedback and more. 

Further, there is the problem of context. Suppose an analyst is looking at the dynamic pricing of hotels offered on his company’s platform in a particular geography. Suppose further that the analyst has no context of the geography, the reasons people visit the locality, or of local alternatives; then the analyst will have to learn additional context to start making hypotheses to test. 

Internal data, by contrast, comes with guaranteed access. The challenge is volume: it can add up to staggering amounts of data.

Looking Back, and Looking Forward

Data analysts often have to generate hypotheses retrospectively, where they formulate and evaluate H0 and H1 based on past data. For the sake of this article, let's call it retrospective hypothesis generation.

Alternatively, a prospective approach to hypothesis generation could be one where hypotheses are formulated before data collection or before a particular event or change is implemented. 

For example: 

A pen seller has a hypothesis that during the lean periods of summer, when schools are closed, a Buy One Get One (BOGO) campaign will lead to a 100% sales recovery because customers will buy pens in advance. He then collects feedback from customers in the form of a survey and also implements a BOGO campaign in a single territory to see whether his hypothesis is correct.
The HR head of a multi-office employer realises that some of the company’s offices have been providing snacks at 4:30 PM in the common area, and the rest have not. He has a hunch that these offices have higher productivity. The leader asks the company’s data science team to look at employee productivity data and the employee location data. “Am I correct, and to what extent?”, he asks. 

These examples also reflect another nuance, in which the data is collected differently: 

  • Observational: Observational testing happens when researchers observe a sample population and collect data as it occurs without intervention. The data for the snacks vs productivity hypothesis was observational. 
  • Experimental: In experimental testing, the sample is divided into multiple groups, one of which is a control group. The treatment applied to the non-control groups is varied to determine how the data collected differs from that of the control group. The data collected by the pen seller in the single-territory experiment was experimental.
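The pen seller's single-territory experiment ultimately boils down to comparing the treated territory against a control. A minimal sketch, with all sales figures invented for illustration:

```python
from statistics import mean

# Hypothetical weekly sales during the lean period (units sold)
control_territory = [120, 115, 130, 125]  # no campaign
bogo_territory = [118, 240, 235, 250]     # BOGO launched after week 1

# Compare the BOGO territory's post-launch weeks against the control baseline
baseline = mean(control_territory)
post_launch = mean(bogo_territory[1:])
lift = (post_launch - baseline) / baseline

print(f"observed lift: {lift:.0%}")
```

A real test would also check whether the lift is statistically significant rather than noise, and whether the two territories were comparable to begin with.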

Such data-backed insights are a valuable resource for businesses because they allow for more informed decision-making, leading to the company's overall growth. Taking a data-driven decision, from forming a hypothesis to updating and validating it across iterations, to taking action based on your insights reduces guesswork, minimises risks, and guides businesses towards strategies that are more likely to succeed.

How can GenAI help in Hypothesis Generation?

Of course, hypothesis generation is not always straightforward. Understanding the earlier examples is easy for us because we already have the context. But in a situation where an analyst has no domain knowledge, hypothesis generation suddenly becomes a tedious and challenging process.

AI, particularly large language models (LLMs), has radically changed how we process and analyse large volumes of data. With its help, we can sift through massive datasets with precision and speed, whatever the context: customer behaviour, financial trends, medical records, and more. Generative AI models, including LLMs, are trained on diverse text data, enabling them to comprehend and process a wide variety of topics.

Now, imagine an AI assistant helping you with hypothesis generation. LLMs are not born with context. Instead, they are trained upon vast amounts of data, enabling them to develop context in a completely unfamiliar environment. This skill is instrumental when adopting a more exploratory approach to hypothesis generation. For example, the HR leader from earlier could simply ask an LLM tool: “Can you look at this employee productivity data and find cohorts of high-productivity and see if they correlate to any other employee data like location, pedigree, years of service, marital status, etc?” 
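Under the hood, the analysis behind such a prompt is essentially a group-and-compare. A minimal sketch, using invented employee records (the field names and numbers are illustrative, not from any real system):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical employee records
employees = [
    {"office": "Pune", "snacks_at_430": True, "productivity": 82},
    {"office": "Pune", "snacks_at_430": True, "productivity": 88},
    {"office": "Delhi", "snacks_at_430": False, "productivity": 75},
    {"office": "Delhi", "snacks_at_430": False, "productivity": 71},
    {"office": "Mumbai", "snacks_at_430": True, "productivity": 85},
]

# Group productivity scores by whether the office serves 4:30 PM snacks
groups = defaultdict(list)
for e in employees:
    groups[e["snacks_at_430"]].append(e["productivity"])

for has_snacks, scores in sorted(groups.items()):
    print(f"snacks={has_snacks}: mean productivity {mean(scores):.1f} (n={len(scores)})")
```

An LLM assistant adds value not by inventing a new kind of analysis, but by translating the leader's natural-language question into this sort of comparison and explaining the result back in plain language.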

For an LLM-based tool to be useful, it requires a few things:

  • Domain Knowledge: A human could take months or years to fully acclimatise to a particular field, but LLMs, when fed extensive information and using Natural Language Processing (NLP), can familiarise themselves in a very short time.
  • Explainability: The tool should be able to explain its reasoning and outputs, so that it does not remain a "black box".
  • Customisation: For consistent improvement, contextual AI must allow tweaks, letting users change its behaviour to meet their expectations. Human intervention and validation remain necessary steps in adopting AI tools.

NLP allows these tools to discern context within textual data, meaning they can read, categorise, and analyse data at remarkable speed. LLMs can thus quickly develop contextual understanding and generate human-like text while processing vast amounts of unstructured data, making it easier for businesses and researchers to organise and utilise data effectively.

LLMs have the potential to become indispensable tools for businesses. The future rests on AI tools that harness LLMs and NLP to deliver actionable insights, mitigate risks, inform decision-making, predict future trends, and drive business transformation across sectors.

Together, these technologies empower data analysts to unravel hidden insights in their data. For our pen seller, for example, an AI tool could look through historical data to track when sales peaked, or go through sales data to identify the pens that sold the most. It can refine a hypothesis across iterations, just as a human analyst would, and it can even be used to brainstorm other hypotheses.

Consider asking the LLM, "Where do I sell the most pens?" It will go through all the data you have made available (the places where you sell pens, the number of pens sold) and return the answer. Doing this on our own, even with meticulous records, would take at least five to ten minutes, and that is if we know how to query a database and extract the needed information. If we don't, there is the added effort of finding and training someone who does. An AI assistant, on the other hand, can return the answer in seconds. Its strengths in sorting through data, identifying patterns, refining hypotheses iteratively, and generating data-backed insights enhance problem-solving and decision-making.

Top-Down and Bottom-Up Hypothesis Generation

As we discussed earlier, every hypothesis begins with a default action that determines your initial hypotheses and all your subsequent data collection. You look at data, and a lot of it. The significance of your data depends on the effect it has on, and its relevance to, your default action. This is the top-down approach to hypothesis generation.

There is also the bottom-up method, where you start by going through your data and looking for interesting correlations you could leverage. This method is usually less focused than the top-down approach and, as a result, involves even more data collection, processing, and analysis. AI is a stellar tool for Exploratory Data Analysis (EDA): wading through swathes of data to highlight trends, patterns, gaps, opportunities, errors, and concerns is hardly a challenge for an AI tool equipped with NLP and powered by LLMs.

EDA can help with: 

  • Cleaning your data
  • Understanding your variables
  • Analysing relationships between variables

An AI assistant performing EDA can help you review your data, remove redundant data points, identify errors, note relationships, and more. All of this ensures ease, efficiency, and, best of all, speed for your data analysts.
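A first EDA pass of the kind described above can be as simple as per-column summaries that surface missing values, ranges, and averages. A minimal, dependency-free sketch over invented sales data:

```python
def summarize(column):
    """Basic per-column profile: counts, missing values, range, mean."""
    vals = [v for v in column if v is not None]
    return {
        "count": len(vals),
        "missing": len(column) - len(vals),
        "min": min(vals),
        "max": max(vals),
        "mean": sum(vals) / len(vals),
    }

# Hypothetical columns from a sales dataset (None marks a missing entry)
data = {
    "units_sold": [120, 115, None, 130, 125],
    "unit_price": [10.0, 10.0, 9.5, None, 10.5],
}

for name, column in data.items():
    print(name, summarize(column))
```

In practice an AI assistant layers natural-language explanation on top of profiles like this, flagging which columns look suspicious before any hypothesis is formulated.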

Good hypotheses are extremely difficult to generate. They are nuanced and, without necessary context, almost impossible to ascertain in a top-down approach. On the other hand, an AI tool adopting an exploratory approach is swift, easily running through available data - internal and external. 

If you want to change how your LLM looks at your data, you can do that too: adjusting your prompts to emphasise certain events and categories in your data over others is a simple process. This flexibility is why LLMs are a great tool in hypothesis generation: analysts can tailor them to their specific use cases.

Ethical Considerations and Challenges

There are numerous reasons why you should adopt AI tools into your hypothesis generation process. But why are they still not as popular as they should be?

Some worry that AI tools can inadvertently pick up human biases through the data they are fed. Others raise privacy and trust concerns. Data quality and availability are also often questioned. Since LLMs and generative AI are still-developing technologies, such issues are to be expected, but they are all obstacles researchers are earnestly tackling.

One oft-raised complaint against LLM tools (like OpenAI's ChatGPT) is that they 'fill in' gaps in knowledge, providing information where there is none and thus giving inaccurate, embellished, or outright wrong answers; this tendency to "hallucinate" is a major cause for concern. To combat this phenomenon, newer AI tools have started providing citations alongside the insights they offer, so that their answers become verifiable. Human validation remains an essential step in interpreting AI-generated hypotheses and outputs in general. This is why we need collaboration between the intelligent and the artificially intelligent mind to ensure optimal performance.

Clearly, hypothesis generation is an immensely time-consuming activity, and AI can help with each of its steps. From helping you figure out your default action, to determining the major research questions, initial hypotheses, and alternative actions, to weeding through your data to collect the relevant points, AI can make your analysts' jobs easier. It can support any approach: prospective, retrospective, exploratory, top-down, or bottom-up. With LLMs, both structured and unstructured data can be handled, easing worries about messy data. Combining human intuition with the speed of generative AI and large language models lets you accelerate and refine hypothesis generation based on feedback and new data, providing the best assistance to your business.


© 2023 Akaike Technologies Pvt. Ltd. and/or its associates and partners


Hypothesis Generation and Interpretation

Design Principles and Patterns for Big Data Applications

  • © 2024
  • Hiroshi Ishikawa

Department of Systems Design, Tokyo Metropolitan University, Hino, Japan


  • Provides an integrated perspective on why decisions are made and how the process is modeled
  • Presentation of design patterns enables use in a wide variety of big-data applications
  • Multiple practical use cases indicate the broad real-world significance of the methods presented

Part of the book series: Studies in Big Data (SBD, volume 139)


About this book

The novel methods and technologies proposed in  Hypothesis Generation and Interpretation are supported by the incorporation of historical perspectives on science and an emphasis on the origin and development of the ideas behind their design principles and patterns.


  • Hypothesis Generation
  • Hypothesis Interpretation
  • Data Engineering
  • Data Science
  • Data Management
  • Machine Learning
  • Data Mining
  • Design Patterns
  • Design Principles

Table of contents (8 chapters)

  • Front Matter
  • Basic Concept
  • Science and Hypothesis
  • Machine Learning and Integrated Approach
  • Hypothesis Generation by Difference
  • Methods for Integrated Hypothesis Generation
  • Interpretation
  • Back Matter

About the author

He has published actively in international, refereed journals and conferences, such as ACM Transactions on Database Systems , IEEE Transactions on Knowledge and Data Engineering , The VLDB Journal , IEEE International Conference on Data Engineering, and ACM SIGSPATIAL and Management of Emergent Digital EcoSystems (MEDES). He has authored and co-authored a dozen books, including Social Big Data Mining (CRC, 2015) and Object-Oriented Database System (Springer-Verlag, 1993).

Bibliographic Information

Book Title : Hypothesis Generation and Interpretation

Book Subtitle : Design Principles and Patterns for Big Data Applications

Authors : Hiroshi Ishikawa

Series Title : Studies in Big Data

DOI : https://doi.org/10.1007/978-3-031-43540-9

Publisher : Springer Cham

eBook Packages : Computer Science , Computer Science (R0)

Copyright Information : The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2024

Hardcover ISBN : 978-3-031-43539-3 Published: 02 February 2024

Softcover ISBN : 978-3-031-43542-3 Due: 15 February 2025

eBook ISBN : 978-3-031-43540-9 Published: 01 January 2024

Series ISSN : 2197-6503

Series E-ISSN : 2197-6511

Edition Number : 1

Number of Pages : XII, 372

Number of Illustrations : 52 b/w illustrations, 125 illustrations in colour

Topics : Theory of Computation , Database Management , Data Mining and Knowledge Discovery , Machine Learning , Big Data , Complex Systems

  • Publish with us

Policies and ethics

  • Find a journal
  • Track your research

Machine Learning as a Tool for Hypothesis Generation

While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not. We illustrate the procedure with a concrete application: judge decisions about who to jail. We begin with a striking fact: The defendant’s face alone matters greatly for the judge’s jailing decision. In fact, an algorithm given only the pixels in the defendant’s mugshot accounts for up to half of the predictable variation. We develop a procedure that allows human subjects to interact with this black-box algorithm to produce hypotheses about what in the face influences judge decisions. The procedure generates hypotheses that are both interpretable and novel: They are not explained by demographics (e.g. race) or existing psychology research; nor are they already known (even if tacitly) to people or even experts. Though these results are specific, our procedure is general. It provides a way to produce novel, interpretable hypotheses from any high-dimensional dataset (e.g. cell phones, satellites, online behavior, news headlines, corporate filings, and high-frequency time series). A central tenet of our paper is that hypothesis generation is in and of itself a valuable activity, and we hope this encourages future work in this largely “pre-scientific” stage of science.

This is a revised version of Chicago Booth working paper 22-15 “Algorithmic Behavioral Science: Machine Learning as a Tool for Scientific Discovery.” We gratefully acknowledge support from the Alfred P. Sloan Foundation, Emmanuel Roman, and the Center for Applied Artificial Intelligence at the University of Chicago. For valuable comments we thank Andrei Shleifer, Larry Katz and five anonymous referees, as well as Marianne Bertrand, Jesse Bruhn, Steven Durlauf, Joel Ferguson, Emma Harrington, Supreet Kaur, Matteo Magnaricotte, Dev Patel, Betsy Levy Paluck, Roberto Rocha, Evan Rose, Suproteem Sarkar, Josh Schwartzstein, Nick Swanson, Nadav Tadelis, Richard Thaler, Alex Todorov, Jenny Wang and Heather Yang, as well as seminar participants at Bocconi, Brown, Columbia, ETH Zurich, Harvard, MIT, Stanford, the University of California Berkeley, the University of Chicago, the University of Pennsylvania, the 2022 Behavioral Economics Annual Meetings and the 2022 NBER summer institute. For invaluable assistance with the data and analysis we thank Cecilia Cook, Logan Crowl, Arshia Elyaderani, and especially Jonas Knecht and James Ross. This research was reviewed by the University of Chicago Social and Behavioral Sciences Institutional Review Board (IRB20-0917) and deemed exempt because the project relies on secondary analysis of public data sources. All opinions and any errors are of course our own. The views expressed herein are those of the authors and do not necessarily reflect the views of the National Bureau of Economic Research.


Published Versions

Jens Ludwig & Sendhil Mullainathan, 2024. “Machine Learning as a Tool for Hypothesis Generation,” The Quarterly Journal of Economics, vol 139(2), pages 751-827.



How to Write a Strong Hypothesis | Steps & Examples

Published on May 6, 2022 by Shona McCombes . Revised on November 20, 2023.

A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection.

Example: Hypothesis

Daily apple consumption leads to fewer doctor’s visits.

Table of contents

  • What is a hypothesis?
  • Developing a hypothesis (with example)
  • Hypothesis examples
  • Other interesting articles
  • Frequently asked questions about writing hypotheses

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess – it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Variables in hypotheses

Hypotheses propose a relationship between two or more types of variables.

  • An independent variable is something the researcher changes or controls.
  • A dependent variable is something the researcher observes and measures.

If there are any control variables, extraneous variables, or confounding variables, be sure to jot those down as you go to minimize the chances that research bias will affect your results.

Example: Daily exposure to the sun leads to increased levels of happiness. In this example, the independent variable is exposure to the sun (the assumed cause), and the dependent variable is the level of happiness (the assumed effect).


Step 1. Ask a question

Writing a hypothesis begins with a research question that you want to answer. The question should be focused, specific, and researchable within the constraints of your project.

Step 2. Do some preliminary research

Your initial answer to the question should be based on what is already known about the topic. Look for theories and previous studies to help you form educated assumptions about what your research will find.

At this stage, you might construct a conceptual framework to ensure that you’re embarking on a relevant topic. This can also help you identify which variables you will study and what you think the relationships are between them. Sometimes, you’ll have to operationalize more complex constructs.

Step 3. Formulate your hypothesis

Now you should have some idea of what you expect to find. Write your initial answer to the question in a clear, concise sentence.

Step 4. Refine your hypothesis

You need to make sure your hypothesis is specific and testable. There are various ways of phrasing a hypothesis, but all the terms you use should have clear definitions, and the hypothesis should contain:

  • The relevant variables
  • The specific group being studied
  • The predicted outcome of the experiment or analysis

Step 5. Phrase your hypothesis in three ways

To identify the variables, you can write a simple prediction in if…then form. The first part of the sentence states the independent variable and the second part states the dependent variable.

In academic research, hypotheses are more commonly phrased in terms of correlations or effects, where you directly state the predicted relationship between variables.

If you are comparing two groups, the hypothesis can state what difference you expect to find between them.

Step 6. Write a null hypothesis

If your research involves statistical hypothesis testing, you will also have to write a null hypothesis. The null hypothesis is the default position that there is no association between the variables. The null hypothesis is written as H0, while the alternative hypothesis is H1 or Ha.

  • H0: The number of lectures attended by first-year students has no effect on their final exam scores.
  • H1: The number of lectures attended by first-year students has a positive effect on their final exam scores.
Research question: What are the health benefits of eating an apple a day?
Hypothesis: Increasing apple consumption in over-60s will result in decreasing frequency of doctor’s visits.
Null hypothesis: Increasing apple consumption in over-60s will have no effect on frequency of doctor’s visits.

Research question: Which airlines have the most delays?
Hypothesis: Low-cost airlines are more likely to have delays than premium airlines.
Null hypothesis: Low-cost and premium airlines are equally likely to have delays.

Research question: Can flexible work arrangements improve job satisfaction?
Hypothesis: Employees who have flexible working hours will report greater job satisfaction than employees who work fixed hours.
Null hypothesis: There is no relationship between working hour flexibility and job satisfaction.

Research question: How effective is high school sex education at reducing teen pregnancies?
Hypothesis: Teenagers who received sex education lessons throughout high school will have lower rates of unplanned pregnancy than teenagers who did not receive any sex education.
Null hypothesis: High school sex education has no effect on teen pregnancy rates.

Research question: What effect does daily use of social media have on the attention span of under-16s?
Hypothesis: There is a negative correlation between time spent on social media and attention span in under-16s.
Null hypothesis: There is no relationship between social media use and attention span in under-16s.
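The lecture-attendance hypothesis pair above (H0 vs H1) can be probed with a simple correlation check. A minimal sketch with invented student data; a real analysis would also test significance rather than eyeball the coefficient:

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical: lectures attended vs final exam score for ten students
lectures = [5, 8, 12, 3, 10, 7, 11, 2, 9, 6]
scores = [55, 62, 78, 48, 71, 60, 75, 45, 68, 58]

r = pearson_r(lectures, scores)
print(f"r = {r:.2f}")  # r near +1 favours H1 over H0
```

A strongly positive r is evidence against the null hypothesis of "no effect"; an r near zero would leave H0 standing.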



Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.


McCombes, S. (2023, November 20). How to Write a Strong Hypothesis | Steps & Examples. Scribbr. Retrieved August 30, 2024, from https://www.scribbr.com/methodology/hypothesis/



A Beginner’s Guide to Hypothesis Testing in Business


  • 30 Mar 2021

Becoming a more data-driven decision-maker can bring several benefits to your organization, enabling you to identify new opportunities to pursue and threats to abate. Rather than allowing subjective thinking to guide your business strategy, backing your decisions with data can empower your company to become more innovative and, ultimately, profitable.

If you’re new to data-driven decision-making, you might be wondering how data translates into business strategy. The answer lies in generating a hypothesis and verifying or rejecting it based on what various forms of data tell you.

Below is a look at hypothesis testing and the role it plays in helping businesses become more data-driven.


What Is Hypothesis Testing?

To understand what hypothesis testing is, it’s important first to understand what a hypothesis is.

A hypothesis or hypothesis statement seeks to explain why something has happened, or what might happen, under certain conditions. It can also be used to understand how different variables relate to each other. Hypotheses are often written as if-then statements; for example, “If this happens, then this will happen.”

Hypothesis testing, then, is a statistical means of testing an assumption stated in a hypothesis. While the specific methodology leveraged depends on the nature of the hypothesis and the data available, hypothesis testing typically uses sample data to extrapolate insights about a larger population.

Hypothesis Testing in Business

When it comes to data-driven decision-making, there’s a certain amount of risk that can mislead a professional. This could be due to flawed thinking or observations, incomplete or inaccurate data, or the presence of unknown variables. The danger in this is that, if major strategic decisions are made based on flawed insights, it can lead to wasted resources, missed opportunities, and catastrophic outcomes.

The real value of hypothesis testing in business is that it allows professionals to test their theories and assumptions before putting them into action. This essentially allows an organization to verify its analysis is correct before committing resources to implement a broader strategy.

As one example, consider a company that wishes to launch a new marketing campaign to revitalize sales during a slow period. Doing so could be an incredibly expensive endeavor, depending on the campaign’s size and complexity. The company, therefore, may wish to test the campaign on a smaller scale to understand how it will perform.

In this example, the hypothesis that’s being tested would fall along the lines of: “If the company launches a new marketing campaign, then it will translate into an increase in sales.” It may even be possible to quantify how much of a lift in sales the company expects to see from the effort. Pending the results of the pilot campaign, the business would then know whether it makes sense to roll it out more broadly.


Key Considerations for Hypothesis Testing

1. Alternative Hypothesis and Null Hypothesis

In hypothesis testing, the hypothesis that’s being tested is known as the alternative hypothesis. Often, it’s expressed as a correlation or statistical relationship between variables. The null hypothesis, on the other hand, is a statement that’s meant to show there’s no statistical relationship between the variables being tested. It’s typically the exact opposite of whatever is stated in the alternative hypothesis.

For example, consider a company’s leadership team that historically and reliably sees $12 million in monthly revenue. They want to understand if reducing the price of their services will attract more customers and, in turn, increase revenue.

In this case, the alternative hypothesis may take the form of a statement such as: “If we reduce the price of our flagship service by five percent, then we’ll see an increase in sales and realize revenues greater than $12 million in the next month.”

The null hypothesis, on the other hand, would indicate that revenues wouldn’t increase from the base of $12 million, or might even decrease.


2. Significance Level and P-Value

Statistically speaking, if you were to run the same scenario 100 times, you’d likely receive somewhat different results each time. If you were to plot these results in a distribution plot, you’d see the most likely outcome is at the tallest point in the graph, with less likely outcomes falling to the right and left of that point.


With this in mind, imagine you’ve completed your hypothesis test and have your results, which indicate there may be a correlation between the variables you were testing. To understand the significance of your results, you’ll need to identify a p-value for the test, which indicates how confident you can be in the test results.

In statistics, the p-value depicts the probability that, assuming the null hypothesis is correct, you might still observe results that are at least as extreme as the results of your hypothesis test. The smaller the p-value, the more likely the alternative hypothesis is correct, and the greater the significance of your results.

3. One-Sided vs. Two-Sided Testing

When it’s time to test your hypothesis, it’s important to leverage the correct testing method. The two most common hypothesis testing methods are one-sided and two-sided tests, also known as one-tailed and two-tailed tests, respectively.

Typically, you’d leverage a one-sided test when you have a strong conviction about the direction of change you expect to see due to your hypothesis test. You’d leverage a two-sided test when you’re less confident in the direction of change.
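The relationship between the two methods can be shown directly: for a symmetric test statistic, the two-sided p-value is simply twice the one-sided p-value. The z value below is an invented number for illustration:

```python
from scipy.stats import norm

# Assume a test produced z = 1.8 (an invented value for illustration).
z = 1.8

# One-sided p-value: probability of a result at least this far from the
# null value in the predicted direction only.
p_one_sided = norm.sf(z)

# Two-sided p-value: probability of a result at least this extreme in
# either direction; for a symmetric statistic it is double the one-sided p.
p_two_sided = 2 * norm.sf(abs(z))

print(f"one-sided p = {p_one_sided:.4f}, two-sided p = {p_two_sided:.4f}")
```

With this z, the one-sided p is about 0.036 and the two-sided p about 0.072, so at the 5% level the one-sided test rejects the null while the two-sided test does not. That is why the choice between them should be made before looking at the data.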


4. Sampling

To perform hypothesis testing in the first place, you need to collect a sample of data to be analyzed. Depending on the question you’re seeking to answer or investigate, you might collect samples through surveys, observational studies, or experiments.

A survey involves asking a series of questions to a random population sample and recording self-reported responses.

Observational studies involve a researcher observing a sample population and collecting data as it occurs naturally, without intervention.

Finally, an experiment involves dividing a sample into multiple groups, one of which acts as the control group. For each non-control group, the variable being studied is manipulated to determine how the data collected differs from that of the control group.


Learn How to Perform Hypothesis Testing

Hypothesis testing is a complex process involving different moving pieces that can allow an organization to effectively leverage its data and inform strategic decisions.

If you’re interested in better understanding hypothesis testing and the role it can play within your organization, one option is to complete a course that focuses on the process. Doing so can lay the statistical and analytical foundation you need to succeed.

Do you want to learn more about hypothesis testing? Explore Business Analytics —one of our online business essentials courses —and download our Beginner’s Guide to Data & Analytics .



Lesson 10 of 24 By Avijeet Biswal

What Is Hypothesis Testing in Statistics? Types and Examples


In today’s data-driven world, decisions are based on data all the time. Hypotheses play a crucial role in that process, whether in business decisions, the health sector, academia, or quality improvement. Without hypotheses and hypothesis tests, you risk drawing the wrong conclusions and making bad decisions. In this tutorial, you will look at hypothesis testing in statistics.


What Is Hypothesis Testing in Statistics?

Hypothesis testing is a type of statistical analysis in which you put your assumptions about a population parameter to the test. It is used to assess the relationship between two statistical variables.

Let's discuss a few examples of statistical hypotheses from real life:

  • A teacher assumes that 60% of his college's students come from lower-middle-class families.
  • A doctor believes that 3D (Diet, Dose, and Discipline) is 90% effective for diabetic patients.

Now that you know what hypothesis testing is, look at the formula used and the types of hypothesis testing in statistics.

Hypothesis Testing Formula

Z = ( x̅ – μ0 ) / (σ /√n)

  • Here, x̅ is the sample mean,
  • μ0 is the population mean,
  • σ is the population standard deviation,
  • n is the sample size.

How Does Hypothesis Testing Work?

An analyst performs hypothesis testing on a statistical sample to present evidence of the plausibility of the null hypothesis. Measurements and analyses are conducted on a random sample of the population to test a theory. Analysts use a random population sample to test two hypotheses: the null and alternative hypotheses.

The null hypothesis is typically an equality hypothesis between population parameters; for example, a null hypothesis may claim that the population mean return equals zero. The alternate hypothesis is essentially the inverse of the null hypothesis (e.g., the population mean return is not equal to zero). As a result, they are mutually exclusive, and only one can be correct. One of the two possibilities, however, will always be correct.


Null Hypothesis and Alternative Hypothesis

The null hypothesis is the assumption that the effect or event under study will not occur. A null hypothesis has no bearing on the study's outcome unless it is rejected.

H0 is the symbol for it, and it is pronounced H-naught.

The Alternate Hypothesis is the logical opposite of the null hypothesis. The acceptance of the alternative hypothesis follows the rejection of the null hypothesis. H1 is the symbol for it.

Let's understand this with an example.

A sanitizer manufacturer claims that its product kills 95 percent of germs on average. 

To put this company's claim to the test, create a null and alternate hypothesis.

H0 (Null Hypothesis): Average = 95%.

Alternative Hypothesis (H1): The average is less than 95%.

Another straightforward example to understand this concept is determining whether or not a coin is fair and balanced. The null hypothesis states that the probability of a show of heads is equal to the likelihood of a show of tails. In contrast, the alternate theory states that the probability of a show of heads and tails would be very different.
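The sanitizer claim above can be tested with an exact binomial test (available as `scipy.stats.binomtest` in SciPy 1.7+). The trial counts below are hypothetical, invented for illustration:

```python
from scipy.stats import binomtest

# Hypothetical lab data (not from the article): the sanitizer was
# effective in 182 of 200 trials, i.e. 91% observed vs. the 95% claim.
successes, trials = 182, 200

# H0: true effectiveness = 95%;  H1: effectiveness < 95% (left-tailed).
result = binomtest(successes, trials, p=0.95, alternative="less")

# A small p-value here casts doubt on the company's 95% claim.
print(f"p-value = {result.pvalue:.4f}")
```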


Hypothesis Testing Calculation With Examples

Let's consider a hypothesis test for the average height of women in the United States. Suppose our null hypothesis is that the average height is 5'4". We gather a sample of 100 women and determine that their average height is 5'5". The population standard deviation is 2 inches.

To calculate the z-score, we would use the following formula:

z = ( x̅ – μ0 ) / (σ /√n)

z = (5'5" - 5'4") / (2" / √100)

z = 1" / (2" / 10)

z = 1 / 0.2 = 5

We will reject the null hypothesis, as the z-score of 5 is very large, and conclude that there is evidence to suggest that the average height of women in the US is greater than 5'4".
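The same arithmetic, done in inches (5'4" = 64 inches, 5'5" = 65 inches), can be checked with a few lines of Python:

```python
from math import sqrt

sample_mean = 65.0   # x̅: 5'5" expressed in inches
pop_mean_h0 = 64.0   # μ0: 5'4", the null-hypothesis value
sigma = 2.0          # population standard deviation, in inches
n = 100              # sample size

z = (sample_mean - pop_mean_h0) / (sigma / sqrt(n))
print(f"z = {z:.1f}")  # z = 5.0
```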

Steps in Hypothesis Testing

Hypothesis testing is a statistical method to determine if there is enough evidence in a sample of data to infer that a certain condition is true for the entire population. Here’s a breakdown of the typical steps involved in hypothesis testing:

Formulate Hypotheses

  • Null Hypothesis (H0): This hypothesis states that there is no effect or difference, and it is the hypothesis you attempt to reject with your test.
  • Alternative Hypothesis (H1 or Ha): This hypothesis is what you might believe to be true or hope to prove true. It is usually considered the opposite of the null hypothesis.

Choose the Significance Level (α)

The significance level, often denoted by alpha (α), is the probability of rejecting the null hypothesis when it is true. Common choices for α are 0.05 (5%), 0.01 (1%), and 0.10 (10%).

Select the Appropriate Test

Choose a statistical test based on the type of data and the hypothesis. Common tests include t-tests, chi-square tests, ANOVA, and regression analysis. The selection depends on data type, distribution, sample size, and whether the hypothesis is one-tailed or two-tailed.

Collect Data

Gather the data that will be analyzed in the test. This data should be representative of the population to infer conclusions accurately.

Calculate the Test Statistic

Based on the collected data and the chosen test, calculate a test statistic that reflects how much the observed data deviates from the null hypothesis.

Determine the p-value

The p-value is the probability of observing test results at least as extreme as the results observed, assuming the null hypothesis is correct. It helps determine the strength of the evidence against the null hypothesis.

Make a Decision

Compare the p-value to the chosen significance level:

  • If the p-value ≤ α: Reject the null hypothesis, suggesting sufficient evidence in the data supports the alternative hypothesis.
  • If the p-value > α: Do not reject the null hypothesis, suggesting insufficient evidence to support the alternative hypothesis.

Report the Results

Present the findings from the hypothesis test, including the test statistic, p-value, and the conclusion about the hypotheses.

Perform Post-hoc Analysis (if necessary)

Depending on the results and the study design, further analysis may be needed to explore the data more deeply or to address multiple comparisons if several hypotheses were tested simultaneously.
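The steps above can be sketched end to end with SciPy's one-sample t-test. The data here are simulated under assumed values, not drawn from any real study:

```python
import numpy as np
from scipy import stats

# Steps 1-2: hypotheses and significance level.
# H0: the population mean is 50;  H1: it is not 50.  (Values assumed.)
alpha = 0.05

# Step 4: collect data (simulated here from an assumed distribution).
rng = np.random.default_rng(42)
sample = rng.normal(loc=52, scale=5, size=40)

# Steps 3 and 5-6: a one-sample t-test gives the statistic and p-value.
t_stat, p_value = stats.ttest_1samp(sample, popmean=50)

# Step 7: compare the p-value to alpha and report a decision.
if p_value <= alpha:
    print(f"Reject H0 (t = {t_stat:.2f}, p = {p_value:.4f})")
else:
    print(f"Fail to reject H0 (t = {t_stat:.2f}, p = {p_value:.4f})")
```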

Types of Hypothesis Testing

Z Test

To determine whether a discovery or relationship is statistically significant, hypothesis testing uses a z-test. It usually checks whether two means are the same (the null hypothesis). A z-test can be applied only when the population standard deviation is known and the sample size is 30 data points or more.

T Test

A statistical test called a t-test is employed to compare the means of two groups. It is frequently used in hypothesis testing to determine whether two groups differ or whether a procedure or treatment affects the population of interest.

Chi-Square 

You use a Chi-square test to check whether your data matches what you predicted. The Chi-square test analyzes the differences between categorical variables from a random sample to determine how well the expected and observed results fit. The test's fundamental premise is that the observed values in your data should be compared to the predicted values that would be present if the null hypothesis were true.
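A minimal goodness-of-fit sketch of this idea: testing whether a die is fair from hypothetical roll counts (the counts are invented for illustration):

```python
from scipy.stats import chisquare

# Hypothetical data: 120 rolls of a die, observed counts per face.
observed = [25, 17, 15, 23, 24, 16]
expected = [20] * 6   # H0: the die is fair, so each face is expected 20 times

chi2, p_value = chisquare(f_obs=observed, f_exp=expected)

# chi2 works out to 5.0 with 5 degrees of freedom, a large p-value,
# so these counts are consistent with a fair die.
print(f"chi2 = {chi2:.1f}, p = {p_value:.4f}")
```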

Hypothesis Testing and Confidence Intervals

Both confidence intervals and hypothesis tests are inferential techniques that rely on approximating the sampling distribution. Confidence intervals use data from a sample to estimate a population parameter. Hypothesis testing uses data from a sample to examine a given hypothesis; to conduct a hypothesis test, we must have a hypothesized value of the parameter.

Bootstrap distributions and randomization distributions are created using comparable simulation techniques. The observed sample statistic is the focal point of a bootstrap distribution, whereas the null hypothesis value is the focal point of a randomization distribution.

A confidence interval gives a range of plausible estimates of the population parameter. In this lesson, we created only two-tailed confidence intervals, and there is a direct connection between these and two-tailed hypothesis tests: the two typically give the same result. In other words, a hypothesis test at the 0.05 level will virtually always fail to reject the null hypothesis if the 95% confidence interval contains the hypothesized value, and will nearly certainly reject the null hypothesis if the 95% confidence interval does not contain it.
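This correspondence between a 95% confidence interval and a two-sided test at the 0.05 level can be checked numerically with a z-based interval. All the numbers below are assumed purely for illustration:

```python
from math import sqrt

from scipy.stats import norm

# Assumed numbers for illustration: n = 64 observations, known sigma = 8,
# hypothesized mean mu0 = 100, observed sample mean 102.5.
n, sigma, mu0 = 64, 8.0, 100.0
sample_mean = 102.5

se = sigma / sqrt(n)          # standard error = 1.0 here
z_crit = norm.ppf(0.975)      # about 1.96 for a 95% interval

ci = (sample_mean - z_crit * se, sample_mean + z_crit * se)
p_two_sided = 2 * norm.sf(abs((sample_mean - mu0) / se))

# mu0 = 100 lies outside the 95% CI exactly when the two-sided p < 0.05.
print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p_two_sided:.4f}")
```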


Simple and Composite Hypothesis Testing

Depending on the population distribution, you can classify the statistical hypothesis into two types.

Simple Hypothesis: A simple hypothesis specifies an exact value for the parameter.

Composite Hypothesis: A composite hypothesis specifies a range of values.

A company claims that its average sales for this quarter are 1,000 units. This is an example of a simple hypothesis.

Suppose the company instead claims that sales are in the range of 900 to 1,000 units. This is a case of a composite hypothesis.

One-Tailed and Two-Tailed Hypothesis Testing

The one-tailed test, also called a directional test, considers a critical region of data that would lead to rejecting the null hypothesis if the test sample falls into it, which in turn means accepting the alternate hypothesis.

In a one-tailed test, the critical distribution area is one-sided: the test checks whether the sample statistic is greater than, or less than, a specific value.

In a two-tailed test, the critical distribution area is two-sided: the test checks whether the sample statistic is greater than or less than a range of values. If the sample falls into either critical region, the alternate hypothesis is accepted and the null hypothesis is rejected.


Right Tailed Hypothesis Testing

If the greater-than (>) sign appears in your hypothesis statement, you are using a right-tailed test, also known as an upper test; in other words, the disparity is to the right. For instance, you can compare battery life before and after a change in production. If you want to know whether battery life is now longer than the original (say, 90 hours), your hypothesis statements would be:

  • Null hypothesis: H0: μ ≤ 90 hours (battery life has not increased).
  • Alternative hypothesis: H1: μ > 90 hours (battery life has risen).

The crucial point is that the alternative hypothesis (H1), not the null hypothesis, determines whether you have a right-tailed test.

Left Tailed Hypothesis Testing

Alternative hypotheses that assert the true value of a parameter is lower than the null-hypothesis value are tested with a left-tailed test; they are indicated by the less-than sign, "<".

Suppose H0: mean = 50 and H1: mean ≠ 50.

According to H1, the mean can be greater than or less than 50. This is an example of a two-tailed test.

Similarly, if H0: mean ≥ 50, then H1: mean < 50.

Here the alternative states that the mean is less than 50, so it is a one-tailed (left-tailed) test.
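A left-tailed test such as H0: mean ≥ 50 vs. H1: mean < 50 can be run with SciPy (the `alternative` argument requires SciPy 1.6+). The sample values below are invented for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical sample; H0: mu >= 50, H1: mu < 50 (left-tailed).
sample = np.array([47.1, 49.0, 46.5, 48.2, 50.3, 45.9, 47.8, 48.8])

# alternative="less" puts the whole rejection region in the left tail.
t_stat, p_value = stats.ttest_1samp(sample, popmean=50, alternative="less")

print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.4f}")
```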

Type 1 and Type 2 Error

A hypothesis test can result in two types of errors.

Type 1 Error: A Type-I error occurs when the sample results lead to rejecting the null hypothesis even though it is true.

Type 2 Error: A Type-II error occurs when the null hypothesis is not rejected even though it is false.

Suppose a teacher evaluates the examination paper to decide whether a student passes or fails.

H0: Student has passed

H1: Student has failed

Type I error will be the teacher failing the student [rejects H0] although the student scored the passing marks [H0 was true]. 

Type II error will be the case where the teacher passes the student [does not reject H0] although the student did not score the passing marks [H1 is true].
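The meaning of the significance level can be illustrated by simulation: when H0 is true, a test at α = 0.05 should falsely reject (commit a Type I error) in roughly 5% of repetitions. This is an illustrative Monte Carlo sketch, not part of the original tutorial:

```python
import numpy as np
from scipy import stats

# When H0 (mu = 50) is actually true, a test at alpha = 0.05 should
# commit a Type I error in roughly 5% of repeated experiments.
rng = np.random.default_rng(0)
alpha, rejections, trials = 0.05, 0, 2000

for _ in range(trials):
    sample = rng.normal(loc=50, scale=5, size=30)  # H0 is true here
    if stats.ttest_1samp(sample, popmean=50).pvalue <= alpha:
        rejections += 1  # a false rejection: a Type I error

print(f"Empirical Type I error rate: {rejections / trials:.3f}")  # near 0.05
```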


Limitations of Hypothesis Testing

Hypothesis testing has some limitations that researchers should be aware of:

  • It cannot prove or establish the truth: Hypothesis testing provides evidence to support or reject a hypothesis, but it cannot confirm the absolute truth of the research question.
  • Results are sample-specific: Hypothesis testing is based on analyzing a sample from a population, and the conclusions drawn are specific to that particular sample.
  • Possible errors: During hypothesis testing, there is a chance of committing type I error (rejecting a true null hypothesis) or type II error (failing to reject a false null hypothesis).
  • Assumptions and requirements: Different tests have specific assumptions and requirements that must be met to accurately interpret results.


After reading this tutorial, you would have a much better understanding of hypothesis testing, one of the most important concepts in the field of Data Science . The majority of hypotheses are based on speculation about observed behavior, natural phenomena, or established theories.

If you are interested in the statistics of data science and the skills needed for such a career, you ought to explore the Post Graduate Program in Data Science.

If you have any questions regarding this ‘Hypothesis Testing In Statistics’ tutorial, do share them in the comment section. Our subject matter expert will respond to your queries. Happy learning!

1. What is hypothesis testing in statistics with example?

Hypothesis testing is a statistical method used to determine if there is enough evidence in a sample data to draw conclusions about a population. It involves formulating two competing hypotheses, the null hypothesis (H0) and the alternative hypothesis (Ha), and then collecting data to assess the evidence. An example: testing if a new drug improves patient recovery (Ha) compared to the standard treatment (H0) based on collected patient data.

2. What is H0 and H1 in statistics?

In statistics, H0​ and H1​ represent the null and alternative hypotheses. The null hypothesis, H0​, is the default assumption that no effect or difference exists between groups or conditions. The alternative hypothesis, H1​, is the competing claim suggesting an effect or a difference. Statistical tests determine whether to reject the null hypothesis in favor of the alternative hypothesis based on the data.

3. What is a simple hypothesis with an example?

A simple hypothesis is a specific statement predicting a single relationship between two variables. It posits a direct and uncomplicated outcome. For example, a simple hypothesis might state, "Increased sunlight exposure increases the growth rate of sunflowers." Here, the hypothesis suggests a direct relationship between the amount of sunlight (independent variable) and the growth rate of sunflowers (dependent variable), with no additional variables considered.

4. What are the 3 major types of hypothesis?

The three major types of hypotheses are:

  • Null Hypothesis (H0): Represents the default assumption, stating that there is no significant effect or relationship in the data.
  • Alternative Hypothesis (Ha): Contradicts the null hypothesis and proposes a specific effect or relationship that researchers want to investigate.
  • Nondirectional Hypothesis: An alternative hypothesis that doesn't specify the direction of the effect, leaving it open for both positive and negative possibilities.


About the Author

Avijeet Biswal

Avijeet is a Senior Research Analyst at Simplilearn. Passionate about Data Analytics, Machine Learning, and Deep Learning, Avijeet is also interested in politics, cricket, and football.



Hypothesis Testing: 4 Steps and Example


Hypothesis testing, sometimes called significance testing, is an act in statistics whereby an analyst tests an assumption regarding a population parameter. The methodology employed by the analyst depends on the nature of the data used and the reason for the analysis.

Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data. Such data may come from a larger population or a data-generating process. The word "population" will be used for both of these cases in the following descriptions.

Key Takeaways

  • Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data.
  • The test provides evidence concerning the plausibility of the hypothesis, given the data.
  • Statistical analysts test a hypothesis by measuring and examining a random sample of the population being analyzed.
  • The four steps of hypothesis testing include stating the hypotheses, formulating an analysis plan, analyzing the sample data, and analyzing the result.

How Hypothesis Testing Works

In hypothesis testing, an analyst tests a statistical sample, intending to provide evidence on the plausibility of the null hypothesis. Statistical analysts measure and examine a random sample of the population being analyzed. All analysts use a random population sample to test two different hypotheses: the null hypothesis and the alternative hypothesis.

The null hypothesis is usually a hypothesis of equality between population parameters; e.g., a null hypothesis may state that the population mean return is equal to zero. The alternative hypothesis is effectively the opposite of a null hypothesis. Thus, they are mutually exclusive, and only one can be true. However, one of the two hypotheses will always be true.

The null hypothesis is a statement about a population parameter, such as the population mean, that is assumed to be true.

All hypotheses are tested using a four-step process:

  • State the hypotheses.
  • Formulate an analysis plan, which outlines how the data will be evaluated.
  • Carry out the plan and analyze the sample data.
  • Analyze the results and either reject the null hypothesis, or state that the null hypothesis is plausible, given the data.

Example of Hypothesis Testing

If an individual wants to test that a penny has exactly a 50% chance of landing on heads, the null hypothesis would be that 50% is correct, and the alternative hypothesis would be that 50% is not correct. Mathematically, the null hypothesis is represented as Ho: P = 0.5. The alternative hypothesis is shown as "Ha" and is identical to the null hypothesis, except with the equal sign struck-through, meaning that it does not equal 50%.

A random sample of 100 coin flips is taken, and the null hypothesis is tested. If it is found that the 100 coin flips were distributed as 40 heads and 60 tails, the analyst would assume that a penny does not have a 50% chance of landing on heads and would reject the null hypothesis and accept the alternative hypothesis.

If there were 48 heads and 52 tails, then it is plausible that the coin could be fair and still produce such a result. In cases such as this where the null hypothesis is "accepted," the analyst states that the difference between the expected results (50 heads and 50 tails) and the observed results (48 heads and 52 tails) is "explainable by chance alone."
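Both coin-flip scenarios above can be checked with an exact binomial test (`scipy.stats.binomtest`, SciPy 1.7+):

```python
from scipy.stats import binomtest

# H0: the penny is fair (P(heads) = 0.5); two-sided alternative.
p_40_heads = binomtest(40, n=100, p=0.5).pvalue   # 40 heads, 60 tails
p_48_heads = binomtest(48, n=100, p=0.5).pvalue   # 48 heads, 52 tails

# 48/52 yields a large p-value, "explainable by chance alone";
# 40/60 yields a much smaller one, casting doubt on fairness.
print(f"40 heads: p = {p_40_heads:.4f}")
print(f"48 heads: p = {p_48_heads:.4f}")
```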

When Did Hypothesis Testing Begin?

Some statisticians attribute the first hypothesis tests to satirical writer John Arbuthnot in 1710, who studied male and female births in England after observing that in nearly every year, male births exceeded female births by a slight proportion. Arbuthnot calculated that the probability of this happening by chance was small, and therefore it was due to “divine providence.”

What are the Benefits of Hypothesis Testing?

Hypothesis testing helps assess the accuracy of new ideas or theories by testing them against data. This allows researchers to determine whether the evidence supports their hypothesis, helping to avoid false claims and conclusions. Hypothesis testing also provides a framework for decision-making based on data rather than personal opinions or biases. By relying on statistical analysis, hypothesis testing helps to reduce the effects of chance and confounding variables, providing a robust framework for making informed conclusions.

What are the Limitations of Hypothesis Testing?

Hypothesis testing relies exclusively on data and doesn’t provide a comprehensive understanding of the subject being studied. Additionally, the accuracy of the results depends on the quality of the available data and the statistical methods used. Inaccurate data or inappropriate hypothesis formulation may lead to incorrect conclusions or failed tests. Hypothesis testing can also lead to errors, such as analysts either accepting or rejecting a null hypothesis when they shouldn’t have. These errors may result in false conclusions or missed opportunities to identify significant patterns or relationships in the data.

The Bottom Line

Hypothesis testing refers to a statistical process that helps researchers determine the reliability of a study. By using a well-formulated hypothesis and a set of statistical tests, individuals or businesses can make inferences about the population that they are studying and draw conclusions based on the data presented. All hypothesis testing methods share the same four-step process: state the hypotheses, formulate an analysis plan, analyze the sample data, and analyze the result.

Sources

Sage. "Introduction to Hypothesis Testing," Page 4.

Elder Research. "Who Invented the Null Hypothesis?"

Formplus. "Hypothesis Testing: Definition, Uses, Limitations and Examples."



Understanding Hypothesis Testing

Hypothesis testing involves formulating assumptions about population parameters based on sample statistics and rigorously evaluating these assumptions against empirical evidence. This article sheds light on the significance of hypothesis testing and the critical steps involved in the process.

What is Hypothesis Testing?

A hypothesis is an assumption or idea, specifically a statistical claim about an unknown population parameter. For example, a judge assumes a person is innocent and verifies this by reviewing evidence and hearing testimony before reaching a verdict.

Hypothesis testing is a statistical method used to make decisions from experimental data. At its core is an assumption we make about a population parameter; the test evaluates two mutually exclusive statements about the population to determine which statement is best supported by the sample data.

To test the validity of the claim or assumption about the population parameter:

  • A sample is drawn from the population and analyzed.
  • The results of the analysis are used to decide whether the claim is true or not.
Example: You say the average height in the class is 30, or that a boy is taller than a girl. These are assumptions; we need a statistical way to test them and reach a mathematical conclusion about whether what we are assuming is true.

Defining Hypotheses

  • Null hypothesis (H0): In statistics, the null hypothesis is a general statement or default position that there is no relationship between two measured cases or no relationship among groups. In other words, it is a basic assumption made based on knowledge of the problem. Example: A company's mean production is 50 units per day, i.e., H0: μ = 50.
  • Alternative hypothesis (H1): The alternative hypothesis is the hypothesis used in hypothesis testing that is contrary to the null hypothesis. Example: The company's mean production is not equal to 50 units per day, i.e., H1: μ ≠ 50.

Key Terms of Hypothesis Testing

  • Level of significance : It refers to the degree of significance in which we accept or reject the null hypothesis. 100% accuracy is not possible for accepting a hypothesis, so we, therefore, select a level of significance that is usually 5%. This is normally denoted with  [Tex]\alpha[/Tex] and generally, it is 0.05 or 5%, which means your output should be 95% confident to give a similar kind of result in each sample.
  • P-value: The P value , or calculated probability, is the probability of finding the observed/extreme results when the null hypothesis(H0) of a study-given problem is true. If your P-value is less than the chosen significance level then you reject the null hypothesis i.e. accept that your sample claims to support the alternative hypothesis.
  • Test Statistic: The test statistic is a numerical value calculated from sample data during a hypothesis test, used to determine whether to reject the null hypothesis. It is compared to a critical value or p-value to make decisions about the statistical significance of the observed results.
  • Critical value : The critical value in statistics is a threshold or cutoff point used to determine whether to reject the null hypothesis in a hypothesis test.
  • Degrees of freedom: Degrees of freedom reflect the number of independent values available when estimating a parameter. They are related to the sample size and determine the shape of the test statistic’s distribution (for example, the t-distribution).
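These quantities can be read off a statistical distribution with a few lines of SciPy. The sketch below shows how the significance level, degrees of freedom, critical value, and p-value fit together for a two-tailed t-test; the numbers (α = 0.05, df = 9, observed t = 2.5) are purely illustrative:

```python
from scipy import stats

alpha = 0.05   # significance level
df = 9         # degrees of freedom, e.g. n - 1 for a one-sample t-test

# Critical value for a two-tailed t-test at this alpha
t_crit = stats.t.ppf(1 - alpha / 2, df)

# p-value for a hypothetical observed statistic t = 2.5 (two-tailed)
t_observed = 2.5
p_value = 2 * stats.t.sf(abs(t_observed), df)

print(round(t_crit, 3))    # ≈ 2.262
print(round(p_value, 4))
```

Because the observed statistic (2.5) exceeds the critical value (about 2.262), the corresponding p-value comes out below α, illustrating that the two decision rules agree.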

Why do we use Hypothesis Testing?

Hypothesis testing is an important procedure in statistics. It evaluates two mutually exclusive statements about a population to determine which is better supported by sample data. Whenever we say that findings are statistically significant, it is hypothesis testing that justifies the claim.

One-Tailed and Two-Tailed Test

A one-tailed test focuses on one direction, either greater than or less than a specified value. We use a one-tailed test when there is a clear directional expectation based on prior knowledge or theory. The critical region lies on only one side of the distribution curve; if the test statistic falls into this region, the null hypothesis is rejected in favor of the alternative hypothesis.

One-Tailed Test

There are two types of one-tailed test:

  • Left-Tailed (Left-Sided) Test: The alternative hypothesis asserts that the true parameter value is less than the value stated in the null hypothesis. Example: H0: μ ≥ 50 and H1: μ < 50
  • Right-Tailed (Right-Sided) Test: The alternative hypothesis asserts that the true parameter value is greater than the value stated in the null hypothesis. Example: H0: μ ≤ 50 and H1: μ > 50

Two-Tailed Test

A two-tailed test considers both directions, greater than and less than a specified value. We use a two-tailed test when there is no specific directional expectation and we want to detect any significant difference.

Example: H0: μ = 50 and H1: μ ≠ 50
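To make the distinction concrete, here is a sketch using only the standard library that computes one- and two-tailed p-values from a z-score; the value z = 2.04 is a made-up example:

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

z = 2.04  # hypothetical observed z-score

p_right = 1 - norm_cdf(z)           # right-tailed: P(Z >= z)
p_left = norm_cdf(z)                # left-tailed:  P(Z <= z)
p_two = 2 * (1 - norm_cdf(abs(z)))  # two-tailed: extreme in either direction

print(round(p_right, 4), round(p_two, 4))
```

The two-tailed p-value is exactly double the one-sided tail probability, which is why a two-tailed test demands stronger evidence in any single direction.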


What are Type 1 and Type 2 errors in Hypothesis Testing?

In hypothesis testing, Type I and Type II errors are two possible errors that researchers can make when drawing conclusions about a population based on a sample of data. These errors are associated with the decisions made regarding the null hypothesis and the alternative hypothesis.

  • Type I error: Rejecting the null hypothesis when it is actually true. The probability of a Type I error is denoted by alpha (α).
  • Type II error: Failing to reject the null hypothesis when it is actually false. The probability of a Type II error is denoted by beta (β).


Decision \ Reality          | Null Hypothesis is True        | Null Hypothesis is False
Fail to reject H0 (accept)  | Correct Decision               | Type II Error (False Negative)
Reject H0                   | Type I Error (False Positive)  | Correct Decision
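The meaning of α as the Type I error rate can be checked by simulation: when the null hypothesis is true, a test at α = 0.05 should incorrectly reject about 5% of the time. A standard-library-only sketch, with all numbers illustrative:

```python
import math
import random

def norm_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

random.seed(42)
alpha = 0.05
n, trials = 50, 2000
false_positives = 0

# H0 is true by construction: every sample is drawn from N(0, 1),
# and we test H0: mu = 0 with a two-tailed z-test (sigma = 1 known).
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    z = (sum(sample) / n) / (1 / math.sqrt(n))
    if 2 * (1 - norm_cdf(abs(z))) <= alpha:
        false_positives += 1

rate = false_positives / trials
print(rate)  # hovers near alpha = 0.05
```

Every rejection in this loop is a false positive, so the observed rejection rate estimates the Type I error probability directly.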

How does Hypothesis Testing work?

Step 1 – Define Null and Alternative Hypotheses

State the null hypothesis (H0), representing no effect, and the alternative hypothesis (H1), suggesting an effect or difference.

We first identify the problem about which we want to make an assumption, keeping in mind that the two hypotheses must contradict one another. Throughout these steps we assume the data are normally distributed.

Step 2 – Choose significance level

Select a significance level (α), typically 0.05, as the threshold for rejecting the null hypothesis. The significance level is fixed before the test is run; later, the p-value computed from the data is compared against it.

Step 3 – Collect and Analyze Data

Gather relevant data through observation or experimentation. Analyze the data using appropriate statistical methods to obtain a test statistic.

Step 4 – Calculate the Test Statistic

In this step the data are evaluated and we compute a score based on their characteristics. The choice of test statistic depends on the type of hypothesis test being conducted.

There are various hypothesis tests, each appropriate for a different goal. The statistic could come from a Z-test, Chi-square test, T-test, and so on.

  • Z-test: Used when the population mean and standard deviation are known; the Z-statistic is commonly applied.
  • t-test: Used when the population standard deviation is unknown and the sample size is small; the t-statistic is more appropriate.
  • Chi-square test: Used for categorical data or for testing independence in contingency tables.
  • F-test : F-test is often used in analysis of variance (ANOVA) to compare variances or test the equality of means across multiple groups.

Since our example dataset is small, the t-test is the more appropriate choice for testing our hypothesis.

The t-statistic measures the difference between the means of two groups relative to the variability within each group. It is calculated as the difference between the sample means divided by the standard error of that difference, and is also known as the t-value or t-score.
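As a sketch with made-up sample values, the two-sample version of this statistic can be computed by hand and checked against scipy.stats.ttest_ind:

```python
import math
import numpy as np
from scipy import stats

# Hypothetical measurements for two groups
group_a = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3])
group_b = np.array([11.2, 11.5, 11.1, 11.6, 11.4, 11.3])

# Manual t-statistic: difference of means divided by the standard
# error of that difference, using the pooled sample variance
n1, n2 = len(group_a), len(group_b)
s1, s2 = group_a.var(ddof=1), group_b.var(ddof=1)
pooled_var = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
t_manual = (group_a.mean() - group_b.mean()) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# SciPy's equal-variance two-sample t-test computes the same statistic
t_scipy, p_value = stats.ttest_ind(group_a, group_b, equal_var=True)

print(round(t_manual, 4), round(t_scipy, 4))
```

The manual and library values agree, confirming that the formula above is what the library evaluates in the equal-variance case.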

Step 5 – Compare the Test Statistic

In this stage, we decide whether to accept or reject the null hypothesis. There are two ways to make that decision.

Method A: Using Critical Values

Comparing the test statistic with the tabulated critical value (for a two-tailed test, use the absolute value of the statistic):

  • If Test Statistic > Critical Value: Reject the null hypothesis.
  • If Test Statistic ≤ Critical Value: Fail to reject the null hypothesis.

Note: Critical values are predetermined thresholds used to make the decision in hypothesis testing. They are read from a statistical distribution table, such as the normal or t-distribution table, depending on the test being used.

Method B: Using P-values

We can also come to a conclusion using the p-value:

  • If the p-value is less than or equal to the significance level (p ≤ α), reject the null hypothesis. This indicates that the observed results are unlikely to have occurred by chance alone, providing evidence in favor of the alternative hypothesis.
  • If the p-value is greater than the significance level (p > α), fail to reject the null hypothesis. This suggests that the observed results are consistent with what would be expected under the null hypothesis.

Note: The p-value is the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed in the sample, assuming the null hypothesis is true. It is obtained from the relevant distribution table (normal, t, and so on) or from statistical software.

Step 6 – Interpret the Results

Finally, we state the conclusion of the experiment using either Method A or Method B.

Calculating the Test Statistic

To validate a hypothesis about a population parameter we use statistical functions. For normally distributed data, we combine the test statistic (such as the z-score), the p-value, and the significance level (alpha) into evidence for or against the hypothesis.

1. Z-Statistic

Used when the population mean and standard deviation are known.

z = (x̄ − μ) / (σ / √n)

  • x̄ is the sample mean,
  • μ is the population mean,
  • σ is the population standard deviation,
  • n is the sample size.
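For instance, with made-up numbers (x̄ = 52.3, μ = 50, σ = 8, n = 64), the z-statistic and its two-tailed p-value work out as:

```python
import math

# Hypothetical values: known population sigma, one sample
x_bar, mu, sigma, n = 52.3, 50.0, 8.0, 64

z = (x_bar - mu) / (sigma / math.sqrt(n))   # 2.3 / 1.0 = 2.3
# Two-tailed p-value from the standard normal CDF (via the error function)
p_two = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
print(round(z, 2), round(p_two, 4))
```

Since the p-value is below 0.05, the null hypothesis μ = 50 would be rejected at the usual significance level.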

2. T-Statistic

The t-test is used when the sample is small (n < 30) or the population standard deviation is unknown; the t-statistic is given by:

t = (x̄ − μ) / (s / √n)

  • t = t-score,
  • x̄ = sample mean
  • μ = population mean,
  • s = sample standard deviation,
  • n = sample size

3. Chi-Square Test

The chi-square test for independence applies to categorical (non-normally distributed) data:

χ² = Σᵢⱼ (Oᵢⱼ − Eᵢⱼ)² / Eᵢⱼ

  • Oᵢⱼ is the observed frequency in cell (i, j),
  • i, j are the row and column indices, respectively,
  • Eᵢⱼ is the expected frequency in cell (i, j), calculated as (Row total × Column total) / Total observations.
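A sketch with a hypothetical 2×2 contingency table, checking scipy.stats.chi2_contingency against the formula above:

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 contingency table: treatment group vs. outcome
observed = np.array([[30, 10],
                     [20, 40]])

# Expected counts: (row total x column total) / grand total
expected_manual = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
chi2_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()

# correction=False disables Yates' continuity correction so the
# library result matches the plain formula above
chi2, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)

print(round(chi2_manual, 4), round(chi2, 4), dof)
```

For a 2×2 table the degrees of freedom are (rows − 1) × (columns − 1) = 1, and the manual and library chi-square values agree.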

Real life Examples of Hypothesis Testing

Let’s examine hypothesis testing using two real life situations,

Case A: Does a New Drug Affect Blood Pressure?

Imagine a pharmaceutical company has developed a new drug that they believe can effectively lower blood pressure in patients with hypertension. Before bringing the drug to market, they need to conduct a study to assess its impact on blood pressure.

  • Before Treatment: 120, 122, 118, 130, 125, 128, 115, 121, 123, 119
  • After Treatment: 115, 120, 112, 128, 122, 125, 110, 117, 119, 114

Step 1: Define the Hypothesis

  • Null Hypothesis (H0): The new drug has no effect on blood pressure.
  • Alternate Hypothesis (H1): The new drug has an effect on blood pressure.

Step 2: Define the Significance Level

Let’s set the significance level at 0.05: the null hypothesis is rejected if the evidence suggests less than a 5% chance of observing the results due to random variation alone.

Step 3: Compute the Test Statistic

Using a paired t-test, analyze the data to obtain a test statistic and a p-value.

The test statistic (e.g., T-statistic) is calculated based on the differences between blood pressure measurements before and after treatment.

t = m / (s / √n)

  • m = mean of the differences dᵢ = X_after,i − X_before,i
  • s = standard deviation of the differences
  • n = sample size

Here m = −3.9, s ≈ 1.37, and n = 10, which gives a t-statistic of −9 using the paired t-test formula.

Step 4: Find the p-value

With a calculated t-statistic of −9 and degrees of freedom df = 9, you can find the p-value using statistical software or a t-distribution table.

Thus, p-value ≈ 8.54e-06.

Step 5: Result

  • If the p-value is less than or equal to 0.05, the researchers reject the null hypothesis.
  • If the p-value is greater than 0.05, they fail to reject the null hypothesis.

Conclusion: Since the p-value (≈ 8.54e-06) is less than the significance level (0.05), the researchers reject the null hypothesis. There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

Python Implementation of Case A

Let’s carry out the hypothesis test in Python, testing whether the new drug affects blood pressure. For this example we use a paired t-test from the scipy.stats library.

SciPy is a scientific computing library for Python that is widely used for mathematical and statistical computations.

We will implement our first real-life problem in Python:

import numpy as np
from scipy import stats

# Data
before_treatment = np.array([120, 122, 118, 130, 125, 128, 115, 121, 123, 119])
after_treatment = np.array([115, 120, 112, 128, 122, 125, 110, 117, 119, 114])

# Step 1: Null and Alternate Hypotheses
# Null Hypothesis: The new drug has no effect on blood pressure.
# Alternate Hypothesis: The new drug has an effect on blood pressure.
null_hypothesis = "The new drug has no effect on blood pressure."
alternate_hypothesis = "The new drug has an effect on blood pressure."

# Step 2: Significance Level
alpha = 0.05

# Step 3: Paired T-test
t_statistic, p_value = stats.ttest_rel(after_treatment, before_treatment)

# Step 4: Calculate T-statistic manually
m = np.mean(after_treatment - before_treatment)
s = np.std(after_treatment - before_treatment, ddof=1)  # ddof=1 for sample standard deviation
n = len(before_treatment)
t_statistic_manual = m / (s / np.sqrt(n))

# Step 5: Decision
if p_value <= alpha:
    decision = "Reject"
else:
    decision = "Fail to reject"

# Conclusion
if decision == "Reject":
    conclusion = ("There is statistically significant evidence that the average blood "
                  "pressure before and after treatment with the new drug is different.")
else:
    conclusion = ("There is insufficient evidence to claim a significant difference in "
                  "average blood pressure before and after treatment with the new drug.")

# Display results
print("T-statistic (from scipy):", t_statistic)
print("P-value (from scipy):", p_value)
print("T-statistic (calculated manually):", t_statistic_manual)
print(f"Decision: {decision} the null hypothesis at alpha={alpha}.")
print("Conclusion:", conclusion)

T-statistic (from scipy): -9.0
P-value (from scipy): 8.538051223166285e-06
T-statistic (calculated manually): -9.0
Decision: Reject the null hypothesis at alpha=0.05.
Conclusion: There is statistically significant evidence that the average blood pressure before and after treatment with the new drug is different.

In the above example, given the T-statistic of approximately -9 and an extremely small p-value, the results indicate a strong case to reject the null hypothesis at a significance level of 0.05. 

  • The results suggest that the new drug, treatment, or intervention has a significant effect on lowering blood pressure.
  • The negative t-statistic indicates that the mean blood pressure after treatment is significantly lower than the mean blood pressure before treatment.

Case B: Cholesterol Level in a Population

Data: A sample of 25 individuals is taken, and their cholesterol levels are measured.

Cholesterol Levels (mg/dL): 205, 198, 210, 190, 215, 205, 200, 192, 198, 205, 198, 202, 208, 200, 205, 198, 205, 210, 192, 205, 198, 205, 210, 192, 205.

Population Mean (μ): 200 mg/dL

Population Standard Deviation (σ): 5 mg/dL (given for this problem)

Step 1: Define the Hypothesis

  • Null Hypothesis (H 0 ): The average cholesterol level in a population is 200 mg/dL.
  • Alternate Hypothesis (H 1 ): The average cholesterol level in a population is different from 200 mg/dL.

Step 2: Define the Significance Level

As the direction of deviation is not specified, we use a two-tailed test. For a significance level of 0.05 (two-tailed), the critical values from the standard normal (z) table are approximately −1.96 and 1.96.

Step 3: Compute the Test Statistic

The sample mean is 202.04, so the z formula gives Z = (202.04 − 200) / (5 / √25) = 2.04.

Step 4: Result

Since the absolute value of the test statistic (2.04) is greater than the critical value (1.96), we reject the null hypothesis and conclude that there is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Python Implementation of Case B

import math
import numpy as np
import scipy.stats as stats

# Given data
sample_data = np.array(
    [205, 198, 210, 190, 215, 205, 200, 192, 198, 205,
     198, 202, 208, 200, 205, 198, 205, 210, 192, 205,
     198, 205, 210, 192, 205])
population_std_dev = 5
population_mean = 200
sample_size = len(sample_data)

# Step 1: Define the Hypotheses
# Null Hypothesis (H0): The average cholesterol level in a population is 200 mg/dL.
# Alternate Hypothesis (H1): The average cholesterol level in a population is
# different from 200 mg/dL.

# Step 2: Define the Significance Level
alpha = 0.05  # Two-tailed test

# Critical values for a significance level of 0.05 (two-tailed)
critical_value_left = stats.norm.ppf(alpha / 2)
critical_value_right = -critical_value_left

# Step 3: Compute the test statistic
sample_mean = sample_data.mean()
z_score = (sample_mean - population_mean) / \
    (population_std_dev / math.sqrt(sample_size))

# Step 4: Result
# Check whether the absolute value of the test statistic exceeds the critical value
if abs(z_score) > critical_value_right:
    print("Reject the null hypothesis.")
    print("There is statistically significant evidence that the average cholesterol "
          "level in the population is different from 200 mg/dL.")
else:
    print("Fail to reject the null hypothesis.")
    print("There is not enough evidence to conclude that the average cholesterol "
          "level in the population is different from 200 mg/dL.")

Reject the null hypothesis.
There is statistically significant evidence that the average cholesterol level in the population is different from 200 mg/dL.

Limitations of Hypothesis Testing

  • Although useful, hypothesis testing does not offer a comprehensive grasp of the topic being studied. It concentrates on specific hypotheses and statistical significance without fully reflecting the intricacy or whole context of the phenomenon.
  • The accuracy of hypothesis testing results is contingent on the quality of available data and the appropriateness of statistical methods used. Inaccurate data or poorly formulated hypotheses can lead to incorrect conclusions.
  • Relying solely on hypothesis testing may cause analysts to overlook significant patterns or relationships in the data that are not captured by the specific hypotheses being tested. This limitation underscores the importance of complementing hypothesis testing with other analytical approaches.

Hypothesis testing stands as a cornerstone in statistical analysis, enabling data scientists to navigate uncertainties and draw credible inferences from sample data. By systematically defining null and alternative hypotheses, choosing significance levels, and leveraging statistical tests, researchers can assess the validity of their assumptions. The article also elucidates the critical distinction between Type I and Type II errors, providing a comprehensive understanding of the nuanced decision-making process inherent in hypothesis testing. The real-life example of testing a new drug’s effect on blood pressure using a paired T-test showcases the practical application of these principles, underscoring the importance of statistical rigor in data-driven decision-making.

Frequently Asked Questions (FAQs)

1. What are the 3 types of hypothesis tests?

There are three types of hypothesis tests: right-tailed, left-tailed, and two-tailed. Right-tailed tests assess if a parameter is greater, left-tailed if lesser. Two-tailed tests check for non-directional differences, greater or lesser.

2. What are the 4 components of hypothesis testing?

  • Null Hypothesis (H0): No effect or difference exists.
  • Alternative Hypothesis (H1): An effect or difference exists.
  • Significance Level (α): The risk of rejecting the null hypothesis when it is true (Type I error).
  • Test Statistic: A numerical value representing the observed evidence against the null hypothesis.

3. What is hypothesis testing in ML?

A statistical method to evaluate the performance and validity of machine learning models. It tests specific hypotheses about model behavior, such as whether features influence predictions or whether a model generalizes well to unseen data.

4. What is the difference between Pytest and Hypothesis in Python?

Pytest is a general-purpose testing framework for Python code, while Hypothesis is a property-based testing framework for Python that focuses on generating test cases from specified properties of the code.



Annual Review of Psychology, Vol. 48:1-30 (published February 1997)

Review Article: Creative Hypothesis Generating in Psychology: Some Useful Heuristics

William J. McGuire, Department of Psychology, Yale University, New Haven, Connecticut

https://doi.org/10.1146/annurev.psych.48.1.1 © Annual Reviews

To correct a common imbalance in methodology courses, focusing almost entirely on hypothesis-testing issues to the neglect of hypothesis-generating issues which are at least as important, 49 creative heuristics are described, divided into 5 categories and 14 subcategories. Each of these heuristics has often been used to generate hypotheses in psychological research, and each is teachable to students. The 49 heuristics range from common sense perceptiveness of the oddity of natural occurrences to use of sophisticated quantitative data analyses in ways that provoke new insights.





How to Write a Great Hypothesis

Hypothesis Definition, Format, Examples, and Tips


A hypothesis is a tentative statement about the relationship between two or more variables. It is a specific, testable prediction about what you expect to happen in a study. It is a preliminary answer to your question that helps guide the research process.

Consider a study designed to examine the relationship between sleep deprivation and test performance. The hypothesis might be: "This study is designed to assess the hypothesis that sleep-deprived people will perform worse on a test than individuals who are not sleep-deprived."

At a Glance

A hypothesis is crucial to scientific research because it offers a clear direction for what the researchers are looking to find. This allows them to design experiments to test their predictions and add to our scientific knowledge about the world. This article explores how a hypothesis is used in psychology research, how to write a good hypothesis, and the different types of hypotheses you might use.

The Hypothesis in the Scientific Method

In the scientific method , whether it involves research in psychology, biology, or some other area, a hypothesis represents what the researchers think will happen in an experiment. The scientific method involves the following steps:

  • Forming a question
  • Performing background research
  • Creating a hypothesis
  • Designing an experiment
  • Collecting data
  • Analyzing the results
  • Drawing conclusions
  • Communicating the results

The hypothesis is a prediction, but it involves more than a guess. Most of the time, the hypothesis begins with a question which is then explored through background research. At this point, researchers then begin to develop a testable hypothesis.

Unless you are creating an exploratory study, your hypothesis should always explain what you  expect  to happen.

In a study exploring the effects of a particular drug, the hypothesis might be that researchers expect the drug to have some type of effect on the symptoms of a specific illness. In psychology, the hypothesis might focus on how a certain aspect of the environment might influence a particular behavior.

Remember, a hypothesis does not have to be correct. While the hypothesis predicts what the researchers expect to see, the goal of the research is to determine whether this guess is right or wrong. When conducting an experiment, researchers might explore numerous factors to determine which ones might contribute to the ultimate outcome.

In many cases, researchers may find that the results of an experiment  do not  support the original hypothesis. When writing up these results, the researchers might suggest other options that should be explored in future studies.

In many cases, researchers might draw a hypothesis from a specific theory or build on previous research. For example, prior research has shown that stress can impact the immune system. So a researcher might hypothesize: "People with high-stress levels will be more likely to contract a common cold after being exposed to the virus than people who have low-stress levels."

In other instances, researchers might look at commonly held beliefs or folk wisdom. "Birds of a feather flock together" is one example of folk adage that a psychologist might try to investigate. The researcher might pose a specific hypothesis that "People tend to select romantic partners who are similar to them in interests and educational level."

Elements of a Good Hypothesis

So how do you write a good hypothesis? When trying to come up with a hypothesis for your research or experiments, ask yourself the following questions:

  • Is your hypothesis based on your research on a topic?
  • Can your hypothesis be tested?
  • Does your hypothesis include independent and dependent variables?

Before you come up with a specific hypothesis, spend some time doing background research. Once you have completed a literature review, start thinking about potential questions you still have. Pay attention to the discussion section in the  journal articles you read . Many authors will suggest questions that still need to be explored.

How to Formulate a Good Hypothesis

To form a hypothesis, you should take these steps:

  • Collect as many observations about a topic or problem as you can.
  • Evaluate these observations and look for possible causes of the problem.
  • Create a list of possible explanations that you might want to explore.
  • After you have developed some possible hypotheses, think of ways that you could confirm or disprove each hypothesis through experimentation. This is known as falsifiability.

In the scientific method, falsifiability is an important part of any valid hypothesis. In order to test a claim scientifically, it must be possible that the claim could be proven false.

Students sometimes confuse the idea of falsifiability with the idea that it means that something is false, which is not the case. What falsifiability means is that  if  something was false, then it is possible to demonstrate that it is false.

One of the hallmarks of pseudoscience is that it makes claims that cannot be refuted or proven false.

The Importance of Operational Definitions

A variable is a factor or element that can be changed and manipulated in ways that are observable and measurable. However, the researcher must also define how the variable will be manipulated and measured in the study.

Operational definitions are specific definitions for all relevant factors in a study. This process helps make vague or ambiguous concepts detailed and measurable.

For example, a researcher might operationally define the variable " test anxiety " as the results of a self-report measure of anxiety experienced during an exam. A "study habits" variable might be defined by the amount of studying that actually occurs as measured by time.

These precise descriptions are important because many things can be measured in various ways. Clearly defining these variables and how they are measured helps ensure that other researchers can replicate your results.

Replicability

One of the basic principles of any type of scientific research is that the results must be replicable.

Replication means repeating an experiment in the same way to produce the same results. By clearly detailing the specifics of how the variables were measured and manipulated, other researchers can better understand the results and repeat the study if needed.

Some variables are more difficult than others to define. For example, how would you operationally define a variable such as aggression ? For obvious ethical reasons, researchers cannot create a situation in which a person behaves aggressively toward others.

To measure this variable, the researcher must devise a measurement that assesses aggressive behavior without harming others. The researcher might utilize a simulated task to measure aggressiveness in this situation.

Hypothesis Checklist

  • Does your hypothesis focus on something that you can actually test?
  • Does your hypothesis include both an independent and dependent variable?
  • Can you manipulate the variables?
  • Can your hypothesis be tested without violating ethical standards?

The hypothesis you use will depend on what you are investigating and hoping to find. Some of the main types of hypotheses that you might use include:

  • Simple hypothesis : This type of hypothesis suggests there is a relationship between one independent variable and one dependent variable.
  • Complex hypothesis : This type suggests a relationship between three or more variables, such as two independent variables and one dependent variable.
  • Null hypothesis : This hypothesis suggests no relationship exists between two or more variables.
  • Alternative hypothesis : This hypothesis states that a relationship does exist between the variables, contradicting the null hypothesis.
  • Statistical hypothesis : This hypothesis uses statistical analysis to evaluate a representative population sample and then generalizes the findings to the larger group.
  • Logical hypothesis : This hypothesis assumes a relationship between variables without collecting data or evidence.

A hypothesis often follows a basic format of "If {this happens} then {this will happen}." One way to structure your hypothesis is to describe what will happen to the  dependent variable  if you change the  independent variable .

The basic format might be: "If {these changes are made to a certain independent variable}, then we will observe {a change in a specific dependent variable}."

A few examples of simple hypotheses:

  • "Students who eat breakfast will perform better on a math exam than students who do not eat breakfast."
  • "Students who experience test anxiety before an English exam will get lower scores than students who do not experience test anxiety."​
  • "Motorists who talk on the phone while driving will be more likely to make errors on a driving course than those who do not talk on the phone."
  • "Children who receive a new reading intervention will have higher reading scores than students who do not receive the intervention."

Examples of a complex hypothesis include:

  • "People with high-sugar diets and sedentary activity levels are more likely to develop depression."
  • "Younger people who are regularly exposed to green, outdoor areas have better subjective well-being than older adults who have limited exposure to green spaces."

Examples of a null hypothesis include:

  • "There is no difference in anxiety levels between people who take St. John's wort supplements and those who do not."
  • "There is no difference in scores on a memory recall task between children and adults."
  • "There is no difference in aggression levels between children who play first-person shooter games and those who do not."

Examples of an alternative hypothesis:

  • "People who take St. John's wort supplements will have less anxiety than those who do not."
  • "Adults will perform better on a memory task than children."
  • "Children who play first-person shooter games will show higher levels of aggression than children who do not." 

Collecting Data on Your Hypothesis

Once a researcher has formed a testable hypothesis, the next step is to select a research design and start collecting data. The research method depends largely on exactly what they are studying. There are two basic types of research methods: descriptive research and experimental research.

Descriptive Research Methods

Descriptive research such as  case studies ,  naturalistic observations , and surveys are often used when  conducting an experiment is difficult or impossible. These methods are best used to describe different aspects of a behavior or psychological phenomenon.

Once a researcher has collected data using descriptive methods, a  correlational study  can examine how the variables are related. This research method might be used to investigate a hypothesis that is difficult to test experimentally.

Experimental Research Methods

Experimental methods  are used to demonstrate causal relationships between variables. In an experiment, the researcher systematically manipulates a variable of interest (known as the independent variable) and measures the effect on another variable (known as the dependent variable).

Unlike correlational studies, which can only be used to determine if there is a relationship between two variables, experimental methods can be used to determine the actual nature of the relationship—whether changes in one variable actually  cause  another to change.

The hypothesis is a critical part of any scientific exploration. It represents what researchers expect to find in a study or experiment. In situations where the hypothesis is unsupported by the research, the research still has value. Such research helps us better understand how different aspects of the natural world relate to one another. It also helps us develop new hypotheses that can then be tested in the future.


By Kendra Cherry, MSEd. Kendra Cherry is a psychosocial rehabilitation specialist, psychology educator, and author of the "Everything Psychology Book."


Research hypothesis: What it is, how to write it, types, and examples



Any research begins with a research question and a research hypothesis. A research question alone may not suffice to design the experiment(s) needed to answer it. A hypothesis is central to the scientific method. But what is a hypothesis? A hypothesis is a testable statement that proposes a possible explanation for a phenomenon, and it may include a prediction. Next, you may ask what a research hypothesis is. Simply put, a research hypothesis is a prediction or educated guess about the relationship between the variables that you want to investigate.

It is important to be thorough when developing your research hypothesis. Shortcomings in the framing of a hypothesis can affect the study design and the results. A better understanding of the research hypothesis definition and characteristics of a good hypothesis will make it easier for you to develop your own hypothesis for your research. Let’s dive in to know more about the types of research hypothesis , how to write a research hypothesis , and some research hypothesis examples .  

Table of Contents

What is a hypothesis?

A hypothesis is based on the existing body of knowledge in a study area. Framed before the data are collected, a hypothesis states the tentative relationship between independent and dependent variables, along with a prediction of the outcome.  

What is a research hypothesis?

Young researchers starting out on their journey are usually brimming with questions like “What is a hypothesis?”, “What is a research hypothesis?”, and “How can I write a good research hypothesis?”

A research hypothesis is a statement that proposes a possible explanation for an observable phenomenon or pattern. It guides the direction of a study and predicts the outcome of the investigation. A research hypothesis is testable, i.e., it can be supported or disproven through experimentation or observation.     


Characteristics of a good hypothesis  

Here are the characteristics of a good hypothesis :  

  • Clearly formulated and free of language errors and ambiguity  
  • Concise and not unnecessarily verbose  
  • Has clearly defined variables  
  • Testable and stated in a way that allows for it to be disproven  
  • Can be tested using a research design that is feasible, ethical, and practical   
  • Specific and relevant to the research problem  
  • Rooted in a thorough literature search  
  • Can generate new knowledge or understanding.  

How to create an effective research hypothesis  

A study begins with the formulation of a research question. A researcher then performs background research. This background information forms the basis for building a good research hypothesis . The researcher then performs experiments, collects, and analyzes the data, interprets the findings, and ultimately, determines if the findings support or negate the original hypothesis.  

Let’s look at each step for creating an effective, testable, and good research hypothesis :  

  • Identify a research problem or question: Start by identifying a specific research problem.   
  • Review the literature: Conduct an in-depth review of the existing literature related to the research problem to grasp the current knowledge and gaps in the field.   
  • Formulate a clear and testable hypothesis : Based on the research question, use existing knowledge to form a clear and testable hypothesis . The hypothesis should state a predicted relationship between two or more variables that can be measured and manipulated. Improve the original draft till it is clear and meaningful.  
  • State the null hypothesis: The null hypothesis is a statement that there is no relationship between the variables you are studying.   
  • Define the population and sample: Clearly define the population you are studying and the sample you will be using for your research.  
  • Select appropriate methods for testing the hypothesis: Select appropriate research methods, such as experiments, surveys, or observational studies, which will allow you to test your research hypothesis .  

Remember that creating a research hypothesis is an iterative process, i.e., you might have to revise it based on the data you collect. You may need to test and reject several hypotheses before answering the research problem.  

How to write a research hypothesis  

When you start writing a research hypothesis , you use an “if–then” statement format, which states the predicted relationship between two or more variables. Clearly identify the independent variables (the variables being changed) and the dependent variables (the variables being measured), as well as the population you are studying. Review and revise your hypothesis as needed.  

An example of a research hypothesis in this format is as follows:  

“ If [athletes] follow [cold water showers daily], then their [endurance] increases.”  

Population: athletes  

Independent variable: daily cold water showers  

Dependent variable: endurance  
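Purely as an illustrative sketch (the class name, field names, and the "exposed to" wording are hypothetical, not from this article), the population/variable breakdown above can be captured in a small structure that renders the if–then template:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """A hypothesis broken into the parts identified above."""
    population: str        # who is being studied
    independent_var: str   # the variable being changed
    dependent_var: str     # the variable being measured
    predicted_change: str  # the expected effect, e.g., "increases"

    def as_if_then(self) -> str:
        # Render the "If {IV change}, then {DV change}" template.
        return (f"If [{self.population}] are exposed to [{self.independent_var}], "
                f"then their [{self.dependent_var}] {self.predicted_change}.")

h = Hypothesis("athletes", "daily cold water showers", "endurance", "increases")
print(h.as_if_then())
# → If [athletes] are exposed to [daily cold water showers], then their [endurance] increases.
```

Writing the pieces out separately like this makes it easy to check that each part (population, independent variable, dependent variable, predicted change) is actually present before the hypothesis is finalized.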

You may have understood the characteristics of a good hypothesis . But note that a research hypothesis is not always confirmed; a researcher should be prepared to accept or reject the hypothesis based on the study findings.  


Research hypothesis checklist  

Following from above, here is a 10-point checklist for a good research hypothesis :  

  • Testable: A research hypothesis should be able to be tested via experimentation or observation.  
  • Specific: A research hypothesis should clearly state the relationship between the variables being studied.  
  • Based on prior research: A research hypothesis should be based on existing knowledge and previous research in the field.  
  • Falsifiable: A research hypothesis should be able to be disproven through testing.  
  • Clear and concise: A research hypothesis should be stated in a clear and concise manner.  
  • Logical: A research hypothesis should be logical and consistent with current understanding of the subject.  
  • Relevant: A research hypothesis should be relevant to the research question and objectives.  
  • Feasible: A research hypothesis should be feasible to test within the scope of the study.  
  • Reflects the population: A research hypothesis should consider the population or sample being studied.  
  • Uncomplicated: A good research hypothesis is written in a way that is easy for the target audience to understand.  

By following this research hypothesis checklist , you will be able to create a research hypothesis that is strong, well-constructed, and more likely to yield meaningful results.  


Types of research hypothesis  

Different types of research hypothesis are used in scientific research:  

1. Null hypothesis:

A null hypothesis states that there is no change in the dependent variable due to changes to the independent variable. This means that the results are due to chance and are not significant. A null hypothesis is denoted as H0 and is stated as the opposite of what the alternative hypothesis states.   

Example: “ The newly identified virus is not zoonotic .”  

2. Alternative hypothesis:

This states that there is a significant difference or relationship between the variables being studied. It is denoted as H1 or Ha and is accepted when the evidence leads researchers to reject the null hypothesis.  

Example: “ The newly identified virus is zoonotic .”  

3. Directional hypothesis :

This specifies the direction of the relationship or difference between variables; therefore, it tends to use terms like increase, decrease, positive, negative, more, or less.   

Example: “ The inclusion of intervention X decreases infant mortality compared to the original treatment .”   

4. Non-directional hypothesis:

A non-directional hypothesis states that a relationship or difference exists between variables, but it does not predict the direction, nature, or magnitude of that relationship. A non-directional hypothesis may be used when there is no underlying theory or when findings contradict previous research.  

Example: “ Cats and dogs differ in the amount of affection they express .”  

5. Simple hypothesis :

A simple hypothesis predicts the relationship between a single independent variable and a single dependent variable.  

Example: “ Applying sunscreen every day slows skin aging .”  

6. Complex hypothesis :

A complex hypothesis states the relationship or difference between two or more independent and dependent variables.   

Example: “ Applying sunscreen every day slows skin aging, reduces sunburn, and reduces the chances of skin cancer .” (Here, the three dependent variables are slowing skin aging, reducing sunburn, and reducing the chances of skin cancer.)  

7. Associative hypothesis:  

An associative hypothesis states that when one variable changes, the other variable changes with it. The associative hypothesis defines an interdependency between the variables without asserting that one causes the other.  

Example: “ There is a positive association between physical activity levels and overall health .”  

8. Causal hypothesis:

A causal hypothesis proposes a cause-and-effect interaction between variables.  

Example: “ Long-term alcohol use causes liver damage .”  

Note that some of the types of research hypothesis mentioned above might overlap. The types of hypothesis chosen will depend on the research question and the objective of the study.  


Research hypothesis examples  

Here are some good research hypothesis examples :  

“The use of a specific type of therapy will lead to a reduction in symptoms of depression in individuals with a history of major depressive disorder.”  

“Providing educational interventions on healthy eating habits will result in weight loss in overweight individuals.”  

“Plants that are exposed to certain types of music will grow taller than those that are not exposed to music.”  

“The use of the plant growth regulator X will lead to an increase in the number of flowers produced by plants.”  

Characteristics that make a research hypothesis weak are unclear variables, unoriginality, being too general or too vague, and being untestable. A weak hypothesis leads to weak research and improper methods.   

Some bad research hypothesis examples (and the reasons why they are “bad”) are as follows:  

“This study will show that treatment X is better than any other treatment . ” (This statement is not testable, too broad, and does not consider other treatments that may be effective.)  

“This study will prove that this type of therapy is effective for all mental disorders . ” (This statement is too broad and not testable as mental disorders are complex and different disorders may respond differently to different types of therapy.)  

“Plants can communicate with each other through telepathy . ” (This statement is not testable and lacks a scientific basis.)  

Importance of testable hypothesis  

If a research hypothesis is not testable, the results will not prove or disprove anything meaningful. The conclusions will be vague at best. A testable hypothesis helps a researcher focus on the study outcome and understand the implication of the question and the different variables involved. A testable hypothesis helps a researcher make precise predictions based on prior research.  

To be considered testable, there must be a way to prove that the hypothesis is true or false; further, the results of the hypothesis must be reproducible.  


Frequently Asked Questions (FAQs) on research hypothesis  

1. What is the difference between research question and research hypothesis ?  

A research question defines the problem and helps outline the study objective(s). It is an open-ended statement that is exploratory or probing in nature. Therefore, it does not make predictions or assumptions. It helps a researcher identify what information to collect. A research hypothesis , however, is a specific, testable prediction about the relationship between variables. Accordingly, it guides the study design and data analysis approach.

2. When to reject null hypothesis ?

A null hypothesis should be rejected when the evidence from a statistical test shows that it is unlikely to be true. This happens when the p-value derived from the test statistic is less than the defined significance level (e.g., 0.05). Rejecting the null hypothesis does not necessarily mean that the alternative hypothesis is true; it simply means that the evidence found is not compatible with the null hypothesis.  
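To make this decision rule concrete, here is a minimal sketch that assumes made-up scores for two groups, a two-sided exact permutation test (one of several valid tests one could choose), and a significance level of 0.05; none of these choices come from the article:

```python
import itertools
import statistics

# Hypothetical exam scores for two small groups (e.g., breakfast vs. no breakfast)
group_a = [78, 85, 90, 74, 88]
group_b = [70, 72, 81, 68, 75]

observed_diff = statistics.mean(group_a) - statistics.mean(group_b)

# Exact permutation test: under H0 the group labels are exchangeable, so we
# recompute the mean difference for every possible relabeling of the scores.
pooled = group_a + group_b
indices = range(len(pooled))
labelings = list(itertools.combinations(indices, len(group_a)))

extreme = 0
for chosen in labelings:
    perm_a = [pooled[i] for i in chosen]
    perm_b = [pooled[i] for i in indices if i not in chosen]
    diff = statistics.mean(perm_a) - statistics.mean(perm_b)
    if abs(diff) >= abs(observed_diff):  # two-sided: as extreme in either direction
        extreme += 1

p_value = extreme / len(labelings)
alpha = 0.05  # the defined significance level
print(f"p = {p_value:.3f}; reject H0: {p_value < alpha}")
```

If the printed p-value falls below alpha, the null hypothesis of "no difference" is rejected; otherwise the data are considered compatible with it.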

3. How can I be sure my hypothesis is testable?  

A testable hypothesis should be specific and measurable, and it should state a clear relationship between variables that can be tested with data. To ensure that your hypothesis is testable, consider the following:  

  • Clearly define the key variables in your hypothesis. You should be able to measure and manipulate these variables in a way that allows you to test the hypothesis.  
  • The hypothesis should predict a specific outcome or relationship between variables that can be measured or quantified.   
  • You should be able to collect the necessary data within the constraints of your study.  
  • It should be possible for other researchers to replicate your study, using the same methods and variables.   
  • Your hypothesis should be testable by using appropriate statistical analysis techniques, so you can draw conclusions, and make inferences about the population from the sample data.  
  • The hypothesis should be able to be disproven or rejected through the collection of data.  

4. How do I revise my research hypothesis if my data does not support it?  

If your data does not support your research hypothesis , you will need to revise it or develop a new one. You should examine your data carefully and identify any patterns or anomalies, re-examine your research question, and/or revisit your theory to look for any alternative explanations for your results. Based on your review of the data, literature, and theories, modify your research hypothesis to better align it with the results you obtained. Use your revised hypothesis to guide your research design and data collection. It is important to remain objective throughout the process.  

5. I am performing exploratory research. Do I need to formulate a research hypothesis?  

As opposed to “confirmatory” research, where a researcher has some idea about the relationship between the variables under investigation, exploratory research (or hypothesis-generating research) looks into a completely new topic about which limited information is available. Therefore, the researcher will not have any prior hypotheses. In such cases, a researcher will need to develop a post-hoc hypothesis, i.e., a research hypothesis generated after the results are known.

6. How is a research hypothesis different from a research question?

A research question is an inquiry about a specific topic or phenomenon, typically expressed as a question. It seeks to explore and understand a particular aspect of the research subject. In contrast, a research hypothesis is a specific statement or prediction that suggests an expected relationship between variables. It is formulated based on existing knowledge or theories and guides the research design and data analysis.

7. Can a research hypothesis change during the research process?

Yes, research hypotheses can change during the research process. As researchers collect and analyze data, new insights and information may emerge that require modification or refinement of the initial hypotheses. This can be due to unexpected findings, limitations in the original hypotheses, or the need to explore additional dimensions of the research topic. Flexibility is crucial in research, allowing for adaptation and adjustment of hypotheses to align with the evolving understanding of the subject matter.

8. How many hypotheses should be included in a research study?

The number of research hypotheses in a research study varies depending on the nature and scope of the research. It is not necessary to have multiple hypotheses in every study. Some studies may have only one primary hypothesis, while others may have several related hypotheses. The number of hypotheses should be determined based on the research objectives, research questions, and the complexity of the research topic. It is important to ensure that the hypotheses are focused, testable, and directly related to the research aims.

9. Can research hypotheses be used in qualitative research?

Yes, research hypotheses can be used in qualitative research, although they are more commonly associated with quantitative research. In qualitative research, hypotheses may be formulated as tentative or exploratory statements that guide the investigation. Instead of testing hypotheses through statistical analysis, qualitative researchers may use the hypotheses to guide data collection and analysis, seeking to uncover patterns, themes, or relationships within the qualitative data. The emphasis in qualitative research is often on generating insights and understanding rather than confirming or rejecting specific research hypotheses through statistical testing.



  • Open access
  • Published: 22 August 2024

CD19 CAR T cells for B cell malignancies: a systematic review and meta-analysis focused on clinical impacts of CAR structural domains, manufacturing conditions, cellular product, doses, patient’s age, and tumor types

  • Erik Montagna (ORCID: orcid.org/0000-0001-6834-0261)
  • Najla Santos Pacheco de Campos (ORCID: orcid.org/0000-0002-2019-5947)
  • Victoria Alves Porto (ORCID: orcid.org/0000-0003-3640-427X)
  • Giselle Correia Próspero da Silva (ORCID: orcid.org/0000-0002-5679-9734)
  • Eloah Rabello Suarez (ORCID: orcid.org/0000-0002-0521-1056)

BMC Cancer, volume 24, Article number: 1037 (2024)


CD19-targeted chimeric antigen receptor (CAR) T cells are one of the most remarkable cellular therapies for managing B cell malignancies. However, long-term disease-free survival is still a challenge to overcome. Here, we evaluated the influence of different hinge, transmembrane (TM), and costimulatory CAR domains, as well as manufacturing conditions, cellular product type, doses, patient age, and tumor type, on the clinical outcomes of patients with B cell cancers treated with CD19 CAR T cells. The primary outcome was defined as the best complete response (BCR), and the secondary outcomes were the best objective response (BOR) and 12-month overall survival (OS). The covariates considered were the type of hinge, TM, and costimulatory domains in the CAR, CAR T cell manufacturing conditions, the cell population transduced with the CAR, the number of CAR T cell infusions, the number of CAR T cells injected per kg, CD19 CAR type (name), tumor type, and age. Fifty-six studies (3493 patients) were included in the systematic review and 46 (3421 patients) in the meta-analysis. The overall BCR rate was 56%, with 60% OS and 75% BOR. Younger patients displayed remarkably higher BCR prevalence, without differences in OS. The presence of CD28 in the CAR’s hinge, TM, and costimulatory domains improved all outcomes evaluated. Doses from one to 4.9 million cells/kg resulted in better clinical outcomes. Our data also suggest that even patients who do not achieve high objective responses may derive survival benefit from CD19 CAR T therapy. This meta-analysis is a critical hypothesis-generating instrument, capturing effects in a CD19 CAR T cell literature that still lacks randomized clinical trials and large observational studies.


Introduction

Chimeric antigen receptors (CARs) are artificial cell membrane receptors responsible for immune cell activation. They are constituted by an extracellular binding domain selected against an antigen, usually in the form of a single-chain variable fragment (scFv), a hinge sequence, and a transmembrane domain fused to intracellular costimulatory and stimulatory signaling domains. First-generation CARs had only one CD3ζ chain in the intracellular domain for T cell activation. Second- and third-generation CARs harbor one and two additional intracellular costimulatory domains, respectively, eliciting complete T cell activation. Fourth-generation CARs are based on second or third-generation CARs designed in a vector able to induce the expression of additional transgenic products, constitutively or by induction, such as cytokines or monoclonal antibodies. The CAR expression has been vastly explored in T cells (CAR T cells), and is evolving in other immune cell types, such as NK cells, dendritic cells, and macrophages, ushering in a new era for the treatment of cancer and other diseases [ 1 , 2 ]. In clinical trials, the main domains constituting the hinge part of a CAR are CD28, CD8 alpha, IgG4, or IgG1, while for the transmembrane domain (TM), CD28 or CD8 alpha are the most applied. The costimulatory domains more extensively applied in the clinical setting are CD28 and 4-1BB. CD28 incorporation into the costimulatory domain of CD19 CAR elicits tumor eradication, glycolysis, effector memory maturation, and T cell exhaustion, whereas 4-1BB signaling induces in vivo T cell persistence, mitochondrial biogenesis, and reprogramming towards a central memory T cell phenotype [ 3 ]. Regardless of a few small studies that explored the clinical impact of using different costimulatory domains in the CAR, there is a lack of information about the influence of different hinge or TM domains on the clinical outcomes of patients treated with CAR T cells.

One of the currently most effective CAR T cell therapies targets CD19, an antigen expressed by B cells at all stages of development until differentiation into plasmocytes, and therefore by B cell malignancies such as Hodgkin (HL) and non-Hodgkin lymphoma (NHL) and acute (ALL) or chronic lymphocytic leukemia (CLL) [ 4 ]. All tumor types treated with this therapy had a high initial complete response (CR) rate, but long-term disease-free survival can still be improved [ 4 ]. The therapeutic success of CAR T cells is sometimes discrepant, as it is shaped by several factors, motivating a comparative analysis to address the global impact of the in vivo and ex vivo conditions that influence CD19 CAR T cell performance in clinical trials.

Here, we analyzed the rates of the primary outcome – defined as best complete response (BCR) – and the secondary outcomes – defined as 12-month overall survival (OS) and best objective response (BOR) – of CD19-positive leukemia or lymphoma patients treated with CD19 CAR T cells containing different hinge, transmembrane (TM), and costimulatory domains. We also analyzed the impact of parameters related to CAR T cell manufacturing conditions, such as the type of interleukin used for CAR T cell expansion, the CAR T cell activation method, and the cell population transduced with the CAR, and further evaluated the number of CAR T cell infusions, the amount of CAR T cells injected/kg, the CD19 CAR type (name), tumor type, and age. This meta-analysis will be helpful as a hypothesis-generating instrument, capturing effects in a literature that is still recent and lacks randomized clinical trials and large observational studies.

Search strategy

We performed a systematic review and meta-analysis according to the PRISMA statement [ 5 , 6 ], registered on PROSPERO (CRD42022360268). The main study question is the rate of BCR in patients undergoing treatment for B cell malignancies according to the hinge, transmembrane, and costimulatory domains of the CD19 CAR T cells. The MEDLINE/PubMed database was searched from inception until August 2021, using the following query: ("receptors, chimeric antigen"[MeSH Terms] OR ("receptors"[All Fields] AND "chimeric"[All Fields] AND "antigen"[All Fields]) OR "chimeric antigen receptors"[All Fields] OR ("chimeric"[All Fields] AND "antigen"[All Fields] AND "receptor"[All Fields]) OR "chimeric antigen receptor"[All Fields]) AND "CD19"[All Fields].
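
The boolean search string above can be assembled and packaged for NCBI's E-utilities programmatically. The sketch below is illustrative only, not the authors' actual workflow: it rebuilds the query term by term and forms an esearch URL without issuing any request.

```python
from urllib.parse import urlencode

def build_pubmed_query() -> str:
    """Reconstruct the boolean search string used for MEDLINE/PubMed."""
    car_terms = [
        '"receptors, chimeric antigen"[MeSH Terms]',
        '("receptors"[All Fields] AND "chimeric"[All Fields] AND "antigen"[All Fields])',
        '"chimeric antigen receptors"[All Fields]',
        '("chimeric"[All Fields] AND "antigen"[All Fields] AND "receptor"[All Fields])',
        '"chimeric antigen receptor"[All Fields]',
    ]
    # CAR synonyms are OR-ed together, then AND-ed with the CD19 term
    return f'({" OR ".join(car_terms)}) AND "CD19"[All Fields]'

def esearch_url(query: str, retmax: int = 100) -> str:
    """URL for NCBI E-utilities esearch; no request is issued here."""
    base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
    return base + "?" + urlencode({"db": "pubmed", "term": query, "retmax": retmax})
```

Keeping the query as data rather than a pasted string makes the search reproducible and easy to re-run at a later cutoff date.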

Study eligibility criteria

The inclusion criteria were patients with CD19-positive leukemia or lymphoma treated with second or third-generation CD19 CAR T cells. Only studies with original data and in English were included. Grey literature and reference lists from included studies were also considered.

The exclusion criteria were studies with (a) no primary outcome reported, (b) dual CARs, (c) other CAR cell types, such as CAR macrophages, (d) combinations of CAR T cells with therapies targeting other molecules or with other targeted or non-targeted therapies, such as hematopoietic stem cell transplant, (e) patients with multiple myeloma or other non-hematological tumors, (f) case series, (g) publications such as meta-analyses, reviews, case reports, protocols, books, letters to the editor, comments or specialists’ opinions, and abstracts, and (h) pre-clinical studies. Studies with ≤ 10 patients were included in the evidence summary but excluded from the meta-analysis due to statistical constraints.
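
For illustration, the inclusion and exclusion rules above can be expressed as a single screening predicate. The record fields below are hypothetical names for this sketch, not the authors' actual extraction schema.

```python
def eligible_for_meta_analysis(study: dict) -> bool:
    """Apply the review's screening rules to one (hypothetical) study record."""
    # inclusion: original data, English, second- or third-generation CD19 CAR T
    if not (study.get("original_data") and study.get("language") == "en"):
        return False
    if study.get("car_generation") not in (2, 3):
        return False
    # exclusion criteria (a)-(h), collapsed into boolean flags
    excluded = (
        study.get("dual_car", False)
        or study.get("non_t_car", False)            # e.g., CAR macrophages
        or study.get("combination_therapy", False)  # e.g., with HSCT
        or study.get("tumor_type") in {"multiple myeloma", "solid"}
        or study.get("design") in {"case report", "case series", "review",
                                   "meta-analysis", "protocol", "abstract",
                                   "letter", "comment", "preclinical"}
        or not study.get("primary_outcome_reported", True)
    )
    if excluded:
        return False
    # studies with <= 10 patients stay in the evidence summary
    # but are dropped from the meta-analysis
    return study.get("n_patients", 0) > 10
```

Encoding the criteria this way makes the two-tier design explicit: a small study can pass every clinical criterion yet still be excluded from pooling.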

Data extraction

Data extracted comprised the rate of successful outcomes versus the sample included in the study, with BCR defined as the primary outcome. The secondary outcomes were OS and BOR. For the meta-analysis, categorical covariates were the types of hinge, TM, and costimulatory (costimulation) domains in the CAR; CAR T cell manufacturing conditions, such as the interleukin used for CAR T cell expansion, the CAR T cell activation method, and the cell population transduced with the CAR – PBMCs or other specific subsets – (CAR T cell type); the CD19 CAR type (CAR name); and tumor type. Numerical covariates were patient age, the number of CAR T cells injected/kg, and the number of CAR T cell infusions.

Two independent investigators (ERS and NSPC) screened titles and abstracts with ties resolved by a third person (VAP). Three authors (NSPC, VAP, GCPS) independently performed the full-text review and extracted the data, and ERS resolved disagreements.

Data syntheses

The data was presented in a summary of evidence and synthesized as forest plots, with studies ordered by publication year. All methodological details of the meta-analysis were included in the Supplementary Methods.

Risk of bias assessment

Risk of bias assessment adopted the Modified Institute of Health Economics Tool for bias analysis [ 7 ] and was performed independently by three authors (NSPC, VAP, GCPS).

Statistical analysis

Statistical analysis was performed in R (RStudio version 1.1.383), using the meta and metafor packages [ 8 , 9 ].
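
The pooled rates reported below were obtained with the meta and metafor R packages. As a rough illustration of what such a random-effects pooled proportion involves (not the authors' actual code), the following Python sketch implements DerSimonian–Laird pooling of logit-transformed proportions, together with the τ² and I² heterogeneity statistics quoted throughout the Results.

```python
import math

def pool_proportions(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions on the
    logit scale; returns (pooled proportion, tau^2, I^2)."""
    # per-study logit and its approximate variance
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]
    v = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]
    w = [1 / vi for vi in v]                       # fixed-effect weights
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                  # between-study variance
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # heterogeneity fraction
    w_re = [1 / (vi + tau2) for vi in v]           # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    return 1 / (1 + math.exp(-mu)), tau2, i2       # back-transform to a rate
```

One point this makes concrete: the random-effects estimate weights each study by its precision plus τ², not by sample size alone, which is why a pooled rate (e.g. the 56% BCR below) can differ from the crude ratio of total events to total patients.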

Results

Fifty-six studies were included in the systematic review, with a total of 3493 patients, 2904 treated with CAR T cells, and 2809 analyzed for rate estimation of BCR. Of these patients, 1440 presented a CR and 1587 an objective response (OR). We also evaluated 12-month OS across 42 studies with a total of 2992 patients, of whom 2479 were treated with CAR T cells, 2393 were analyzed, and 1567 were alive at 12 months.

A total of 46 studies with 10 or more patients were included in the meta-analysis, involving 3421 patients, of whom 2837 were treated with CAR T cells and 2746 were analyzed for rate estimation of the primary outcome, BCR; 1251 patients presented a CR and 1571 an OR, one of the secondary outcomes evaluated. For the other secondary outcome, OS, we had 37 studies with 2949 patients, 2439 treated with CAR T cells, and 2356 analyzed, of whom 1547 were alive at 12 months. The PRISMA flow diagram is presented in Fig.  1 , and the summary of evidence in Table  1 .

figure 1

Flow chart for eligibility assessment according to PRISMA guidelines, based on the model reference (Page et al., 2021a). For more information, visit: http://www.prisma-statement.org/

Meta-analysis

General clinical responses of CD19 CAR T therapy

The general proportion of BCR was 56% (95% CI: 49–63%); I² was 81% and τ² was 0.7911, indicating a large between-study variance (Fig.  2 ). Nevertheless, the BCR rate equaled or exceeded 50% in 28 of 46 studies (Fig.  2 ). Table  2 summarizes the meta-analysis data for the primary outcome BCR (also presented in full, with references, as Suppl. Table 1 ). The bias assessment is presented in Fig.  3 .

figure 2

The Forest Plot represents the overall rate of the primary outcome Best Complete Response (BCR) of patients treated with CD19 CAR T therapy based on the studies included in the meta-analysis

figure 3

Funnel, Baujat, and Radial plots showing the heterogeneity observed for the primary outcome Best Complete Response (BCR) of patients treated with CD19 CAR T therapy based on the studies included in the meta-analysis

The general proportion of OS was 60% (95% CI: 53–67%); I² was 87% and τ² was 0.5642 (Suppl. Figure  1 and Suppl. Table 2 ), indicating a moderate between-study variance. The overall rate of BOR with CD19 CAR T therapy was 75% (95% CI: 68–82%, I² = 78%), with a very high between-study variance (τ² = 1.2262) and rates equal to or above 50% in 40 of 46 studies (Suppl. Figure  2 and Suppl. Table 3 ). Together, these data indicate substantial heterogeneity. The bias assessments for OS and BOR are also presented in Suppl. Figure  3 . All the other forest plots are presented as Suppl. Figures  4 to 39 .

Sensitivity analysis

Patients under 18 years old had 79% BCR (95% CI: 65–89%, I²: 64%), 62% OS (95% CI: 41–80%, I²: 73%) and 84% BOR (95% CI: 75–90%, I²: 31%) (Suppl. Figs.  4 , 5 and 6 , respectively). Patients above 18 years old presented 51% BCR (95% CI: 43–57%, I²: 82%), 60% OS (95% CI: 52–67%, I²: 88%) and 73% BOR (95% CI: 64–81%, I²: 79%) (Suppl. Figs.  4 , 5 and 6 , respectively).

CD19 CAR T cells manufacturing conditions

Considering the interleukin used for CAR T cell expansion, when IL-2 was used we found 58% BCR (95% CI: 50–66%, I²: 76%), 56% OS (95% CI: 45–66%, I²: 86%) and 79% BOR (95% CI: 68–87%, I²: 70%) (Suppl. Figs.  7 , 8 and 9 , respectively). When other interleukins were applied, we found 54% BCR (95% CI: 43–65%, I²: 72%), 63% OS (95% CI: 50–75%, I²: 91%), and 73% BOR (95% CI: 64–80%, I²: 71%) (Suppl. Figs.  7 , 8 and 9 , respectively).

The BCR (Suppl. Fig.  10 ), OS (Suppl. Fig.  11 ), and BOR (Suppl. Fig.  12 ) proportions were similar for activation and expansion of CAR T cells with anti-CD3/CD28 beads or anti-CD3 mAb. Considering the cell population transduced with the CAR, we found similar BCR (Suppl. Fig.  13 ) and BOR (Suppl. Fig.  14 ) rates when using full PBMCs or CD4/CD8 1:1, CD8 only, or other specific subsets. The OS rate was higher when using full PBMCs (61%; 95% CI: 53–73%, I²: 86%) than for CD4/CD8 1:1, CD8 only, or other specific subsets (55%; 95% CI: 35–73%, I²: 86%) (Suppl. Fig.  15 ).

Number of T cells injected into the patients/Kg

Patients treated with doses between 1 and 4.9 million cells/kg per injection had BCR rates of 63% (95% CI: 55–71%, I²: 77%), 60% OS (95% CI: 50–69%, I²: 85%), and 83% BOR (95% CI: 76–88%, I²: 74%) (Suppl. Figs.  16 , 17 and 18 , respectively). The 5 to 99 million cells/kg group had only three studies and was not considered for comparison (71% BCR; 95% CI: 25–95%, I²: 62%; 58% OS; 95% CI: 21–88%, I²: 66%; and 83% BOR, 95% CI: 29–98%, I²: 64%) (Suppl. Figs.  16 , 17 and 18 , respectively). Doses above 100 million cells/kg showed lower BCR (36%; 95% CI: 28–46%, I²: 38%), OS (56%, 95% CI: 25–83%, I²: 94%) and BOR rates (64%, 95% CI: 32–87%, I²: 69%) (Suppl. Figs.  16 , 17 and 18 , respectively).

Number of CAR T cell infusions in the patients

The proportions for a single cell injection were 55% for BCR (95% CI: 48–62%, I²: 81%), 61% for OS (95% CI: 52–69%, I²: 88%) and 78% for BOR (95% CI: 69–85%, I²: 77%) (Suppl. Figs.  19 , 20 and 21 , respectively). For two infusions, the number of studies was meager (65% BCR; 95% CI: 0–100%, I²: 97%; 70% OS, 95% CI: 55–82%, I²: not applicable) (Suppl. Figs.  19 , 20 and 21 , respectively). Studies with three or more infusions showed a 50% BCR rate (95% CI: 25–74%, I²: 74%) and 72% BOR (95% CI: 40–91%, I²: 81%) (Suppl. Figs.  19 and 21 , respectively). For OS, the number of studies was also meager (58% OS, 95% CI: 29–83%, I²: 66%) (Suppl. Fig.  20 ).

CD19 CAR T cell products

For axicabtagene ciloleucel (Axi-cel), we found 62% BCR (95% CI: 56–67%, I²: 52%), 68% OS (95% CI: 59–77%, I²: 80%) and 86% BOR rates (95% CI: 78–91%, I²: 46%) (Suppl. Figs.  22 , 23 and 24 , respectively). Tisagenlecleucel (Tisa-cel) showed 53% BCR (95% CI: 38–67%, I²: 66%), 61% OS (95% CI: 42–76%, I²: 92%) and 70% BOR rates (95% CI: 59–79%, I²: 55%) (Suppl. Figs.  22 , 23 and 24 , respectively). Other, more recently tested CD19 CAR T products had 60% BCR (95% CI: 40–78%, I²: 82%), 57% OS (95% CI: 52–62%, I²: 40%), and 67% BOR rates (95% CI: 44–86%, I²: 80%) (Suppl. Figs.  22 , 23 and 24 , respectively).

CAR hinge domain

When CD28 was used to construct the CAR hinge domain, we had 60% BCR (95% CI: 55–66%, I²: 52%), 65% OS (95% CI: 55–74%, I²: 81%) and 83% BOR rates (95% CI: 73–90%, I²: 66%) (Suppl. Figs.  25 , 26 and 27 , respectively). For CD8, we observed 56% BCR (95% CI: 42–70%, I²: 75%), 59% OS (95% CI: 46–71%, I²: 89%), and 71% BOR (95% CI: 58–82%, I²: 66%) (Suppl. Figs.  25 , 26 and 27 , respectively). IgG4 resulted in 50% BCR (95% CI: 35–66%, I²: 85%), 50% OS (95% CI: 32–59%, I²: 84%) and 71% BOR (95% CI: 54–83%, I²: 79%) (Suppl. Figs.  25 , 26 and 27 , respectively).

CAR transmembrane domain

When the CD28 transmembrane domain was used to build the CAR, we found 58% BCR (95% CI: 48–67%, I²: 80%), 61% OS (95% CI: 51–70%, I²: 85%) and 79% BOR (95% CI: 69–86%, I²: 80%) (Suppl. Figs.  28 , 29 and 30 , respectively). CD8 alpha in the transmembrane domain resulted in 54% BCR (95% CI: 40–68%, I²: 73%), 59% OS (95% CI: 45–72%, I²: 90%) and 70% BOR (95% CI: 55–82%, I²: 67%) (Suppl. Figs.  28 , 29 and 30 , respectively).

CAR costimulatory domain

The CD28 costimulatory domain in the CAR resulted in 60% BCR (95% CI: 54–66%, I²: 55%), 66% OS (95% CI: 57–74%, I²: 79%) and 85% BOR rates (95% CI: 78–91%, I²: 45%), while for 4-1BB we had 56% BCR (95% CI: 44–67%, I²: 82%), 56% OS (95% CI: 45–66%, I²: 89%) and 71% BOR (95% CI: 61–79%, I²: 76%) (Suppl. Figs.  31 , 32 and 33 , respectively).

Tumor type

Patients with ALL achieved 73% BCR (95% CI: 60–83%, I²: 77%), 57% OS (95% CI: 45–68%, I²: 67%), and 80% BOR (95% CI: 66–89%, I²: 64%) (Suppl. Figs.  34 , 35 and 36 , respectively), while for NHL the general BCR was 51% (95% CI: 45–57%, I²: 75%), 59% OS (95% CI: 46–72%, I²: 92%) and 71% BOR (95% CI: 63–78%, I²: 74%) (Suppl. Figs.  34 , 35 and 36 , respectively).

Meta-regression

The meta-regression showed that the group aged above 18 presented a low but significant amount of heterogeneity explained by this variable (H² = 7.5535) and that the moderator is inversely related to BCR, suggesting that the effect size favors younger patients (estimate = −1.3211; p = 0.005). Costimulation based on CD28 and third-generation CD28/4-1BB also explained a small amount of heterogeneity (H² = 9.1079), but both were statistically significant moderators (p = 0.0391 and p = 0.0493, respectively). For BOR, the attributable heterogeneity for the costimulatory domains was H² = 7.5535, and CD28 and 4-1BB were significant for this observation (p = 0.0047 and p = 0.0355). The attributable heterogeneity for the CAR T cell product was small (H² = 7.4956); however, there was an inverse effect for Tisa-cel and JCAR014 as moderators (p = 0.0336 and p = 0.0097). Finally, for OS, the attributable heterogeneity for the CAR T cell product was H² = 6.0343, and only treatment with JCAR014 presented an inverse and statistically significant moderator effect (p = 0.0215).
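
The meta-regression here was fitted with the metafor R package's mixed-effects machinery. As a purely illustrative stand-in (assuming a known τ² and a single binary moderator, simplifications the authors' model does not require), a weighted least-squares slope on logit-transformed rates captures the sign of a moderator effect, such as the negative estimate reported for the above-18 age group.

```python
import math

def moderator_effect(events, totals, moderator, tau2=0.0):
    """Weighted least-squares slope of logit(rate) on a binary moderator,
    a bare-bones stand-in for a mixed-effects meta-regression."""
    y = [math.log(e / (n - e)) for e, n in zip(events, totals)]  # logit rates
    v = [1 / e + 1 / (n - e) for e, n in zip(events, totals)]    # logit variances
    w = [1 / (vi + tau2) for vi in v]                            # RE-style weights
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, moderator)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, moderator))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, moderator, y))
    return sxy / sxx  # negative slope: the moderator lowers the outcome rate
```

Coding pediatric studies as 0 and adult studies as 1 under this sketch yields a negative slope whenever adult studies have lower response rates, matching the direction of the reported estimate of −1.3211.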

A predominantly low risk of bias was found for the primary and secondary outcomes, as presented in Suppl. Figures  37 , 38 , and 39 , respectively.

Discussion

The pooled 56% BCR found for all CD19 CAR T therapies evaluated herein, with 60% OS and 75% BOR, corroborates the results found in most CD19 CAR T clinical trials [ 66 ]. However, among the studies included in this meta-analysis there are also some outliers, such as the one published by Ramos et al. (2016), showing only 13% BCR and 19% BOR ( N  = 16 patients, no OS reported), which can be explained by the employment of a first-generation CAR, a design that usually fails to reach effective antitumor responses [ 67 , 68 ]. For comparison, a meta-analysis focused on DLBCL conducted in 2022 by Ying and collaborators showed a similar pooled 63% OS rate and 74% BOR, diverging only by an expressively lower 48% BCR [ 69 ]. Additionally, another meta-analysis, published in 2021 by Aamir et al. and focused on ALL patients, reported an 82% BCR rate; neither OS nor BOR was reported in that study for comparison [ 70 ]. The difference in pooled BCR between these two studies and ours can be explained, at least in part, by the mixed tumor types included in our study, such as ALL, CLL, and other NHL subtypes. When we compared ALL and NHL in our sensitivity analysis for tumor type, the most expressive differences were found for BCR (73 versus 51%), followed by BOR (80 versus 71%), while both tumor types resulted in similar OS rates (57 versus 59%). Our data also suggest that regardless of whether patients have had high objective responses, they might have survival benefits from CD19 CAR T therapy.

Among the CAR T cell manufacturing conditions evaluated herein, the cell population chosen to build the CAR product and the cytokine used for T cell expansion promoted the most relevant differences in the clinical outcomes analyzed, mainly for OS. Full PBMCs yielded higher OS but similar BOR and BCR rates compared with the group clustering CD4/CD8 1:1, CD8 alone, and other specific subsets. This clustering might have influenced the results obtained, since there is pre-clinical and clinical evidence that a CD4:CD8 1:1 ratio seems to outperform other populations; however, we decided to cluster these groups due to the small number of clinical studies available to evaluate each of these cell populations separately. Interleukins other than IL-2 used for CD19 CAR T cell manufacturing showed higher OS rates despite similar BCR and lower BOR, evidencing the need for clinical trials using different cytokines for CAR T cell expansion to evaluate their impact on clinical responses. The CAR T cell activation and expansion methods were equivalent for all outcomes evaluated.

Considering the covariate age, patients under 18 had notably higher BCR and BOR rates but similar OS compared to older patients. CD19 CAR T cell therapy is known to induce a high clinical response rate in children and young adults, especially with B-ALL, but relapses are still a current issue [ 62 ], explaining, at least in part, the similar OS despite the higher BCR rates found in younger patients.

Regarding the CAR T cell dose effect, higher BCR, BOR, and OS rates were found for patients treated with doses between 1 and 4.9 million cells/kg compared to those treated with doses greater than 100 million cells/kg. The dose effect might be biased, considering the higher BCR and BOR rates found for younger patients, who are usually treated with lower CAR T cell doses. Nevertheless, the age bias can be ruled out for the higher OS rates observed at lower CAR T doses, since OS was not affected by age. When the number of CAR T infusions was evaluated, we noted that three or more infusions presented lower rates for the evaluated outcomes. This result is critical because higher CAR T doses with repeated infusions are known to enhance toxicity [ 71 , 72 ], in addition to the evident increase in manufacturing cost. These results might affect the design of future comparative CD19 CAR T cell-based clinical trials, which could focus on testing different dose scales up to 100 million cells/kg, limiting administration to one or two infusions.

The comparison of different molecules used to build the structural CD19-directed CAR hinge (CD8, CD28, or IgG4), transmembrane (CD8 or CD28), and costimulatory domains (CD28 or 4-1BB) showed that the presence of CD28 in each of these three domains was associated with higher rates for all the clinical outcomes evaluated. It is possible that the different CAR domains act synergistically, since they are part of the same functional full costimulatory molecule in human immune cells; however, we can neither affirm nor discard this hypothesis based on our data. Particularly considering OS, the most relevant rate difference was found when CD28 was in the CAR’s hinge and costimulatory regions. For BCR, the rate differences between CD28 and the other molecules tested were less relevant. A CAR hinge- and transmembrane-based comparison with clinical data has never been performed before in the literature, and our meta-analysis provides some evidence that must be further investigated in future studies to clarify the possibility of synergism combining different or equal domains. For the costimulatory domains, data recently published in a meta-analysis focused on patients with diffuse large B-cell lymphoma (DLBCL) treated with CD19 CAR T cells corroborated our findings, showing higher BCR and BOR rates for CD28 (57% BCR and 81% BOR) compared to 4-1BB (42% BCR and 70% BOR); however, that study found no statistically significant difference between CD28 and 4-1BB for the 12-month OS rate in DLBCL patients [ 69 ]. In the same study, the CD28-based Axi-cel had higher rates for all outcomes evaluated compared with the 4-1BB-based Tisa-cel for the treatment of DLBCL patients, with BCR rates of 57% versus 36%, OS rates of 65% versus 49%, and BOR rates of 82% versus 58%, respectively [ 69 ]. A clinical trial comparing CD19 CAR T cells containing either CD28 or 4-1BB was performed in ten ALL patients, five treated with each type of construct at a dose of 0.62 × 10⁶ CAR T cells/kg. This trial showed similar responses for both treatments, with the CD28 group resulting in three CR, one partial response (PR), and one no response (NR), and the 4-1BB group in three CR, no PR, and two NR. Despite the higher number of NR patients in the 4-1BB group, this group had one patient with an ongoing anti-tumor response evaluated five months after treatment [ 73 ]. This clinical trial was not conclusive due to the limited number of patients. Still considering the costimulatory domain of the CAR, Cappell and Kochenderfer recently reviewed and compared CAR T cell clinical studies based on different targets with CD28 or 4-1BB as costimulatory domains, including but not limited to CD19 as a target. This general review showed that the available data from clinical trials do not demonstrate a clear advantage of either CD28-costimulated or 4-1BB-costimulated CARs for treating B cell lymphomas or B-ALL, pointing out that larger studies and comparative clinical trials must be performed before conclusions can be drawn about the performance of the different costimulatory domains against B cell malignancies [ 74 ].

This study is the first to evaluate the impact of the hinge and TM CAR domains, in addition to the costimulatory domains, on the clinical response to CD19 CAR T cells in B cell leukemia and lymphoma, an essential unanswered question in the field. In summary, several of the covariates analyzed might positively impact all the evaluated clinical outcomes (BCR, OS, and BOR) of patients treated with CD19 CAR T cell therapies: age below 18 years; injection of 1 to 4.9 million CAR T cells per kg in a single infusion, without discarding the potential efficacy of two doses; and CD28 constituting the hinge, transmembrane, and costimulatory domains of the CAR, as in the Axi-cel product. These covariates should be better explored in future comparative clinical trials.

The lack of randomized trials or large observational studies on CAR T cells justifies this meta-analysis, which is intended to provide insights into ongoing procedures for further research, raising questions and spotting potential aspects of interest in current approaches. Due to the unavoidable heterogeneity observed, the results of this meta-analysis are not intended for clinical decision-making but rather to improve the understanding of this complex and multifaceted treatment. The extrapolation and generalization of the results obtained here should be made with caution, since they may be biased by the different study designs and characteristics with respect to CAR structures, CAR T cell manufacturing conditions, doses, tumor types, autologous cells isolated from heavily pretreated individuals, and other variables.

Data availability

All data generated or analyzed during this study are included in this published article and its supplementary information files.

Abbreviations

ALL: Acute lymphocytic leukemia

BCR: Best complete response

BOR: Best objective response

CAR: Chimeric antigen receptors

CLL: Chronic lymphocytic leukemia

CR: Complete response

DLBCL: Diffuse large B-cell lymphoma

HL: Hodgkin lymphoma

NHL: Non-Hodgkin lymphoma

OS: Overall survival

OR: Objective response

PBMC: Peripheral blood mononuclear cells

scFv: Single-chain variable fragment

TM: Transmembrane domains

Dana H, Chalbatani GM, Jalali SA, Mirzaei HR, Grupp SA, Suarez ER, Raposo C, Webster TJ. CAR-T cells: early successes in blood cancer and challenges in solid tumors. Acta Pharm Sin B. 2021;11(5):1129–47.

Mantovani A, Allavena P, Marchesi F, Garlanda C. Macrophages as tools and targets in cancer therapy. Nat Rev Drug Discov. 2022;21(11):799–820.

Kawalekar OU, O’Connor RS, Fraietta JA, Guo L, McGettigan SE, Posey AD Jr., Patel PR, Guedan S, Scholler J, Keith B, et al. Distinct signaling of Coreceptors regulates specific metabolism pathways and impacts Memory Development in CAR T cells. Immunity. 2016;44(2):380–90.

Mohty M, Gautier J, Malard F, Aljurf M, Bazarbachi A, Chabannon C, Kharfan-Dabaja MA, Savani BN, Huang H, Kenderian S, et al. CD19 chimeric antigen receptor-T cells in B-cell leukemia and lymphoma: current status and perspectives. Leukemia. 2019;33(12):2767–78.

Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. 2021;372:n160.

Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. J Clin Epidemiol. 2021;134:178–89.

Grigor EJM, Fergusson DA, Haggar F, Kekre N, Atkins H, Shorr R, Holt RA, Hutton B, Ramsay T, Seftel M, et al. Efficacy and safety of chimeric antigen receptor T-cell (CAR-T) therapy in patients with haematological and solid malignancies: protocol for a systematic review and meta-analysis. BMJ Open. 2017;7(12):e019321.

Balduzzi S, Rucker G, Schwarzer G. How to perform a meta-analysis with R: a practical tutorial. Evid Based Ment Health. 2019;22(4):153–60.

Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36:1–48.

Brentjens RJ, Riviere I, Park JH, Davila ML, Wang X, Stefanski J, Taylor C, Yeh R, Bartido S, Borquez-Ojeda O, et al. Safety and persistence of adoptively transferred autologous CD19-targeted T cells in patients with relapsed or chemotherapy refractory B-cell leukemias. Blood. 2011;118(18):4817–28.

Kalos M, Levine BL, Porter DL, Katz S, Grupp SA, Bagg A, June CH. T cells with chimeric antigen receptors have potent antitumor effects and can establish memory in patients with advanced leukemia. Sci Transl Med. 2011;3(95):95ra73.

Kochenderfer JN, Dudley ME, Feldman SA, Wilson WH, Spaner DE, Maric I, Stetler-Stevenson M, Phan GQ, Hughes MS, Sherry RM, et al. B-cell depletion and remissions of malignancy along with cytokine-associated toxicity in a clinical trial of anti-CD19 chimeric-antigen-receptor-transduced T cells. Blood. 2012;119(12):2709–20.

Brentjens RJ, Davila ML, Riviere I, Park J, Wang X, Cowell LG, Bartido S, Stefanski J, Taylor C, Olszewska M, et al. CD19-targeted T cells rapidly induce molecular remissions in adults with chemotherapy-refractory acute lymphoblastic leukemia. Sci Transl Med. 2013;5(177):177ra138.

Kochenderfer JN, Dudley ME, Kassim SH, Somerville RP, Carpenter RO, Stetler-Stevenson M, Yang JC, Phan GQ, Hughes MS, Sherry RM, et al. Chemotherapy-refractory diffuse large B-cell lymphoma and indolent B-cell malignancies can be effectively treated with autologous T cells expressing an anti-CD19 chimeric antigen receptor. J Clin Oncol. 2015;33(6):540–9.

Lee DW, Kochenderfer JN, Stetler-Stevenson M, Cui YK, Delbrook C, Feldman SA, Fry TJ, Orentas R, Sabatino M, Shah NN, et al. T cells expressing CD19 chimeric antigen receptors for acute lymphoblastic leukaemia in children and young adults: a phase 1 dose-escalation trial. Lancet. 2015;385(9967):517–28.

Bhoj VG, Arhontoulis D, Wertheim G, Capobianchi J, Callahan CA, Ellebrecht CT, Obstfeld AE, Lacey SF, Melenhorst JJ, Nazimuddin F, et al. Persistence of long-lived plasma cells and humoral immunity in individuals responding to CD19-directed CAR T-cell therapy. Blood. 2016;128(3):360–70.

Kebriaei P, Singh H, Huls MH, Figliola MJ, Bassett R, Olivares S, Jena B, Dawson MJ, Kumaresan PR, Su S, et al. Phase I trials using sleeping beauty to generate CD19-specific CAR T cells. J Clin Invest. 2016;126(9):3363–76.

Ramos CA, Savoldo B, Torrano V, Ballard B, Zhang H, Dakhova O, Liu E, Carrum G, Kamble RT, Gee AP, et al. Clinical responses with T lymphocytes targeting malignancy-associated κ light chains. J Clin Invest. 2016;126(7):2588–96.

Turtle CJ, Hanafi LA, Berger C, Hudecek M, Pender B, Robinson E, Hawkins R, Chaney C, Cherian S, Chen X, et al. Immunotherapy of non-hodgkin’s lymphoma with a defined ratio of CD8 + and CD4 + CD19-specific chimeric antigen receptor-modified T cells. Sci Transl Med. 2016;8(355):355ra116.

Gardner RA, Finney O, Annesley C, Brakke H, Summers C, Leger K, Bleakley M, Brown C, Mgebroff S, Kelly-Spratt KS, et al. Intent-to-treat leukemia remission by CD19 CAR T cells of defined formulation and dose in children and young adults. Blood. 2017;129(25):3322–31.

Hu Y, Wu Z, Luo Y, Shi J, Yu J, Pu C, Liang Z, Wei G, Cui Q, Sun J, et al. Potent anti-leukemia activities of Chimeric Antigen Receptor-Modified T Cells against CD19 in Chinese patients with Relapsed/Refractory Acute lymphocytic leukemia. Clin Cancer Res. 2017;23(13):3297–306.

Locke FL, Neelapu SS, Bartlett NL, Siddiqi T, Chavez JC, Hosing CM, Ghobadi A, Budde LE, Bot A, Rossi JM. Phase 1 results of ZUMA-1: a multicenter study of KTE-C19 anti-CD19 CAR T cell therapy in refractory aggressive lymphoma. Mol Ther. 2017;25(1):285–95.

Neelapu SS, Locke FL, Bartlett NL, Lekakis LJ, Miklos DB, Jacobson CA, Braunschweig I, Oluwole OO, Siddiqi T, Lin Y, et al. Axicabtagene Ciloleucel CAR T-Cell Therapy in Refractory large B-Cell lymphoma. N Engl J Med. 2017;377(26):2531–44.

Schuster SJ, Svoboda J, Chong EA, Nasta SD, Mato AR, Anak Ö, Brogdon JL, Pruteanu-Malinici I, Bhoj V, Landsburg D, et al. Chimeric Antigen Receptor T Cells in Refractory B-Cell Lymphomas. N Engl J Med. 2017;377(26):2545–54.


Acknowledgements

The authors sincerely thank all authors of all studies included in this meta-analysis.

N.S.P.C. had a Brazilian National Council for Scientific and Technological Development (CNPq) scholarship (143179/2021-7; 140514/2022-8). V.A.P. was supported by a Federal University of ABC institutional scholarship (PIC-UFABC). E.R.S. had grants from the São Paulo Research Foundation (Fundação de Amparo à Pesquisa do Estado de São Paulo, FAPESP), Brazil, numbers 2018/17656-0 and 2023/03631-3.

Author information

Authors and affiliations

Centro Universitário FMABC, Santo André, 09060-870, SP, Brazil

Erik Montagna

Center for Natural and Human Sciences, Federal University of ABC, Santo Andre, 09210-580, SP, Brazil

Najla Santos Pacheco de Campos, Victoria Alves Porto, Giselle Correia Próspero da Silva & Eloah Rabello Suarez

Graduate Program in Medicine - Hematology and Oncology, Federal University of São Paulo, São Paulo, 04023-062, SP, Brazil

Najla Santos Pacheco de Campos & Eloah Rabello Suarez


Contributions

E.M. performed the statistical analysis, manuscript writing and revision; N.S.P.C performed a literature search, provided data extraction, data clarifications and revised the manuscript; V.A.P and G.C.P.S provided data extraction and revised the manuscript; E.R.S. conceived the study, performed literature search, article selection, and manuscript writing and revision.

Corresponding author

Correspondence to Eloah Rabello Suarez.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Montagna, E., de Campos, N.S.P., Porto, V.A. et al. CD19 CAR T cells for B cell malignancies: a systematic review and meta-analysis focused on clinical impacts of CAR structural domains, manufacturing conditions, cellular product, doses, patient’s age, and tumor types. BMC Cancer 24, 1037 (2024). https://doi.org/10.1186/s12885-024-12651-6


Received: 25 May 2023

Accepted: 16 July 2024

Published: 22 August 2024

DOI: https://doi.org/10.1186/s12885-024-12651-6


Keywords

  • Transmembrane
  • Costimulatory domain
  • CAR T cell manufacturing conditions
  • CAR T cell dose
  • Clinical response to CAR T cells

ISSN: 1471-2407



MIT study explains why laws are written in an incomprehensible style


Legal documents are notoriously difficult to understand, even for lawyers. This raises the question: Why are these documents written in a style that makes them so impenetrable?

MIT cognitive scientists believe they have uncovered the answer to that question. Just as “magic spells” use special rhymes and archaic terms to signal their power, the convoluted language of legalese acts to convey a sense of authority, they conclude.

In a study appearing this week in the Proceedings of the National Academy of Sciences, the researchers found that even non-lawyers use this type of language when asked to write laws.

“People seem to understand that there’s an implicit rule that this is how laws should sound, and they write them that way,” says Edward Gibson, an MIT professor of brain and cognitive sciences and the senior author of the study.

Eric Martinez PhD ’24 is the lead author of the study. Francis Mollica, a lecturer at the University of Melbourne, is also an author of the paper.

Casting a legal spell

Gibson’s research group has been studying the unique characteristics of legalese since 2020, when Martinez came to MIT after earning a law degree from Harvard Law School. In a 2022 study, Gibson, Martinez, and Mollica analyzed legal contracts totaling about 3.5 million words, comparing them with other types of writing, including movie scripts, newspaper articles, and academic papers.

That analysis revealed that legal documents frequently have long definitions inserted in the middle of sentences — a feature known as “center-embedding.” Linguists have previously found that this kind of structure can make text much more difficult to understand.

“Legalese somehow has developed this tendency to put structures inside other structures, in a way which is not typical of human languages,” Gibson says.

In a follow-up study published in 2023, the researchers found that legalese also makes documents more difficult for lawyers to understand. Lawyers tended to prefer plain English versions of documents, and they rated those versions to be just as enforceable as traditional legal documents.

“Lawyers also find legalese to be unwieldy and complicated,” Gibson says. “Lawyers don’t like it, laypeople don’t like it, so the point of this current paper was to try and figure out why they write documents this way.”

The researchers had a couple of hypotheses for why legalese is so prevalent. One was the “copy and edit hypothesis,” which suggests that legal documents begin with a simple premise, and then additional information and definitions are inserted into already existing sentences, creating complex center-embedded clauses.

“We thought it was plausible that what happens is you start with an initial draft that’s simple, and then later you think of all these other conditions that you want to include. And the idea is that once you’ve started, it’s much easier to center-embed that into the existing provision,” says Martinez, who is now a fellow and instructor at the University of Chicago Law School.
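The copy-and-edit mechanism can be illustrated with a toy string manipulation. This is purely a sketch, not anything from the study itself, and the example sentences are invented: splicing a later-added condition into the middle of an existing provision is exactly what produces a center-embedded clause, while appending it keeps the sentence flat.

```python
# Toy illustration (hypothetical, not from the study) of the "copy and
# edit" mechanism: a simple provision is drafted first, and a condition
# thought of later is spliced into the middle of the existing sentence,
# creating a center-embedded clause.

draft = "Any person who operates a vehicle while intoxicated commits an offense."

# A definition added after the initial draft, inserted mid-sentence.
condition = ("(where 'vehicle' includes any motorized conveyance, "
             "registered or not) ")

splice_at = draft.index("while")  # insertion point inside the existing clause
center_embedded = draft[:splice_at] + condition + draft[splice_at:]

# Appending the same condition at the end instead avoids center-embedding.
flat = draft[:-1] + ", where 'vehicle' includes any motorized conveyance, registered or not."

print(center_embedded)
print(flat)
```

Both versions carry the same legal content; only the second forces the reader to hold the unfinished main clause in memory while parsing the inserted definition.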

However, the findings ended up pointing toward a different hypothesis, the so-called “magic spell hypothesis.” Just as magic spells are written with a distinctive style that sets them apart from everyday language, the convoluted style of legal language appears to signal a special kind of authority, the researchers say.

“In English culture, if you want to write something that’s a magic spell, people know that the way to do that is you put a lot of old-fashioned rhymes in there. We think maybe center-embedding is signaling legalese in the same way,” Gibson says.

In this study, the researchers asked about 200 non-lawyers (native English speakers living in the United States, recruited through a crowdsourcing site called Prolific) to write two types of texts. In the first task, people were told to write laws prohibiting crimes such as drunk driving, burglary, arson, and drug trafficking. In the second task, they were asked to write stories about those crimes.

To test the copy and edit hypothesis, half of the participants were asked to add additional information after they wrote their initial law or story. The researchers found that all of the subjects wrote laws with center-embedded clauses, regardless of whether they wrote the law all at once or were told to write a draft and then add to it later. And, when they wrote stories related to those laws, they wrote in much plainer English, regardless of whether they had to add information later.

“When writing laws, they did a lot of center-embedding regardless of whether or not they had to edit it or write it from scratch. And in that narrative text, they did not use center-embedding in either case,” Martinez says.

In another set of experiments, about 80 participants were asked to write laws, as well as descriptions that would explain those laws to visitors from another country. In these experiments, participants again used center-embedding for their laws, but not for the descriptions of those laws.

The origins of legalese

Gibson’s lab is now investigating the origins of center-embedding in legal documents. Early American laws were based on British law, so the researchers plan to analyze British laws to see if they feature the same kind of grammatical construction. And going back much farther, they plan to analyze whether center-embedding is found in the Hammurabi Code, the earliest known set of laws, which dates to around 1750 BC.

“There may be just a stylistic way of writing from back then, and if it was seen as successful, people would use that style in other languages,” Gibson says. “I would guess that it’s an accidental property of how the laws were written the first time, but we don’t know that yet.”

The researchers hope that their work, which has identified specific aspects of legal language that make it more difficult to understand, will motivate lawmakers to try to make laws more comprehensible. Efforts to write legal documents in plainer language date to at least the 1970s, when President Richard Nixon declared that federal regulations should be written in “layman’s terms.” However, legal language has changed very little since that time.

“We have learned only very recently what it is that makes legal language so complicated, and therefore I am optimistic about being able to change it,” Gibson says. 


Press mentions

Fast Company

Researchers at MIT have uncovered a possible reason why legal documents can be so difficult to read, finding that “convoluted legalese often acts as a way to convey authority,” reports Joe Berkowitz for Fast Company. The researchers “tested whether nonlawyers would end up using legalese if asked to write legal documents,” explains Berkowitz. “In the end, all subjects wrote their laws with complex, center-embedded clauses.”

Futurism

Researchers at MIT have found that legalese is used in writing “to assert authority over those less versed in such language,” reports Noor Al-Sibai for Futurism. “By studying this cryptic take on the English language, the researchers are hoping to make legal documents much easier to read in the future,” explains Al-Sibai.



Consumer Financial Protection Bureau

Cash-back Fees

Executive summary

Access to cash is a necessary component of a resilient financial system and dynamic economy. Many people rely on cash for day-to-day transactions due to its privacy and reliability, and cash accessibility is particularly critical in the case of a disruption or outage of digital payment systems. While people use various means of getting cash, one common method is to get “cash back” at a store when making a purchase with a debit or prepaid card. This option may be particularly important in banking deserts and in areas where banks and ATM operators charge significant fees. Retailers are essentially filling a void in access to cash that has historically been supplied affordably by banks and credit unions.

Providing cash back is valuable to consumers and merchants. Survey data show that it is a popular way for consumers to get money using their debit or prepaid cards. Merchants offer cash back to attract customers and reduce their cash handling costs. In its recent engagement and market monitoring, the CFPB observed that some retailers charge a fee for these transactions.

This spotlight provides an overview of consumers’ use of cash back, the benefits and costs of such transactions to merchants, and the practices of other market actors which do not charge fees for this service. The CFPB also analyzed the cash-back fees of a sample of national retailers.

Fees for cash back may serve as a barrier and reduce people’s access to cash when they need it. The CFPB will continue to monitor developments related to the fees consumers pay for accessing cash, and the underlying failure of banks and credit unions to adequately supply cash throughout the country in an affordable manner.

Key Findings

  • Cash-back fees are costing consumers millions of dollars. The CFPB found that three companies in the sample charge cash-back fees and estimates that they collect over $90 million in fees annually for people to access their cash. The CFPB also estimates that the marginal cost to merchants for processing each transaction may be a few pennies, compared to the much higher fees charged by these retailers to consumers. While there may be other costs related to cash handling, these are generally reduced by the provision of cash back, as it reduces merchants’ cash on hand.
  • Three major firms charge cash-back fees even though other competitors offer the service for free. Three retail companies, Dollar General, Dollar Tree, and Kroger, which also operate brands such as Family Dollar, Harris Teeter, Ralph’s, and others, charge fees for this service, while the other national retail companies sampled by the CFPB do not. At the two largest dollar store corporations, cash-back fees for small withdrawal amounts are the highest in the sample ($1 or more for amounts under $50). Kroger, the country’s largest grocery chain, recently expanded cash-back fees to its Harris Teeter brand (75 cents for $100 or less), higher than those in place among its other brands (50 cents for $100 or less), in addition to higher fees for larger amounts.
  • Cash-back fees are levied on low pre-set cash withdrawal amounts. Many merchants pre-determine the withdrawal amount options in a single transaction, commonly between $5 and $50. The fees charged on these small, constrained amounts often constitute a high percentage of the cash withdrawal and limit consumers’ ability to spread the cost of the fee over larger amounts. They may also induce repeat withdrawals, with consumers incurring a new fee each time.
  • Consumers with lower incomes or fewer banking choices may be more likely to encounter cash-back fees. Dollar stores are frequently located in small rural towns, communities of color, and low-income communities. These areas are also more likely to have fewer bank branch locations and residents who are more reliant on cash for daily transactions.
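The arithmetic behind the pre-set-amount finding is easy to reproduce. The sketch below uses illustrative figures drawn from the findings above (a $1.00 fee, the dollar-store level cited for amounts under $50, against common pre-set withdrawal options); it is not a complete fee schedule for any retailer:

```python
# Effective cost of a flat cash-back fee, expressed as a percentage of
# the cash actually received. Figures are illustrative: a $1.00 fee
# against pre-set withdrawal amounts in the $5-$50 range.

fee = 1.00
preset_amounts = [5, 10, 20, 40, 50]

rates = {amount: fee / amount * 100 for amount in preset_amounts}

for amount, rate in rates.items():
    print(f"${amount:>3} withdrawal + ${fee:.2f} fee -> {rate:4.1f}% of cash received")

# The same $1 fee spread over a $100 ATM-sized withdrawal would be only
# 1%, but pre-set limits keep consumers from spreading the fee that way.
```

A $1 fee on a $10 withdrawal is a 10 percent charge, which is why capping withdrawals at small pre-set amounts makes a nominally small flat fee expensive in relative terms.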

Cash-back transactions

This section summarizes the importance of cash availability and the use of cash back as an access point for consumers.

Cash is a critical part of a resilient payment ecosystem. Surveys show people still try to have cash on hand 1 and nearly 90 percent of people used cash in the last 30 days. 2 Cash accessibility is necessary should other types of digital payment systems experience failures, 3 such as in the event of a natural disaster or some other catastrophe, 4 or a technological malfunction at a single company. 5 Additionally, some populations are more reliant on cash than others for day-to-day transactions. For example, cash is more frequently used by people with lower incomes, racial minorities, and older Americans than other populations. 6 As discussed below, cash back is a common method for obtaining cash for many consumers.

How cash back works

Consumers may obtain cash during the completion of a purchase transaction at certain stores when using a PIN-authenticated debit card or prepaid card at the register. Some merchants also provide cash back at self-service registers. Consumers typically must choose from pre-set withdrawal amount options presented at the payment terminal at the time of the transaction. In a cash-back transaction, consumers are usually limited to a maximum withdrawal amount ranging from $5 to $50, though some merchants may allow higher amounts.

Scope of usage

CFPB analysis of data from the Diary and Survey of Consumer Payment Choice (Survey) found that from 2017 to 2022, cash withdrawals at retail locations made up 17 percent of all transactions by which people got cash from their checking account, savings account, or prepaid card. As shown in Figure 1, cash withdrawals at retail are second only to ATMs (61%) and more frequently used than bank tellers (14%). The Survey and methodology are discussed in the Tables and Notes section.

Figure 1: Instances of getting cash from bank account or prepaid card, by location, 2017 to 2022, combined

Pie chart showing ATM 61%, Retail point-of-sale 17%, Bank teller 14%, and Other 8%.

Source: CFPB tabulations of the Diary and Survey of Consumer Payment Choice.

The Survey data also show that from 2017 to 2022, cash withdrawals at a retail location (restricted to those where the source of funds was the consumer’s checking, savings, or a prepaid card) had a mean withdrawal amount of $34 (median: $20). 7 By contrast, during this same timeframe, the mean ATM withdrawal among survey participants was $126 (median: $100). 8 A study by researchers at the Federal Reserve Bank of Atlanta utilizing Survey data found that cash withdrawals at a retail store had the lowest average amount of cash withdrawal, and noted that “[t]he amount of cash received at a retail store is constrained by the store’s limits, so the amount of cash received in this way is not necessarily at the discretion of the consumer.” 9

Cash back may serve as a particularly important point of access in the absence of other banking services. A 2014 study by the Federal Reserve Bank of Richmond analyzed cash-back transactions from a national discount retail chain from 2010 to 2012. 10 Looking specifically at the Richmond bank’s district, the area with the highest frequency of cash-back transactions was in the southeastern region of South Carolina, an area “that has been subject to ‘persistent poverty’” and “has some of the sparsest dispersion of bank branches.” 11 The study also illustrated the lucrative nature of cash-back fees: During the course of this study period, the merchant introduced a fee for cash back. Data from this report indicates that the retailer collected approximately $21 million in cash-back fees in a year. 12

Merchants benefit from offering cash back at the point of sale. First, the service may attract shoppers, whether people making a purchase in order to get cash back or people choosing one retail location over another to conveniently combine errands. Second, it reduces merchants’ cash handling costs. 13 Dispensing cash to consumers through cash-back transactions reduces merchants’ supply of cash on hand and therefore also reduces their costs of handling, transporting, and depositing excess cash.

Merchants incur costs for processing any type of payment transaction, including cash-back transactions. On any purchase using an electronic payment method, including a PIN-authorized debit card or prepaid card, a merchant incurs a range of fees for processing that payment, such as interchange, network, and processing fees. Beyond the fees incurred for the consumer’s purchase, there is an additional cost for providing cash back to the consumer.

To assess this additional transaction cost to the merchant for providing cash back, the CFPB modeled potential scenarios based on publicly available data and our market monitoring activities. The model incorporates estimates of merchant-incurred fees, such as interchange, network, processing, and fraud control fees. Methodology is discussed in detail in the Table and Figure Notes. The CFPB estimates that the additional marginal transactional cost to a merchant for processing a typical cash-back debit card transaction may range from a penny to about 20 cents (Table 1).

Table 1: Estimated additional merchant cost of a debit card cash-back transaction

| Example Retailer | Purchase Amount | Merchant Transaction Cost for Purchase Only | Additional Merchant Cost for $10 Cash Back | Additional Merchant Cost for $40 Cash Back |
|---|---|---|---|---|
| National Discount Chain | $20 | $0.33 | $0.05 | $0.19 |
| National Grocery Store | $20 | $0.33 | $0.01 | $0.02 |

Source: CFPB calculations based on public data about industry practices and averages. See Table and Figure Notes below for methodology.

This section provides an analysis of the cash-back fee practices of eight national retail chains, including a discussion of how these practices vary among the national chains and other actors, such as local independent grocers. The analysis is supplemented by market monitoring discussions with merchants about fees, costs, and consumer trends, among both merchants who charge cash-back fees and those who do not. The CFPB also conducted consumer experience interviews and reviewed consumer complaints submitted to the CFPB. The section concludes with a discussion of how these fees appear to function differently than fees for cash withdrawals at ATMs.

Current market practices

As of August 2024, there is no publicly available survey data regarding merchants’ cash-back practices or fees. To establish a baseline, the CFPB documented the fee practices of eight large retail companies. The sample consists of the two largest retail actors, measured by number of locations, across four different sectors: Dollar Stores, Grocery Stores, Drugstores, and Discount Retailers. 14 Using this approach, the eight retailers sampled are: Dollar General and Dollar Tree Inc. (Dollar Stores), Kroger Co. and Albertsons Companies (Grocery Stores), Walgreens and CVS (Drugstores), and Walmart and Target (Discount Retailers).

All retailers in our sample offer cash-back services, but only Dollar General, Dollar Tree Inc., and Kroger Co. brands charge a fee. Other retailers offer cash back for free, even for withdrawal amounts similar to or larger than those provided by the three retailers that charge (Table 2). Among the national chains that charge these cash-back fees, the CFPB estimates that they collect over $90 million in fees annually for people to access their cash. 15

Table 2: Cash-back fee practices, major retail companies

| Company | U.S. Stores | Fee for Cash Back | Maximum Withdrawal Amount (Per Transaction) |
|---|---|---|---|
| Dollar General | 20,022 | $1 to $2.50, depending on amount and other variables | $40 |
| Dollar Tree Inc. (Family Dollar and Dollar Tree) | 16,278 | Family Dollar: $1.50; Dollar Tree: $1 | $50 |
| Kroger Co. (incl. Kroger, Ralph’s, Fred Meyer, Pick ‘n Save, and other brands) | 2,722 | Harris Teeter brand: $0.75 for ≤ $100, $3.00 for > $100; other brands: $0.50 for ≤ $100, $3.50 for > $100 | Harris Teeter brand: $200; other brands: $300 |
| Albertsons Companies | 2,271 | No fee | $200 |
| Walmart | 5,214 | No fee | $100 |
| Target | 1,956 | No fee | $40 |
| Walgreens | 8,600 | No fee | $20 |
| CVS | 7,500 | No fee | $60 |

Source: CFPB analysis of the retail cash-back market. See Table and Figure Notes for methodology.

Beyond these national chains, other providers offer cash back as a free service to their customers. Through its market monitoring activities, the CFPB observed that many local independent grocers offer the service without charging a fee, even though they likely have thinner profit margins and less bargaining power than national chains when negotiating prices with wholesalers or fees with payment processors. The U.S. Postal Service also offers cash back on debit transactions free of charge, in increments of $10 up to a $50 maximum. 16

Cash-back fees at dollar stores

Among the merchants sampled, Dollar General and Dollar Tree Inc. charge the highest fees for withdrawal amounts under $50. Combined with low withdrawal limits, these fees can represent a hefty percentage of the cash withdrawn, and people have less ability to dilute the fee by taking out more cash.
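
A quick back-of-the-envelope calculation illustrates the scale of this burden, using the fee levels and the $40 withdrawal cap reported for Dollar General in Table 2 (the specific withdrawal amounts below are hypothetical examples, not figures from the report):

```python
def fee_burden(fee, withdrawal):
    """Cash-back fee as a percentage of the cash actually received."""
    return 100 * fee / withdrawal

# Dollar General reports fees of $1 to $2.50 on withdrawals capped at $40.
print(f"{fee_burden(1.00, 40):.2f}%")   # $1 fee on the $40 maximum -> 2.50%
print(f"{fee_burden(2.50, 40):.2f}%")   # $2.50 fee on the $40 maximum -> 6.25%
print(f"{fee_burden(2.50, 10):.2f}%")   # $2.50 fee on a $10 withdrawal -> 25.00%
```

Even at the withdrawal cap, the fee amounts to a 2.5 to 6.25 percent surcharge; for smaller withdrawals, the proportional cost rises sharply.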

Additionally, the geographic distribution of dollar store chains and their primary consumer base raises concerns that these fees may be borne by economically vulnerable populations and those with limited banking access. Dollar stores are prevalent in rural communities, low-income communities, and communities of color – the same communities who may also face challenges in accessing banking services. 17 For example, Dollar General noted that in 2023 “approximately 80% of [its] stores are located in towns of 20,000 or fewer people,” 18 while Dollar Tree Inc. operated at least 810 dual-brand combination stores (Family Dollar and Dollar Tree in a single building) designed specifically “for small towns and rural communities…with populations of 3,000 to 4,000 residents.” 19

Though they are open to and serve consumers of all income levels, dollar stores report that they locate stores specifically to serve their core customers: lower-income consumers. 20 In urban communities, one study shows, “proximity to dollar stores is highly associated with neighborhoods of color even when controlling for other factors.” 21 Low-income communities and communities of color often face barriers to accessing banking services, and rural communities are 10 times more likely than urban areas to meet the definition of a banking desert. 22

Though the dollar store concept existed as far back as the 1950s, it has experienced significant expansion and consolidation since the 2000s. 23 Dollar Tree Inc. acquired Family Dollar in 2015. 24 From 2018 to 2021, nearly half of all retail locations opened in the U.S. were dollar stores. 25 In research examining the impact of dollar store expansion, studies indicate that the opening of a dollar store is associated with the closure of nearby local grocery retailers. 26

Variation of fees charged

In its scan of current market practices, the CFPB found variations in fees among store locations and brands owned by the same company. For example, as reflected in Table 2, Dollar Tree Inc. charges consumers $1 for cash back at Dollar Tree branded stores, but $1.50 at its Family Dollar stores. Similarly, Kroger Co. has two different fee tiers across its brands. In 2019, Kroger Co. rolled out a $0.50 cash-back fee for amounts of $100 or less, and a $3.50 fee for amounts between $100 and $300, effective at brands such as Kroger, Fred Meyer, Ralph’s, QFC, Pick ‘N Save, and others. At the time of the rollout, the company noted two exceptions: electronic benefits transfer (EBT) card users would not be charged a fee, and customers using their Kroger Plus card would not be charged for amounts under $100 but would be charged $0.50 for larger amounts. Kroger Co. acquired the southern grocery chain Harris Teeter in 2014, but did not begin charging a cash-back fee at those stores until January 2024: $0.75 for amounts of $100 or less, and $3 for larger amounts. 27

In its engagement with stakeholders, the CFPB learned that Dollar General’s fees appeared to vary in different locations. To better understand this potential variation, in December 2022, the CFPB mystery shopped at nine locations in one state, across a mix of rural, suburban, and urban communities. The CFPB acknowledges this is a small sample and is not intended to be representative. The data collected is based on the knowledge of the store associates at the time of each interaction.

Across these visits, the CFPB found a range of fee practices: five of the nine respondents noted that the fee varies depending on the type of card used for the transaction. When probed about the meaning of “type of card,” most noted that it depends on the customer’s bank, though it is not clear exactly which fees a given card type will trigger before the transaction is initiated. Reported fees ranged from $1 to $2.50, with some stores reporting a flat fee of $1.50 and others reporting a tiered range that increased with larger withdrawal amounts (withdrawals were capped at $40). Most stores in the sample reported fees between $1.00 and $1.50, although two stores located in small, entirely rural counties reported higher fees. The store in the smallest and most isolated county in the sample, home to only about 3,600 people, reported the highest fee: $2.50.

Distinction from ATM fees

One of the market dynamics likely contributing to retailers’ ability to charge these fees is the high fees also charged to consumers for using out-of-network automated teller machines (ATMs). One source estimates that the average out-of-network ATM fee is $4.77, accounting for both the surcharge fee charged by the ATM owner and the foreign fee charged by the consumer’s financial institution. 28 By comparison, a $2 fee for cash back at a retailer may appear cheaper, and usually does not trigger an additional fee by the consumers’ financial institution or prepaid card issuer. Notwithstanding the high ATM fees, there are reasons for focused attention on the consumer risk of cash-back fees charged by retailers, primarily the amount of the fee relative to the value of the cash withdrawal and the distribution of the fee burden across income groups.

In a typical ATM transaction, a consumer can spread the cost of the fee across a larger amount of cash than with cash back. There may be exceptions for consumers who have only $10 or $20 in their bank account, but as shown in Table 3, low-income consumers and others withdraw larger amounts at ATMs than via cash back, on average. Cash-back transactions carry lower withdrawal limits, so consumers do not have the option to withdraw larger amounts. CFPB analysis of the Diary and Survey of Consumer Payment Choice from 2017 to 2022 shows that even among consumers with incomes below $50,000, the amount withdrawn at an ATM is more than double the typical cash-back withdrawal amount. Across all income groups, both the average and median ATM withdrawal amounts are larger than the corresponding cash-back amounts (Table 3).

Table 3: Average ATM and cash-back withdrawal amounts, by income, 2017 to 2022 combined

| Income | Average ATM Withdrawal | Average Cash-back Withdrawal | Median ATM Withdrawal | Median Cash-back Withdrawal |
|---|---|---|---|---|
| Less than $25,000 | $144 | $45 | $65 | $20 |
| $25,000 to $49,999 | $113 | $35 | $60 | $25 |
| $50,000 to $74,999 | $113 | $29 | $84 | $20 |
| $75,000 to $99,999 | $114 | $45 | $100 | $26 |
| $100,000 or more | $146 | $33 | $100 | $20 |

Source: CFPB tabulations of the Diary and Survey of Consumer Payment Choice. See Table and Figure Notes for methodology.

Further, while merchants limit the amount of a single withdrawal, there is no limit on the number of withdrawals. So, if a consumer needs $100 cash at a store which limits a single withdrawal to a maximum amount of $50 with a $2 fee, the consumer would have to make two $50 withdrawals for a $4 fee plus the cost of any otherwise unwanted purchase required to access the cash-back service.
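
The arithmetic of this worked example can be sketched as follows (the $100 need, $50 cap, and $2 fee are the hypothetical figures from the text):

```python
import math

def total_cashback_fees(cash_needed, per_withdrawal_cap, fee_per_withdrawal):
    """Number of withdrawals and total fees when a needed cash amount
    exceeds a merchant's per-transaction withdrawal cap."""
    withdrawals = math.ceil(cash_needed / per_withdrawal_cap)
    return withdrawals, withdrawals * fee_per_withdrawal

n, fees = total_cashback_fees(cash_needed=100, per_withdrawal_cap=50,
                              fee_per_withdrawal=2)
print(n, fees)  # 2 withdrawals, $4 in fees
```

The per-transaction cap thus multiplies the fee burden for anyone needing more cash than a single withdrawal allows, before counting the cost of any otherwise unwanted purchase.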

Finally, the burden of cash-back fees may be distributed differently than the burden of ATM fees. The share of consumers who pay ATM fees for cash withdrawals is relatively evenly distributed across income levels, according to a study based on the Diary and Survey of Consumer Payment Choice. 29 The study found little variation by income quintile in the percentage of consumers who encountered a fee for an ATM cash withdrawal, though it did not examine the amount of the ATM fees paid. Analogous data are not available for cash-back fees, but a similarly even distribution across incomes is unlikely given the demographics of the consumer base served by the largest retailers that charge fees (dollar stores).

While the use of digital payment methods is on the rise, cash accessibility remains a critical component of a resilient financial infrastructure and dynamic economy. Bank mergers, branch closures, and bank fee creep have reduced the supply of free cash access points for consumers. In this void, people may be more reliant on retailers for certain financial services historically provided by banks and credit unions, such as cash access. In this context, we observe that some retailers provide cash back as a helpful service to their customers, while other retailers may be exploiting these conditions by charging fees to their consumers for accessing their cash.

This spotlight examines retailer cash-back fees and their impact on consumers. Cash-back fees are levied by a small handful of large retail conglomerates (Dollar General, Dollar Tree Inc., and Kroger Co.) amidst a backdrop of consolidation in these segments. Meanwhile, other large retailers continue to offer cash-back services for free. The CFPB estimates cash-back fees cost consumers about $90 million a year.

The CFPB is concerned that reduced access to cash undermines the resilience of the financial system and deprives consumers of a free, reliable, and private means of engaging in day-to-day transactions. The CFPB will continue to monitor developments related to the fees consumers pay for accessing cash, and work with agencies across the federal government to ensure people have fair and meaningful access to the money that underpins our economy.

Table and Figure Notes

Notes for Figure 1

The Federal Reserve Bank of Atlanta’s annual Diary and Survey of Consumer Payment Choice (Survey) tracks consumers’ self-reported payment habits over a three-day period in October using a nationally representative sample. The survey includes questions about whether and how consumers access cash, such as where they made the withdrawal, the source of the cash, and the amount of the withdrawal. Figure 1 shows the percentage of all cash withdrawals from a checking account, savings account, or prepaid card reported between 2017 and 2022, by location (ATM, Retail point-of-sale, Bank teller, and Other). The number of observations during this period is 192 transactions. The figure does not include cash withdrawals made using a credit card cash advance feature or other form of credit.

Notes for Table 1

This model assumes that 80 percent of the merchant transaction cost is due to interchange fees, 15 percent due to network fees, and 5 percent due to payment acquirer fees. It also includes a $0.01 fee for fraud protection. For regulated transactions, the interchange fees are $0.22 + 0.05% of the transaction amount. Regulated transactions are those where the debit card used is issued by a bank with more than $10 billion in assets, and subject to 15 U.S.C. § 1693o-2. Exempt transactions are those not subject to this statutory cap on interchange fees. While Mastercard does not publish its fees for exempt transactions, Visa does. This model uses Visa’s published fees as of October 2023 for card-present transactions: for the National Discount Chain, the fees for Exempt Retail Debit ($0.15 + 0.80%), and for the National Grocery Chain, Exempt Supermarket Debit ($0.30 flat fee). An October 2023 Federal Reserve report on interchange fee revenue found that in 2021, the most recent data available, 56.21 percent of debit transactions were regulated and 43.79 percent were exempt. This composition is reflected in the table.
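
A minimal sketch of this model follows. The parameter values (the regulated/exempt mix, the 80/15/5 cost split, and the fee schedules) come from the notes above, but the blending and scaling logic is our reading of the methodology, not the CFPB’s actual code, and the fee-schedule arguments are illustrative:

```python
REG_SHARE, EXEMPT_SHARE = 0.5621, 0.4379  # 2021 regulated vs. exempt debit mix
INTERCHANGE_SHARE_OF_COST = 0.80          # interchange assumed = 80% of total cost
FRAUD_FEE = 0.01                          # flat fraud-control fee per transaction

def interchange(amount, flat, ad_valorem):
    """Interchange fee: flat component plus a percentage of the amount."""
    return flat + ad_valorem * amount

def transaction_cost(amount, exempt_flat, exempt_ad_valorem):
    regulated = interchange(amount, 0.22, 0.0005)  # capped: $0.22 + 0.05%
    exempt = interchange(amount, exempt_flat, exempt_ad_valorem)
    blended = REG_SHARE * regulated + EXEMPT_SHARE * exempt
    # Gross up so interchange is 80% of the total (network + acquirer = 20%),
    # then add the flat fraud-control fee.
    return blended / INTERCHANGE_SHARE_OF_COST + FRAUD_FEE

def marginal_cashback_cost(purchase, cash_back, exempt_flat, exempt_ad_valorem):
    """Extra merchant cost from adding cash back to an existing purchase."""
    return (transaction_cost(purchase + cash_back, exempt_flat, exempt_ad_valorem)
            - transaction_cost(purchase, exempt_flat, exempt_ad_valorem))

# National Discount Chain scenario: Visa Exempt Retail Debit ($0.15 + 0.80%)
print(round(marginal_cashback_cost(20, 10, 0.15, 0.0080), 2))  # ~$0.05
print(round(marginal_cashback_cost(20, 40, 0.15, 0.0080), 2))  # ~$0.19
```

Under these assumptions the sketch reproduces the discount-chain figures in Table 1; the flat fees and fraud fee cancel out of the marginal cost, so only the ad valorem portions of the fee schedules drive the additional cost of cash back.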

Notes for Table 2

The storefront counts for each of the retailers come from their websites, last visited on March 28, 2024, or their most recent reports to investors. Fee information was gathered through publicly available information, such as the merchant’s website, and/or verified through the CFPB’s market monitoring activities.

Dollar Tree Inc. announced on March 13, 2024 that it will close 1,000 of its Family Dollar and Dollar Tree brand stores over the course of the year. If those closures occur, Dollar Tree, Inc. will still have over 15,000 storefronts across the country.

In October 2022, Kroger Co. and Albertsons Companies announced their proposal to merge, though on February 26, 2024, the Federal Trade Commission and nine state attorneys general sued to block this proposal, alleging that the deal is anti-competitive. On April 22, 2024, Kroger Co. and Albertsons Companies announced a revised plan in which, if the merger is approved, the combined entity would divest 579 stores to C&S Wholesalers. If the divestiture occurs, the combined entity will still have over 4,400 stores across the country.

Notes for Table 3

See the above notes for Figure 1 about the Diary and Survey of Consumer Payment Choice (Survey). Table 3 provides mean and median amounts of ATM and retail point-of-sale cash withdrawals by income. In the Survey, participants were asked to report the total combined income of all family members over age 15 living in the household during the past 12 months. From these responses, we constructed five income brackets for each respondent in each year: four brackets of $25,000 each, plus a fifth for respondents reporting more than $100,000 in annual household income.

See e.g., Jay Lindsay, A Fatal Cash Crash? Conditions Were Ripe for It After the Pandemic Hit, but It Didn’t Happen , Fed. Rsrv. Bank of Boston (Nov. 2, 2023), https://www.bostonfed.org/news-and-events/news/2023/11/cash-crash-pandemic-increasing-credit-card-use-diary-of-consumer-payment-choice.aspx

Kevin Foster, Claire Greene, & Joanna Stavins, The 2023 Survey and Diary of Consumer Payment Choice , Fed. Rsrv Bank of Atlanta (June 2024), https://doi.org/10.29338/rdr2024-01

See e.g., Hilary Allen, Payments Failure, Boston College Law Review, Forthcoming, American University, WCL Research Paper No. 2021- 11, (Feb. 21, 2020) available at https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3539797

See e.g., Scarlett Heinbuch, Cash Is Critical in Times of Crisis , Fed. Rsrv. Bank of Atlanta (Mar. 7, 2022), https://www.atlantafed.org/blogs/take-on-payments/2022/03/07/cash-in-crisis

See e.g., Carly Page, Square Says It Has Resolved Daylong Outage , TechCrunch, (Sept. 8, 2023), https://techcrunch.com/2023/09/08/square-day-long-outage-resolved/ . See also Caroline Haskins, The Global CrowdStrike Outage Triggered a Surprise Return to Cash , Wired, (July 19, 2024), https://www.wired.com/story/microsoft-crowdstrike-outage-cash/ .

See Berhan Bayeh, Emily Cubides, and Shaun O’Brien, 2024 Findings from the Diary of Consumer Payment Choice , Fed. Rsrv. (May 13, 2024), https://www.frbservices.org/binaries/content/assets/crsocms/news/research/2024-diary-of-consumer-payment-choice.pdf (findings related to low-income consumers’ and older Americans’ use of cash); Emily Cubides and Shaun O’Brien, 2023 Findings from the Diary of Consumer Payment Choice , Fed. Rsrv. (May 19, 2024), https://www.frbsf.org/cash/wp-content/uploads/sites/7/2023-Findings-from-the-Diary-of-Consumer-Payment-Choice.pdf (findings related to unbanked households’ use of cash); and Michelle Faviero, More Americans are Joining the ‘Cashless’ Economy , Pew Rsch. Ctr. (Oct. 5, 2022), https://www.pewresearch.org/short-reads/2022/10/05/more-americans-are-joining-the-cashless-economy/ (findings related to use of cash by race and other demographics).

Similarly, the average cash-back withdrawal amount was $33 in 2012, the most recent data available from the Federal Reserve Payments Study. The study was based on self-reported information from financial institutions surveyed by the Federal Reserve. Of the reported transactions, 73 percent were debit cards with an average amount of $33 and 27 percent on general purpose prepaid cards with an average withdrawal amount of $19. 2013 Federal Reserve Payments Study: Recent and Long-Term Payment Trends in the United States: 2003 – 2012 , Fed. Rsrv. Bd. (July 2014), https://www.frbservices.org/binaries/content/assets/crsocms/news/research/2013-fed-res-paymt-study-summary-rpt.pdf

The amounts in the Survey are lower than the average ATM withdrawal amounts reported in 2022 Federal Reserve Payments study, which utilizes data from surveying financial institutions. Per this study, in 2021, the average ATM withdrawal was $198. The Federal Reserve Payments Study: 2022 Triennial Initial Data Release , Fed. Rsrv. Bd. (Apr. 21, 2023), https://www.federalreserve.gov/paymentsystems/fr-payments-study.htm

Claire Greene and Oz Shy, How Consumers Get Cash: Evidence from a Diary Survey , Fed. Rsrv. Bank of Atlanta, (Apr. 2019), at 5, https://www.atlantafed.org/-/media/documents/banking/consumer-payments/research-data-reports/2019/05/08/how-consumers-get-cash-evidence-from-a-diary-survey/rdr1901.pdf (finding, “For the largest amounts of cash, respondents mostly turned to employers, with an average dollar value of cash received of $227. At bank tellers and ATMs, consumers also received average dollar values greater than the overall average: $159 and $137, respectively. Consumers received smaller amounts from family or friends ($93) and, notably, cash back at a retail store ($34). All these dollar amounts are weighted. The amount of cash received at a retail store is constrained by the store’s limits, so the amount of cash received in this way is not necessarily at the discretion of the consumer.”)

Neil Mitchell and Ann Ramage, The Second Participant in the Consumer to Business Payments Study , Fed. Rsrv. Bank of Richmond (Sept. 15, 2014), https://www.richmondfed.org/~/media/richmondfedorg/banking/payments_services/understanding_payments/pdf/psg_ck_20141118.pdf

Id. at 8, Figures 7 and 8.

See e.g., Stan Sienkiewicz, The Evolution of EFT Networks from ATMs to New On-Line Debit Payment Products , Discussion Paper, Payment Cards Ctr. of the Fed. Rsrv. Bank of Philadelphia (Apr. 2002), https://www.philadelphiafed.org/-/media/frbp/assets/consumer-finance/discussion-papers/eftnetworks_042002.pdf?la=en&hash=88302801FC98A898AB167AC2F9131CE1 (“The cash back option became popular with supermarket retailers, since store owners recognized savings as a result of less cash to count at the end of the day, a chore that represented a carrying cost to the establishment.”).

These market segments and retailers for purposes of markets analysis are similar to those used in other academic literature related to dollar store locations in the context of food access or impact on other market dynamics, such as on local grocers. See e.g., El Hadi Caoui, Brett Hollenbeck, and Matthew Osbourne, The Impact of Dollar Store Expansion on Local Market Structure and Food Access , (June 22, 2022), available at https://ssrn.com/abstract=4163102 (finding “In 2021, there were more of these stores operating than all the Walmarts, CVS, Walgreens, and Targets combined by a large margin.”) and Yue Cao, The Welfare Impact of Dollar Stores , available at https://yuecao.dev/assets/pdf/YueCaoDollarStore.pdf (last visited Aug. 23, 2024) (using the categories of dollar stores, groceries, and mass merchandise (such as Walmart) for comparisons across retail segments and noting that dollar stores regard these other segments as competitors).

Estimate based on information voluntarily provided in the CFPB's market monitoring activities.

What Forms of Payment are Accepted? U.S. Postal Serv., https://faq.usps.com/s/article/What-Forms-of-Payment-are-Accepted (last visited Aug. 23, 2024).

See generally, Stacy Mitchell, Kennedy Smith, and Susan Holmberg, The Dollar Store Invasion , Inst. for Local Self Reliance (Mar. 2023), https://cdn.ilsr.org/wp-content/uploads/2023/01/ILSR-Report-The-Dollar-Store-Invasion-2023.pdf . There is also extensive research on dollar store locations in other contexts such as food access and impact on consumer spending habits. El Hadi Caoui, Brett Hollenbeck, and Matthew Osbourne, The Impact of Dollar Store Expansion on Local Market Structure and Food Access , at 5, (June 22, 2022), available at https://ssrn.com/abstract=4163102

Dollar General Annual Report (Form 10-K) at 7 (Mar. 25, 2024), https://investor.dollargeneral.com/websites/dollargeneral/English/310010/us-sec-filing.html?format=convpdf&secFilingId=003b8c70-dfa4-4f21-bfe7-40e6d8b26f63&shortDesc=Annual%20Report .

Dollar Tree, Inc. Annual Report (Form 10-K) at 7 (Mar. 20, 2024), https://corporate.dollartree.com/investors/sec-filings/content/0000935703-23-000016/0000935703-23-000016.pdf

See e.g., Dollar General Annual Report (Form 10-K) at 7 (Mar. 25, 2024) (“We generally locate our stores and plan our merchandise selections to best serve the needs of our core customers, the low and fixed income households often underserved by other retailers, and we are focused on helping them make the most of their spending dollar.”), and Dollar Tree, Inc. Annual Report (Form 10-K) at 6 (Mar. 20, 2024) (“Family Dollar primarily serves a lower than average income customer in urban and rural locations, offering great values on everyday items.”)

Dr. Jerry Shannon, Dollar Stores, Retailer Redlining, and the Metropolitan Geographies of Precarious Consumption , Ann. of the Am. Assoc. of Geographers, Vol. 111, No. 4, 1200-1218 (2021) (analyzing over 29,000 storefronts of Dollar General, Dollar Tree, and Family Dollar across the three largest MSAs in each of the nine U.S. Census Bureau-defined divisions).

Kristen Broady, Mac McComas, and Amine Ouazad, An Analysis of Financial Institutions in Black-Majority Communities: Black Borrowers and Depositors Face Considerable Challenges in Accessing Banking Services , Brookings Inst. (Nov. 2, 2021), https://www.brookings.edu/articles/an-analysis-of-financial-institutions-in-black-majority-communities-black-borrowers-and-depositors-face-considerable-challenges-in-accessing-banking-services/ and Drew Dahl and Michelle Franke, Banking Deserts Become a Concern as Branches Dry Up , Fed. Rsrv. Bank of St. Louis, https://www.stlouisfed.org/publications/regional-economist/second-quarter-2017/banking-deserts-become-a-concern-as-branches-dry-up (July 25, 2017).

El Hadi Caoui, Brett Hollenbeck, and Matthew Osbourne, The Impact of Dollar Store Expansion on Local Market Structure and Food Access , (June 22, 2022), available at https://ssrn.com/abstract=4163102 .

Dollar Tree Completes Acquisition of Family Dollar , Dollar Tree Inc., (July 6, 2015), available at https://corporate.dollartree.com/news-media/press-releases/detail/120/dollar-tree-completes-acquisition-of-family-dollar

El Hadi Caoui, Brett Hollenbeck, and Matthew Osbourne, The Impact of Dollar Store Expansion on Local Market Structure and Food Access , (June 22, 2022), available at https://ssrn.com/abstract=4163102 and Yue Cao, The Welfare Impact of Dollar Stores , https://yuecao.dev/assets/pdf/YueCaoDollarStore.pdf (last visited Aug. 23, 2024).

Evan Moore, Harris Teeter Introduces New Fees that Have Customers Upset. What To Know Before You’re Charged , Charlotte Observer, (Mar. 14, 2024), https://amp.charlotteobserver.com/news/business/article286627340.html

Karen Bennett and Matthew Goldberg, Survey: ATM fees Reach 26-year High While Overdraft Fees Inch Back Up , Bankrate.com (Aug. 21, 2024), https://www.bankrate.com/banking/checking/checking-account-survey/

Oz Shy and Joanna Stavins, Who Is Paying All These Fees? An Empirical Analysis of Bank Account and Credit Card Fees , Fed. Rsrv. Bank of Boston, Working Paper No. 22-18, at Table 2, (Aug. 2022), https://www.bostonfed.org/publications/research-department-working-paper/2022/who-is-paying-all-these-fees-an-empirical-analysis-of-bank-account-and-credit-card-fees .

hypothesis generating analysis

Energy & Environmental Science

A semi-vapor electrolysis technology for hydrogen generation from wide water resources.

Cost-effective and scalable green hydrogen production from water electrolysis is crucial to achieve a net-zero emission future. Progress on water electrolysis technologies has long been made towards materials design and device assembly optimization to improve cost effectiveness. However, expensive iridium-based electrocatalyst, pure water feedstock, low current density, and energy efficiency limit state-of-the-art water electrolysis, i.e., alkaline and polymer exchange membrane water electrolyzers based on liquid-water feeding for large-scale implementation. Here we propose a new semi-vapor electrolysis (SVE) system for cost-effective hydrogen generation that adopts low temperature vapor electrolysis at the anode while maintains liquid water circulation at the cathode. The SVE process requires no additional energy input as compared to conventional liquid water electrolysis while enabling the direct use of a wide range of water resources without pre-treatment. In addition, much cheaper and intrinsically more active ruthenium oxide can be used under the conditions of SVE by avoiding dissolution, which achieves an ultrahigh current density of 4.67 A cm-2 at 1.8 V and superior stability under 1.0 A cm-2 operation. Techno-economic assessment suggests significant hydrogen cost reduction due to the improved energy efficiency, reduced material cost and simplified system, and by-product profits, demonstrating the scalability of the as-proposed SVE.

Supplementary files

  • Supplementary information PDF (4842K)
  • Supplementary information XLSX (20K)

Article information

Download citation, permissions.

hypothesis generating analysis

J. Tang, K. Guo, D. Guan, Y. Hao and Z. Shao, Energy Environ. Sci. , 2024, Accepted Manuscript , DOI: 10.1039/D4EE02722A

To request permission to reproduce material from this article, please go to the Copyright Clearance Center request page .

If you are an author contributing to an RSC publication, you do not need to request permission provided correct acknowledgement is given.

If you are the author of this article, you do not need to request permission to reproduce figures and diagrams provided correct acknowledgement is given. If you want to reproduce the whole article in a third-party publication (excluding your thesis/dissertation for which permission is not required) please go to the Copyright Clearance Center request page .

Read more about how to correctly acknowledge RSC content .

Social activity

Search articles by author.

This article has not yet been cited.

Advertisements

COMMENTS

  1. Data-Driven Hypothesis Generation in Clinical Research: What We Learned

    However, a review of the literature reveals the lack of original studies on hypothesis generation in clinical research. The authors then summarize their first human participant study exploring data-driven hypothesis generation by clinical researchers in a simulated setting.

  2. Hypothesis Generation for Data Science Projects

    Hypothesis generation is a key step in data science projects. Here's a case study on hypothesis generation for data science.

  3. Hypothesis Testing

    Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

  4. Formulating Hypotheses for Different Study Designs

    Formulating Hypotheses for Different Study Designs. Generating a testable working hypothesis is the first step towards conducting original research. Such research may prove or disprove the proposed hypothesis. Case reports, case series, online surveys and other observational studies, clinical trials, and narrative reviews help to generate ...

  5. Hypothesis

    The hypothesis generation methods mentioned above are mainly techniques included in data analysis such as data mining, machine learning, and multivariate analysis.

  6. Hypothesis testing for data scientists

    Hypothesis testing is a common statistical tool used in research and data science to support the certainty of findings. The aim of testing is to answer how probable an apparent effect is detected by chance given a random data sample. This article provides a detailed explanation of the key concepts in Frequentist hypothesis testing using ...

  7. Demystifying Hypothesis Generation: A Guide to AI-Driven Insights

    What is Hypothesis Generation? Hypothesis generation involves making informed guesses about various aspects of a business, market, or problem that need further exploration and testing. It's a crucial step while applying the scientific method to business analysis and decision-making.

  8. Hypothesis Generation and Interpretation

    The novel methods and technologies proposed in Hypothesis Generation and Interpretation are supported by the incorporation of historical perspectives on science and an emphasis on the origin and development of the ideas behind their design principles and patterns.

  9. Machine Learning as a Tool for Hypothesis Generation

    While hypothesis testing is a highly formalized activity, hypothesis generation remains largely informal. We propose a systematic procedure to generate novel hypotheses about human behavior, which uses the capacity of machine learning algorithms to notice patterns people might not.

  10. How to Write a Strong Hypothesis

    A hypothesis is a statement that can be tested by scientific research. If you want to test a relationship between two or more variables, you need to write hypotheses before you start your experiment or data collection.

  11. A Beginner's Guide to Hypothesis Testing in Business

    Hypothesis testing, then, is a statistical means of testing an assumption stated in a hypothesis. While the specific methodology leveraged depends on the nature of the hypothesis and data available, hypothesis testing typically uses sample data to extrapolate insights about a larger population.

  12. Hypothesis Testing in Data Science

    Hypothesis Testing vs Hypothesis Generation: In the world of Data Science, there are two parts to consider when putting together a hypothesis. Hypothesis Testing is when the team builds a strong hypothesis based on the available dataset. This will help direct the team and plan accordingly throughout the data science project.

  13. Hypothesis Generation : An Efficient Way of Performing EDA

    As mentioned in this article, hypothesis generation is an educated "guess" of various factors that are impacting the business problem that needs to be solved using machine learning. In short, you are making wise assumptions as to how certain factors would affect our target variable and in the process that follows, you try to prove and ...

  14. Hypothesis Testing in Statistics

    Explore hypothesis testing, a fundamental method in data analysis. Understand how to use it to draw accurate conclusions and make informed decisions.

  15. Hypothesis Generation

    Hypothesis generation is the formation of guesses as to what the segment of code does; this step can also guide a re- segmentation of the code. Finally, verification is the process of examining the code and associated documentation to determine the consistency of the code with the current hypotheses.

  16. Hypothesis Testing: 4 Steps and Example

    Hypothesis testing is the process that an analyst uses to test a statistical hypothesis. The methodology depends on the nature of the data used and the reason for the analysis.

  17. Understanding Hypothesis Testing

    Hypothesis testing is an important procedure in statistics. Hypothesis testing evaluates two mutually exclusive population statements to determine which statement is most supported by sample data. When we say that findings are statistically significant, it is thanks to hypothesis testing.

  18. Hypothesis Generator

    The Hypothesis Generator is powered by advanced AI models. These models analyze your research question and dataset description, and use their understanding of language, context, and research methodology to generate null and alternative hypotheses that are clear, concise, and testable.

  19. CREATIVE HYPOTHESIS GENERATING IN PSYCHOLOGY: Some Useful Heuristics

    Abstract To correct a common imbalance in methodology courses, focusing almost entirely on hypothesis-testing issues to the neglect of hypothesis-generating issues which are at least as important, 49 creative heuristics are described, divided into 5 categories and 14 subcategories. Each of these heuristics has often been used to generate hypotheses in psychological research, and each is ...

  20. Hypothesis: Definition, Examples, and Types

    A hypothesis is a tentative statement about the relationship between two or more variables. Explore examples and learn how to format your research hypothesis.

  21. What is a Research Hypothesis: How to Write it, Types, and Examples

    Research begins with a research question and a research hypothesis. But what are the characteristics of a good hypothesis? In this article, we dive into the types of research hypothesis, explain how to write a research hypothesis, offer research hypothesis examples and answer top FAQs on research hypothesis. Read more!

  22. CD19 CAR T cells for B cell malignancies: a systematic review and meta

    This meta-analysis is a critical hypothesis-generating instrument, capturing effects in the CD19 CAR T cells literature lacking randomized clinical trials and large observational studies. CD19-targeted chimeric antigen receptors (CAR) T cells are one of the most remarkable cellular therapies for managing B cell malignancies. However, long-term ...

  23. Expectations and Preferences for Digital Cessation Treatment

    Objective: We aimed to collect detailed, hypothesis-generating information about expectations and preferences for cessation digital treatment among older adults who smoke cigarettes. Methods: Semistructured interviews were conducted with adults aged 65+ years currently smoking or who had quit within the past month.

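Several of the entries above describe frequentist two-sample hypothesis testing. A minimal, standard-library-only sketch of Welch's t-test shows the mechanics; the two study arms and their values are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t-statistic and approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical outcome scores for two study arms (illustrative data only)
control = [1, 2, 3, 4, 5]
treated = [2, 3, 4, 5, 6]
t, df = welch_t(control, treated)
print(round(t, 3), round(df, 1))  # prints: -1.0 8.0
```

The t-statistic is then compared against the t-distribution with the computed degrees of freedom to obtain a p-value; libraries such as SciPy wrap all of these steps in a single call.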