Using AI as a replacement for interviewing actual users is a brilliant idea if you want to look like you made an effort, but are really looking to fill the page with superficial, stereotyped bullshit.
I am backing Niloufar in this. It's not that she "did not use the tool correctly." It's that the tool is doing something inappropriate and unethical.
For design & strategic research, we want to build new knowledge about people's approaches beyond what a team has already experienced or knows. To build this new knowledge, we need to understand new meaning communicated to us by people who have done a lot of interior cognition (defined as: inner thinking, emotional reactions, personal rules) about the purpose they addressed. ChatGPT is not a source of meaning. ChatGPT is more like a search engine, finding words that have been published in relation to other words, all already existing in the training data. https://medium.com/@indiyoung/insta-personas-synthetic-users-fc6e9cd1c301
Hi Niloufar, thank you for taking the time to review our app.
We obviously disagree with your analysis and your conclusions.
Let's review.
From your post:
Parents were seeking a high-quality education for their child, but faced several challenges in reaching this goal:
1 - finding useful and relevant information about the schools available
2 - making choices between schools that balanced access to well-resourced schools against other considerations like safety, inclusion, and convenience
3 - overwhelming number of options
Some quotes from the synthetic user interviews you shared:
1 - "This issue is causing me anxiety and stress because I want to make the best decision for my child's education, but I don't know where to turn for reliable information."
" I have been struggling to find information about the different schools, their curriculums, and their reputations."
"This impacts me greatly because I want the best for my daughter, but I don't even know where to start looking for reliable information about schools."
2 - "Finally, I'm also concerned about the safety of the schools in my area, and whether my child will be able to stay safe while attending."
"Lastly, I am concerned about the safety of the schools in the area. There have been news reports of violence in some schools in the past, so I want to make sure the school I choose for my child is safe."
"Additionally, the dashboard may not reflect other factors that are important to me, such as the school's cultural responsiveness or climate. "
"Finally, this dashboard provides statistical data only, and the families need more in-depth information about schools such as school culture or individual teacher quality, etc. which can sometimes be more critical than just data."
"One issue is transportation. I live in a neighborhood with poor public transportation, which makes it difficult for me to visit schools that are located further away. "
"Secondly, can the dashboard also provide information about transportation options for the schools? This may be particularly important for parents who do not have access to a car."
"For example, the dashboard might indicate high test scores, but it might not provide information about the quality of the teachers, extracurricular programs, and other aspects that are important to me as a parent."
3 - "Another challenge is that there are many different types of schools to choose from, and I don't know which one would be the best for my child. For example, there are charter schools, public schools, and private schools, and they all have different strengths and weaknesses. "
"This solution provides a centralized location for me to access information about schools nearby without having to navigate multiple websites or online sources. "
Regarding the average score given by synthetic users (3.4/5), it fits your own results since, as you recognize in your paper, "...some parents in our sample found this kind of data useful...".
If you had been willing to give synthetic users a chance and see how they really compare, you could have used the problem exploration mode of our app. Maybe that wasn't the goal.
But we did, and we believe the results align with those of your own research. Link: https://app.syntheticusers.com/summaries/f055d3c2-2a37-43a0-9b40-fcb3a3baa327
Now, we don't believe this will change your mind or anyone else's. People look at our product and, even before trying it, decide whether or not to give it a chance.
We don't want user researchers, other product people, or academics to stop talking to people. We want to help them be better equipped so, when they do, they can dig deeper and have even more nuanced conversations and explorations.
What we also want is to help people who are priced out of research, or too time-constrained to do any, have a real chance at better understanding the people they are building products for. Too many products and features are launched with zero research, and we want to help reduce that number.
Even well-resourced teams can struggle to find the people they want to help, as your paper clearly shows.
Three quotes from it show this really well:
"We met with these partners weekly to identify our research questions and develop a recruitment strategy over the course of **several months** ." (emphasis ours)
"Our sample in this paper is small because we spent **significant time and effort recruiting participants** and building relationships with community groups."
" responses to this survey were very slow, and most parents who completed it did not respond when we tried to set up an interview."
These are the types of problems we are trying to solve.
If you are available, we would welcome constructive feedback on how to make our product better.
I am so happy to see this - instead of a hot take, an actual experience with your thoughtful criticisms! Glad that our field has a reference example of the ridiculousness - I love the disdain, but I love substance even more.
I loved this phrase in particular: "The whole point of spending the time to interview people and then spending a lot more time analyzing the large amounts of data gathered, is the ability to connect with them, build trust, dig deeper, ask them to share stories, and learn about their feelings and emotions. Pattern synthesis engines have none of those."
-> Indeed, the fact that good qualitative research takes time is the point! It's a feature, not a bug! It is -NOT- the part that we should replace. Immersing yourself, getting familiar with the people and the problems, having opportunities for serendipity, and going deep in analysis is what it's all about.
Using AI to optimise & automate large parts of ResearchOPS? Yes please.
Using AI to replace humans and human insights? F*ck no.
I appreciate people sharing their experience, but it seems to me that the author of this review didn’t really give Synthetic Users a fair shot.
It’s a no-brainer that we should continue collecting human insights. However, dismissing synthetic users in such a shallow way does a disservice to other scholars who may benefit from using them.
For example, a recent study shows that LLMs were able to replicate hundreds of treatment effects from social science experiments with a correlation of 0.9: https://www.treatmenteffect.app/
And this is only one of a large number of papers on the topic that show the potential of these tools to facilitate science.
Even if we end up using human surveys, quickly running pilots for a small fraction of the cost to find the most promising experiments (directions) can save a tremendous amount of time and resources. Just this summer we ran 6 pilots for a study, spent thousands of dollars, and ran out of funds.
To provide a specific counterpoint, when ChatGPT came out, our school took a stance and much of the discussion was about plagiarism. My students and I used ChatGPT (as a synthetic user) to understand the frustrations of different stakeholders (related to the use of ChatGPT in the classroom). Fast forward to the end of the semester, the conversation completely changed with our dean asking “businesses are expecting us to be the leaders on AI, so what are you doing in the classroom to incorporate AI” (precisely what ChatGPT identified earlier that semester…it was absolutely spot on).
The point here is not that we should replace human interviews completely, but that AI-generated insights can be an incredibly powerful tool (e.g., quickly and cost-efficiently running pilots, problem exploration, etc.). Like this paper in PNAS that shows we can even study past civilizations: https://www.pnas.org/doi/10.1073/pnas.2407639121
Or this paper that shows how synthetic experiments can help theorizing: https://www.nber.org/papers/w33033
It’s absolutely incredible that this is even a possibility.
But let’s recap the review. A qualitative researcher spends many months collecting qualitative interviews to understand a problem. She then tries to compare her many months of research to AI-generated insights. She spends less than a minute prompting the AI, which produces 6 interviews (no attempt to work with the AI beyond a single initial prompt, to understand the other features, to probe the AI for deeper answers, or to use her expertise and knowledge to work together with the AI as a co-intelligence). Instead of actually reading the interviews (see Synthetic Users’ response), the researcher provides a misleading interpretation of the AI findings based on an extremely shallow and brief interaction.
The conclusion: Don’t do this!
Is it really fair to compare months of qualitative work and hundreds of hours of in-person interviews with AI-generated responses produced in less than a minute? In fact, it’s extraordinary that the AI was able to generate so many (very comparable) insights in such a short period of time. How about working together with the AI as a co-intelligence (these tools lack true understanding and need human input)?
The irony is that while the author criticizes AI-generated insights as lacking nuance, her own conclusions about needing “trusting relationships” and “personalized solutions” are extremely general. There's little detail about how exactly to build these relationships, what specific strategies work best, or what resources would be required to implement such solutions at scale. While parent advocates clearly play a valuable role, the review doesn't address the significant resource constraints and scalability challenges of providing personalized human support to every family.
Most importantly, there is no mention of the real purpose and value of these tools – they are not meant to replace human insights and interviews. But they can significantly speed up the process and allow researchers to gain insights quickly. There are many scenarios where cost is prohibitive, or where other constraints make interviews impossible.
Perhaps next time when writing a review, the author should spend more than a minute reviewing the tool, especially if they are going to criticize it so harshly.
Loved your write-up, great to get some perspective on these tools. I have recently done a hands-on deep dive into how we might build this, to better understand how to create, work with, and explore synthetic user data in simulated environments. Would love your perspective: https://towardsdatascience.com/creating-synthetic-user-research-using-persona-prompting-and-autonomous-agents-b521e0a80ab6
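For a flavor of what persona prompting means in practice, here is a minimal sketch (assuming the official OpenAI Python client; the persona, model name, and questions are illustrative, not taken from the article):

```python
# Minimal persona-prompting sketch: a "synthetic user" is an LLM
# conditioned on a persona via the system prompt. Illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical persona; in practice you would vary demographics,
# goals, and constraints across many generated personas.
PERSONA = (
    "You are Maria, a single parent of a 10-year-old in a large city. "
    "You rely on public transit and are choosing a middle school. "
    "Answer interview questions in the first person, staying in character."
)

QUESTIONS = [
    "How do you currently find information about schools?",
    "What worries you most about the choice you have to make?",
]

def interview(persona: str, questions: list[str]) -> list[str]:
    """Run a short synthetic interview, keeping the dialogue history."""
    messages = [{"role": "system", "content": persona}]
    answers = []
    for question in questions:
        messages.append({"role": "user", "content": question})
        reply = client.chat.completions.create(
            model="gpt-4o",  # any chat-capable model works here
            messages=messages,
        )
        answer = reply.choices[0].message.content
        messages.append({"role": "assistant", "content": answer})
        answers.append(answer)
    return answers

if __name__ == "__main__":
    for q, a in zip(QUESTIONS, interview(PERSONA, QUESTIONS)):
        print(f"Q: {q}\nA: {a}\n")
```

The autonomous-agents part of the article layers orchestration on top, but the core mechanic is this persona-conditioned system prompt.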
Hi Niloufar, thank you for taking the time to review our product.
I understand it falls short of your expectations.
Before I reply to it, would you be so kind as to share the 6 interviews you ran? I’d like to have a look at them and the linked paper before replying.
Hi Hugo,
Good idea, I've added a link to the interviews to the post too: https://niloufar.org/wp-content/uploads/2023/04/syntheticinterviews.pdf
To be clear, it's not that the product falls short of my expectations; the problem is that it's actually what I expected it would be. I suggest you read the post carefully and seriously reconsider marketing pattern synthesis engines as reasonable replacements for actual human interviews.
That's what's disingenuous. The marketing doesn't fit what he says in his responses. It says "user research without the users" as a tagline. Gross.
Thank you for the review, Niloufar. It’s much appreciated. I’m sure AI will play a significant role in research, but it clearly won’t be due to this iteration of syntheticusers.
Agreed, I think the key is to think about which parts of user research are repetitive, e.g. getting feedback on interview questions, running basic accessibility heuristics, etc.