Generating the profiles and personas for this evaluation work takes several steps. All the scripts are in the `data_generation` directory, and they should be run from inside that directory.
- The base persona file was created with ChatGPT, but any JSON file with a similar structure works as long as it includes the "persona_name" and "description" fields, which later steps rely on.
- Run `python build_persona.py` or `python build_complex_persona.py` to create realistic personas.
- Run `python create_queries_from_persona.py` to create possible queries from each persona.
- Run `python get_ddg_results.py` to retrieve the top N search results for each query.
- Run `python label_queries_and_websites.py` to label the category and intent of each query and page.
- Run `python refine_queries_and_websites.py` to trim off the less relevant ones.
- Run `python synthesize_intermediate_profiles.py --bank-dir "./refined_websites" --output-dir "./refined_records"` to synthesize the intermediate user profiles that serve as input to the evaluation pipeline.
- Run `python generate_llm_insights.py --profile-dir "./refined_records" --output-dir "./gpt_insights_from_refined_records"` to generate the GPT version of the insights for evaluation.
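
The steps above can also be chained from a small driver script. The following is only a minimal sketch and is not part of the repository: the names `PIPELINE_STEPS` and `run_pipeline` are hypothetical, it assumes it is launched from inside the `data_generation` directory, and it arbitrarily picks `build_persona.py` over `build_complex_persona.py` for the first step.

```python
import subprocess
import sys

# Ordered pipeline steps, copied from the list above. Each entry is the
# script name followed by its command-line arguments.
# Assumption: build_persona.py is used; swap in build_complex_persona.py if preferred.
PIPELINE_STEPS = [
    ["build_persona.py"],
    ["create_queries_from_persona.py"],
    ["get_ddg_results.py"],
    ["label_queries_and_websites.py"],
    ["refine_queries_and_websites.py"],
    ["synthesize_intermediate_profiles.py",
     "--bank-dir", "./refined_websites",
     "--output-dir", "./refined_records"],
    ["generate_llm_insights.py",
     "--profile-dir", "./refined_records",
     "--output-dir", "./gpt_insights_from_refined_records"],
]


def run_pipeline(steps=PIPELINE_STEPS, dry_run=False):
    """Run each step in order and return the full commands.

    With dry_run=True the commands are only built, not executed.
    """
    commands = []
    for step in steps:
        # Use the current interpreter so the scripts see the same environment.
        command = [sys.executable, *step]
        commands.append(command)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so a failed step stops the pipeline before later steps run.
            subprocess.run(command, check=True)
    return commands


# Preview the commands without executing anything.
for command in run_pipeline(dry_run=True):
    print(" ".join(command))
```

Calling `run_pipeline()` without `dry_run=True` executes the scripts one after another. Stopping at the first failure matters here because each step reads the output of the previous one.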