Data Exploration and Predictive Analytics for Music Publishing
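Basic information
- Grant number: EP/M507076/1
- Principal investigator:
- Amount: $148,300
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2014
- Funding country: United Kingdom
- Period: 2014 to (no data)
- Project status: Completed
- Source:
- Keywords: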
Project abstract
The PDRA will liaise with the developers at Sentric Music to ensure that a broad array of diverse data sources is linked and preprocessed in a statistically sound manner, and that the final version of the data is in a format conducive to machine learning and statistical inference (e.g., unstructured data will need to be pre-parsed into structured data). The PDRA will need a broad suite of "data science" skills to achieve this, including computing skills as well as statistical expertise.
The second objective will involve representing the problem from a statistical viewpoint, as a problem of predicting the future value of a quantity of interest (in this case earnings) on the basis of attributes of the artist and/or their songs, such as past earnings, genre, fan-base, etc. To choose an appropriate model, two types of considerations come into play: the format of the data, and our expectations about the types of relationships we are trying to capture. We discuss both in turn.
With regard to data format, this particular application is likely to give rise to a large number of attributes of various types (e.g., each song or artist will be represented in numeric ways, placed into categories, or rated according to possibly different scales, etc.). Automatic feature selection techniques will be required to ensure that information-poor attributes are excluded from consideration to avoid contaminating the results. Moreover, there is a natural hierarchical structure to this problem, introduced by the relationship between an artist and their songs. Both of these aspects challenge off-the-shelf statistical models and require a bespoke model.
With regard to the choice of model, it is known that in Big Data, as the data set size increases, so typically does the heterogeneity in the data, and failing to account for this can lead to over-confident and inaccurate predictions. One solution is to employ a "divide and conquer" approach using decision trees, which segment the initial dataset and fit a separate statistical model in each segment. This approach achieves flexibility without compromising on computational efficiency. Notably, the output of such models remains interpretable by the end user because it closely resembles the manual segmentation already used extensively in marketing and, currently, by Sentric. The difference is that the segmentation rules are extracted from the data in a principled, automatic fashion. Another consideration in choosing the model is its ability to output the confidence of its own predictions. A model that cannot do so introduces risk, since only confident predictions should be used for decision-making. Adopting a Bayesian framework is a natural way to achieve this objective.
Our favored approach overall is the framework of Bayesian Dynamic Trees, which combines flexibility, statistical soundness, scalability using cutting-edge methods, and a built-in ability to adapt to data evolution at no extra computational cost [Anagnostopoulos, 2013]. This framework will have to be extended for this problem to handle the hierarchical relationship between artists and their songs; the diversity of available attributes; and the need to produce forecasts over possibly longer time horizons.
Finally, the PDRA will supervise and contribute to the deployment of the model within Sentric, as well as the design of the User Interface that will be made available to the artists. The former will involve scalability considerations, and the latter will involve innovation in visualisation and communication of uncertainty.
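The "divide and conquer" idea in the abstract (segment the data, then fit a separate model in each segment) can be illustrated with a minimal sketch. It is not the Bayesian Dynamic Trees framework cited above: it is a plain least-squares model tree with a single split, and it reports no predictive uncertainty. The attribute names (fan-base size, past earnings, future earnings) and all data values are hypothetical and chosen only for illustration.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, sum of squared errors)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sse

def fit_stump(rows, min_leaf=2):
    """rows: (segment_attr, x, y) tuples.

    Tries every midpoint between consecutive segment_attr values and keeps
    the threshold whose two per-segment lines have the lowest total error.
    Returns None if no split leaves at least min_leaf rows on each side.
    """
    values = sorted({r[0] for r in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [r for r in rows if r[0] <= t]
        right = [r for r in rows if r[0] > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        la, lb, lsse = fit_line([r[1] for r in left], [r[2] for r in left])
        ra, rb, rsse = fit_line([r[1] for r in right], [r[2] for r in right])
        if best is None or lsse + rsse < best["sse"]:
            best = {"threshold": t, "left": (la, lb),
                    "right": (ra, rb), "sse": lsse + rsse}
    return best

def predict(model, segment_attr, x):
    """Route the query to its segment, then apply that segment's line."""
    a, b = model["left"] if segment_attr <= model["threshold"] else model["right"]
    return a + b * x

# Hypothetical rows: (fan_base, past_earnings, future_earnings).
# Small fan-bases follow one earnings trend, large fan-bases another.
rows = [
    (100.0, 1.0, 1.5), (200.0, 2.0, 2.0), (300.0, 3.0, 2.5), (400.0, 4.0, 3.0),
    (1000.0, 1.0, 12.0), (2000.0, 2.0, 14.0), (3000.0, 3.0, 16.0), (4000.0, 4.0, 18.0),
]
model = fit_stump(rows)
print("split on fan_base at", model["threshold"])
print("small fan-base forecast:", predict(model, 250.0, 3.0))
print("large fan-base forecast:", predict(model, 2500.0, 3.0))
```

The learned threshold plays the role of a segmentation rule extracted from the data, which is what keeps such a model readable to an end user. A Bayesian variant would replace each per-segment line with a posterior predictive distribution so that every forecast comes with a measure of confidence.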
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Christoforos Anagnostopoulos
Similar overseas grants
Predictive Drone Control for Interplanetary Exploration
- Grant number: RGPIN-2019-05363
- Fiscal year: 2022
- Funding amount: $148,300
- Project category: Discovery Grants Program - Individual
Predictive Drone Control for Interplanetary Exploration
- Grant number: RGPIN-2019-05363
- Fiscal year: 2021
- Funding amount: $148,300
- Project category: Discovery Grants Program - Individual
Exploration of predictive markers of therapeutic response to anti-IL-12/23p40 antibody in patients with Crohn's disease.
- Grant number: 20K16992
- Fiscal year: 2020
- Funding amount: $148,300
- Project category: Grant-in-Aid for Early-Career Scientists
Exploration of predictive markers and establishment of preventative and therapeutic measures for chronic postsurgical pain targeting central sensitization mechanisms
- Grant number: 20K09206
- Fiscal year: 2020
- Funding amount: $148,300
- Project category: Grant-in-Aid for Scientific Research (C)
Predictive Drone Control for Interplanetary Exploration
- Grant number: RGPIN-2019-05363
- Fiscal year: 2020
- Funding amount: $148,300
- Project category: Discovery Grants Program - Individual
Exploration of Epigenetic Profiles in Circulating Tumor DNA to Identify Predictive Cancer Biomarkers
- Grant number: 439356
- Fiscal year: 2020
- Funding amount: $148,300
- Project category: Studentship Programs
Predictive Drone Control for Interplanetary Exploration
- Grant number: RGPIN-2019-05363
- Fiscal year: 2019
- Funding amount: $148,300
- Project category: Discovery Grants Program - Individual
Predictive Drone Control for Interplanetary Exploration
- Grant number: DGECR-2019-00122
- Fiscal year: 2019
- Funding amount: $148,300
- Project category: Discovery Launch Supplement
Petrological models for the origins of coloured gems: Developing new predictive tools for gem exploration
- Grant number: 489355-2016
- Fiscal year: 2018
- Funding amount: $148,300
- Project category: Alexander Graham Bell Canada Graduate Scholarships - Doctoral
Exploration of predictive molecular biomarkers for acquired resistance mechanisms to EGFR tyrosine kinase inhibitors
- Grant number: 18K07336
- Fiscal year: 2018
- Funding amount: $148,300
- Project category: Grant-in-Aid for Scientific Research (C)