Patent title: EFFICIENT DUPLICATE DETECTION FOR MACHINE LEARNING DATA SETS
Inventors: LEO PARKER DIRAC, ALEKSANDR MIKHAYLOVICH INGERMAN
Application No.: US14569458
Filing date: 2014-12-12
Publication No.: US20150379430A1
Publication date: 2015-12-31
Abstract: At a machine learning service, a determination is made that an analysis to detect whether at least a portion of contents of one or more observation records of a first data set are duplicated in a second set of observation records is to be performed. A duplication metric is obtained, indicative of a non-zero probability that one or more observation records of the second set are duplicates of respective observation records of the first set. In response to determining that the duplication metric meets a threshold criterion, one or more responsive actions are initiated, such as the transmission of a notification to a client of the service.
Applicant: AMAZON TECHNOLOGIES, INC.
Address: Reno, NV, US
Nationality: US
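
The flow described in the abstract (compare two sets of observation records, obtain a duplication metric, and act when it meets a threshold) can be illustrated with a minimal Python sketch. This is not the method claimed in the patent: it assumes exact matching via SHA-256 hashes of selected fields, and it defines the metric as the fraction of second-set records that also occur in the first set. The names `record_key`, `duplication_metric`, `check_duplicates`, the `fields` and `threshold` parameters, and the `notify` callback are all hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Callable, Iterable, Optional, Sequence

Record = Sequence[str]


def record_key(record: Record, fields: Optional[Sequence[int]] = None) -> str:
    """Hash a record, or only the selected field positions of it.

    Assumption: fields are strings and do not contain the unit-separator
    character used to join them.
    """
    selected = list(record) if fields is None else [record[i] for i in fields]
    joined = "\x1f".join(selected)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def duplication_metric(
    first_set: Iterable[Record],
    second_set: Iterable[Record],
    fields: Optional[Sequence[int]] = None,
) -> float:
    """Fraction of second-set records whose key also appears in the first set.

    Assumption: this ratio stands in for the "duplication metric" of the
    abstract; the source does not define the metric this way.
    """
    seen = {record_key(r, fields) for r in first_set}
    second = list(second_set)
    if not second:
        return 0.0
    duplicates = sum(1 for r in second if record_key(r, fields) in seen)
    return duplicates / len(second)


def check_duplicates(
    first_set: Iterable[Record],
    second_set: Iterable[Record],
    threshold: float,
    notify: Callable[[str], None],
    fields: Optional[Sequence[int]] = None,
) -> float:
    """Compute the metric and call notify when it meets the threshold."""
    metric = duplication_metric(first_set, second_set, fields)
    if metric >= threshold:
        # Hypothetical responsive action: send a notification to the client.
        notify(f"duplication metric {metric:.2f} meets threshold {threshold:.2f}")
    return metric


if __name__ == "__main__":
    training = [("a", "1"), ("b", "2"), ("c", "3")]
    evaluation = [("a", "1"), ("d", "4")]
    check_duplicates(training, evaluation, threshold=0.25, notify=print)
```

A typical use would treat the first set as training data and the second as evaluation data, since records shared between the two can inflate measured model quality. For large data sets, a probabilistic structure such as a Bloom filter could replace the in-memory set, at the cost of false positives.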