Build stability: All recent builds failed.

Changes

#17 (Oct 19, 2021, 6:04:45 PM)

  1. offer fast gather candidate option as alternative to distsim — carsten.draschner / githubweb
  2. more playground for parameters — carsten.draschner / githubweb
  3. more size prints — carsten.draschner / githubweb
  4. adjust considered features — carsten.draschner / githubweb

#16 (Oct 19, 2021, 12:08:59 AM)

  1. introduce fast MinHash gather candidate approach as alternative to distSim; if only partial weightings are given, fill the others with zero but show a message — carsten.draschner / githubweb
  2. play around with lmdb — carsten.draschner / githubweb

#15 (Oct 14, 2021, 3:35:15 PM)

  1. sample file and tiny changes — carsten.draschner / githubweb

#14 (Oct 13, 2021, 4:08:53 PM)

  1. scala docs for DaSimEstimator — carsten.draschner / githubweb
  2. wrote first dasim estimator unit test to call it more easily — carsten.draschner / githubweb
  3. new small dataset which can be used for the dasim unit test — carsten.draschner / githubweb
  4. include opportunity for semantification. fixed a bug caused by too much Python-like slicing — carsten.draschner / githubweb
  5. call semantification within test — carsten.draschner / githubweb
  6. removed debug println — carsten.draschner / githubweb

#13 (Aug 4, 2021, 3:33:04 PM)

  1. - start gather candidate pairs with distsim — carsten.draschner / githubweb
  2. optimized speed of unique candidates by usage of df functionalities instead of dataset options — carsten.draschner / githubweb
  3. - norm scale similarity score — carsten.draschner / githubweb
  4. removed lines of code now refactored into norm method — carsten.draschner / githubweb
  5. print lines for automatically retrieved weighting maps — carsten.draschner / githubweb
  6. - offer setters for hyperparameters — carsten.draschner / githubweb
  7. usage of parameters in eval script — carsten.draschner / githubweb

#12 (Aug 2, 2021, 4:58:06 PM)

  1. offer verbose mode — carsten.draschner / githubweb
  2. started with refactoring — carsten.draschner / githubweb
  3. base class for trying out refactored dasim, which might become the basis for unit tests — carsten.draschner / githubweb

#11 (Jul 29, 2021, 4:15:03 PM)

  1. - semantification of similarity results — carsten.draschner / githubweb

#10 (Jul 29, 2021, 2:32:13 PM)

  1. - better handling of aggregation of overall similarity value while preserving initial sim values — carsten.draschner / githubweb

#9 (Jul 28, 2021, 9:50:18 PM)

  1. introducing option to norm similarity columns and to weight by importance (for the start) — carsten.draschner / githubweb
  2. introduce all weighting factors and finally aggregate similarity score weighted over all features and so on — carsten.draschner / githubweb
  3. reduce distsim dataframe so only unique pairs stay — carsten.draschner / githubweb
  4. first data gathering for semantification — carsten.draschner / githubweb
  5. outline todos for semantification — carsten.draschner / githubweb

#8 (Jul 26, 2021, 4:33:35 PM)

  1. calculate similarity values and join those into one df s.t. we can later aggregate those — carsten.draschner / githubweb

#7 (Jul 26, 2021, 12:30:09 PM)

  1. string column handling by default pipeline as current fallback for the not yet implemented word2Vec — carsten.draschner / githubweb

#6 (Jul 26, 2021, 11:29:59 AM)

  1. handling of categorical strings transformed over hashing and IDF (Information Content) weighting — carsten.draschner / githubweb

#5 (Jul 26, 2021, 10:11:24 AM)

  1. started handling of different feature types — carsten.draschner / githubweb
  2. started some playground for word2vec in spark — carsten.draschner / githubweb
  3. clean up — carsten.draschner / githubweb

#4 (Jul 14, 2021, 11:40:59 PM)

  1. align design to be transformer conform — carsten.draschner / githubweb
  2. integrate smart feature extractor into novel dasim pipeline — carsten.draschner / githubweb
  3. cast numeric values to doubles — carsten.draschner / githubweb
  4. offer first unit test for smartFeatureExtractor — carsten.draschner / githubweb
  5. play a bit around and structure — carsten.draschner / githubweb
  6. also show schema — carsten.draschner / githubweb

#3 (Jul 13, 2021, 11:35:29 PM)

  1. automatically cast dataframe to corresponding literal type and split features if needed by their respective datatype — carsten.draschner / githubweb
  2. bring all information to transform — carsten.draschner / githubweb
  3. make components more compact and broader documentation — carsten.draschner / githubweb

#2 (Jul 10, 2021, 1:41:41 PM)

  1. Bumped to a snapshot version of aksw-commons — Claus Stadler / githubweb
  2. first distsim on lmdb with minhash between movies for promising candidates — carsten.draschner / githubweb
  3. play around with different feature extractors and further use DistRDF2ML modules for feature extraction — carsten.draschner / githubweb
  4. start with pivot-based feature extracting transformer — carsten.draschner / githubweb