This is a beta version; please contact me if you find any bugs.

Alternative Address: mirror1


How to Cite StructureSelector?

Li YL, Liu JX (2018) StructureSelector: a web based software to select and visualize the optimal number of clusters using multiple methods. Molecular Ecology Resources, 18:176–177. [link]
     Please also cite the corresponding methods you used.

How to Run StructureSelector?

  • STRUCTURE
    1. Run STRUCTURE over a range of K values, with several replicates per K using different seeds (NB: the K values must be consecutive if you want to calculate DeltaK);
    2. Compress the result files (suffix _f, in the Results folder) into a zip file;
    3. Upload the zip file;
    4. Click Run!
  • ADMIXTURE
    1. Run ADMIXTURE with different K;
    2. To run several replicates per K with different seeds, you can use a command like the following in a Linux shell. This example runs 10 replicates of each K up to a maximum K of 6:
      for k in `seq 1 6`; do for i in `seq 1 10`; do cp infile.ped infile.run_$i.ped && admixture --cv -j4 -s $RANDOM infile.run_$i.ped $k \
      | tee infile.run_$i.$k.log && rm infile.run_$i.ped; done; done

      zip -j -q admix_results.zip *.Q *.log #This is what you need to upload
      A shell script, run_admix.sh, to run this.
    3. Compress the results (suffix .Q) into a zip file (NB: the log files, suffix .log, e.g. XXX.1.log, are needed if you want the likelihood of each run. Structure Selector can also calculate DeltaK from the likelihoods of the replicates and, if available in the log files, print the CV error);
    4. Prepare a popmap file, or input a vector of population sizes (e.g. 20,30,40, which means a total of 90 individuals: the first 20 individuals belong to one population, the next 30 to a second, and the last 40 to a third);
    5. Upload the zip and popmap files;
    6. Click Run!
  • fastStructure
    1. The procedure is similar to that for ADMIXTURE;
    2. Run fastStructure with different K;
    3. To try different seeds, you can run a command like the following in a Linux shell:
      for k in `seq 1 6`; do for i in `seq 1 10`; do python structure.py -K $k --input=infile --output=infile.run_$i --cv=10 \
      --seed=$RANDOM; done; done

      zip -j -q faststr_results.zip *.meanQ *.log #This is what you need to upload
    4. Compress the results (suffix .meanQ) into a zip file;
    5. Prepare a popmap file, or input a vector of population sizes;
    6. Upload the zip and popmap files;
    7. Click Run!
  • Other Q-matrix
    1. Rename the Q-matrix files to the form XXX.k.Q, e.g. inputfile.2.Q (the 2 here is the number of clusters);
    2. Compress the results into a zip file;
    3. Prepare a popmap file, or input a vector of population sizes;
    4. Upload the zip and popmap files;
    5. Click Run!
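
The renaming and compression steps for other Q-matrix files can be scripted. The following is a minimal sketch, not part of StructureSelector: it assumes each Q-matrix is a whitespace-separated text file with one row per individual and one column per cluster, and that there is a single Q-matrix per K. The function names count_clusters and zip_q_matrices are hypothetical.

```python
import zipfile


def count_clusters(q_path):
    """Return K, taken as the number of columns in the first non-empty row.

    Assumption: the Q-matrix is whitespace-separated, one column per cluster.
    """
    with open(q_path) as handle:
        for line in handle:
            if line.strip():
                return len(line.split())
    raise ValueError(f"{q_path} contains no data rows")


def zip_q_matrices(q_paths, prefix, zip_path):
    """Store each Q-matrix in zip_path under the name <prefix>.<K>.Q.

    Assumption: one Q-matrix per K; a second file with the same K is
    rejected rather than silently overwriting the first.
    """
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for q_path in q_paths:
            name = f"{prefix}.{count_clusters(q_path)}.Q"
            if name in names:
                raise ValueError(f"two input files would both be named {name}")
            archive.write(q_path, arcname=name)
            names.append(name)
    return names
```

For example, zip_q_matrices(["run_a.txt", "run_b.txt"], "inputfile", "q_results.zip") would store a two-column and a three-column matrix as inputfile.2.Q and inputfile.3.Q. The original files on disk are left unchanged; only the names inside the zip follow the XXX.k.Q pattern.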

How to Change Options?

  • Threshold (0.5-1) for Puechmaille method
  • The default threshold is 0.5. You can also input a vector of threshold values separated by ;, e.g. 0.5;0.6;0.7;0.8, to try different thresholds (recommended).
  • Group option for Puechmaille method
  • For STRUCTURE, the default grouping is extracted from the _f files, provided the predefined population option was turned on in STRUCTURE (check the box "Putative population origin for each individual").
    Alternatively, you can upload a popmap file (the order of individuals must be identical to that in the input Q or _f files).
    Or you can input a vector of population sizes separated by ,, e.g. 20,20,20,20 (which means a total of 80 individuals: the first 20 individuals belong to one population, the next 20 to another, and so on), to try different grouping options. This option is similar to Puechmaille's R script [link].
    For ADMIXTURE, fastStructure or other Q-matrix files, a popmap file or a vector of population sizes is needed.
    Note: at least two groups (populations) are needed for Puechmaille's methods.
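
To illustrate how the two option strings described above are read, here is a minimal sketch that checks them before submission. It is an assumption-laden illustration, not StructureSelector's own parser: the function names are hypothetical, groups are numbered 1, 2, 3... in the order given, and the popmap file format itself is not covered.

```python
def expand_population_sizes(vector):
    """Turn a size vector such as "20,30,40" into one group number per individual.

    Assumption: groups are numbered from 1 in the order they appear.
    """
    sizes = [int(part) for part in vector.split(",")]
    if len(sizes) < 2:
        raise ValueError("at least two groups (populations) are needed")
    if any(size <= 0 for size in sizes):
        raise ValueError("every population size must be positive")
    labels = []
    for group, size in enumerate(sizes, start=1):
        labels.extend([group] * size)
    return labels


def parse_thresholds(text):
    """Parse a threshold vector such as "0.5;0.6;0.7;0.8" and check the 0.5-1 range."""
    values = [float(part) for part in text.split(";")]
    for value in values:
        if not 0.5 <= value <= 1.0:
            raise ValueError(f"threshold {value} is outside the range 0.5-1")
    return values
```

With the vector 20,30,40, expand_population_sizes returns 90 labels, which should match the number of individuals (rows) in each Q-matrix; comparing the two lengths is a quick way to catch a mistyped vector.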

Can StructureSelector handle multiple datasets at one time?

Yes, Structure Selector supports multiple datasets in one submission.
Two types of multi-dataset input are currently supported (we recommend the first type):
  • Multiple datasets are in different subfolders
  • Put the results of multiple datasets in different subfolders;
    Compress all these subfolders into one zip file;
    Upload files and check the box "Input Zip File Contains Multiple Datasets?";
    Run!
    NB: Different datasets are recognized by the names of the folders. Only two levels of folders are allowed, i.e. the result files must be in the second-level subfolders.
    The results of all datasets must be in the same format.
  • Multiple datasets are in one folder with different file names
  • Compress all the files into one zip file;
    Upload files and check the box "Input Zip File Contains Multiple Datasets?";
    Run!
    NB: Different datasets are recognized by file names. This is a testing function and is deprecated.
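
The recommended subfolder layout can be assembled with a short script. This is a sketch under stated assumptions rather than an official tool: zip_datasets and its root argument are hypothetical, and each dataset folder is placed at the top of the archive unless root is given, since the exact nesting the server expects should be checked against the note on folder levels above.

```python
import zipfile
from pathlib import Path


def zip_datasets(datasets, zip_path, root=None):
    """Write one subfolder per dataset into a single zip file.

    datasets maps a dataset name (used as the folder name) to a list of
    result files. If root is given, every dataset folder is nested under
    it (hypothetical option; the required nesting is an assumption).
    """
    stored = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, files in datasets.items():
            for file_path in files:
                parts = [name, Path(file_path).name]
                if root is not None:
                    parts.insert(0, root)
                arcname = "/".join(parts)
                archive.write(file_path, arcname=arcname)
                stored.append(arcname)
    return stored
```

For example, zip_datasets({"datasetA": ["a/run.2.Q"], "datasetB": ["b/run.2.Q"]}, "multi.zip") stores datasetA/run.2.Q and datasetB/run.2.Q, so the two datasets are told apart by folder name even though the file names are identical.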

CAS Key Laboratory of Marine Ecology and Environmental Sciences
Institute of Oceanology, Chinese Academy of Sciences
7 Nanhai Road, Qingdao 266071
Shandong, China
P: (86) 0532-82898894